Tutorial | Istio Expert Shares How to Get Started with OS Istio – from Install to mTLS Authentication & Traffic Control

Tutorial How-To from Solutions Engineer, Brian Jimerson

You’ve got Kubernetes. Now hear an Istio expert share what it takes to get started with OS Istio, and learn how to get mTLS authentication and traffic control right out of the box.

As microservices become the de facto standard for modern applications, developers and platform operators are realizing the complexity that microservices introduce. A service mesh such as Istio can address this complexity without requiring custom application logic. Istio adds traffic management, observability, and security to your workloads on Kubernetes. Brian shares firsthand what the clear advantages are and how to get started, including:

  • Installing Istio on your cluster
  • Enabling mTLS authentication between workloads by default
  • Getting better visibility into application traffic
  • Controlling traffic between applications
  • Controlling ingress and egress traffic with Gateways

Read the tutorial or watch the video presentation to learn why Istio is the technology leader for service mesh and production-grade networking for microservices.

About Brian Jimerson, Aspen Mesh Solutions Engineer

Brian Jimerson is a Solutions Engineer at Aspen Mesh by F5 with 15+ years of experience driving better platform performance. He has hands-on expertise moving organizations to cloud-native platforms, optimizing critical application delivery, and leveraging OS Istio to run cloud-native applications at scale.

Not up for a video? Below is a recap of what was covered during the Tutorial Presentation from Solutions Engineer Brian Jimerson.

Istio Expert Shares How to Get Started with OS Istio – from Install to mTLS Authentication & Traffic Control
Brian Jimerson, Senior Solutions Engineer, Aspen Mesh

What is covered:

  • Microservices and how they’ve come about
  • Challenges that they can present to customers at scale
  • Istio as a service mesh and what problems it’s trying to solve with microservices
  • Istio installation and options
  • Encryption and authorization in Istio
  • Complex traffic management in Istio
  • Observability and visibility of Istio and microservices

About Aspen Mesh

Aspen Mesh is part of F5, and we're built to bring enterprise-class Istio to market. We have a product built on top of Istio that helps you manage it at scale and gives you visibility, insight, and configuration across multiple Kubernetes clusters. We also provide services and support for large-scale customers, and we're a major contributor to the open source Istio project; we're in the top five contributors as measured by pull requests.

Microservices

I think it's important to set the stage and talk about what Istio does and how. First, looking at the current landscape of enterprise software development over the last five to seven years: most companies that have historically been brick and mortar and stagnant in terms of software delivery have realized that they need to build and use software to deliver innovation and keep up with their competitors.

Most large companies that have been around for 100 years didn’t have those mechanisms in place. They were building large monoliths that took forever to change. If you needed to update or add a new feature, it could take months to get that feature into production and they just couldn’t deliver rapid innovation.

A large part of that is just the nature of a monolith. You have multiple teams contributing to the same deployment. You need QA schedules and testing. If you changed one feature, you had to deploy the whole monolith, and that took forever. It just wasn't working.

The notion of microservices has been around for a long time, and it has quickly become the de facto development architecture for these companies, because microservices allow individual components to be deployed without concern for other features, functions, and microservices.

What is a microservice?
I like Adrian Cockcroft's definition the best. There are others out there, like Martin Fowler's, but: a microservice is a loosely coupled service-oriented architecture with bounded context. This allows development teams to move fast, move individually, and deliver innovation through software much better than they could before. I think it's important to unpack this term a bit, because it sets the stage for what a service mesh and Kubernetes do.

The first part of this definition is "loosely coupled." What does that mean? It's not a new concept in development, and there's a simple test for it: if a service must be updated in concert with all the other services, it's not loosely coupled. You should be able to update and deploy one service without having to redeploy all the other services within that system or application.

With loose coupling, instead of having a monolith whose components talk to each other in memory, you now must have a protocol and a contract to communicate with other services. As long as that contract doesn't change, you can independently deploy services without worrying about the others. Typically those protocols and contracts are HTTP with a REST API, or maybe asynchronous coupling through a message broker over TCP. Either way, loose coupling means communication now happens over the network instead of in memory within a monolith.

The other part of this definition is bounded context. The idea is that a service does one thing and does it very well, and it doesn't concern itself with other parts of the system. You know you have a bounded context if a service doesn't know about its surrounding services: it just does its one thing well and communicates with the others through a contract and a protocol.

These two terms are inherent in this definition and imply that there's a lot of network communication between the services participating in the system. In the monolithic world, by contrast, communication was traditionally in memory: maybe there were some outside calls to an API or a database, but for the most part the libraries and components within a monolith just communicate with each other in memory.

Microservice Distribution
This creates massive growth in network traffic between services. The graph on the left (see below) is a somewhat contrived system, but it's not atypical. You have six, seven, or eight microservices, and because of the loose coupling they all need to communicate with each other over the network instead of in memory.

Because of the bounded context, these services don't necessarily know much about the other participating services, so any service could potentially have to call any other service, and you must assume that will happen. You can see the huge increase in network traffic compared to historical monoliths.

The diagram on the left depicts a very simple example with seven microservices—probably a small application or system. For comparison on the right, this is the Netflix call graph. Now that’s the other extreme. Most organizations are going to be somewhere in the middle, but you can imagine they will still have a lot of network complexity.

Microservice Distribution

Challenges with Microservices
The introduction of all these network calls introduces a lot of new challenges that development teams haven't had to deal with in the past. One of the most basic is "how do I register my service, what is my address, and how do other services discover my address?" Also, "how do I route and load balance between services?"

Challenges with Microservices

In a large system, you must expect that there will be degradation in performance and services that go down. You have to be able to tolerate that from a calling service through techniques like timeouts and circuit breaker patterns.

Additional questions you want to be able to answer include “how do I see what’s going on within my system? How do I view network traffic and latency? How do I view what services are returning errors or are slow to respond? How do I aggregate different APIs and different services into a single API? How do I transform between protocols and data through there? How do I manage configuration of a highly distributed system?”

Kubernetes is one of the first of what you'd call "cloud native" platforms that tried to address some of these questions, and it's done a really good job on the compute side, the storage side, and scheduling workloads. But it doesn't do much about the traffic between participating services, or about ingress and egress. Now that Kubernetes is being widely adopted, there's a need in the market to handle the network side of things.

What is Istio?

There are other service meshes out there besides Istio, but service meshes, and Istio in particular, are meant to address the challenges introduced by these highly distributed, loosely coupled systems. Istio offloads those challenges to an infrastructure component, much like Kubernetes did for compute and scheduling: it lets you connect and secure microservices, gives you sophisticated routing and traffic management, gives you visibility into what's going on in the system, and applies security.

I think one of the important things that Istio and Aspen Mesh and all the different vendors of Istio have realized is that this cloud native world introduces a new way to implement security. The old security controls don’t really work in this new world for several reasons. We don’t want organizations to take a step back in their security posture and so security is first and foremost across the board with Istio.

What is Istio?

Istio Architecture
What does Istio look like? In its simplest form, Istio has two components. There's a control plane, which is exactly what it sounds like, much like the Kubernetes control plane: a component called "istiod," running as a pod, performs all the control functions within the service mesh. And there's the data plane, comprised of your running workloads plus what's called a sidecar proxy: a container that runs within the same pod as the service. That proxy is Envoy, an open source project that originated at Lyft. Any pod that has an Envoy sidecar proxy injected into it is part of the data plane, and istiod sends it things like configuration, certificate rotations, telemetry settings, and so on.

One thing I want to point out is that istiod used to be separate components, "mixer," "pilot," and "citadel," and you still see references to those names. That functionality has been consolidated into the single istiod component (Mixer itself was deprecated along the way), so if you see references to those terms, just know that's part of istiod now.

One of the things Istio does in the data plane is automatically inject an Envoy proxy into each pod that has one or more microservice containers running in it. As part of that injection, the pod's iptables rules are modified so that all traffic leaving the pod, or coming into it from external clients, goes through the Envoy proxy. The Envoy proxy then communicates with the microservice over the loopback interface, so the microservice is never exposed directly to any network interface that reaches outside the pod; it's all Envoy.

Istio Architecture

Installing Istio – The 4 Commands
How do you get going and install Istio? It's easy to get a default installation. The open source Istio distribution includes a binary CLI called istioctl, and if you run istioctl install against a Kubernetes cluster, it installs the control plane. Then you can create a new namespace (or take an existing one) and label it for Istio injection, and any pod deployed to that labeled namespace will automatically have the sidecar injected into it.
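
As a minimal sketch, the four commands look something like this (the namespace name demo and the manifest file app.yaml are placeholders):

    istioctl install -y                                     # install the control plane (default profile)
    kubectl create namespace demo                           # a namespace for your workloads
    kubectl label namespace demo istio-injection=enabled    # opt the namespace into sidecar injection
    kubectl apply -n demo -f app.yaml                       # deploy your app; the sidecar is injected automatically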

Installing Istio

Installing Istio – Commands Installed
If we tie that back to the original architecture: when you run istioctl install, it creates a namespace called istio-system, which is the root namespace for Istio, and installs istiod and related components. It may also install an ingress gateway, an egress gateway, and some observability tools if you want those, but all the control plane items get installed into that istio-system namespace.

When you create a namespace and label it for Istio injection, a mutating webhook fires whenever a new pod is deployed to that namespace. The webhook adds an init container that modifies the pod's iptables rules so the service only listens on the loopback address, and it injects the Envoy proxy; iptables then routes all traffic to and from that proxy.
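
If you want to confirm what the install and the webhook actually did, a few standard kubectl commands will show it (the pod name is a placeholder):

    kubectl get pods -n istio-system             # istiod plus any ingress/egress gateways
    kubectl get namespaces -L istio-injection    # which namespaces are labeled for injection
    kubectl get pod <your-pod> -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}'
    # expect istio-init (the iptables setup) plus istio-proxy alongside your app container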

A couple of interesting things about this approach. One, it's transparent to the service: as long as you're not binding specifically to an interface or IP address, the service never knows there's a proxy intercepting all its traffic. The other nice thing is that it doesn't matter what language your service is written in. It could be Node.js, Java, or anything else; the proxy is just handling network traffic. That gives you a lot of flexibility. Aside from some routing configuration, your development teams don't have to do anything special, and it all just works.

A question you may ask: when you create something, does the webhook add all the information Istio needs, so you don't have to change your services? That's correct. We'll look at some of the configuration, but there's nothing special your application needs to do. It can reference hosts by the same host names and make calls as usual; everything is transparently proxied through the sidecar thanks to the mutating webhook.

Installing Istio - What the commands installed

Installing Istio – Answering a Few Questions
Again, back to this example: when you do a kubectl apply of whatever your application is, istiod and the mutating webhook automatically grab the pod created by that deployment, inject the Envoy proxy, and modify the iptables rules for it.

Another question you may ask is: what are the benefits of Helm versus just using istioctl? Helm requires a little more work than the plain binary, but it's easier to hook into CI/CD tools and easier to override configuration. Under the hood, istioctl and Helm are ultimately doing the same deployment; it's just a matter of preference. In practice, most organizations go down the Helm path if they're comfortable with Helm and it's part of their process. But in a sandbox environment where you want to get up and running and kick the tires on Istio, istioctl is by far the easiest way to go.
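
For reference, a hedged sketch of the Helm path, using the charts Istio publishes (chart names as documented for recent Istio releases):

    helm repo add istio https://istio-release.storage.googleapis.com/charts
    helm repo update
    helm install istio-base istio/base -n istio-system --create-namespace   # CRDs and cluster roles
    helm install istiod istio/istiod -n istio-system --wait                 # the control plane itself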

Installing Istio

Authentication and Authorization
One of the things that usually drives people to look at a service mesh like Istio is out-of-the-box peer-to-peer authentication. Think back to those call graphs: all the communication between services goes over the network, and even though it's an internal network, by default in straight-up Kubernetes it's not encrypted, authenticated, or authorized. It's the wild west. Anything can talk to anything in plain text, and away we go.

This usually flies in the face of most organizations' security policies. They want traffic encrypted even on an internal network. They want to know the identities behind these network calls, be able to audit them, and have some level of policy around them. One of the most common ways to do this is mTLS, which stands for mutual TLS.

It's your typical TLS X.509 type of protocol, but in mutual TLS both peers exchange the handshake: the client initiates a TLS handshake with the server, and the server in turn does another TLS handshake with the client. Within that TLS connection there is an identity, usually based on a service account, and an authorization process to make sure that one service is allowed to talk to another.

This meets a lot of the requirements for secure communication between services, and the good thing is that Istio does it out of the box. You install it, and you automatically get mTLS communication between pods unless you override it. It can do that because of the proxy: istiod handles certificate management (issuing, rotating, and revoking certificates), and the leg between the microservice and its proxy stays on the loopback interface, so it doesn't need to be encrypted. You get mTLS for free out of the box, and this is probably the first thing everybody looks at: "I need a service mesh to do mTLS authentication."

There's also sometimes confusion around the different types of authentication mechanisms. mTLS is really peer-to-peer: it uses X.509 certificates and is concerned with whether one pod is allowed to talk to another pod. Authentication and authorization for an end user, whether a browser or an external system, doesn't typically use mTLS.

One of the things that does work out of the box is JWT, or JSON Web Tokens. JWT is part of the OAuth flow: an end user authenticates with some identity management system, which issues a JWT that the user then sends as part of each request to an application. The sidecar proxy validates that token against its issuer to make sure the user is allowed to do whatever they're requesting, and it propagates the JWT in the headers of requests to other services. JWT is intended for end users to authenticate and be authorized for requests into the system; mTLS, again, is the auth mechanism between pods within the system.
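
As an illustrative sketch, end-user JWT validation is configured with a RequestAuthentication object (the issuer URL, JWKS URI, namespace, and app label here are hypothetical):

    apiVersion: security.istio.io/v1beta1
    kind: RequestAuthentication
    metadata:
      name: frontend-jwt
      namespace: demo
    spec:
      selector:
        matchLabels:
          app: frontend          # apply to this workload's sidecars
      jwtRules:
      - issuer: "https://idp.example.com"                         # hypothetical identity provider
        jwksUri: "https://idp.example.com/.well-known/jwks.json"  # where the sidecar fetches signing keys

On its own this only validates tokens that are present; it's typically paired with an AuthorizationPolicy that requires a valid principal before requests are allowed through.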

You might wonder whether this terminates at the sidecar, so you don't have to do anything in your application to support it. The answer is yes, it's terminated at the sidecar. The application doesn't need to do anything with encryption or authentication; it's all handled from sidecar to sidecar.

Another question: is your pod going to have two containers in it, the sidecar and your actual application? The answer is that if your pod normally has one container and you deploy it as part of the data plane, it will have two: one called "istio-proxy" and the other your application. Again, the pod's iptables rules are modified so that the only communication the application container has is on the loopback interface, which istio-proxy is also attached to; all the other IP addresses are bound to istio-proxy. There's nothing for the application to do. It doesn't need to worry about certificate management, which can be a nightmare, or about binding to interfaces or routes. It's all handled by istio-proxy.

Another question I've been asked: since both of those containers are in the same pod on the same host and only communicate with each other, the traffic between istio-proxy and the microservice isn't TLS, right? Correct, it's not. That leg is just on the loopback between the microservice and istio-proxy: they're in the same pod with the same interfaces, and they communicate over the loopback. istio-proxy then transparently proxies requests and responses, and if a request has to go off the node to wherever the destination lives in Kubernetes, it takes care of that.

Another point I should call out is that there are different configurations for mTLS. Our recommendation is to enforce mTLS mesh-wide. It's one setting when you install, and you can then override it to be more permissive at the individual workload level if, say, you have legacy workloads that just won't work with it. But our recommendation is to turn it on so you know everything is encrypted and properly set up.
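
The mesh-wide setting is a single PeerAuthentication object in the root namespace; a minimal sketch:

    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: istio-system   # the root namespace makes this mesh-wide
    spec:
      mtls:
        mode: STRICT            # require mTLS everywhere; a per-workload PERMISSIVE policy can override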

A question I've been asked: if you're installing Istio, should you turn on injection for almost all your namespaces and then turn it off for legacy stuff, rather than just leaving those out of an injected namespace? By default, every pod in a namespace labeled "istio-injection=enabled" will have the proxy injected into it, which means it also has mTLS on by default.

There are namespaces you probably wouldn't want to label, so they don't become part of the data plane: obviously the Kubernetes kube-system namespace, your istio-system namespace, and other system namespaces that aren't running custom workloads. The ones you do label with istio-injection=enabled participate in the data plane, which means you get the proxy, mTLS, and all the Istio goodness. You can override that injection on a per-pod or per-deployment basis through attributes, and you can turn off mTLS and a lot of the other features through attributes as well.
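
For example, to keep one workload out of the data plane in an otherwise-labeled namespace, you can annotate its pod template (a sketch; the deployment name and image are hypothetical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: legacy-app
    spec:
      selector:
        matchLabels:
          app: legacy-app
      template:
        metadata:
          labels:
            app: legacy-app
          annotations:
            sidecar.istio.io/inject: "false"   # skip sidecar injection for these pods
        spec:
          containers:
          - name: legacy-app
            image: legacy-app:1.0              # placeholder image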

What we generally recommend is zero trust: be as secure as possible out of the box, and if you have a specific use case where you need to override that, do it. That way your developers don't need to worry about encryption and routing; they just deploy through their CI pipeline and automatically get all the Istio benefits.

Authentication and Authorization

Traffic Management and Gateways

The second set of features most people leverage is traffic management. You can do some complex traffic management, as well as control ingress and egress into the service mesh. Out of the box, there's a lot you can do to manage traffic to the workloads running in Istio.

Traffic Management
One of the things I find really cool is weighted routing. Weighted routing means having multiple versions of your application deployed side by side and routing a configurable percentage of traffic to each version. Say you deploy a new version alongside the old, so you have v1 and v2: you could send 75 percent of inbound traffic to v1, the old version, and 25 percent to v2.
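
A hedged sketch of that 75/25 split for a hypothetical service named myapp (the DestinationRule defines the v1/v2 subsets by pod label):

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: myapp
    spec:
      hosts:
      - myapp                  # the Kubernetes service name
      http:
      - route:
        - destination:
            host: myapp
            subset: v1
          weight: 75           # 75% of traffic to the old version
        - destination:
            host: myapp
            subset: v2
          weight: 25           # 25% to the new version
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: myapp
    spec:
      host: myapp
      subsets:
      - name: v1
        labels:
          version: v1          # pods labeled version=v1
      - name: v2
        labels:
          version: v2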

Traffic Management

That allows you to do a lot of cool things like canary-style deployments, where you deploy a new version side by side with the old version, start routing some percentage of traffic to the new version, and gradually increase it over time while you test performance and functionality, until all traffic is routed to the new version and you can retire the old one.

It also allows you to do things like A/B testing. Maybe you have a certain group of users who are beta testers or internal users, and you can route their traffic to a new version running side by side with the old version, based on header attributes or other attributes. You can have one group of users testing a newly released version while the rest of your users stay on the old version until you decide to cut over. Those are two scenarios, but you can imagine there's a lot you can do with weighted routing.

There’s also locality-based load balancing and failover. Not something we’ll get into today but something to be aware of.

Istio has a configuration called multi-cluster, where a single logical Istio instance spans multiple Kubernetes clusters. Maybe you have an on-premises Kubernetes cluster and also use a public cloud like EKS, or Platform9; the mesh could span those clusters, or clusters in different AWS regions. As long as there's network connectivity, you can span multiple clusters with a single logical mesh.

Often people have requirements to route users to the cluster closest to their location, and you can do that in a multi-cluster configuration with locality-based load balancing. Or maybe you have data privacy requirements for different countries, and you want to route users to the country that meets those requirements. It's an advanced feature, but it can be useful as people mature on Istio.

You can also do traffic mirroring, which is similar to weighted routing. With traffic mirroring you send 100 percent of the traffic to two different versions, but only one version is live and returns its response to the client; the other version's response is discarded. This is another useful way to do feature testing: you send live traffic to a mirror alongside the live version, the user never knows, and you can test those features.
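
A minimal mirroring sketch for the same hypothetical myapp service (subsets again come from a destination rule):

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: myapp-mirror
    spec:
      hosts:
      - myapp
      http:
      - route:
        - destination:
            host: myapp
            subset: v1         # the live version; its response goes back to the client
        mirror:
          host: myapp
          subset: v2           # the mirror; its response is discarded
        mirrorPercentage:
          value: 100.0         # mirror all of the live traffic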

There's also a lot of network resiliency. As mentioned before, you must assume that something will go wrong at this scale. If you have hundreds of microservices all talking to each other over a network, there will be network issues and performance degradation of individual services, and you don't want one of them to bring down the whole system. Things like the circuit breaker pattern can be applied to traffic so that a client can handle a bad response from a server in the mesh gracefully, instead of blowing up itself and causing cascading failures. You can also set request timeouts and retry policies, so you can handle a lot of that degradation.
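
Timeouts and retries live on the virtual service, while circuit-breaker-style outlier detection lives on the destination rule; a hedged sketch for the hypothetical myapp:

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: myapp
    spec:
      hosts:
      - myapp
      http:
      - route:
        - destination:
            host: myapp
        timeout: 2s            # fail the request if it takes longer than this overall
        retries:
          attempts: 3
          perTryTimeout: 500ms
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: myapp
    spec:
      host: myapp
      trafficPolicy:
        outlierDetection:
          consecutive5xxErrors: 5    # eject an endpoint after 5 consecutive 5xx responses
          interval: 30s
          baseEjectionTime: 60s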

You can also do fault injection. If you want to test the resiliency of the applications in your mesh, you can inject faults into them. For example: for five percent of requests, return a 500 error and see how the client handles it; or for 10 percent of requests, inject 100 milliseconds or one second of latency and see how everything copes. If you're in a pre-prod environment and want to test the resiliency of all your applications, fault injection is how you do it. There's obviously more, but those are the big ones when it comes to traffic management.
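
Those exact numbers translate into a virtual service fault block roughly like this (again for the hypothetical myapp):

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: myapp-faults
    spec:
      hosts:
      - myapp
      http:
      - fault:
          abort:
            percentage:
              value: 5.0       # 5% of requests get a synthetic 500
            httpStatus: 500
          delay:
            percentage:
              value: 10.0      # 10% of requests get 100ms of added latency
            fixedDelay: 100ms
        route:
        - destination:
            host: myapp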

It's also worth pointing out that a lot of these features are what other projects build on, such as Knative, whose scale-to-zero requires retries until the service is available. Once you get a handle on Istio, you can install something like Knative on top of it and take advantage. A lot of the CD tools out there, like Argo and Flux, also use these features to do automated canary deployments, feature testing, and the like. There are a number of projects that build upon these features.

Traffic Management APIs
These are the four core Istio APIs that give you traffic management rules: virtual service, destination rule, gateway, and service entry.

The virtual service and destination rule are by far the most common. Read the documentation to understand the many settings you can configure on a virtual service; it's too much to cover right now. If you're going to play around with this in a sandbox, be aware that the virtual service/destination rule relationship takes some getting used to. Honestly, sometimes I wish they'd just combine the two, because they overlap all the time.

Traffic Management APIs

Traffic Routing and Policies
In general, the virtual service configures how a request is routed to a Kubernetes service within the mesh, and the destination rule at the end of that virtual service configures how to handle the traffic once it arrives. I tend to think of the virtual service as operating at layer 7: traffic splitting, weighted routing, and anything else that deals with layer 7 concerns is handled by a virtual service. A destination rule operates more at the layer 3/layer 4 level.

With a destination rule, you can apply TLS policies, set connection timeouts, and configure other things at that layer. The two tend to go together: if you have one, you usually have the other. Those are the objects that give you all the sophisticated routing and management.
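
A minimal destination rule showing those lower-layer settings, for a hypothetical service-b:

    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: service-b
    spec:
      host: service-b
      trafficPolicy:
        tls:
          mode: ISTIO_MUTUAL   # use the mesh's mTLS identity toward this host
        connectionPool:
          tcp:
            maxConnections: 100
            connectTimeout: 5s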

There's also a gateway object, which manages inbound or outbound traffic for an application. There's an ingress gateway that handles all the inbound traffic, and you assign a gateway per host, or set of hosts, to route that traffic into your application.
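
A minimal gateway sketch bound to the default ingress gateway pods (the hostname is a placeholder):

    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
      name: frontend-gateway
    spec:
      selector:
        istio: ingressgateway  # run this config on the default ingress gateway pods
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - "frontend.example.com"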

The virtual service is about how to route a request and what to do with it before it hits the pod. If, internally, you have service A calling service B, you could set up a virtual service for service B that says: if the URL has "/v1", send it to service B's v1 instance; if it has "/v2", send it to v2. That gives you the complex routing within the mesh. The destination rule then kicks in after the virtual service has evaluated its rules, saying things like "I want all requests to terminate TLS at service B" or "I want to set a connection timeout": things at the layer 4 level. They can handle external requests, but also internal requests. They're the core building blocks of traffic management.
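
As a rough sketch of that example (service-b and its subsets are hypothetical; the subsets would be defined in a matching destination rule):

    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: service-b
    spec:
      hosts:
      - service-b
      http:
      - match:
        - uri:
            prefix: /v1
        route:
        - destination:
            host: service-b
            subset: v1         # "/v1" URLs go to the v1 pods
      - match:
        - uri:
            prefix: /v2
        route:
        - destination:
            host: service-b
            subset: v2         # "/v2" URLs go to the v2 pods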

On the right of the slide is a simple example of a virtual service that says: any traffic coming to an ingress gateway with this host, route it to this backend, where "frontend" is the name of the destination Kubernetes service. You can do more complex things here, and it doesn't necessarily need to be bound to an ingress gateway; it could apply to general traffic within the mesh as well. This YAML example handles ingress traffic to a service, but it could also cover traffic from another service to that service, and that's why they created the virtual service instead of just using the Ingress resource. The challenging part of Istio is that there are a lot of different pieces to chain together. In its simplest form, though, a virtual service says: wherever the request comes from, this is how I want it routed to the service I'm running.

External Services
The other interesting traffic management API is the service entry. There's a setting in Istio that we at Aspen Mesh recommend you turn on: the outbound traffic policy, and one of the values you can set it to is "registry only." That means if an application in the mesh tries to call something outside the mesh (a third-party API, a database, anything not running within the mesh), the call is not allowed unless the destination is in Istio's internal service registry. With registry-only set, any call to a destination the mesh doesn't know about is rejected. A service entry is a special type of API object that adds an external service, by hostname, to the service registry. So you control where egress traffic is allowed to go by setting the outbound traffic policy to registry only and then adding service entries for the allowed external services.
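
A sketch of both halves, with a hypothetical external API host: lock egress down at install time, then allowlist hosts with service entries.

    # at install time (or via mesh config), reject unknown destinations
    istioctl install --set meshConfig.outboundTrafficPolicy.mode=REGISTRY_ONLY

    # allowlist a specific external host by adding it to the registry
    apiVersion: networking.istio.io/v1beta1
    kind: ServiceEntry
    metadata:
      name: external-api
    spec:
      hosts:
      - api.example.com        # hypothetical third-party API
      location: MESH_EXTERNAL
      resolution: DNS
      ports:
      - number: 443
        name: tls
        protocol: TLS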

Resilience and Testing
The resilience and testing features we talked about aren't standalone APIs; they're configurations within a virtual service or destination rule. That's how you configure most of them.

Ingress and Egress – Use Controlled Ingress/Egress Gateways to Secure Your Mesh

We've talked a lot about traffic within the mesh, but you're also going to have external requests coming in, and requests going out of the mesh to some external service. Istio has the concepts of an ingress gateway and an egress gateway to handle this.

The ingress gateway sits behind the ingress controller on your cluster. It's an Istio-specific object, but it gives you much of the routing configuration you need. It's another layer of configuration, but it's powerful.

One of the things you can do with your ingress and egress gateways is configure your mesh so traffic is only allowed out through the egress gateway. If you have dedicated nodes for your ingress and egress gateways, that gives you a lot of capability: you can audit traffic, or set firewall rules that only allow traffic from those dedicated node IP addresses. Having dedicated gateways for ingress and egress is a powerful way to improve your security posture.
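
A hedged sketch of the egress half: a gateway object that accepts outbound TLS for one external host on the dedicated egress-gateway pods. (A virtual service, omitted here, is still needed to route sidecar traffic to the gateway and from the gateway out; the hostname is a placeholder.)

    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
      name: egress-gateway
      namespace: istio-system
    spec:
      selector:
        istio: egressgateway   # the dedicated egress gateway deployment
      servers:
      - port:
          number: 443
          name: tls
          protocol: TLS
        hosts:
        - rabbitmq.example.com # hypothetical external host
        tls:
          mode: PASSTHROUGH    # the app's own TLS session passes through intact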

Ingress and Egress

Ingress and Egress Example
To give you an example of what that might look like, below is a project I was working on: we have an application (the blue box) that makes a call to a virtual machine outside the mesh running RabbitMQ, and that call needs to be encrypted over TLS. Through the egress gateway's virtual service routing configuration, we say that all traffic leaving the mesh must be routed through the egress gateway and then out, so you have visibility into what's leaving. Because that connection is encrypted, the application only needs to know the hostname of the server and the protocol.

The communication between the proxy running in that app's pod and the proxy running on the egress gateway is then wrapped and tunneled in mTLS: the already-encrypted RabbitMQ connection is wrapped in mTLS between those proxies, and the proxy on the egress gateway terminates the mTLS and sends the still-encrypted traffic on to RabbitMQ. It's another example of the app knowing nothing about the proxy: it just sends to the hostname, and the proxy automatically picks that up and routes it through the gateway.

People may ask whether the Istio proxy is a pod or a DaemonSet running on each node, or whether it runs on the controller nodes. The answer is no: the Istio proxy is a container that runs within each workload pod. The Istio egress and ingress gateways, meanwhile, are standalone Istio proxies configured to handle inbound and outbound traffic. Each is another Envoy proxy deployed in the istio-system namespace, running as a pod with just a standalone proxy container. It's a normal deployment, not a DaemonSet, configured to allow certain ports and traffic through, whether inbound or outbound.

Ingress and Egress Example

Observability

The last section I want to cover is observability. One of the challenges is that you've got all these microservices making all these network calls, and when an end user reports an issue, you don't even know where to start because there's so much going on. Istio collects a lot of telemetry data, from the data plane as well as from itself, to give you insight into what's going on in the mesh.

Mesh Visualization
One of the things you can do is visualize what your mesh looks like. Istio collects all this telemetry data and can send it to Prometheus or some other data store; the tools I'm showing here are visualization tools, and you can use any tool that supports these protocols (the ecosystem is starting to move to OpenTelemetry). I'm showing Kiali, but there are other tools out there, commercial and open source, that can give you the same graph visualization based on this data.

This Kiali example shows the call graph for a sample application, all the way from the ingress gateway through services running multiple versions of the pods behind them. You can see whether the connections are healthy: the green arrows here indicate healthy HTTP traffic, and the blue ones indicate TCP. You can look at the number of requests, and it's good to be able to visualize the whole communication path in a more complex system of microservices.

Mesh Visualization

Distributed Tracing
When you have an issue with a system, it's hard to tell where that issue is being introduced. If you have 50 microservices all calling each other, which one is being slow? Distributed tracing is a useful way to find the culprit within a call graph. Istio does this by injecting tracing headers, things like a correlation ID and a trace ID, into all the requests, and then sending that data somewhere you can visualize it, so you know where bottlenecks or errors are causing the issue. The tool shown here is Jaeger; Zipkin and several other distributed tracing tools are also out there.

Distributed Tracing

Metrics Visualization
Metrics are typically stored in Prometheus. You can use the APM tool of your choice, whether it's Datadog or Dynatrace; in this case it's a Grafana dashboard. There are a ton of metrics you can build dashboards around to look at both the Istio control plane's performance and your applications'.

The metrics in this dashboard (below) include information about mTLS at the bottom, so you can see that your pods are protected by mTLS in this case: what's encrypted and what's not. Throughput, latency, all sorts of data you can pull into a tool of your choice.

Metrics Visualization

One question I've received: does Jaeger take into account mTLS authentication time, to figure out how much overhead TLS adds? The answer is no. Jaeger is just looking at the communication between the proxies; once traffic hits the proxy, the mTLS is terminated and there's no data beyond that. You can get that data from Envoy and visualize it in something like Grafana, since TLS termination overhead is a metric that's collected; Kiali itself doesn't go that deep. We have some good guides on benchmarking and tuning Envoy, because there is some overhead with the proxy, but what we've found is that if you tune it properly for your environment, it's minimal.

Next Steps – How to Kick the Tires on Istio

For those who want to look at Istio further, here are some quick steps to get something up and running that you can play around with. Istio is open source, so there's plenty of documentation out there.

  • Get a sandbox cluster. If I were starting from the beginning, I'd create some sort of sandbox cluster, whether local or on Platform9. Whatever the cluster is, it needs to be Kubernetes 1.19 or above.
  • Download and install Istio with the demo profile. There's a concept of profiles when you run istioctl install; I recommend the demo profile because it installs everything: the ingress gateway, egress gateway, core Istio control plane, etc. (see the command sketch after this list).
  • Install the tools of your choice. It doesn't have to be Kiali, Jaeger, and Grafana, but those three are common; if you have other preferences, that's fine.
  • Install a microservice-based application. There are a bunch of sample applications that demonstrate Istio's features; bookinfo ships with the Istio distribution, and Aspen Mesh has one called catalog-demo, a set of microservices you can also download and install.
  • Explore key features like security, observability, and traffic management. There’s a lot of good examples in the documentation to be able to do that.
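
Putting those steps together, a minimal sandbox walkthrough might look like this (run from the extracted Istio release directory; bookinfo and the addon manifests ship under samples/):

    istioctl install --set profile=demo -y                          # control plane + ingress/egress gateways
    kubectl label namespace default istio-injection=enabled         # opt the default namespace into the mesh
    kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml   # the bookinfo sample app
    kubectl apply -f samples/addons                                 # Kiali, Jaeger, Grafana, Prometheus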

Have more questions about how to get started? Get in touch with the Aspen Mesh team or learn about our Professional Services and 24/7 Expert Support.

Recent related content in the Service Mesh Knowledge Hub:


Adopting a Zero-Trust Approach to Security for Containerized Applications

Adopting a zero-trust secure service mesh can help remove the burden of addressing security requirements from your application development teams, freeing them to focus on functions that provide direct value to your customers. Find out how in this whitepaper.


Getting the Most Out of Your Service Mesh

The Aspen Mesh team knows that service mesh has broad implications and benefits whether you're a product owner, a software developer, or an operations leader. Someone in Dev is going to have very different questions than someone in Ops. And an App Owner is going to want to better understand things like a service mesh’s impact on the bottom line.

This guide will help you understand the benefits no matter your role in your organization.


The Complete Guide to Service Mesh

Service meshes are new, extremely powerful and can be complex. If you’ve been asking questions like “What is a service mesh?” “Why would I use one?” “What benefits can it provide?” or “How did people even come up with the idea for service mesh?” then The Complete Guide to Service Mesh is for you.

Check out the free guide.


Get App-focused Security from an Enterprise-class Service Mesh | On-demand Webinar

In our webinar, now available on demand, You've Got Kubernetes. Now You Need App-focused Security using Istio, we teamed with Mirantis, an industry leader in enterprise-ready Kubernetes deployment and management, to talk about security, Kubernetes, service mesh, Istio and more. If you have Kubernetes, you're off to a great start with a platform for security based on microsegmentation and network policy. But firewalls and perimeters aren't enough, even in their modern, in-cluster form.

As enterprises embark on the cloud journey, modernizing applications with microservices and containers running on Kubernetes is key to application portability, code reuse and automation. But along with these advantages come significant security and operational challenges, due to security threats at various layers of the stack. While Kubernetes platform providers like Mirantis manage security at the infrastructure, orchestration and container level, the challenge at the application services level remains a concern. This is where a service mesh comes in.

Companies with a hyper-focus on security, like those in healthcare, finance, government, and other highly regulated industries, demand the highest level of security possible to thwart cyberthreats, data breaches and non-compliance issues. You can up-level your security by adding a service mesh that's able to secure thousands of connections between microservice containers, inside a single cluster or across the globe. Today Istio is the gold standard for enterprise-class service mesh for building zero trust security. But I'm not the first to say that implementing open source Istio has its challenges, and it can cause a lot of headaches when Istio deployment and management are added to a DevOps team's workload without some forethought.

Aspen Mesh delivers an Istio-based, security-hardened, enterprise-class service mesh that's easy to manage. Our Istio solution reduces friction between the experts in your organization because it understands your apps, and it seamlessly integrates into your SecOps approach and certificate authority architecture.

It's not just about what knobs and config you adjust to get mTLS in one cluster. In our webinar we covered the architectural implications and lessons learned that will help you fit service mesh into your up-leveled Kubernetes security journey. It was a lively discussion with a lot of questions from attendees. Click the link below to watch the recording.

-Andrew

 

Click to watch webinar now:

On Demand Webinar | You’ve Got Kubernetes. Now you Need App-focused Security using Istio.

The webinar gets technical as we delve into:

  • How Istio controls North-South and East-West traffic, and how it relates to application-level traffic. 
  • How Istio secures communication between microservices. 
  • How to simplify operations and prevent security holes as the number of microservices in production grows. 
  • What is involved in hardening Istio into an enterprise-class service mesh. 
  • How mTLS provides a zero-trust based approach to security.
  • How Aspen Mesh uses crypto to give each container its own identity (using a framework called SPIFFE). Then when containers talk to each other through the service mesh, they prove who they are cryptographically. 
  • Secure ingress and egress, and Cloud Native packet capture. 

How Istio is Built to Boost Engineering Efficiency

The New Stack Makers Podcast

One of the bright points to emerge in Kubernetes management is how the core capabilities of the Istio service mesh can help make engineering teams more efficient at running multi-cluster applications. In this edition of The New Stack Makers podcast, The New Stack spoke with Dan Berg, Distinguished Engineer, IBM Cloud Kubernetes Service and Istio, and Neeraj Poddar, co-founder and chief architect, Aspen Mesh, F5 Networks. They discussed Istio's wide reach in Kubernetes management and what we can look out for in the future. Alex Williams, founder and publisher of The New Stack, hosted this episode.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.

Alex Williams: Hey, it’s another episode of The New Stack Makers, and today the topic is Istio and engineering management. Today, I am joined for a conversation about Istio with Neeraj Poddar, co-founder and chief architect at Aspen Mesh. Hello, Neeraj, how are you?

Neeraj Poddar: I’m doing good. It’s great to be here Alex.

Alex Williams: Thank you for joining us – you’re live from Boulder. And live from Raleigh, North Carolina, is Dan Berg, Distinguished Engineer at IBM Cloud Kubernetes Service and Istio. That’s a mouthful.

Dan Berg: Yes, sir. I was worried there for a moment you weren't going to be able to get Kubernetes out.

Alex Williams: You know, it's been that way lately. Actually, we're just finishing the second edition of the eBook we first wrote in 2017 about Kubernetes. Service mesh was just beginning to be discussed then, and I was reading some articles that were saying things like, well, Istio is still in its early days. And now today you're telling me that you have more meetings than you can go to related to Istio. I don't know what that means. What does that mean to you both? What does that say about Istio, and what is Istio, for those who may not be familiar with it?

Neeraj Poddar: You're right. I mean, we have so many meetings and discussions, both asynchronously and synchronously, that it's great to see the community grow. And like you're saying, from three years ago to where we are now, it's amazing: not just the interest from developers, but also the interest from end users, the feedback, and then making the product and the whole community better. As for what Istio is: Istio is an open source service mesh platform for simplifying microservices communication. In simple terms, it handles a lot of the complicated pieces around microservices communicating with each other, things like enforcing policies, managing certificates, and surfacing relevant telemetry so that you can understand what's happening in your cluster. Those problems become more and more complicated as you add more microservices. So a service mesh, and Istio in particular, takes that burden away from the developers and moves it into the infrastructure. It basically decouples the two and enables both to be successful at the same time.

Alex Williams: Now, Dan, you’ve been around a bit and you have your own experiences with APIs and how they evolved, and is this why we’re seeing this amazing interest in Istio? Because it takes the API to that next evolution? Is it the network effect on APIs that we’re seeing or is it something different that’s relevant to so many people?

Dan Berg: Well, I think it's a combination of a few things. And first off, thanks for calling me old by saying I've been around for a while.

Dan Berg: So I think it's a combination of several different things. First and foremost, near and dear to my heart, obviously, is containers and the evolution of containers, especially as containers have been brought to the cloud, driving more cloud native solutions, which drive distributed solutions in these clouds, which is driving more use of microservices. Microservices aren't new; they're just being applied in a new way in cloud environments. Because of that, there's a lot of complexity, and the distribution and delivery of those containers is a bit different from what we've seen with traditional VMs in the past, which means how you manage microservices is different. You need the mechanism. You need a way to drive your DevOps processes that are GitOps-based, API- and CLI-driven. What that naturally means is we need a better way of managing the microservices in your cloud. The evolution of Istio as a service mesh, which I often think of as the ability to program your network and your network policies through an API, is a natural evolution to fit where we are today with cloud native applications based on containers. This is the modern way to manage your microservices.

Neeraj Poddar: The way Dan explained it, it's a natural progression. I especially want to mention, in the context of network policies: even when companies migrate from monoliths to microservices, the same organizational policies still apply. No one wants to give those up, and you don't want to embed them into your applications. So this is the key missing piece that lets you migrate, or even scale. It gives you both, wherever you are in your journey.

Alex Williams: So the migration and the scale. And a lot of it almost comes down to the user experience, doesn't it? I mean, Istio is very well suited to writing generic, reusable software, isn't it? And to managing these interservice communications, which relate directly to the network, don't they?

Dan Berg: Yeah, in many ways it does. A big part of this, though, is that it removes a lot of the burden and the lock-in from your application code. You're not changing your application to adopt and code to a certain microservices architecture or microservices programming model; that is abstracted away with these sidecars, which are a pivotal control point within the application. But from a developer standpoint, what's really nice is that you can now declare your intent. A security officer can declare their intent. As Neeraj was saying about policies, you can drive these declarations through Istio without having to go through and completely modify your code to get this level of control.

Alex Williams: Neeraj, so what's the Aspen Mesh view on that? I know you talk a lot about engineering management, and this relates directly to engineering management in many ways, doesn't it? In terms of taking care of those concerns so you can have the reusable software.

Neeraj Poddar: Absolutely. When I think of engineering management, I also think of engineering efficiency. They relate in an interesting way: we want to make sure we're always achieving business outcomes. There are two or three business outcomes we want our engineering teams to achieve. We want to acquire more customers by solving more customer use cases, which means adding more features quickly. That's what Dan was saying: you can move those infrastructure pieces out of your application into the mesh, so you can focus on creating reusable software, the software that's unique IP to your company, instead of writing code that has already been written for everyone. The second outcome is that once you've acquired a customer, you want to retain them. That customer satisfaction comes from being able to run your software reliably and fix issues when you find them. And that's where, again, a service mesh and Aspen Mesh excel, because we surface consistent metrics, telemetry, and tracing, and at the same time you're able to tie it back to an application view where you can easily pinpoint where the problem is. You get benefits at the networking level, but you also get an understanding of the application, which is crucial to your architecture.

Alex Williams: Dan, what is the importance of the efficiencies at the networking level, the network management level? What has been the historical challenge that Istio helps resolve? And how does the sidecar play into that? I'm always trying to figure out the sidecar, and I think for a lot of people it's a little confusing to understand. Lynn, your colleague at IBM, describes it pretty well: it's almost like taking all the furniture out of the room and then placing it back in the room piece by piece. I don't know if that's the correct way to describe it.

Dan Berg: Possibly. That's one analogy. So a couple of different things. First off, networking is hard. Fundamentally, it is hard. It almost feels like if you're developing for cloud, you need a PhD to do it properly, and on some level that's true. Simple networking is fine; getting from point A to point B is not a problem. Even some things in Kubernetes, like routing from service A to service B, are pretty easy, right? There's kube-dns to do the lookup, and kube-proxy will do your routing for you. However, it's not very intelligent, not intelligent at all. There's little to no security built into it. And the routing and load balancing is very generic: it's just round robin, and that is it. There's nothing fancy about it. So what happens when you need specific routing based on, let's say, zone awareness, or based on the client and source that's coming in? What happens if you need a proper circuit breaker because your destination wasn't available? Where are you going to code that? How are you going to build in your retry logic and your timeout logic? Do you put that in your application? Possibly. But wouldn't it be nice if you didn't have to? So there are a lot of complications with the network, and I haven't even gotten into security: your authentication and your authorization. Typically, that's done in the application, and all you need is one bad actor in that entire chain and the whole thing falls apart. So Istio, and modern service meshes generally, push that programming down into the network. And this notion of the sidecar, which is popular in Kubernetes-based environments, basically means you put another container inside the pod. What's so special about that one container? Well, with Istio's sidecar, that container is an Envoy proxy, and it captures all inbound and outbound traffic into and out of the pod. Everything traverses through that proxy, which means policies can be enforced, security can be enforced, and routing decisions can be programmed and enforced. That happens at the proxy. So when a container in the pod communicates out, the traffic is captured by the proxy first, which makes some decisions around it and then forwards it on. The same on inbound requests: it's checking, should I accept this? Am I allowed to accept this? It's that control point. And all of that is programmed by the Istio control plane. That's where the developer experience comes in: you program it through YAML, you're programming the control plane, and the control plane propagates all that logic down into the sidecars, where the control actually takes place. That's the magic right there. Does that make sense? It's kind of like a motorcycle with a little sidecar, literally the sidecar. Put your dog in the sidecar if you want to take your dog with you everywhere you go. And every time you make a decision, you ask your dog. That's the Envoy sidecar.

Neeraj Poddar: That’s the image that comes to my mind. And maybe that’s because when I grew up in India, that was more prevalent than it is in the U.S. right now, and now somebody from America is also bringing it up. But that’s exactly right in my mind. And just to add one thing to what Dan said, day one networking problems are easy, relatively easy. Networking is never easy, but it’s relatively easy in Kubernetes. Day two, day three, it gets complicated real fast. Early on in the service mesh and Istio days there were people saying, it’s just doing DNS, why do I need it? Now no one is saying that, because those companies have matured past the day one problems and they are realizing, oh my God, do I need to do all of this in my application? And when are those application developers going to write real value-add code then?

Alex Williams: All right. So let’s move into day two and day three, Neeraj. Who are the teams managing day two and day three? Who are these people? What are their personas and what roles do they play?

Neeraj Poddar: That’s a really interesting question. I mean, the same personas which kind of started your project or your product and were there on day one move along into day two, but some of the responsibilities change and some new personas come on board. So an operator role or a security persona is really important for day two. You want to harden your cluster environment. You don’t want unencrypted data flowing through. For maintainability, as an operator, whether it’s a platform operator or a DevOps SRE persona, they need to have consistent metrics across the system, otherwise they don’t know what’s going on. Similarly for day two, I would say the developer, who is creating the software and creating the new application, needs to be brought in when failures happen, and they need to be consulted at the right time with the right context. So I always think that in microservices, if you don’t have the right context, you’re basically going to spend time in meetings trying to figure out where the failure is. And that’s where a consistent set of telemetry and a consistent set of tracing for day two and day three is super crucial. Moving to security, think about certificate management again. I’m going to show my age here, but if you have managed certificates in your applications in a distributed manner, you know the pain there. You have been yelled at by a security officer at some point saying this is not working, go upgrade it, and then you’re stuck trying to do this in a short time span. Moving to Istio, now that’s a configuration change, or an upgrade of the Istio proxy container. Because you know what? We fix OpenSSL bugs much quicker, because we are looped into the OpenSSL ecosystem. So those are day three problems, and then even further, if you look at day three, you have upgrade issues. How do you reliably upgrade without breaking traffic or degrading your customer experience? Can you do feature activation using progressive delivery? These are things we’re just talking about, and maybe these are day three point five or day four problems, but in the future you should be able to activate features in one region, even in a county, who cares, and test them out with your customers without relying on application changes. So that’s how I see it. The personas are the same, but the benefits change and the responsibilities change as your organization matures.

Dan Berg: I was just going to say, one of the things that we see quite often, especially with the adoption of Istio, is that for the developer, as Neeraj says, day one, setting up your basic networking and routing, is pretty easy. But then as your system and application grow, just understanding where those network flows go, it’s amazing how quickly it gets out of control. Once traffic gets into, let’s say, your Kubernetes cluster, where does it go? Where does it traverse? Did you even have your timeouts set up properly? How do you even test that? So not even just the operational aspects, but the testing aspects, how to do proper testing of your distributed system, are very complicated from a networking standpoint, and that’s where things like Istio timeouts, retries and circuit breakers really become helpful, and its fault injection, so you can actually test some of these failures. And then with Jaeger and doing the tracing, you can actually see where the traffic goes. But one of my favorites is Kiali: bringing that up and just seeing the real-time network flows, seeing the latency, seeing the error codes. That is hugely beneficial, because I actually get to see where the traffic went when it came into my cluster. So there’s lots of benefit for the developer beyond just the security role. I mean, the developer role is very critical here.
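
The fault injection Dan mentions is also plain YAML. A hedged sketch (the `ratings` service, the percentages and the status code are hypothetical) that delays some requests and fails others, so failures can be rehearsed before they happen for real:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-fault-test
spec:
  hosts:
    - ratings
  http:
    - fault:
        delay:
          percentage:
            value: 10      # slow down 10% of requests
          fixedDelay: 5s
        abort:
          percentage:
            value: 5       # fail 5% of requests outright
          httpStatus: 503
      route:
        - destination:
            host: ratings
```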

Neeraj Poddar: Absolutely, yeah. And I’ll put in a plug for operators here, which is: once you get used to programming via YAML, or being able to change the data path through the extensions that we are making in the community with WASM, you get to control a critical piece of infrastructure when you have zero-day things happening. You can actually change that by adding your own filter. We have seen that be so powerful in existing paradigms with BIG-IP or NGINX, where you have a whole ecosystem right now of people writing crazy scripts to do things, which is saving them lots of money. Because you know what? You don’t always get time to change your application, but you can change the proxy which is next to it. So you’re going to see a lot of interesting things happening there for, you know, day three, day four use cases.

Alex Williams: But who’s writing the scripts? Who’s writing the YAML? Who’s doing that configuring? Because a lot of these people, you know, developers, are not used to doing configurations. So who does that work?

Neeraj Poddar: That’s a really good question, and the reason I’m hesitant is that the answer is: it depends. If you have a very mature developer workflow, I would expect developers to give you information about the applications, and then the platform team takes over, converting it into the Istio-specific, Kubernetes-specific language. But most organizations might not be there yet, and there you will need some collaborative effort between application developers and operators. For example, I’ll give you what Aspen Mesh is trying: we are trying to make sure that even if both personas are writing YAMLs, those APIs are specific to those personas. So we have created application YAMLs, which an application developer can write with no prior knowledge of Istio. The operators can write something specific to their requirements around networking and security, again in a platform-agnostic way, and then Aspen Mesh can lower it down to Istio-specific configuration. So it depends on what kind of toolchain you are using. I would hope that in the future, application developers are writing less and less configuration that is platform specific.

Dan Berg: And I think that basically echoes the fact that we do see multiple roles using the Istio YAML files and the configurations, but you don’t have to be an expert in all of it. Generally speaking, there are traffic management capabilities and things like that that a developer would use, because there you’re defining your routes. You’re defining characteristics specific to your application, as well as the rollout of your deployment if you’re trying to do a canary release, for example. That’s something that the developer, or an application author, would be responsible for. But when you’re talking about setting up policies for inbound or outbound access controls into the cluster, that may be a security advisor who is responsible for defining those levels of policies, and not necessarily the developer. You wouldn’t want the developer defining that level of security policy; it would be a security officer doing that. So there’s room for multiple different roles, and therefore you don’t have to be an expert in every aspect of Istio, because your role determines which aspects you’re going to care about.
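
As a concrete illustration of the developer side of that split, the canary release Dan mentions is a weighted route plus subset definitions. A minimal sketch, with a hypothetical `checkout` service and a 90/10 split:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
    - checkout
  http:
    - route:
        - destination:
            host: checkout
            subset: v1
          weight: 90
        - destination:
            host: checkout
            subset: v2   # the canary
          weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: checkout
spec:
  host: checkout
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```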

Alex Williams: When we get into the complexities, I think of telemetry, and telemetry has traditionally been a concept I’ve heard Intel talk about, right, with infrastructure and systems. And now telemetry is being discussed as something to be used in the software. How is telemetry managed in Istio? What is behind it? What is the architecture behind that telemetry that makes it manageable, that allows you to really be able to leverage it?

Dan Berg: For the most part, it all really starts with the Istio control plane, which is gathering the actual metrics and provides the Prometheus endpoint that is made available so you can connect up to it, scrape that information and use it. How it gets the telemetry information, that’s really the key part: where does that information come from? If we take a step back, remember I was talking about the sidecar, the sidecar being the point that makes those decisions, the routing decisions, the security decisions.

Alex Williams: Well, the dog, the dog, the dog,

Dan Berg: Yes, the dog that is way smarter than you, making all the proper decisions, telling you exactly where to go. That’s exactly what’s happening here. Since all traffic in and out of the pod is going through that proxy, it is asynchronously sending telemetry information, metrics about that communication flow, both inbound and outbound. So it can track failure rates, it can track latency, it can track security settings. It can send a large amount of information about that communication flow. And once you start collecting it in the Istio control plane, at the telemetry endpoint, and you start scraping that off and showing it in a Grafana dashboard, as an example, there’s a vast amount of information. Once you start piecing it together, you can see going from service A to service B, which is nothing more than going from sidecar A to sidecar B. Right, we have secure identities. We know exactly where the traffic is going, because we have identities for everything in the system, and everything that joins the mesh is joined because it’s associated with a sidecar proxy. So it’s these little agents, these proxies, that are collecting up all that information and sending it off into the Istio control plane so you can view it and see exactly what’s going on. And by the way, that is one of the most important pieces of Istio. As soon as you turn it on and join some services, you’ve got telemetry. It’s not like you have to do anything special. Telemetry starts flowing, and there’s a huge amount of value. Once you see the actual information in front of you, traffic flowing, error rates, it’s hugely powerful.
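
Mechanically, the sidecars expose metrics such as `istio_requests_total` over Prometheus endpoints, and a Prometheus server discovers and scrapes them. The fragment below is only a hedged sketch of that mechanism; the standard Istio install ships a much fuller scrape configuration.

```yaml
# prometheus.yml (sketch): scrape Envoy sidecar metrics from pods
# that carry the usual prometheus.io/scrape annotation.
scrape_configs:
  - job_name: envoy-sidecars
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that advertise a Prometheus endpoint.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```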

Neeraj Poddar: Just to add to what Dan said, the amount of contextual information that the sidecars add for every metric we export is super important. I was in front of a customer recently, and like Dan said, there’s a wow factor that you can just add things to the mesh, and suddenly you have so much information related to Kubernetes, which tells you about the pod, the services, the role, the application labels. That’s super beneficial, and all of that without changing the application. Another point here is that if you’re doing this in applications, there are always inconsistencies between applications developed by one team versus another. A second problem that I’ve always seen is that it’s very hard to move to a different telemetry backend system. For some reason, you might not want to use Prometheus and want to use something else. If you tie all of that into your application, you have to change all of it. So this proxy can also give you a way of switching backends in the future if you need to, without going through your application lifecycle. So it’s super powerful.

Alex Williams: So let’s talk a little bit more about the teams and more about the capabilities. I know that Aspen Mesh has come out with its latest release, 1.5, and you have security APIs built into it, you’re enabling Envoy extensions written in WebAssembly, which is interesting, we’re hearing a little bit more about WebAssembly but not much, and traffic management, how you think about traffic management. Give us a picture of 1.5, kind of tracing Istio’s evolution with it.

Neeraj Poddar: Yeah. So all Aspen Mesh releases are tied to the upstream Istio releases, so we don’t take away any of the capabilities that Istio provides. We only add capabilities we think organizations will benefit from, like a wrapper around it, so that you have a better user experience. So Istio 1.5 by itself moved from a microservices control plane to a monolithic one for operational simplification. So we have that. Similarly, telemetry v2, which is an evolution of the out-of-process Mixer v1: we also provide that benefit, where users don’t have to run Mixer. There was a lot of resource contention, where it was consuming a lot of CPU and memory and contributing to latency numbers which didn’t make sense. So all of those benefits the community is working on, you get with the Aspen Mesh release. But the key thing here is for us to provide wrapper APIs, like security APIs. I’ll give you a quick example. Istio moved, from 1.4 to 1.5 I think, from JWT-based policies to RequestAuthentication and the new authentication policies. The community had to change the APIs because the older APIs were not making sense after user feedback; there were some drawbacks. That’s great as an improvement, but as a customer, now I have to rethink what I did.
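
For context, the 1.5-era replacement API Neeraj refers to looks like this. A hedged sketch; the issuer, JWKS URL and `httpbin` selector are hypothetical. On its own this validates JWTs that are present; pairing it with an authorization policy is what actually requires them.

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: require-jwt
  namespace: default
spec:
  selector:
    matchLabels:
      app: httpbin
  jwtRules:
    - issuer: "https://accounts.example.com"
      jwksUri: "https://accounts.example.com/.well-known/jwks.json"
```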

Neeraj Poddar: When I have to upgrade, I have to make sure we move along with Istio. So us providing a wrapper around it means we do the conversion for our customers. That’s one way we provide some benefit. Like you said, WASM is an interesting development that’s happening in the community. I feel like as the ABI itself matures and a richer ecosystem develops, this is going to be a really powerful enhancement. Vendors can actually add extensions without rebuilding Envoy and having to rely on C++ filters. Companies who have some need that they don’t want to offload to vendors or open source can extend Envoy on the fly themselves. This is a really huge thing. One thing I should mention is that the Istio community is regularly changing and evolving the way Istio is installed. Dan is here, he can tell you: from the very beginning we have been doing Helm, then we have not been doing Helm, then we have gone to istioctl. It’s all in pursuit of the right way, because of user feedback, and trying to make it even smoother going forward. So we try to smooth out that road, where Aspen Mesh customers can continue to use the tooling that they’re comfortable with. Those are the kinds of things we have delivered in 1.5, where our customers can still use Helm.

Alex Williams: When you’re thinking about security, Dan, and what distinguishes Istio, what comes to mind, especially when you’re thinking about multi-cluster operations?

Dan Berg: One of the key aspects and huge value benefits of Istio is that if you enable a strict security policy for the services within the mesh, that’s going to enable automatic management of mutual TLS authentication between the services, which in layman’s terms is allowing you to do encryption on the wire between your pods. If you’re looking at a Kubernetes environment and you’ve got a financial organization as a customer, or any other customer that has strict encryption requirements, they’re asking: well, how are you going to encrypt on the wire? In a Kubernetes environment, that’s kind of difficult, unless you want to run IPsec tunnels everywhere, which have a pretty nasty performance drain, and that only works between the nodes and not necessarily between the pods. Or you start moving to IPv6, which isn’t necessarily supported everywhere or even proven in all cases. But Istio, literally through a configuration, can enable mutual TLS with certificate management and secure service identity. So hugely powerful. And you can visualize all of that with the tools and utilities from Istio as well, so you know exactly which traffic flows are secured and which ones are not, like in Kiali. That’s hugely powerful. And then the whole multi-cluster support, which you brought up as well, is an interesting direction. I would say it’s still in its infancy in terms of managing more complex service mesh deployments. Istio has a lot of options for multi-cluster, and while I think that’s powerful, I also think it’s complex. I do believe that where we’re going on this journey is to simplify those options, to make it easier for customers to deal with multi-cluster. But one of the values of security and multi-cluster ultimately is around this level of secure identities and certificate management, where you extend the boundaries of trust into multiple different clusters. So now you can start defining policies and traffic routing across clusters, which you can’t do today; that’s very complex. But you start broadening and stretching that service mesh with the capabilities afforded to you by Istio, and that’s just going to improve over time. We’re on a journey right now of getting there, and a lot of customers are starting to dip their toes into that multi-cluster environment, and Istio is right there with them and will be evolving. It’s going to be a fantastic story. I would just say it’s very early days.
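
The “literally through a configuration” part is a small resource. A minimal sketch of a mesh-wide strict mTLS policy, assuming a default install where istio-system is the mesh root namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh root namespace in a default install
spec:
  mtls:
    mode: STRICT            # sidecars require mutual TLS for all workloads
```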

Neeraj Poddar: Yeah, I was just going to echo that it’s in its infancy, but it’s so exciting to see what you can do there. When I really think about multi-cluster, you can think about new use cases emerging from the telecom industry, where the clusters are not just in data centers, they’re at the edge and far edge, and you might have to do some crazy things.

Dan Berg: Yeah, well, that’s the interesting thing. Earlier this year at IBM, we launched a new product called IBM Cloud Satellite. And if you own a service mesh, you’re going to be extremely excited about those kinds of edge scenarios. You’re broadening your mesh into locations where you’re putting clusters that two years ago you would never have thought about. I think service mesh is going to become more and more important as we progress here, with the distributed nature of the problems we’re trying to solve.

Alex Williams: Yeah, I was going to ask about telco and 5G, and I think what you say sums it up: being able to manage clusters at the edge, for instance, in the same way that you can in a data center environment.

Dan Berg: Well, you’re also dealing with a lot more clusters in these environments. Instead of tens or even hundreds, you might be dealing with thousands, and trying to program like in the old days, at the application level, is going to be almost impossible. You need a way to distribute consistent, programmable policies across all these clusters, and Istio provides some of the raw mechanics to make that happen. These are going to be incredibly important tools as we move into this new space.

Neeraj Poddar: I was just going to say, I always think the evolution of the service mesh is going to follow the same trajectory as the evolution of the ADC market, which happened when the telcos and the big enterprises came in with the many requirements of the telecom industry. That’s why load balancers are so evolved today. Similarly, service mesh will gain a lot more capabilities. Think about clusters running at the far edge. They will have different resource constraints. You need a proxy which is faster and slimmer. Some people will say that’s not possible, but we’ll have to find a way to do it. So I’m always excited when I think about these expansions. And like Dan said, we are not talking about tens or hundreds of clusters now, we are talking about thousands.

Alex Williams: We’ve been doing research, and we actually find that the deployments that are most predominant among the people we’re surveying are those of more than five thousand clusters. And that leads to my last question for you, which is about day five, day six, day seven, and what role observability plays in this. Because it seems like what we’re talking about essentially is observability, and I’m curious how that concept is evolving for you as we move out to the days beyond, for people who are using Istio and service mesh capabilities.

Dan Berg: Obviously you need that sidecar. You need that dog next to you, collecting all that information and sending it off. That is hugely important. But once you start dealing with scale, you can’t keep looking at that data time in and time out. You’ve got to be able to centralize that information. Can you send all of that into your centralized monitoring system at the enterprise level? The answer there is yes, you absolutely can. Sysdig, a great partner that we work with, provides a mechanism for scraping all of the information from the Istio Prometheus endpoint and bringing that all in, and they have native Istio support directly in that environment, which means they know about the Istio metrics and can present them in a unified manner. So now you can start looking at common metrics across all of these clusters, all the service meshes, in a central place, and start building alerts, because you can’t look at five thousand clusters and X number of service meshes. It’s just too large. It’s too many. So you have to have the observability. You need to be collecting the metrics, and you’ve got to have alerts being generated from those metrics.
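
One common pattern for that centralization, independent of any particular vendor’s mechanism, is Prometheus federation, where a central server pulls the mesh metrics from each cluster’s Prometheus. A hedged sketch; the cluster hostnames are hypothetical:

```yaml
# Central prometheus.yml (sketch): federate Istio metrics from
# per-cluster Prometheus servers.
scrape_configs:
  - job_name: federate-istio-clusters
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"istio_.*"}'   # pull only the mesh metrics
    static_configs:
      - targets:
          - prometheus.cluster-a.example.com:9090
          - prometheus.cluster-b.example.com:9090
```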

Neeraj Poddar: Yeah, and I think we need to go even a step beyond that, which is: you’ll have information from your mesh, information from your nodes, information from your cloud, your GitHub, whatever. You need to get it all to a level where there is some advanced analytics making sense of it. There’s only so much that a user can do once they get the dreaded alert.

Neeraj Poddar: They need to take the next step, which is: in this haystack of metrics and traces and logs, can someone narrow it down to the place that I need to look? Because you might get alerted on microservice A, but it has dependencies which are other microservices, so the root cause might be ten levels down. So I think that’s the next day seven, day eight problem we need to solve: how do we surface the information in a way that’s presentable? For me, it’s even about tying it back to the context of applications. Dan and I are both from networking. We love networking. I can talk networking all day, but I think we need to talk the language of applications. That’s where the real value will kick in, and service mesh will still be a key player there, but it will be part of an ecosystem where other pieces are also important, all of them giving information that we are correlating. So I think that’s going to be the real thing. It’s still very early. People are just getting used to understanding service meshes, so telling them that we need to correlate all of this information in an automated way is scary, but it will get there.

Alex Williams: Well Neeraj and Dan, thank you so much for joining us in this conversation about service mesh technologies and Istio and these days beyond where we are now. And I look forward to keeping in touch. Thank you very much.

Dan Berg: Thanks for having us.

Neeraj Poddar: Thank you.

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts, please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.




When You Need (Or Don’t Need) Service Mesh

The New Stack Makers Podcast
When You Need (Or Don’t Need) Service Mesh 

The adoption of a service mesh is increasingly seen as an essential building block for any organization that has opted to make the shift to a Kubernetes platform. As a service mesh offers observability, connectivity and security checks for microservices management, the underlying capabilities and development of Istio are a critical component in its operation and, eventually, standardization.

In the second of The New Stack Makers’ three-part podcast series featuring Aspen Mesh, The New Stack correspondent B. Cameron Gain opens the discussion about what a service mesh really does and how it is a technology pattern for use with Kubernetes. Joining the conversation are Zack Butcher, founding engineer, Tetrate, and Andrew Jenkins, co-founder and CTO, Aspen Mesh, who also cover how service mesh, and especially Istio, helps teams get more out of containers and Kubernetes across the whole application life cycle.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management. 

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise. 

Bruce Cameron Gain: Hi, it’s B. Cameron Gain of The New Stack. Today, we’re going to speak about making Kubernetes more efficient with a service mesh, and this is part of our Aspen Mesh three-part Makers series. Today, I’m here with Zack Butcher, founding engineer of Tetrate, and Andrew Jenkins, co-founder and CTO of Aspen Mesh. Thank you for joining us.

Zack Butcher: Thanks for having me.

Bruce Cameron Gain: So the adoption of a service mesh is really increasingly seen as an essential building block for any organization that has opted to make the shift to a Kubernetes platform. As these service mesh offerings provide observability, connectivity and security checks, et cetera, for microservices management, I want to look at the underlying capabilities and development of Istio specifically, and service meshes in general, and how they are a critical component in the operation of Kubernetes deployments. So, Andrew, could you please put service mesh in context? How might an organization use it to migrate to a cloud native environment? What do they need to know?

Andrew Jenkins: Yeah, so the migration to cloud native for the organizations that we work with always involves a couple of steps along the way. There is an end-state goal: you want to have microservices to unlock developer efficiency, by having developers able to move fast on smaller components that are all stitched up into an integrated experience for users. But you have to get there from wherever you are. And so we find that organizations use a service mesh a lot to help with that evolutionary path. That involves taking where we are now, moving some pieces into more of the cloud native model, and developing new cloud native components, but without leaving behind everything that you’ve already done. And of course, like you talked about, it’s really important to have a good understanding, observability, of all of these different components of the system, to be able to know that it’s secure, to be able to connect all these pieces together, whether they’re in public clouds, on-prem, or different clouds. A service mesh can really help with all of that connectivity and security, all those components. That’s why we see organizations latching on to service mesh as an answer for not just the deployment problem, but how do you integrate all these pieces together.

Bruce Cameron Gain: Well, thank you. Zack, this is maybe a reflection of what [Andrew] just described, but as you know, migrations are happening; it’s not greenfield, that’s very rare. So as [Andrew] described, they’re moving from data centers to cloud native environments, for example, and they’re doing it in bits and pieces. As they’re doing this, they’re doing it, I would imagine, most often in incremental steps, and often in different cloud environments as well. So what do they need to know as far as the operations go, and how will the service mesh come into play for these multi-cloud deployments? Is it possible that just one Istio, or just one platform service mesh, will take care of everything? Or, I would imagine, are they piecing together what they have until we get to where we’ll just have one service mesh interface?

Zack Butcher: Yeah. So I think this idea of the transition and the migration that Andrew touched on is really, really relevant right now. And in fact, this was the original reason I left Google, where I was working on Istio, to help start Tetrate. When we were at Google, we had built out the project for about a year and we were running around trying to get the initial Istio users. What we heard consistently was, hey, this is great, but I don’t only have Kubernetes. And I think it’s important that we understand that data centers aren’t going away any time soon. When you go out and build your own data center, that’s a 50-year investment, right? Forty, fifty years easily that you’re expecting to get value out of it. That’s not going to go away tomorrow just because you’re moving to cloud, and you will have split infrastructure for a long time. It will be increasingly the norm to have this kind of split infrastructure between things that you own and different cloud providers that fit different use cases well. And that’s exactly where we see the need for a common substrate. How do you start to enable communication between the components that need to communicate across these different environments? That’s where the identity and security aspects of a mesh come in. How do you enforce from an organizational perspective? I have regulatory controls, and I need to ensure that I have controls in place across all of my environments that are consistent, and that I can prove to an auditor are consistent and enforced across all of these environments. A service mesh, because of the centralized control and the consistency that it gives you, is incredibly useful for helping bring sanity to the craziness that is the split infrastructure world, this kind of multi-cloud, on-prem world.
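
The consistency Zack describes comes from writing a control once and having every sidecar enforce it. A hedged sketch of such a control in Istio’s security API; the namespaces, workload label and service account here are hypothetical:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-frontend
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: ALLOW
  rules:
    - from:
        - source:
            # the mesh-issued identity of the only workload allowed to call
            principals: ["cluster.local/ns/web/sa/frontend"]
```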

Bruce Cameron Gain: Well, without mentioning any names, some providers are making the claim that, maybe not now but very shortly, you can just have one single service mesh interface for multiple cloud environments, including your data center as well. How far are we away from that scenario?

Zack Butcher: Yeah, I think it depends on what you mean by interface. Are we going to get to a world where there’s a common point where I as a developer can go and push configuration about how I want my application to behave on the network, and that is going to be realized across all of the physical infrastructure that my company has, no matter where my service is running? If that’s what we mean when we say there’s going to be a common interface, then yes, I 100 percent think that we are going to land in a world like that, where individual developers stop thinking about individual infrastructure, stop thinking about individual clusters, because that’s not really what’s relevant for me shipping business value to my users. Instead, I want to be able to think about the services that I have and need to maintain, how I want them to behave, and the features and functionality that they have. A fundamentally more value-focused world.

Bruce Cameron Gain: Andrew, do you agree? And at the same time, if you do agree, when could that scenario happen? 

Andrew Jenkins: I strongly agree that the developer should be out of the business of worrying about this interface. And I think we’ll see, we already see, a lot of commonality, even across different service mesh implementations, for the features, and especially around Kubernetes as a touchpoint for organizing policy and all these sorts of things. But just the same, it’s really important that my organization can guarantee to my security folks that we are meeting the security rules we’ve set up internally, and maybe what the organization wants to do is not give developers the one common underlying interface that will unify all service meshes. It may want to give a kind of profile set: we’ve already designed these application bundles to talk about applications this way or that way, we know we can understand those, and we know how they map to security requirements. So in that kind of world, organizations are building what they want to show to their developers on top of the underlying capabilities of the infrastructure. They’re making some choices, just like Zack said, so that individual developers who may not be experts on every single layer can take advantage of the experts in the organization who have thought about those things and mapped them back to requirements.

Andrew Jenkins: So I don’t think that external, sort of stamped-out, forced adherence is very successful in the Kubernetes cloud-native world. I think what you’ll see is a kind of bubbling up of the things that are really common into a couple of interfaces, and there are already efforts underway for some of these sorts of things. Then there will be parts of these interfaces that people like, and there’ll be kind of gravity around those, and they’ll solidify. And that’s, I think, happening a little bit already. In the next year or so, you’ll see that happen much more strongly around applications and how they interact with things like service meshes. And then, a few years from now, big organizations will have their own takes on these. They’ll be built in. And if I walk in the door of organization A on day one, I’ll know where to go to get the catalog that describes my application, and I’ll just run with it and rely on the experts underneath, both in my own organization and out there in the community, who map that to the actual implementation underneath.

Bruce Cameron Gain: Well, thanks, Andrew, and I definitely want to revisit that topic more specifically. But as you remove the development layer, as far as the operations folks go, when do you think we might reach the stage where operations staff, for example, just have to manage maybe one sole panel or interface to deal with the different deployments out there? As you just mentioned, the developer should not have to worry about that infrastructure aspect of things. But how far are we away from the day when the operations folks are able to streamline everything to a single pane, so to speak, working with that service mesh type of interface, where they can instantly or near-instantaneously deploy and set governance standards and compliance standards, et cetera?

Andrew Jenkins: So there are some organizations that I think are already pretty far down this path with Istio, and Istio has a bunch of great blog posts where users come in and talk about the ways that they’re using it and configuring it. So there are some organizations that have already built a whole lot of this around Istio. The thing we’ll start seeing, though, is that rather than everybody having to invest a whole lot in service mesh expertise to get that outcome, there will start to be some common best practices, implementation pieces, things from vendors and the ecosystem, that simplify this, so that the amount of effort an organization has to invest to get that benefit will go way down, and that will cause adoption to increase dramatically. It now takes investment to do this all yourself; I think we’ll start to see it become a whole lot more easily adopted into organizations going forward.

Zack Butcher: Yeah, I think that’s spot on. As far as something like a single pane of glass goes, Kiali, for example, is a good example in open source of starting to build that kind of thing out. Kiali is an open source telemetry system on top of Prometheus that ships with Istio and gives a set of dashboards and nice visibility. It’s not a single pane of glass, in that you can’t do policy control there; it’s only visibility. But I think that’s actively being worked on, both by vendors and in the community. I do think, though, Andrew is exactly correct when he was talking about standardization and picking interfaces as a community. The simple matter is, these are still very early days for the mesh. We are still learning and developing best practices, and we are doing exactly that together as a community. I think the important things to start standardizing now are not necessarily APIs and interfaces, but practices, techniques, standards for deployments. Those kinds of things, I think, are what’s needed badly today. We can look at things like nicer unified interfaces, or potential APIs over top of multiple meshes, once we better understand what the APIs that we actually need are, because it’s still just very early for that kind of thing.

Bruce Cameron Gain: Indeed. And this was one of the questions, actually; I’m going to rephrase it, formulate another question based on your answer. These are indeed the early days of service mesh, and of Kubernetes actually, and you have the main features: observability, security obviously, traffic management. So to rephrase my original question: not necessarily which of those three features is the most important, but which of those features needs the most work? And actually, security always needs work. So where do we really need to see improvements: observability, traffic management, or both?

Andrew Jenkins: You know, there’s always room for improvement in both, and especially in Istio, my feeling is there’s this stable foundation and a lot of room for innovation on top of that, including some Istio implementation advancements that make iteration easier to do more rapidly, so that we can make progress on a lot of different fronts. I’ll tell you that in the early days, when I was thinking about what was most important, I was totally wrong, in that my thinking was that traffic management was going to be the biggest, most conspicuous, spectacular feature coming out of a service mesh. What I found in the early days, for sure, was that people needed that observability foundation first, to even understand all of these cool new pieces that they had deployed in the cloud and how they’re interacting. Even going back to the security front: what’s talking to what, is it secured, what does it map back to in my security policy? They needed that well before they could start thinking about cool, novel ways of doing experimental deployments, canary deployments, progressive delivery. So there’s been a lot of progress on observability, and there’s a lot of foundational work. I don’t know if it’s the most important, but I bet that in the near future we’ll see a lot more emphasis on the cool things that you can do with advanced traffic management.

Zack Butcher: Yeah, I think Andrew is pretty spot on there. With respect to the biggest thing to improve, I think we always need to look at what it takes, what I as an operator or an application developer need to do, to start to actually realize value from Istio in my environment. So that cycle, that time: what configuration do I have to learn? What configuration do I have to write? In what ways can we remove that configuration? That’s a big thing to improve, in my mind, looking forward at the project. Exactly like you said, we have a very solid base in terms of the capabilities that exist in the system today. Then to your question about which of those three pillars is most important: in my mind, there are two answers. One answer is none of the three, because one of the key value-adds of the mesh is that it brings all three of these together. Andrew alluded to that some in his answer: you need the observability to be able to see and understand what’s happening with the traffic, to be able to get a handle on the security in your system. They all go together in some way.

Zack Butcher: I can say, from the perspective of some of the people and companies that we work with, that some of the most important features for them are on the security side of the house, because we work a lot with the financial industry, I should say. For them, it’s still early days, and it’s still kind of expensive to adopt a mesh, and in their world security was the killer thing, the thing that was most important, that gave them value that warranted adopting the mesh. So it’s a little hard to say, because it’s going to depend on your specific set of use cases. I totally agree with what Andrew says: over time, I think traffic will grow into one of the most important pieces, because the observability and the security parts are really table stakes. You have to have those; they need to be present in your system and configured correctly. That gives you the insight into what’s happening and the assurance that you have control over the system. And then the traffic part is really what application developers start to deal with day to day.

Bruce Cameron Gain: How is this for an analogy, and please be honest if you don’t think it’s applicable. Speaking of that, I was thinking about a very high-performance car, say a Tesla, for example. You have obviously extremely high levels of torque and speed; that’s one component. You have the user interface, this magnificent screen in the middle, I don’t know if you’ve seen it or not, it’s beautiful, and then you have the driverless capabilities as well. And the third component, obviously, is security, and certain ways to keep you safe. If any one of those three is negated or stops working properly, that’s just not going to offer a proper driving experience in a Tesla. And for me, I see that analogy with the service mesh.

Zack Butcher: Exactly. Yeah, I think that’s a really apt analogy. They really work best in concert, and they make sense in concert. These three verticals have existed since computing has existed, and they have been separate spaces. There are people that have really compelling products and can do really interesting things in each of the spaces of observability, application security, and application traffic management. But the real game changer, in my mind, with a mesh is the way that it brings all of those together under a single, centralized, consistent control, that control plane that gives me the single point to configure.

Bruce Cameron Gain: Andrew, you mentioned this a while ago in an article you wrote, and it did extremely well, but it’s a subject you guys might not necessarily like to talk about: in some instances, you don’t need a service mesh. Or could you argue the opposite, in fact, if you’re deploying in a Kubernetes environment, not counting serverless, but a cloud native environment, especially when you have several different cloud environments to manage? Are there instances where you don’t need a service mesh, and why?

Andrew Jenkins: So I think there are some. I mean, I don’t think it’s on us to say, hey, everybody absolutely must use this new thing. There are actually problems where you don’t need Kubernetes. You may not need containers at all, or if you look at serverless, there’s another thing beyond that, which is no-code, kind of codeless application development, where I don’t even write code. Well, you can do that in some cases, and in other cases we know it’s really more suitable to actually write software, write code. So there is always this continuum of what pieces you need, and it’s definitely not the case that all problems are solved by a service mesh or require a service mesh. Zack talked about how, especially in the early days of Istio, the security benefits were really key for some of the users he was working with to justify the investment to adopt Istio and use it. Where we’re at now, I think the security benefits of adopting Istio are at least as good, probably even significantly higher, for all of those organizations, and hopefully the cost of adoption continues to go down as folks like Tetrate and Aspen Mesh and everybody else work on improving the Istio experience, so it becomes even easier to adopt. But let’s be honest, service mesh is a thing that you have to understand at least a little bit about. So there are some problems, where you have very few services communicating, or a very limited ability to insert a service mesh, where it may not justify the effort you’re going to invest in trying to understand, deploy, or implement a service mesh. And I think that as the cost of adoption keeps going down, those become fewer. But that doesn’t mean it will always be the right answer.

Zack Butcher: And if I can just parlay off that, I think Andrew is exactly correct. What we’ve seen, and will continue to see more and more, is that even within a single organization there will be use cases that do not fit the mesh. I was talking a little while ago with a company that does a lot of video streaming, and for the video streaming side, a mesh doesn’t provide them very much benefit; it adds latency in their critical path. It gives them negatives on that side of the house. However, they also have a whole API side of the house, where people go and interact with their products and things like that, where a mesh does make sense. So even within the context of a single organization, you’re going to see sets of applications or use cases where it may or may not make sense. And that extends out across the entire organization.

Bruce Cameron Gain: And regardless, I’m supposing that in most cases, taking the video streaming company as an example, for the developer that really doesn’t matter much. I mean, they’re not worrying about files, for example, not worrying about YAML, et cetera, not worrying about service meshes. Whether there’s a service mesh underneath the covers, so to speak, is kind of immaterial, usually or almost exclusively, for developers, isn’t it? What do they need to know? How do their lives change either way, when there’s a service mesh or not?

Zack Butcher: There’s kind of a mix, right? Part of this depends on how your organization has decided to approach a service mesh, and part of it depends on how mature you are on that path. In the extreme, in a fully mature organization, the real goal, and this is the goal with DevOps, is to get developers doing the operations for their own services, to get them involved in production. And, you know, whatever we say about the phrase, that idea is good. In the extreme, a service mesh enables that. It gives you the ability to put in the hands of individual developers control over how their application behaves, at a higher level, without having to go change code and things like that. And I believe that for most organizations that are adopting a mesh, that is a desirable instinct: that any of their developers can reach under the hood and use the mesh to make their applications better, to achieve whatever they need to achieve with it. But then there’s the question of how you get them to that point, and how you actually enable successful adoption in an organization.

Zack Butcher: What we typically see is that the path for adoption starts with hiding the mesh. Get the people operating the system, the platform team, to install a mesh and start to use it, start to onboard teams, start to provide some of the underlying visibility with it, start to provide some of the underlying security with it, maybe just do broad traffic-related things that are one-size-fits-all. Then, as they gain confidence, they start to do more with it with respect to things like traffic management, and start to give their own developers more control as they get more confident. So I think it’s a spectrum. And the other side of that is how much your organization has a PaaS, or tries to hide underlying infrastructure in general; that is going to influence how much a developer needs to interact with a mesh. In general, the instinct should be that developers should be able to control their own traffic, and the platform team should probably control the other things.

Bruce Cameron Gain: And we had kind of touched on this before we started the recording, excuse me. We were talking about how there are maybe alternatives out there, platforms where there is indeed service mesh functionality, but even for the operations team it’s transparent, and they don’t really have to worry about managing it. Is that a viable scenario, or is this something being promised that might not really work?

Andrew Jenkins: There are definitely platforms that include baked-in service meshes and management around a service mesh. I would say that their goal is to make the downsides as transparent as possible, managing and upgrading and things like that, but hopefully the upsides still surface. Observability should still be driven by the mesh. The kinds of policies that you can enact, or the traffic management that you can do, are still driven by the mesh. So in that sense, your developers or operations folks, the platform team, are still interacting with the mesh, even if they don’t have to interact with it as a completely separate component. The fundamental principles of service mesh still apply. It’s just that there are some cases where all the choices the platform has made around how it’s going to use a service mesh match up one to one with your organization, and therefore there’s no benefit to swapping that out and doing it all yourself. That happens sometimes. But I’ll say that we’re also, I guess by nature, seeing a lot of cases where we’re talking to users who want a deeper level of control. They want to be able to do some special things in the service mesh, even something as simple as adopting their own upgrade path for the service mesh component, or having it be consistent across different platforms, where they may want to make some choices differently than what the platform already made. But in all cases, hopefully your developers are getting to utilize the benefits of the service mesh, whether it’s baked into the platform or something that a platform team is operating at a more custom level.

Bruce Cameron Gain: And as far as observability goes, with Istio it seems as if observability of microservices is the key capability. Would you agree or not?

Andrew Jenkins: I’d agree that it’s the first thing out of the box that makes a positive impact in your life as a developer. I’ll say that. 

Zack Butcher: Yeah, for sure, I can totally agree with that. As far as the day one, or day zero, experience goes, observability is the key thing from it. I would argue that identity is the single most important feature of a service mesh; in fact, identity is kind of the key thing that it does, and everything else stems from identity. But that’s partly philosophical, and we can go into the weeds on that one. In terms of user-facing features, observability definitely wows from the start.

Bruce Cameron Gain: Is it possible, in maybe just a few sentences, to dig down a little deeper into why identity is such a key feature?

Zack Butcher: Communication doesn’t matter unless you know who you’re communicating with. What metrics are you producing? What are the metrics about? Unless you know the client and the server that are communicating, how can you know? Everything in the system really stems from knowing who you’re communicating with, from that sense of identity. From that we can have policy. From that we can talk about how traffic flows and where traffic flows. What is a destination in your mesh to send traffic to? A thing with an identity; we need a name for it, a handle for it, first. What do you report metrics on? A service; that’s a thing with an identity. It really all stems from having services as a reified concept, assigning identities to them at runtime, and being able to use that at runtime to know who you’re actually talking with. Everything else kind of follows from that. So that’s why I say it’s a little philosophical: yes, you can communicate without having an identity, but who are you really talking to? How can you trust those metrics? How can you trust that communication, and what is actually happening there, unless you know? And so that’s why I say it.

Bruce Cameron Gain: Andrew, was that a prevalent feature, a very wow feature, for you at the beginning, one that might have evolved and changed?

Andrew Jenkins: It’s a key part of scaling beyond just one cluster, right? This identity problem is something approaching tractable in a tiny, self-contained environment like one Kubernetes cluster. But as you start distributing at planet scale, or across data centers, or in organizations that are hybrid, or with a system that’s so large and changes so quickly that it’s really hard just to write down all of the identities of everything at once in one place, then you need something smarter and more flexible. And this ability to handle identity at a large, very flexible, rapidly iterating scale is already built into the service mesh. That’s kind of a day zero thing; I think it wasn’t first on a lot of users’ minds as something they need. And unfortunately, because it’s already built in, it may actually be one of the things that is harder to notice, even though it was so key to helping you scale up. But it is absolutely crucial. It’s the part of security where being able to talk to some pod in some other Kubernetes cluster is actually a somewhat solvable problem. That’s not it. It’s about, just like Zack said, knowing what it is, knowing who is at the other end of the thing that you’re talking to, and then being able to use that as a foundation for policy and all this other stuff.

Bruce Cameron Gain: A new version of Istio has just been released. What’s the key feature, or what do you love about it the most? And for the people migrating to Kubernetes today and looking at a service mesh, what are they going to like? 

Andrew Jenkins: I have two answers here. One is really boring, and that’s good: support for elliptic curve crypto certificates for TLS between pods, which is important to me. It’s not a mind-blowing feature, but it shows the state Istio is in, where it now has the capacity to circle back and flesh out requirements, to make sure it can meet organizational requirements, policies, things like that. It’s a great example of the maturity side of Istio. The other thing, which has been developing over a couple of releases, is getting more and more mature, and is really big in 1.5, is WebAssembly support. That’s going to be a way to extend Istio, and especially the Envoy proxy sidecar, in a more portable and rapidly evolving way, rather than having to build very low-level components into the system. I think that’s going to be great because it will allow developers to extend the capabilities of the service mesh without all of that having to happen in a crowded core where stability is an extremely important concern, which can be a natural drag on innovation. Opening up the WebAssembly front allows us to have both stability and an open door for innovation. 

Bruce Cameron Gain: And is that mainly relegated to the JavaScript side of things, or is it maybe a wider thing? 

Andrew Jenkins: WebAssembly is cool because it’s conceptually like JavaScript: a language that can run anywhere, and it’s in everybody’s browser. It’s that same concept, but without a lot of the technical reasons why JavaScript might not be a great fit for low-level applications. WebAssembly is the output format; you can write the input in many different programming languages, including JavaScript if you want. And that’s an important part of broadening that ecosystem. 
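
As a rough illustration of what this extension model looks like from the operator’s side: later Istio releases expose a WasmPlugin resource for attaching a compiled module to workloads. This is a sketch only; the plugin name, workload label, and module URL are hypothetical.

```yaml
apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: example-header-filter    # hypothetical plugin name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: productpage           # hypothetical workload to attach the filter to
  # Hypothetical registry and module; a compiled .wasm binary built with one
  # of the proxy-wasm SDKs (C++, Rust, and an early-stage Go SDK exist).
  url: oci://registry.example.com/filters/header-filter:v1
  phase: AUTHN                   # where the filter runs in Envoy's filter chain
```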

Bruce Cameron Gain: That’s fascinating. So that’s about giving programmers and engineers the ability to improve the application experience through infrastructure changes and configurations. Is that correct? 

Andrew Jenkins: Yeah, yeah. 

Bruce Cameron Gain: And Zack, what are your thoughts about that? 

Zack Butcher: Yeah, Andrew took all my answers. The single biggest thing for me about Istio 1.6 is that it’s kind of a boring release in a lot of respects, and I think that is the ultimate goal of any infrastructure project. In many respects, I am very happy when there are no big, earth-shattering features. Upgrading to 1.6 will, for many people, be the first time they use the operator to do an upgrade, because that was made the default in 1.5, or maybe it was 1.4. Things like making lifecycle management easier going into this next release are some of the things I think are really key for people. WebAssembly, like Andrew said, is really going to be an awesome enabling technology in the future. Since you asked about JavaScript: today Envoy only supports C++ and Rust for WebAssembly, and Go is in very early stages as well. There aren’t JavaScript SDKs to use with Envoy today, because Envoy has to expose an API, and when you program against it you need a handle to that API in your programming language. So far only C++ and Rust have been implemented semi-officially, and then there’s a Go one as well. So the big things in my mind are to keep it going and keep making upgrades easier. I think you need even less configuration than ever before to do the installation and upgrade. Those, to me, are the big and exciting things. The more boring an Istio release’s notes can be, the happier I am, because it shows how the project is maturing, and how we’re able to spend time going back and addressing not the 80 percent use cases but the 20 percent use cases. And that, to me, is the really interesting stuff.
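
For reference, the operator-driven lifecycle Zack describes is driven by a small IstioOperator resource. A minimal sketch, assuming the built-in default profile; the resource name and the mesh setting shown are illustrative only.

```yaml
# Applied with `istioctl install -f istio.yaml` in Istio 1.6
# (or `istioctl manifest apply -f istio.yaml` on earlier releases).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: example-istiocontrolplane  # hypothetical name
  namespace: istio-system
spec:
  profile: default        # a built-in configuration profile
  meshConfig:
    enableAutoMtls: true  # illustrative mesh-wide setting
```

Upgrades then amount to re-applying the same small resource with a newer istioctl, which is the "less configuration than ever" point above.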

Bruce Cameron Gain: And under the Tetrate umbrella, what are you working on now to solve those 20 percent use cases? 

Zack Butcher: Yeah. Generally speaking, what we’ve been doing is working hand in hand with companies to get service mesh into production with them at large scale, and working out all of the things that need to happen to make that possible. You talked a little bit about single-pane-of-glass tooling; we’re building out that kind of thing, because as an organization I need centralized controls. So that’s the general theme: build out the sets of tooling and infrastructure required to get a mesh actually adopted in a real, large enterprise. 

Bruce Cameron Gain: And Andrew, as far as Aspen Mesh goes, what are some of the challenges that you’re working on at this time? 

Andrew Jenkins: Yeah, we’re really talking in circles and building on each other here. Right now Aspen Mesh is taking a turn around some of the release and integration work, which is something that, if done right, is really powerful and advances the project, but it’s not necessarily anybody’s absolute favorite thing to do. So we’re stepping up to the plate around some of that. There’s also been some security work that, interestingly, goes back to Zack’s discussion around identity. We have some users with existing, very large systems whose concepts of identity are based on existing building blocks like domain names and TLS infrastructure. So we’re helping bridge the gap between what they’re doing now and what they want to do in the future. There’s no way to just jump to the future; we’re going to have to evolve point by point from where we are to where we’re going. A lot of that is adding foundational components to make sure those identities are flexible enough to address the use cases that are not as easy as “I’ve got a brand-new container application that I’m just going to stand up in my Kubernetes cluster.” It’s a brownfield, hybrid environment. 
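
For the brownfield case Andrew describes, Istio can extend mesh identity to workloads that don’t run on Kubernetes. A minimal sketch using the WorkloadEntry resource introduced around Istio 1.6; the names and address below are entirely hypothetical.

```yaml
# Registers a VM-hosted workload with the mesh so it participates in
# service discovery and receives a service identity like any pod would.
apiVersion: networking.istio.io/v1alpha3
kind: WorkloadEntry
metadata:
  name: legacy-billing-vm    # hypothetical workload name
  namespace: billing         # hypothetical namespace
spec:
  address: 10.0.0.12         # illustrative VM address
  labels:
    app: billing             # lets existing Services select this workload
  serviceAccount: billing-vm # the identity this workload runs as
```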

Bruce Cameron Gain: Excellent. And as I understand it, you were one of the earlier developers of service mesh. I was wondering if you could briefly describe how that’s evolved; you already talked a little bit about some of the wrong turns that have been made, especially in the open source projects. But where is this all going? What’s next for Istio and service meshes? 

Andrew Jenkins: I’ve worked on projects around how to connect applications flexibly, especially as things moved to containers, since before the term service mesh was coined or Istio existed. Istio really changed the game in terms of broad open source adoption, with an API, policy objects, and things like that, that natively matched up to Kubernetes very well. That’s why Aspen Mesh is built around Istio as a foundational component, and why we do the things we do in the community to help keep the underlying project healthy. Going into the future of service meshes: now that people have it in their hands and are getting it into their clusters more and more, they’re starting to build applications that don’t bring along all of the components a service mesh also provides. They’re starting to say, oh, we can actually delegate all of that to the service mesh. I think there are going to be two big fronts. One is service meshes that span and interact across infrastructure components. It won’t just be a Kubernetes cluster; your organization will manage service meshes that may include virtual machines and many different Kubernetes clusters, stitching all of these things together in a way that’s secure, that maintains identity, and that’s still observable. So that’s adoption across a bunch of different clusters. The second is novel ways of deploying and managing applications built on the capabilities of a service mesh: progressive delivery, canary rollouts, and things like that. That’s been a wishlist item for a lot of large organizations, and I think with Kubernetes, containerization, and something like a service mesh, it’s going to be a lot more practical for them to actually start building on that and getting value in their application lifecycle. 
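
As one concrete illustration of the canary pattern Andrew mentions, an Istio VirtualService can split traffic between two versions of a service by weight. A minimal sketch; the service and subset names are hypothetical, and the subsets would be defined in a matching DestinationRule.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-canary   # hypothetical name
spec:
  hosts:
  - reviews              # hypothetical in-mesh service
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90         # 90% of traffic stays on the stable version
    - destination:
        host: reviews
        subset: v2
      weight: 10         # 10% goes to the canary
```

Progressive delivery then amounts to shifting the weights over time as the canary proves itself, rather than redeploying the application.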

Bruce Cameron Gain: And again, based on what you both said, it might move in the direction of being more applicable to the data center and the on-premises model, even as organizations migrate to cloud native environments. Are we going to move in a direction where the service mesh is more applicable to on-premises deployments as well? 

Zack Butcher: Yeah, for sure. That’s actually a primary thing that Tetrate works on. When I talked earlier in the podcast about the fact that data centers are not going away, that we’re going to have them for the next 40 or 50 years, that’s exactly acknowledging that the mesh has to span this heterogeneous infrastructure. I’ll avoid the term legacy; it’s a dirty word, because that’s actually the stuff that’s making money in most organizations. So you have to go back into the brownfield. I think that’s one of the big areas where this is going, and I think Andrew is exactly right: in the near term of the next five years or so, we’ll see a better development experience and see the mesh become more pervasive across more environments. If I take a longer view, and maybe this is a little too far out: I get to work a decent bit with folks like the Open Networking Foundation, who do some really interesting things around software defined networking and telco standards. Where we see it going in the really long term is that it just becomes part of the network. If you look at the dream of what Istio wants to do, and at the capabilities that Envoy has, SDN is approaching this from the bottom up, while Envoy and Istio and these ecosystems are approaching it from the top down, from the application down. The real beauty is that eventually we’re going to meet up, and these capabilities that the mesh brings are going to be a transparent, ambient part of the network that you’re in. And that’s the beauty in boring, right? That’s when we’ve made it: when your service mesh is like the kernel, and it’s just boring, and it just does its job. That’s the goal. 

Bruce Cameron Gain: Even for the operations folks, right? Today the developers get to do their magic; they can do their fun work and create their applications. The operations folks are struggling more with the security side, and maybe they’re looking at ways to automate things. Maybe in five years they won’t have to worry about the service mesh, as you said. And at the same time, for the developers, it will be business as usual, except, as you brought up before, the menu of programming languages applicable to applications used with a service mesh will become much larger. So I guess you would have the best of both worlds. 

Zack Butcher: Yeah, I think the real goal is that eventually, as an application developer, what I really want to be able to do is guarantee quality of service for my application. I want to be able to say, hey, for these types of traffic, this is the quality of service my application needs to provide, and the network should go and do whatever is required to implement that, whether that’s pushing it down into switch pipelines (a switch can handle per-request HTTP packets if I really need it to), or doing it in an NFV, or doing it in userspace. We’re going to see trade-offs made across that spectrum, transparent to the user, based on things like quality of service. That’s where I hope we can start to get away from what programming a service mesh feels like today, which is almost like publishing individual routes before we had BGP: very manual, very finicky, very one-off. We need the sets of technology that start to make it more automatic, more transparent, and just work completely. 

Andrew Jenkins: That’s exactly the right analogy. When developers start today, they don’t worry about how to retransmit packets over a network that might be unreliable and lose them; that was a solved problem decades ago, and it’s built in. They don’t worry about parsing HTTP requests and responses; that’s built into some library they can use. But they have had to worry about higher-level concerns: reliability, addressability, things like that. As Zack says, when we get to the end state and it’s just pervasive and built in, we’ll know we’ve succeeded because there will be a whole new class of things they don’t have to worry about. We’re starting to see that in some environments; it’s already happened with containers and Kubernetes. You can already delegate “how do I find the best instance of this service?” down to a service mesh. “How do I make sure I’m talking to a secured version of this service whose identity I know?” A service mesh can do that. And if this becomes universal for all programs everywhere, through a combination of service mesh implementations like Istio and equivalent capabilities in NICs and switches, that’s a massive success, because that’s the whole developer story: a whole class of problems that they don’t have to worry about and that don’t slow them down. They can focus on the next higher-level thing. 
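
To ground the point about delegating “am I talking to a secured, identified peer” to the mesh: in Istio this is a one-resource policy. A minimal sketch; the PeerAuthentication API is standard Istio, and applying it in the root namespace (shown here as istio-system) is what makes it mesh-wide.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # the root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT           # only mutual-TLS traffic between workloads is accepted
```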

Bruce Cameron Gain: Well, I wanted to thank you both very much: Zack Butcher, founding engineer of Tetrate, and Andrew Jenkins, co-founder and CTO of Aspen Mesh. 

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts. Please rate and review us on iTunes, like us on YouTube, and follow us on SoundCloud. Thanks for listening, and see you next time. 

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise. 


