Tutorial | Istio Expert Shares how to Get Started with OS Istio from Install to mTLS Authentication & Traffic Control

Tutorial How-To from Solutions Engineer, Brian Jimerson

You’ve got Kubernetes. Now hear an Istio expert share what it takes to get started with OS Istio, and learn how to get mTLS authentication and traffic control right out of the box.

As microservices become the de facto standard for modern applications, developers and platform operators are realizing the complexity that microservices introduce. A service mesh such as Istio can address this complexity without requiring custom application logic. Istio adds traffic management, observability, and security to your workloads on Kubernetes. Brian shares firsthand what the clear advantages are and how to get started, including:

  • Installing Istio on your cluster
  • Enabling mTLS authentication between workloads by default
  • Getting better visibility into application traffic
  • Controlling traffic between applications
  • Controlling ingress and egress traffic with Gateways

Read the tutorial or watch the video presentation to learn why Istio is the technology leader for service mesh and production-grade networking for microservices.

About Brian Jimerson, Aspen Mesh Solutions Engineer


Brian Jimerson is a Solutions Engineer at Aspen Mesh by F5 with 15+ years of experience driving better platform performance. He has hands-on expertise moving organizations to a cloud-native platform, optimizing critical application delivery at scale, and leveraging OS Istio to help run cloud-native applications at scale.

Not up for a video? Below is a recap of what was covered during the Tutorial Presentation from Solutions Engineer Brian Jimerson.

Istio Expert Shares How to Get Started with OS Istio – from Install to mTLS Authentication & Traffic Control
Brian Jimerson, Senior Solutions Engineer, Aspen Mesh

What is covered:

  • Microservices and how they’ve come about
  • Challenges that they can present to customers at scale
  • Istio as a service mesh and what problems it’s trying to solve with microservices
  • Istio installation and options
  • Encryption and authorization in Istio
  • Complex traffic management in Istio
  • Observability and visibility of Istio and microservices

About Aspen Mesh

Aspen Mesh is part of F5, and we’re built to bring enterprise-class Istio to market. We have a product built on top of Istio that helps you manage it at scale and gives you visibility, insight, and configuration across multiple Kubernetes clusters. We also offer services and support for large-scale customers, and we’re a major contributor to the open source Istio project: top five among contributors, as measured by pull requests.

Microservices

I think it’s important to set the stage and talk about what Istio does and how. Looking at the current landscape of enterprise software development over the last five to seven years, most companies that have historically been brick and mortar, and stagnant in terms of software delivery, have realized that they need to build and use software to deliver innovation and keep up with their competitors.

Most large companies that have been around for 100 years didn’t have those mechanisms in place. They were building large monoliths that took forever to change. If you needed to update or add a feature, it could take months to get it into production; they simply couldn’t deliver rapid innovation.

A large part of that is the nature of a monolith. You have multiple teams contributing to the same deployment, with shared QA schedules and testing. If you changed one feature, you had to redeploy the whole monolith, and that took forever. It just wasn’t working.

The notion of microservices has been around for a long time, and it has quickly become the de facto development architecture for these companies, because microservices allow individual components to be deployed without any concern for other features, functions, or microservices.

What is a microservice?
I like Adrian Cockcroft’s definition the best. There are others out there, like Martin Fowler’s, but per Cockcroft, a microservice is a loosely coupled service-oriented architecture with bounded context. This allows development teams to move fast, move individually, and deliver innovation through software much better than they could before. I think it’s important to unpack this term a bit, because it sets the stage for what a service mesh and Kubernetes do.

The first part of this definition is “loosely coupled.” What does that mean? It’s not a new concept in development, and there’s a simple test: if a service must be updated in concert with all the other services, it’s not loosely coupled. You should be able to update and deploy one service without having to redeploy all the other services within that system or application.

With loose coupling, instead of having a monolith whose components talk to each other in memory, you now must have a protocol and a contract to communicate with other services. As long as that contract doesn’t change, you can independently deploy services without worrying about other services. Typically those protocols and contracts are HTTP with a REST API, or perhaps asynchronous coupling through a message broker over TCP. Either way, loose coupling means communication now happens over the network instead of in memory inside a monolith.

The other part of this definition is bounded context. The idea is that a service does one thing and does it very well, and it doesn’t concern itself with other parts of the system. You know you have a bounded context if your service doesn’t need to know about its surrounding services: it does its one thing well and communicates with the others through a contract and a protocol.

Both parts of the definition imply that there’s a lot of network communication between the services participating in a system. In the monolithic world, that communication was traditionally in memory; aside from some outside calls to an API or a database, the different libraries and components within a monolith communicate with each other in memory.

Microservice Distribution
This creates massive growth in network traffic between services. The graph on the left (see below) is a somewhat contrived system, but it’s not atypical: you have six, seven, or eight microservices, and because of the loose coupling they all need to communicate with each other over the network instead of in memory.

Because each service is a bounded context and doesn’t necessarily know much about the other participating services, any service could potentially have to call any other service, so you must assume that will happen. You can see the huge increase in network traffic compared to historical monoliths.

The diagram on the left depicts a very simple example with seven microservices, probably a small application or system. For comparison, on the right is the Netflix call graph, which is the other extreme. Most organizations will be somewhere in the middle, but you can imagine they will still have a lot of network complexity.

Microservice Distribution

Challenges with Microservices
All of these network calls introduce new challenges that development teams haven’t had to deal with in the past. One of the most basic is “how do I register my service, what is my address, and how do other services discover it?” Also, “how do I route and load balance between services?”

Challenges with Microservices

In a large system, you must expect degradation in performance, and you must expect services to go down. A calling service has to be able to tolerate that through techniques like timeouts and the circuit breaker pattern.

Additional questions you want to be able to answer include “how do I see what’s going on within my system? How do I view network traffic and latency? How do I see which services are returning errors or are slow to respond? How do I aggregate different APIs and services into a single API? How do I transform between protocols and data formats? How do I manage configuration of a highly distributed system?”

Kubernetes is one of the first of what you’d call “cloud native” platforms that tried to address some of these questions, and it’s done a really good job on the compute side, the storage side, and scheduling workloads. But it doesn’t do much about the traffic between participating services, or about ingress and egress. Now that Kubernetes is being widely adopted, there’s a need in the market to handle the network side of things.

What is Istio?

There are other service meshes out there besides Istio, but service meshes, and Istio in particular, are meant to address the challenges introduced by these highly distributed, loosely coupled systems. Istio offloads those challenges to an infrastructure component, much like Kubernetes did with compute and scheduling: it lets you connect and secure microservices, gives you sophisticated routing and traffic management, gives you visibility into what’s going on in the system, and applies security.

I think one of the important things that Istio, Aspen Mesh, and all the different vendors of Istio have realized is that this cloud native world requires a new way to implement security. The old security controls don’t really work in this new world, for several reasons. We don’t want organizations to take a step back in their security posture, so security is first and foremost across the board with Istio.

What is Istio?

Istio Architecture
What does Istio look like? In its simplest form, Istio has two components. There’s a control plane, which is exactly what it sounds like, much like the Kubernetes control plane; a component (and pod) called “istiod” does all the control functions within the service mesh. And there’s the data plane, comprised of your running workloads plus a sidecar proxy, which is a container that runs within the same pod as the service. The proxy is actually Envoy, an open source project from Lyft. Any pod that has an Envoy sidecar proxy injected into it is part of the data plane, and istiod sends it things like configuration, certificate rotations, and telemetry settings.

One thing I want to point out is that istiod consolidated what used to be separate components, such as “pilot,” “citadel,” and “galley” (“mixer,” another old component you may see referenced, was deprecated rather than folded in), and you still see references to those names. If you see these terms, just know that their functions now live in istiod or have been retired.

One of the things Istio does in the data plane is automatically inject an Envoy proxy into each pod that has one or more microservice containers running in it. As part of that injection, the iptables rules for the pod are modified so that all traffic leaving the pod, or coming in from external clients, goes through the Envoy proxy. The Envoy proxy then communicates with the microservice over the loopback interface, so the microservice is never exposed directly to any network interface that reaches outside the pod. It’s all Envoy.

Istio Architecture

Installing Istio – The 4 Commands
How do you get going and install Istio? It’s easy to get a default installation. The open source Istio distribution includes a binary CLI called istioctl, and if you run istioctl install against a Kubernetes cluster, it installs the control plane. Then you can create a new namespace (or take an existing one) and label it for Istio injection, and any pod that gets deployed to that labeled namespace will automatically have the sidecar injected into it.
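
As a minimal sketch, the whole flow looks like this (the namespace and manifest names are placeholders; your profile and versions may differ):

```bash
# Download the Istio distribution, which includes the istioctl CLI
curl -L https://istio.io/downloadIstio | sh -

# Install the control plane into the cluster your kubeconfig points at
istioctl install

# Label a namespace so new pods deployed there get the sidecar injected
kubectl label namespace my-app istio-injection=enabled

# Deploy a workload; the mutating webhook injects the Envoy sidecar
kubectl apply -n my-app -f my-app.yaml
```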

Installing Istio

Installing Istio – Commands Installed
If we tie that back to the original architecture, what happens when you run istioctl install is that it creates a namespace called istio-system, which is the root namespace for Istio, and installs istiod and related components. It may also install an ingress gateway, an egress gateway, and some observability tools if you want those, but all the control plane items get installed into that istio-system namespace.

When you create a namespace and label it for Istio injection, a mutating webhook fires whenever a new pod is deployed to that namespace. The webhook adds an init container that modifies the pod’s iptables rules so the service only listens on the loopback address, and it also installs the Envoy proxy; iptables then routes all traffic to and from that proxy.

A couple of interesting things about this approach: One, it’s transparent to the service. As long as you’re not binding specifically to an interface or IP address, the service never really knows there’s a proxy intercepting all of its traffic. The other nice thing is that it doesn’t matter what language your service is written in. It could be Node or Java or whatever; the proxy is just handling network traffic. That gives you a lot of flexibility. Your development teams don’t have to do anything special aside from some configuration for routing, and this all just works.

A question you may ask: when you create something, is the webhook going to add all the information that Istio needs, so you don’t have to change your services? That’s correct. We’ll look at some of the configuration, but there’s nothing special your application needs to do. It can reference hosts by the same hostnames and make the same calls; it’s all transparently proxied through the sidecar, courtesy of the mutating webhook.

Installing Istio - What the commands installed

Installing Istio – Answering a Few Questions
Back to this example: when you do a kubectl apply of whatever your application is, istiod’s mutating webhook automatically grabs the pod created by that deployment, injects the Envoy proxy, and modifies the iptables rules for it.

Another question you may ask is what the benefits of Helm are versus just using istioctl. Helm requires a little more work than the plain binary, but it’s easier to hook into things like CI/CD tools, and it’s easier to override configurations. Under the hood, istioctl and Helm are ultimately doing the same deployment; it’s just a matter of preference. In practice, most organizations go down the Helm path if they’re comfortable with Helm and it’s part of their process. But in a sandbox environment where you want to get up and running and kick the tires on Istio, istioctl is by far the easiest way to go.
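
For reference, a sketch of the Helm path, using the chart layout from the upstream Istio Helm repository (verify the repo URL and chart names against the version you’re installing):

```bash
# Add the upstream Istio chart repository
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

# Install the CRDs and cluster-wide resources
helm install istio-base istio/base -n istio-system --create-namespace

# Install the istiod control plane
helm install istiod istio/istiod -n istio-system --wait
```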

Installing Istio

Authentication and Authorization
One of the things that usually drives people to look at a service mesh like Istio is out-of-the-box peer-to-peer authentication. Think back to those call graphs: all the communication between services goes over the network, and even though it’s an internal network, by default in plain Kubernetes it’s not encrypted, authenticated, or authorized. It’s the wild west. Anything can talk to anything in plain text, and away we go.

This usually flies in the face of most organizations’ security policies. They want traffic encrypted even on an internal network. They want to know the identities behind these network calls, be able to audit them, and have some level of policy around them. One of the most common ways to do this is mutual TLS, or mTLS.

It’s your typical X.509-based TLS protocol, but in mutual TLS both peers authenticate each other: the client initiates a TLS handshake with the server, and the server in turn verifies the client’s certificate as part of that handshake. Within that TLS connection there’s an identity, usually based on a service account, and an authorization process to make sure that one service is allowed to talk to another.

This meets many of the requirements for secure communication between services, and the good thing is that Istio does it out of the box. You install it, and you automatically get mTLS communication between pods unless you override it. Istio can do that because of the proxy: istiod handles certificate management for you, rotating and revoking certificates and so on, while the traffic between the microservice and its proxy over the loopback interface doesn’t need to be encrypted. You get mTLS for free out of the box, and this is usually the first thing everybody looks at: “I need a service mesh to do mTLS authentication.”

There’s also sometimes confusion around the different types of authentication mechanisms. mTLS is really peer-to-peer: it uses X.509 certificates and is concerned with whether one pod is allowed to talk to another pod. Authentication and authorization for an end user, whether that’s a browser or an external system, doesn’t typically use mTLS.

One of the things that does work out of the box is JWT, or JSON Web Tokens. JWT is part of the OAuth flow: an end user authenticates with some identity management system, which issues a JWT that the user then sends as part of each request to an application. The sidecar proxy authenticates that token against its issuer to make sure the user is allowed to do whatever they’re requesting, and it propagates the JWT in the headers of requests to other services. So JWT is intended for end users to authenticate and be authorized for requests into the system, while mTLS is the auth mechanism between pods within the system.
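
As a sketch of what that configuration looks like, end-user token validation is a RequestAuthentication resource; the namespace, workload label, issuer, and JWKS URL below are placeholders, and you’d typically pair this with an AuthorizationPolicy that rejects requests without a valid token:

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-example
  namespace: my-app                # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: frontend                # apply to this workload's sidecars
  jwtRules:
  - issuer: "https://idp.example.com"    # placeholder identity provider
    jwksUri: "https://idp.example.com/.well-known/jwks.json"
```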

You might wonder whether it terminates at the sidecar, so that you don’t have to do anything in your application to support this, or whether it goes through the sidecar at all. The answer is yes, it’s terminated at the sidecar. The application doesn’t need to do anything with encryption or authentication; it’s all handled from sidecar to sidecar.

Another question: is your pod going to have two containers in it, the sidecar and your actual application? The answer is that if your pod normally has one container and you deploy it as part of the data plane, it’ll have two: one called “istio-proxy” and the other being your application. And again, the pod’s iptables rules are modified so that the only communication path the application container has is the loopback interface, which istio-proxy is attached to as well; all the other IP addresses are bound to istio-proxy. There’s nothing for the application to do. It doesn’t need to worry about certificate management, which can be a nightmare, and it doesn’t need to worry about binding to interfaces or routes. It’s all handled by istio-proxy.

Another question I’ve been asked: since both of those containers live in the same pod on the same host and only communicate with each other, the communication between istio-proxy and the microservice is not TLS? That’s right, it’s not. It’s all just on the loopback between the microservice and istio-proxy: same host, same pod, same interfaces. Then istio-proxy transparently proxies requests and responses, and if traffic has to go off the node to wherever the other endpoint lives in Kubernetes, it takes care of that.

Another point I should call out is that there are different configurations for mTLS. Our recommendation is to enforce mTLS mesh-wide. It’s one setting when you install, and you can then override it to be more permissive at the individual workload level if you have, say, legacy workloads that just won’t work with it. But our recommendation is to turn it on, so you know everything’s encrypted and set up properly.
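
A sketch of that setting: a PeerAuthentication named "default" in the root istio-system namespace enforces mTLS mesh-wide, and a narrower, workload-scoped one relaxes it (the legacy namespace and labels here are hypothetical):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system         # the root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: STRICT                  # reject plaintext traffic between sidecars
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-override
  namespace: legacy               # hypothetical namespace with a legacy workload
spec:
  selector:
    matchLabels:
      app: legacy-service         # hypothetical workload label
  mtls:
    mode: PERMISSIVE              # accept both mTLS and plaintext for this workload
```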

A question I’ve been asked: if you’re installing Istio, should you turn on injection for almost all your namespaces and then just turn it off for legacy stuff, instead of leaving those out of an Istio-labeled namespace? By default, every namespace you label with “istio-injection=enabled” will have the proxy injected into every pod deployed there, which means mTLS is also on by default.

There are namespaces you probably wouldn’t want to label, so they’re not part of the data plane: obviously the Kubernetes kube-system namespace, your istio-system namespace, and other system namespaces that aren’t running custom workloads. The ones you do label with istio-injection=enabled participate in the data plane, which means you get the proxy, mTLS, and all the Istio goodness. You can override that injection on a per-pod or per-deployment basis through attributes, and you can turn off mTLS and a lot of the other behavior through attributes as well.
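
For example, a sketch of the per-deployment opt-out, using the sidecar.istio.io/inject annotation on the pod template (newer Istio versions also accept it as a label); the workload name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-service                       # hypothetical workload
spec:
  selector:
    matchLabels:
      app: legacy-service
  template:
    metadata:
      labels:
        app: legacy-service
      annotations:
        sidecar.istio.io/inject: "false"     # skip sidecar injection for these pods
    spec:
      containers:
      - name: legacy-service
        image: example/legacy:1.0            # placeholder image
```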

What we generally recommend is zero trust: be as secure as possible out of the box, and if you have a specific use case where you need to override that, do it. That way your developers don’t need to worry about encryption and routing; they just deploy through their CI pipeline and automatically get all the Istio benefits.

Authentication and Authorization

Traffic Management and Gateways

The second set of features most people leverage is traffic management. You can do some complex traffic management, as well as control ingress and egress into the service mesh. Out of the box, there’s a lot you can do to manage traffic to your workloads running in Istio.

Traffic Management
One of the things I find really cool is weighted routing. Weighted routing means having multiple versions of your application deployed side by side and routing a configurable percentage of traffic to each version. Say you deploy a new version alongside the old one, so you have v1 and v2: you could send 75 percent of inbound traffic to v1, the old version, and 25 percent to v2.
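
A sketch of that 75/25 split (the service name and version labels are placeholders): the VirtualService carries the weights, and a DestinationRule defines which pods make up each version subset:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service                    # the Kubernetes service name
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
      weight: 75                  # 75% of traffic to the old version
    - destination:
        host: my-service
        subset: v2
      weight: 25                  # 25% to the new version
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
  - name: v1
    labels:
      version: v1                 # pods labeled version=v1
  - name: v2
    labels:
      version: v2                 # pods labeled version=v2
```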

Traffic Management

That allows you to do things like canary-style deployments, where you deploy a new version side by side with the old version, start routing some percentage of traffic to the new version, and gradually increase it over time while you test performance and functionality, until all the traffic is routed to the new version and you can retire the old one.

It also allows you to do things like A/B testing. Maybe you have a group of users who are beta testers or internal users, and you can route their traffic, based on header attributes or other attributes, to a new version running side by side with the old one. That group can test a new release while the rest of your users stay on the old version until you decide to cut over. Those are two scenarios, but you can imagine there’s a lot you can do with weighted routing.

There’s also locality-based load balancing and failover. Not something we’ll get into today but something to be aware of.

Istio has a configuration called multi-cluster, where one logical Istio mesh spans multiple Kubernetes clusters. Maybe you have an on-premises Kubernetes cluster and also use a public cloud like EKS, or Platform9; the mesh could span those clusters, or clusters in different AWS regions. As long as there’s network connectivity, you can span multiple clusters with a single logical mesh.

People often have requirements to route users to the cluster closest to their location, and you can do that in a multi-cluster configuration with locality-based load balancing. Or maybe you have data privacy requirements for different countries, and you want to route users to a cluster that satisfies the requirements for their country. It’s an advanced feature, but it becomes useful as people mature on Istio.

You can also do traffic mirroring, which is similar to weighted routing. With traffic mirroring, you send 100 percent of the traffic to two different versions, but only one version is live and sends its response back to the client; the other version’s responses are discarded. This is another useful way to do feature testing: you send live traffic to a mirror alongside the live version, the user never knows, but you can test those features against real traffic.
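
Mirroring is a small addition to a VirtualService route; a sketch, reusing the hypothetical subsets from the earlier example, where the mirrored copy’s responses are simply discarded:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1              # live version; its responses go back to the client
    mirror:
      host: my-service
      subset: v2                # mirror copy; its responses are dropped
    mirrorPercentage:
      value: 100.0              # mirror all of the live traffic
```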

There’s also a lot of network resiliency. As mentioned before, at this scale you must assume something will go wrong. If you have hundreds of microservices all talking to each other over a network, there will be network issues and performance degradation of individual services, and you don’t want one of those to bring down the whole system. Things like the circuit breaker pattern can be configured in traffic policy, so a calling service can handle a bad response from a server in the mesh gracefully instead of blowing up and causing cascading failures. You can also set request timeouts and retry policies, so you can handle a lot of that degradation.
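
A sketch of where those knobs live, with illustrative values: request timeouts and retries sit on the VirtualService route, while outlier detection, the circuit-breaker building block, sits in a DestinationRule traffic policy:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
    timeout: 2s                  # fail the request if no response within 2 seconds
    retries:
      attempts: 3                # retry up to 3 times
      perTryTimeout: 500ms
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # eject a backend after 5 consecutive 5xx responses
      interval: 30s
      baseEjectionTime: 60s
```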

You can also do fault injection. If you want to test the resiliency of the applications in your mesh, you can inject faults into them. For example, you can say “for five percent of requests, return a 500 error” and see how the client handles it. Or “for 10 percent of requests, inject 100 milliseconds or one second of latency” and see how everything copes. If you’re in a pre-production environment and want to test the resiliency of all your applications, you can do fault injection. There’s obviously more, but those are the big ones for traffic management.
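
Fault injection is also VirtualService configuration. This sketch (illustrative values again) aborts five percent of requests with a 500 and delays 10 percent of them by 100 milliseconds:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - fault:
      abort:
        httpStatus: 500
        percentage:
          value: 5.0             # return a 500 for 5% of requests
      delay:
        fixedDelay: 100ms
        percentage:
          value: 10.0            # add 100ms of latency to 10% of requests
    route:
    - destination:
        host: my-service
```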

It’s also worth pointing out that a lot of these features are what other projects build on, such as Knative, whose scale-to-zero relies on retries until a service is available. Once you get a handle on Istio, you can install something like Knative on top of it and take advantage. A lot of CD tools out there, like Argo and Flux, also use these features to do automated canary deployments and feature testing. There are a number of projects that build upon these capabilities.

Traffic Management APIs
These are the four core Istio APIs that give you traffic management rules: virtual service, destination rule, gateway, and service entry.

The virtual service and destination rule are by far the most common. Read the documentation to understand the many settings you can put in a virtual service; it’s too much to cover right now. The virtual service/destination rule relationship is a little confusing, so take your time with it if you’re going to play around in a sandbox. Honestly, I sometimes wish they’d just combine the two, because they overlap all the time.

Traffic Management APIs

Traffic Routing and Policies
In general, the virtual service configures how a request is routed to a Kubernetes service within the mesh, and the destination rule at the end of that virtual service configures how to handle the traffic once it’s routed. I tend to think of the virtual service as being at layer 7: traffic splitting, weighted routing, anything that deals with layer 7, the virtual service handles. A destination rule is more at the layer 3/layer 4 level.

With a destination rule, you can apply TLS policies, set connection timeouts, things at that layer. The two tend to go together; if you have one, you usually have the other. Those are the objects that give you all the sophisticated routing and management.

There’s also a gateway object, which manages inbound or outbound traffic for an application. An ingress gateway handles all the inbound traffic, and you assign a gateway per host, or set of hosts, to route that traffic into your application.

Together, these objects determine how a request is routed and what happens to it before it hits the pod. If service A calls service B internally, you could have a virtual service for service B that says: if the URL has “/v1”, send it to service B’s v1 instance; if it has “/v2”, send it to v2. That gives you the complex routing within the mesh. After the virtual service has evaluated its rules, the destination rule kicks in and says, for example, “make sure all requests terminate TLS at service B” or “set a connection timeout”: things at the layer 4 level. They can handle external requests, but they also handle internal requests. They’re the core building blocks of traffic management.
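
A sketch of that service-A-to-service-B scenario, with hypothetical names: the VirtualService matches on the URL path, and the DestinationRule pins Istio mTLS and a connection timeout at the lower layer:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-b
spec:
  hosts:
  - service-b
  http:
  - match:
    - uri:
        prefix: /v1
    route:
    - destination:
        host: service-b
        subset: v1               # "/v1" requests go to the v1 pods
  - match:
    - uri:
        prefix: /v2
    route:
    - destination:
        host: service-b
        subset: v2               # "/v2" requests go to the v2 pods
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-b
spec:
  host: service-b
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL         # require Istio mTLS to service-b
    connectionPool:
      tcp:
        connectTimeout: 5s       # layer-4 connection timeout
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```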

On the right of the slide is a simple example of a virtual service that says: any traffic coming to an ingress gateway with this host, route it to this backend. “Frontend” is the name of the Kubernetes service, the destination. You can do more complex things in here, and it doesn’t need to be bound to an ingress gateway; it could apply to general traffic within the mesh as well. This YAML example handles ingress traffic to a service, but it can also apply to traffic from another service to this service, which is why they created the virtual service instead of just reusing the Ingress resource. The challenging part of Istio is that there are a lot of different pieces to chain together. In its simplest form, a virtual service says: wherever the request is coming from, this is how I want it routed to the service I’m running.
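
The slide’s YAML is roughly this shape; a hedged reconstruction, with placeholder hosts, of a Gateway that claims a host on the shared ingress gateway and a VirtualService bound to it that routes to the frontend service:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: frontend-gateway
spec:
  selector:
    istio: ingressgateway          # bind to the default ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "frontend.example.com"       # placeholder external host
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: frontend
spec:
  hosts:
  - "frontend.example.com"
  gateways:
  - frontend-gateway               # only applies to traffic entering via this gateway
  http:
  - route:
    - destination:
        host: frontend             # the Kubernetes service name
        port:
          number: 80
```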

External Services
The other interesting traffic management API is the service entry. There is a mesh-wide setting in Istio that we at Aspen Mesh recommend you turn on: the outbound traffic policy, set to “registry only.” It means that if an application in the mesh tries to call something outside the mesh, maybe a third-party API or a database that isn’t running within the mesh, the call is not allowed unless the destination is in Istio’s internal service registry. With registry only set, any call to a destination Istio doesn’t know about will be rejected. A service entry is a special API object that adds an external service, by hostname, to the service registry. That’s how you control where egress traffic is allowed to go: turn on the registry-only outbound traffic policy, then add service entries for the allowed external services.
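
A sketch of the two pieces together: the mesh-wide setting lives in meshConfig (shown here as an IstioOperator overlay), and a ServiceEntry then allowlists an external host (api.example.com is a placeholder):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY          # block egress to hosts not in the service registry
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.example.com                # placeholder external service
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 443
    name: https
    protocol: TLS
```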

Resilience and Testing
All the resilience and testing features we talked about aren’t standalone APIs; they’re configurations within a virtual service or destination rule. That’s how you configure most of these resilience and testing features.

Ingress and Egress – Use Controlled Ingress/Egress Gateways to Secure Your Mesh

We’ve talked a lot about traffic within the mesh, but you’re also going to have external requests coming in, and requests going out of the mesh to some sort of external service. The concepts of an ingress gateway and an egress gateway handle this.

The ingress gateway sits behind the ingress controller on your cluster. It’s an Istio-specific object, but it gives you much of the routing configuration that you need. It’s another layer of configuration, but it’s powerful.

One of the things you can do with ingress and egress gateways is configure your mesh to only allow traffic out through the egress gateway. If you have dedicated nodes for your ingress and egress gateways, that gives you a lot of capability: you can audit traffic, or set firewall rules that only allow traffic from those dedicated nodes’ IP addresses. Dedicated gateways for ingress and egress are a powerful way to improve your security posture.

Ingress and Egress

Ingress and Egress Example
To give you an example of what that might look like, below is a project I worked on. We have an application (in the blue box) that makes a call to a virtual machine outside the mesh running RabbitMQ, and that call needs to be encrypted over TLS. Through the egress gateway’s virtual service routing, we say that all traffic leaving the mesh must be routed through the egress gateway and then out, so you have visibility into traffic leaving. Because the connection is encrypted end to end, the application only needs to know the server’s hostname and the protocol.

The communication between the proxy running in that app’s pod and the proxy running on the egress gateway is then wrapped and tunneled in mTLS; the already-encrypted RabbitMQ connection is wrapped in mTLS between those proxies, and the proxy on the egress gateway terminates the mTLS and sends the encrypted traffic on to RabbitMQ. It’s another example where the app knows nothing about the proxy: it just sends to the hostname, and the proxy automatically picks that up and routes it through the gateway.

People may ask whether the Istio proxy is a pod, or a DaemonSet running on each node, or whether it runs on the controller nodes. The answer is no: the Istio proxy is a container that runs within each workload pod. The Istio egress and ingress gateways, on the other hand, are standalone Istio proxies configured to handle inbound and outbound traffic. Each is another Envoy proxy deployed in the istio-system namespace as a pod running just a standalone proxy container. It’s a normal deployment, not a DaemonSet, and it’s configured to allow certain ports and traffic through, whether inbound or outbound.

Ingress and Egress Example

Observability

The last section I want to cover is observability. One of the challenges: you’ve got all these microservices making all these network calls, an end user reports an issue, and you don’t even know where to start because there’s so much going on. Istio collects a lot of telemetry data, from the data plane as well as from itself, to give you insight into what’s going on in the mesh.

Mesh Visualization
One of the things you can do is visualize what your mesh looks like. Istio collects all this telemetry data and can send it to Prometheus or some other data store; the tools I’m showing here are visualization tools, and you can use any tool that supports these protocols (they’re starting to move to OpenTelemetry). I’m showing Kiali, but there are other tools, commercial and open source, that can give you the same graph visualization based on this data.

This Kiali example shows the call graph for a sample application, all the way from the ingress gateway through services that are running multiple versions of pods behind them. You can see whether these connections are healthy; healthy connections show as green arrows here, and the blue ones indicate TCP traffic. You can look at the number of requests, too. It’s good to be able to visualize the whole communication path in a more complex system of microservices.

Mesh Visualization

Distributed Tracing
When you have an issue in a system, it’s hard to tell where that issue is being introduced. If you have 50 microservices all calling each other, which one is slow? Distributed tracing is a useful way to find the culprit within a call graph. Istio supports this by injecting headers, like a request ID and a trace ID, into requests, and sending that data on so you can visualize where there are bottlenecks or where errors are being thrown. The tool shown here is Jaeger; Zipkin is another, and there are several distributed tracing tools out there.
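
One practical note: the sidecars generate the spans, but for those spans to join up into a single trace, the application has to forward the tracing headers it receives on any outbound calls it makes. The B3 header set commonly cited in the Istio docs is:

```text
x-request-id
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
x-ot-span-context
```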

Distributed Tracing

Metrics Visualization
Metrics are typically stored in Prometheus. You can use the APM tool of your choice, whether that’s Datadog, Dynatrace, or, in this case, a Grafana dashboard. There’s a ton of metrics to build your dashboards around, covering both Istio control plane performance and your applications.

The metrics in this dashboard (below) include information about mTLS at the bottom, so you can see which of your pods are protected by mTLS: what’s encrypted and what’s not. Throughput, latency, all sorts of data you can pull into a tool of your choice.

Metrics Visualization

One question I’ve received: does Jaeger take into account mTLS authentication time, to show how much overhead TLS adds? The answer is no; Jaeger is just looking at the communication between the proxies, and once traffic hits the proxy, the mTLS is terminated, so there’s no real data there. You can get that data from Envoy, which collects a metric for TLS termination overhead, and visualize it in something like Grafana; Kiali itself doesn’t go that deep. We have some good guides on benchmarking and tuning Envoy, because there is some overhead with the proxy, but what we’ve found is that if you tune it properly for your environment, it’s minimal.

Next Steps – How to Kick the Tires on Istio

For those who want to look at Istio further, here are some quick steps to get something up and running that you can play around with. Istio is open source, so there’s plenty of documentation out there.

  • Get a sandbox cluster. If I were starting from the beginning, I’d recommend creating some sort of sandbox cluster, whether it’s local or on Platform9. Whatever the cluster is, it needs to be Kubernetes 1.19 or above.
  • Download and install Istio with the demo profile. There’s a concept of profiles when you do the istioctl install; I recommend the demo profile because it installs everything: the ingress gateway, egress gateway, core Istio control plane, etc.
  • Install the tools of your choice. It doesn’t have to be Kiali, Jaeger, and Grafana, but those three are common. If you have other preferences, that’s fine.
  • Install a microservice-based application. There are a bunch of sample applications that demonstrate Istio’s features; bookinfo ships with the Istio distribution, and Aspen Mesh has one called catalog-demo, a set of microservices you can download and install.
  • Explore key features like security, observability, and traffic management. There are a lot of good examples in the documentation for this.

Have more questions about how to get started? Get in touch with the Aspen Mesh team or learn about our Professional Services and 24/7 Expert Support.
