Istio 0.8 Updates

It’s here! Istio 0.8 landed on May 31, 2018. It brought a slew of new features, stability and performance improvements, and new APIs. I thought it would be helpful to share some of our favorites and our experience with them.

Pilot Scalability

The first big change we are excited about in 0.8 is related to Pilot scalability. Before 0.8, whenever we used Istio in clusters with more than a dozen services and more than 40-50 pods, we started seeing catastrophically bad Pilot performance. The earliest bug we were aware of around this was #2728, but there have been a bunch of different manifestations. The root cause of the pain was computing the new Pilot configuration when something in the cluster changes (like a new service, or a pod going away), combined with the fact that every sidecar needed to poll Pilot for changes. The new v1alpha3 config APIs were added to Pilot on top of the new Envoy v2 config APIs, which use a push model (documented here and here). And that work has paid off!

We were really happy with the results when we deployed 0.8 in our test cluster, scaled services up and down, nuked and paved large apps, and did everything else we could think of. If you previously tried Istio and got a bit of a queasy feeling around behaviors when you grew your service mesh beyond an example app, now might be the time to give it another shot. (Istio 1.0 isn’t too far off, either.)

Sidecar Injection

Istio is built on the sidecar model - every pod in the mesh has a dataplane proxy running right alongside it to add the service mesh smarts. All traffic to and from the application container goes through the sidecar first. The application container should be unaware - the less the application has to be coupled to the sidecar, the better.

But something has to make sure the sidecar (and its friend, the init container) gets started alongside the app. Previous versions of Istio let you inject manually or at the time that the Kubernetes Deployment was created. The drawback of Deployment-time injection comes when you want to upgrade your service mesh to run a newer version of the sidecar. You either have to patch the Deployments in your Kubernetes cluster (“manually” injecting the upgrade), or delete and recreate them.

With 0.8, we can use a MutatingWebhook Pod Admission Controller - although the title is a mouthful, the end result is a pretty neat example of Kubernetes extensibility. Whenever Kubernetes is about to create a pod, it lets its pod admission controllers take a look and decide to allow, reject or allow-with-changes that pod. Every (properly authenticated and configured) admission controller gets a shot. Istio provides one of these admission controllers that injects the current sidecar and init container. (It’s called a MutatingWebhook because it can mutate the pod to add the sidecar instead of only accepting or rejecting it, and because it runs in a separate service that the Kubernetes API server reaches via webhook.)
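To make the allow-with-changes idea concrete, here is a minimal sketch of what a mutating webhook hands back to the API server: an AdmissionReview response carrying a base64-encoded JSON Patch. This is not Istio's injector; the `SIDECAR` definition, its image name, and the `mutate` function are placeholders made up for illustration, and a real injector also adds the init container, volumes, and more.

```python
import base64
import json

# Hypothetical sidecar definition, for illustration only. A real injector
# builds this from its own configuration and adds much more than one container.
SIDECAR = {"name": "istio-proxy", "image": "example.invalid/proxy:0.8"}


def mutate(admission_review: dict) -> dict:
    """Return an AdmissionReview response that appends SIDECAR to the pod."""
    request = admission_review["request"]
    pod = request["object"]
    containers = pod["spec"].get("containers", [])

    patch = []
    if not any(c["name"] == SIDECAR["name"] for c in containers):
        # JSON Patch: a path ending in "/-" appends to an existing array.
        patch.append({"op": "add", "path": "/spec/containers/-", "value": SIDECAR})

    # The response must echo the request uid and say whether the pod is allowed.
    response = {"uid": request["uid"], "allowed": True}
    if patch:
        # "Allow with changes": the patch travels base64-encoded.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {
        "apiVersion": admission_review.get("apiVersion"),
        "kind": "AdmissionReview",
        "response": response,
    }
```

A pod that already carries the sidecar gets a plain "allowed" response with no patch, which is what lets the same webhook run safely on every pod creation.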

We like this because it makes it easier for us to upgrade our service mesh. We’re happiest if all of our pods are running the most current version of the sidecar and that version correlates to the release of the Istio control plane version we’ve deployed.  We don’t like to think about different pods using different (older) sidecars if we don’t have to. Now, once we install an upgraded control plane, we trigger some sort of rolling upgrade on the pods behind each service and they pick up the new sidecar.

There are multiple ways to say “Hey Istio, please inject a sidecar into this” or “Hey Istio, please leave this alone”. We use namespace labels to turn it on and then pod metadata annotations to disable it for particular pods if needed.
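The opt-in and opt-out logic can be sketched roughly as follows. This assumes the `istio-injection=enabled` namespace label and the `sidecar.istio.io/inject` pod annotation from the Istio documentation; the `should_inject` function itself is hypothetical and leaves out the injector's configurable default policy.

```python
def should_inject(namespace_labels: dict, pod_annotations: dict) -> bool:
    """Simplified injection decision: namespace opts in, pod may opt out."""
    # The namespace has to opt in before any pod in it is considered.
    if namespace_labels.get("istio-injection") != "enabled":
        return False
    # An individual pod can still opt out with an annotation.
    value = pod_annotations.get("sidecar.istio.io/inject", "true")
    return value.lower() != "false"
```

For example, `should_inject({"istio-injection": "enabled"}, {"sidecar.istio.io/inject": "false"})` returns `False`, so that one pod is left alone while its neighbors in the namespace get a sidecar.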

v1alpha3 APIs and composability

Finally, a bit about new APIs. Istio 0.8 introduces a bunch of new Kubernetes resources for configuration. Before, we had what are called “v1alpha1” resources like RouteRules and DestinationPolicies. Now, we have “v1alpha3” resources like DestinationRules and VirtualServices. Istio 0.8 supports both v1alpha1 and v1alpha3 resources as a migration point from v1alpha1 to v1alpha3. The future is v1alpha3 config resources, so we should all be porting our existing config.

We found this process to be not too difficult for our existing configs. Minor note - we all think of incoming traffic first but don’t forget to also port Egress config to ServiceEntries or opt-out IP ranges. Personally, even though it can be a bit surprising at first to have to configure external access, we really like the control and visibility we get from managing Egresses with ServiceEntries for production environments.

One of the big changes in the new v1alpha3 APIs is that you define the routing config for a service in one place: the VirtualService custom resource. Before, you could define this config in multiple RouteRules that were ordered by precedence. A downside of the multiple RouteRules approach is that if you want to model what’s going to happen in your head, you’ve got to read and sort different RouteRule resources. The new approach is definitely simpler, but the tradeoff for that simplicity is some difficulty if you actually want your config to come from multiple resources.
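The difference in mental model can be sketched like this. The dictionaries below are heavily simplified stand-ins, not the real RouteRule or VirtualService schemas, and the `reviews-v1`/`reviews-v2` names are made up. The sketch assumes that v1alpha1 rules with higher precedence are evaluated first and that a VirtualService applies the first matching route in the order written.

```python
def matches(match: dict, request: dict) -> bool:
    """An empty match is a catch-all; otherwise every key must agree."""
    return all(request.get(key) == value for key, value in match.items())


# v1alpha1 style: rules are scattered across resources and must be sorted
# by precedence before you can tell which one wins.
route_rules = [
    {"name": "default", "precedence": 1, "match": {}, "destination": "reviews-v1"},
    {"name": "beta", "precedence": 2, "match": {"cookie": "beta"}, "destination": "reviews-v2"},
]


def resolve_route_rules(rules: list, request: dict):
    for rule in sorted(rules, key=lambda r: r["precedence"], reverse=True):
        if matches(rule["match"], request):
            return rule["destination"]
    return None


# v1alpha3 style: one resource holds an ordered list, and the first match wins.
virtual_service_routes = [
    {"match": {"cookie": "beta"}, "destination": "reviews-v2"},
    {"match": {}, "destination": "reviews-v1"},
]


def resolve_virtual_service(routes: list, request: dict):
    for route in routes:
        if matches(route["match"], request):
            return route["destination"]
    return None
```

Both functions route a request with `{"cookie": "beta"}` to `reviews-v2`, but only the second lets you read the outcome top to bottom from a single list.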

We think this kind of thing can happen when you have multiple teams contributing to a larger service mesh and the apps on it (think a Platform Ops team contributing “base” policy, a Platform Security team contributing some policy on top, and finally an individual app team fine-tuning their particular routes). We’re not sure what the solution is, but my colleague Neeraj started the conversation on the istio-dev list and we’re looking forward to seeing where it goes in the future.

Better external service support

It used to be the case that if you wanted a service in the mesh to communicate with a TLS service outside the mesh, you had to modify your service to speak HTTP over port 443 so that Istio could route it correctly. Now that Istio can use SNI to route traffic, you can leave your service alone and configure Istio to allow it to communicate with that external service by hostname. Since this is TLS passthrough, you don’t get L7 visibility of the egress traffic, but since you don’t need to modify your service, it allows you to add services to the mesh that you might not have been able to before.
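As a rough illustration, a ServiceEntry that allows TLS traffic to an external hostname might look like the manifest built below. It is written as a Python dictionary so it can be dumped to JSON and applied with the usual tooling. The resource name and `api.example.com` are placeholders, and the field values follow our reading of the v1alpha3 ServiceEntry reference, so check them against the docs for the release you run.

```python
import json


def external_tls_service_entry(name: str, host: str) -> dict:
    """Build a ServiceEntry manifest permitting TLS egress to one hostname."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "ServiceEntry",
        "metadata": {"name": name},
        "spec": {
            # Hostname the sidecar matches, taken from the SNI of the connection.
            "hosts": [host],
            # The application keeps speaking TLS on 443; the sidecar passes it through.
            "ports": [{"number": 443, "name": "https", "protocol": "HTTPS"}],
            "resolution": "DNS",
        },
    }


# Hypothetical example values.
print(json.dumps(external_tls_service_entry("example-api", "api.example.com"), indent=2))
```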

We see great potential with the added features and increased stability in 0.8. We’re looking forward to seeing what is in store with 1.0 and are excited to see how teams use Istio now that it seems ready for production deployments.


Tracing and Metrics: Getting the Most Out of Istio

Are you considering or using a service mesh to help manage your microservices infrastructure? If so, here are some basics on how a service mesh can help, the different architectural options, and tips and tricks on using some key CNCF tools that integrate well with Istio to get the most out of it.

The beauty of a service mesh is that it bundles so many capabilities together, freeing engineering teams from having to spend inordinate amounts of time managing microservices architectures. Kubernetes has solved many build and deploy challenges, but it is still time consuming and difficult to ensure reliability and security at runtime. A service mesh handles the difficult, error-prone parts of cross-service communication such as latency-aware load balancing, connection pooling, service-to-service encryption, instrumentation, and request-level routing.

Once you have decided a service mesh makes sense to help manage your microservices, the next step is deciding what service mesh to use. There are several architectural options, from the earliest model, the library approach, to the node agent architecture, to the model that seems to be gaining the most traction: the sidecar model. We have also seen an evolution from data plane proxies like Envoy to service meshes such as Istio, which provide distributed control and data planes. We're active users of Istio, and believers that the sidecar architecture strikes the right balance between a robust set of features and a lightweight footprint, so let’s take a look at how to get the most out of tracing and metrics with Istio.

Tracing

One of the capabilities Istio provides is distributed tracing. Tracing provides service dependency analysis across microservices and follows each request as it passes through multiple microservices. It’s also a great way to identify performance bottlenecks and zoom into a particular request to determine things like which microservice contributed to the latency of a request or which service created an error.
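One practical detail: the sidecars can only stitch spans into a single trace if each application copies the tracing headers from the inbound request onto the outbound requests it makes. A minimal sketch of that forwarding step, assuming the B3-style header names listed in the Istio tracing documentation (the `trace_headers_to_forward` helper is our own name):

```python
# Header names as listed in the Istio distributed tracing documentation.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def trace_headers_to_forward(incoming_headers: dict) -> dict:
    """Pick the tracing headers out of an inbound request, case-insensitively."""
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
```

An application would merge the returned dictionary into the headers of every downstream call it makes while handling that request.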

We use and recommend Jaeger for tracing as it has several advantages:

  • OpenTracing compatible API
  • Flexible & scalable architecture
  • Multiple storage backends
  • Advanced sampling
  • Accepts Zipkin spans
  • Great UI
  • CNCF project and active open source community

Metrics

Another powerful thing you gain with Istio is the ability to collect metrics. Metrics are key to understanding historically what has happened in your applications, and when they were healthy compared to when they were not. A service mesh can gather telemetry data from across the mesh and produce consistent metrics for every hop. This makes it easier to quickly solve problems and build more resilient applications in the future.

We use and recommend Prometheus for gathering metrics for several reasons:

  • Pull model
  • Flexible query API
  • Efficient storage
  • Easy integration with Grafana
  • CNCF project and active open source community

We also use Cortex, which is a powerful tool to enhance Prometheus. Cortex provides:

  • Long term durable storage
  • Scalable Prometheus query API
  • Multi-tenancy

Check out this webinar for a deeper look into what you can do with these tools and more.


Service Mesh Security: Addressing Attack Vectors with Istio

As you break apart your monolith into microservices, you'll gain a slew of advantages such as scalability, increased uptime and better fault isolation. A downside of breaking applications apart into smaller services is that there is a larger attack surface. Additionally, all the communication that used to take place via function calls within the monolith is now exposed to the network. Adding security that addresses this must be a core consideration on your microservices journey.

One of the key benefits of Istio, the open source service mesh that Aspen Mesh is built on, is that it provides unique service mesh security and policy enforcement to microservices. An important thing to note is that while a service mesh adds several important security features, it is not the end-all-be-all for microservices security. It’s important to also consider a strategy around network security (a good read on how the network can help manage microservices), which can detect and neutralize attacks on the service mesh infrastructure itself, to ensure you’re entirely protected against today’s threats.

So let’s look at the attack vectors that Istio addresses, which include traffic control at the edge, traffic encryption within the mesh and layer-7 policy control.

Security at the Edge
Istio adds a layer of security that allows you to monitor and address compromising traffic as it enters the mesh. Istio integrates with Kubernetes as an ingress controller and takes care of load balancing for ingress. This allows you to add a level of security at the perimeter with ingress rules. You can apply monitoring around what is coming into the mesh and use route rules to manage compromising traffic at the edge.

To ensure that only authorized users are allowed in, Istio’s Role-Based Access Control (RBAC) provides flexible, customizable control of access at the namespace level, service level and method level for services in the mesh. RBAC works in two parts: the RBAC engine watches for changes to RBAC policy and fetches the updated policy when it sees any, and it authorizes requests at runtime by evaluating the request context against the RBAC policies and returning the authorization result.
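The runtime half of that description, evaluating a request context against policy, can be sketched as below. The `Rule` and `Binding` classes are invented for illustration and are far simpler than Istio's actual RBAC resources. The sketch only shows the namespace, service and method checks with deny-by-default.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    namespace: str   # namespace the rule applies to
    service: str     # service name, glob patterns allowed (e.g. "reviews*")
    methods: tuple   # allowed methods, or ("*",) for any


@dataclass(frozen=True)
class Binding:
    user: str        # identity the rule is granted to
    rule: Rule


def is_authorized(bindings, user: str, namespace: str, service: str, method: str) -> bool:
    """Allow the request only if some binding for this user covers it."""
    for binding in bindings:
        if binding.user != user:
            continue
        rule = binding.rule
        if (
            rule.namespace == namespace
            and fnmatchcase(service, rule.service)
            and ("*" in rule.methods or method in rule.methods)
        ):
            return True
    return False  # nothing matched: deny by default
```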

Encrypting Traffic
Security at the edge is a good start, but if a malicious actor gets through, Istio provides defense with mutual TLS encryption of the traffic between your services. The mesh can automatically encrypt and decrypt requests and responses, removing that burden from the application developer. It can also improve performance by prioritizing the reuse of existing, persistent connections, reducing the need for the computationally expensive creation of new ones.

Istio provides more than just client-server authentication and authorization; it allows you to understand and enforce how your services are communicating and prove it cryptographically. It automates the delivery of certificates and keys to the services, has the proxies use them to encrypt the traffic (providing mutual TLS), and periodically rotates certificates to reduce exposure to compromise. You can use TLS to ensure that Istio instances can verify that they’re talking to other Istio instances to prevent man-in-the-middle attacks.

Istio makes TLS easy with Citadel, the Istio Auth controller for key management. It allows you to secure traffic over the wire and also apply strong identity-based authentication and authorization to each microservice.
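For a sense of what “mutual” adds over ordinary TLS, here is a small sketch using Python's standard `ssl` module: the server side is configured to reject any peer that does not present a certificate. In the mesh the sidecar proxies do this on the application's behalf, so this is only an illustration of the concept. The function name is ours, and the commented-out file paths are placeholders.

```python
import ssl


def server_context_requiring_client_cert() -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate."""
    # Purpose.CLIENT_AUTH yields a context suited to the server end of a connection.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Plain TLS leaves the client anonymous; CERT_REQUIRED makes it mutual.
    context.verify_mode = ssl.CERT_REQUIRED
    # In real use you would also load the workload's own certificate and the
    # CA that signed its peers (placeholder paths, not Istio's actual layout):
    # context.load_cert_chain("cert-chain.pem", "key.pem")
    # context.load_verify_locations("root-cert.pem")
    return context
```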

Policy Control and Enforcement
Istio gives you the ability to enforce policy at the application level with layer-7 control. Applying policy at this level is ideal for service routing, retries, circuit-breaking, and for security that operates at the application layer, such as token validation. Istio provides the ability to set up whitelists and blacklists so you can let in what you know is safe and keep out what you know isn’t.

Istio’s Mixer enables integrating extensions into the system and lets you declare policy constraints on network or service behavior in a standardized expression language. The benefit is that you can funnel all of those things through a common API, which enables you to cache policy decisions at the edge of the service so that, if the downstream policy systems start to fail, the network stays up.
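The caching idea can be sketched as follows. This is not how Mixer is implemented; `CachedPolicyCheck` and its `backend` callable are hypothetical, and the sketch only shows the behavior described above: serve fresh cached decisions locally, and fall back to the last known decision when the policy backend is failing.

```python
import time


class CachedPolicyCheck:
    """Cache allow/deny decisions and reuse them if the backend is down."""

    def __init__(self, backend, ttl_seconds: float = 5.0, clock=time.monotonic):
        self._backend = backend      # callable: key -> bool, may raise
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}             # key -> (decision, fetched_at)

    def allowed(self, key) -> bool:
        now = self._clock()
        cached = self._cache.get(key)
        # A fresh entry is answered locally without calling the backend.
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        try:
            decision = bool(self._backend(key))
        except Exception:
            # Backend is failing: reuse the stale decision so traffic keeps flowing.
            if cached is not None:
                return cached[0]
            raise
        self._cache[key] = (decision, now)
        return decision
```

Whether a stale "allow" is acceptable during an outage is a policy decision in its own right; a stricter variant could cap how long stale entries may be reused.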

Istio addresses some key concerns that arise with microservices. You can make sure that only the services that are supposed to talk to each other are talking to each other. You can encrypt those communications to secure against attacks that can occur when those services interact, and you can apply application-wide policy. While there are other, manual, ways to accomplish much of this, the beauty of a mesh is that it brings several capabilities together and lets you apply them in a manner that is scalable.

At Aspen Mesh, we’re working on some new capabilities to help you get the most out of the security features in Istio. We’ll be posting something on that in the near future so check back in on the Aspen Mesh blog.