It’s here! Istio 0.8 landed on May 31, 2018. It brought a slew of new features, stability and performance improvements, and new APIs. I thought it would be helpful to share some of our favorites and experience with them.

Pilot Scalability

The first big change we're excited about in 0.8 is related to Pilot scalability. Before 0.8, whenever we used Istio in clusters with more than a dozen services and more than 40-50 pods, we started seeing catastrophically bad Pilot performance. The earliest bug we were aware of around this was #2728, but there have been many different manifestations. The root cause of the pain was recomputing the Pilot configuration whenever something in the cluster changed (like a new service, or a pod going away), combined with the fact that every sidecar had to poll Pilot for changes. The new v1alpha3 config APIs were added to Pilot using the new Envoy v2 config APIs, which use a push model (documented here and here). And that work has paid off!

We were really happy when we deployed 0.8 in our test cluster: we scaled services up and down, nuked and paved large apps, and did everything else we could think of. If you previously tried Istio and got a bit of a queasy feeling about its behavior when you grew your service mesh beyond an example app, now might be the time to give it another shot. (Istio 1.0 isn't too far off, either.)

Sidecar Injection

Istio is built on the sidecar model – every pod in the mesh has a dataplane proxy running right alongside it to add the service mesh smarts. All traffic to and from the application container goes through the sidecar first. The application container should be unaware – the less the application has to be coupled to the sidecar, the better.

But something has to make sure the sidecar (and its friend, the init container) get started alongside the app. Previous versions of Istio let you inject manually or at the time that the Kubernetes Deployment was created. The drawback of Deployment-time injection comes when you want to upgrade your service mesh to run a newer version of the sidecar.  You either have to patch the Deployments in your Kubernetes cluster (“manually” injecting the upgrade), or delete and recreate them.

With 0.8, we can use a MutatingWebhook pod admission controller – although the name is a mouthful, the end result is a pretty neat example of Kubernetes extensibility. Whenever Kubernetes is about to create a pod, it lets its pod admission controllers take a look and decide to allow, reject, or allow-with-changes that pod. Every (properly authenticated and configured) admission controller gets a shot. Istio provides one of these admission controllers to inject the current sidecar and init container. (It's called a MutatingWebhook because it can mutate the pod to add the sidecar instead of only accepting or rejecting it, and because it runs as a separate service, outside the Kubernetes API server, that is reached via webhook.)

We like this because it makes it easier to upgrade our service mesh. We're happiest when all of our pods run the most current version of the sidecar, and when that version matches the Istio control plane release we've deployed. We don't like to think about different pods using different (older) sidecars if we don't have to. Now, once we install an upgraded control plane, we trigger a rolling update of the pods behind each service and they pick up the new sidecar.

There are multiple ways to say "Hey Istio, please inject a sidecar into this" or "Hey Istio, please leave this alone". We use a namespace label to turn injection on, and then pod metadata annotations to disable it for particular pods as needed.
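As a rough sketch of how that looks in practice (the namespace, pod, and image names are illustrative; the label and annotation keys are the ones used by Istio's automatic injection):

```yaml
# Enable automatic sidecar injection for every pod created in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: my-apps                    # illustrative name
  labels:
    istio-injection: enabled
---
# Opt one particular pod back out with a metadata annotation.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-worker              # illustrative name
  namespace: my-apps
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  containers:
  - name: worker
    image: example/worker:latest   # illustrative image
```

In practice you'd put the annotation on a Deployment's pod template rather than a bare Pod, but the mechanism is the same.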

v1alpha3 APIs and composability

Finally, a bit about the new APIs. Istio 0.8 introduces a bunch of new Kubernetes resources for configuration. Before, we had what are called "v1alpha1" resources like RouteRules. Now, we have "v1alpha3" resources like DestinationRules and VirtualServices. 0.8 supports both v1alpha1 and v1alpha3 resources as a migration point from one to the other. The future is v1alpha3 config resources, so we should all be porting our existing config.

We found this process to be not too difficult for our existing configs. A minor note: we all think of incoming traffic first, but don't forget to also port egress config to ServiceEntries (or to opt-out IP ranges). Personally, even though it can be a bit surprising at first to have to configure external access, we really like the control and visibility we get from managing egress with ServiceEntries in production environments.
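As a sketch of what ported egress config looks like, here is a minimal ServiceEntry that lets services in the mesh call an external HTTP API (the hostname is illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.example.com      # illustrative external host
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS        # resolve the host via DNS rather than static IPs
```

Without a ServiceEntry (or an opt-out IP range), traffic from the mesh to this host would be blocked – which is exactly the control and visibility we want in production.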

One of the big changes in the new v1alpha3 APIs is that you define the routing config for a service all in one place: the VirtualService custom resource. Before, you could define this config in multiple RouteRules that were ordered by precedence. A downside of the multiple-RouteRules approach is that to model what's going to happen in your head, you have to read and mentally sort the different RouteRule resources. The new approach is definitely simpler, but the tradeoff for that simplicity is some difficulty if you actually want your config to come from multiple resources.
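To make that concrete, here is a hedged sketch of a single VirtualService expressing what previously took two precedence-ordered RouteRules: requests carrying a particular header go to subset v2, and everything else falls through to v1. The service, subset, and header names are illustrative; the rules in the http list are evaluated top to bottom within the one resource.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews                  # illustrative in-mesh service
  http:
  - match:                   # first rule: beta testers get v2
    - headers:
        end-user:
          exact: beta-tester
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                   # default rule: everyone else gets v1
    - destination:
        host: reviews
        subset: v1
```

The subsets themselves (v1, v2) would be defined in a companion DestinationRule; the point here is that the ordering now lives inside one resource instead of across several.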

We think this kind of thing can happen when you have multiple teams contributing to a larger service mesh and the apps on it (think a Platform Ops team contributing “base” policy, a Platform Security team contributing some policy on top, and finally an individual app team fine-tuning their particular routes). We’re not sure what the solution is, but my colleague Neeraj started the conversation on the istio-dev list and we’re looking forward to seeing where it goes in the future.

Better external service support

It used to be the case that if you wanted a service in the mesh to communicate with a TLS service outside the mesh, you had to modify your service to speak HTTP over port 443 so that Istio could route it correctly. Now that Istio can use SNI to route traffic, you can leave your service alone and configure Istio to allow it to communicate with that external service by hostname. Since this is TLS passthrough, you don't get L7 visibility of the egress traffic; but since you don't need to modify your service, you can add services to the mesh that you might not have been able to before.
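A sketch of what that might look like: a ServiceEntry that lets mesh services reach an external TLS service by hostname, with Istio routing on the SNI value and passing the encrypted traffic through untouched (the hostname is illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-tls-service
spec:
  hosts:
  - secure.example.com   # illustrative host, matched against the SNI in the TLS handshake
  ports:
  - number: 443
    name: https
    protocol: HTTPS      # passthrough: Istio does not terminate TLS, so no L7 visibility
  resolution: DNS
```

The service in the mesh just calls https://secure.example.com as it always did – no port or protocol rewriting required.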

We see great potential with the added features and increased stability in 0.8. We’re looking forward to seeing what is in store with 1.0 and are excited to see how teams use Istio now that it seems ready for production deployments.