Good startups believe deeply that something is true about the future, and organize around it.

When we founded Aspen Mesh as a startup inside of F5, my co-founders and I believed these things about the future:

  1. App developers would accelerate their pace of innovation by modularizing and building APIs between modules packaged in containers.
  2. Kubernetes APIs would become the lingua franca for describing app and infrastructure deployments and Kubernetes would be the best platform for those APIs.
  3. The most important requirement for accelerating is to preserve control without hindering modularity, and that’s best accomplished as close to the app as possible.

We built Aspen Mesh to address item 3. If you boil down reams of pitch decks, board-of-directors updates, marketing and design docs dating back to summer of 2017, that’s it. That’s what we believe, and I still think we’re right.

Aspen Mesh is a service mesh company, and the lowest levels of our product are the open-source service mesh Istio. Istio has plenty of fans and detractors; there are plenty of legitimate gripes and more than a fair share of uncertainty and doubt (as is the case with most emerging technologies). With that in mind, I want to share why we selected Istio and Envoy for Aspen Mesh, and why we believe more strongly than ever that they’re the best foundation to build on.

 

Why a service mesh at all?

A service mesh is about connecting microservices. The acceleration we’re talking about relies on applications that are built out of small units (predominantly containers) that can be developed and owned by a single team. Stitching these units into an overall application requires APIs between them. APIs are the contract. Service Mesh measures and assists contract compliance. 

There’s more to it than reading the 12-factor app. All these microservices have to effectively communicate to actually solve a user’s problem. Communication over HTTP APIs is well supported in every language and environment so it has never been easier to get started.  However, don’t let the simplicity delude: you are now building a distributed system. 

We don’t believe the right approach is to demand deep networking and infrastructure expertise from everyone who wants to write a line of code.  You trade away the acceleration enabled by containers for an endless stream of low-level networking challenges (as much as we love that stuff, our users do not). Instead, you should preserve control by packaging all that expertise into a technology that lives as close to the application as possible. For Kubernetes-based applications, this is a common communication enhancement layer called a service mesh.

How close can you get? Today, we see users having the most success with Istio’s sidecar container model. We forecasted that in 2017, but we believe the concept (“common enhancement near the app”) will outlive the technical details.

This common layer should observe all the communication the app is making; it should secure that communication and it should handle the burdens of discovery, routing, version translation and general interoperability. The service mesh simplifies and creates uniformity: there’s one metric for “HTTP 200 OK rate”, and it’s measured, normalized and stored the same way for every app. Your app teams don’t have to write that code over and over again, and they don’t have to become experts in retry storms or circuit breakers. Your app teams are unburdened of infrastructure concerns so they can focus on the business problem that needs solving.  This is true whether they write their apps in Ruby, Python, node.js, Go, Java or anything else.

That’s what a service mesh is: a communication enhancement layer that lives as close to your microservice as possible, providing a common approach to controlling communication over APIs.

 

Why Istio?

Just because you need a service mesh to secure and connect your microservices doesn’t mean Envoy and Istio are the only choice.  There are many options in the market when it comes to service mesh, and the market still seems to be expanding rather than contracting. Even with all the choices out there, we still think Istio and Envoy are the best choice.  Here’s why.

We launched Aspen Mesh after learning some lessons with a precursor product. We took what we learned, re-evaluated some of our assumptions and reconsidered the biggest problems development teams using containers were facing. It was clear that users didn’t have a handle on managing the traffic between microservices and saw there weren’t many using microservices in earnest yet so we realized this problem would get more urgent as microservices adoption increased. 

So, in 2017 we asked what would characterize the technology that solved that problem?

We compared our own nascent work with other purpose-built meshes like Linkerd (in the 1.0 Scala-based implementation days) and Istio, and non-mesh proxies like NGINX and HAProxy. This was long before service mesh options like Consul, Maesh, Kuma and OSM existed. Here’s what we thought was important:

  • Kubernetes First: Kubernetes is the best place to position a service mesh close to your microservice. The architecture should support VMs, but it should serve Kubernetes first.
  • Sidecar “bookend” Proxy First: To truly offload responsibility to the mesh, you need a datapath element as close as possible to the client and server.
  • Kubernetes-style APIs are Key: Configuration APIs are a key cost for users.  Human engineering time is expensive. Organizations are judicious about what APIs they ask their teams to learn. We believe Kubernetes API design and mechanics got it right. If your mesh is deployed in Kubernetes, your API needs to look and feel like Kubernetes.
  • Open Source Fundamentals: Customers will want to know that they are putting sustainable and durable technology at the core of their architecture. They don’t want a technical dead-end. A vibrant open source community ensures this via public roadmaps, collaboration, public security audits and source code transparency.
  • Latency and Efficiency: These are performance keys that are more important than total throughput for modern applications.

As I look back at our documented thoughts, I see other concerns, too (p99 latency in languages with dynamic memory management, layer 7 programmability). But the above were the key items that we were willing to bet on. So it became clear that we had to palace our bet on Istio and Envoy. 

Today, most of that list seems obvious. But in 2017, Kubernetes hadn’t quite won. We were still supporting customers on Mesos and Docker Datacenter. The need for service mesh as a technology pattern was becoming more obvious, but back then Istio was novel – not mainstream. 

I’m feeling very good about our bets on Istio and Envoy. There have been growing pains to be sure. When I survey the state of these projects now, I see mature, but not stagnant, open source communities.  There’s a plethora of service mesh choices, so the pattern is established.  Moreover the continued prevalence of Istio, even with so many other choices, convinces me that we got that part right.

 

But what about…?

While Istio and Envoy are a great fit for all those bullets, there are certainly additional considerations. As with most concerns in a nascent market, some are legitimate and some are merely noise. I’d like to address some of the most common that I hear from conversations with users.

“I hear the control plane is too complex” – We hear this one often. It’s largely a remnant of past versions of Istio that have been re-architected to provide something much simpler, but there’s always more to do. We’re always trying to simplify. The two major public steps that Istio has taken to remedy this include removing standalone Mixer, and co-locating several control plane functions into a single container named istiod.

However, there’s some stuff going on behind the curtains that doesn’t get enough attention. Kubernetes makes it easy to deploy multiple containers. Personally, I suspect the root of this complaint wasn’t so much “there are four running containers when I install” but “Every time I upgrade or configure this thing, I have to know way too many details.”  And that is fixed by attention to quality and user-focus. Istio has made enormous strides in this area. 

“Too many CRDs” – We’ve never had an actual user of ours take issue with a CRD count (the set of API objects it’s possible to define). However, it’s great to minimize the number of API objects you may have to touch to get your application running. Stealing a paraphrasing of Einstein, we want to make it as simple as possible, but no simpler. The reality: Istio drastically reduced the CRD count with new telemetry integration models (from “dozens” down to 23, with only a handful involved in routine app policies). And Aspen Mesh offers a take on making it even simpler with features like SecureIngress that map CRDs to personas – each persona only needs to touch 1 custom resource to expose an app via the service mesh.

“Envoy is a resource hog” – Performance measurement is a delicate art. The first thing to check is that wherever you’re getting your info from has properly configured the system-under-measurement.  Istio provides careful advice and their own measurements here.  Expect latency additions in the single-digit-millisecond range, knowing that you can opt parts of your application out that can’t tolerate even that. Also remember that Envoy is doing work, so some CPU and memory consumption should be considered a shift or offload rather than an addition. Most recent versions of Istio do not have significantly more overhead than other service meshes, but Istio does provide twice as many feature, while also being available in or integrating with many more tools and products in the market. 

“Istio is only for really complicated apps” – Sure. Don’t use Istio if you are only concerned with a single cluster and want to offload one thing to the service mesh. People move to Kubernetes specifically because they want to run several different things. If you’ve got a Money-Making-Monolith, it makes sense to leave it right where it is in a lot of cases. There are also situations where ingress or an API gateway is all you need. But if you’ve got multiple apps, multiple clusters or multiple app teams then Kubernetes is a great fit, and so is a service mesh, especially as you start to run things at greater scale.

In scenarios where you need a service mesh, it makes sense to use the service mesh that gives you a full suite of features. A nice thing about Istio is you can consume it piecemeal – it does not have to be implemented all at once. So you only need mTLS and tracing now? Perfect. You can add mTLS and tracing now and have the option to add metrics, canary, traffic shifting, ingress, RBAC, etc. when you need it.

We’re excited to be on the Istio journey and look forward to continuing to work with the open source community and project to continue advancing service mesh adoption and use cases. If you have any particular question I didn’t cover, feel free to reach out to me at @notthatjenkins. And I’m always happy to chat about the best way to get started on or continue with service mesh implementation.