Doubling Down On Istio

Good startups believe deeply that something is true about the future, and organize around it.

When we founded Aspen Mesh as a startup inside of F5, my co-founders and I believed these things about the future:

  1. App developers would accelerate their pace of innovation by modularizing and building APIs between modules packaged in containers.
  2. Kubernetes APIs would become the lingua franca for describing app and infrastructure deployments and Kubernetes would be the best platform for those APIs.
  3. The most important requirement for accelerating is to preserve control without hindering modularity, and that’s best accomplished as close to the app as possible.

We built Aspen Mesh to address item 3. If you boil down reams of pitch decks, board-of-directors updates, marketing and design docs dating back to summer of 2017, that's it. That's what we believe, and I still think we're right.

Aspen Mesh is a service mesh company, and the lowest levels of our product are the open-source service mesh Istio. Istio has plenty of fans and detractors; there are plenty of legitimate gripes and more than a fair share of uncertainty and doubt (as is the case with most emerging technologies). With that in mind, I want to share why we selected Istio and Envoy for Aspen Mesh, and why we believe more strongly than ever that they're the best foundation to build on.

 

Why a service mesh at all?

A service mesh is about connecting microservices. The acceleration we're talking about relies on applications that are built out of small units (predominantly containers) that can be developed and owned by a single team. Stitching these units into an overall application requires APIs between them. APIs are the contract. Service Mesh measures and assists contract compliance. 

There's more to it than reading the 12-factor app. All these microservices have to effectively communicate to actually solve a user's problem. Communication over HTTP APIs is well supported in every language and environment so it has never been easier to get started.  However, don't let the simplicity delude: you are now building a distributed system. 

We don't believe the right approach is to demand deep networking and infrastructure expertise from everyone who wants to write a line of code.  You trade away the acceleration enabled by containers for an endless stream of low-level networking challenges (as much as we love that stuff, our users do not). Instead, you should preserve control by packaging all that expertise into a technology that lives as close to the application as possible. For Kubernetes-based applications, this is a common communication enhancement layer called a service mesh.

How close can you get? Today, we see users having the most success with Istio's sidecar container model. We forecasted that in 2017, but we believe the concept ("common enhancement near the app") will outlive the technical details.

This common layer should observe all the communication the app is making; it should secure that communication and it should handle the burdens of discovery, routing, version translation and general interoperability. The service mesh simplifies and creates uniformity: there's one metric for "HTTP 200 OK rate", and it's measured, normalized and stored the same way for every app. Your app teams don't have to write that code over and over again, and they don't have to become experts in retry storms or circuit breakers. Your app teams are unburdened of infrastructure concerns so they can focus on the business problem that needs solving.  This is true whether they write their apps in Ruby, Python, node.js, Go, Java or anything else.

That's what a service mesh is: a communication enhancement layer that lives as close to your microservice as possible, providing a common approach to controlling communication over APIs.

 

Why Istio?

Just because you need a service mesh to secure and connect your microservices doesn't mean Envoy and Istio are the only choice.  There are many options in the market when it comes to service mesh, and the market still seems to be expanding rather than contracting. Even with all the choices out there, we still think Istio and Envoy are the best choice.  Here's why.

We launched Aspen Mesh after learning some lessons with a precursor product. We took what we learned, re-evaluated some of our assumptions and reconsidered the biggest problems development teams using containers were facing. It was clear that users didn't have a handle on managing the traffic between microservices and saw there weren't many using microservices in earnest yet so we realized this problem would get more urgent as microservices adoption increased. 

So, in 2017 we asked what would characterize the technology that solved that problem?

We compared our own nascent work with other purpose-built meshes like Linkerd (in the 1.0 Scala-based implementation days) and Istio, and non-mesh proxies like NGINX and HAProxy. This was long before service mesh options like Consul, Maesh, Kuma and OSM existed. Here's what we thought was important:

  • Kubernetes First: Kubernetes is the best place to position a service mesh close to your microservice. The architecture should support VMs, but it should serve Kubernetes first.
  • Sidecar "bookend" Proxy First: To truly offload responsibility to the mesh, you need a datapath element as close as possible to the client and server.
  • Kubernetes-style APIs are Key: Configuration APIs are a key cost for users.  Human engineering time is expensive. Organizations are judicious about what APIs they ask their teams to learn. We believe Kubernetes API design and mechanics got it right. If your mesh is deployed in Kubernetes, your API needs to look and feel like Kubernetes.
  • Open Source Fundamentals: Customers will want to know that they are putting sustainable and durable technology at the core of their architecture. They don't want a technical dead-end. A vibrant open source community ensures this via public roadmaps, collaboration, public security audits and source code transparency.
  • Latency and Efficiency: These are performance keys that are more important than total throughput for modern applications.

As I look back at our documented thoughts, I see other concerns, too (p99 latency in languages with dynamic memory management, layer 7 programmability). But the above were the key items that we were willing to bet on. So it became clear that we had to palace our bet on Istio and Envoy. 

Today, most of that list seems obvious. But in 2017, Kubernetes hadn’t quite won. We were still supporting customers on Mesos and Docker Datacenter. The need for service mesh as a technology pattern was becoming more obvious, but back then Istio was novel - not mainstream. 

I'm feeling very good about our bets on Istio and Envoy. There have been growing pains to be sure. When I survey the state of these projects now, I see mature, but not stagnant, open source communities.  There's a plethora of service mesh choices, so the pattern is established.  Moreover the continued prevalence of Istio, even with so many other choices, convinces me that we got that part right.

 

But what about...?

While Istio and Envoy are a great fit for all those bullets, there are certainly additional considerations. As with most concerns in a nascent market, some are legitimate and some are merely noise. I'd like to address some of the most common that I hear from conversations with users.

"I hear the control plane is too complex" - We hear this one often. It’s largely a remnant of past versions of Istio that have been re-architected to provide something much simpler, but there's always more to do. We're always trying to simplify. The two major public steps that Istio has taken to remedy this include removing standalone Mixer, and co-locating several control plane functions into a single container named istiod.

However, there's some stuff going on behind the curtains that doesn't get enough attention. Kubernetes makes it easy to deploy multiple containers. Personally, I suspect the root of this complaint wasn't so much "there are four running containers when I install" but "Every time I upgrade or configure this thing, I have to know way too many details."  And that is fixed by attention to quality and user-focus. Istio has made enormous strides in this area. 

"Too many CRDs" - We've never had an actual user of ours take issue with a CRD count (the set of API objects it's possible to define). However, it's great to minimize the number of API objects you may have to touch to get your application running. Stealing a paraphrasing of Einstein, we want to make it as simple as possible, but no simpler. The reality: Istio drastically reduced the CRD count with new telemetry integration models (from "dozens" down to 23, with only a handful involved in routine app policies). And Aspen Mesh offers a take on making it even simpler with features like SecureIngress that map CRDs to personas - each persona only needs to touch 1 custom resource to expose an app via the service mesh.

"Envoy is a resource hog" - Performance measurement is a delicate art. The first thing to check is that wherever you're getting your info from has properly configured the system-under-measurement.  Istio provides careful advice and their own measurements here.  Expect latency additions in the single-digit-millisecond range, knowing that you can opt parts of your application out that can't tolerate even that. Also remember that Envoy is doing work, so some CPU and memory consumption should be considered a shift or offload rather than an addition. Most recent versions of Istio do not have significantly more overhead than other service meshes, but Istio does provide twice as many feature, while also being available in or integrating with many more tools and products in the market. 

"Istio is only for really complicated apps” - Sure. Don’t use Istio if you are only concerned with a single cluster and want to offload one thing to the service mesh. People move to Kubernetes specifically because they want to run several different things. If you've got a Money-Making-Monolith, it makes sense to leave it right where it is in a lot of cases. There are also situations where ingress or an API gateway is all you need. But if you've got multiple apps, multiple clusters or multiple app teams then Kubernetes is a great fit, and so is a service mesh, especially as you start to run things at greater scale.

In scenarios where you need a service mesh, it makes sense to use the service mesh that gives you a full suite of features. A nice thing about Istio is you can consume it piecemeal - it does not have to be implemented all at once. So you only need mTLS and tracing now? Perfect. You can add mTLS and tracing now and have the option to add metrics, canary, traffic shifting, ingress, RBAC, etc. when you need it.

We’re excited to be on the Istio journey and look forward to continuing to work with the open source community and project to continue advancing service mesh adoption and use cases. If you have any particular question I didn’t cover, feel free to reach out to me at @notthatjenkins. And I'm always happy to chat about the best way to get started on or continue with service mesh implementation. 


Steering The Future Of Istio

I’m honored to have been chosen by the Istio community to serve on the Istio Steering Committee along with Christian Posta, Zack Butcher and Zhonghu Xu. I have been fortunate to contribute to the Istio project for nearly three years and am excited by the huge strides the project has made in solving key challenges that organizations face as they shift to cloud-native architecture. 

Maybe what’s most exciting is the future direction of the project. The core Istio community realizes and advocates that innovation in Open Source doesn't stop with technology - it’s just the starting point. New and innovative ways of growing the community include making contributions easier, Working Group meetings more accessible and community meetings an open platform for end users to give their feedback. As a member of the steering committee, one of my main goals will be to make it easier for a diverse group of people to more easily contribute to the project.

Sharing my personal journey with Istio, when I started contributing to Istio, I found it intimidating to present rough ideas or proposals in an open Networking WG meeting filled with experts and leaders from Google & IBM (even though they were very welcoming). I understand how difficult it can be to get started on contributing to a new community, so I want to ensure the Working Group and community meetings are a place for end users and new contributors to share ideas openly, and also to learn from industry experts. I will focus on increasing participation from diverse groups, through working to make Istio the most welcoming community possible. In this vein, it will be important for the Steering Committee to further define and enforce a code of conduct creating a safe place for all contributors.

The Istio community’s effort towards increasing open governance by ensuring no single organization has control over the future of the project has certainly been a step in the right direction with the new makeup of the steering committee. I look forward to continuing work in this area to make Istio the most open project it can be. 

Outside of code contributions, marketing and brand identity are critically important aspects of any open source project. It will be important to encourage contributions from marketing and business leaders to ensure we recognize non-technical contributions. Addressing this is less straightforward than encouraging and crediting code commits, but a diverse vendor neutral marketing team in Open Source can create powerful ways to reach users and drive adoption, which is critical to the success of any open source project. Recent user empathy sessions and user survey forms are a great starting point, but our ability to put these learning into actions and adapt as a community will be a key driver in growing project participation.

Last, but definitely not least, I’m keen to leverage my experience and feedback from years of work with Aspen Mesh customers and broad enterprise experience to make Istio a more robust and production-ready project. 

In this vein, my fellow Aspen Mesher Jacob Delgado has worked tirelessly for many months contributing to Istio. As a result of his contributions, he has been named a co-lead for the Istio Product Security Working Group. Jacob has been instrumental in championing security best practices for the project and has also helped responsibly remediate several CVEs this year. I’m excited to see more contributors like Jacob make significant improvements to the project.

I'm humbled by the support of the community members who voted in the steering elections and chose such a talented team to shepherd Istio forward. I look forward to working with all the existing, and hopefully many new, members of the Istio community! You can always reach out to me through email, Twitter or Istio Slack for any community, technical or governance matter, or if you just want to chat about a great idea you have.


3 Ways Service Mesh Helps DevOps

3 Ways Service Mesh Helps DevOps Teams

How exactly can a service mesh help DevOps teams? Our three co-founders were each recently interviewed by The New Stack Maker’s podcast, where they each addressed how service mesh can help DevOps teams. Below are some key takeaways that we hope will be useful for your team. 

1. Istio Boosts Engineering Efficiency 

In the first of this three-part podcast series, Neeraj Poddar, Aspen Mesh’s Chief Architect and Dan Berg, IBM Cloud’s Distinguished Engineer, discussed how the core capabilities of Istio can make engineering teams more efficient.

Microservices have provided new benefits for organizations, including better security and increased uptime, but on the flip side, there is also added infrastructure complexity. Service meshes like Istio have emerged as a way to provide better management of this complexity at scale. The core capabilities of the Istio service mesh—connection, security, control, and observability—help make engineering teams more efficient in many ways, and especially when it comes to running multicluster applications. “It’s a natural evolution to fit where we are today with cloud native applications based on containers,” Berg said. 

Providers like Aspen Mesh also play a role in helping DevOps teams take advantage of Istio’s traffic management, security, and general networking capabilities, said Berg. “Generally speaking, there are traffic management capabilities and things like that a developer would use, because you’re defining your routes and characteristics specific to your application, as well as the rollout of your deployment,” Berg said.

The future of Istio in terms of how it builds upon running multicluster applications on Kubernetes should include evolving to “talk to the language of applications,” Poddar said. “That’s where the real value will kick in and service mesh will still be a key player there, but it will be a part of an ecosystem where other pieces are also important and all of them are giving that information and we are correlating it,” Poddar said. “We’re still very early, as people are just getting used to understanding service meshes. So telling them that we need to coordinate all of this information in an automated way is scary — but we will get there.”

Listen to the podcast here.

 

2. The Importance of Knowing When You Need a Service Mesh 

In the second part of the series, Aspen Mesh’s CTO, Andrew Jenkins, and Tetrate’s Founding Engineer, Zack Butcher, talked about how service mesh is the gateway to cloud migration, and when you do—or don’t—need a service mesh.

A service mesh helps organizations migrate to cloud native environments by bridging the management gap between on-premises data center deployments to containerized-cloud environments. Once implemented, a service mesh relieves the complexity of this process. And for many DevOps team members, the switch to a cloud native environment and Kubernetes cannot be done without a service mesh.

In a typical environment split between on-premises servers and multicloud deployments, a service mesh provides the “common substrate,” by enabling “communication of those components that need to communicate across these different environments,” Butcher said.

There are also some cases where a service mesh may not be needed for DevOps. “I don’t think it’s honest to say, ‘hey, everybody absolutely must use this new thing,’” Jenkins said. “There are actually problems where you don’t need Kubernetes and you may not need containers at all or if you look at serverless, for example.”

As organizations consider which technologies to adopt in order to meet their software development and deployment goals, there are many tools and solutions to choose from. Ultimately, organizations are turning to service meshes as an answer for “not just a deployment problem,” but as a way to “integrate all the pieces together” during a cloud native journey, explained Jenkins.

Listen to the podcast here.

 

3. Service Mesh Amplifies Business Value 

Shawn Wormke, Aspen Mesh VP and General Manager, and Tracy Miranda, CloudBees Director of Open Source Community, met with the TNS team to discuss how exactly a service mesh can amplify business value for organizations in the third and final installment of this three-part podcast series.

Service meshes are increasingly providing DevOps teams with new ways to gain observability into the events that cause application deployment and management problems. Ideally, a service mesh should also help DevOps teams determine who should take the appropriate actions.

“What we’ve seen with our customers is they want to move [the maintenance work] down underneath the application, and let the application owners really focus on business-value code,” said Wormke. “They also want to let the operations team that is the ‘Ops’ part of DevOps really work on providing them the tooling and the common infrastructure it takes to run those things in production in a large enterprise environment.”

Service meshes offer powerful capabilities that teams can exploit, once they get past the learning curve. “We just need an easy way to get [service meshes] into folks’ hands and help them steer clear of the pitfalls so that they can get to all the real magic that you can start to do once you’ve got this orchestration and all these things connected,” said Miranda. “You can start to do pretty clever things.”

Listen to the podcast here.