Get a Complimentary Health Check of Your Open Source Istio

Get a Health Check Report of your Istio to see if everything's configured and optimized.

How do you know your Open Source Istio is operating at its full potential? At Aspen Mesh, we focus on optimizing Istio-based service mesh for our customers (service mesh is all we do).

We talk to companies every day about their OS Istio, and the most common question we get is, “How do we know we’ve got everything in our Istio implementation working correctly?” Whether you’re in a pre-production environment, have Istio deployed in a portion of your network, or network-wide, there's often a fear that something’s not configured correctly or there's a potential problem lurking that you don’t have the insight to head off. Just as importantly, we're asked if there is enhanced Istio functionality to leverage that can drive better performance.

At Aspen Mesh the first thing we do for a new customer is a 360-degree health check of their Istio implementation. It’s a lot like a 100-point diagnostic inspection for your car – a way to identify what’s working fine, where there are potential problems, and get recommendations from an expert about what’s critical to address immediately.

That got us thinking, we should give everyone this level of insight into their Istio implementation.

Aspen Mesh Now Offers a Complimentary OS Istio Health Check Report. This evaluation provides insight across all key areas, identifies critical issues, directs you to best practices, and recommends next steps. You receive an assessment of your Istio by our Istio experts. This is the same evaluation we conduct for every new Aspen Mesh customer running Istio.

A few things that are covered in the Report:

  • Platform: Ensure a stable foundation for smooth version upgrades.
  • Security: Identify security risks and apply best practices.
  • Ingress/Egress: Know you’re following best practices.
  • Application Policy: Inspection of your application policies.
  • Performance: Recommendations about where to optimize.
  • Go-Live: Steps to take to go live with confidence.

You Receive Your Report After It Is Complete

Our Istio expert will review the report with you and recommend remediation steps for critical items discovered – and answer any questions you have. There's no obligation, and the Report typically takes about 2 business days. After the review, we provide you with a copy of your report. If you want to learn how we work to tackle any Istio problem you have and optimize an Istio environment, we can also share how to take advantage of Aspen Mesh's array of customized Services and Aspen Mesh 24/7 white glove Expert Support for OS Istio.

Where we get the data about your Istio to build your Report
The Aspen Mesh Istio Inspection Report analyzes your Istio system for common misconfigurations and vulnerabilities.

The Report is done in 3 easy steps:

  1. You run the Aspen Mesh Data Collector tool on a workstation with your Kubernetes context configured. This generates a compressed file with the data collected from your Istio installation.
  2. You upload the compressed data file to the Aspen Mesh site.
  3. Aspen Mesh engineers analyze the data collected and build your custom report that details all of our findings.

The Aspen Mesh Data Collector collects the following data:

  • Kubernetes, Istio, and Envoy versions
  • Node topology (number of nodes, node size)
  • Objects installed in your cluster (Kubernetes and Istio objects)
  • Kubernetes events

Note that the Aspen Mesh Data Collector does not collect any potentially sensitive data such as secrets, certificates, or logs. All data that is collected is securely stored and accessed only by Aspen Mesh. Get in touch if you have questions about the process --  I can send you a link to our Data Collector tool and share how we gather and analyze your data to provide a comprehensive assessment. Just send me a note and I'm happy to connect.

-Steven Cheng, Sr. Solutions Engineer at Aspen Mesh



Aspen Mesh Leads the Way for a Secure Open Source Istio

Here at Aspen Mesh, we entrenched ourselves in the Istio project not long after its start. Recognizing Istio's potential early on, we committed to building our entire company with Istio at its core. From the early days of the project, Aspen Mesh took an active role in Istio -- we've been part of the community since Fall of 2017. Among our many firsts, Aspen Mesh was the first non-founding company to have someone on the Technical Oversight Committee (TOC) and the first to hold a release manager role, helping manage the release of Istio v1.6 in 2020.

Ensuring open source Istio continues to set the standard as the foundation for a secure, enterprise-class service mesh is important to us. I hold a seat on the Istio Product Security Working Group (PSWG), where we continuously monitor and address potential Common Vulnerabilities and Exposures (CVE) reports for Istio and its dependencies, like the Envoy project. In fact, we helped create the PSWG in collaboration with other community leaders to ensure Istio remains a secure project with well-defined practices around responsible early disclosures and incident management.

Along with me, my colleague Jacob Delgado has been a tremendous contributor to Istio's security, and he currently leads the Product Security Working Group.

Aspen Mesh leads contribution to Open Source Istio

The efforts of Aspen 'Meshers' can be seen across Istio's architecture today, and we add features to open source Istio regularly. Some of the major features we've added include Elliptic Curve Cryptography (ECC) support, configuration validation (istio-vet -> Istio analyzers), custom tracing tags, and Helm v3 support. We are a Top 5 Istio Contributor of Pull Requests (PRs). One of our primary areas of focus is helping to shape and harden Istio's security. We have responsibly reported several critical CVEs and addressed them as part of the PSWG, like the Authentication Policy Bypass CVE. You can read more about how security releases and 0-day critical CVE patches are handled in Istio in this blog authored by my colleague Jacob.

Istio Security Assessment Report findings announced in 2021

The success of the Istio project and its critical use enforcing key security policies in infrastructure across a wide swath of industries was the impetus for a comprehensive security assessment that began in 2020. In order to determine whether there were any security issues in the Istio code base, a third-party security assessment of the Istio project was conducted last year that enlisted the NCC Group and sought collaboration with subject matter experts across the community.

This in-depth assessment focused on Istio’s architecture as a whole, looking at security related issues with a focus on key components like istiod (Pilot), Ingress/Egress gateways, and Istio’s overall Envoy usage as its data plane proxy for Istio version 1.6.5. Since the report, the Product Security Working Group has issued several security releases as new vulnerabilities were disclosed, along with fixes to address concerns raised in the report. A good outcome of the report is the detailed Security Best Practices Guide developed for Istio users.

We invite you to read a summary of the Istio Security Assessment Report compiled for the Istio community. I detail the key areas of the report and distill what it means for Istio users today and looking ahead. It's worth a read whether you're a current open source Istio user, like keeping up on all things security, or just want a deep dive into Istio security.

At Aspen Mesh, we build upon the security features Istio provides and address enterprise security requirements with a zero-trust based service mesh that provides security within the Kubernetes cluster, provides monitoring and alerts, and ensures highly-regulated industries maintain compliance. You can read about how we think about security in our white paper, Adopting a Zero-Trust Approach to Security for Containerized Applications.

If you'd like to talk to us about what enterprise security in a service mesh looks like, please get in touch!

-Aspen Mesh

 


How Istio is Built to Boost Engineering Efficiency

The New Stack Makers Podcast

One of the bright points to emerge in Kubernetes management is how the core capabilities of the Istio service mesh can help make engineering teams more efficient in running multicluster applications. In this edition of The New Stack Makers podcast, The New Stack spoke with Dan Berg, distinguished engineer, IBM Cloud Kubernetes Services and Istio, and Neeraj Poddar, co-founder and chief architect, Aspen Mesh, F5 Networks. They discussed Istio’s wide reach for Kubernetes management and what we can look out for in the future. Alex Williams, founder and publisher of The New Stack, hosted this episode.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.

Alex Williams: Hey, it’s another episode of The New Stack Makers, and today the topic is Istio and engineering management. Today, I am joined for a conversation about Istio with Neeraj Poddar, co-founder and chief architect at Aspen Mesh. Hello, Neeraj, how are you?

Neeraj Poddar: I’m doing good. It’s great to be here Alex.

Alex Williams: Thank you for joining us – you’re live from Boulder. And live from Raleigh, North Carolina, is Dan Berg, Distinguished Engineer at IBM Cloud Kubernetes Service and Istio. That’s a mouthful.

Dan Berg: Yes, sir. I was I was worried there for a moment you weren’t going to be able to get Kubernetes out.

Alex Williams: You know, it’s been that way lately. Actually we’re just finishing our second edition of the eBook that we wrote first in 2017 about Kubernetes, service mesh was just beginning to be discussed there, and I was reading some articles and some of the articles were saying things like, well Istio is still in its early days and now today you’re telling me that you have more meetings than you can go to related to Istio. I don’t know what that means. What does that mean? What does that mean to you both? What does that say about Istio and what is Istio? So for those who may not be familiar with it.

Neeraj Poddar: You’re right. I mean, we have so many meetings and discussions, both asynchronously and synchronously, that it’s great to see the community grow. And like you’re saying, from three years before to where we are now, it’s amazing, not just the interest from developers, it’s also the interest from end users, the feedback and then making the product and the whole community better. So coming to what Istio is, Istio is an open source service mesh platform for simplifying microservices communication. And in simple terms, it handles a lot of complicated pieces around microservices communicating with each other, things like enforcing policies, managing certificates, surfacing relevant telemetry so that you can understand what’s happening in your cluster. And those problems become more and more complicated as you add more microservices. So service mesh and Istio in a way is just taking that burden away from the developers and moving it into the infrastructure there. It’s basically decoupling the two things and enabling them to be successful at the same time.

Alex Williams: Now, Dan, you’ve been around a bit and you have your own experiences with APIs and how they evolved, and is this why we’re seeing this amazing interest in Istio? Because it takes the API to that next evolution? Is it the network effect on APIs that we’re seeing or is it something different that’s relevant to so many people?

Dan Berg: Well, I think it’s I think it’s a combination of a few things. And first off, thanks for calling me old for saying I’ve been around for a while.

Dan Berg: So I think it’s a combination of several different things. First and foremost, near and dear to my heart, obviously, is containers and the evolution of containers, especially as containers have been brought to the cloud and really driving more cloud native solutions, which drives distributed solutions in these clouds, which is driving more use of microservices. Microservices aren’t new. It’s just they’re being applied in a new way in the cloud environments. Because of that, there’s a lot of complexity around that and the distribution and delivery of those containers is a bit different than what we’ve seen in traditional VMs in the past, which means how you manage microservices is the difference. I mean, you need the mechanism. You need a way to drive your DevOps processes that are GitOps-based, API, CLI driven. So what that naturally means is we need a better way of managing microservices and the microservices in your cloud. The evolution of Istio as a service mesh, which I often think of as the ability to program through an API, your network and your network policies. It’s a natural evolution to fit where we are today with cloud native applications based on containers. This is the modern way to manage your microservices.

Neeraj Poddar: The way Dan explained it – it’s a natural progression. I especially want to mention that in the context of network policies, even when companies migrate from monoliths to microservices, when you are doing that migration, the same organisational policies apply and no one wants to give that up, and you don’t want to embed that into your applications. So this is the key missing piece which makes you migrate or even scale. So it gives you both things, wherever you are in your journey.

Alex Williams: So the migration and the scale. And a lot of it almost comes down to the user experience, doesn’t it? I mean, Istio is very well suited to writing generic reusable software, isn’t it? And to manage these interservice communications, which relates directly to the network, doesn’t it?

Dan Berg: Yeah, in many ways it does. A big, big part of this, though, is that it removes a lot of the burden and the lock-in from your application code. So you’re not changing your application to adopt and code to a certain microservices architecture or microservices programming model – that is abstracted away with the use of these sidecars, which is a pivotal control point within the application. But from a developer standpoint, what’s really nice about this is now you can declare your intent. A security officer can declare their intent – you know, Neeraj was talking about with policies, you can drive these declarations through Istio without having to go through and completely modify your code in order to get this level of control.

Alex Williams: Neeraj, so what’s the Aspen Mesh view on that? And I know you talk a lot about engineering management. This relates directly to engineering management in many ways, doesn’t it? And in terms of being able to take care of those so you can have the reusable software.

Neeraj Poddar: Absolutely. I mean, when I think of engineering management, I also think of engineering efficiency. And they both relate in a very interesting way where we want to make sure they are always achieving business outcomes. So there are two or three business outcomes here that we want our engineering teams to achieve. We want to acquire more customers by solving more customer use cases, which means adding more features quickly. And that’s what Dan was saying. You can move some of those infrastructure pieces out of your application into the mesh so you can focus and create reusable software. But that’s basically software that’s unique IP to your company. You don’t have to write the code, which already has been written for everyone. The second outcome is once you have got a customer, once you have been successful acquiring them, you want to retain them. And that customer satisfaction comes from being able to run your software reliably and fix issues when you find them. And that’s where, again, service mesh and Aspen Mesh would excel, because we surface metrics, consistent telemetry and tracing. At the same time, you’re able to tie it back to an application view where you can easily pinpoint where the problem is. So you are getting benefits at a networking level, but you’re able to get an understanding of an application that is very crucial to your architecture.

Alex Williams: Dan what is the importance of the efficiencies at the networking level, the networking management level. What has been the historical challenge that Istio helps resolve? And how does the sidecar play into that? Because I’m always trying to figure out the sidecar. And I think for a lot of people, it’s a little bit confusing to try to understand. And Lynn, your colleague at IBM, describes it pretty well as almost like taking all the furniture out of the room and then placing it back in the room piece by piece, I don’t know if that’s the correct way to describe it.

Dan Berg: Possibly. That’s one analogy. So a couple of different things. First off, networking is hard. Fundamentally, it is hard. It almost feels like if you’re developing for cloud, you need to have a PhD to do it properly. And on some levels, that’s true. Where things are difficult, I mean, simple networking is fine, getting from point A to point B, not a problem. Even some things in Kubernetes with routing from service A to service B. That’s pretty easy, right? There’s kube-dns. You can do the lookup and kube-proxy will do your routing for you. However, it’s not very intelligent, not intelligent at all. There’s little to no security built into that. Then of course the routing and load balancing is very generic. It’s just round robin and that is it. There is nothing fancy about it. So what happens when you need specific routing based on, let’s say, zone awareness or you need it based on the client and source that’s coming in? What happens if you need a proper circuit breaker because the connection to your destination wasn’t available? So now where are you going to code that? How are you going to build in your retry logic and your timeout logic? Do you put that in your application? Possibly. But wouldn’t it be nice if you didn’t have to? So there’s a lot of complications with the network. And I haven’t even gotten into security yet. Right. Your authentication and your authorization? Typically, that’s done in the application. All you need is one bad actor in that entire chain and the whole thing falls apart. So Istio and basically service meshes, modern service meshes really push that programming down into the network. And this notion of the sidecar, which is kind of popular inside, like, Kubernetes-based environments, it’s basically you put another container inside the pod. Well, what’s so special about that one container in the pod? Well, with the Istio sidecar, that sidecar is an Envoy proxy. And what it is doing is it’s capturing all inbound and outbound traffic into and out of the pod. So everything traverses through that proxy, which means policies can be enforced, security can be enforced, routing decisions can be programmed and enforced. That happens at the proxy. So when a container in the pod communicates out, it’s captured by the proxy first, and then it does some things around it, makes some decisions, and then forwards it on. The same thing on inbound requests: it’s checking, should I accept this? Am I allowed to accept this? It’s acting as that control point. And all of that is programmed by the Istio control plane. So that’s where the developer experience comes in. You program it through YAML, you’re programming the control plane, and the control plane propagates all that programming logic down into those sidecars. And that’s where the control point actually takes place. That’s the magic right there. Does that make sense? It’s kind of like a motorcycle that has a little sidecar – literally the sidecar. Put your dog in the sidecar if you want to take your dog with you everywhere you go. And every time you make a decision, you ask your dog. That’s the Envoy sidecar.

Neeraj Poddar: That’s the image that comes to my mind. And maybe that’s because when I grew up in India, that was more prevalent than it is in the U.S. right now and now somebody from America is also bringing it up. But that’s exactly right in my mind. And just to add one thing to what Dan said, day one networking problems are easy, relatively easy. Networking is never easy, but relatively easy in Kubernetes. Day two, day three – it gets complicated real fast, like early on in the service mesh and Istio days there were people saying it’s just doing DNS. Why do I need it? Now no one is saying that because those companies have matured from doing day one problems and they are realizing, oh my God, do I need to do all of this in my application? And when are those application developers going to write real value-add code then?

Alex Williams: All right. So let’s move into day two and day three, Neeraj. So who are the teams and who are managing a day two and day three? Who are these people? What are their personas and what roles do they play?

Neeraj Poddar: That’s a really interesting question. I mean, the same personas which kind of started your project or your product and were there in day one, they kind of move along in day two but some of the responsibilities maybe change and some of the new personas come on board. So an operator role or a security persona is really important for day two. You want to harden your cluster environment. You don’t want unencrypted data flowing through. For maintainability, as an operator whether it’s a platform operator or a DevOps SRE persona, they need to have consistent metrics across the system, otherwise they don’t know what’s going on. Similarly for day two, I would say the developer, who is creating the software and creating the new application – they need to be brought in when failures happen, but they need to be consulted at the right time with the right context. So I always think of, in microservices that if you don’t have the right context, you’re basically going to just spend time in meetings trying to figure out where the failure is. And that’s where a consistent set of telemetry and a consistent set of tracing for day two and day three is super crucial. Moving to security. I mean, think about certificate management again. I’m going to show my age here, but if you have managed certificates in your applications in a distributed manner, you know, the pain there. You have been yelled at by a security officer some time saying this is not working and go upgrade it and then you’re stuck trying to do this in a short time span. Moving forward to Istio, now that’s a configuration change or that’s an upgrade of the Istio proxy container. Because you know what? We fix OpenSSL bugs much quicker because we are looped into the OpenSSL ecosystem. So you know, day three problems and then even further. If you look at day three, you have upgrade issues. How do you reliably upgrade without breaking traffic or without dropping your customer experience? Can you do feature activation using progressive delivery? And these are the things we’re just talking about. But I think maybe these are day three point five or day four problems, but in the future you should be able to activate features in one region, even in a county, who cares, and test it out to your customers without relying on applications. So that’s how I see. I mean, the personas are the same, but the benefits change and the responsibilities change as your organizations mature.

Dan Berg: I was just going to say, I mean, one of the things that we see quite often, especially with the adoption of Istio like the developer first and foremost, would be, as Neeraj says, the day one, setting up your basic networking and routing is pretty easy. But then as your system and application grows, just understanding where those network flows go, it’s amazing how quickly it gets out of control that you really don’t know where. Once traffic gets into, let’s say, your Kubernetes cluster, once it comes into the cluster, where does it go? Where does it traverse? Did you even have your timeouts set up properly? How do you even test that? Right. So going through not even just the operational aspects, but just the testing aspects and how to do proper testing of your distributed system is very complicated from a networking standpoint and that’s where things like Istio timeouts, retries and circuit breakers really become helpful, and fault injection. So you can actually test some of these failures. And then with Jaeger and doing the tracing, you could actually see where the traffic goes. But one of my favorites is Kiali – bringing that up and just seeing the real time network flows and seeing the latency, seeing the error codes. That is hugely beneficial because I actually get to see where the traffic went when it came into my cluster. So lots of benefit for the developer beyond just the security role. I mean, the developer role is very critical here.

Neeraj Poddar: Absolutely, yeah. I mean, I’ll put a plug in for even operators here, which is once you get used to programming via YAML or being able to change the data path through the extensions that we are making in the community through WASM, you get to control a critical piece of infrastructure where you have zero day things happening. You can actually change that by adding your own filter. Like we have seen that being so powerful in existing paradigms with BIG-IP or NGINX where you have a whole ecosystem right now for people writing crazy scripts for doing things, which is saving them lots of money, because you know what? You don’t always get time to change your application, but you can change the proxy which is next to it. So you’re going to see a lot of interesting things happening there for, you know, day three, day four use cases.

Alex Williams: But who’s writing the scripts? Who’s writing the YAML? Who’s doing that configuring? Because a lot of these people, you know, developers, are not used to doing configurations. So who does that work?

Neeraj Poddar: That’s a really good question, and the reason I’m hesitant is the answer is, it depends. Yeah. If you have a very mature developer workflow, I would expect developers to give you information about the applications and then the platform team takes over and converts it into the Istio specific, Kubernetes specific language. But most of the organizations might not be there yet, and that means you will need some collaborative effort between application developers and operators. So, for example, I’ll give you what Aspen Mesh is trying – we are trying to make sure even if you have the right YAMLs and both the personas are writing it, those APIs are specific to those personas. So we have created application YAMLs, which an application developer can write. It has no information or no prior knowledge about Istio. The operators can write something specific about their requirements about networking and security again in a platform agnostic way, and then Aspen Mesh can lower it down to Istio specific configuration. So it depends on what kind of toolchain you are using. I would hope that in the future, application developers are writing less and less configuration, just platform specific.

Dan Berg: And I think that basically echoes the fact that we do see multiple roles using the Istio YAML files and the configurations, but you don’t have to be an expert in all of it. Generally speaking, there are traffic management capabilities and things like that that a developer would use, because that’s where you’re defining your routes. You’re defining your characteristics specific to your application as well as the rollout of your deployment if you’re trying to do a canary release, for example. That’s something that the developer would do or an application author would be responsible for. But when you’re talking about setting up policies for inbound or outbound access controls into the cluster, that may be a security advisor that’s responsible for defining those levels of policies and not necessarily the developer. You wouldn’t want the developer defining that level of security policies. It would be a security officer that would be doing that. So there’s room for multiple different roles. And therefore, you don’t have to be an expert in every aspect of Istio because it’s based on your role, which aspect you’re going to care about.

Alex Williams: When we get into the complexities, I think of telemetry and telemetry has traditionally been a concept I’ve heard Intel talk about. Right with infrastructure and systems. And now telemetry is being discussed as a way to be used in the software. How is telemetry managed in Istio? What is behind it? What is the architecture behind that telemetry that makes it manageable, that allows you to really be able to leverage it?

Dan Berg: For the most part. So it all really starts with the Istio control plane, which is gathering the actual metrics and provides the Prometheus endpoint that is made available that you can connect up to and scrape that information and use it. How it gets the telemetry information, that’s really the key part of it: where does that information come from? Yeah. And if we take a step back and remember, I was talking about the sidecar, the sidecar being the point that makes those decisions, the routing decisions, the security decisions.

Alex Williams: Well, the dog, the dog, the dog,

Dan Berg: Yes, the dog that is way smarter than you and making all the proper decisions, telling you exactly where to go. So that’s exactly what’s happening here. Except since all traffic coming in and out of the pod is going through that proxy, it is asynchronously sending telemetry information, metrics about that communication flow, both inbound and outbound, so it can track failure rates, it can track latency, it can track security settings. So it can send a large amount of information about that flow, that communication flow. And once you start collecting it up into the Istio control plane, into the telemetry endpoint, and you start scraping that off and showing it in a Grafana dashboard as an example, there’s a vast amount of information. Now, once you start piecing it together, you can see going from service A to service B, which is nothing more than going from sidecar A to sidecar B, right, we have secure identities. We know exactly where the traffic is going because we have identities for everything in the system and everything that is joining the mesh is joined because it’s associated with a sidecar proxy. So it’s these little agents, these proxies, that are collecting up all that information and sending it off into the Istio control plane so you can view it and see exactly what’s going on. And by the way, that is one of the most important pieces of Istio. As soon as you turn it on, join some services, you’ve got telemetry. It’s not like you have to do anything special. Telemetry starts flowing and there’s a huge amount of value. Once you see the actual information in front of you, traffic flowing, error rates, it’s hugely powerful.

Neeraj Poddar: Just to add to what Dan said here, the amount of contextual information that the sidecars add for every metric we export, it’s super important. Like I was in front of a customer recently and like Dan said there’s a wow factor that you can just add things to the mesh. And now suddenly you have so much information related to Kubernetes which tells you about the port, the services, the role of the application labels. So that’s super beneficial and all of that without changing the application. Another point here is if you’re doing this in applications there’s always inconsistencies between applications developed by one team versus applications developed by another. Second problem that I’ve always seen is it’s very hard to move to a different telemetry backend system. So for some reason, you might not want to use Prometheus and you want to use something else to model. If you tie all of that in your application you have to change all of this. So this proxy can also give you a way of switching backends, for example, in the future if you need without going through your application lifecycle. So it’s super powerful.

Alex Williams: So let’s talk about a little bit more about the teams and more about the capabilities and you know, I know that Aspen Mesh has come out with its latest release, 1.5, and you have a security APIs built into it, you’re enabling Envoy support, which is written in WebAssembly, which is interesting. We’re hearing a little bit more about WebAssembly but not much, traffic management, you know, and how how you think about traffic management. Give us a picture of 1.5 and higher kind of tracing Istio’s evolution with it.

Neeraj Poddar: Yeah. So, I mean, all Aspen Mesh releases are tied to the upstream Istio releases, so we don’t take away any of the capabilities that Istio provides. We only add capabilities we think the organization will benefit from, like a wrapper around it so that you have a better user experience. So Istio 1.5 by itself moved from a monolithic architecture to – sorry, moved from a microservices control plane to a monolithic one for operational simplification. Right. So we have that. Similarly telemetry V2, which is an evolution from the out-of-process Mixer V1. We also provide that benefit where users don’t have to run Mixer. There was a lot of resource contention where it was consuming a lot of CPU and memory and contributing to some latency and latency numbers, which didn’t make sense. So all of those benefits that these two communities are working on, you are getting with the Aspen Mesh release. But the key thing here is for us to provide wrapper APIs like security APIs. I’ll give you a quick example. So Istio moved from 1.4 to 1.5, I think, from JWT-based policies to request authentication or authentication policies. We had to change the APIs because the older APIs were not making sense after user feedback. There were some drawbacks. This is great for improvement, but for a customer now I have to rethink what I did.

Neeraj Poddar: When I have to upgrade, I have to make sure we move along with Istio users. So us providing a wrapper around it means we do the conversion for them. So that’s one way we provide some benefit to our customers. Like you said, WASM is an interesting development that’s happening in the community. I feel like as the ABI itself matures more and the rich ecosystem develops, this is going to be a real powerful enhancement. Vendors can actually add extensions without rebuilding and having to rely on C++ filters. Companies who have some necessity for which they don’t want to, you know, offload that building cost to vendors or open source. They can extend Envoy on the fly themselves. This is a really huge thing. One thing I should talk about is that the Istio community is regularly changing or evolving the way they are installing Istio. You know Dan is here, he can tell you from the very beginning we have been doing Helm, we have not been doing Helm, or we have gone to istioctl. It’s all in the interest of doing it the right way. Right. It’s because of user feedback and trying to make it even more smooth going forward. So we try to smooth out that code where, you know, Aspen Mesh customers can continue to use the tooling that they’re comfortable with. So those are the kind of things we have given in 1.5, where our customers can still use Helm.

Alex Williams: When you’re thinking about – when you’re thinking about the security Dan and you’re thinking about what distinguishes Istio, what comes to mind, what, and especially when you’re thinking about multi cluster operations?

Dan Berg: One of the key aspects of Istio and one of the huge value benefits of Istio is that if you enable Istio and the services within the mesh, if you enable strict security policy, what that’s going to do is that’s going to enable automatic management of mutual TLS authentication between the services, which is, in layman’s terms, allowing you to do encryption on the wire between your pods. And to do that, if you’re looking at a Kubernetes environment, if you’ve got a financial organization as a customer that you’re looking to support or any other customer that has strict encryption requirements and they’re asking, well, how are you going to encrypt on the wire? Well, in a Kubernetes environment, that’s kind of difficult unless you want to run IPsec tunnels everywhere, which has a pretty nasty performance drain. Plus, that only works between the nodes and not necessarily between the pods, or you start moving to IPv6, which isn’t necessarily supported everywhere or even proven in all cases, but Istio literally through a configuration can enable mutual TLS with certificate management and secure service identity. So hugely powerful. And you can visualize all of that with the tools and utilities from Istio as well. So you know exactly which traffic flows, like in Kiali you can see exactly what traffic flows are secured and which ones are not. So that’s hugely powerful. And then the whole multi cluster support, which you brought up as well, is an interesting direction. I would say it’s still in its infancy stages of managing more complex service mesh deployments. Istio has a lot of options for multi cluster. And while I think that’s powerful, I also think it’s complex. And I do believe that this is going to, where we’re going in this journey is to simplify those options, to make it easier for customers to deal with multi cluster. But one of the values of security and multi cluster ultimately is around this level of secure identities and the certificate management that you extend the boundaries of trust into multiple different clusters. So now you can start defining policies and traffic routing across clusters, which you can’t do today. Right. That’s very complex. But you start broadening and stretching that service mesh with the capabilities afforded to you by Istio. And that’s just going to improve over time. I mean, it’s not, we’re on a journey right now of getting there and a lot of customers are starting to dip their toes in that multi cluster environment and Istio is right there with them and will be evolving. And it’s going to be a fantastic, fantastic story. I would just say it’s very early days.

Neeraj Poddar: Yeah, I was just going to echo like it’s in infancy, but it’s so exciting to see what you can do there. Like really when I think about it, multi cluster, you can think about new cases emerging from the telecom industry where the multi clusters are not just clusters in data centers, they’re at edge and far edge and you might have to do some crazy things.

Dan Berg: Yeah, well, that’s the interesting thing. I know earlier this year at IBM, we launched a new product called IBM Cloud Satellite. And that’s where if you own a service mesh, you’re going to be extremely excited with those kinds of edge scenarios. You’re broadening your mesh into areas that you’re putting clusters at. Two years ago, you would have never thought about putting a cluster in those locations. I think service mesh is going to become more and more important as we progress here with the distributed nature of the problems we’re trying to solve.

Alex Williams: Yeah, I was going to ask about the telco and 5G, and I think what you say sums it up and to be able to manage clusters at the edge, for instance, in the same way that you can and, you know, essentially, you know, in a data center environment.

Dan Berg: Well, you’re also dealing with a lot more clusters, too, in these environments. Instead of tens or even hundreds, you might be dealing with thousands, and trying to program like in the old days at the application level, that’s going to be almost impossible. You need a way to distribute consistent policies, programmable policies distributed across all these clusters, and Istio provides some of the raw mechanics to make that happen. These are going to be incredibly important tools as we move into this new space.

Neeraj Poddar: I was just going to say, I mean, I always think the evolution of service mesh is going to follow the same trajectory as the evolution of the ADC market, which happened as and when the telcos and the big enterprises came in because of a lot of requirements of the telecom industry. Currently, the load balancers are so evolved. Similarly, service mesh will have a lot more capabilities. Think about the clusters running in far edge. They will have different resource constraints. You need a proxy which will be faster and slimmer. Some people will say that’s not possible, but we’ll have to do that. So I’m just always excited when I think about these expansions. And like Dan said, we are not talking about tens or hundreds of clusters now, we are talking about thousands.

Alex Williams: We’ve been doing research and we find actually in our research that the clusters that are most predominant that we’re finding among the people we’re surveying are those of more than five thousand clusters. And that, I guess my last question for you, is about day five, day six, day seven, and what role does observability play in this? Because it seems like what we’re talking about essentially is observability and I’m curious how that concept is evolving for you. Now, you think about it in terms of as we move out to the days beyond, for people who are using Istio and service mesh capabilities.

Dan Berg: Obviously you need that sidecar. You need that dog next to you collecting all that information, sending it off. That is hugely important. But once you start dealing with scale, you can’t keep looking at that data time in and time out. Right. You’ve got to be able to centralize that information. Can you send all of that and centralize it into your centralized monitoring system at your enterprise level? And the answer there is yes, you absolutely can. Sysdig, a great partner that we work with, provides a mechanism for scraping all of the information from the Istio Prometheus endpoint, bringing that all in, and then they have native Istio support directly into that environment, which means they know about the Istio metrics and then can present that in a unified manner. So now you can start looking at common metrics across all of these clusters, all the service meshes in a central place, and start sending alerts, start building alerts, because you can’t look at five thousand clusters and X number of service meshes. It’s just too large. It’s too many. So you have to have the observability. You need to be collecting the metrics and you’ve got to be able to have the alerts being generated from those metrics.

Neeraj Poddar: Yeah, and I think we need to go even a step beyond that, which is you’ll have information from your mesh, you’ll have information on your nodes, you’ll have information on your cloud, your GitHub, whatever. You get it all to a level where there is some advanced analytics making sense of it. There’s only so much that a user can do once they get the dreaded alert.

Neeraj Poddar: They need to do the next step, which is, in this haystack of metrics and tracing and logs, can someone narrow it down to the place that I need to look? Because you might get alerted on microservice A, but it has dependencies which are other microservices, so the root cause might be 10 different levels down. So I think that’s the next day seven, day eight problem we need to solve, how do we surface the information in a way where it’s presentable? For me, it’s even tying it back to the context of applications. Dan and I are both from networking. We love networking. I can talk networking all day, but I think we need to talk the language of applications. That’s where the real value will kick in and service mesh will still be a key player there, but it will be a part of an ecosystem where other pieces are also important and all of them are giving that information and we are correlating it. So I think that’s going to be the real thing – it’s still very early. People are just getting used to understanding service meshes. So telling them that we need to coordinate all of this information in an automated way is scary, but it will get there.

Alex Williams: Well Neeraj and Dan, thank you so much for joining us in this conversation about service mesh technologies and Istio and these days beyond where we are now. And I look forward to keeping in touch. Thank you very much.

Dan Berg: Thanks for having us.

Neeraj Poddar: Thank you.

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts, please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.



Sailing Faster with Istio

While the extraordinarily large container ship Ever Given ran aground in the Suez Canal, halting a major trade route and causing losses in the billions, our solution engineers at Aspen Mesh have been stuck diagnosing a tricky Istio and Envoy performance bottleneck on their own island for the past few weeks. Though the scale and global impacts of these two problems are quite different, it has presented an interesting way to correlate a global shipping event with the metaphorical nautical themes used by Istio. To elaborate on this theme, let’s switch from containers carrying dairy, and apparently everything else under the sun, to containers shuttling network packets.

To unlock the most from containers and microservices architecture, Istio (and Aspen Mesh) uses a sidecar proxy model. Adding sidecar proxies into your mesh provides a host of benefits, from uniform identity to security to metrics and advanced traffic routing. As Aspen Mesh customers range from large enterprises all the way to service providers, the performance impacts of adding these sidecars are as important to us as the benefits outlined above. The performance experiment that I’m going to cover in this blog is geared toward evaluating the impact of adding sidecar proxies in high throughput scenarios on the server or client, or both sides.

We have encountered workloads, especially in the service provider space, where there are high requests or transactions-per-second requirements for a particular service. Also, scaling up — i.e., adding more CPU/memory — is preferable to scaling out. We wanted to test the limits of sidecar proxies with regards to the maximum achievable throughput so that we can tune and optimize our model to meet the performance requirements of the wide variety of workloads used by our customers.

Throughput Test Setup

The test setup we used for this experiment was rather simple: a Fortio client and server running on Kubernetes on large AWS node instance types like burstable t3.2xlarge with 8 vCPUs and 32 GB of memory or dedicated m5.8xlarge instance types which have 32 vCPUs and 128 GB of memory. The test was running a single instance of the Fortio client and server pod with no resource constraints on their own dedicated nodes. The Fortio client was run in a mode to maximize throughput like this:
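As a representative example, a Fortio invocation along these lines drives the client for maximum throughput (the connection count and target URL here are illustrative, not the exact values from our runs):

    # Illustrative Fortio load command:
    #   -qps 0  : no rate limiting, push for maximum throughput
    #   -c 64   : number of simultaneous parallel connections (we varied this)
    #   -t 60s  : run the test for 60 seconds
    fortio load -qps 0 -c 64 -t 60s http://fortio-server:8080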

The above command runs the test for 60 seconds with queries per second (QPS) 0 (i.e. maximum throughput with a varying number of simultaneous parallel connections). With this setup on a t3.2xlarge machine, we were able to achieve around 100,000 QPS. Further increasing the number of parallel connections didn’t result in throughput beyond ~100K QPS, signaling a possible CPU bottleneck. Running the same experiment on an m5.8xlarge instance, we could achieve much higher throughput around 300,000 QPS or higher depending upon the parallel connection settings.

This was sufficient proof of CPU throttling. As adding more CPUs increased the QPS, we felt that we had a reasonable baseline to start evaluating the effects of adding sidecar proxies in this setup.

Adding Sidecar Proxies on Both Ends

Next, with the same setup on t3.2xlarge instances, we added Istio sidecar proxies on both Fortio client and server pods with Aspen Mesh default settings: mTLS set to STRICT, access logging enabled, and the default concurrency (worker threads) of 2. With these parameters, and running the same command as before, we could only get a maximum throughput of around ~10,000 QPS.

This is a factor of 10 reduction in throughput. This was expected as we had only configured two worker threads, which were hopefully running at their maximum capacity but could not keep up with client load.

So, the logical next step for us was to increase the concurrency setting to run more worker threads to accept more connections and achieve higher throughput. In Istio and Aspen Mesh, you can set the proxy concurrency globally via the concurrency setting in proxy config under mesh config or override them via pod annotations like this:
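As a sketch, the per-pod override uses the proxy.istio.io/config annotation (the concurrency value below is illustrative; globally, the same field lives under meshConfig.defaultConfig.concurrency):

    # Illustrative pod annotation overriding the sidecar's worker thread count
    metadata:
      annotations:
        proxy.istio.io/config: |
          concurrency: 4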

Note that using the value “0” for concurrency configures it to use all the available cores on the machine. We increased the concurrency setting from two to four to six and saw a steady increase in maximum throughput from 10K QPS to ~15K QPS to ~20K QPS as expected. However, these numbers were still quite low (by a factor of five) as compared to the results with no sidecar proxies.

To eliminate the CPU throttling factor, we ran the same experiment on m5.8xlarge instances with even higher concurrency settings but the maximum throughput we could achieve was still around ~20,000 QPS.

This degradation was far from acceptable, so we dug into why the throughput was low even with sufficient worker threads configured on the sidecar proxies.

Peeling the Onion

To investigate this issue, we looked at the CPU utilization metrics in the server pod and noticed that the CPU utilization as a percentage of total requested CPUs was not very high. This seemed odd as we expected the proxy worker threads to be spinning as fast as possible to achieve the maximum throughput, so we needed to investigate further to understand the root cause.

To get a better understanding of low CPU utilization, we inspected the connections received by the server sidecar proxy. Envoy’s concurrency model relies on the kernel to distribute connections between the different worker threads listening on the same socket. This means that if the number of connections received at the server sidecar proxy is less than the number of worker threads, you can never fully use all CPUs.

As this investigation was purely on the server-side, we ran the above experiment again with the Fortio client pod, but this time without the sidecar proxy injected and only the Fortio server pod with the proxy injected. We found that the maximum throughput was still limited to around ~20K QPS as before, thereby hinting at issues on the server sidecar proxy.

To investigate further, we had to look at connection level metrics reported by Envoy proxy. Later in this article, we’ll see what happens to this experiment with Envoy metrics exposed. (By default, Istio and Aspen Mesh don’t expose the connection-level metrics from Envoy.)

These metrics can be enabled in Istio version 1.8 and above by following this guide and adding the appropriate pod annotations corresponding to the metrics you want to be exposed. Envoy has many low-level metrics emitted at high resolution that can easily overwhelm your metrics backend for a moderately sized cluster, so you should enable this cautiously in production environments.

Additionally, it can be quite a journey to find the right Envoy metrics to enable, so here’s what you will need to get connection-level metrics. On the server-side pod, add the following annotation:
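One way to express this, assuming Istio 1.8+ and the proxyStatsMatcher mechanism referenced above (the regexps are deliberately broad so they catch the listener connection stats):

    # Illustrative annotation on the Fortio server pod: expose Envoy's downstream
    # connection counter and gauge in addition to the default Istio stats
    metadata:
      annotations:
        proxy.istio.io/config: |
          proxyStatsMatcher:
            inclusionRegexps:
            - ".*downstream_cx_total.*"
            - ".*downstream_cx_active.*"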

This will enable reporting for all listeners configured by Istio, which can be a lot depending upon the number of services in your cluster, but only enable the downstream connections total counter and downstream connections active gauge metrics.

To look at these metrics, you can use your Prometheus dashboard, if it’s enabled, or port-forward to the server pod under test to port 15000 and navigate to http://localhost:15000/stats/prometheus. As there are many listeners configured by Istio, it can be tricky to find the correct one. Here’s a quick primer on how Istio sets up Envoy configuration. (You can find the complete list of Envoy listener metrics here.)
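If you don't have a Prometheus dashboard handy, a port-forward to the Envoy admin port works just as well (the pod name and namespace below are placeholders):

    # Forward the sidecar's admin port and pull the raw stats
    kubectl -n default port-forward fortio-server-pod 15000:15000
    curl -s http://localhost:15000/stats/prometheus | grep downstream_cx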

For any inbound connections to a pod from clients outside of the pod, Istio configures a virtual inbound listener at 0.0.0.0:15006, which receives all the traffic from iptables’ redirect rules. This is the only listener that’s actually configured to receive connections from the kernel, and after the connection is received, it is matched against filter chain attributes to proxy the traffic to the correct application port on localhost. This means that even though the Fortio client above is targeting port 8080, we need to look at the total and active connections for the virtual inbound listener at 0.0.0.0:15006 instead of 0.0.0.0:8080. Looking at this metric, we found that the number of active connections were close to the configured number of simultaneous connections on the Fortio client side. This invalidated our theory about the number of connections being less than worker threads.

The next step in our debugging journey was to look at the number of connections received on each worker thread. As I had alluded to earlier, Envoy relies on the kernel to distribute the accepted connections to different worker threads, and for all the worker threads to be fully utilizing the allotted CPUs, the connections also need to be fairly balanced. Luckily, Envoy has per-worker metrics for listeners that can be enabled to understand the distribution. Since these metrics are rooted at listener.<address>.<handler>.<metric name>, the regex provided in the annotation above should also expose these metrics. The per-worker metrics looked like this:
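To give a feel for what these per-worker gauges look like, here is an illustrative slice (stat names follow Envoy's listener.<address>.worker_<id> pattern; the numbers are representative of what we describe below, not exact measurements):

    listener.0.0.0.0_15006.worker_0.downstream_cx_active: 1243
    listener.0.0.0.0_15006.worker_1.downstream_cx_active: 1388
    listener.0.0.0.0_15006.worker_2.downstream_cx_active: 972
    listener.0.0.0.0_15006.worker_10.downstream_cx_active: 11502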

As you can see, the connections were far from being evenly distributed among the worker threads. One thread, worker 10, had 11.5K active connections as compared to some threads which had around ~1-1.5K active connections, and others were even lower. This explains the low CPU utilization numbers as most of the worker threads just didn’t have enough connections to do useful work.

In our Envoy research, we quickly stumbled upon this issue, which very nicely sums up the problem and the various efforts that have been made to fix it.


So, next, we went looking for a solution to fix this problem. It seemed like, for the moment, our own Ever Given was stuck as some diligent worker threads struggled to find balance. We needed an excavator to start digging.

While our intrepid team tackled the problem of scaling for high-throughput workloads by adding sidecar proxies, we encountered a bottleneck not entirely unlike what the Ever Given experienced not long ago in the Suez Canal.

Luckily, we had a few more things to try, and we were ready to take a closer look at the listener metrics.

Let There Be Equality Among Threads!

After parsing through the conversations in the issue, we found the pull request that enabled a configuration option to turn on a feature to achieve better balancing across worker threads. At this point, trying this out seemed worthwhile, so we looked at how to enable this in Istio. (Note that as part of this PR, the per-worker thread metrics were added, which was useful in diagnosing this problem.)

For all the ignoble things EnvoyFilter can do in Istio, it’s useful in situations like these to quickly try out new Envoy configuration knobs without making code changes in “istiod” or the control plane. To turn the “exact balance” feature on, we created an EnvoyFilter resource like this:
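The resource below is a sketch of that filter rather than a drop-in manifest: the workload label is hypothetical, and the listener match targets the virtual inbound listener on port 15006 described earlier. The important part is merging Envoy's connection_balance_config with exact_balance into the inbound listener.

    apiVersion: networking.istio.io/v1alpha3
    kind: EnvoyFilter
    metadata:
      name: fortio-server-exact-balance
    spec:
      workloadSelector:
        labels:
          app: fortio-server          # hypothetical workload label
      configPatches:
      - applyTo: LISTENER
        match:
          context: SIDECAR_INBOUND
          listener:
            portNumber: 15006         # virtual inbound listener
        patch:
          operation: MERGE
          value:
            connection_balance_config:
              exact_balance: {}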

With this configuration applied and with bated breath, we ran the experiment again and looked at the per-worker thread metrics. Voila! The connections were now perfectly balanced across the worker threads:
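For illustration, the same per-worker gauges now looked even, roughly like this (values representative, not exact measurements):

    listener.0.0.0.0_15006.worker_0.downstream_cx_active: 1601
    listener.0.0.0.0_15006.worker_1.downstream_cx_active: 1600
    listener.0.0.0.0_15006.worker_2.downstream_cx_active: 1600
    listener.0.0.0.0_15006.worker_3.downstream_cx_active: 1601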

Measuring the throughput with this configuration set, we could achieve around 80,000 QPS, a significant improvement over the earlier results. Looking at CPU utilization, we saw that all the CPUs were pegged at or near 100%, which meant we were finally seeing CPU throttling. At this point, by adding more CPUs and a bigger machine, we could achieve much higher numbers, as expected. So far so good.

As you may recall, this experiment was purely to test the effects of the server sidecar proxy, so we had removed the client sidecar proxy for these tests. It was now time to measure performance with both sidecars added.

Measuring the Impacts of a Client Sidecar Proxy

With this exact balancing configuration enabled on the inbound port (server side only), we ran the experiment with sidecars on both ends. We were hoping to achieve high throughput limited only by the number of CPUs dedicated to Envoy worker threads. If only things were that simple.

We found that the maximum throughput was once again capped at around 20K QPS.

A bit disappointing, but since we now knew about the connection imbalance issue on the server side, we reasoned that the same thing could be happening on the client side, between the application and the sidecar proxy container on localhost. First, we enabled the following metrics on the client-side proxy:
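
The stats inclusion we’re describing is roughly the following pod annotation (a sketch; the exact regexes we used may differ, and older Istio releases use the sidecar.istio.io/statsInclusionRegexps annotation for the same purpose):

    metadata:
      annotations:
        proxy.istio.io/config: |
          proxyStatsMatcher:
            inclusionRegexps:
            - 'listener\..*downstream_cx.*'   # per-listener (and per-worker) connection stats
            - 'cluster\..*upstream_cx.*'      # upstream cluster connection stats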

In addition to the listener metrics, we also enabled cluster-level metrics, which emit total and active connections for any upstream cluster. We wanted to verify that the client sidecar proxy was sending a sufficient number of connections to the upstream Fortio server cluster to keep the server worker threads occupied. We found that the number of active connections mirrored the number of connections used by the Fortio client in our command, which was a good sign. Note that Envoy doesn’t report cluster-level metrics per worker; they are aggregated across all workers, so there’s no way for us to know how the connections were distributed on the outbound side.

Next, we inspected the listener connection statistics on the client side similar to the server side to ensure that we were not having connection imbalance issues. The outbound listeners, or the listeners set up to handle traffic originating from the application in the same pod as the sidecar proxy, are set up a bit differently in Istio as compared to the inbound side. For outbound traffic, a virtual listener “0.0.0.0:15001” is created similar to the listener on “0.0.0.0:15006,” which is the target for iptables redirect rules. Unlike the inbound side, the virtual listener hands off the connection to the more specific listener like “0.0.0.0:8080” based on the original destination address. If there are no specific matches, then the listener configuration in the virtual outbound takes effect. This can block or allow all traffic depending on your configured outbound traffic policy. In the traffic flow from the Fortio client to server, we expected the listener at “0.0.0.0:8080” to be handling connections on the client-side proxy, so we inspected connections metrics at this listener. The listener metrics looked like this:

The above image shows a connection imbalance between worker threads similar to what we saw on the server side, only worse: on the outbound client-side proxy, a single worker thread was handling all of the connections, which explains the poor throughput numbers. Having fixed this on the server side, we applied a similar EnvoyFilter configuration, with minor tweaks for context and port, to address the imbalance:
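
The tweaked resource looked roughly like this (again a sketch with placeholder names), targeting the outbound listener for port 8080 on the client-side proxy:

    apiVersion: networking.istio.io/v1alpha3
    kind: EnvoyFilter
    metadata:
      name: client-exact-balance        # placeholder name
      namespace: perf-test              # placeholder namespace
    spec:
      workloadSelector:
        labels:
          app: fortio-client            # placeholder label for the client pods
      configPatches:
      - applyTo: LISTENER
        match:
          context: SIDECAR_OUTBOUND
          listener:
            portNumber: 8080            # the outbound listener handling Fortio traffic
        patch:
          operation: MERGE
          value:
            connection_balance_config:
              exact_balance: {}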

Surely, applying this resource would fix our issue, and we would be able to achieve high QPS with both client and server sidecar proxies, given sufficient CPUs allocated to them. Well, we ran the experiment again and saw no difference in the throughput numbers. Checking the listener metrics again, we saw that even with this EnvoyFilter resource applied, only one worker thread was handling all the connections. We also tried applying the exact balance config on both the virtual outbound port 15001 and the outbound port 8080, but the throughput was still limited to 20K QPS.

This warranted the next round of investigations.

Original Destination Listeners, Exact Balance Issues

We went digging in the Envoy code and opened GitHub issues to understand why the client-side exact balance configuration was not taking effect while the server side was working wonders. The key difference between the two listeners, other than directionality, was that the virtual outbound listener “0.0.0.0:15001” is an original destination listener, which hands connections over to other listeners matched on the original destination address. With help from the Istio community (thanks, Yuchen Dai from Google), we found this open issue, which explains this behavior in a rather cryptic way.

Basically, the current exact balance implementation relies on per-worker connection counters to fix the imbalance. When original destination is enabled on the virtual outbound listener, the counter on a worker thread is incremented when a connection is received, but because the connection is immediately handed off to the more specific listener like “0.0.0.0:8080,” it is decremented right away. This quick increase and decrease fools the exact balancer into thinking the balance is perfect, since the counters always read zero. It also appears that applying exact balance to the listener that ultimately handles the connection (“0.0.0.0:8080” in this case) but does not accept it from the kernel has no effect, due to current implementation limitations.

Fortunately, a fix for this issue is in progress, and we’ll be working with the community to get it addressed as quickly as possible. In the meantime, if you’re hit by these performance issues on the client side, scaling out with more replicas at a lower concurrency setting is a better way to reach higher throughput than scaling up a single proxy with higher concurrency and more worker threads. We are also working with the Istio community to provide configuration knobs for optionally enabling exact balance in Envoy by default, so that everyone can benefit from our findings.

Working on this performance analysis was interesting and a challenge in its own way, like the small tractor next to the giant ship trying to make it move.

Well, maybe not exactly, but it was a learning experience for me and my team, and I’m glad we’re able to share our learnings with the rest of the community, as this aspect of Istio is often overlooked by the broader vendor ecosystem. We will run and publish performance numbers on the impact of turning on various features such as mTLS, access logging, and tracing in high-throughput scenarios in future blogs, so if you’re interested in this topic, subscribe to our blog to get updates or reach out to us with any questions.

Thank you to Aspen Mesh team members Pawel and Bart, who patiently and diligently ran various test scenarios, collected data, and were uncompromising in their pursuit of getting the last bit of performance out of Istio and Aspen Mesh. It’s not surprising: as part of F5, taking performance seriously is just part of our DNA.


istiocon 9 trends

Top 9 Takeaways from IstioCon 2021

At the beginning of last year, we predicted the top three developments around service mesh in 2020 would be:

  1. A quickly growing need for service mesh
  2. Istio will be hard to beat
  3. Core service mesh use cases will emerge that will be used as models for the next wave of adopters

And we were right about all three, as evidenced by what we learned at IstioCon.

As a new community-led event, IstioCon 2021 provided the first organized opportunity for Istio’s community members to gather together on a large, worldwide scale, to present, learn and discuss the many features and benefits of the Istio service mesh. And this event was a resounding success.

With over 4,000 attendees in its first year, and as a virtual event, IstioCon exceeded attendance expectations many times over. The event showcased lessons learned from running Istio in production and first-hand experiences from the Istio community, and featured maintainers from across the Istio ecosystem, including Lin Sun, John Howard, Christian Posta, Neeraj Poddar, and more. With sessions presented across five days in English, as well as keynotes and sessions in Chinese, this was truly a worldwide effort. It is well known that the Istio community reaches far and wide, but it was fantastic to see that so many people interested in, considering, and even using Istio in production at scale were ready to show up and share.

But apart from the outstanding response of the Istio community, we were particularly excited to dig into what people are really using this service mesh for and how they’re interacting with it. So, we’ve pulled together a curated list of the top Istio trends, hot topics, and the top three sessions you don’t want to miss.

Top 3 Istio Service Mesh Trends to Watch

After watching each session (so you don’t have to!), we’ve distilled the top three service mesh and Istio industry takeaways that came out of IstioCon that you should keep on your radar.

1. Istio is production-ready. No longer just a shiny new object, Istio has matured over the past few years from a promising new infrastructure technology into the microservices management technology that companies are using in production, at scale, today. We saw insightful user story presentations from T-Mobile, Airbnb, eBay, Salesforce, FICO, and more.

2. Istio is more versatile than you thought. Did you know that Istio is being used right now by users and companies to manage everything from user-facing applications like Airbnb to behind-the-scenes infrastructure like running 5G?

3. Istio and Kubernetes have a lot in common. There are lots of similarities between Istio and Kubernetes in terms of how these technologies have developed and how they are being adopted. It’s well known that Kubernetes is “the de facto standard for cloud native applications,” and Istio is called “the most popular service mesh” in the CNCF annual user survey. But more than this, the two are growing closer together in terms of the technologies themselves. We look forward to the growth of both.

Top 3 Hot Topics

In addition to higher level industry trends, there were many other hot topics that surfaced as part of this conference. From security to Wasm, multicluster, integrations, policies, ORAS, and more, there is a lot going on in the service mesh marketplace that many folks may not have realized. Here are the three hot topics we’d like you to know about:

1. Multicluster. You can configure a single mesh to include multiple clusters. A multicluster deployment within a single mesh affords capabilities beyond those of a single-cluster deployment, including fault isolation and failover, location-aware routing, various control plane models, and team or project isolation. It was indeed a hot topic at IstioCon, with an entire workshop devoted to Istio multicluster, plus two additional individual sessions and a dedicated office-hours session.

2. Wasm. WebAssembly (Wasm) is a sandboxing technology that can be used to extend the Istio proxy (Envoy). The Proxy-Wasm sandbox API replaces Mixer as the primary extension mechanism in Istio. Over the past year, interest in Wasm has moved further to the forefront, as evidenced at IstioCon by two sessions plus its own office-hours session.

3. Security. Let’s face it, we’re all concerned about security, and with good reason. Istio has decided to face security challenges head on, and while not exactly a new topic, it’s one worth reiterating. The Istio Product Security Working Group had a session, plus we saw two more sessions featuring security as a headliner, and a dedicated office-hours session. 

Side note: there was a tie with one other hot topic: debugging Istio. If you get a chance, check out the three recorded sessions on debugging as well.

Top 3 Sessions You Will Want to Watch On-demand

Not everyone has time to watch a conference for five days in a row. And that’s ok. There are about 77 sessions we wish you could watch, but we’ve also identified the top three we think you’ll get the most out of. Check these out:

1. Using Istio to Build the Next Generation 5G Platform. As the most-watched session at this event, we have to start here. In this session, Aspen Mesh’s Co-founder and Chief Architect Neeraj Poddar and David Lenrow, Senior Principal Cloud Security Architect at Verizon, covered what 5G is and why it matters, architecture options with Istio, platform requirements, security, and more.

2. User story from Salesforce - The Salesforce Service Mesh: Our Istio Journey. In this session, Salesforce Software Architect Pratima Nambiar walked us through why Salesforce needed a service mesh, their initial implementation, the value Istio provides, their progressive adoption of Istio, and the features they are watching and expect to adopt.

3. User story from eBay - Istio at Scale: How eBay is Building a Massive Multitenant Service Mesh Using Istio. In this session, Sudheendra Murthy covered eBay’s story, from how their applications are deployed, through their service mesh journey and scale testing, to their future direction.

What’s Next for Istio?

We were excited to be part of this year’s IstioCon, and it was wonderful to see the Istio community come together for this new event. As our team members have been key contributors to the Istio project over the past few years, we’ve had a front-row seat to the growth of the project and its community.

To learn more about what the Istio project has coming up on the horizon, check out this project roadmap session. We’re looking forward to the continued growth of this open source technology, so that more companies — and people — can benefit from what it has to offer.


steering future of istio

Steering The Future Of Istio

I’m honored to have been chosen by the Istio community to serve on the Istio Steering Committee along with Christian Posta, Zack Butcher and Zhonghu Xu. I have been fortunate to contribute to the Istio project for nearly three years and am excited by the huge strides the project has made in solving key challenges that organizations face as they shift to cloud-native architecture. 

Maybe what’s most exciting is the future direction of the project. The core Istio community realizes and advocates that innovation in Open Source doesn't stop with technology - it’s just the starting point. New and innovative ways of growing the community include making contributions easier, Working Group meetings more accessible and community meetings an open platform for end users to give their feedback. As a member of the steering committee, one of my main goals will be to make it easier for a diverse group of people to more easily contribute to the project.

To share my personal journey with Istio: when I started contributing, I found it intimidating to present rough ideas or proposals in an open Networking WG meeting filled with experts and leaders from Google and IBM (even though they were very welcoming). I understand how difficult it can be to get started contributing to a new community, so I want to ensure the Working Group and community meetings are a place for end users and new contributors to share ideas openly, and also to learn from industry experts. I will focus on increasing participation from diverse groups by working to make Istio the most welcoming community possible. In this vein, it will be important for the Steering Committee to further define and enforce a code of conduct that creates a safe place for all contributors.

The Istio community’s effort towards increasing open governance by ensuring no single organization has control over the future of the project has certainly been a step in the right direction with the new makeup of the steering committee. I look forward to continuing work in this area to make Istio the most open project it can be. 

Outside of code contributions, marketing and brand identity are critically important aspects of any open source project. It will be important to encourage contributions from marketing and business leaders and to ensure we recognize non-technical contributions. Addressing this is less straightforward than encouraging and crediting code commits, but a diverse, vendor-neutral marketing team in open source can create powerful ways to reach users and drive adoption, which is critical to the success of any open source project. Recent user empathy sessions and user survey forms are a great starting point, but our ability to put these learnings into action and adapt as a community will be a key driver in growing project participation.

Last, but definitely not least, I’m keen to leverage the feedback from years of working with Aspen Mesh customers, along with my broader enterprise experience, to make Istio a more robust and production-ready project.

In this vein, my fellow Aspen Mesher Jacob Delgado has worked tirelessly for many months contributing to Istio. As a result of his contributions, he has been named a co-lead for the Istio Product Security Working Group. Jacob has been instrumental in championing security best practices for the project and has also helped responsibly remediate several CVEs this year. I’m excited to see more contributors like Jacob make significant improvements to the project.

I'm humbled by the support of the community members who voted in the steering elections and chose such a talented team to shepherd Istio forward. I look forward to working with all the existing, and hopefully many new, members of the Istio community! You can always reach out to me through email, Twitter or Istio Slack for any community, technical or governance matter, or if you just want to chat about a great idea you have.