
How Istio is Built to Boost Engineering Efficiency

The New Stack Makers Podcast

One of the bright points to emerge in Kubernetes management is how the core capabilities of the Istio service mesh can help make engineering teams more efficient in running multicluster applications. In this edition of The New Stack Makers podcast, The New Stack spoke with Dan Berg, distinguished engineer, IBM Cloud Kubernetes Service and Istio, and Neeraj Poddar, co-founder and chief architect, Aspen Mesh, F5 Networks. They discussed Istio’s wide reach for Kubernetes management and what we can look out for in the future. Alex Williams, founder and publisher of The New Stack, hosted this episode.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.

Alex Williams: Hey, it’s another episode of The New Stack Makers, and today the topic is Istio and engineering management. Today, I am joined for a conversation about Istio with Neeraj Poddar, co-founder and chief architect at Aspen Mesh. Hello, Neeraj, how are you?

Neeraj Poddar: I’m doing good. It’s great to be here, Alex.

Alex Williams: Thank you for joining us – you’re live from Boulder. And live from Raleigh, North Carolina, is Dan Berg, Distinguished Engineer at IBM Cloud Kubernetes Service and Istio. That’s a mouthful.

Dan Berg: Yes, sir. I was worried there for a moment you weren’t going to be able to get Kubernetes out.

Alex Williams: You know, it’s been that way lately. Actually, we’re just finishing the second edition of the eBook about Kubernetes that we first wrote in 2017. Service mesh was just beginning to be discussed then, and some of the articles I was reading were saying things like, well, Istio is still in its early days. And now today you’re telling me that you have more meetings than you can go to related to Istio. I don’t know what that means. What does that mean to you both? What does that say about Istio, and what is Istio, for those who may not be familiar with it?

Neeraj Poddar: You’re right. I mean, we have so many meetings and discussions, both asynchronous and synchronous, that it’s great to see the community grow. And like you’re saying, from three years ago to where we are now, it’s amazing: not just the interest from developers, but also the interest from end users, the feedback, and then making the product and the whole community better. So, coming to what Istio is: Istio is an open source service mesh platform for simplifying microservices communication. In simple terms, it handles a lot of the complicated pieces around microservices communicating with each other, things like enforcing policies, managing certificates, and surfacing relevant telemetry so that you can understand what’s happening in your cluster. Those problems become more and more complicated as you add more microservices. So service mesh, and Istio in particular, takes that burden away from the developers and moves it into the infrastructure. It’s basically decoupling the two things and enabling them to be successful at the same time.
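
A concrete example of moving that burden into the infrastructure: in a Kubernetes cluster running Istio, labeling a namespace is typically all it takes for every pod deployed there to get the sidecar proxy automatically. A minimal sketch (the namespace name is hypothetical):

```yaml
# Hypothetical namespace; any pod created here gets the Envoy
# sidecar injected automatically by Istio's mutating webhook.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    istio-injection: enabled
```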

Alex Williams: Now, Dan, you’ve been around a bit and you have your own experiences with APIs and how they evolved, and is this why we’re seeing this amazing interest in Istio? Because it takes the API to that next evolution? Is it the network effect on APIs that we’re seeing or is it something different that’s relevant to so many people?

Dan Berg: Well, I think it’s a combination of a few things. And first off, thanks for calling me old for saying I’ve been around for a while.

Dan Berg: So I think it’s a combination of several different things. First and foremost, near and dear to my heart, obviously, is containers and the evolution of containers, especially as containers have been brought to the cloud, really driving more cloud native solutions, which drive distributed solutions in these clouds, which is driving more use of microservices. Microservices aren’t new; it’s just that they’re being applied in a new way in cloud environments. Because of that, there’s a lot of complexity, and the distribution and delivery of those containers is a bit different than what we’ve seen with traditional VMs in the past, which means how you manage microservices is different. You need a way to drive your DevOps processes that is GitOps-based and API- and CLI-driven. What that naturally means is we need a better way of managing the microservices in your cloud. The evolution of Istio as a service mesh, which I often think of as the ability to program your network and your network policies through an API, is a natural evolution to fit where we are today with cloud native applications based on containers. This is the modern way to manage your microservices.

Neeraj Poddar: The way Dan explained it, it’s a natural progression. I especially want to mention network policies: even when companies migrate from monoliths to microservices, the same organizational policies still apply. No one wants to give those up, and you don’t want to embed them into your applications. So this is the key missing piece that lets you migrate or scale. It gives you both, wherever you are in your journey.

Alex Williams: So the migration and the scale. And a lot of it almost comes down to the user experience, doesn’t it? I mean, Istio is very well suited to writing generic reusable software, isn’t it? And to managing these interservice communications, which relates directly to the network, doesn’t it?

Dan Berg: Yeah, in many ways it does. A big part of this, though, is that it removes a lot of the burden and the lock-in from your application code. You’re not changing your application to adopt and code to a certain microservices architecture or programming model; that is abstracted away with the use of these sidecars, which are a pivotal control point within the application. But from a developer standpoint, what’s really nice about this is that now you can declare your intent. A security officer can declare their intent. As Neeraj was saying about policies, you can drive these declarations through Istio without having to go through and completely modify your code to get this level of control.
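
To illustrate what declaring intent looks like in practice, here is a minimal sketch of an Istio AuthorizationPolicy; the reviews workload and frontend service account are hypothetical, but a security officer could apply a policy like this without any application change:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-allow-frontend   # hypothetical policy name
  namespace: default
spec:
  selector:
    matchLabels:
      app: reviews               # hypothetical workload
  action: ALLOW
  rules:
  - from:
    - source:
        # only the frontend's service identity may call reviews
        principals: ["cluster.local/ns/default/sa/frontend"]
    to:
    - operation:
        methods: ["GET"]
```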

Alex Williams: Neeraj, so what’s the Aspen Mesh view on that? I know you talk a lot about engineering management, and this relates directly to engineering management in many ways, doesn’t it? In terms of being able to take care of those concerns so you can have the reusable software.

Neeraj Poddar: Absolutely. I mean, when I think of engineering management, I also think of engineering efficiency. They relate in a very interesting way, in that we want to make sure we are always achieving business outcomes. There are two or three business outcomes here that we want our engineering teams to achieve. We want to acquire more customers by solving more customer use cases, which means adding more features quickly. And that’s what Dan was saying: you can move some of those infrastructure pieces out of your application into the mesh so you can focus and create reusable software, software that’s unique IP to your company. You don’t have to write the code that has already been written for everyone. The second outcome is that once you have been successful acquiring a customer, you want to retain them. That customer satisfaction comes from being able to run your software reliably and fix issues when you find them. And that’s where, again, service mesh and Aspen Mesh excel, because we surface metrics, consistent telemetry and tracing. At the same time, you’re able to tie it back to an application view where you can easily pinpoint where the problem is. So you are getting benefits at a networking level, but you’re able to get an understanding of the application that is crucial to your architecture.

Alex Williams: Dan, what is the importance of the efficiencies at the networking level, the networking management level? What has been the historical challenge that Istio helps resolve? And how does the sidecar play into that? Because I’m always trying to figure out the sidecar, and I think for a lot of people it’s a little bit confusing. Lynn, your colleague at IBM, describes it pretty well as almost like taking all the furniture out of the room and then placing it back in the room piece by piece. I don’t know if that’s the correct way to describe it.

Dan Berg: Possibly. That’s one analogy. So, a couple of different things. First off, networking is hard. Fundamentally, it is hard. It almost feels like if you’re developing for cloud, you need a PhD to do it properly, and at some level that’s true. Simple networking is fine: getting from point A to point B is not a problem. Even routing from service A to service B in Kubernetes is pretty easy, right? There’s kube-dns. You can do the lookup, and kube-proxy will do your routing for you. However, it’s not very intelligent, not intelligent at all. There’s little to no security built into it. And the routing and load balancing is very generic: it’s just round robin, and that is it. There is nothing fancy about it. So what happens when you need specific routing based on, let’s say, zone awareness, or based on the client and source that’s coming in? What happens if you need a proper circuit breaker because your destination wasn’t available? Where are you going to code that? How are you going to build in your retry logic and your timeout logic? Do you put that in your application? Possibly. But wouldn’t it be nice if you didn’t have to? So there’s a lot of complication in the network. And I haven’t even gotten into security. Your authentication and your authorization? Typically, that’s done in the application, and all you need is one bad actor in that entire chain and the whole thing falls apart. So Istio, and modern service meshes generally, push that programming down into the network. And this notion of the sidecar, which is popular in Kubernetes-based environments, basically means you put another container inside the pod. What’s so special about that one container in the pod? With Istio, that sidecar is an Envoy proxy, and it is capturing all inbound and outbound traffic into and out of the pod. Everything traverses that proxy, which means policies can be enforced, security can be enforced, and routing decisions can be programmed and enforced. That happens at the proxy. So when a container in the pod communicates out, the traffic is captured by the proxy first; the proxy makes some decisions and then forwards it on. The same thing on inbound requests: it’s checking, should I accept this? Am I allowed to accept this? It’s that control point. And all of that is programmed by the Istio control plane. That’s where the developer experience comes in. You program it through YAML; you’re programming the control plane, and the control plane propagates all that programming logic down into those sidecars, where the control point actually takes place. That’s the magic right there. Does that make sense? It’s kind of like a motorcycle that has a little sidecar, literally the sidecar. Put your dog in the sidecar if you want to take your dog with you everywhere you go. And every time you make a decision, you ask your dog. That’s the Envoy sidecar.
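
To make the timeout, retry and circuit-breaker points concrete, here is a minimal sketch against a hypothetical reviews service; exact field names can vary slightly between Istio releases:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews                # hypothetical service
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
    timeout: 2s                # fail the request if it takes longer
    retries:
      attempts: 3              # retry logic lives in the proxy, not the app
      perTryTimeout: 500ms
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    outlierDetection:          # a simple circuit breaker
      consecutive5xxErrors: 5  # eject a backend after five straight 5xx errors
      interval: 30s
      baseEjectionTime: 60s
```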

Neeraj Poddar: That’s the image that comes to my mind, and maybe that’s because where I grew up in India, that was more prevalent than it is in the U.S. right now, and now somebody from America is also bringing it up. But that’s exactly right in my mind. And just to add one thing to what Dan said: day one networking problems are easy, relatively easy. Networking is never easy, but it’s relatively easy in Kubernetes. Day two, day three, it gets complicated real fast. Early on in the service mesh and Istio days, there were people saying it’s just doing DNS, why do I need it? Now no one is saying that, because those companies have matured past the day one problems and they are realizing, oh my God, do I need to do all of this in my application? And when are those application developers going to write real value-add code, then?

Alex Williams: All right. So let’s move into day two and day three, Neeraj. Who are the teams managing day two and day three? Who are these people? What are their personas, and what roles do they play?

Neeraj Poddar: That’s a really interesting question. I mean, the same personas who started your project or your product and were there on day one move along into day two, but some of the responsibilities change and some new personas come on board. So an operator role or a security persona is really important for day two. You want to harden your cluster environment; you don’t want unencrypted data flowing through. For maintainability, an operator, whether a platform operator or a DevOps SRE persona, needs consistent metrics across the system, otherwise they don’t know what’s going on. Similarly for day two, the developer, who is creating the software and the new applications, needs to be brought in when failures happen, but consulted at the right time with the right context. I always think that in microservices, if you don’t have the right context, you’re basically going to spend time in meetings trying to figure out where the failure is. And that’s where a consistent set of telemetry and a consistent set of tracing for day two and day three is super crucial. Moving to security: think about certificate management. I’m going to show my age here, but if you have managed certificates in your applications in a distributed manner, you know the pain there. You have been yelled at by a security officer at some point saying this is not working, go upgrade it, and then you’re stuck trying to do that in a short time span. With Istio, now that’s a configuration change, or an upgrade of the Istio proxy container. Because you know what? We fix OpenSSL bugs much quicker, because we are looped into the OpenSSL ecosystem. Then there are day three problems and even further: upgrade issues. How do you reliably upgrade without breaking traffic or degrading your customer experience? Can you do feature activation using progressive delivery? These are things we’re just talking about, and maybe these are day three point five or day four problems, but in the future you should be able to activate features in one region, even in a county, who cares, and test them out with your customers without relying on applications. So that’s how I see it. The personas are the same, but the benefits change and the responsibilities change as your organization matures.

Dan Berg: I was just going to say, one of the things that we see quite often, especially with the adoption of Istio: as Neeraj says, on day one, setting up your basic networking and routing is pretty easy. But then as your system and application grow, just understanding where those network flows go, it’s amazing how quickly it gets out of control. Once traffic gets into your Kubernetes cluster, where does it go? Where does it traverse? Did you even have your timeouts set up properly? How do you even test that? So, not even just the operational aspects, but the testing aspects, how to do proper testing of your distributed system, is very complicated from a networking standpoint. That’s where things like Istio timeouts, retries and circuit breakers really become helpful, and its fault injection, so you can actually test some of these failures. And then with Jaeger and tracing, you can actually see where the traffic goes. But one of my favorites is Kiali: bringing that up and seeing the real-time network flows, the latency, the error codes. That is hugely beneficial, because I actually get to see where the traffic went when it came into my cluster. So there’s a lot of benefit for the developer beyond just the security role. The developer role is very critical here.
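
The fault injection Dan mentions might look like this minimal sketch, against a hypothetical ratings service, delaying some requests and aborting others so timeout and retry behavior can be tested without touching application code:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings            # hypothetical service
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10        # delay 10% of requests...
        fixedDelay: 5s     # ...by five seconds
      abort:
        percentage:
          value: 5         # abort 5% of requests...
        httpStatus: 503    # ...with a 503 error
    route:
    - destination:
        host: ratings
```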

Neeraj Poddar: Absolutely, yeah. And I’ll put in a plug for operators here: once you get used to programming via YAML, or to changing the data path through the extensions we are making in the community with WASM, you get to control a critical piece of infrastructure even when zero-day things happen. You can actually change it by adding your own filter. We have seen how powerful that is in existing paradigms with BIG-IP or NGINX, where there’s a whole ecosystem of people writing scripts to do things that save them lots of money, because you don’t always get time to change your application, but you can change the proxy next to it. So you’re going to see a lot of interesting things happening there for day three, day four use cases.

Alex Williams: But who’s writing the scripts? Who’s writing the YAML? Who’s doing that configuring? Because a lot of these people, you know, developers, are not used to doing configuration. So who does that work?

Neeraj Poddar: That’s a really good question, and the reason I’m hesitant is that the answer is: it depends. If you have a very mature developer workflow, I would expect developers to give you information about the applications, and then the platform team takes over, converting it into the Istio-specific, Kubernetes-specific language. But most organizations might not be there yet, and that means you will need some collaborative effort between application developers and operators. For example, I’ll give you what Aspen Mesh is trying: we are trying to make sure that even if both personas are writing YAML, the APIs are specific to those personas. So we have created application YAMLs, which an application developer can write with no prior knowledge of Istio. The operators can write something specific to their requirements around networking and security, again in a platform-agnostic way, and then Aspen Mesh can lower it down to Istio-specific configuration. So it depends on what kind of toolchain you are using. I would hope that in the future, application developers are writing less and less configuration that is platform specific.

Dan Berg: And I think that echoes the fact that we do see multiple roles using the Istio YAML files and configurations, but you don’t have to be an expert in all of it. Generally speaking, there are traffic management capabilities that a developer would use, because there you’re defining your routes, the characteristics specific to your application, and the rollout of your deployment if you’re trying to do a canary release, for example. That’s something the developer or an application author would be responsible for. But when you’re talking about setting up policies for inbound or outbound access control into the cluster, that may be a security advisor who is responsible for defining those policies, not the developer; you wouldn’t want the developer defining that level of security policy. It would be a security officer doing that. So there’s room for multiple different roles, and therefore you don’t have to be an expert in every aspect of Istio, because your role determines which aspect you’re going to care about.
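
The canary release Dan describes might look like this minimal sketch, splitting traffic between two versions of a hypothetical my-app service:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app               # hypothetical service
spec:
  hosts:
  - my-app
  http:
  - route:
    - destination:
        host: my-app
        subset: v1
      weight: 90             # 90% of traffic stays on the stable version
    - destination:
        host: my-app
        subset: v2
      weight: 10             # 10% goes to the canary
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  subsets:                   # subsets map to pod labels
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```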

Alex Williams: When we get into the complexities, I think of telemetry, and telemetry has traditionally been a concept I’ve heard Intel talk about, right, with infrastructure and systems. Now telemetry is being discussed as a way to be used in the software. How is telemetry managed in Istio? What is behind it? What is the architecture behind that telemetry that makes it manageable, that allows you to really be able to leverage it?

Dan Berg: For the most part, it all really starts with the Istio control plane, which gathers the actual metrics and provides a Prometheus endpoint that you can connect to, scrape that information, and use it. How it gets the telemetry information, that’s really the key part: where does that information come from? If we take a step back, remember I was talking about the sidecar, the sidecar being the point that makes those decisions, the routing decisions, the security decisions.

Alex Williams: Well, the dog, the dog, the dog.

Dan Berg: Yes, the dog that is way smarter than you, making all the proper decisions and telling you exactly where to go. That’s exactly what’s happening here. Since all traffic in and out of the pod is going through that proxy, it is asynchronously sending telemetry information, metrics about that communication flow, both inbound and outbound. It can track failure rates, it can track latency, it can track security settings. It can send a large amount of information about that communication flow. And once you start collecting it into the Istio control plane, into the telemetry endpoint, and you start scraping that off and showing it in a Grafana dashboard, as an example, there’s a vast amount of information. Once you start piecing it together, you can see going from service A to service B, which is nothing more than going from sidecar A to sidecar B, and we have secure identities. We know exactly where the traffic is going, because we have identities for everything in the system, and everything that joins the mesh joins because it’s associated with a sidecar proxy. So it’s these little agents, these proxies, that are collecting all that information and sending it off so you can view it and see exactly what’s going on. And by the way, that is one of the most important pieces of Istio. As soon as you turn it on and join some services, you’ve got telemetry. You don’t have to do anything special. Telemetry starts flowing, and there’s a huge amount of value once you see the actual information in front of you: traffic flowing, error rates. It’s hugely powerful.
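
As a sketch of the scraping side, here is an excerpt modeled on the Prometheus job that Istio’s bundled configuration uses to collect metrics directly from the Envoy sidecars; details can differ between releases:

```yaml
# Excerpt from a Prometheus configuration (assumed setup).
scrape_configs:
- job_name: envoy-stats
  metrics_path: /stats/prometheus   # Envoy exposes its metrics here (port 15090)
  kubernetes_sd_configs:
  - role: pod                       # discover every pod in the cluster
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep                    # keep only ports named like Envoy's
    regex: ".*-envoy-prom"          # metrics port ("http-envoy-prom")
```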

Neeraj Poddar: Just to add to what Dan said, the amount of contextual information the sidecars add to every metric we export is super important. I was in front of a customer recently and, like Dan said, there’s a wow factor: you can just add things to the mesh, and suddenly you have so much information related to Kubernetes, which tells you about the pod, the services, the application labels. That’s super beneficial, and all of it without changing the application. Another point: if you’re doing this in applications, there are always inconsistencies between applications developed by one team and applications developed by another. A second problem I’ve always seen is that it’s very hard to move to a different telemetry backend system. For some reason you might not want to use Prometheus and you want to use something else. If you tie all of that into your application, you have to change all of it. The proxy can also give you a way of switching backends in the future if you need to, without going through your application lifecycle. So it’s super powerful.

Alex Williams: So let’s talk a little bit more about the teams and the capabilities. I know that Aspen Mesh has come out with its latest release, 1.5, and you have security APIs built into it, and you’re enabling Envoy extensions written in WebAssembly, which is interesting; we’re hearing a little more about WebAssembly, but not much. And traffic management, and how you think about traffic management. Give us a picture of 1.5, tracing Istio’s evolution with it.

Neeraj Poddar: Yeah. So all Aspen Mesh releases are tied to the upstream Istio releases, so we don’t take away any of the capabilities that Istio provides. We only add capabilities we think organizations will benefit from, like a wrapper around it so that you have a better user experience. Istio 1.5 itself moved from a microservices control plane to a monolithic one for operational simplification. So we have that. Similarly, telemetry v2 is an evolution from the out-of-process Mixer v1, and we also provide that benefit, where users don’t have to run Mixer. There was a lot of resource contention, where Mixer consumed a lot of CPU and memory and contributed latency that didn’t make sense. So you get all of those benefits of the community’s work with the Aspen Mesh release. But the key thing here is for us to provide wrapper APIs, like security APIs. I’ll give you a quick example. Going from 1.4 to 1.5, Istio moved, I think, from JWT-based authentication policies to RequestAuthentication policies. The APIs had to change because the older APIs weren’t making sense after user feedback; there were some drawbacks. That is great for improvement, but as a customer, now I have to rethink what I did.
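
For reference, the 1.5-era API Neeraj appears to be referring to looks like this; a minimal sketch with a hypothetical issuer, validating JWTs presented at the ingress gateway:

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: ingress-jwt           # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway   # apply at the ingress gateway
  jwtRules:
  - issuer: "https://auth.example.com"   # hypothetical issuer
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
```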

Neeraj Poddar: When I have to upgrade, I have to make sure we move along with Istio. So us providing a wrapper around it means we do the conversion for our customers. That’s one way we provide some benefit. Like you said, WASM is an interesting development happening in the community. I feel like as the ABI itself matures and a rich ecosystem develops, this is going to be a really powerful enhancement. Vendors can add extensions without rebuilding Envoy and without having to rely on C++ filters. Companies that have some necessity they don’t want to offload to vendors or open source can extend Envoy on the fly themselves. This is a really huge thing. One thing I should mention is that the Istio community is regularly changing and evolving the way Istio is installed. Dan is here; he can tell you, from the very beginning we have been doing Helm, then we have not been doing Helm, then we have gone to istioctl. It’s all moving in the right direction, because of user feedback and trying to make it even smoother going forward. So we try to smooth out that churn, so Aspen Mesh customers can continue to use the tooling they’re comfortable with. Those are the kinds of things we have given in 1.5, where our customers can still use Helm.

Alex Williams: When you’re thinking about security, Dan, and thinking about what distinguishes Istio, what comes to mind, especially when you’re thinking about multicluster operations?

Dan Berg: One of the key aspects of Istio, and one of its huge value benefits, is that if you enable Istio on the services within the mesh and you enable a strict security policy, that’s going to enable automatic management of mutual TLS authentication between the services, which, in layman’s terms, is allowing you to do encryption on the wire between your pods. In a Kubernetes environment, if you’ve got a financial organization as a customer, or any other customer that has strict encryption requirements, and they’re asking how you are going to encrypt on the wire, that’s kind of difficult, unless you want to run IPsec tunnels everywhere, which has a pretty nasty performance drain. Plus, that only works between the nodes and not necessarily between the pods. Or you start moving to IPv6, which isn’t necessarily supported everywhere or even proven in all cases. But Istio, literally through a configuration, can enable mutual TLS with certificate management and secure service identity. Hugely powerful. And you can visualize all of that with the tools and utilities from Istio as well; in Kiali you can see exactly which traffic flows are secured and which ones are not. So that’s hugely powerful. And then the whole multicluster support, which you brought up as well, is an interesting direction. I would say it’s still in the infancy stages of managing more complex service mesh deployments. Istio has a lot of options for multicluster, and while I think that’s powerful, I also think it’s complex. I do believe that where we’re going on this journey is to simplify those options, to make it easier for customers to deal with multicluster. But one of the values of security in multicluster ultimately is around secure identities and certificate management: you extend the boundaries of trust into multiple different clusters. So now you can start defining policies and traffic routing across clusters, which you can’t easily do today; that’s very complex. But you start broadening and stretching that service mesh with the capabilities afforded to you by Istio, and that’s just going to improve over time. We’re on a journey right now of getting there, and a lot of customers are starting to dip their toes into that multicluster environment, and Istio is right there with them and will be evolving. It’s going to be a fantastic story. I would just say it’s very early days.
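
In Istio 1.5 and later, the strict security policy Dan describes is a single small resource; applied in the mesh’s root namespace, it turns on mutual TLS mesh-wide:

```yaml
# Mesh-wide strict mTLS: plaintext traffic between sidecars is refused.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh root namespace in a default install
spec:
  mtls:
    mode: STRICT
```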

Neeraj Poddar: Yeah, I was just going to echo that it’s in its infancy, but it’s so exciting to see what you can do there. When I think about multicluster, you can think about new use cases emerging from the telecom industry, where the clusters are not just in data centers; they’re at the edge and far edge, and you might have to do some crazy things.

Dan Berg: Yeah, well, that’s the interesting thing. Earlier this year at IBM, we launched a new product called IBM Cloud Satellite. And if you own a service mesh, you’re going to be extremely excited by those kinds of edge scenarios. You’re broadening your mesh into areas where you’re putting clusters that, two years ago, you would have never thought about putting clusters in. I think service mesh is going to become more and more important as we progress, given the distributed nature of the problems we’re trying to solve.

Alex Williams: Yeah, I was going to ask about telco and 5G, and I think what you say sums it up: being able to manage clusters at the edge, for instance, in the same way that you can in a data center environment.

Dan Berg: Well, you’re also dealing with a lot more clusters in these environments. Instead of tens or even hundreds, you might be dealing with thousands, and trying to program, like in the old days, at the application level is going to be almost impossible. You need a way to distribute consistent, programmable policies across all these clusters, and Istio provides some of the raw mechanics to make that happen. These are going to be incredibly important tools as we move into this new space.

Neeraj Poddar: I was just going to say, I always think the evolution of service mesh is going to follow the same trajectory as the evolution of the ADC market, which happened when the telcos and the big enterprises came in with the telecom industry’s heavy requirements; that’s why load balancers are so evolved today. Similarly, service mesh will gain a lot more capabilities. Think about clusters running at the far edge: they will have different resource constraints. You need a proxy that is faster and slimmer. Some people will say that’s not possible, but we’ll have to do it. I’m always excited when I think about these expansions. And like Dan said, we are not talking about tens or hundreds of clusters now; we are talking about thousands.

Alex Williams: We’ve been doing research, and we actually find that the deployments that are most predominant among the people we’re surveying are those of more than five thousand clusters. And that brings me to, I guess, my last question for you. It’s about day five, day six, day seven, and what role observability plays in this. It seems like what we’re talking about is essentially observability, and I’m curious how that concept is evolving for you as we move out to the days beyond, for people who are using Istio and service mesh capabilities.

Dan Berg: Obviously you need that sidecar. You need that dog next to you, collecting all that information and sending it off. That is hugely important. But once you start dealing with scale, you can’t keep looking at that data time in and time out. You’ve got to be able to centralize that information. Can you send all of that into your centralized monitoring system at the enterprise level? The answer is yes, you absolutely can. Sysdig, a great partner that we work with, provides a mechanism for scraping all of the information from the Istio Prometheus endpoint and bringing that all in, and they have native Istio support directly in that environment, which means they know about the Istio metrics and can present them in a unified manner. So now you can start looking at common metrics across all of these clusters, all the service meshes, in a central place, and start building alerts, because you can’t look at five thousand clusters and X number of service meshes. It’s just too large; it’s too many. So you have to have the observability, you need to be collecting the metrics, and you’ve got to have alerts being generated from those metrics.
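
As a sketch of building alerts from those metrics, here is a hypothetical Prometheus alerting rule over Istio’s standard istio_requests_total metric, firing when more than five percent of requests in the mesh fail:

```yaml
groups:
- name: istio-alerts          # hypothetical rule group
  rules:
  - alert: HighRequestErrorRate
    expr: |
      sum(rate(istio_requests_total{response_code=~"5.."}[5m]))
        / sum(rate(istio_requests_total[5m])) > 0.05
    for: 10m                  # must be sustained for ten minutes before firing
    labels:
      severity: warning
    annotations:
      summary: "More than 5% of requests in the mesh are failing"
```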

Neeraj Poddar: Yeah, and I think we need to go even a step beyond that: you’ll have information from your mesh, from your nodes, from your cloud, your GitHub, whatever. You need to get it all to a level where some advanced analytics is making sense of it. There’s only so much that a user can do once they get the dreaded alert.

Neeraj Poddar: They need to do the next step, which is: in this haystack of metrics and tracing and logs, can someone narrow it down to the place I need to look? You might get alerted on microservice A, but it has dependencies, which are other microservices, so the root cause might be 10 levels down. I think that’s the next day seven, day eight problem we need to solve: how do we surface the information in a way that’s presentable? For me, it’s even tying it back to the context of applications. Dan and I are both from networking. We love networking, and I can talk networking all day, but I think we need to speak the language of applications. That’s where the real value will kick in, and service mesh will still be a key player there, but it will be part of an ecosystem where other pieces are also important and all of them are giving information that we are correlating. So I think that’s going to be the real thing. It’s still very early; people are just getting used to understanding service meshes, so telling them that we need to coordinate all of this information in an automated way is scary, but it will get there.

Alex Williams: Well Neeraj and Dan, thank you so much for joining us in this conversation about service mesh technologies and Istio and these days beyond where we are now. And I look forward to keeping in touch. Thank you very much.

Dan Berg: Thanks for having us.

Neeraj Poddar: Thank you.

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts, please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.




When You Need (Or Don’t Need) Service Mesh

The New Stack Makers Podcast

The adoption of a service mesh is increasingly seen as an essential building block for any organization that has opted to make the shift to a Kubernetes platform. As a service mesh offers observability, connectivity and security checks for microservices management, the underlying capabilities and development of Istio are critical components in its operation and, eventually, standardization.

In the second of The New Stack Makers three-part podcast series featuring Aspen Mesh, The New Stack correspondent B. Cameron Gain opens the discussion about what service mesh really does and how it is a technology pattern for use with Kubernetes. Joining in the conversation are Zack Butcher, founding engineer, Tetrate, and Andrew Jenkins, co-founder and CTO, Aspen Mesh, who also cover how service mesh, and especially Istio, helps teams get more out of containers and Kubernetes across the whole application life cycle.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management. 

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise. 

Bruce Cameron Gain: Hi, it’s B. Cameron Gain of The New Stack. Today, we’re going to speak about making Kubernetes more efficient with a service mesh, and this is part of our Aspen Mesh three-part Makers series. Today, I’m here with Zack Butcher, founding engineer of Tetrate, and Andrew Jenkins, co-founder and CTO of Aspen Mesh. Thank you for joining us.

Zack Butcher: Thanks for having me.

Bruce Cameron Gain: So the adoption of a service mesh is really increasingly seen as an essential building block for any organization that has opted to make the shift to a Kubernetes platform. As these service mesh offerings provide observability, connectivity and security checks, et cetera, for microservices management, I want to look at the underlying capabilities and development of Istio specifically, and service meshes in general, and how they are a critical component in the operations of Kubernetes deployments. So, Andrew, could you please put service mesh in context for us? How might an organization use it to migrate to a cloud native environment, and what do they need to know?

Andrew Jenkins: Yeah, so the migration to cloud native for organizations that we work with always involves a couple of steps along the way. There is an end-state goal: you want to have microservices to unlock developer efficiency, by having developers able to move fast on smaller components that are all stitched up into an integrated experience for users. But you have to get there from wherever you are. And so we find that organizations use service mesh a lot to help with that evolutionary path. That involves taking where we are now, moving some pieces into more of the cloud native model and developing new cloud native components, but without leaving behind everything that you’ve already done. And of course, like you talked about, it’s really important to have a good understanding, observability, of all of these different components of the system, to be able to know that it’s secure, and to be able to connect all these pieces together, whether they’re in public clouds, on-prem, or different clouds. A service mesh can really help with all that connectivity and security, all those components. That’s why we see organizations latching on to service mesh as an answer for not just the deployment problem, but how you integrate all these pieces together.

Bruce Cameron Gain: Well, thank you. Zack, this is maybe a reflection of what [Andrew] just described, but as you know, migrations are happening; it’s not greenfield, that’s very rare. So as [Andrew] described, they’re moving from data centers to cloud native environments, for example, and they’re doing it in bits and pieces, I would imagine most often in incremental steps, and often across different cloud environments as well. So what do they need to know as far as operations go, and how will the service mesh come into play for these multi-cloud deployments? Is it possible that just one Istio, or just one service mesh platform, will take care of everything? Or, I would imagine, are they piecing things together until we get to where we’ll have just one service mesh interface?

Zack Butcher: Yeah. So I think this idea of the transition and the migration that Andrew touched on is really relevant right now. In fact, this was the original reason I left Google, where I worked on Istio, to help start Tetrate. When we were at Google, we had built out the project for about a year and we were running around trying to get the initial Istio users. What we heard consistently was: hey, this is great, but I don’t only have Kubernetes. And I think it’s important that we understand that data centers aren’t going away any time soon. When you go out and build your own data center, that’s a 50-year investment, 40 or 50 years easily that you’re expecting to get value out of it. That’s not going to go away tomorrow just because you’re moving to cloud, and you will have split infrastructure for a long time. It will be the norm, and increasingly the norm, to have this kind of split infrastructure between things that you own and different cloud providers that fit different use cases well. And that’s exactly where we see the need for a common substrate. How do you start to enable communication for the components that need to communicate across these different environments? That’s where the identity and security aspects of a mesh come in. How do you enforce things from an organizational perspective? I have regulatory controls, and I need to ensure that I have controls in place across all of my environments that are consistent, and that I can prove to an auditor are consistent and enforced across all of these environments. A service mesh, because of the centralized control and the consistency that it gives you, is incredibly useful for helping bring sanity to the craziness that is the split infrastructure world, this kind of multi-cloud, on-prem world.

Bruce Cameron Gain: Well, without mentioning any names, some providers are making the claim that, maybe not now but very shortly, you can just have one single service mesh interface for multiple cloud environments, including your data center as well. How far are we away from that scenario?

Zack Butcher: Yeah, I think it depends on what you mean by interface. Are we going to get to a world where there’s a common point where I, as a developer, can go and push configuration about how I want my application to behave on the network, and that is going to be realized across all of the physical infrastructure that my company has, no matter where my service is running? If that’s what we mean when we say there’s going to be a common interface, then yes, I 100 percent think that we are going to land in a world like that, where individual developers can stop thinking about individual infrastructure and individual clusters, because that’s not really what’s relevant for me shipping business value to my users. Instead, I want to be able to think about the services that I have and need to maintain, how I want them to behave, and the features and functionality that they have. A fundamentally more value-focused world.

Bruce Cameron Gain: Andrew, do you agree? And at the same time, if you do agree, when could that scenario happen? 

Andrew Jenkins: I strongly agree that the developer should be out of the business of worrying about this interface. And we already see a lot of commonality, even across different service mesh implementations, in the features, especially around Kubernetes as a touchpoint for organizing policy and all these sorts of things. But just the same, it’s really important that my organization can guarantee to my security folks that we are meeting the security rules we’ve set up internally, and maybe what the organization wants to do is not give developers the one common underlying interface that will unify all service meshes. It may want to give them a kind of profile set: we’ve already designed these application bundles to talk about applications this way or that way, and we know how those map to security requirements. In that kind of world, organizations are building what they want to show their developers on top of the underlying capabilities of the infrastructure. They’re making some choices, just like Zack said, so that individual developers, who may not be experts on every single layer, can take advantage of the experts in the organization who have thought about those things and mapped them back to requirements.

Andrew Jenkins: So I don’t think that external, sort of stamped-out, forced adherence is very successful in the Kubernetes, cloud native world. I think what you see is the things that are really common bubbling up into a couple of interfaces, and there are already efforts underway for some of these sorts of things. Then there will be parts of these interfaces that people like, there will be gravity around those, and they’ll solidify. That’s happening a little bit already. I think in the next year or so you’ll see that happen much more strongly around applications and how they interact with things like service meshes. And then, a few years from now, big organizations will have their own takes on these. They’ll be built in. If I walk in the door of organization A on day one, I’ll know where to go to get the catalog that describes my application, and I’ll just run with it and rely on the experts underneath, both in my own organization and out in the community, who map that to the actual implementation underneath.

Bruce Cameron Gain: Well, thanks, Andrew, and I definitely want to revisit that topic more specifically. But as you remove the development layer, as far as the operations folks go, when do you think we might reach the stage where operations staff, for example, just have to manage one sole panel or interface to deal with the different deployments out there? As you just mentioned, the developer should not have to worry about the infrastructure aspect of things. But how far away are we from the day when the operations folks can streamline everything to a single pane, so to speak, working with that service mesh type of interface, where they can near-instantaneously deploy and set governance and compliance standards, et cetera?

Andrew Jenkins: So there are some organizations that I think are already pretty far down this path with Istio, and Istio has a bunch of great blog posts where users come in and talk about the ways they’re using and configuring it. So there are some organizations that have already built a whole lot of this around Istio. But the thing we’ll start seeing is that rather than everybody having to invest a whole lot in service mesh expertise to get that outcome, there will start to be common best practices, implementation pieces, and things from vendors and the ecosystem that simplify this. The amount of effort that an organization has to invest to get that benefit will go way down, and that will cause adoption to increase dramatically. It takes investment to do this all yourself today; I think we’ll see it become a whole lot more easily adopted into organizations going forward.

Zack Butcher: Yeah, I think that’s spot on. As far as something like a single pane of glass goes, Kiali is a good example, I think, in open source of starting to build that kind of thing out. Kiali is an open source telemetry system on top of Prometheus that ships with Istio and gives you a set of dashboards and nice visibility. It’s not a full single pane of glass, in that you can’t do policy control there; it’s only visibility. But I think that’s actively being worked on, both by vendors and in the community. So I think that is not a very faraway world. I do think, though, Andrew is exactly correct when he talks about standardization and picking interfaces as a community. The simple matter is that these are still very early days for the mesh. We are still learning and developing best practices, and we are doing exactly that as a community together. I think the important things to start standardizing now are not necessarily APIs and interfaces, but practices and techniques: standards for deployments, those kinds of things. That is what’s needed badly today. Then we can look at things like nicer unified interfaces, or the potential for APIs over top of multiple meshes, once we better understand what the APIs we actually need are, because it’s still just very early for that kind of thing.

Bruce Cameron Gain: Indeed. And this was one of my questions, actually; I’m going to rephrase it based on your answer. These are indeed the early days of service mesh, and of Kubernetes actually, and you have the main features: observability, security, obviously, and traffic management. To rephrase my original question: not necessarily which of those three features is the most important, but which of those features needs the most work? Security always needs work, so maybe, where do we really need to see some improvements: observability, traffic management, or both?

Andrew Jenkins: You know, there’s always room for improvement in both, and I think, especially in Istio, my feeling is there’s a stable foundation and a lot of room for innovation on top of that, including some Istio implementation advancements that make iteration easier to do more rapidly, so that we can make progress on a lot of different fronts. I’ll tell you that in the early days I was totally wrong about what was most important: my thinking was that traffic management was going to be the biggest, most conspicuous, spectacular feature coming out of a service mesh. What I found in the early days, for sure, was that people needed that observability foundation first, to even understand all of these cool new pieces they had deployed in the cloud and how they’re interacting. Even going back to the security front: what’s talking to what? Is it secured? What does it map back to in my security policy? They needed that well before they could start thinking about cool, novel ways of doing experimental deployments, canary deployments, progressive delivery. So there’s been a lot of progress on observability, and there’s a lot of foundational work there. I don’t know if it’s the most important, but I bet that in the near future we’ll see a lot more emphasis on the cool things you can do with advanced traffic management.

Zack Butcher: Yeah, I think Andrew is pretty spot on there. Again, with respect to what is the biggest thing to improve, I think we always need to look at what it takes, what I, an operator, or I, an application developer, need to do to start to actually realize value from Istio in my environment. That cycle, that time: what configuration do I have to learn? What configuration do I have to write? In what ways can we remove that configuration? That’s a big thing to improve, in my mind, looking forward at the project. Exactly like you said, we have a very solid base in terms of the capabilities that exist in the system today. Then, on your question about which of those three pillars is the most important, in my mind there are two answers. One answer is none of the three, because one of the key value adds of the mesh is that it brings all three of these together. Andrew alluded to that in his answer: you need the observability to be able to see and understand what’s happening with the traffic, to be able to get a handle on the security in your system. They all go together in some way.

Zack Butcher: I can say, from the perspective of some of the people and companies we work with, that the most important features for them are on the security side of the house, because we work a lot with the financial industry, I should say. It’s still early days, and it’s still kind of expensive to adopt a mesh, and in their world, security was the killer thing, the most important thing, that gave them value that warranted adopting the mesh. So it’s a little hard to say, because it’s going to depend on your specific set of use cases. I totally agree with what Andrew says: over time, I think traffic will grow into one of the most important pieces, because the observability and the security parts are really table stakes. You have to have those, they need to be present in your system, and they need to be configured correctly. That gives you the insight into what’s happening and the assurance that you have control over the system. The traffic part is really what application developers start to deal with day to day.

Bruce Cameron Gain: How is this analogy? And please be honest if you don’t agree that it’s applicable. Speaking of that, I was thinking about a very high performance car, say a Tesla, for example. You have, obviously, extremely high levels of torque and speed; that’s one component. You have the user interface, this magnificent screen in the middle; I don’t know if you’ve seen it or not, it’s beautiful. Then you have the driverless capabilities as well. And the third component, obviously, is security, certain ways to keep you safe. If any one of those three is negated or stops working properly, that’s just not going to be a proper driving experience in a Tesla. And for me, I see that analogy with the service mesh.

Zack Butcher: Exactly. Exactly. Yeah, I think that's a really apt analogy. Right. They really work best in concert, and they make sense in concert. You know, these three verticals have existed for as long as computing has existed. Right. And they have been separate spaces. There are people that have really compelling products and can do really interesting things in each of the spaces of observability, of application security and application traffic management. But the real game changer, in my mind, with a mesh is the way that it brings all of those together under a single, centralized, consistent control, which is that control plane that gives me the single point to configure.

Bruce Cameron Gain: Andrew, you mentioned this a while ago in an article you wrote, and it did extremely well, but it's a subject you guys might not necessarily like to talk about: in some instances, you don't need a service mesh. Or, at the same time, could you argue that you do if you're deploying on a Kubernetes environment, not counting serverless, but the cloud native environment, especially when you have several different cloud environments to manage? Are there instances where you don't need a service mesh, and why?

Andrew Jenkins: So I think there are some. I mean, I don't think it's on us to say, hey, everybody absolutely must use this new thing. Right. There are actually problems where you don't need Kubernetes. You may not need containers at all, or if you look at serverless, right, there's another thing beyond that, which is no-code, kind of codeless application development, where I don't even write code. Well, you can do that in some cases, and in other cases we know it's really more suitable to actually write software, write code. So there is always this continuum of what pieces you need, and it's definitely not the case that all problems are solved by a service mesh and require a service mesh. Zack talked about how, especially in the early days of Istio, the security benefits were really key for some of the users he was working with to justify the investment to adopt Istio and use it. Where we're at now, I think the security benefits of adopting Istio are at least as good, probably even significantly higher, for all of those organizations. And hopefully the cost of adoption continues to go down as folks like Tetrate and Aspen Mesh and everybody else work on improving the Istio experience, so it becomes even easier to adopt. But let's be honest: service mesh is a thing that you have to understand at least a little bit about. And so there are some problems where you have very few services communicating, or you have a very limited ability to insert a service mesh, where it may not justify the effort that you're going to invest in trying to understand or deploy or implement one. I think that as the cost of adoption keeps going down, those cases become fewer. But that doesn't mean it will always be the right answer.

Zack Butcher: And if I can just parlay off that, I think Andrew is exactly correct. What we've seen, and will continue to see more and more, is that even within a single organization there will be use cases that do not fit the mesh. Right. I was talking a little while ago with a company that does a lot of video streaming, and for video streaming a mesh doesn't provide them very much benefit, but it adds latency in their critical path. It gives them negatives on that side of the house. However, they have a whole API side of the house too, where people go and interact with their products and things like that, where a mesh does make sense. Right. So even within the context of a single organization, you're going to see sets of applications or sets of use cases where it may or may not make sense. And that extends out to the entire organization.

Bruce Cameron Gain: And regardless, I'm supposing that in most cases, taking the video streaming company as an example, for the developer it really doesn't matter that much whether there's a service mesh underneath the covers, so to speak. They're not worrying about YAML files, for example, not worrying about the service mesh itself. It's kind of immaterial, usually or almost exclusively, for developers, or is it not? I mean, what do they need to know? What changes in their lives, or how do their lives change, either way, when there's a service mesh or not?

Zack Butcher: There's kind of a mix, right? Part of this depends on how your organization has decided to approach a service mesh, and part of it depends on how mature you are on that path. Right. In the extreme, in a fully mature organization, the real goal, and this is the goal with DevOps, right, is to get developers doing the operations for their own services, to get them involved in production. And, you know, whatever we say about the phrase, that idea is good. In the extreme, a service mesh enables that. It gives you the ability to put in the hands of individual developers the control over how their application behaves, at a higher level, without having to go change code and things like that. And I believe that for most organizations that are adopting a mesh, that is a desirable instinct: that any of their developers can go reach under the hood and use the mesh to make their applications better, to achieve whatever they need to achieve with it. But then there's the question of how you get them to that point, and how you actually enable successful adoption in an organization.

Zack Butcher: Right. And so what we typically see is that the path for adoption starts with hiding the mesh. Right. Get the people operating the system, the platform, to install a mesh and start to use it: start to onboard teams, start to provide some of the underlying visibility with it, start to provide some of the underlying security with it, maybe just do broad traffic related things, right, that are kind of one size fits all. And then, as they gain confidence, start to do more with it with respect to things like traffic management, and start to give their own developers more control. So I think it's a spectrum. Right. And then the other side of that, too, is how much your organization has a PaaS, or tries to hide underlying infrastructure in general; that is going to influence how much a developer needs to interact with a mesh, or has to interact with a mesh. In general, the instinct should be that developers should be able to control their own traffic, and probably the platform team should be able to control the other things.

Bruce Cameron Gain: And we had kind of touched on this before we started our conversation, or the recording, excuse me. We were talking about how maybe there are alternatives out there, platforms where there is indeed service mesh type functionality, but even for the operations team it's transparent and they don't really have to worry about managing it. Is that a viable scenario, or is this maybe something that is being promised that might not really work?

Andrew Jenkins: There are definitely platforms that include baked-in service meshes and kind of management around a service mesh. I would say their goal is to make the downsides, the managing and upgrading and things like that, as transparent as possible, while the upsides hopefully still surface. Observability should still be driven by the mesh. The kinds of policies that you can enact, or the traffic management that you can do, are still driven by the mesh. And so in that sense, your developers or operations folks, the platform team, are still interacting with the mesh, even if they don't have to interact with it as a completely separate component. So I think the fundamental principles of service mesh still apply. It's just that there are some cases where all the choices the platform has made around how it's going to use a service mesh may match up one to one with your organization, and so there's no benefit for you swapping that out and doing it all yourself. That happens sometimes. But then I'll say that we're also, I guess by nature, seeing a lot of cases where we're talking to users who want a deeper level of control. They want to be able to do some special things with the service mesh, even something as simple as adopting their own upgrade path for the service mesh component, or having it be consistent across different platforms, where they may want to make some choices differently than what the platform has already made. But in all cases, hopefully your developers are getting to utilize the benefits of the service mesh, whether it's baked into the platform or whether it's something that a platform team is operating at a more custom level.

Bruce Cameron Gain: And as far as observability goes, with Istio it seems as if observability of microservices is the key capability. Would you agree or not?

Andrew Jenkins: I’d agree that it’s the first thing out of the box that makes a positive impact in your life as a developer. I’ll say that. 

Zack Butcher: Yeah, for sure. I can totally agree with that. Right. As far as the day one, or day zero, experience goes, observability is the key thing. I would argue that identity is the single most important feature of a service mesh; in fact, identity is kind of the key thing that it does, and everything else stems from identity. But that's partly philosophical, and we can go into the weeds on that one. In terms of user facing features, though, observability definitely wows from the start.

Bruce Cameron Gain: Is it possible, in maybe just a few sentences, to dig down a little deeper into why identity is such a key feature?

Zack Butcher: Communication doesn't matter unless you know who you're communicating to and with. Right. What metrics are you producing? What are the metrics about? Unless you know the client or the server you're communicating with, how can you know? So everything in the system really stems from knowing who you're communicating with, from that sense of identity. Right. From that we can have policy. From that we can talk about how traffic flows and where traffic flows. Right. What is a destination in your mesh to send traffic to? It's a thing with an identity; we need a name for it, a handle for it, first. What do you report metrics on? A service. That's a thing with an identity. It really all stems from having services as a reified concept, assigning identities to them at runtime, and being able to use that at runtime to know who you're actually talking with; everything else follows from that. So that's why I say it's a little philosophical: can we communicate without having an identity? Yes, we can. But who are you really talking to? How can you trust those metrics, how can you trust that communication, and what is actually happening there, unless you know? And so that's why I say it.
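
In Istio, that identity is concretely a SPIFFE URI carried in the workload certificate the mesh issues, so a peer can be identified from the TLS handshake rather than from an IP address. As a rough illustration of "knowing who you're talking to," here is a small Go sketch that pulls that identity out of a connection; the service address, port and trust settings are invented for the example, and this is not Istio's own code.

```go
// Sketch: identify a TLS peer by the SPIFFE URI in its certificate,
// e.g. spiffe://cluster.local/ns/default/sa/reviews. Illustrative only.
package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

// peerIdentity pulls the SPIFFE URI out of the peer's leaf certificate.
func peerIdentity(state tls.ConnectionState) (string, error) {
	if len(state.PeerCertificates) == 0 {
		return "", fmt.Errorf("peer presented no certificate")
	}
	for _, uri := range state.PeerCertificates[0].URIs {
		if uri.Scheme == "spiffe" {
			return uri.String(), nil
		}
	}
	return "", fmt.Errorf("no SPIFFE identity in certificate")
}

func main() {
	// Hypothetical in-mesh address; in a real mesh the sidecar holds the
	// trust bundle, so skipping verification here is for the sketch only.
	conn, err := tls.Dial("tcp", "reviews.default.svc:9443", &tls.Config{
		InsecureSkipVerify: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	id, err := peerIdentity(conn.ConnectionState())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("talking to:", id) // policy, metrics and routing can all key off this name
}
```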

Bruce Cameron Gain: Andrew, is that a feature that was prevalent, or a real wow feature, for you at the beginning, and has it evolved and changed?

Andrew Jenkins: It's a key part of scaling beyond just one cluster, right? This identity problem is something approaching tractable in a tiny, self-contained environment like one Kubernetes cluster. But as you start distributing this at planet scale, or across data centers, or in organizations that are hybrid, or in a system that's so large and changes so quickly that it's really hard just to write down all of the identities of everything all at once in one place, then you need something smarter and more flexible. And this ability to handle identity at a large, very flexible, very rapidly iterating scale is already built into a service mesh. That's kind of the day zero thing. I think it wasn't first on a lot of users' minds as a thing that they need, and unfortunately, because it's already built in, it may actually be one of the things that is harder to notice, even though it was so key to helping you scale up. But it is absolutely crucial. It's the part of security where merely being able to talk to some pod in some other Kubernetes cluster actually is somewhat of a solvable problem. That's not it. It's about, just like Zack said, knowing what it is, knowing who is on the other end of the thing you're talking to, and then being able to use that as a foundation for policy and all this other stuff.

Bruce Cameron Gain: A new version of Istio has just been released. What's the key feature, or what do you love about it the most? And for the people migrating to Kubernetes today and looking at a service mesh, what are they going to like?

Andrew Jenkins: I have two answers here. One is really boring, and that's good: support for elliptic curve crypto certs for TLS between pods, which is important for me. It's totally not all that mind-blowing of a feature, but it shows the state that Istio is in, where it now has a lot of capacity to circle back and flesh out requirements, make sure that we can accommodate organizational requirements, policies, things like that. So that's just a great example of the maturity side of Istio. The other thing that's been developing over a couple of releases and is getting more and more mature, and is really big in one five, is WebAssembly support. That's going to be a way to extend Istio, and especially the sidecar Envoy proxy, in a more portable and rapidly evolving way, rather than having to build very low level components in the system. And I think that's going to be great, because it will allow developers to extend the capabilities of the service mesh, but without all of that having to happen in this crowded core where stability is an extremely important concern, which can be a natural drag on innovation. So opening up the WebAssembly front allows us to do both: stability and an open door for innovation.
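
For readers wondering what that boring feature amounts to: an elliptic curve certificate is just an X.509 certificate signed with an ECDSA key instead of RSA, which makes the keys smaller and the handshakes between sidecars cheaper. Below is a minimal, self-contained Go sketch of minting one; it is illustrative only, not Istio's issuing code, and the subject name and lifetime are placeholders.

```go
// Sketch: generate an ECDSA (P-256) key and a short-lived self-signed
// certificate, the kind of elliptic-curve material Istio 1.5+ can use
// for pod-to-pod TLS instead of the older RSA default.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

func main() {
	// P-256 keys are far smaller and cheaper to handshake with than 2048-bit RSA.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}

	tmpl := x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "workload.example"}, // placeholder name
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour), // mesh certs are short-lived
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
	}

	// Self-signed here for brevity; a mesh CA would sign with its own key.
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	fmt.Printf("issued %d-byte ECDSA certificate\n", len(der))
}
```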

Bruce Cameron Gain: And that’s mainly relegated to the JavaScript side of things. Or is that maybe a wider thing? 

Andrew Jenkins: So WebAssembly is cool because it's kind of like the concept of JavaScript: hey, it's this language that can run anywhere, and it's in everybody's browser. It's that, conceptually, but without a lot of the technical reasons why JavaScript might not be a great fit for low level applications. WebAssembly is the output format; you can produce WebAssembly from many different programming languages, like JavaScript if you want. And that's an important part of broadening that ecosystem.

Bruce Cameron Gain: That's fascinating. So that's kind of working on giving programmers and engineers the ability to improve the application experience through infrastructure changes and configurations, possibly. Is that correct?

Andrew Jenkins: Yeah, yeah. 

Bruce Cameron Gain: And Zack, what are your thoughts about that? 

Zack Butcher: Yeah, Andrew took all my answers. No, I think the single biggest thing for me about Istio one point six is that it's kind of a boring release in a lot of respects, and I think that is the ultimate goal of any infrastructure project. Right. In many respects, I am very happy when there are not big, earth shattering features. Right. So I look at things like upgrading to one point six: for many people it will be the first time that they use the operator to do an upgrade, because that was, I believe, made default in one five, maybe it was one four. Things like making the lifecycle management easier going into this next release are some of the things that I think are really, really big and key for people. WebAssembly, like Andrew said, I think is really, really going to be an awesome enabling technology in the future. Since you asked about JavaScript there: today, Envoy actually only supports C++ and Rust for WebAssembly, and Go is in very early stages as well. So there isn't even a JavaScript SDK to use with Envoy today, because Envoy has to expose an API, and when we program it, you need a handle to that API in your programming language. Right. And so today only C++ and Rust have been implemented semi-officially, and then there's a Go one as well. So those are the big things in my mind: just keep it going and keep making upgrades easier. I think you need even less configuration than ever before now to do the installation and upgrade as well. And so those, to me, are the big and exciting things now. The more boring an Istio release's notes can be, the happier I am, because I think that shows how the project is maturing, how we're able to spend time going back and addressing not the 80 percent use cases, but the 20 percent use cases. Right. And that, to me, is the really interesting stuff.

Bruce Cameron Gain: And what are you working on now, under the Tetrate umbrella, to solve those 20 percent use cases?

Zack Butcher: Yeah, so generally speaking, what we've been doing is working hand in hand with some companies on getting service mesh into production with them at a large scale. Right. And what are all of the things that need to happen to make that happen? We talked a little bit about kind of single pane of glass stuff, so we're building out that kind of thing. Right. As an organization, I need centralized controls and that kind of thing. So that's the general theme: build out the sets of tooling and infrastructure required to get a mesh actually adopted in a real, large enterprise.

Bruce Cameron Gain: And Andrew, as far as Aspen Mesh goes, what are some of the challenges that you're working on at this time?

Andrew Jenkins: Yeah, we're really talking in circles and building on each other here. I think right now Aspen Mesh is taking a turn around some of the release and integration stuff; that is something that, if it's done right, is really powerful and advances a project, but it's not necessarily anybody's absolute favorite thing to do. And so we're stepping up to the plate around some of that at this point. There's also been some security stuff, interestingly, kind of going back to Zack's discussion around identity. We have some users who have very large existing systems with concepts of identity based around existing concepts like domain names and TLS infrastructure. So we're helping bridge the gap between what they're doing now and what they want to do in the future. And in this migration there's no way that we can just jump to the future; we're going to have to evolve pointwise from where we are to where we're going. So a lot of that is adding some foundational components to make sure that those identities are flexible enough to address the use cases that are not quite as easy as, I've got a brand new, fresh container application that I'm just going to stand up in my Kubernetes cluster. It's this brownfield, hybrid environment.

Bruce Cameron Gain: Excellent. And you were one of the earlier developers of service mesh, as I understand it. I was wondering if you could briefly describe how that's evolved; you already touched a little bit on some of the many wrong turns that have been made, especially for the open source projects. But where is this all going? What's next for Istio and service meshes?

Andrew Jenkins: I've worked on projects in this space even before the term service mesh was coined, or before Istio; I worked on projects around how to connect applications flexibly, especially as things moved to containers. Istio really changed the game in terms of broad open source adoption and an API, policy objects and things like that, that really natively matched up to Kubernetes very well. And so that's why Aspen Mesh is built around Istio as a foundational component, and why we do some of the things in the community that we do, to help keep the underlying project healthy. Going into the future of service meshes, I do think that now that people have got it in their hands, they're getting it into their clusters more and more. They're starting to build applications that don't necessarily bring along all of the components that a service mesh also provides; they're starting to say, oh, we actually can delegate all of that stuff to the service mesh. I think there are going to be two big fronts that we'll see. One is service meshes that span and interact across infrastructure components. So it won't just be a Kubernetes cluster: you will have service meshes that your organization manages that may include virtual machines, that may include many different Kubernetes clusters, and that stitch all these things together in a way that's secure, that maintains identity, that's still observable. So that one is adoption across a bunch of different clusters. And the second one is novel ways of deploying and managing applications built on the capabilities of a service mesh. This is the progressive delivery and canary rollouts and things like that. I think that's been a wishlist item for a lot of large organizations, and I think that with Kubernetes, containerization and things like a service mesh, it's going to be a lot more practical for them to actually start building on that and getting value in their application lifecycle.

Bruce Cameron Gain: And again, based on what you both said, I think it might move in the direction of being more applicable to the data center and the on premises model, even as you are migrating to cloud native environments. Maybe we're going to move in a direction where the service mesh will be more applicable to on premises deployments as well?

Zack Butcher: Yeah, yeah, for sure. That's actually a primary thing that Tetrate works on. When I talked earlier in the podcast about the fact that data centers are not going away, that we're going to have them over the next 40, 50 years, right, that's exactly acknowledging that the mesh has to span this heterogeneous infrastructure. I'll use the term legacy, though legacy is a dirty word, because that's actually the stuff that's making money in most organizations. Right. So you have to go back into the brownfield. I think that's one of the big areas where it's going, and I think Andrew is exactly right: in the near term of the next five years or so, we'll see a better development experience, and we'll see it become more pervasive across more environments, that kind of thing. If I take an even longer view, and maybe this is a little too far: I get to work a decent bit with some folks like the Open Networking Foundation, and they do some really interesting things around software defined networking and telco standards and stuff like that. Where we see it going in the really, really long term is that it just becomes part of the network. Right. If you look at the dream of what Istio wants to do, if you look at the capabilities that Envoy has: SDN is approaching this from the bottom up, and Envoy and Istio and these ecosystems are approaching it from the top down, from the application down. And I think the real beauty is that eventually we're going to meet up, right, and these capabilities that the mesh brings are going to be a transparent and ambient part of the network that you're in. And that's the beauty in boring, right? That's when we've made it: when your service mesh is like something in the kernel, and it's just boring, and it just does it. That's the goal.

Bruce Cameron Gain: Even for the operations folks, right? I mean, already the developers get to do their magic. They can do their fun work; they get to create their applications. And then the operations folks are struggling more with the security, and maybe they're trying to look at ways to automate things. Maybe in five years they're not going to have to worry about the service mesh, as you said. And at the same time, for the developer folks, it's going to be business as usual, except maybe, as far as what you brought up before, the menu of programming languages for certain applications used with a service mesh becomes much larger. So I guess you would have the best of both worlds.

Zack Butcher: Yeah, I think the real goal is that eventually, as an application developer, what I really want to be able to do is guarantee quality of service for my application. Right. I want to be able to say, hey, for these types of traffic, this is the quality of service that my application needs to provide, and the network should go and do what is required to implement that quality of service, whether that's pushing it into switch pipelines, where we can do per-request HTTP handling in a switch if I really need to, right, or I can do it in NFV, or I can do it in userspace. And so we're going to see trade-offs on that spectrum, transparent to the user, but based on things like quality of service. And that's where I hope we can start to get away from where we are. I think of programming a service mesh today as almost like publishing individual routes before we had BGP, right. You know, it's very manual, very finicky, very one-off. And we need the sets of technology that start to make it more automatic, more transparent, and just work completely.

Andrew Jenkins: This is exactly the right analogy. When developers start today, they don't really worry about how to retry packets over the network because the network might be unreliable and lose packets. That was solved decades ago, and it's built in. They don't worry about parsing HTTP requests and responses; that's built into some library they can use. But they have had to worry about some higher level reliability concerns, or addressability concerns, things like that. And as Zack says, when we get to the end state and it's just pervasive and built in, then we'll know success, because there will be a whole new class of things that they don't have to worry about. And we're starting to see that in some environments. This has already happened with containers in Kubernetes: you can already delegate, hey, how do I find the best instance of this service, down to a service mesh. How do I make sure that I'm talking to a secured version of this, whose identity I know? A service mesh can do that. And so if this becomes universal for all programs everywhere, because of a combination of service mesh implementations like Istio and equivalent capabilities in NICs and switches and things like that, then that's a massive success, because this is the whole developer thing, right? Now there's a whole class of problems that they don't have to worry about, that don't slow them down. They can focus on the next higher level thing.
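
To make that concrete, this is the kind of reliability boilerplate every service used to carry, and that a mesh can absorb into the sidecar as a few lines of route configuration. A hand-rolled sketch in Go; the target URL and retry policy are invented for the example.

```go
// Sketch: retry-with-backoff logic that applications wrote by hand
// before a mesh could apply the same behavior transparently.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// getWithRetries retries transient failures with exponential backoff.
func getWithRetries(url string, attempts int) (*http.Response, error) {
	backoff := 100 * time.Millisecond
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success, or a non-retryable client error
		}
		if err != nil {
			lastErr = err
		} else {
			lastErr = fmt.Errorf("server error: %s", resp.Status)
			resp.Body.Close()
		}
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff between attempts
	}
	return nil, fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

func main() {
	resp, err := getWithRetries("http://catalog.default.svc/items", 3) // assumed URL
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```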

Bruce Cameron Gain: Well, I wanted to thank you both very much: Zack Butcher, founding engineer of Tetrate, and Andrew Jenkins, co-founder and CTO of Aspen Mesh.

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts, please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time. 

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise. 



The New Stack Makers Podcast
How a Service Mesh Amplifies Business Value

In this final episode of The New Stack Makers three-part podcast series featuring Aspen Mesh, Alex Williams, founder and publisher of The New Stack, and correspondent B. Cameron Gain discuss with their guests how service meshes help DevOps teams stave off the pain of managing complex cloud native as well as legacy environments, and how that can be translated into cost savings. With featured speakers Shawn Wormke, vice president and general manager, Aspen Mesh, and Tracy Miranda, director of open source community, CloudBees, they also cover what service meshes can — and cannot — do to help meet business goals and what to expect in the future.

Alex Williams: Hello, welcome to The New Stack Makers, a podcast where we talk about at scale application development, deployment and management. 

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise. 

Alex Williams: Hey, we’re here for another episode of The New Stack Makers, and we are completing our three part series on service mesh and Istio in a discussion with Shawn Wormke, Vice President and General Manager of Aspen Mesh, and Tracy Miranda, Director of Open Source Community at CloudBees, and my good pal and colleague, Bruce Gain, who is co-host today. Bruce is a correspondent with The New Stack. Great to have you all here today. 

Tracy Miranda: Hi, Alex. Thanks for having me. 

Alex Williams: You're welcome. I just want to start with a note that we're not talking about the latest machinations with Istio today. We're focusing on engineering practices. So we are not going to be talking about the Open User Commons [Open Usage Commons] today. There'll be plenty more discussions on that topic, I'm sure, as time goes on. But for us, our focus is on how you amplify value with a service mesh. What is it that provides the value in a service mesh architecture? And I think this gets down, in many ways, to that transformation that we've seen from monolithic architectures to cloud, and now to microservices. In a cloud environment you could be working with a platform as a service, and you might have multiple APIs. But now, with component based architectures and container technologies, the lifecycle has changed a lot. And that means people have to be aware of a lot more than just a few APIs. Now it's a lot of other issues, which gets into monitoring, observability, distributed tracing. It goes on and on and on. So both Shawn and Tracy are here to help us with some of the questions that we have.

Alex Williams: And so I want to get started with a little discussion about the developer out there who is spending so much time on maintenance issues such as debugging and refactoring. They spend hours a week on bad code. It's such a big issue that we found, in some data, that it's nearly an $85 billion worldwide opportunity cost that is lost annually. Now, you can think of opportunity cost as also just the opposite of that, the sunk cost. So you just have to take into consideration what your sunk cost is. But it's still a huge issue. And so in this area, we want to understand how a service mesh can help increase engineering efficiency to solve these business challenges. I wanted to start off asking about the developer out there who is building microservices: what part of their daily work is still quite manual? We have seen a lot about automation, and we're starting to see a lot more automation come into processes for developers. But what is still manual for them to take care of? I think of things like having to increasingly do configurations in the Kubernetes environment.

Shawn Wormke: Alex, I think that's a great question. I think we've grown a lot as an industry in automating our pipelines and a lot of our testing and deployments and pieces like that. But once applications are out and running in production, a lot of the manual work comes from monitoring those things. When problems start to happen, how do we efficiently get information out of those applications in a way that helps us understand their behavior in a production and runtime environment? And how do we really get to the root cause of the problem, fix it, and ensure that we have a good user experience for our customers? A big part of that is still done manually. We have lots of tools to gather information and put it into things like Prometheus and OpenTracing and Jaeger and tools like that. But figuring out which things we need to look at, which events inside of there are causing the problems, correlating those all together, getting those to the teams that can take action on them: all of that is still quite manual in our industry. And that's where I think a service mesh can really help, by consolidating that down into a single place to look, a uniform place for all of the teams to come together and get that data and information from.
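
As a concrete picture of the manual side Shawn describes: without a mesh, each team wires up its own metrics by hand, roughly like the Go sketch below, which uses the widely adopted Prometheus client library; the metric name, label and endpoint are made up for illustration. A mesh's sidecars emit uniform traffic metrics like these for every service with no code changes.

```go
// Sketch: hand-wired, per-application Prometheus instrumentation,
// the kind of bookkeeping a service mesh makes uniform and automatic.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "app_http_requests_total", // invented metric name
		Help: "Requests handled, by path.",
	},
	[]string{"path"},
)

func main() {
	prometheus.MustRegister(requests)

	http.HandleFunc("/checkout", func(w http.ResponseWriter, r *http.Request) {
		requests.WithLabelValues(r.URL.Path).Inc() // manual bookkeeping in every handler
		w.Write([]byte("ok"))
	})

	// Prometheus scrapes this endpoint, just as it scrapes a mesh sidecar.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```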

Bruce Cameron Gain: Yeah, so as far as those maintenance issues go, prior to the implementation of service mesh, does the onus of that sometimes fall on the developers now, or is this still an operations problem exclusively? 

Shawn Wormke: That’s a great question, Bruce. I think that traditionally what we saw was that a lot of that work was being done inside of the applications themselves and it was being implemented in a lot of different ways by the development teams. And that’s where that question of uniformity comes from. What we’ve seen with our customers is they want to move that down underneath the application and let the application owners really focus on business value code and let the operations teams, the ops part of the DevOps team, really work on providing them the tooling and the common infrastructure it takes to run those things in production in a large enterprise environment. 

Alex Williams: So, Tracy, you understand software lifecycle management, and a question I've been asking lately is about what people really are accustomed to and how that's changing. We are clearly in agreement that container technologies are here to stay. I think we're in clear agreement that monolithic architectures and microservices environments have practices that are similar. But some things are disappearing; some things are fading away. How is that affecting software lifecycle management as we make this transition?

Tracy Miranda: Yeah, I think that's a great question, and it is the case that so many things are changing. With the whole onset of containers and microservices, I think we've only just started to figure out what that means. I come at it a lot from the continuous delivery perspective, and one of the big discussions we're having there is: if you have an app that previously was a monolith and now it's a bunch of microservices, how do you even define what its boundaries are today? And, you know, how does that influence the way you might deliver different things? And when it comes to service mesh, I think that's really exciting. It's an area where, again, we've just barely scratched the surface of the things we can do with service meshes, because they connect all the different services together. And then you can have technology that sits on top of them. If you take the example of something like (?), then suddenly you open up this whole world of new things like canarying or monitoring health, this whole bucket of what we sort of term, these days, progressive delivery. And these are super powerful things that you just couldn't do pre-containers, pre-distributed systems. So I think it's really exciting just to watch how people handle it and get used to it, and then the innovation that's going to come as a result.

Bruce Cameron Gain: Would you say that microservices and containerization are really conducive to the service mesh? We're talking about service meshes, and you spoke about the wide variety of environments now, as we move from, say, a legacy system to multi-clouds, et cetera. So I was just wondering, as far as the technology goes: are service meshes really conducive to containerization and microservices? And if so, why and how?

Tracy Miranda: Yeah, absolutely. I think there is a threshold where it makes sense, depending on the number of services. If you start off with a very simple architecture and you're not trying to orchestrate too many things, then perhaps a service mesh is just a level of complexity that you don't need. But it doesn't take long before you have a significant system where you want to take advantage of the different capabilities. And if I can talk about where I'd like to see it go: I think there are some really powerful benefits you could get once you start connecting up all the different services. I was talking to the folks on the Jenkins X team, and James Rawlings was talking about how, once you have a service mesh, you could start to imagine some really clever things. For instance, you can have preview environments. In general, with CI/CD in Jenkins X, before you commit your code, you can build it and run a preview environment, so you can see the change you made in something that's not quite production but looks pretty realistic, and it's a good way to evaluate the patch. Now, if you throw in a service mesh, maybe you can start to do something really clever like shadowing traffic: you could take some real world traffic that would go to your production environment and send a copy of it to that preview environment, and now you're testing it with some actual data. So it's starting to become really powerful, what you can do. And I don't know that you can do this yet; I think it's a bit theoretical. But once people start to appreciate the benefits you get for what seems like a complexity cost at the beginning, and as the tooling becomes easier, I think it will start to become a no-brainer that you want to have this in your systems.
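
Istio can express this shadowing idea declaratively as a mirror setting on a route. To show the mechanics, here is a hand-rolled Go sketch of the same idea: the production backend answers every request while a copy goes to a preview environment and its response is discarded. The service names and ports are invented for the example.

```go
// Sketch: traffic shadowing by hand. In a mesh this is configuration,
// not application code; hostnames below are hypothetical.
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	prod, _ := url.Parse("http://checkout-prod:8080") // assumed name
	preview := "http://checkout-preview:8080"         // assumed name
	proxy := httputil.NewSingleHostReverseProxy(prod)

	http.ListenAndServe(":8080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Buffer the body so it can be sent to both backends.
		body, _ := io.ReadAll(r.Body)
		r.Body = io.NopCloser(bytes.NewReader(body))

		// Capture what the goroutine needs before the handler returns.
		method, uri, headers := r.Method, r.URL.RequestURI(), r.Header.Clone()

		// Fire-and-forget copy to the preview environment.
		go func() {
			shadow, err := http.NewRequest(method, preview+uri, bytes.NewReader(body))
			if err != nil {
				return
			}
			shadow.Header = headers
			if resp, err := http.DefaultClient.Do(shadow); err == nil {
				resp.Body.Close() // observed out of band, never returned to the user
			}
		}()

		proxy.ServeHTTP(w, r) // production answers the real request
	}))
}
```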

Bruce Cameron Gain: So we’re still in the early stages. I don’t think a lot of people realize that. 

Tracy Miranda: Yeah, absolutely. I think people are just getting their heads around it: why do we need this? And we just need easy ways to get it into folks' hands and help them steer clear of the pitfalls, so that they can get to, I think, all the real magic you can start to do once you've got this orchestration, once you've got all these things connected. And then you can start to do pretty clever things.

Alex Williams: When I hear people talk about scratching the surface on things, it reminds me of what discovery means and how you enable discovery. If you don't have a discovery process, you'll never know what is unknown to you. And when you start discovering those unknowns, you start finding more that you did not know before. Maybe we could talk, Shawn, a little bit about how service meshes are architected, for instance, and how the actual work with service mesh architectures helps you discover those unknowns.

Shawn Wormke: Yeah, that's a great question, Alex. To go back a little bit to Tracy's response around the complexity and when you need a service mesh, I think that's where it all starts, where customers start to find their unknowns. We oftentimes talk to our customers about whether they actually need a service mesh at this point in their lifecycle. And usually what we tell them is: if you can no longer draw your microservices architecture on a whiteboard or a piece of paper and be sure that it actually looks like that when it's deployed into production, it's probably time to start thinking about a service mesh. And so that's a piece of those unknowns. What we see is that when we deploy these service meshes, which provide the visibility and observability and just the understanding of how service A is talking to service B, oftentimes customers start to recognize that they're talking to services that they didn't actually know were in their network. For example, they're talking to services that are in AWS when they supposedly have a private cloud architecture. And so we start to uncover a lot of things inside people's microservice and container architectures that they had no idea were going on.

Shawn Wormke: I think people sort of take for granted the fact that these containers are a unique unit of work and we're just going to deploy them and let them run, and we don't have to worry about it because the DevOps team is the one that owns and manages it. But ultimately, in production at large scale and in large corporations, they have a data security policy that they have to follow. They have compliance needs that they need to meet, and they need things like a service mesh running in there to discover the unknowns, to fix them, to ensure that services can't talk to the things they're not supposed to, and that the things that are talking to other things are who they say they are and that you trust them. So I think that's a big part of why service mesh architectures will be critical for large scale production deployments in the future. And like Tracy said, we're just starting to scratch the surface on the uses for these things. And it's wide, and almost dependent on the vertical or the industry that you're deploying them in, from service providers to enterprises, to cloud and cloud native applications.

Bruce Cameron Gain: You touched upon this already a bit, but what are some of the capabilities that service meshes offer that we can count on, in addition to the security, of course, and logging capabilities, et cetera? What are some examples?

Shawn Wormke: First and foremost, I think there are a lot of old problems that need to be solved in new form factors and new ways. So what it boils down to, first and foremost, is a bunch of traffic management features that you need in order to deploy things at scale. Right. So taking things from test to preproduction to production: in your test environments, things are simple, things are generally running stable, there's not a lot of traffic happening, and things are fine. But when you get into production, you need traffic management features like basic load balancing between the services that are running around, and that load balancing has to be intelligent; it has to understand how those services are responding, making sure that your applications are running as efficiently as possible. Things like circuit breaking, understanding when a service no longer exists, so that rather than waiting for the TCP timeout to happen and two minutes' worth of requests going off into a black hole, those things don't happen when you have a service mesh there. Then we move into security features. Like you said, there are a lot of encryption features inside a service mesh: people use them for certificate management inside their container environments, mutual TLS authentication, authorization of applications. But then we move into more of the day two sort of features, and that's integration with a lot of their enterprise systems. Most enterprises are complex places that have legacy applications talking to greenfield cloud native applications. They need a way for all of those systems to talk together, and service mesh can be that bridge between the two. We see people using that oftentimes even with just certificate management and mTLS, enabling that in their legacy applications using features of Istio and its certificate management pieces, all the way down to, like you said, logging, tracing and visibility features. Being able to gather telemetry in a single place, consistently across all of your applications, provides a huge amount of benefit there as far as architectures go.
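
As a concrete picture of the circuit breaking Shawn mentions: after a run of consecutive failures the breaker opens, and calls fail fast for a cooldown period instead of every request waiting out a TCP timeout. In Istio this is configuration rather than application code; the Go sketch below, with invented threshold semantics, just illustrates the behavior.

```go
// Sketch: a minimal circuit breaker. A mesh sidecar applies the same
// pattern from configuration, with no application code at all.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errOpen = errors.New("circuit open: failing fast")

type breaker struct {
	mu        sync.Mutex
	failures  int           // consecutive failures so far
	threshold int           // failures that trip the breaker
	cooldown  time.Duration // how long to shed load once tripped
	openUntil time.Time
}

// call runs fn unless the breaker is open, tracking consecutive failures.
func (b *breaker) call(fn func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return errOpen // don't even attempt the request
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openUntil = time.Now().Add(b.cooldown) // trip the breaker
			b.failures = 0
		}
		return err
	}
	b.failures = 0 // success resets the count
	return nil
}

func main() {
	b := &breaker{threshold: 3, cooldown: 5 * time.Second}
	flaky := func() error { return errors.New("service unreachable") }

	for i := 0; i < 5; i++ {
		fmt.Println(b.call(flaky)) // after 3 failures, calls fail fast
	}
}
```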

Bruce Cameron Gain: And Tracy, in the big picture sense, how would you say this integrates with overall software lifecycle management? That's a personal interest as well. I would be curious to see how that also overlaps with the developer experience.

Tracy Miranda: So, on the lifecycle side, which I'll tackle first: I think when it comes to getting your code out into production, and I think we've touched on this, but let me emphasize it, this comes down to your deployment methodology. There are many different deployment methodologies, but ultimately the one you want to get to, in the ideal situation, is canary deployments. That ticks all the boxes in terms of high availability, responsiveness, progressive rollout and the ability to roll back. And the only way you're going to get to that is by using a service mesh and taking advantage of the load balancing. So, you know, I think that is where everybody is heading. And as you build in the necessary infrastructure, that makes all the difference to how you can then get features into the hands of your customers and how you can get that feedback: is it going well? Is there going to be some problem? Should we dial it back? And can we do that easily, in an automated way, without having to suffer a big failure for customers? So when we talk about lifecycles, it's towards the end of the lifecycle, getting things into the hands of users.
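
Mechanically, the canary load balancing Tracy describes boils down to weighted selection between a stable version and a canary version of a service; in Istio you would declare the weights on a route and the proxy does the rest. Here is a toy Go sketch of that selection, with made-up service names and a 95/5 split.

```go
// Sketch: weighted random choice between a stable and a canary backend,
// a toy model of the selection a mesh sidecar performs per request.
package main

import (
	"fmt"
	"math/rand"
)

type backend struct {
	name   string
	weight int // relative share of traffic
}

// pick returns a backend name with probability proportional to its weight.
func pick(backends []backend) string {
	total := 0
	for _, b := range backends {
		total += b.weight
	}
	n := rand.Intn(total)
	for _, b := range backends {
		if n < b.weight {
			return b.name
		}
		n -= b.weight
	}
	return backends[len(backends)-1].name
}

func main() {
	pool := []backend{{"reviews-stable", 95}, {"reviews-canary", 5}} // assumed names
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(pool)]++
	}
	fmt.Println(counts) // roughly a 95/5 split; raise the canary weight as confidence grows
}
```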

Tracy Miranda: And then on the experience side, you asked specifically about developer experience. I think there's still a lot of confusion. If I take developer experience as a whole, I don't think there are enough easy ways to do it yet. We've got the early adopters, who are super good, able to get in, able to deal with different situations and know what they're doing. But there's still a lot we can do to roll it out for the masses. And I have no doubt all the various communities developing service meshes are going to come up with things that make it easier to use, easier to understand when to apply things, how to configure things. And I think that's the challenge with some of the complexity: prepackaging things in a way that is easier to get running, but not oversimplifying.

Bruce Cameron Gain: Tracy, you touched upon this a bit, but what kind of learning curve can DevOps teams expect? And does the onus fall on the operations team, the developer teams, the security teams, or whom?

Tracy Miranda: Yeah, the reality is, you take something like Kubernetes, and while we have a lot more people talking about it today, I still find the vast majority of folks haven't even gotten a proper handle on the distributed nature of Kubernetes. And then you start to throw in the rapid release cycles: how quickly is this version of Kubernetes I'm using going to become unsupported? How quickly do I have to keep up with innovations? It's just a lot to contend with. So I think that's where it folds into the best practices we have around continuous delivery and the software lifecycle, and then, I'd say, going back to basics on how teams do that, really looking at the Accelerate book from Nicole Forsgren and Jez Humble, those kinds of underlying principles which your team is going to need to adopt any new technology, including service mesh.

Alex Williams: Back to the basics. Now, Shawn, my question for you is about the open source communities out there who are doing most of the upstream development. Often they are so immersed in the actual code and making sure it works that the developer experience becomes another parallel challenge to manage. How is that parallel challenge getting managed? Because we know very well that the Kubernetes plumbing is pretty much done. You can use Kubernetes. The question now is how you build on top of it, and we are just starting to see how organizations are building on top of it. I think this speaks to what Tracy was saying: we are starting to see some deployments, but by no means are we seeing everyone do it across their organizations. So when you're thinking about that upstream development, what are you thinking about?

Shawn Wormke: Yeah, so Aspen Mesh is very active in the Istio community, and we have worked very hard to represent our enterprise customers in that community because, to your point Alex, those developers are mired in the code. They are focused on producing the highest quality piece of software that they can, but that doesn't always translate into a good end user experience, and not always into a good, manageable product. So a big part of what we do is represent our enterprise customers, and quite frankly our service provider customers, in that environment, making sure the community makes at least sane choices that don't put these large deployments in a place they can't recover from, or leave them with an unstable network. And then, honestly, that's an opportunity for companies like Aspen Mesh to build on top of that. A large part of what we do is focus on how to make enterprises successful using that software, and on dealing with the lifecycle management of the Istio pieces themselves. How does this piece of technology work within large organizations? We talk a lot to our customers about that and where it fits into their organizations. Oftentimes we see developers bringing the technology into the company in a discovery, proof of concept mode. Eventually it gets turned over to a platform team, which then works on helping their developers have access to the pieces that they need and understand, while the platform team runs the other part of the business. And that's really what we have focused on over the last year or so: helping customers integrate this into their organizational structures just as much as integrating the technology into their Kubernetes stack. And that's something that I think is oftentimes overlooked in many open source communities: how this actually fits and how it actually works in a real, large customer deployment.

Bruce Cameron Gain: So we have security, observability, traffic management, logging capabilities and whatnot. But among those areas, where do we really need to see improvements in the immediate to mid-term? Or in all three?

Shawn Wormke: I think we'll continue to see improvements in all three. I would say the large majority of our customers come to us for the security aspects first, for what's there. They have some need for the encryption, so they come to us for that. But in the long term, the real potential here is around the observability pieces, because a lot of the traffic management features, as I mentioned a little earlier, are old problems that have been solved before; we just need to repackage them and reformat them. So those are known problems, and it's relatively straightforward to solve them. I won't say it's easy, but I think it's relatively straightforward to solve them in this new world. The real opportunities are around observability: helping people understand what's going on and helping them provide the best user experience they can to their end customers, because that's really where they want to focus, the profit center of the business, not the cost center side. And so reducing the amount of effort it takes for people to find and fix issues, deploy fixes, ensure that they're going to work, and ensure that they're going to solve the problems they were originally trying to fix is a big part of where we can see a lot of improvement in service mesh overall, and in the industry in general, I think.

Bruce Cameron Gain: So the service mesh in many respects is just the starting point. 

Shawn Wormke: Absolutely. I think you can think of it as the tap into the network that pulls all that stuff out. Right. I think the real work and the real opportunity then is on top of that, what you do with that data, how well you organize it, how you get it to the people that need it and how they take action on it and make decisions off of that data. 

Tracy Miranda: What I hope it will enable is this culture of experimentation where, you know, now you have all these things at your fingertips, and you can afford to say, I have this theory, we're going to try this. Now we've got such fine grained control over traffic management and access that we can afford to see what happens and see how things play out. And that could be the really exciting part: getting to a business that is using a service mesh and taking full advantage of it.

Alex Williams: So when you're thinking about taking full advantage of it: one of the most interesting aspects of Kubernetes is how it's built for a stateless environment, but so much of the work now is to make Kubernetes work with stateful environments. You have a lot of applications out there that need to be thought of in a way that considers issues such as storage, and storage and traditional networking in traditional enterprises are based upon architectures that might be 10, 20 years old. Those architectures are monolithic: you just pour the code into them, and then you've got to figure out how to get them all configured, and then you've got to get them running, and on and on and on. How are you thinking about stateful applications in software lifecycle management with service mesh in mind?

Tracy Miranda: That's a good question, and I'm not sure I have a good answer for it. I think it's emerging. I gave a talk, co-presenting at KubeCon, where we looked at how you take a monolith, break it up, run it in the cloud with microservices and take advantage of that. But I have to say, even in that talk there's so much to cover that we don't even get to the service mesh aspects. So it's not that obvious, and I certainly don't have a good answer for that today.

Alex Williams: Then I guess, Shawn, I'm wondering, what's the use of a service mesh then? Because that's pretty much what everyone else is trying to figure out: how to get these stateful applications to work. And it speaks to why you don't have adoption in Kubernetes. The pipes are great, but if no one's using the pipes, who cares?

Shawn Wormke: Yeah, I think it's going to be a big area of focus for many of these technologies in the coming years. This is where Kubernetes and service mesh are still super early. But this is sort of where the rubber meets the road: how you can actually deploy it in these large places that aren't all greenfield and aren't cloud native first. It's when the large banks and the large airlines and manufacturing companies can start to take this, use their legacy systems with their new things, and enable a speed that they haven't seen before. It's going to be a big challenge, and it's something we've been working on with customers every day to try to help them figure out. A lot of it is really understanding that we're going to have to build things that don't always take the greenfield-first approach. We're going to have to embrace the fact that there is brownfield out there. We're going to have to understand that there are legacy protocols running around that we're going to have to support, things like that, and that stateful things are not going away any time soon. I mean, if we're still running code for banking applications that was written in the 60s, 70s and 80s, I don't think that stuff's going away any time soon. So just like Tracy said, this is a part we're going to have to figure out, and we will figure it out if we want these types of technologies to be successful in the future, because companies have huge investments in that legacy infrastructure and they need to bring it forward, whether it's for failing systems or financials or whatever. It has to come forward for sure.

Bruce Cameron Gain: Maybe you're underselling it a bit. I mean, my sense is that, as far as observability goes, you are able to at least somewhat manage, or at least observe, your legacy storage, for example, and particularly your databases. So, right now, how good can it be, and how does automation come into play?

Shawn Wormke: Yeah, I think we can get some amount of observability there. But to get all the benefits out of the service mesh, it needs to understand those protocols, and that's where we're having a little bit of an issue. Right. If you compare a layer-7 trace to what we can see for a SQL database running over TCP, we're not going to be able to get you the same level of visibility on those two things. So that's where I'm potentially underselling a little bit. But people also expect all the amazing features a service mesh has when they put it there, and otherwise it sometimes just looks like a packet capture to them. There are security pieces we can do, though. For example, we can extend mTLS outside of Kubernetes clusters and outside of the service mesh, and we can do a bunch of things around egress and ingress control for these legacy things on a very granular level that wasn't available before. But again, it's so early for those things. A lot of our customers have gone down the path of looking at things in two buckets: we have the new and we have the old. Then they eventually get to the point of, how do we make these two things work together? That's where I think we're going to be spending a lot of time over the next eighteen to twenty-four months, helping them figure that out as we roll these into real environments. Because, you know, we've worked with many of these customers who say, oh, we're all greenfield, greenfield, greenfield, and then we work with them for a couple of months and it's like, oh, but we need to access this database or the storage system over here. And that's where this comes from.
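For readers who want to see what that granular egress control can look like in practice, here is a rough sketch in Istio configuration, assuming the mesh's outboundTrafficPolicy has been set to REGISTRY_ONLY so that only explicitly declared external destinations are reachable. The service name, hostname, and port below are all hypothetical:

```yaml
# Hypothetical sketch: declaring a legacy database that lives outside the
# mesh so Istio can apply egress policy and connection-level telemetry to it.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: legacy-db                    # illustrative name
spec:
  hosts:
    - legacy-db.example.internal     # assumed internal hostname
  location: MESH_EXTERNAL            # the workload runs outside the mesh
  ports:
    - number: 5432
      name: tcp-postgres
      protocol: TCP                  # plain TCP, not HTTP
  resolution: DNS
```

Because the declared protocol is plain TCP rather than HTTP, the mesh can control which workloads may reach the database and report connection-level metrics, but not the layer-7 traces Shawn contrasts it with.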

Bruce Cameron Gain: Tracy, would you agree? 

Tracy Miranda: Yeah, I think migration is something the whole industry is struggling with. I certainly see it in continuous delivery, and at the Continuous Delivery Foundation, you know, we have that whole range, from ten-year-old technology to the brand new, and it's a massive divide. So we have end users trying to share case studies of what they've done, and we're hoping we can have these conversations and start seeing the patterns people can use to simplify that. But I think it's still the million-dollar question at the moment: how do you bridge the new to the old, and how do you not lose all the investment you have in existing systems? I don't think we have good answers today, so it's something we have to work on as an industry.

Shawn Wormke: And I think, too, you can't overlook the fact that the expertise to do some of these things is in very limited supply. To think that all companies are going to have access to the talent it takes to do some of this is a stretch. Right. So there has to be a massive amount of learning by all the people involved to make these kinds of transitions successful, because it's very hard to hire the right people, and it's very hard to pull people off their existing jobs working on legacy systems to learn the new things. There's a lot that has to happen in the next few years to make this transition successful.

Bruce Cameron Gain: I’ve actually heard there’s been an emergence of in-house, for example, of the service mesh expert, resident expert, kind of like the Jenkins resident expert. And what I’m wondering, though, is, you know, eventually during the next 18 to 24 months, will the automation aspects of service mesh come into play so that it’ll in many respects, not only will it help to reduce the learning curve, but definitely I would expect it to reduce the amount of operations work. I mean, that’s the whole thing, isn’t it? 

Shawn Wormke: Yeah. It’s interesting you say service mesh expert. I would say it’s experts, plural. We often times see multiple small teams working on this. And and oftentimes there’s some of the most valuable talent inside of the organization. So they’re the architects, they’re the senior engineers who are working on this. So that’s one of the things we talk a lot about. And again, another place where commercialisation of some of this does come in and help is that if you can take a team of six or eight people who are managing and running this and help them deal with their lifecycle management, help them deal with upgrades and making sure that they work and all that, you can reduce that team from six or eight people down to one. And that’s a lot better for companies. They can put those seven other brains to work on next generation problems or solving some of these stateless problems or working on retraining other folks inside of the organization. But again, it’s early days for these things. And this is bleeding edge technology. These are early adopters. And so the price that they’re paying for that, whether that’s to commercial vendors or whether it’s for the people it takes in-house to run that, is going to be high. It’ll eventually come down as automation picks up, as productization comes in, as the open source community continues to evolve the product and make things easier for their users, that costs will eventually come down to the end users for sure. 

Bruce Cameron Gain: Tracy, I was wondering: what is your vision for when and how automation should take over, in the ways it's just starting to now?

Tracy Miranda: Yes, just going back to the question you had on developer experience, maybe I can talk about a couple of things I'm seeing. One example I'm specifically aware of is Jenkins X, which tries to act as an orchestration tool. It's pulling together different tools and trying to simplify the user experience, so you don't need to know everything there is to know about Kubernetes or your specific distribution. Today you can get started with it, use it with Istio and with Flagger, and automatically get set up with canary deployments with very few commands, and it will set things up for you. I don't think it's the case that you can ignore how it works; you still need to understand what's going on under the hood and be able to deal with things. But the difference is that it gets you up and running: you can follow a pre-canned example, have something running, and then tweak it, so it's not like you're putting together your initial system from a set of parts. That's one thing I've seen: we'll start to have these high-level tools that aim to pull things together and make some of the decisions for you in a more opinionated way, with the expectation of, OK, I just want to get going first, take some defaults for all the various decisions I could make, and then figure out how to tweak it and what's going on under the hood. And I think that helps accelerate getting started.
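To give a sense of what those few commands generate behind the scenes, here is a rough sketch of a Flagger Canary resource driving an Istio-based canary rollout. The app name, namespace, port, and thresholds are all hypothetical; Flagger then manipulates Istio's traffic routing automatically, shifting weight to the new version only while the analysis metrics hold:

```yaml
# Hypothetical sketch of a Flagger Canary resource for an Istio mesh.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo            # the deployment being progressively rolled out
  service:
    port: 9898               # illustrative service port
  analysis:
    interval: 1m             # how often Flagger evaluates the metrics below
    threshold: 5             # failed checks before automatic rollback
    maxWeight: 50            # max traffic percentage shifted to the canary
    stepWeight: 10           # traffic increment per successful check
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99            # roll back if success rate drops below 99%
        interval: 1m
```

If the success rate dips below the threshold during any check, the traffic shifts back to the stable version automatically, which is exactly the "take the defaults first, tweak later" workflow Tracy describes.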

Alex Williams: Great. I think we have time for about two more questions, so I'm going to ask one and then Bruce will ask one, and I want to recap a bit of what we've discussed here. We've talked a lot about service mesh, the need for it, and how it can help us understand those unknown unknowns, as I like to describe them. We've talked about how it fits into software lifecycle management as we think more about Kubernetes, and how it's going to continue to fit in there. And we've talked about the challenges companies face with brownfield and legacy applications, and the issues of state and statelessness that are inherent in complex at-scale architectures. Tracy brought up the million-dollar question. So I want to get an idea of how that million-dollar question has been resolved in the past, and what we can learn, Tracy, from the evolution of continuous delivery. For instance, I was at a Kong event yesterday where we talked about continuous integration and how it's really not as relevant anymore; I think platform as a service is not as relevant anymore either, and now we're talking about containers as a service. What can we learn from continuous delivery and apply to service mesh to help us get answers to those big questions?

Tracy Miranda: Yeah, good question. There are different levels at which you can answer that, but I'm going to go back to open source. I'm a big fan of open source and a big fan of community, and I think we have pretty powerful communities around this technology. That's what's going to make the difference in how we solve these problems. If somebody solves one, or comes up with some clever innovation that works for them, how does that get shared? How do we communicate: hey, look what I've done, look how I've solved this problem, what do you think, is this good? It's letting people have that freeform innovation, but then coming back, sharing it, building on it, and turning it into tooling that can be democratized for people to use. That's the beauty of open source: the code is available, it's permissionless, and everybody is trying to solve things and get to the next level. So I think the open source community is a big part of how we will solve these problems, and we'll do it with the entire ecosystem, the companies involved and the people involved.

Bruce Cameron Gain: I guess my question, then, would be: as far as the open source community's contribution goes, just how critical has it been? You touched on this, and you mentioned we're obviously in the early stages, but have there been any really pleasant surprises, or contributions that really stood out and will have an effect over the next year or so?

Tracy Miranda: I referenced this earlier: tools like Flagger, which sits on top of the service mesh and helps instruct it. I think that's a perfect example of the kind of innovative functionality that will help us realize the gains to be had from service mesh technology.

Bruce Cameron Gain: And Shawn? 

Shawn Wormke: Yeah, I agree, and we're actually big fans of Flagger as well. The overall ecosystem coming together to solve these problems has been a very interesting thing to watch. But I think the most pleasant surprise for me has been watching the maturation of the Istio project over the last two years: the quality improvements, the scalability improvements, all the way down to now having an early disclosure process, which is a recognition that companies are trying to make their living off of it and large companies are deploying it, and they have to have a way to deal with real-world enterprise problems. That's been a great thing for me to watch over the years. I know my team and my customers are very appreciative of all the work the community does, and we love being partners with them and part of that ecosystem. It's been a great couple of years for us.

Alex Williams: Well, let’s hope it’s another great couple of years ahead. We talked a lot about service mesh and I get the sense that a lot of people are still learning quite a bit. And I think that goes into the actual early adopters themselves. And in this series that we’ve had with the Aspen Mesh team and others in the industry, that’s really quite apparent. And so we look forward to understanding how the community is going to start working through these issues more. But I think it also speaks to the larger Kubernetes community and how they’re working through issues as they start to build more on top of this Kubernetes architecture. So I want to thank you all for participating today. Shawn Worme of Aspen Mesh, Tracy Miranda from CloudBees, thank you so much for joining us. And Bruce Gain. Good to see you here today. Thank you very much for your time. 

Shawn Wormke: Thanks, Alex.

Tracy Miranda: Thanks for having me. This was great. 

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise. 

Alex Williams: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts. Please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time.