How Istio is Built to Boost Engineering Efficiency

The New Stack Makers Podcast

One of the bright points to emerge in Kubernetes management is how the core capabilities of the Istio service mesh can help make engineering teams more efficient in running multicluster applications. In this edition of The New Stack Makers podcast, The New Stack spoke with Dan Berg, distinguished engineer, IBM Cloud Kubernetes Service and Istio, and Neeraj Poddar, co-founder and chief architect at Aspen Mesh, F5 Networks. They discussed Istio’s wide reach for Kubernetes management and what to look for in the future. Alex Williams, founder and publisher of The New Stack, hosted this episode.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.

Alex Williams: Hey, it’s another episode of The New Stack Makers, and today the topic is Istio and engineering management. Today, I am joined for a conversation about Istio with Neeraj Poddar, co-founder and chief architect at Aspen Mesh. Hello, Neeraj, how are you?

Neeraj Poddar: I’m doing good. It’s great to be here, Alex.

Alex Williams: Thank you for joining us – you’re live from Boulder. And live from Raleigh, North Carolina, is Dan Berg, Distinguished Engineer at IBM Cloud Kubernetes Service and Istio. That’s a mouthful.

Dan Berg: Yes, sir. I was worried there for a moment you weren’t going to be able to get Kubernetes out.

Alex Williams: You know, it’s been that way lately. Actually, we’re just finishing the second edition of the eBook we first wrote in 2017 about Kubernetes. Service mesh was just beginning to be discussed then, and some of the articles I was reading were saying things like, well, Istio is still in its early days. And now today you’re telling me that you have more meetings related to Istio than you can go to. I don’t know what that means. What does that mean to you both? What does that say about Istio, and what is Istio, for those who may not be familiar with it?

Neeraj Poddar: You’re right. I mean, we have so many meetings and discussions, both asynchronously and synchronously, that it’s great to see the community grow. And like you’re saying, from three years ago to where we are now, it’s amazing, not just the interest from developers but also the interest from end users, the feedback, and then making the product and the whole community better. So, coming to what Istio is: Istio is an open source service mesh platform for simplifying microservices communication. In simple terms, it handles a lot of the complicated pieces around microservices communicating with each other, things like enforcing policies, managing certificates, and surfacing relevant telemetry so that you can understand what’s happening in your cluster. Those problems become more and more complicated as you add more microservices. So a service mesh, and Istio in particular, takes that burden away from the developers and moves it into the infrastructure. It’s basically decoupling the two things and enabling them to be successful at the same time.
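To make that decoupling concrete, here is a minimal sketch of how a team typically opts workloads into the mesh: labeling a namespace (a hypothetical "demo" namespace here) so that Istio injects the Envoy sidecar automatically, with no change to application code or images.

```yaml
# Minimal sketch: opt a namespace into automatic sidecar injection.
# The namespace name "demo" is hypothetical; the label is Istio's standard mechanism.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    istio-injection: enabled   # pods created here get the Envoy proxy injected alongside the app container
```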

Alex Williams: Now, Dan, you’ve been around a bit and you have your own experiences with APIs and how they evolved. Is this why we’re seeing this amazing interest in Istio? Because it takes the API to that next evolution? Is it the network effect on APIs that we’re seeing, or is it something different that’s relevant to so many people?

Dan Berg: Well, I think it’s a combination of a few things. And first off, thanks for calling me old by saying I’ve been around for a while.

Dan Berg: So I think it’s a combination of several different things. First and foremost, near and dear to my heart, obviously, is containers and the evolution of containers, especially as containers have been brought to the cloud and really driving more cloud native solutions, which drives distributed solutions in these clouds, which is driving more use of microservices. Microservices aren’t new; it’s just that they’re being applied in a new way in cloud environments. Because of that, there’s a lot of complexity, and the distribution and delivery of those containers is a bit different than what we’ve seen with traditional VMs in the past, which means how you manage microservices is different. You need a way to drive your DevOps processes that are GitOps-based, API-driven, CLI-driven. So what that naturally means is we need a better way of managing the microservices in your cloud. The evolution of Istio as a service mesh, which I often think of as the ability to program your network and your network policies through an API, is a natural fit for where we are today with cloud native applications based on containers. This is the modern way to manage your microservices.

Neeraj Poddar: The way Dan explained it, it’s a natural progression. I especially want to mention this in the context of network policies: even when companies migrate from monoliths to microservices, the same organizational policies still apply. No one wants to give those up, and you don’t want to embed them into your applications. So this is the key missing piece that lets you migrate or even scale. It gives you both, wherever you are in your journey.

Alex Williams: So the migration and the scale. And a lot of it almost comes down to the user experience, doesn’t it? I mean, Istio is very well suited to writing generic, reusable software, isn’t it? And to managing these interservice communications, which relates directly to the network, doesn’t it?

Dan Berg: Yeah, in many ways it does. A big, big part of this, though, is that it removes a lot of the burden and the lock-in from your application code. So you’re not changing your application to adopt and code to a certain microservices architecture or microservices programming model; that is abstracted away with the use of these sidecars, which are a pivotal control point within the application. But from a developer standpoint, what’s really nice about this is that now you can declare your intent. A security officer can declare their intent. As Neeraj was saying about policies, you can drive these declarations through Istio without having to go through and completely modify your code in order to get this level of control.
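As a rough illustration of declaring intent, a security officer’s access rule might look like the sketch below, which assumes hypothetical "frontend" and "payments" workloads; Istio enforces it at the sidecars, with no application change.

```yaml
# Sketch only: allow the payments workload to be called solely by the frontend's
# service account, and only over GET and POST. All names are hypothetical.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-frontend
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments              # the workload this policy protects
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/shop/sa/frontend"]   # identity of the allowed caller
    to:
    - operation:
        methods: ["GET", "POST"]
```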

Alex Williams: Neeraj, so what’s the Aspen Mesh view on that? And I know you talk a lot about engineering management. This relates directly to engineering management in many ways, doesn’t it, in terms of being able to take care of those concerns so you can have the reusable software?

Neeraj Poddar: Absolutely. I mean, when I think of engineering management, I also think of engineering efficiency. And they both relate in a very interesting way, where we want to make sure we are always achieving business outcomes. There are two or three business outcomes here that we want our engineering teams to achieve. We want to acquire more customers by solving more customer use cases, which means adding more features quickly. And that’s what Dan was saying: you can move some of those infrastructure pieces out of your application into the mesh so you can focus on creating reusable software, software that’s unique IP to your company. You don’t have to write the code that has already been written for everyone. The second outcome is once you have a customer, once you have been successful acquiring them, you want to retain them. And that customer satisfaction comes from being able to run your software reliably and fix issues when you find them. That’s where, again, service mesh and Aspen Mesh excel, because we surface metrics, consistent telemetry and tracing. At the same time, you’re able to tie it back to an application view where you can easily pinpoint where the problem is. So you are getting benefits at the networking level, but you’re also able to get an understanding of the application, which is crucial to your architecture.

Alex Williams: Dan, what is the importance of the efficiencies at the networking level, the network management level? What has been the historical challenge that Istio helps resolve? And how does the sidecar play into that? Because I’m always trying to figure out the sidecar, and I think for a lot of people it’s a little bit confusing to try to understand. Lynn, your colleague at IBM, describes it pretty well as almost like taking all the furniture out of the room and then placing it back in the room piece by piece. I don’t know if that’s the correct way to describe it.

Dan Berg: Possibly. That’s one analogy. So, a couple of different things. First off, networking is hard. Fundamentally, it is hard. It almost feels like if you’re developing for cloud, you need a PhD to do it properly, and on some level that’s true. Simple networking is fine: getting from point A to point B, not a problem. Even some things in Kubernetes, like routing from service A to service B, that’s pretty easy, right? There’s kube-dns. You can do the lookup and kube-proxy will do your routing for you. However, it’s not very intelligent, not intelligent at all. There’s little to no security built into that. And the routing and load balancing is very generic: it’s just round robin and that is it. There is nothing fancy about it. So what happens when you need specific routing based on, let’s say, zone awareness, or based on the client and source that’s coming in? What happens if you need a proper circuit breaker because the connection to your destination wasn’t available? Where are you going to code that? How are you going to build in your retry logic and your timeout logic? Do you put that in your application? Possibly. But wouldn’t it be nice if you didn’t have to? So there’s a lot of complications with the network. And I haven’t even gotten into security. Right? Your authentication and your authorization? Typically that’s done in the application, and all you need is one bad actor in that entire chain and the whole thing falls apart. So Istio, and modern service meshes in general, really push that programming down into the network. And this notion of the sidecar, which is popular in Kubernetes-based environments, basically means you put another container inside the pod. Well, what’s so special about that one container in the pod? With Istio, that sidecar is an Envoy proxy, and it’s capturing all inbound and outbound traffic into and out of the pod. So everything traverses through that proxy, which means policies can be enforced, security can be enforced, routing decisions can be programmed and enforced. That happens at the proxy. So when a container in the pod communicates out, it’s captured by the proxy first, which makes some decisions and then forwards it on. The same thing on inbound requests: it’s checking, should I accept this? Am I allowed to accept this? It’s that control point. And all of that is programmed by the Istio control plane. So that’s where the developer experience comes in. You program it through YAML; you’re programming the control plane, and the control plane propagates all that logic down into those sidecars. That’s where the control point actually takes place. That’s the magic right there. Does that make sense? It’s kind of like a motorcycle that has a little sidecar – literally the sidecar. You put your dog in the sidecar if you want to take your dog with you everywhere you go, and every time you make a decision, you ask your dog. That’s the Envoy sidecar.

Neeraj Poddar: That’s the image that comes to my mind, and maybe that’s because when I grew up in India, that was more prevalent than it is in the U.S. right now, and now somebody from America is also bringing it up. But that’s exactly right in my mind. And just to add one thing to what Dan said: day one networking problems are easy, relatively easy. Networking is never easy, but it’s relatively easy in Kubernetes. Day two, day three, it gets complicated real fast. Early on in the service mesh and Istio days there were people saying, it’s just doing DNS, why do I need it? Now no one is saying that, because those companies have matured past day one problems and they are realizing, oh my God, do I need to do all of this in my application? And then when are those application developers going to write real value-add code?
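For readers who want to see what the retry, timeout and circuit-breaker behavior Dan describes looks like when it is pushed into the mesh, here is a minimal sketch using a hypothetical "reviews" service; the specific values are illustrative, not recommendations.

```yaml
# Sketch: client-side timeout and retries for a hypothetical "reviews" service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
    timeout: 2s                  # overall request deadline
    retries:
      attempts: 3                # retry failed calls up to three times
      perTryTimeout: 500ms
      retryOn: 5xx,connect-failure
---
# Sketch: circuit breaking via outlier detection, ejecting endpoints that keep failing.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # eject an endpoint after five consecutive 5xx responses
      interval: 10s
      baseEjectionTime: 30s
```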

Alex Williams: All right. So let’s move into day two and day three, Neeraj. Who are the teams managing day two and day three? Who are these people? What are their personas and what roles do they play?

Neeraj Poddar: That’s a really interesting question. I mean, the same personas which started your project or your product and were there on day one move along into day two, but some of the responsibilities change and some new personas come on board. So an operator role or a security persona is really important for day two. You want to harden your cluster environment; you don’t want unencrypted data flowing through. For maintainability, an operator, whether it’s a platform operator or a DevOps SRE persona, needs consistent metrics across the system, otherwise they don’t know what’s going on. Similarly for day two, I would say the developer, who is creating the software and the new application, needs to be brought in when failures happen, but they need to be consulted at the right time with the right context. I always think that in microservices, if you don’t have the right context, you’re basically going to spend your time in meetings trying to figure out where the failure is. And that’s where a consistent set of telemetry and a consistent set of tracing for day two and day three is super crucial. Moving to security, think about certificate management. Again, I’m going to show my age here, but if you have managed certificates in your applications in a distributed manner, you know the pain there. You have been yelled at by a security officer at some point saying this is not working, go upgrade it, and then you’re stuck trying to do it in a short time span. Moving to Istio, now that’s a configuration change, or an upgrade of the Istio proxy container. Because you know what? We fix OpenSSL bugs much quicker, because we are looped into the OpenSSL ecosystem. Then there are day three problems and even further. If you look at day three, you have upgrade issues. How do you reliably upgrade without breaking traffic or degrading your customer experience? Can you do feature activation using progressive delivery? These are things we’re just starting to talk about, maybe day three point five or day four problems, but in the future you should be able to activate features in one region, or even in a single county, who cares, and test them out with your customers without relying on the applications. So that’s how I see it. The personas are the same, but the benefits change and the responsibilities change as your organization matures.

Dan Berg: I was just going to say, one of the things that we see quite often, especially with the adoption of Istio, is that for the developer, first and foremost, as Neeraj says, day one, setting up your basic networking and routing, is pretty easy. But then as your system and application grow, just understanding where those network flows go, it’s amazing how quickly it gets out of control. Once traffic gets into, let’s say, your Kubernetes cluster, where does it go? Where does it traverse? Did you even have your timeouts set up properly? How do you even test that? So not even just the operational aspects, but the testing aspects, how to do proper testing of your distributed system, are very complicated from a networking standpoint. That’s where things like Istio’s timeouts, retries and circuit breakers really become helpful, and its fault injection, so you can actually test some of these failures. And then with Jaeger and the tracing, you can actually see where the traffic goes. But one of my favorites is Kiali: bringing that up and just seeing the real-time network flows, the latency, the error codes. That is hugely beneficial, because I actually get to see where the traffic went when it came into my cluster. So there’s a lot of benefit for the developer beyond just the security role. I mean, the developer role is very critical here.
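A minimal sketch of the fault injection Dan mentions, assuming a hypothetical "ratings" service: a fraction of requests is delayed and another fraction aborted, so timeout and retry behavior can be tested without touching the application.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-fault-test
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10          # delay 10% of requests
        fixedDelay: 5s
      abort:
        percentage:
          value: 5           # fail 5% of requests with HTTP 500
        httpStatus: 500
    route:
    - destination:
        host: ratings
```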

Neeraj Poddar: Absolutely, yeah. I’ll put in a plug for operators here too: once you get used to programming via YAML, or to changing the data path through the extensions we are making in the community with WASM, you get to control a critical piece of infrastructure when you have zero-day things happening. You can actually change that by adding your own filter. We have seen how powerful that is in existing paradigms with BIG-IP or NGINX, where there’s a whole ecosystem of people writing scripts to do things, which saves them a lot of money, because you don’t always get time to change your application, but you can change the proxy that sits next to it. So you’re going to see a lot of interesting things happening there for day three, day four use cases.

Alex Williams: But who’s writing the scripts? Who’s writing the YAML? Who’s doing that configuring? Because a lot of these people, you know, developers are not used to doing configuration. So who does that work?

Neeraj Poddar: That’s a really good question, and the reason I’m hesitant is that the answer is: it depends. If you have a very mature developer workflow, I would expect developers to give you information about the applications, and then the platform team takes over, converting it into the Istio-specific, Kubernetes-specific language. But most organizations might not be there yet, and that means you will need some collaborative effort between application developers and operators. For example, I’ll give you what Aspen Mesh is trying: we are trying to make sure that even if both personas are writing YAMLs, those APIs are specific to those personas. So we have created application YAMLs, which an application developer can write with no prior knowledge of Istio. The operators can write something specific to their networking and security requirements, again in a platform-agnostic way, and then Aspen Mesh can lower it down to Istio-specific configuration. So it depends on what kind of toolchain you are using. I would hope that in the future, application developers are writing less and less platform-specific configuration.

Dan Berg: And I think that basically echoes the fact that we do see multiple roles using the Istio YAML files and configurations, but you don’t have to be an expert in all of it. Generally speaking, there are traffic management capabilities that a developer would use, because there you’re defining your routes, you’re defining characteristics specific to your application, as well as the rollout of your deployment if you’re trying to do a canary release, for example. That’s something the developer, or an application author, would be responsible for. But when you’re talking about setting up policies for inbound or outbound access controls into the cluster, that may be a security advisor who’s responsible for defining those levels of policies, not necessarily the developer. You wouldn’t want the developer defining that level of security policy; it would be a security officer doing that. So there’s room for multiple different roles, and therefore you don’t have to be an expert in every aspect of Istio, because your role determines which aspect you’re going to care about.
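As a rough sketch of the canary rollout Dan describes, the developer-facing traffic management might look like the following, assuming a hypothetical "checkout" service with v1 and v2 deployments distinguished by a version label.

```yaml
# Sketch: define the two versions as subsets...
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: checkout
spec:
  host: checkout
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
# ...then split traffic 90/10 between them, shifting the weights as confidence grows.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: checkout
spec:
  hosts:
  - checkout
  http:
  - route:
    - destination:
        host: checkout
        subset: v1
      weight: 90               # stable version keeps most of the traffic
    - destination:
        host: checkout
        subset: v2
      weight: 10               # canary gets a small slice
```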

Alex Williams: When we get into the complexities, I think of telemetry, and telemetry has traditionally been a concept I’ve heard Intel talk about, right, with infrastructure and systems. And now telemetry is being discussed as something used in the software. How is telemetry managed in Istio? What is behind it? What is the architecture behind that telemetry that makes it manageable, that allows you to really be able to leverage it?

Dan Berg: For the most part, it all really starts with the Istio control plane, which is gathering the actual metrics and provides the Prometheus endpoint that is made available so you can connect up to it, scrape that information and use it. How it gets the telemetry information is really the key part: where does that information come from? If we take a step back, remember I was talking about the sidecar, the sidecar being the point that makes those decisions, the routing decisions, the security decisions.

Alex Williams: Well, the dog, the dog, the dog,

Dan Berg: Yes, the dog that is way smarter than you, making all the proper decisions and telling you exactly where to go. So that’s exactly what’s happening here. Since all traffic coming in and out of the pod is going through that proxy, it is asynchronously sending telemetry information, metrics about that communication flow, both inbound and outbound. So it can track failure rates, it can track latency, it can track security settings. It can send a large amount of information about that communication flow. And once you start collecting it in the Istio control plane, at the telemetry endpoint, and you start scraping that off and showing it in a Grafana dashboard, as an example, there’s a vast amount of information. Once you start piecing it together, you can see going from service A to service B, which is nothing more than going from sidecar A to sidecar B, right? We have secure identities. We know exactly where the traffic is going, because we have identities for everything in the system, and everything that joins the mesh joins because it’s associated with a sidecar proxy. So it’s these little agents, these proxies, that are collecting all that information and sending it off to the Istio control plane so you can view it and see exactly what’s going on. And by the way, that is one of the most important pieces of Istio. As soon as you turn it on and join some services, you’ve got telemetry. It’s not like you have to do anything special. Telemetry starts flowing, and there’s a huge amount of value. Once you see the actual information in front of you, traffic flowing, error rates, it’s hugely powerful.

Neeraj Poddar: Just to add to what Dan said here, the amount of contextual information that the sidecars add to every metric we export is super important. I was in front of a customer recently, and like Dan said, there’s a wow factor: you just add things to the mesh, and suddenly you have so much information related to Kubernetes, which tells you about the port, the services, the application labels. That’s super beneficial, and all of it without changing the application. Another point here: if you’re doing this in applications, there are always inconsistencies between applications developed by one team versus another. A second problem I’ve always seen is that it’s very hard to move to a different telemetry backend system. For some reason, you might not want to use Prometheus and want to use something else. If you tie all of that into your application, you have to change all of it. This proxy can also give you a way of switching backends in the future if you need to, without going through your application lifecycle. So it’s super powerful.

Alex Williams: So let’s talk a little bit more about the teams and about the capabilities. I know that Aspen Mesh has come out with its latest release, 1.5, and you have security APIs built into it, you’re enabling Envoy extensions written in WebAssembly, which is interesting. We’re hearing a little more about WebAssembly, but not much. And traffic management, you know, and how you think about traffic management. Give us a picture of 1.5, kind of tracing Istio’s evolution with it.

Neeraj Poddar: Yeah. So all Aspen Mesh releases are tied to the upstream Istio releases, so we don’t take away any of the capabilities that Istio provides. We only add capabilities we think organizations will benefit from, like a wrapper around it, so that you have a better user experience. Istio 1.5 by itself moved from a microservices control plane to a monolithic one for operational simplification. So we have that. Similarly, telemetry v2, which is an evolution from the out-of-process Mixer v1; we also provide that benefit, where users don’t have to run Mixer. There was a lot of resource contention, where it was consuming a lot of CPU and memory and contributing to latency numbers which didn’t make sense. So all of those benefits the community has been working on, you are getting with the Aspen Mesh release. But the key thing here is for us to provide wrapper APIs, like security APIs. I’ll give you a quick example. Istio moved, from 1.4 to 1.5 I think, from JWT-based policies to request authentication policies. The APIs had to change because the older APIs weren’t making sense after user feedback; there were some drawbacks. That’s great as an improvement, but as a customer, now I have to rethink what I did.

Neeraj Poddar: When I have to upgrade, I have to make sure we move along with Istio. So us providing a wrapper around it means we do the conversion for our customers. That’s one way we provide some benefit. Like you said, WASM is an interesting development happening in the community. I feel like as the ABI itself matures and a richer ecosystem develops, this is going to be a really powerful enhancement. Vendors can actually add extensions without rebuilding and having to rely on C++ filters. Companies that have some need they don’t want to offload to vendors or open source can extend Envoy on the fly themselves. That’s a really huge thing. One thing I should mention is that the Istio community is regularly changing or evolving the way Istio is installed. Dan is here, he can tell you: from the very beginning we have been doing Helm, then we have not been doing Helm, then we have gone to istioctl. It’s all in search of the right way, because of user feedback and trying to make it even smoother going forward. So we try to smooth that out so that, you know, Aspen Mesh customers can continue to use the tooling that they’re comfortable with. Those are the kinds of things we have delivered in 1.5, where our customers can still use Helm.
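To make the API change Neeraj refers to concrete, the Istio 1.5-style JWT configuration looks roughly like the sketch below (hypothetical issuer and workload names); the first resource validates tokens, and the second rejects requests that carry no valid token.

```yaml
# Sketch: validate JWTs issued by a hypothetical identity provider.
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: frontend-jwt
  namespace: shop
spec:
  selector:
    matchLabels:
      app: frontend
  jwtRules:
  - issuer: "https://accounts.example.com"                        # hypothetical issuer
    jwksUri: "https://accounts.example.com/.well-known/jwks.json" # hypothetical key endpoint
---
# Sketch: only admit requests that presented a valid token.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-require-jwt
  namespace: shop
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]   # any authenticated request principal
```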

Alex Williams: When you’re thinking about security, Dan, and about what distinguishes Istio, what comes to mind, especially when you’re thinking about multicluster operations?

Dan Berg: One of the key aspects of Istio, and one of its huge value benefits, is that for the services within the mesh, if you enable a strict security policy, that’s going to enable automatic management of mutual TLS authentication between the services, which in layman’s terms allows you to do encryption on the wire between your pods. If you’ve got a financial organization as a customer that you’re looking to support, or any other customer that has strict encryption requirements, and they’re asking how you’re going to encrypt on the wire, well, in a Kubernetes environment that’s kind of difficult unless you want to run IPsec tunnels everywhere, which has a pretty nasty performance drain. Plus, that only works between the nodes and not necessarily between the pods, or you start moving to IPv6, which isn’t necessarily supported everywhere or even proven in all cases. But Istio, literally through a configuration, can enable mutual TLS with certificate management and secure service identity. So that’s hugely powerful. And you can visualize all of that with the tools and utilities around Istio as well, so you know exactly which traffic flows are secured: in Kiali you can see exactly which flows are secured and which ones are not. And then the whole multicluster support, which you brought up as well, is an interesting direction. I would say it’s still in its infancy in terms of managing more complex service mesh deployments. Istio has a lot of options for multicluster, and while I think that’s powerful, I also think it’s complex. I do believe that where we’re going on this journey is to simplify those options, to make it easier for customers to deal with multicluster. But one of the values of security and multicluster ultimately is this level of secure identity and certificate management, where you extend the boundaries of trust into multiple different clusters. So now you can start defining policies and traffic routing across clusters, which is very complex to do today. You start broadening and stretching that service mesh with the capabilities afforded to you by Istio, and that’s just going to improve over time. We’re on a journey right now of getting there, and a lot of customers are starting to dip their toes in that multicluster environment, and Istio is right there with them and will be evolving. It’s going to be a fantastic, fantastic story. I would just say it’s very early days.
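The strict security policy Dan describes is, in Istio 1.5 terms, roughly a single mesh-wide resource like the sketch below; applying it in the root namespace turns on mutual TLS between all sidecars. This is a minimal sketch, assuming a default installation.

```yaml
# Sketch: require mutual TLS for all workloads in the mesh.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh's root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT            # plaintext traffic between sidecars is rejected
```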

Neeraj Poddar: Yeah, I was just going to echo that it’s in its infancy, but it’s so exciting to see what you can do there. When I think about it, with multicluster you can see new use cases emerging from the telecom industry, where the multiple clusters are not just clusters in data centers; they’re at the edge and far edge, and you might have to do some crazy things.

Dan Berg: Yeah, well, that’s the interesting thing. Earlier this year at IBM, we launched a new product called IBM Cloud Satellite. And that’s where, if you own a service mesh, you’re going to be extremely excited about those kinds of edge scenarios. You’re broadening your mesh into areas where you’re putting clusters that, two years ago, you would never have thought about putting a cluster in. I think service mesh is going to become more and more important as we progress here, with the distributed nature of the problems we’re trying to solve.

Alex Williams: Yeah, I was going to ask about telco and 5G, and I think what you say sums it up: being able to manage clusters at the edge, for instance, in the same way that you can in, essentially, a data center environment.

Dan Berg: Well, you’re also dealing with a lot more clusters in these environments. Instead of tens or even hundreds, you might be dealing with thousands, and trying to program that at the application level, like in the old days, is going to be almost impossible. You need a way to distribute consistent, programmable policies across all these clusters, and Istio provides some of the raw mechanics to make that happen. These are going to be incredibly important tools as we move into this new space.

Neeraj Poddar: I was just going to say, I always think the evolution of service mesh is going to follow the same trajectory as the evolution of the ADC market, which happened when the telcos and the big enterprises came in with a lot of requirements from the telecom industry. That’s why load balancers are so evolved today. Similarly, service mesh will gain a lot more capabilities. Think about the clusters running at the far edge; they will have different resource constraints. You need a proxy which is faster and slimmer. Some people will say that’s not possible, but we’ll have to do that. So I’m always excited when I think about these expansions. And like Dan said, we’re not talking about tens or hundreds of clusters now; we are talking about thousands.

Alex Williams: We’ve been doing research, and we actually find that the deployments that are most predominant among the people we’re surveying are those with more than five thousand clusters. And that brings me to my last question for you, about day five, day six, day seven: what role does observability play in this? Because it seems like what we’re talking about is essentially observability, and I’m curious how that concept is evolving for you as we move out to the days beyond, for people who are using Istio and service mesh capabilities.

Dan Berg: Obviously you need that sidecar. You need that dog next to you, collecting all that information and sending it off. That is hugely important. But once you start dealing with scale, you can’t keep looking at that data time in and time out. You’ve got to be able to centralize that information. Can you send all of that and centralize it in your monitoring system at the enterprise level? The answer is yes, you absolutely can. Sysdig, a great partner that we work with, provides a mechanism for scraping all of the information from the Istio Prometheus endpoint and bringing that in, and they have native Istio support directly in that environment, which means they know about the Istio metrics and can present them in a unified manner. So now you can start looking at common metrics across all of these clusters, all the service meshes, in a central place, and start building alerts, because you can’t look at five thousand clusters and X number of service meshes. It’s just too large. It’s too many. So you have to have the observability. You need to be collecting the metrics, and you’ve got to have alerts being generated from those metrics.
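As a hedged sketch of the kind of alert Dan describes, a Prometheus rule over the standard istio_requests_total metric might look like this; the threshold and label names assume Istio’s default telemetry and are illustrative only.

```yaml
groups:
- name: istio-mesh-alerts
  rules:
  - alert: HighRequestErrorRate
    # Fire when more than 5% of requests to a service return 5xx over five minutes.
    expr: |
      sum(rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m])) by (destination_service)
        /
      sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)
        > 0.05
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High 5xx rate for {{ $labels.destination_service }}"
```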

Neeraj Poddar: Yeah, and I think we need to go even a step beyond that, which is you’ll have information from your mesh, you’ll have information on your nodes, you’ll have information on your cloud, your GitHub, whatever. You get it all to a level where there is some advanced analytics making sense of it. There’s only so much that a user can do once they get the dreaded alert.

Neeraj Poddar: They need to take the next step, which is: in this haystack of metrics and tracing and logs, can someone narrow it down to the place I need to look? Because you might get alerted on microservice A, but it has dependencies that are other microservices, so the root cause might be ten levels down. So I think that’s the next day seven, day eight problem we need to solve: how do we surface the information in a way that’s presentable? For me, it’s even tying it back to the context of applications. Dan and I are both from networking. We love networking; I can talk networking all day. But I think we need to speak the language of applications. That’s where the real value will kick in, and service mesh will still be a key player there, but it will be part of an ecosystem where other pieces are also important and all of them are giving information that we are correlating. So I think that’s going to be the real thing. It’s still very early; people are just getting used to understanding service meshes, so telling them that we need to correlate all of this information in an automated way is scary, but it will get there.

Alex Williams: Well Neeraj and Dan, thank you so much for joining us in this conversation about service mesh technologies and Istio and these days beyond where we are now. And I look forward to keeping in touch. Thank you very much.

Dan Berg: Thanks for having us.

Neeraj Poddar: Thank you.

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts, please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.