Aspen Mesh is proud to sponsor IstioCon 2022

Presentations on Istio Security, Dual Stack Support & TLS Origination at IstioCon 2022

Virtual IstioCon starts Monday, April 25th. It's the biggest gathering of the Istio community and a great place to learn and share ideas. Aspen Mesh is a longtime Istio open source contributor; we are a top-five contributor of pull requests to the Istio project. We are proud to sponsor IstioCon 2022 for the third year.

We're excited that three members of the Aspen Mesh team are presenting this year.

Aspen Mesh presentations at IstioCon 2022

Tuesday, April 26, 10:30-10:40 a.m. EST - A Beginner's Guide to Following Istio's Security Best Practices, Jacob Delgado, Senior Software Engineer
Following the Istio Security Best Practices page is a daunting task for newcomers to Istio. Even experienced operators have difficulty discerning where to begin. In this talk I will present an easy way for beginners to adopt Istio, along with the settings and configuration I recommend based on my experience.

Tuesday, April 26, 11 a.m. EST - Istio Upgrades, Jacob Delgado, Senior Software Engineer & Sam Naser, Software Engineer at Google
Are upgrades getting easier? How easy is easy enough? Are helm and revision based upgrades catching on? What is still painful? How often do you upgrade? How often would you like to? Are patches easier than minor upgrades?

Wednesday, April 27, 10:40-10:50 a.m. EST - TLS Origination Best Practices, Kenan O'Neal, Software Engineer
A quick dive for beginners on TLS origination to improve security. This talk will focus on settings that new users may not expect, with an emphasis on validating them. I will touch on the settings Istio uses by default and how to configure DestinationRules to correctly check certificates.
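As a taste of what the talk covers, here is a minimal sketch of a DestinationRule that originates TLS to an external service and validates the upstream certificate; the host name and CA bundle path are illustrative assumptions, not settings from the talk:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: external-api-tls
spec:
  host: api.example.com              # hypothetical external service
  trafficPolicy:
    tls:
      mode: SIMPLE                   # originate TLS from the proxy
      caCertificates: /etc/ssl/certs/ca-certificates.crt  # CA bundle used to verify the server
      sni: api.example.com
```

Depending on the Istio version, omitting the CA bundle can leave the upstream certificate unverified, which is exactly the kind of default worth double-checking.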

Thursday, April 28, 10:50-11:00 a.m. EST - Dual Stack Cluster Setup, Josh Tischer, Lead DevOps Engineer
Dual Stack support is very limited in today’s cloud ecosystem. Learn how to run and test Istio on a Dual Stack cluster in AWS on both OpenShift 4.8+ and kubeadm. OpenShift 4.7+ is one of the few options that officially supports Dual Stack mode for bare metal clusters and Azure. I am excited to share Aspen Mesh’s experience and empower your team with another option for Dual Stack support.
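For context, dual-stack Kubernetes clusters are configured at creation time by supplying both IPv4 and IPv6 CIDRs. A minimal kubeadm sketch (the CIDR values are illustrative assumptions):

```yaml
# kubeadm ClusterConfiguration fragment for a dual-stack cluster
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16,fd00:10:244::/56      # IPv4 and IPv6 pod CIDRs
  serviceSubnet: 10.96.0.0/16,fd00:10:96::/112   # IPv4 and IPv6 service CIDRs
```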

You can explore the full list of IstioCon sessions here, and be sure to register for IstioCon today.

About Aspen Mesh
Enterprise-class Istio service mesh is all we do at Aspen Mesh by F5. We offer Professional Services and 24/7 Support plans to ensure you have OS Istio experts available when you need them. 

Explore our Service Mesh Knowledge Hub for the latest resources on how OS Istio drives performance from your microservices architecture. Deep dive white paper topics include: Multi-cluster deployment to enable hybrid and multi-cloud architectures, security, mTLS, compliance and more.  


mTLS Authentication for Microservices Security is Critical to Digital Transformation

For an enterprise, mentions of “Cloud Native” and “Digital Transformation” are two ways of saying that a service mesh deployment is on the cards. Once deployed, the service mesh forms the backbone of business operations and is usually paramount to business continuity. As an enterprise starts to implement its Digital Transformation plan and migrates from a monolithic application environment to a cloud native application environment, security becomes an immediate concern. From a business operations and revenue generation perspective, it is important to understand the benefits and deployment pitfalls of mTLS to ensure business-as-normal operation. 

Here are some common issues we run into:

  • Loss of Regulatory Compliance: Several industries, such as healthcare and financial services, must comply with mandated specifications, and many other industries follow agreed best practices. If regulatory compliance is lost, business operations may be affected. A solution to this headache is to enforce mutual TLS STRICT mode as the default, which helps regain regulatory compliance by ensuring end-to-end security for all devices and services.
  • Loss of Brand Reputation: Customer or internal proprietary information may be made public through a security breach, and customers may consequently choose to do business with a more secure supplier. A customer is unlikely to look favorably on a vendor highlighted on the evening news, for example. Enabling mutual TLS globally by default helps protect your reputation by securing customer and proprietary information.
  • Loss of Business Agility: With dynamic cloud-based applications, upgrades to business logic, services and offerings can seem endless, and frequent security enhancements are also essential. Upgrades come with their own headaches, and glitches can slow a business from rolling out new services. It's best not to rely solely on perimeter defenses; regain agility by securing critical core applications and services with end-to-end security.
  • Loss of Business Confidence: The service mesh serves many end-users, and if one supplier or service loses confidence in the integrity of the mesh, this can affect everyone and every other service. The ability to visualize service communication helps restore confidence by reducing misconfigurations and simplifying troubleshooting.
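The STRICT mode mentioned above can be enforced mesh-wide with a single PeerAuthentication resource applied to Istio's root namespace. A minimal sketch, assuming the default istio-system root namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # root namespace: applies mesh-wide
spec:
  mtls:
    mode: STRICT           # reject any plain-text traffic between sidecars
```

A namespace- or workload-scoped PeerAuthentication can still override this default where a gradual rollout is needed.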

 

In our experience at Aspen Mesh, deploying mutual TLS has many benefits, but it is easy to misconfigure, which can lead to severe disruptions to business continuity when deploying new microservices or upgrading a service mesh. Download our new white paper, Istio mTLS: 8 Ways to Ensure End-to-End Security, to learn about several more mutual TLS concerns that could spell disaster for your business!

 

- Andy 


Advantages of Using Istio for Multi-cluster Deployment

Understand the advantages (and disadvantages) of a multi-cluster deployment using Istio in a Kubernetes environment and best practices to mitigate risk.


Solve Istio Security Risks and Get a Handle on Regulatory Compliance

Many verticals, such as financial services, healthcare, insurance, transportation and government contracting, are highly regulated industries that must adhere to strict Information Technology (IT) compliance requirements. Each industry has unique certification and compliance requirements, for example HIPAA for healthcare, PCI-DSS for payment cards, and SOC 2 for service organizations such as financial services firms.

Failure to certify against a compliance regulation can mean a Chief Information Security Officer (CISO) or Chief Compliance Officer (CCO) may not be able to authorize a deployment, microservices upgrade or the addition of new services to an existing service mesh deployment. Achieving compliance with industry requirements is rarely trivial, and at the executive level it is a critical business objective. A compliance requirement raises new security concerns and forces security to be looked at from different angles in any Kubernetes based deployment, old or new. 

Our approach to compliance and security

At Aspen Mesh we work with Fortune 2000 companies to help solve their security and compliance concerns. Our certified Istio experts address many inquiries relating to security; some concern bare-bones security capabilities and management, while others concern specific requirements for individual workloads. Most security-related inquiries have a compliance requirement at their core. Simply put, compliance drives security requirements, and thus security headaches. We often see security issues and concerns raised during the conversion of monolithic applications or environments to cloud-native apps running in a Kubernetes environment as part of a digital transformation initiative.

Security for compliance is best architected up-front during this transformation process to make compliance a central focus for the design and deployment, especially where existing legacy or hybrid transition steps are planned. 

Goal: Design-in regulatory compliance from the start

  • Simplify deployment and save time by short-circuiting compliance issues up front. 
  • Preserve operational capability of legacy IT elements and associated data. 
  • Gain immediate security benefits by securing compliant workloads from the get-go. 
  • Save time on backend deployment headaches through better tools, visibility, dashboards, and auditing capabilities. 

Istio addresses common security problems out of the box

SOC 2 (System and Organization Controls 2) compliance, defined by the American Institute of Certified Public Accountants (AICPA), is a series of industry-recognized standards for cloud service providers, software providers and developers, web marketing companies and financial services organizations. Section CC6.6 of SOC 2 specifies that unauthorized network connections must be detected. A service mesh can be useful here: it restricts interactions between microservices and automatically generates audit logs that allow you to determine who did what, and when, forming the basis of mandated auditing tools.

Similarly, the Health Insurance Portability and Accountability Act (HIPAA) modernized the flow of healthcare information and stipulates how personally identifiable information maintained by the healthcare and healthcare insurance industries should be protected from fraud and theft. A service mesh helps to solve the top four most reported HIPAA violations: 

  • Unauthorized access  
  • Lack of cryptography policy  
  • No proper notification of affected parties and public officials following relevant data breaches 
  • Lack of willingness/capability to update, upgrade or address existing compliance gaps. 

Istio based service mesh streamlines the regulatory compliance process in a Kubernetes environment

Istio provides security by default: no changes are needed to re-code applications or infrastructure. Achieving defense in depth requires integration with existing security systems to provide multiple layers of defense. A zero-trust posture is fostered by building security solutions that assume a distrusted network, providing each service with strong authentication and authorization to enable interoperability across clusters and clouds. Istio secures service-to-service communication and provides a management system to automate key and certificate generation, distribution, and rotation.

Traffic encryption and flexible service access control is achieved through mutual TLS (Transport Layer Security) connections and fine-grained system access is facilitated through RBAC (Role Based Access Control). Sidecar and perimeter proxies work as Policy Enforcement Points (PEPs) to secure communication between clients and servers, including a set of Envoy proxy extensions to manage telemetry and auditing. Peer authentication is used for service-to-service authentication to verify the client making the connection. Istio offers mutual TLS as a full stack solution for transport authentication, which can be enabled without requiring service code changes. 
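The fine-grained access control described above is typically expressed as an AuthorizationPolicy keyed to the mTLS-derived service identity. A minimal sketch; the namespace, labels, and service account names are hypothetical:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: orders            # hypothetical workload namespace
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
  - from:
    - source:
        # identity comes from the peer certificate issued to the client's service account
        principals: ["cluster.local/ns/web/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
```

Because the principal is taken from the peer certificate, this policy is only meaningful when mutual TLS is enabled between the workloads.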

Eight ways Istio helps you achieve regulatory compliance

  1. End-to-end security 
  2. Granularity of security 
  3. Secure integration with legacy components 
  4. Compliant workloads run in compliant environments 
  5. Visibility to configuration changes 
  6. Dashboard reporting 
  7. Audit logs 
  8. Enable Continual Compliance 

Istio is key in your move from monolithic services to a modern microservices environment

Istio can form a central part of your move from a monolithic services system to a modern Kubernetes-based microservices architecture, preserving the ability to use legacy elements. Istio provides all the hooks and handles needed to overcome the security issues that cause compliance headaches, helping you achieve compliance faster, which means a quicker deployment with fewer problems and a step closer to continual compliance operations.

  • Compliance requirements drive the need behind most security issues 
  • Istio provides the basis for compliance in a Kubernetes environment 

Checklist for Compliance in a Microservices World

3 Steps to help you put your compliance house in order

1. Architectural review 

  • Have compliant and noncompliant workloads been separated? 
  • Are your compliant workloads running in compliant environments? 

2. Prepare legacy elements for connection to the service mesh 

  • Are the connections to the legacy elements secure? 
  • Are microservices restricted to specific legacy elements?

3. Configure Istio for compliant operation 

  • Are the built-in security features enabled? 
  • Are the mTLS settings correct for your environment? 

About Aspen Mesh

Our experts understand what is necessary to achieve regulatory compliance. We have a team of seasoned Istio experts, and there is no one better at solving Istio security issues and relieving compliance headaches. 

  • Aspen Mesh is a top 5 contributor to the Open Source Istio project – we help shape OS Istio. 
  • Aspen Mesh has deployed service mesh projects in the world’s largest and most complex organizations. 
  • Aspen Mesh visibility tools and dashboards will help you to achieve Continual Compliance. 
  • Reach out to an Aspen Mesh expert for an architectural review and security audit.

Get in touch and we can talk about the compliance issues you are facing. We will share what we’ve done at Fortune 2000 companies to help them achieve their regulatory compliance goals. 


Istio for multi-cluster deployment: Q & A with an expert

Q&A with Brian Jimerson, Solutions Architect and Istio and multi-cloud deployment expert

I recently sat down with one of our expert Istio engineers at Aspen Mesh to answer some questions I hear from customers as they start their multi-cluster Istio journey. If your organization already has a series of disparate single-cluster Istio deployments, real benefits can be achieved by connecting them together to create a multi-cluster service mesh.

Here are some of the highlights of my conversation with Brian Jimerson, one of our seasoned solutions engineers, whose deep experience includes optimizing Fortune 2000 enterprise architectures. I wore the hat of a customer exploring how moving to a multi-cluster environment might impact performance.

Q: I have stringent SLOs and SLAs for my cloud applications. How can I work to meet these?

Brian: Having multiple Kubernetes clusters in different fault zones can help with disaster recovery scenarios. Using a multi-cluster service mesh configuration can simplify cutover for an event.

Q: As an international company, I have data privacy requirements in multiple countries. How do I ensure that data is stored in a private manner?

Brian: Using Aspen Mesh's locality-based routing can ensure that customers are using a cluster in their own country, while reducing latency. This can help ensure that private data does not leave the user's country.
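Locality-based routing of this kind is configured through a DestinationRule's localityLbSetting. A minimal sketch; the service host and region names are hypothetical, and note that locality failover only takes effect when outlier detection is configured:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-locality
spec:
  host: reviews.prod.svc.cluster.local   # hypothetical service
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:                        # prefer the local region, fail over if unhealthy
        - from: us-east
          to: us-west
    outlierDetection:                    # required for locality failover to activate
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```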

Q: I need the ability to quickly scale workloads and clusters based on events. How can I do this in a way that's transparent to customers?

Brian: Using a multi-cluster service mesh can help to scale clusters out and in without interruption to users. A multi-cluster service mesh acts like a single cluster service mesh to services, so communication patterns remain unchanged.

Q: I have compliance requirements for workloads, but they also need to be accessed by other applications. How do I do this?

Brian: Using a multi-cluster service mesh, you can operate compliance workloads in a hardened cluster, and securely control traffic to those workloads from another cluster.

 

I encourage you to read the full paper, Advantages of Going Multi-cluster Using Istio -- and Best Practices to Get There. The move to a multi-cluster environment is complex, and there are many things to consider. Working with Brian and our Aspen Mesh Solutions team, I've written a deep-dive paper that spells out both the advantages and the things to look out for when evaluating a move to multi-cluster, along with the disadvantages you have to weigh and the performance improvements you can expect from deploying a multi-cluster service mesh.

Enterprise-class Istio service mesh is all we do at Aspen Mesh by F5. We offer Professional Services and 24/7 Support plans that let you tap Istio expertise to optimize your microservices environment with industry best practices, with the peace of mind that we're backing you up. Get in touch if you seek a trusted advisor to help you navigate OS Istio; we've designed Istio solutions for some of the largest organizations in the world. I encourage you to get in touch if you would like to talk with Brian, a Solutions Architect who can answer any questions you have about how to chart a course toward a cloud native environment. If you want to learn more about our full suite of services delivered by our team of Istio experts (whether you have OS Istio in pre-production or production), reach out.

 

-Andy


Tutorial | Istio expert talks install, mTLS Authentication & Traffic Control

Tutorial | Istio Expert Shares how to Get Started with OS Istio from Install to mTLS Authentication & Traffic Control

Tutorial How-To from Solutions Engineer, Brian Jimerson

You’ve got Kubernetes. Now hear an Istio expert share what it takes to get started with OS Istio, and learn how to get mTLS authentication and traffic control right out of the box.

As microservices become the de facto standard for modern applications, developers and platform operators are realizing the complexity that microservices introduce. A service mesh such as Istio can address this complexity without requiring custom application logic. Istio adds traffic management, observability, and security to your workloads on Kubernetes. Brian will share firsthand what the clear advantages are and how to get started, including:

  • Installing Istio on your cluster
  • Enabling mTLS authentication between workloads by default
  • Getting better visibility into application traffic
  • Controlling traffic between applications
  • Controlling ingress and egress traffic with Gateways
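For a flavor of the last item, ingress traffic is typically controlled by pairing a Gateway with a VirtualService. A minimal sketch; the host names, TLS secret, and service port are hypothetical:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
spec:
  selector:
    istio: ingressgateway        # bind to the default ingress gateway deployment
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: web-cert   # hypothetical Kubernetes TLS secret
    hosts:
    - "www.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts:
  - "www.example.com"
  gateways:
  - web-gateway
  http:
  - route:
    - destination:
        host: web.default.svc.cluster.local
        port:
          number: 8080
```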

Read the tutorial or watch the video presentation to learn why Istio is the technology leader for service mesh and production-grade networking for microservices.

About Brian Jimerson, Aspen Mesh Solutions Engineer

Brian Jimerson, Solutions Engineer, Aspen Mesh

Brian Jimerson is a Solutions Engineer at Aspen Mesh by F5 with 15+ years of experience driving better platform performance. He has hands-on expertise moving organizations to a cloud-native platform, optimizing critical application delivery at scale, and leveraging OS Istio to help run cloud-native applications at scale.

Not up for a video? Below is a recap of what was covered during the Tutorial Presentation from Solutions Engineer Brian Jimerson.

Istio Expert Shares How to Get Started with OS Istio – from Install to mTLS Authentication & Traffic Control
Brian Jimerson, Senior Solutions Engineer, Aspen Mesh

What is covered:

  • Microservices and how they’ve come about
  • Challenges that they can present to customers at scale
  • Istio as a service mesh and what problems it’s trying to solve with microservices
  • Istio installation and options
  • Encryption and authorization in Istio
  • Complex traffic management in Istio
  • Observability and visibility of Istio and microservices

About Aspen Mesh

Aspen Mesh is part of F5, and we’re built to bring enterprise-class Istio to the market. We have a product built on top of Istio to help manage it at scale and to give you visibility, insight, and configuration of multiple Kubernetes clusters in Istio. We also offer services and support for large-scale customers, and we’re a large contributor to the open source Istio project: in the top five contributors as measured by pull requests.

Microservices

I think it’s important to set the stage and talk about what Istio does and how. First, looking at the current landscape of enterprise software development over the last five to seven years, most companies that were historically brick-and-mortar and stagnant in terms of software delivery have realized that they need to build and use software to deliver innovation and keep up with their competitors.

Most large companies that have been around for 100 years didn’t have those mechanisms in place. They were building large monoliths that took forever to change. If you needed to update or add a new feature, it could take months to get that feature into production and they just couldn’t deliver rapid innovation.

A large part of that is just the nature of a monolith. You have multiple teams contributing to the same deployment. You need QA schedules and testing. If you changed one feature, you had to deploy the whole monolith, and that took forever; it just wasn’t working.

The notion of microservices has been around for a long time and it has quickly become the de facto development architecture for these companies because microservices allow individual components to be deployed without any concern about other features and functions and other microservices.

What is a microservice?
I like Adrian Cockcroft’s definition best. There are others out there, like Martin Fowler’s, but a microservice is a loosely coupled service-oriented architecture with bounded contexts. This allows development teams to move fast, move individually, and deliver innovation through software much better than they could before. I think it’s important to unpack this term a bit, because it sets the stage for what a service mesh and Kubernetes do.

The first part of this definition is “loosely coupled.” What does “loosely coupled” mean? This is not a new concept in development. A general test: if a service must be updated in concert with all the other services, it’s not loosely coupled. You should be able to update one service, deploy it, and not have to redeploy all the other services within that system or application.

With loose coupling, instead of having a monolith whose components talk to each other in memory, you now must have a protocol and a contract to communicate with other services. As long as that contract doesn’t change, you can independently deploy services without worrying about other services. Typically those protocols and contracts are HTTP with a REST API, or perhaps asynchronous coupling through a message broker over TCP, but loose coupling means that communication now happens over the network instead of in memory within a monolith.

The other part of this definition is the bounded context. The idea is that a service does one thing, does it very well, and doesn’t concern itself with other parts of the system. You know you have a bounded context when a service doesn’t need to know about the internals of its surrounding services: it just does its one thing well and communicates with the others through a contract and a protocol.

These two terms are inherent in this definition and imply that there’s a lot of network communication between the services participating in the system. In the monolithic world, that communication was traditionally in memory, with maybe some outside calls to an API or database; for the most part, the different libraries and components within a monolith just communicate with each other in memory.

Microservice Distribution
This creates a massive growth in network traffic between services. The diagram on the left (see below) is a somewhat contrived system, but it’s not atypical: you have six, seven, or eight microservices, and because of the loose coupling they all need to communicate with each other over the network instead of in memory.

Because it’s a bounded context and these services don’t necessarily know much about other participating services, there’s the potential that any service could have to call another service for some reason, so you must assume it’s going to happen. You can see the huge amount of traffic increase over the network as opposed to historical monoliths.

The diagram on the left depicts a very simple example with seven microservices—probably a small application or system. For comparison on the right, this is the Netflix call graph. Now that’s the other extreme. Most organizations are going to be somewhere in the middle, but you can imagine they will still have a lot of network complexity.


Challenges with Microservices
All of these network calls introduce a lot of new challenges that development teams historically haven’t had to deal with. One of the most basic is “how do I register my service, what is my address, and how do other services discover it?” Also, “how do I route and load balance between services?”


In a large system, you must expect that there’s going to be degradation in performance and there’s going to be services that go down. You have to be able to tolerate that from a calling service through various techniques like timing out and circuit breaker patterns.

Additional questions you want to be able to answer include “how do I see what’s going on within my system? How do I view network traffic and latency? How do I view what services are returning errors or are slow to respond? How do I aggregate different APIs and different services into a single API? How do I transform between protocols and data through there? How do I manage configuration of a highly distributed system?”

Kubernetes is one of the first of what you’d call “cloud native” platforms that tried to address some of these questions, and it’s done a really good job on the compute side, the storage side, and scheduling workloads, but it doesn’t do much about the traffic between participating services or about ingress and egress. There’s a need in the market to handle the network side of things now that Kubernetes is being widely adopted.

What is Istio?

There are other service meshes out there besides Istio, but service meshes, and Istio in particular, are meant to address the challenges introduced by these highly distributed, loosely coupled systems. Istio offloads many of these challenges to an infrastructure component, much as Kubernetes did with compute and scheduling, by letting you connect and secure microservices, giving you sophisticated routing and traffic management, providing visibility into what’s going on in the system, and applying security.

I think one of the important things that Istio and Aspen Mesh and all the different vendors of Istio have realized is that this cloud native world introduces a new way to implement security. The old security controls don’t really work in this new world for several reasons. We don’t want organizations to take a step back in their security posture and so security is first and foremost across the board with Istio.


Istio Architecture
What does Istio look like? In its simplest form, Istio has two components. There’s a control plane, which is exactly what it sounds like, much like the Kubernetes control plane; a component (a pod) called “istiod” performs all the control functions within the service mesh. And there’s the data plane, comprised of the running workloads plus what’s called a sidecar proxy, a container that runs within the same pod as the service. That proxy is Envoy, an open source project originally from Lyft. Any pod that has an Envoy sidecar proxy injected into it is part of the data plane, and istiod pushes it information such as configuration, certificate rotations, telemetry settings, and things like that.

One thing I want to point out is that istiod used to be several separate components, such as “mixer,” “pilot” and “citadel,” and you still see references to those names. They’ve all been collapsed into the single istiod component, so if you see references to these terms, just know that’s part of istiod now.

One of the things Istio does in the data plane is automatically inject an Envoy proxy into each pod that has one or more microservice containers running in it. As part of that injection, the pod’s iptables rules are modified so that all traffic leaving the pod, or coming in from external clients, goes through the Envoy proxy. The Envoy proxy then communicates with the microservice through the loopback interface, so the microservice is never exposed directly to any network interface that can reach outside the pod; it’s all Envoy.


Installing Istio – The 4 Commands
How do you get going and install Istio? It’s easy to get a default installation. The open source Istio distribution includes a binary CLI called istioctl, and if you run istioctl install against a Kubernetes cluster, it’ll install the control plane. Then you can create a new namespace (or use an existing one) and label it for Istio injection, and any pod deployed to that labeled namespace will automatically have the sidecar injected into it.
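The steps above boil down to a handful of commands. A sketch, assuming a working kubeconfig and a hypothetical my-app.yaml manifest:

```shell
# Download the Istio release, which includes the istioctl binary
curl -L https://istio.io/downloadIstio | sh -

# Install the control plane (istiod) into the istio-system namespace
istioctl install --set profile=demo -y

# Label a namespace so new pods get the Envoy sidecar injected
kubectl label namespace default istio-injection=enabled

# Deploy a workload; its pods now start with the sidecar proxy
kubectl apply -f my-app.yaml
```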


Installing Istio – Commands Installed
Tying that back to the original architecture: when you run istioctl install, it creates a namespace called istio-system, which is the root namespace for Istio, and installs istiod and related components. It might also install an ingress gateway, an egress gateway, and some observability tools if you want those, but all the control plane items get installed into that istio-system namespace.

When you create a namespace and label it for Istio injection, a mutating webhook fires whenever a new pod is deployed to that labeled namespace. That webhook adds an init container that modifies the pod’s iptables so the service only listens on the loopback address, and it injects the Envoy proxy; iptables then routes all traffic to and from that proxy.

A couple of interesting things about this approach: One, it’s transparent to the service. As long as you’re not binding specifically to an interface or IP address, the service never really knows there’s a proxy intercepting all its traffic. The other nice thing is that it doesn’t matter what language your service is written in; it could be Node or Java or whatever, since the proxy just handles network traffic. That gives you a lot of flexibility. Your development teams don’t have to do anything special aside from some routing configuration, and it all just works.

A question you may ask: when you create something, is the webhook going to add all the information Istio needs, so you don’t have to change your services? That’s correct. We’ll look at some of the configuration, but there’s nothing special your application needs to do. It can reference hosts by the same host names and make the same calls; everything is transparently proxied through the sidecar based on the mutating webhook.


Installing Istio – Answering a Few Questions
Again, back to this example: when you do a kubectl apply of your application’s deployment, istiod and the mutating webhook will automatically grab the pod that gets deployed, inject the Envoy proxy, and modify the iptables rules for it.

Another question you may ask is what the benefits of Helm are versus just using istioctl. Helm requires a little more work than just using the binary, but it's easier to hook into CI/CD tooling and to override configurations. Under the hood, istioctl and Helm are ultimately doing the same deployment; it's just a matter of preference. In practice, most organizations go down the Helm path if they're comfortable with Helm and it's part of their process. But in a sandbox environment, where you want to get up and running and kick the tires on Istio, istioctl is by far the easiest way to go.
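To make the comparison concrete, here is a sketch of both install paths. The chart names and repo URL follow Istio's published Helm install instructions; verify them against the docs for your Istio version:

```shell
# Option 1: istioctl – quickest way to get a sandbox running
istioctl install --set profile=demo -y

# Option 2: Helm – easier to wire into CI/CD; the charts install in layers
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm install istio-base istio/base -n istio-system --create-namespace
helm install istiod istio/istiod -n istio-system --wait
```

Either path ends with istiod and its webhooks running in the istio-system namespace.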

Installing Istio

Authentication and Authorization
One of the things that usually drives people to look at a service mesh like Istio is out-of-the-box peer-to-peer authentication. Think back to those call graphs: all the communication between services goes over the network, and even though it's an internal network, by default in plain Kubernetes it's not encrypted, not authenticated, and not authorized. It's the wild west. Anything can talk to anything in plain text, and away we go.

This usually flies in the face of most security policies for organizations. They want encrypted traffic even if it’s an internal network. They want to know identities and they want to be able to audit identities of these network calls and have some level of policy around these. One of the most common ways to do this is something called mTLS, which stands for mutual TLS.

It's your typical TLS X.509 protocol, but in mutual TLS both peers authenticate each other: the client initiates a TLS handshake with the server, and the server in turn verifies the client's certificate. Within that TLS connection there is an identity, usually based on a service account, and an authorization process to make sure that one service is allowed to talk to another.

This meets a lot of the requirements for secure communication between services, and the good thing is that Istio does it out of the box. You install it, and you automatically get mTLS communication between pods unless you override it. It can do that because of the proxy: istiod handles certificate management for you, rotating and revoking certificates and so on, while the traffic between the microservice and its proxy stays on the loopback interface and doesn't need to be encrypted. You get mTLS for free out of the box, and this is probably the first thing everybody looks at and says, "I need a service mesh to do mTLS authentication."

There's also some confusion sometimes around the different types of authentication mechanisms. mTLS is really peer-to-peer: it uses X.509 certificates and is concerned with whether one pod is allowed to talk to another pod. Authentication and authorization of an end user, whether it's a browser or an external system, doesn't typically use mTLS.

One of the things that does work out of the box is JWT, or JSON Web Tokens. JWT is part of the OAuth flow: an end user authenticates with an identity management system, which issues a JWT that the user then sends as part of each request to an application. The sidecar proxy authenticates the token against its issuer to make sure the user is allowed to do whatever they're requesting, and it propagates the JWT in the headers of requests to other systems. JWT is intended for end users to be authenticated and authorized for requests into the system; mTLS, again, is the auth mechanism between pods within the system.
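A minimal sketch of what that looks like in Istio's security API (the namespace my-app, the frontend label, and the idp.example.com issuer are all placeholders; your identity provider supplies the real issuer and JWKS endpoint):

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: my-app            # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: frontend            # hypothetical workload
  jwtRules:
  - issuer: "https://idp.example.com"        # your identity provider
    jwksUri: "https://idp.example.com/jwks"  # where Istio fetches signing keys
---
# Without this, requests with no token are still allowed through;
# this policy requires a valid request principal (i.e., a valid JWT)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: my-app
spec:
  selector:
    matchLabels:
      app: frontend
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]
```

The RequestAuthentication validates tokens; the AuthorizationPolicy is what actually rejects requests that lack one.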

You might wonder whether this terminates at the sidecar, so you don't have to do anything in your application to support it. The answer is yes, it's terminated at the sidecar. The application doesn't need to do anything with encryption or authentication or anything of that nature; it's all handled from sidecar to sidecar.

Another question: is your pod going to have two containers in it, the sidecar and your actual application? The answer is that if your pod normally has one container and you deploy it as part of the data plane, it will have two containers: one called "istio-proxy" and the other being your application. And again, the iptables rules for the pod are modified so that the only communication the application container has is on the loopback interface, which istio-proxy is also attached to. All the other IP addresses are bound to istio-proxy. There's nothing for the application to do. It doesn't need to worry about certificate management, which can be a nightmare, and it doesn't need to worry about binding to interfaces or routes or anything like that. It's all handled by istio-proxy.

Another question I've been asked: since both containers are in the same pod, usually communicating only with each other and never leaving the host, is the communication between istio-proxy and the microservice TLS? The answer is no. That traffic is all on the loopback between the microservice and istio-proxy; they're in the same pod on the same host with the same interfaces. When a request or response goes off to another node, or wherever the destination lives in Kubernetes, istio-proxy transparently proxies it and takes care of the encryption.

Another point I should call out is that there are different configurations for mTLS. Our recommendation is to enforce mTLS mesh-wide. It's one setting when you install Istio, and you can then override it to be more permissive if you have, say, legacy workloads that just won't work with it. You can override at the individual workload level, but our recommendation is to turn it on mesh-wide so you know everything is encrypted and set up correctly.
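A sketch of that recommended setup using Istio's PeerAuthentication resource (the legacy namespace and legacy-app label are placeholders for a workload that can't speak mTLS yet):

```yaml
# Mesh-wide default: a policy named "default" in the root namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Workload-level override for a legacy service still being migrated
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-app
  namespace: legacy          # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: legacy-app        # hypothetical workload label
  mtls:
    mode: PERMISSIVE
```

Istio applies the narrowest matching policy, so the workload-level PERMISSIVE wins for that one service while everything else stays STRICT.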

A question I've been asked: if you're installing Istio, should you turn on injection for almost all your namespaces and then just disable it for legacy workloads, instead of leaving those out of an Istio-labeled namespace? By default, every pod in a namespace labeled "istio-injection=enabled" will have the proxy injected, which means it also has mTLS on by default.

There are namespaces you probably wouldn't want to label, so that they're not part of the data plane: the Kubernetes kube-system namespace, your istio-system namespace, and other system namespaces that aren't running custom workloads. The ones you do label with istio-injection=enabled will participate in the data plane, which means you get the proxy, mTLS, and all the Istio goodness. You can override that injection on a per-pod or per-deployment basis through annotations, and you can turn off mTLS and a lot of the other behavior the same way.

What we generally recommend is zero trust: be as secure as possible out of the box, and if you have a specific use case where you need to override that, do it. That way your developers don't need to worry about encryption and routing; they just deploy through their CI pipeline and automatically get all the Istio benefits.

Authentication and Authorization

Traffic Management and Gateways

The second set of features that most people leverage is traffic management. You can do some complex traffic management as well as controlling ingress/egress into the service mesh. Out of the box there’s a lot that you can do in terms of managing traffic to your workloads that are running in Istio.

Traffic Management
One of the things I find really cool is weighted routing. Weighted routing means having multiple versions of your application deployed side by side and routing a configurable percentage of traffic to each version. Say you deploy a new version so you have v1 and v2: you could send 75 percent of the inbound traffic to v1, the old version, and 25 percent to v2.
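A sketch of that 75/25 split using a virtual service and destination rule (my-service is a placeholder Kubernetes service whose pods carry version: v1 and version: v2 labels):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
      weight: 75          # 75% of traffic to the old version
    - destination:
        host: my-service
        subset: v2
      weight: 25          # 25% to the new version
---
# The destination rule defines the subsets the virtual service refers to
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```

Shifting the weights over time (75/25 → 50/50 → 0/100) is the mechanical core of the canary pattern described next.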

Traffic Management

That allows you to do a lot of cool things like canary-style deployments, where you deploy a new version side by side with the old one, start to route some percentage of traffic to the new version, and gradually increase it over time while you test performance and functionality, until all the traffic is routed to the new version and you can retire the old one.

It also allows you to do things like A/B testing. So, maybe you have a certain group of users that are beta testers or internal users and you can route traffic to a new version that’s running side-by-side with the old version based on header attributes or other types of attributes. You can have a group of users testing a new version that you’ve released while the rest of the common users are still using the old version until you decide to cut over. These are two scenarios, but you can imagine there’s a lot you can do with weighted routing.

There’s also locality-based load balancing and failover. Not something we’ll get into today but something to be aware of.

Istio has a configuration called multi-cluster, where a single logical Istio instance spans multiple Kubernetes clusters. Maybe you have an on-premises Kubernetes cluster and also use a public cloud like EKS or Platform 9. The mesh could span those clusters, or clusters in different AWS regions. As long as there's network connectivity, you can span multiple clusters with a single logical mesh.

Often people have requirements to route users to the cluster closest to their location, and you can do that in a multi-cluster configuration with locality-based load balancing. Or maybe you have data privacy requirements for different countries and want to route users to the cluster in the country that meets those requirements. It's an advanced feature, but it can be useful as people start to mature on Istio.

You can also do traffic mirroring, which is similar to weighted routing. With traffic mirroring you send 100 percent of the traffic to two different versions, but only one version is live and sends its response back to the client; the other version's response is discarded. This is another useful way to do feature testing: you send live traffic to a mirror alongside the live version, the user doesn't know, but you can test those features against real requests.
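Mirroring is configured on the virtual service as well. A sketch, again using the hypothetical my-service with v1/v2 subsets defined in a destination rule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1        # v1 is live; its responses go back to the client
    mirror:
      host: my-service
      subset: v2          # v2 receives a copy; its responses are discarded
    mirrorPercentage:
      value: 100.0        # mirror all requests
```

Lowering mirrorPercentage lets you shadow only a sample of traffic if the mirror can't handle full load.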

There's also a lot of network resiliency. As mentioned before, you must assume something is going to go wrong at this scale. If you have hundreds of microservices all talking to each other over a network, there will be network issues and performance degradation of individual services, and you don't want to bring down the whole system because of one of them. A circuit breaker pattern lets a client handle a bad response from a server in the mesh gracefully, instead of blowing up itself and causing cascading failures. You can also set request timeouts and retry policies, so you can handle a lot of that degradation.

You can also do fault injection. If you want to test the resiliency of your applications in the mesh, you can inject faults into them. For example, you can say: for five percent of requests, return a 500 error and see how the client handles it. Or: for 10 percent of requests, inject latency of 100 milliseconds or one second and see how everything copes. If you're in a pre-production environment and want to test the resiliency of all your applications, fault injection is the way to do it. There's obviously more, but those are the big ones when it comes to traffic management.
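Both of those examples map directly onto the fault section of a virtual service. A sketch for the hypothetical my-service:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - fault:
      abort:
        percentage:
          value: 5.0      # 5% of requests get an injected error
        httpStatus: 500
      delay:
        percentage:
          value: 10.0     # 10% of requests get added latency
        fixedDelay: 100ms
    route:
    - destination:
        host: my-service
```

Because the fault is injected by the sidecar, the application under test is untouched; removing the fault block restores normal behavior.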

It's also worth pointing out that a lot of these features are what other projects build on, such as Knative, whose scale-to-zero relies on retries until the service is available. Once you get a handle on Istio, you can install something like Knative on top of it and take advantage of these capabilities. A lot of CD tools out there, like Argo and Flux, also use these features to do automated canary deployments and feature testing. There are a number of projects that build upon these features.

Traffic Management APIs
These are the four core Istio APIs that give you traffic management rules: virtual service, destination rule, gateway, and service entry.

The virtual service and destination rule are by far the most common. Read the documentation to understand the many settings available in a virtual service; it's too much to cover right now. It gets a little confusing as well: if you're going to play around with this in a sandbox, the virtual service/destination rule relationship takes some getting used to, and honestly sometimes I wish they'd just combine them because they overlap all the time.

Traffic Management APIs

Traffic Routing and Policies
In general, the virtual service configures how a request is routed to a Kubernetes service within the mesh, and the destination rule then configures how to handle the traffic once it reaches that destination. I tend to think of the virtual service as operating at layer 7: traffic splitting, weighted routing, and anything that deals with layer 7 concerns is handled by a virtual service. A destination rule operates more at the layer 3/layer 4 level.

With a destination rule, you can apply TLS policies, set connection timeouts, and configure other things at that layer. They tend to go together; if you have one, you usually have the other. Those are the two objects that give you all the sophisticated routing and management.
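A sketch of those layer 3/4 knobs on a destination rule for the hypothetical my-service, combining the TLS policy, connection limits, and a circuit-breaker-style outlier detection mentioned earlier:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL        # require Istio mTLS to this host
    connectionPool:
      tcp:
        maxConnections: 100     # cap concurrent connections
        connectTimeout: 5s      # fail fast on slow connects
    outlierDetection:           # eject misbehaving endpoints
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
```

With outlier detection, an endpoint that returns five consecutive 5xx responses is temporarily removed from the load-balancing pool, which is the circuit-breaker behavior described in the resiliency section.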

There's also a gateway object, which manages inbound or outbound traffic for an application. An ingress gateway handles all the inbound traffic, and you assign a gateway per host or set of hosts to route that traffic into your application.

The virtual service is about how to route a request, and the destination rule is about what to do with that request before it hits the pod. If internally you have service A calling service B, you could have a virtual service for service B that says: if the URL has "/v1", send it to service B's v1 instance; if it has "/v2", send it to v2. That gives you complex routing within the mesh. After the virtual service has evaluated its rules, the destination rule kicks in and says, for example, that all requests must terminate TLS at service B, or that a connection timeout applies: things at the layer 4 level. They can handle external requests, but they can also handle internal requests. They're the core building blocks of traffic management.

On the right of the slide is a simple example of a virtual service that says: any traffic coming to an ingress gateway with this host, route it to this backend. Frontend is the name of the Kubernetes service, the destination. You can do more complex things in here, and it doesn't necessarily need to be bound to an ingress gateway; it could be general traffic within the mesh as well. This YAML example handles ingress traffic to a service, but a virtual service can also handle traffic from one service to another, which is why Istio created the virtual service instead of just using the Ingress resource. The challenging part of Istio is that there are a lot of different pieces to chain together. In its simplest form, a virtual service says: wherever the request is coming from, this is how I want it routed to the service I'm running.

External Services
The other interesting traffic management API is the service entry. There's a setting in Istio that we at Aspen Mesh recommend you turn on, called the outbound traffic policy, and one of the values you can set it to is registry-only. That means if an application in the mesh tries to call something outside of the mesh (maybe a third-party API, or a database that isn't running within the mesh), the call is not allowed unless the destination is in Istio's internal service registry. With registry-only set, any call to a destination the mesh doesn't know about is rejected. A service entry is a special API object that adds an external service, by hostname, to the service registry. That's how you control where egress traffic is allowed to go: turn on the registry-only outbound traffic policy, then add service entries for the external services you allow.
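A sketch of that pair of settings (api.example.com stands in for whatever third-party API you need to allow):

```yaml
# Lock down egress: only destinations in the service registry are reachable
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
---
# Explicitly allow one external service by adding it to the registry
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.example.com       # hypothetical third-party API
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
  location: MESH_EXTERNAL
```

Anything not covered by a service entry is rejected by the sidecar, which gives you an allowlist model for egress traffic.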

Resilience and Testing
All the resilience and testing features that we talked about, these aren’t standalone APIs, they’re different configurations within a virtual service or destination rule, but that’s also how you configure a lot of these resilience and testing features.

Ingress and Egress – Use Controlled Ingress/Egress Gateways to Secure Your Mesh

We’ve talked a lot about traffic within the mesh, but you’re going to have external requests coming in and you’re going to have requests that are going out of the mesh to some sort of external service and there’s a concept of an ingress gateway and egress gateway to handle this.

The ingress gateway sits behind your ingress controller on your cluster. It's an Istio-specific object, but it gives you much of the routing configuration you need. It's another layer of configuration, but it's powerful.

One of the things you can do with your ingress gateway and egress gateway is to configure your mesh to only allow traffic going out through the egress gateway. If you have dedicated nodes for your ingress and egress gateways, it gives you a lot of capability to do things like audits or to set firewall rules to only allow traffic from those dedicated node IP addresses. It’s powerful in improving your security posture by having these dedicated gateways for ingress and egress.

Ingress and Egress

Ingress and Egress Example
To give you an example of what that might look like, below is a project I was working on. We have an application (the blue box) that makes a call to a virtual machine outside of the mesh running RabbitMQ, and that call needs to be encrypted over TLS. Through the egress gateway's virtual service routing, we say that all traffic leaving the mesh must be routed through the egress gateway and then out, so that you have visibility into the traffic leaving. The application just knows the hostname of the server and the protocol; the encryption is handled transparently.

The communication between the proxy running in that app's pod and the proxy running on the egress gateway is then wrapped and tunneled in mTLS: the encrypted RabbitMQ connection is wrapped in mTLS between the proxies, and the proxy on the egress gateway terminates the mTLS and sends the still-encrypted traffic on to RabbitMQ. Again, the app knows nothing about the proxy; it just sends to the hostname, and the proxy automatically picks that up and routes it through the gateway.

People may ask whether the Istio proxy is a pod or a DaemonSet running on each of the nodes, including the controller nodes. The answer is no: the Istio proxy is a container that runs within each workload pod. The Istio egress and ingress gateways, on the other hand, are standalone Istio proxies configured to handle inbound and outbound traffic. Each is another Envoy proxy deployed in the istio-system namespace as a pod running just a standalone proxy container. It's a normal deployment, not a DaemonSet, and it's configured to allow certain ports and traffic through, whether inbound or outbound.

Ingress and Egress Example

Observability

The last section I want to cover is observability. One of the challenges is that you've got all these microservices making all these network calls, and when an end user reports an issue, you don't even know where to start because there's so much going on. Istio collects a lot of telemetry data from the data plane, as well as from itself, to give you insight into what's going on in the mesh.

Mesh Visualization
One of the things you can do is visualize what your mesh looks like. Istio collects all this telemetry data and can send it to Prometheus or some other data store; the tools I'm showing here are visualization tools, and you can use any tool that supports these protocols. The ecosystem is starting to move to OpenTelemetry. I'm showing Kiali, but there are other tools out there, commercial and open source, that can give you the same graph visualization based on this data.

This Kiali example shows the call graph for a sample application, all the way from the ingress gateway through services that are running multiple versions of the pods behind them. You can see whether these connections are healthy: in this case the healthy HTTP connections are green arrows, and the TCP connections are blue. You can look at the number of requests, and it's good to be able to visualize the whole communication path in a more complex system of microservices.

Mesh Visualization

Distributed Tracing
When you have an issue with a system, it's hard to tell where the issue is being introduced. If you have 50 microservices all calling each other, which one is being slow? Distributed tracing is a useful way to find the culprit within a call graph. Istio does this by injecting headers into all the requests for tracing, things like a correlation ID and a trace ID, and then sending that data so you can visualize where bottlenecks or errors are causing an issue. This example is Jaeger; Zipkin and several other distributed tracing tools are out there as well.

Distributed Tracing

Metrics Visualization
Metrics are typically stored in Prometheus. You can use the APM tool of your choice, whether it's Datadog or Dynatrace; in this case it's a Grafana dashboard. There are a ton of metrics you can build your dashboards around, covering both Istio control plane performance and your applications.

The metrics in this dashboard (below) include information about mTLS at the bottom, so you can see which of your pods are protected by mTLS and what's encrypted and what's not. Throughput, latency, all sorts of data you can pull into a tool of your choice.

Metrics Visualization

One question I've received is whether Jaeger takes into account mTLS authentication time, to figure out how much overhead TLS adds. The answer is no; Jaeger is just looking at the communication between the proxies, and once a request hits the proxy the mTLS is terminated, so there's no real data there. You can get that data from Envoy and visualize it in something like Grafana; the TLS termination overhead is a metric Envoy collects. Kiali itself doesn't go that deep. We have some good guides on benchmarking Envoy's performance and how to tune it, because there is some overhead with the proxy, but what we've found is that if you tune it properly for your environment, it's minimal.

Next Steps – How to Kick the Tires on Istio

For those who want to look at Istio further, here are some quick steps to get something up and running that you can play around with. Istio is open source, so there's plenty of documentation out there.

  • Get a sandbox cluster. If I were starting from the beginning, I'd recommend creating some sort of sandbox cluster, whether it's local or on Platform 9. Whatever cluster it is, it needs to be Kubernetes 1.19 or above.
  • Download and install Istio with the demo profile. Istio has a concept of profiles for istioctl install. I recommend the demo profile because it installs everything: the ingress gateway, egress gateway, core Istio control plane, etc.
  • Install the tools of your choice. It doesn’t have to be Kiali, Jaeger and Grafana but these three are common ones. If you have other preferences that’s fine.
  • Install a microservice-based application. There are a bunch of applications that demonstrate Istio's features; bookinfo ships with the Istio distribution, and Aspen Mesh has one called catalog-demo, a set of microservices you can download and install.
  • Explore key features like security, observability, and traffic management. There are a lot of good examples in the documentation to help you do that.
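The first few steps above can be sketched as a short CLI session. These commands follow Istio's getting-started docs; the exact download URL and sample paths may differ by version:

```shell
# 1. Download Istio and put istioctl on your PATH
curl -L https://istio.io/downloadIstio | sh -
cd istio-*/ && export PATH=$PWD/bin:$PATH

# 2. Install with the demo profile (control plane + ingress/egress gateways)
istioctl install --set profile=demo -y

# 3. Label a namespace for injection and deploy the bookinfo sample
kubectl label namespace default istio-injection=enabled
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
```

From there you can install Kiali, Jaeger, and Grafana from the samples/addons directory of the same download and start exploring.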

Have more questions about how to get started? Get in touch with the Aspen Mesh team or learn about our Professional Services and 24/7 Expert Support.

Recent related content in the Service Mesh Knowledge Hub:



Get a Health Check Report of your Istio to see if everything's configured and optimized.

How do you know your Open Source Istio is operating at its full potential? At Aspen Mesh, we focus on optimizing Istio-based service mesh for our customers (service mesh is all we do).

We talk to companies every day about their OS Istio, and the most common question we get is, "How do we know we've got everything in our Istio implementation working correctly?" Whether you're in a pre-production environment, have Istio deployed in a portion of your network, or are deployed network-wide, there's often a fear that something's not configured correctly, or that a potential problem is lurking that you don't have the insight to head off. Just as importantly, we're asked if there is enhanced Istio functionality to leverage that can drive better performance.

At Aspen Mesh the first thing we do for a new customer is a 360-degree health check of their Istio implementation. It’s a lot like a 100-point diagnostic inspection for your car – a way to identify what’s working fine, where there are potential problems, and get recommendations from an expert about what’s critical to address immediately.

That got us thinking, we should give everyone this level of insight into their Istio implementation.

Aspen Mesh Now Offers a Complimentary OS Istio Health Check Report. This evaluation provides insight across all key areas, identifies critical issues, directs you to best practices, and recommends next steps. You receive an assessment of your Istio by our Istio experts. This is the same evaluation we conduct for every new Aspen Mesh customer running Istio.

A few things that are covered in the Report:

  • Platform: Ensure a stable foundation for smooth version upgrades.
  • Security: Identify security risks and apply best practices.
  • Ingress/Egress: Know you're following best practices.
  • Application policy inspection.
  • Recommendations about where to optimize your performance.
  • Steps to take to go live with confidence.

You Receive Your Report After It Is Complete

Our Istio expert will review the report with you, recommend remediation steps for any critical items discovered, and answer any questions you have. There's no obligation, and the Report typically takes about two business days. After the review, we give you a copy of your report. If you want to learn how we tackle any Istio problem and optimize an Istio environment, we can also share how to take advantage of Aspen Mesh's array of customized Services and 24/7 white glove Expert Support for OS Istio.

Where we get the data about your Istio to build your Report
The Aspen Mesh Istio Inspection Report analyzes your Istio system for common misconfigurations and vulnerabilities.

The Report is done in 3 easy steps:

  1. You run the Aspen Mesh Data Collector tool on a workstation with your Kubernetes context configured. This generates a compressed file with the data collected from your Istio installation.
  2. You upload the compressed data file to the Aspen Mesh site.
  3. Aspen Mesh engineers analyze the data collected and build your custom report detailing all of our findings.

The Aspen Mesh Data Collector collects the following data:

  • Kubernetes, Istio, and Envoy versions
  • Node topology (number of nodes, node size)
  • Objects installed in your cluster (Kubernetes and Istio objects)
  • Kubernetes events

Note that the Aspen Mesh Data Collector does not collect any potentially sensitive data such as secrets, certificates, or logs. All data that is collected is securely stored and accessed only by Aspen Mesh. Get in touch if you have questions about the process. I can send you a link to our Data Collector tool and share how we gather and analyze your data to provide a comprehensive assessment. Just send me a note and I'm happy to connect.

-Steven Cheng, Sr. Solutions Engineer at Aspen Mesh


Istio mTLS: 8 Ways to Ensure End-to-End Security

Every time an mTLS problem arises, it has the potential to cause a deployment or service outage. 

Mutual TLS (mTLS) is a method of ensuring traffic is authenticated and encrypted between Kubernetes services. In a highly distributed cloud-native system, using mTLS can drastically increase your security posture by eliminating impersonators, bad actors, and traffic snooping. However, this comes with additional complexity, and troubleshooting issues can be challenging.   

Implementing mTLS is often done with a sidecar service mesh like Istio. Istio supports mTLS out of the box, but in our experience the default configuration is not enough. 

Simple configuration changes are often all that is needed to solve the most taxing of issues.  

The Aspen Mesh team has experience helping our customers deploy mutual Transport Layer Security (TLS) across a wide range of industry verticals and situations. 

8 ways to mitigate risk when deploying mutual TLS:

1. Enforce STRICT mode as a default

Why it’s important: Ensures end-to-end security for all devices and services. 

PERMISSIVE mode allows a service to accept both plain-text traffic and mutual TLS traffic at the same time. This greatly improves the mutual TLS onboarding experience, but it is not as secure as STRICT mode. For services with transport authentication enabled by an authentication policy, the peers section has an additional key, mode, which defines which traffic a service can accept from its peers. The mode key takes two values: PERMISSIVE, where the service accepts both plain-text and encrypted traffic, and STRICT, where the service only accepts mutual TLS traffic.

In Istio, sidecar proxies are needed on each service to establish mutual TLS communication. During onboarding, however, there may be cases where the operator cannot install sidecar proxies for all clients and services at the same time. In these situations, it's useful to enable communication between non-Istio client services and Istio target services. By enabling PERMISSIVE mode in the authentication policy for a target service, non-Istio clients can continue to send plain-text traffic to that service until the onboarding process is complete, at which point STRICT can be enabled.

 

2. Enable mutual TLS globally by default

Why it’s important: Defines scope of mutual TLS settings. 

There are three levels of granularity through which mutual TLS settings can be defined. For each service, Istio applies the narrowest matching policy. The order is: service-specific, namespace-wide, mesh-wide. It is best to enforce mTLS globally, and then change to PERMISSIVE at the service level only when necessary.   
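As a sketch of this pattern, a mesh-wide default can be set with a PeerAuthentication named `default` in the Istio root namespace, then relaxed for a single workload with a selector (the namespace and workload label below are illustrative):

```yaml
# Mesh-wide default: STRICT mutual TLS everywhere.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh root namespace
spec:
  mtls:
    mode: STRICT
---
# Narrow exception: one workload still accepts plain text.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-exception
  namespace: legacy-apps    # illustrative namespace
spec:
  selector:
    matchLabels:
      app: legacy-service   # illustrative workload label
  mtls:
    mode: PERMISSIVE
```

Because Istio applies the narrowest matching policy, the workload-level exception wins only for that service; everything else stays STRICT.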


3. Don’t rely on perimeter defenses to secure core applications and services

Why it’s important: Greater security and compliance with industry regulations. Perimeter defenses are insufficient; secure critical core services and traffic with end-to-end security. 

While perimeter defenses are a valid layer in a defense-in-depth strategy, they do not stop an attacker from viewing unencrypted data once the perimeter has been breached. Only services that encrypt traffic end to end can provide the level of security needed to protect confidential information such as financial transactions or medical records. 

 

4. Visibility for service communication

Why it’s important: Helps catch misconfigurations and aids in troubleshooting. 

A visibility tool is paramount for understanding what’s happening at the service level. For example, if communication between two services is failing, the cause may be that one service has mutual TLS enabled while the other (perhaps lacking a sidecar, or carrying a service-specific policy) does not support it. A visibility tool that runs in real time, dynamically visualizing services and their communications, helps you find such problems quickly as they arise. 
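A few `istioctl` commands are also useful for spot checks; this is a sketch, and the pod and namespace names are illustrative:

```shell
# Confirm every sidecar is in sync with the control plane
istioctl proxy-status

# Describe the Istio configuration (including mTLS mode) affecting one pod
istioctl x describe pod my-pod -n my-namespace

# Scan all namespaces for common misconfigurations
istioctl analyze --all-namespaces
```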


5. Enable an external Certificate Authority

Why it’s important: Easier integration with external systems.

Although the Istio Certificate Authority (CA) generates a self-signed root certificate and key and uses them to sign all workload certificates, external parties will not normally trust a digital certificate signed by an internal CA. The certificate management overhead of an internal CA is also higher than that of an external CA. Using a trusted external CA eases working with external services. 
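With Istio's plug-in CA mechanism, the signing certificate and chain issued by an external CA are supplied to istiod as a `cacerts` secret before installation. The file names below follow that convention; the paths are illustrative:

```shell
# Provide an externally issued signing certificate to the Istio CA
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem
```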

 

6. Optimize System Design – Design for mTLS and service mesh from the start

Why it’s important: Retrofitting system-wide mTLS after the platform and services are deployed is difficult and error-prone. Designing for mTLS from the beginning will increase your success rate and ensure that all communication channels are secured properly. Attempting to implement mTLS after deploying the platform and applications requires reconfiguring the service mesh, installing and configuring a root CA, redeploying all applications, and regression testing the applications and external integrations. 

 

7. Mitigate the mutual TLS performance hit

Why it’s important: Restores system performance so services do not operate at less-than-optimal levels. 

Enabling mutual TLS adds encryption overhead, and a performance hit of up to 10% is possible, depending largely on the services and workload in the deployment. To mitigate it, Envoy can be tuned and the underlying deployment hardware resized to restore system performance. 
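One place to start is the sidecar itself. As a hedged sketch, the pod-template annotations below adjust Envoy's worker-thread count and the sidecar's resource requests; the values are illustrative and should be sized from your own load tests:

```yaml
# Fragment of a Deployment pod template: tune the injected sidecar.
template:
  metadata:
    annotations:
      proxy.istio.io/config: |
        concurrency: 2                     # Envoy worker threads
      sidecar.istio.io/proxyCPU: "500m"    # CPU request for the sidecar
      sidecar.istio.io/proxyMemory: "256Mi" # memory request for the sidecar
```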

 

8. Decrease certificate rotation time

Why it’s important: Shortens the window of time an attacker has to exploit a compromised key. 

Istio rotates workload certificates and encryption keys on a default interval. The longer this rotation interval, the more opportunity an attacker has to defeat the encryption. Set the certificate rotation time to the minimum your deployment and services can tolerate. This helps defeat attackers by shortening the window of time they have to execute brute-force attacks. 
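As a sketch, the workload certificate lifetime can be lowered through the proxy metadata in the mesh-wide default config; the 12h value is illustrative (Istio's default workload certificate lifetime is 24 hours):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        SECRET_TTL: 12h   # requested workload certificate lifetime
```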

 

How can Aspen Mesh help you on your Istio journey?

Aspen Mesh provides the knowledge, leadership, and expertise needed to successfully help you deploy mTLS. We have helped customers deploy in the enterprise, telco, financial services, and healthcare verticals. We have deployed mTLS in complex high-availability and disaster-recovery environments, providing upfront training before the deployment starts and expert support after it is complete. We can: 

  • Advocate for your needs within the open-source community 
  • Provide training and expert support 
  • Mitigate your deployment risk with our service offerings, which include: 
      • A 360° quantitative and qualitative evaluation of your Istio environment to identify problems 
      • Expertise in the design of a scalable service mesh environment 
      • A comprehensive security review with detailed recommendations 
      • Istio upgrades in your environment, ensuring it runs smoothly afterward 
      • Benchmarking and tuning of your existing Istio environment for maximum performance 

 

Deployment success means picking the right partner

mTLS can be challenging: integration with external workloads is problematic, and debugging information can be a black hole. Visibility is essential when something goes wrong in any deployment, but even more so in an mTLS deployment. We provide 24/7 and international support delivered by our expert Istio engineers from F5, a trusted enterprise vendor you can rely on. You can be assured our engineers are with you every step of the way, no matter the problem – big or small. 

Aspen Mesh can help fully manage and monitor your mTLS deployment, setting you up for success in the long term. Our team of experts has the knowledge and expertise to guide you every step of the way. See our Professional Services, including in-depth Istio Health Assessments, Architecture Design, Security Essentials, and Custom Projects. Get in touch to start the conversation.




Aspen Mesh Leads the Way for a Secure Open Source Istio

Here at Aspen Mesh, we entrenched ourselves in the Istio project not long after its start. Recognizing Istio's potential early on, we committed to building our entire company with Istio at its core. From the early days of the project, Aspen Mesh took an active role in Istio -- we've been part of the community since fall of 2017. Among our many firsts, Aspen Mesh was the first non-founding company to have a member on the Technical Oversight Committee (TOC) and to hold a release manager role, helping manage the release of Istio 1.6 in 2020.

Ensuring open source Istio continues to set the standard as the foundation for a secure, enterprise-class service mesh is important to us. In fact, we helped create the Product Security Working Group (PSWG) in collaboration with other community leaders to ensure Istio remains a secure project with well-defined practices around responsible early disclosures and incident management.

Jacob Delgado of Aspen Mesh has been a tremendous contributor to Istio's security and he currently leads the Product Security Working Group.

Aspen Mesh leads contribution to Open Source Istio

The efforts of Aspen 'Meshers' can be seen across Istio's architecture today, and we add features to open source Istio regularly. Some of the major features we've added include Elliptic Curve Cryptography (ECC) support, configuration validation (istio-vet -> Istio analyzers), custom tracing tags, and Helm v3 support. Aspen Mesh is a Top 5 Istio Contributor of Pull Requests (PRs). One of our primary areas of focus is helping to shape and harden Istio's security. We have responsibly reported several critical CVEs and addressed them as part of the PSWG, such as the Authentication Policy Bypass CVE. You can read more about how security releases and 0-day critical CVE patches are handled in Istio in this blog authored by Jacob.

Istio Security Assessment Report findings announced in 2021

The success of the Istio project and its critical use enforcing key security policies in infrastructure across a wide swath of industries was the impetus for a comprehensive security assessment that began in 2020. In order to determine whether there were any security issues in the Istio code base, a third-party security assessment of the Istio project was conducted last year that enlisted the NCC Group and sought collaboration with subject matter experts across the community.

This in-depth assessment focused on Istio’s architecture as a whole, looking at security related issues with a focus on key components like istiod (Pilot), Ingress/Egress gateways, and Istio’s overall Envoy usage as its data plane proxy for Istio version 1.6.5. Since the report, the Product Security Working Group has issued several security releases as new vulnerabilities were disclosed, along with fixes to address concerns raised in the report. A good outcome of the report is the detailed Security Best Practices Guide developed for Istio users.

At Aspen Mesh, we build upon the security features Istio provides and address enterprise security requirements with a zero-trust based service mesh that provides security within the Kubernetes cluster, provides monitoring and alerts, and ensures highly-regulated industries maintain compliance. You can read about how we think about security in our white paper, Adopting a Zero-Trust Approach to Security for Containerized Applications.

If you'd like to talk to us about what enterprise security in a service mesh looks like, please get in touch!

-Aspen Mesh

 


How Istio is Built to Boost Engineering Efficiency


The New Stack Makers Podcast

One of the bright points to emerge in Kubernetes management is how the core capabilities of the Istio service mesh can help make engineering teams more efficient in running multicluster applications. In this edition of The New Stack Makers podcast, The New Stack spoke with Dan Berg, distinguished engineer, IBM Cloud Kubernetes Services and Istio, and Neeraj Poddar, co-founder and chief architect, Aspen Mesh, F5 Networks. They discussed Istio’s wide reach for Kubernetes management and what we can look out for in the future. Alex Williams, founder and publisher of The New Stack, hosted this episode.

Voiceover: Hello, welcome to The New Stack Makers, a podcast where we talk about at-scale application development, deployment and management.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.

Alex Williams: Hey, it’s another episode of The New Stack Makers, and today the topic is Istio and engineering management. Today, I am joined for a conversation about Istio with Neeraj Poddar, co-founder and chief architect at Aspen Mesh. Hello, Neeraj, how are you?

Neeraj Poddar: I’m doing good. It’s great to be here Alex.

Alex Williams: Thank you for joining us – you’re live from Boulder. And live from Raleigh, North Carolina, is Dan Berg, Distinguished Engineer at IBM Cloud Kubernetes Service and Istio. That’s a mouthful.

Dan Berg: Yes, sir. I was worried there for a moment you weren’t going to be able to get Kubernetes out.

Alex Williams: You know, it’s been that way lately. Actually we’re just finishing our second edition of the eBook that we wrote first in 2017 about Kubernetes, service mesh was just beginning to be discussed there, and I was reading some articles and some of the articles were saying things like, well Istio is still in its early days and now today you’re telling me that you have more meetings than you can go to related to Istio. I don’t know what that means. What does that mean? What does that mean to you both? What does that say about Istio and what is Istio? So for those who may not be familiar with it.

Neeraj Poddar: You’re right. I mean, we have so many meetings and discussions, both asynchronously and synchronously, that it’s great to see the community grow. And like you’re saying, from three years before to where we are now, it’s amazing, not just the interest from developers, it’s also the interest from end users, the feedback, and then making the product and the whole community better. So coming to what Istio is: Istio is an open source service mesh platform for simplifying microservices communication. And in simple terms, it handles a lot of complicated pieces around microservices communicating with each other, things like enforcing policies, managing certificates, and surfacing relevant telemetry so that you can understand what’s happening in your cluster. And those problems become more and more complicated as you add more microservices. So service mesh, and Istio in a way, is just taking that burden away from the developers and moving it into the infrastructure. It’s basically decoupling the two things and enabling them to be successful at the same time.

Alex Williams: Now, Dan, you’ve been around a bit and you have your own experiences with APIs and how they evolved, and is this why we’re seeing this amazing interest in Istio? Because it takes the API to that next evolution? Is it the network effect on APIs that we’re seeing or is it something different that’s relevant to so many people?

Dan Berg: Well, I think it’s a combination of a few things. And first off, thanks for calling me old for saying I’ve been around for a while.

Dan Berg: So I think it’s a combination of several different things. First and foremost, near and dear to my heart, obviously, is containers and the evolution of containers, especially as containers have been brought to the cloud, really driving more cloud native solutions, which drives distributed solutions in these clouds, which is driving more use of microservices. Microservices aren’t new. It’s just that they’re being applied in a new way in cloud environments. Because of that, there’s a lot of complexity around that, and the distribution and delivery of those containers is a bit different than what we’ve seen with traditional VMs in the past, which means how you manage microservices is different. I mean, you need the mechanism. You need a way to drive your DevOps processes that are GitOps-based, API and CLI driven. So what that naturally means is we need a better way of managing microservices and the microservices in your cloud. The evolution of Istio as a service mesh, which I often think of as the ability to program your network and your network policies through an API, is a natural evolution to fit where we are today with cloud native applications based on containers. This is the modern way to manage your microservices.

Neeraj Poddar: The way Dan explained it – it’s a natural progression. I especially want to mention, in the context of network policies, that even when companies migrate from monoliths to microservices, the same organizational policies apply. No one wants to give those up, and you don’t want to embed them into your applications. So this is the key missing piece that lets you migrate or even scale. It gives you both things, wherever you are in your journey.

Alex Williams: So the migration and the scale. And a lot of it almost comes down to the user experience, doesn’t it? I mean, Istio is very well suited to writing generic reusable software, isn’t it? And to manage these interservice communications, which relates directly to the network, doesn’t it?

Dan Berg: Yeah, in many ways it does. A big, big part of this, though, is that it removes a lot of the burden and the lock-in from your application code. So you’re not changing your application to adopt and code to a certain microservices architecture or microservices programming model – that is abstracted away with the use of these sidecars, which is a pivotal control point within the application. But from a developer standpoint, what’s really nice about this is now you can declare your intent. A security officer can declare their intent – you know, Neeraj was talking about policies – you can drive these declarations through Istio without having to go through and completely modify your code in order to get this level of control.

Alex Williams: Neeraj, so what’s the Aspen Mesh view on that? And I know you talk a lot about engineering management. This relates directly to engineering management in many ways, doesn’t it? And in terms of being able to take care of those so you can have the reusable software.

Neeraj Poddar: Absolutely. I mean, when I think of engineering management, I also think of engineering efficiency. And they both relate in a very interesting way, where we want to make sure they are always achieving business outcomes. So there are two or three business outcomes here that we want our engineering teams to achieve. We want to acquire more customers by solving more customer use cases, which means adding more features quickly. And that’s what Dan was saying. You can move some of those infrastructure pieces out of your application into the mesh so you can focus and create reusable software – software that’s unique IP to your company. You don’t have to write the code which has already been written for everyone. The second outcome is, once you have been successful acquiring a customer, you want to retain them. And that customer satisfaction comes from being able to run your software reliably and fix issues when you find them. And that’s where, again, service mesh and Aspen Mesh excel, because we surface metrics, consistent telemetry, and tracing. At the same time, you’re able to tie it back to an application view where you can easily pinpoint where the problem is. So you are getting benefits at a networking level, but you’re able to get an understanding of the application that is very crucial to your architecture.

Alex Williams: Dan, what is the importance of the efficiencies at the networking level, the networking management level? What has been the historical challenge that Istio helps resolve? And how does the sidecar play into that? Because I’m always trying to figure out the sidecar. And I think for a lot of people, it’s a little bit confusing to try to understand. And Lynn, your colleague at IBM, describes it pretty well as almost like taking all the furniture out of the room and then placing it back in the room piece by piece. I don’t know if that’s the correct way to describe it.

Dan Berg: Possibly. That’s one analogy. So a couple of different things. First off, networking is hard. Fundamentally, it is hard. It almost feels like if you’re developing for cloud, you need to have a PhD to do it properly. And on some level, that’s true. Simple networking is fine; getting from point A to point B is not a problem. Even some things in Kubernetes, like routing from service A to service B, are pretty easy, right? There’s kube-dns. You can do the lookup and kube-proxy will do your routing for you. However, it’s not very intelligent, not intelligent at all. There’s little to no security built into that. Then of course the routing and load balancing is very generic. It’s just round robin and that is it. There is nothing fancy about it. So what happens when you need specific routing based on, let’s say, zone awareness, or you need it based on the client and source that’s coming in? What happens if you need a proper circuit breaker because the connection to your destination wasn’t available? So now where are you going to code that? How are you going to build in your retry logic and your timeout logic? Do you put that in your application? Possibly. But wouldn’t it be nice if you didn’t have to? So there’s a lot of complications with the network. And I haven’t even gotten into security. What about your authentication and your authorization? Typically, that’s done in the application. All you need is one bad actor in that entire chain and the whole thing falls apart. So Istio, and modern service meshes generally, really push that programming down into the network. And this notion of the sidecar, which is kind of popular inside Kubernetes-based environments: it’s basically that you put another container inside the pod. Well, what’s so special about that one container in the pod? Well, with Istio’s sidecar, that sidecar is an Envoy proxy. 
And what it is doing is capturing all inbound and outbound traffic into and out of the pod. So everything traverses through that proxy, which means policies can be enforced, security can be enforced, routing decisions can be programmed and enforced. That happens at the proxy. So when a container in the pod communicates out, the traffic is captured by the proxy first, which makes some decisions and then forwards it on. The same thing on inbound requests: it’s checking, should I accept this? Am I allowed to accept this? It’s that control point. And all of that is programmed by the Istio control plane. So that’s where the developer experience comes in. You program it through YAML; you’re programming the control plane, the control plane propagates all that programming logic down into those sidecars, and that’s where the control actually takes place. That’s the magic right there. Does that make sense? It’s kind of like a motorcycle that has a little sidecar – literally the sidecar. Put your dog in the sidecar if you want to take your dog with you everywhere you go. And every time you make a decision, you ask your dog. That’s the Envoy sidecar.

Neeraj Poddar: That’s the image that comes to my mind. And maybe that’s because when I grew up in India, that was more prevalent than it is in the U.S. right now, and now somebody from America is also bringing it up. But that’s exactly right in my mind. And just to add one thing to what Dan said, day one networking problems are easy, relatively easy. Networking is never easy, but relatively easy in Kubernetes. Day two, day three – it gets complicated real fast. Early on in the service mesh and Istio days there were people saying, it’s just doing DNS, why do I need it? Now no one is saying that, because those companies have matured past day one problems and they are realizing, oh my God, do I need to do all of this in my application? And when are those application developers going to write real value-add code then?

Alex Williams: All right. So let’s move into day two and day three, Neeraj. So who are the teams and who are managing a day two and day three? Who are these people? What are their personas and what roles do they play?

Neeraj Poddar: That’s a really interesting question. I mean, the same personas that kind of started your project or your product and were there on day one move along into day two, but some of the responsibilities change and some new personas come on board. So an operator role or a security persona is really important for day two. You want to harden your cluster environment. You don’t want unencrypted data flowing through. For maintainability, as an operator, whether it’s a platform operator or a DevOps SRE persona, they need to have consistent metrics across the system, otherwise they don’t know what’s going on. Similarly for day two, the developer, who is creating the software and creating the new application – they need to be brought in when failures happen, but they need to be consulted at the right time with the right context. So I always think that in microservices, if you don’t have the right context, you’re basically going to just spend time in meetings trying to figure out where the failure is. And that’s where a consistent set of telemetry and a consistent set of tracing for day two and day three is super crucial. Moving to security, think about certificate management. I’m going to show my age here, but if you have managed certificates in your applications in a distributed manner, you know the pain there. You have been yelled at by a security officer at some point saying this is not working, go upgrade it, and then you’re stuck trying to do this in a short time span. Moving to Istio, now that’s a configuration change, or an upgrade of the Istio proxy container. Because you know what? We fix OpenSSL bugs much quicker, because we are looped into the OpenSSL ecosystem. So, you know, day three problems and then even further. If you look at day three, you have upgrade issues. 
How do you reliably upgrade without breaking traffic or degrading your customer experience? Can you do feature activation using progressive delivery? These are the things we’re just talking about, and maybe these are day three point five or day four problems, but in the future you should be able to activate features in one region, even in a county, who cares, and test them out with your customers without relying on applications. So that’s how I see it. I mean, the personas are the same, but the benefits change and the responsibilities change as your organization matures.

Dan Berg: I was just going to say, I mean, one of the things that we see quite often, especially with the adoption of Istio, is that for the developer, first and foremost, as Neeraj says, day one, setting up your basic networking and routing, is pretty easy. But then as your system and application grow, just understanding where those network flows go, it’s amazing how quickly it gets out of control. Once traffic gets into, let’s say, your Kubernetes cluster, where does it go? Where does it traverse? Did you even have your timeouts set up properly? How do you even test that? Right. So going through not just the operational aspects but the testing aspects, and how to do proper testing of your distributed system, is very complicated from a networking standpoint, and that’s where things like Istio timeouts, retries and circuit breakers really become helpful, and fault injection, so you can actually test some of these failures. And then with Jaeger and doing the tracing, you could actually see where the traffic goes. But one of my favorites is Kiali – bringing that up and just seeing the real-time network flows and seeing the latency, seeing the error codes. That is hugely beneficial because I actually get to see where the traffic went when it came into my cluster. So lots of benefit for the developer beyond just the security role. I mean, the developer role is very critical here.

Neeraj Poddar: Absolutely, yeah. I mean, I’ll put in a plug for operators here too, which is, once you get used to programming via YAML, or being able to change the data path with the extensions that we are making in the community through WASM, you get to control a critical piece of infrastructure. When you have zero-day things happening, you can actually change that by adding your own filter. We have seen that being so powerful in existing paradigms with BIG-IP or NGINX, where you have a whole ecosystem right now of people writing crazy scripts for doing things, which is saving them lots of money. Because you know what? You don’t always get time to change your application, but you can change the proxy which is next to it. So you’re going to see a lot of interesting things happening there for, you know, day three, day four use cases.

Alex Williams: But who’s writing the scripts? Who’s writing the YAML? Who’s doing that configuring? Because a lot of these people, you know, developers, are not used to doing configurations. So who does that work?

Neeraj Poddar: That’s a really good question, and the reason I’m hesitant is the answer is: it depends. If you have a very mature developer workflow, I would expect developers to give you information about the applications, and then the platform team takes over, converting it into the Istio-specific, Kubernetes-specific language. But most organizations might not be there yet, and that means you will need some collaborative effort between application developers and operators. So, for example, I’ll give you what Aspen Mesh is trying: we are trying to make sure that even if both personas are writing the right YAMLs, those APIs are specific to those personas. So we have created application YAMLs, which an application developer can write with no prior knowledge about Istio. The operators can write something specific to their requirements about networking and security, again in a platform-agnostic way, and then Aspen Mesh can lower it down to Istio-specific configuration. So it depends on what kind of toolchain you are using. I would hope that in the future, application developers are writing less and less configuration, just platform specific.

Dan Berg: And I think that basically echoes the fact that we do see multiple roles using the Istio YAML files and the configurations, but you don’t have to be an expert in all of it. Generally speaking, there are traffic management capabilities and things like that that a developer would use, because you’re defining your routes. You’re defining your characteristics specific to your application, as well as the rollout of your deployment if you’re trying to do a canary release, for example. That’s something that the developer or an application author would be responsible for. But when you’re talking about setting up policies for inbound or outbound access controls into the cluster, that may be a security advisor that’s responsible for defining those levels of policies, and not necessarily the developer – you wouldn’t want the developer defining that level of security policy. It would be a security officer doing that. So there’s room for multiple different roles. And therefore, you don’t have to be an expert in every aspect of Istio, because it’s based on your role which aspect you’re going to care about.

Alex Williams: When we get into the complexities, I think of telemetry, and telemetry has traditionally been a concept I’ve heard Intel talk about, right, with infrastructure and systems. And now telemetry is being discussed as a way to be used in the software. How is telemetry managed in Istio? What is behind it? What is the architecture behind that telemetry that makes it manageable, that allows you to really be able to leverage it?

Dan Berg: For the most part, it all really starts with the Istio control plane, which is gathering the actual metrics and provides the Prometheus endpoint that you can connect up to and scrape that information and use it. How it gets the telemetry information, that’s really the key part of it: where does that information come from? Yeah. And if we take a step back and remember, I was talking about the sidecar, the sidecar being the point that makes those decisions, the routing decisions, the security decisions.

Alex Williams: Well, the dog, the dog, the dog,

Dan Berg: Yes, the dog that is way smarter than you, making all the proper decisions, telling you exactly where to go. So that’s exactly what’s happening here. Except, since all traffic coming into and out of the pod is going through that proxy, it is asynchronously sending telemetry information, metrics about that communication flow, both inbound and outbound. So it can track failure rates, it can track latency, it can track security settings. It can send a large amount of information about that flow, that communication flow. And once you start collecting it up into the Istio control plane, into the telemetry endpoint, and you start scraping that off and showing it in a Grafana dashboard as an example, there’s a vast amount of information. Now, once you start piecing it together, you can see going from service A to service B, which is nothing more than going from sidecar A to sidecar B. Right, we have secure identities. We know exactly where the traffic is going because we have identities for everything in the system, and everything that joins the mesh is joined because it’s associated with a sidecar proxy. So it’s these little agents, these proxies, that are collecting up all that information and sending it off into the Istio control plane so you can view it and see exactly what’s going on. And by the way, that is one of the most important pieces of Istio. As soon as you turn it on and join some services, you’ve got telemetry. It’s not like you have to do anything special. Telemetry starts flowing and there’s a huge amount of value. Once you see the actual information in front of you, traffic flowing, error rates, it’s hugely powerful.

Neeraj Poddar: Just to add to what Dan said here, the amount of contextual information that the sidecars add for every metric we export is super important. Like, I was in front of a customer recently and, like Dan said, there's a wow factor that you can just add things to the mesh. And now suddenly you have so much information related to Kubernetes, which tells you about the port, the services, the role of the application labels. So that's super beneficial, and all of that without changing the application. Another point here is that if you're doing this in applications, there are always inconsistencies between applications developed by one team versus another. A second problem that I've always seen is that it's very hard to move to a different telemetry backend system. For some reason, you might not want to use Prometheus and you want to use something else. If you tie all of that into your application, you have to change all of it. So this proxy can also give you a way of switching backends in the future if you need to, without going through your application lifecycle. So it's super powerful.
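[Editor's note: Neeraj's backend-switching point is the classic adapter pattern, which the proxy applies at the infrastructure layer. A minimal sketch, with invented class and method names, not a real Istio API:]

```python
from abc import ABC, abstractmethod

# The application records metrics against one small interface; the
# backend (Prometheus-style, StatsD-style, ...) can be swapped without
# touching application code, which is what the sidecar does for you.
class TelemetryBackend(ABC):
    @abstractmethod
    def record(self, name, value, labels): ...

class PrometheusStyleBackend(TelemetryBackend):
    def __init__(self):
        self.lines = []
    def record(self, name, value, labels):
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        self.lines.append(f"{name}{{{label_str}}} {value}")

class StatsdStyleBackend(TelemetryBackend):
    def __init__(self):
        self.lines = []
    def record(self, name, value, labels):
        tags = ",".join(f"{k}:{v}" for k, v in sorted(labels.items()))
        self.lines.append(f"{name}:{value}|c|#{tags}")

def handle_request(backend: TelemetryBackend):
    # Application code only knows the interface, not the backend.
    backend.record("requests_total", 1, {"service": "cart", "code": "200"})

prom = PrometheusStyleBackend()
statsd = StatsdStyleBackend()
handle_request(prom)
handle_request(statsd)
print(prom.lines[0])    # requests_total{code="200",service="cart"} 1
print(statsd.lines[0])  # requests_total:1|c|#code:200,service:cart
```

With the sidecar, even this interface lives outside the application, so switching backends is a mesh configuration change rather than an application release.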

Alex Williams: So let's talk a little bit more about the teams and more about the capabilities. And, you know, I know that Aspen Mesh has come out with its latest release, 1.5, and you have security APIs built into it, you're enabling Envoy extensions written in WebAssembly, which is interesting. We're hearing a little bit more about WebAssembly, but not much. And traffic management, you know, and how you think about traffic management. Give us a picture of 1.5 and beyond, kind of tracing Istio's evolution with it.

Neeraj Poddar: Yeah. So, I mean, all Aspen Mesh releases are tied to the upstream Istio releases, so we don't take away any of the capabilities that Istio provides. We only add capabilities we think the organization will benefit from, like a wrapper around it, so that you have a better user experience. So Istio 1.5 by itself moved from a microservices control plane to a monolithic one for operational simplification. Right. So we have that. Similarly, telemetry v2, which is an evolution from the out-of-process Mixer in v1. We also provide that benefit, where users don't have to run Mixer. There was a lot of resource contention, where it was consuming a lot of CPU and memory and contributing to latency numbers which didn't make sense. So all of those benefits the community is working on, you are getting with the Aspen Mesh release. But the key thing here is for us to provide wrapper APIs like security APIs. I'll give you a quick example. So Istio moved, I think between 1.4 and 1.5, from JWT-based policies to request authentication and peer authentication policies. The APIs had to change because the older APIs were not making sense after user feedback. There were some drawbacks. This is great for improvement, but for a customer, now I have to rethink what I did.

Neeraj Poddar: When I have to upgrade, I have to make sure we move along with Istio users. So us providing a wrapper around it means we do the conversion for them. So that's one way we provide some benefit to our customers. Like you said, WASM is an interesting development that's happening in the community. I feel like as the ABI itself matures and a richer ecosystem develops, this is going to be a really powerful enhancement. Vendors can actually add extensions without rebuilding and having to rely on C++ filters. Companies that have some requirement they don't want to offload to vendors or open source can extend Envoy on the fly themselves. This is a really huge thing. One thing I should talk about is that the Istio community is regularly changing or evolving the way they are installing Istio. You know, Dan is here, he can tell you: from the very beginning we have been doing Helm, then we have not been doing Helm, or we have gone to istioctl. It's all done with the right intent. Right. It's because of user feedback and trying to make it even smoother going forward. So we try to smooth out that path where, you know, Aspen Mesh customers can continue to use the tooling that they're comfortable with. So those are the kinds of things we have given in 1.5, where our customers can still use Helm.

Alex Williams: When you're thinking about the security, Dan, and you're thinking about what distinguishes Istio, what comes to mind, especially when you're thinking about multi cluster operations?

Dan Berg: One of the key aspects of Istio, and one of its huge value benefits, is that if you enable Istio on the services within the mesh and you enable a strict security policy, that's going to enable automatic management of mutual TLS authentication between the services, which, in layman's terms, allows you to do encryption on the wire between your pods. And in a Kubernetes environment, if you've got a financial organization as a customer that you're looking to support, or any other customer that has strict encryption requirements, they're asking, well, how are you going to encrypt on the wire? Well, in a Kubernetes environment, that's kind of difficult unless you want to run IPsec tunnels everywhere, which has a pretty nasty performance drain. Plus, that only works between the nodes and not necessarily between the pods. Or you start moving to IPv6, which isn't necessarily supported everywhere or even proven in all cases. But Istio, literally through a configuration, can enable mutual TLS with certificate management and secure service identity. So hugely powerful. And you can visualize all of that with the tools and utilities from Istio as well, so you know exactly which traffic flows are secured, like in Kiali you can see exactly which traffic flows are secured and which ones are not. So that's hugely powerful. And then the whole multi cluster support, which you brought up as well, is an interesting direction. I would say it's still in the infancy stages of managing more complex service mesh deployments. Istio has a lot of options for multi cluster. And while I think that's powerful, I also think it's complex. And I do believe that where we're going on this journey is to simplify those options, to make it easier for customers to deal with multi cluster.
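[Editor's note: the "literally through a configuration" point can be made concrete. The dicts below mirror the shape of Istio's PeerAuthentication and DestinationRule resources from the 1.5-era security API; they are expressed in Python here only for illustration, and in practice you would apply the equivalent YAML with kubectl or istioctl.]

```python
import json

# Mesh-wide STRICT mutual TLS: a PeerAuthentication named "default" in
# the Istio root namespace requires mTLS for all workloads in the mesh.
strict_mtls = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "istio-system"},
    "spec": {"mtls": {"mode": "STRICT"}},
}

# Companion DestinationRule telling client sidecars to originate
# Istio mutual TLS when calling in-mesh services.
mtls_clients = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "DestinationRule",
    "metadata": {"name": "default", "namespace": "istio-system"},
    "spec": {
        "host": "*.local",
        "trafficPolicy": {"tls": {"mode": "ISTIO_MUTUAL"}},
    },
}

print(json.dumps(strict_mtls, indent=2))
```

Two small resources are the entire "encryption on the wire" story Dan contrasts with running IPsec tunnels everywhere; certificate issuance and rotation happen automatically underneath.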
But one of the values of security and multi cluster ultimately is around this level of secure identities and the certificate management, so that you extend the boundaries of trust into multiple different clusters. So now you can start defining policies and traffic routing across clusters, which you can't easily do today. Right. That's very complex. But you start broadening and stretching that service mesh with the capabilities afforded to you by Istio. And that's just going to improve over time. We're on a journey right now of getting there, and a lot of customers are starting to dip their toes into that multi cluster environment, and Istio is right there with them and will be evolving. And it's going to be a fantastic story. I would just say it's very early days.

Neeraj Poddar: Yeah, I was just going to echo that it's in its infancy, but it's so exciting to see what you can do there. Really, when I think about multi cluster, you can think about new use cases emerging from the telecom industry, where the clusters are not just clusters in data centers, they're at the edge and far edge, and you might have to do some crazy things.

Dan Berg: Yeah, well, that's the interesting thing. I know earlier this year at IBM, we launched a new product called IBM Cloud Satellite. And that's where, if you own a service mesh, you're going to be extremely excited about those kinds of edge scenarios. You're broadening your mesh into areas where you're putting clusters. Two years ago, you would never have thought about putting a cluster in those locations. I think service mesh is going to become more and more important as we progress here with the distributed nature of the problems we're trying to solve.

Alex Williams: Yeah, I was going to ask about the telcos and 5G, and I think what you say sums it up: being able to manage clusters at the edge, for instance, in the same way that you can, essentially, in a data center environment.

Dan Berg: Well, you're also dealing with a lot more clusters in these environments. Instead of tens or even hundreds, you might be dealing with thousands, and trying to program like in the old days at the application level, that's going to be almost impossible. You need a way to distribute consistent, programmable policies across all these clusters, and Istio provides some of the raw mechanics to make that happen. These are going to be incredibly important tools as we move into this new space.

Neeraj Poddar: I was just going to say, I always think the evolution of service mesh is going to follow the same trajectory as the evolution of the ADC market, which happened as and when the telcos and the big enterprises came in with the requirements of the telecom industry. That's why load balancers are so evolved today. Similarly, service mesh will gain a lot more capabilities. Think about clusters running at the far edge. They will have different resource constraints. You need a proxy which will be faster and slimmer. Some people will say that's not possible, but we'll have to do that. So I'm always excited when I think about these expansions. And like Dan said, we are not talking about tens or hundreds of clusters now, we are talking about thousands.

Alex Williams: We've been doing research, and we actually find that the deployments most predominant among the people we're surveying are those with more than five thousand clusters. And that leads to, I guess, my last question for you. It's about day five, day six, day seven, and what role observability plays in this. Because it seems like what we're talking about is essentially observability, and I'm curious how that concept is evolving for you as you move out to the days beyond, for people who are using Istio and service mesh capabilities.

Dan Berg: Obviously you need that sidecar. You need that dog next to you collecting all that information and sending it off. That is hugely important. But once you start dealing with scale, you can't keep looking at that data time in and time out. Right. You've got to be able to centralize that information. Can you send all of that and centralize it into your monitoring system at the enterprise level? The answer there is yes, you absolutely can. Sysdig, a great partner that we work with, provides a mechanism for scraping all of the information from the Istio Prometheus endpoint, bringing that all in, and they have native Istio support directly in that environment, which means they know about the Istio metrics and can present them in a unified manner. So now you can start looking at common metrics across all of these clusters, all the service meshes, in a central place, and start building alerts, because you can't look at five thousand clusters and X number of service meshes. It's just too large. It's too many. So you have to have the observability. You need to be collecting the metrics, and you've got to have alerts being generated from those metrics.
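[Editor's note: once per-cluster mesh metrics land in one place, the alerting Dan describes reduces to threshold checks over the combined data. A minimal sketch; the cluster names and numbers are invented for illustration.]

```python
# Alert when any cluster's mesh-wide error rate exceeds the threshold.
ERROR_RATE_THRESHOLD = 0.05  # 5%

# Centralized per-cluster rollups, as scraped from each mesh's
# Prometheus endpoint. These figures are made up.
fleet_metrics = {
    "cluster-us-east": {"requests": 120_000, "errors": 240},
    "cluster-eu-west": {"requests": 80_000, "errors": 6_400},
    "cluster-ap-south": {"requests": 50_000, "errors": 100},
}

def firing_alerts(metrics, threshold=ERROR_RATE_THRESHOLD):
    """Return (cluster, error_rate) pairs exceeding the threshold."""
    alerts = []
    for cluster, m in sorted(metrics.items()):
        rate = m["errors"] / m["requests"] if m["requests"] else 0.0
        if rate > threshold:
            alerts.append((cluster, rate))
    return alerts

for cluster, rate in firing_alerts(fleet_metrics):
    print(f"ALERT {cluster}: error rate {rate:.1%}")
```

The point of centralizing is exactly this: no human watches five thousand dashboards, so the fleet-wide data feeds rules that surface only the clusters that need attention.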

Neeraj Poddar: Yeah, and I think we need to go even a step beyond that, which is: you'll have information from your mesh, information from your nodes, information from your cloud, your GitHub, whatever. You get it all to a level where there is some advanced analytics making sense of it. There's only so much that a user can do once they get the dreaded alert.

Neeraj Poddar: They need to do the next step, which is: in this haystack of metrics and traces and logs, can someone narrow it down to the place I need to look? Because you might get alerted on microservice A, but it has dependencies which are other microservices, so the root cause might be ten levels down. So I think that's the next day seven, day eight problem we need to solve: how do we surface the information in a way where it's presentable? For me, it's even tying it back to the context of applications. Dan and I are both from networking. We love networking. I can talk networking all day, but I think we need to talk the language of applications. That's where the real value will kick in, and service mesh will still be a key player there, but it will be part of an ecosystem where other pieces are also important, and all of them are giving that information and we are correlating it. So I think that's going to be the real thing. It's still very early. People are just getting used to understanding service meshes, so telling them that we need to coordinate all of this information in an automated way is scary, but it will get there.
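[Editor's note: the narrowing-down Neeraj describes is, at its core, a walk over the service dependency graph: start from the alerted service and surface the unhealthy dependencies that have no unhealthy dependencies of their own. A sketch with an invented graph and health data:]

```python
# Service dependency graph and per-service health, both invented for
# illustration. In practice these would come from mesh telemetry.
deps = {
    "frontend": ["cart", "catalog"],
    "cart": ["payments"],
    "payments": ["ledger"],
    "catalog": [],
    "ledger": [],
}
healthy = {"frontend": False, "cart": False, "catalog": True,
           "payments": False, "ledger": False}

def likely_root_causes(service, deps, healthy):
    """Unhealthy services reachable from `service` whose own
    dependencies are all healthy: the deepest points of failure."""
    causes, seen = [], set()
    stack = [service]
    while stack:
        svc = stack.pop()
        if svc in seen:
            continue
        seen.add(svc)
        unhealthy_deps = [d for d in deps.get(svc, [])
                          if not healthy.get(d, True)]
        if not healthy.get(svc, True) and not unhealthy_deps:
            causes.append(svc)
        stack.extend(deps.get(svc, []))
    return sorted(causes)

# The alert fired on "frontend", but the fault is several levels down.
print(likely_root_causes("frontend", deps, healthy))
```

Here the alert on `frontend` traces through `cart` and `payments` to `ledger`, the only failing service whose dependencies are all fine, which is the kind of automated correlation the "day seven, day eight" tooling needs to do.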

Alex Williams: Well Neeraj and Dan, thank you so much for joining us in this conversation about service mesh technologies and Istio and these days beyond where we are now. And I look forward to keeping in touch. Thank you very much.

Dan Berg: Thanks for having us.

Neeraj Poddar: Thank you.

Voiceover: Listen to more episodes of The New Stack Makers at thenewstack.io/podcasts, please rate and review us on iTunes, like us on YouTube and follow us on SoundCloud. Thanks for listening and see you next time.

Voiceover: Aspen Mesh provides a simpler and more powerful distribution of Istio through a service mesh policy framework, a simpler user experience delivered through the Aspen Mesh UI and a fully supported, tested and hardened distribution of Istio that makes it viable to operate service mesh in the enterprise.