Aspen Mesh is proud to sponsor IstioCon 2022

Presentations on Istio Security, Dual Stack Support & TLS Origination at IstioCon 2022

Virtual IstioCon starts Monday, April 25th. It's the biggest gathering of the Istio community and a great place to learn and share ideas. Aspen Mesh is a longtime Istio open source contributor -- we are a top-five contributor of pull requests to Istio. We are proud to sponsor IstioCon 2022, our third year as a sponsor.

We're excited that three members of the Aspen Mesh team are presenting this year.

Aspen Mesh presentations at IstioCon 2022

Tuesday, April 26, 10:30-10:40 a.m. EST - A Beginner's Guide to Following Istio's Security Best Practices, Jacob Delgado, Senior Software Engineer
Following the Istio Security Best Practices page is a daunting task for newcomers to Istio. Even experienced operators have difficulty discerning where to begin. In this talk, I will present an easy way for beginners to adopt Istio, along with the settings and configuration I recommend based on my experience.

Tuesday, April 26, 11 a.m. EST - Istio Upgrades, Jacob Delgado, Senior Software Engineer & Sam Naser, Software Engineer at Google
Are upgrades getting easier? How easy is easy enough? Are helm and revision based upgrades catching on? What is still painful? How often do you upgrade? How often would you like to? Are patches easier than minor upgrades?

Wednesday, April 27, 10:40-10:50 a.m. EST - TLS Origination Best Practices, Kenan O'Neal, Software Engineer
Quick dive for beginners on TLS origination to improve security. This talk will focus on settings that may not be expected for new users with a focus on validating settings. I will touch on what settings Istio uses by default and how to configure Destination Rules to correctly check certificates.  

Thursday, April 28, 10:50-11:00 a.m. EST - Dual Stack Cluster Setup, Josh Tischer, Lead DevOps Engineer
Dual Stack support is very limited in today’s cloud ecosystem. Learn how to run and test Istio on a Dual Stack cluster in AWS on both OpenShift 4.8+ and kubeadm. OpenShift 4.7+ is one of the few options that officially support Dual Stack mode for bare metal clusters and Azure. I am excited to share Aspen Mesh’s experience and empower your team with another option for Dual Stack support.

You can explore the full list of IstioCon sessions here, and be sure to register for IstioCon today.

About Aspen Mesh
Enterprise-class Istio service mesh is all we do at Aspen Mesh by F5. We offer Professional Services and 24/7 Support plans to ensure you have OS Istio experts available when you need them. 

Explore our Service Mesh Knowledge Hub for the latest resources on how OS Istio drives performance from your microservices architecture. Deep dive white paper topics include: Multi-cluster deployment to enable hybrid and multi-cloud architectures, security, mTLS, compliance and more.  


mTLS Authentication for Microservices Security is Critical to Digital Transformation

For an enterprise, mentions of “Cloud Native” and “Digital Transformation” are two ways of saying that a service mesh deployment is in the cards. Once deployed, the service mesh forms the backbone of business operations and is usually paramount to business continuity. As an enterprise starts to implement its Digital Transformation plan and migrates from a monolithic application environment to a cloud native application environment, security becomes an immediate concern. From a business operations and revenue generation perspective, it is important to understand the benefits and deployment pitfalls of mTLS to ensure business-as-usual operation.

Here are some common issues we run into:

  • Loss of Regulatory Compliance: Several industries, such as healthcare and financial services, must comply with mandated specifications, and many others follow agreed best practices. If compliance is lost, business operations may be affected. A solution to this headache is to enforce mutual TLS STRICT mode as the default (a minimal example follows this list), which helps ensure end-to-end security for all devices and services.
  • Loss of Brand Reputation: A security breach may expose customer or internal proprietary information, and customers may consequently choose to do business with a more secure supplier. No vendor wants to be highlighted on the evening news for a breach. Enabling the mutual TLS GLOBAL setting as the default helps protect your reputation by securing customer and proprietary information.
  • Loss of Business Agility: With dynamic cloud-based applications, upgrades to business logic, services and offerings can seem endless, and frequent security enhancements are also essential. Upgrades come with their own headaches and can slow a business from rolling out new services due to various glitches. It’s best not to rely solely on perimeter defenses; regain agility by securing critical core applications and services with end-to-end security.
  • Loss of Business Confidence: The service mesh serves many end users and services. If one supplier or service loses confidence in the integrity of the mesh, it can affect everyone else. The ability to visualize service communication helps restore confidence by reducing misconfigurations and simplifying troubleshooting.
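As a minimal illustration of the STRICT default mentioned in the first item above, mesh-wide strict mTLS is typically declared in Istio with a single PeerAuthentication resource in the mesh root namespace (istio-system by default):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh root namespace
spec:
  mtls:
    mode: STRICT             # reject plaintext traffic mesh-wide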

 

In our experience at Aspen Mesh, deploying mutual TLS has many benefits, but is easy to misconfigure – which can lead to severe disruptions to business continuity when deploying new microservices or upgrading a service mesh. Download our new white paper, Istio mTLS: 8 Ways to Ensure End-to-End Security, to learn about several more mutual TLS concerns that could spell disaster for your business! 

 

- Andy 


Istio for multi-cluster deployment: Q & A with an expert

Q&A with Brian Jimerson, Solutions Architect and expert in Istio and multi-cloud deployment

I recently sat down with one of Aspen Mesh's expert Istio engineers to answer some questions I hear from customers as they start their multi-cluster Istio journey. If your organization already has a series of disparate single-cluster Istio deployments, real benefits can be achieved by connecting them together to create a multi-cluster service mesh.

Here are some highlights of my conversation with Brian Jimerson, one of our seasoned solutions engineers, whose deep experience includes optimizing Fortune 2000 enterprise architectures. I wore the hat of a customer exploring how moving to a multi-cluster environment might impact performance.

Q: I have stringent SLOs and SLAs for my cloud applications. How can I work to meet these?

Brian: Having multiple Kubernetes clusters in different fault zones can help with disaster recovery scenarios. Using a multi-cluster service mesh configuration can simplify cutover for an event.

Q: As an international company, I have data privacy requirements in multiple countries. How do I ensure that data is stored in a private manner?

Brian: Using Aspen Mesh's locality-based routing can ensure that customers are using a cluster in their own country, while reducing latency. This can help ensure that private data does not leave the user's country.
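Under the hood, this kind of locality-aware routing is expressed in open source Istio with a DestinationRule. Here is a minimal sketch; the host name and regions are illustrative, and outlier detection is required for locality failover to take effect:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: frontend-locality
spec:
  host: frontend.prod.svc.cluster.local   # assumed service
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: eu-west-1        # keep EU traffic in the EU locality
          to: eu-central-1       # fail over only to another EU region
    outlierDetection:            # required for locality failover
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s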

Q: I need the ability to quickly scale workloads and clusters based on events. How can I do this in a way that's transparent to customers?

Brian: Using a multi-cluster service mesh can help to scale clusters out and in without interruption to users. A multi-cluster service mesh acts like a single cluster service mesh to services, so communication patterns remain unchanged.

Q: I have compliance requirements for workloads, but they also need to be accessed by other applications. How do I do this?

Brian: Using a multi-cluster service mesh, you can operate compliance workloads in a hardened cluster, and securely control traffic to those workloads from another cluster.

 

I encourage you to read the full paper, Advantages of Going Multi-cluster Using Istio -- and Best Practices to Get There. The move to a multi-cluster environment is complex, and there are many things to consider. Working with Brian and our Aspen Mesh Solutions team, I've written a deep-dive paper that spells out both the advantages and the things to look out for when evaluating a move to multi-cluster, including the disadvantages you have to weigh and the performance improvements you can expect from deploying a multi-cluster service mesh that leverages Istio.

Enterprise-class Istio service mesh is all we do at Aspen Mesh by F5. We offer Professional Services and 24/7 Support plans to give you the ability to tap Istio expertise to optimize your microservices environment with industry best practices, then have peace of mind that we're backing you up. Get in touch if you seek a trusted advisor to help you navigate your OS Istio -- we've designed Istio solutions for some of the largest organizations in the world. I also encourage you to reach out if you would like to talk with Brian, a Solutions Architect who can answer any questions you have about how to chart a course toward a cloud native environment, or if you want to learn more about our full suite of Services delivered by our team of Istio experts (whether you have OS Istio in pre-prod or production).

 

-Andy


Recent security vulnerabilities require Zero-Trust Security tactics for your microservices environment

Despite significant technological advancements, security is still hard. A single phishing email, missed patch, or misconfiguration can let the bad guys in to wreak havoc or steal data. For companies moving to the cloud and the cloud-native architecture of microservices and containerized applications, it’s even harder. Now, in addition to the perimeter and the network itself, the myriad connections between microservice containers must also be protected. 

With microservices, the surface area of your network vulnerable to attack increases exponentially, putting data at greater risk. Moreover, network-related problems like access control, load balancing, and monitoring that had to be solved only once for a monolithic application now must be solved separately for each service within a cluster, as well as between clusters. 

Zero Trust Security Methodology and Networking Principles

Zero-Trust dates to the 1990s as a method for "perimeter-less" security. The main concept behind the methodology is "never trust, always verify," even if the network was previously verified. Its core networking principles are:

  • Networks should always be considered hostile 
  • Network locality is not sufficient for deciding trust in a network 
  • Every device, user, and request should be authenticated and authorized 
  • Network policies must be dynamic and calculated from as many sources of data as possible
     

Today, it’s essential to apply a Zero-Trust approach to network security and to service mesh technology. In our newly completed white paper, Zero-Trust Security for your Microservices Architecture, we outline what it takes to implement the key tenets of Zero-Trust security using a service mesh to secure a microservices environment. In the paper, we provide steps to mitigate cyberattacks and protect containerized applications.
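To make "never trust, always verify" concrete, here is a minimal sketch of how that tenet is commonly expressed in an Istio-based mesh: deny everything in a namespace by default, then explicitly allow only an authenticated, authorized caller. The namespace, service account and method below are illustrative:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: payments
spec: {}    # an empty spec matches nothing, so all requests in the namespace are denied
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-orders-to-payments
  namespace: payments
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/orders/sa/orders-frontend"]  # mTLS-verified workload identity
    to:
    - operation:
        methods: ["POST"]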

What's covered in our white paper, Zero-Trust Security for your Microservices Architecture:
  1. Zero-Trust authentication methodology for a service mesh
  2. mTLS encryption: Achieve non-repudiation for requests without requiring any changes or support from the applications. Identity, certificates, and authorization to ensure “every device, user, and request is authenticated and authorized” -- a Zero Trust principle
  3. Learn the built-in methods Istio uses to combat security vulnerabilities
  4. Ingress and Egress security control within a service mesh


Lastly, in the paper we touch on Aspen Mesh’s approach to Zero-Trust security, including how we configure mTLS, secure ingress, monitor egress, prevent RBAC (Role Based Access Control) misconfigurations and apply policy and configuration best practices.

Aspen Mesh has deep expertise in Istio and understands how to get the most out of it - our Services and 24/7 Service Mesh Support are unmatched in the industry.

- Andy



Get a Health Check Report of your Istio to see if everything's configured and optimized.

How do you know your Open Source Istio is operating at its full potential? At Aspen Mesh, we focus on optimizing Istio-based service mesh for our customers (service mesh is all we do).

We talk to companies every day about their OS Istio, and the most common question we get is, “How do we know we’ve got everything in our Istio implementation working correctly?” Whether you’re in a pre-production environment, have Istio deployed in a portion of your network, or run it network-wide, there's often a fear that something’s not configured correctly, or that a potential problem is lurking that you don’t have the insight to head off. Just as importantly, we're asked if there is enhanced Istio functionality to leverage that can drive better performance.

At Aspen Mesh the first thing we do for a new customer is a 360-degree health check of their Istio implementation. It’s a lot like a 100-point diagnostic inspection for your car – a way to identify what’s working fine, where there are potential problems, and get recommendations from an expert about what’s critical to address immediately.

That got us thinking: we should give everyone this level of insight into their Istio implementation.

Aspen Mesh Now Offers a Complimentary OS Istio Health Check Report. This evaluation provides insight across all key areas, identifies critical issues, directs you to best practices, and recommends next steps. You receive an assessment of your Istio by our Istio experts. This is the same evaluation we conduct for every new Aspen Mesh customer running Istio.

A few things that are covered in the Report:

  • Platform: Ensure a stable foundation for smooth version upgrades.
  • Security: ID security risks & apply best practices.
  • Ingress/Egress: Know you’re following best practices.
  • Application Policy inspection
  • Recommendations about where to optimize your performance.
  • Steps to take to go live with confidence.

You Receive your Report After it is Complete

Our Istio expert will review the report with you, recommend remediation steps for critical items discovered, and answer any questions you have. There's no obligation, and the Report typically takes about 2 business days. After the review, we give you a copy of your report. If you want to learn how we work to tackle any Istio problem you have and optimize an Istio environment, we can also share how to take advantage of Aspen Mesh's array of customized Services and Aspen Mesh 24/7 white-glove Expert Support for OS Istio.

Where we get the data about your Istio to build your Report
The Aspen Mesh Istio Inspection Report analyzes your Istio system for common misconfigurations and vulnerabilities.

The Report is done in 3 easy steps:

  1. You run the Aspen Mesh Data Collector tool on a workstation with your Kubernetes context configured. This generates a compressed file with the data collected from your Istio installation.
  2. You upload the compressed data file to the Aspen Mesh site.
  3. Aspen Mesh engineers analyze the data collected and build your customer report that details all of our findings.

The Aspen Mesh Data Collector collects the following data:

  • Kubernetes, Istio, and Envoy versions
  • Node topology (number of nodes, node size)
  • Objects installed in your cluster (Kubernetes and Istio objects)
  • Kubernetes events

Note that the Aspen Mesh Data Collector does not collect any potentially sensitive data such as secrets, certificates, or logs. All data that is collected is securely stored and accessed only by Aspen Mesh. Get in touch if you have questions about the process --  I can send you a link to our Data Collector tool and share how we gather and analyze your data to provide a comprehensive assessment. Just send me a note and I'm happy to connect.

-Steven Cheng, Sr. Solutions Engineer at Aspen Mesh



Aspen Mesh Leads the Way for a Secure Open Source Istio

Here at Aspen Mesh, we entrenched ourselves in the Istio project not long after its start. Recognizing Istio's potential early on, we committed to building our entire company with Istio at its core. From the early days of the project, Aspen Mesh took an active role in Istio -- we've been part of the community since Fall of 2017. Among our many firsts, Aspen Mesh was the first non-founding company to have someone on the Technical Oversight Committee (TOC) and to hold a release manager role, helping manage the release of Istio 1.6 in 2020.

Ensuring open source Istio continues to set the standard as the foundation for a secure enterprise-class service mesh is important to us. In fact, we helped create the Product Security Working Group (PSWG) in collaboration with other community leaders to ensure Istio remains a secure project with well-defined practices around responsible early disclosures and incident management.

Jacob Delgado of Aspen Mesh has been a tremendous contributor to Istio's security and he currently leads the Product Security Working Group.

Aspen Mesh leads contribution to Open Source Istio

The efforts of Aspen 'Meshers' can be seen across Istio's architecture today, and we add features to open source Istio regularly. Some of the major features we've added include Elliptic Curve Cryptography (ECC) support, configuration validation (istio-vet -> Istio analyzers), custom tracing tags, and Helm v3 support. Aspen Mesh is a Top 5 Istio Contributor of Pull Requests (PRs). One of our primary areas of focus is helping to shape and harden Istio's security. We have responsibly reported several critical CVEs and addressed them as part of PSWG, like the Authentication Policy Bypass CVE. You can read more about how security releases and 0-day critical CVE patches are handled in Istio in this blog authored by Jacob.

Istio Security Assessment Report findings announced in 2021

The success of the Istio project, and its critical role enforcing key security policies in infrastructure across a wide swath of industries, was the impetus for a comprehensive security assessment that began in 2020. To determine whether there were any security issues in the Istio code base, a third-party assessment of the Istio project was conducted last year that enlisted the NCC Group and sought collaboration with subject matter experts across the community.

This in-depth assessment focused on Istio’s architecture as a whole, looking at security related issues with a focus on key components like istiod (Pilot), Ingress/Egress gateways, and Istio’s overall Envoy usage as its data plane proxy for Istio version 1.6.5. Since the report, the Product Security Working Group has issued several security releases as new vulnerabilities were disclosed, along with fixes to address concerns raised in the report. A good outcome of the report is the detailed Security Best Practices Guide developed for Istio users.

At Aspen Mesh, we build upon the security features Istio provides and address enterprise security requirements with a zero-trust based service mesh that provides security within the Kubernetes cluster, provides monitoring and alerts, and ensures highly-regulated industries maintain compliance. You can read about how we think about security in our white paper, Adopting a Zero-Trust Approach to Security for Containerized Applications.

If you'd like to talk to us about what enterprise security in a service mesh looks like, please get in touch!

-Aspen Mesh

 


Get App-focused Security from an Enterprise-class Service Mesh | On-demand Webinar

In our webinar, You’ve Got Kubernetes. Now You Need App-focused Security Using Istio -- now available on demand -- we teamed with Mirantis, an industry leader in enterprise-ready Kubernetes deployment and management, to talk about security, Kubernetes, service mesh, Istio and more. If you have Kubernetes, you’re off to a great start with a platform for security based on microsegmentation and network policy. But firewalls and perimeters aren’t enough -- even in their modern, in-cluster form.

As enterprises embark on the cloud journey, modernizing applications with microservices and containers running on Kubernetes is key to application portability, code reuse and automation. But along with these advantages come significant security and operational challenges due to security threats at various layers of the stack. While Kubernetes platform providers like Mirantis manage security at the infrastructure, orchestration and container level, the challenge at application services level remains a concern. This is where a service mesh comes in. 

Companies with a hyper-focus on security – like those in healthcare, finance, government, and highly regulated industries – demand the highest level of security possible to thwart cyberthreats, data breaches and non-compliance issues. You can up-level your security by adding a service mesh that’s able to secure thousands of connections between microservices containers inside of a single cluster or across the globe. Today Istio is the gold standard for enterprise-class service mesh for building Zero Trust Security. But I’m not the first to say that implementing open source Istio has its challenges -- and can cause a lot of headaches when Istio deployment and management is added to a DevOps team’s workload without some forethought.

Aspen Mesh delivers an Istio-based, security hardened enterprise-class service mesh that’s easy to manage. Our Istio solution reduces friction between the experts in your organization because it understands your apps -- and it seamlessly integrates into your SecOps approach & certificate authority architecture. 

It’s not just about what knobs and config you adjust to get mTLS in one cluster – in our webinar we covered the architectural implications and lessons learned that’ll help you fit service mesh into your up-leveled Kubernetes security journey. It was a lively discussion with a lot of questions from attendees. Click the link below to watch the live webinar recording.

-Andrew

 

Click to watch webinar now:

On Demand Webinar | You’ve Got Kubernetes. Now you Need App-focused Security using Istio.

 The webinar gets technical as we delve into: 

  • How Istio controls North-South and East-West traffic, and how it relates to application-level traffic. 
  • How Istio secures communication between microservices. 
  • How to simplify operations and prevent security holes as the number of microservices in production grows. 
  • What is involved in hardening Istio into an enterprise-class service mesh. 
  • How mTLS provides a zero-trust based approach to security. 
  • How Aspen Mesh uses crypto to give each container its own identity (using a framework called SPIFFE). Then when containers talk to each other through the service mesh, they prove who they are cryptographically. 
  • Secure ingress and egress, and Cloud Native packet capture. 

Sailing Faster with Istio

While the extraordinarily large container ship, Ever Given, ran aground in the Suez Canal, halting a major trade route and causing losses in the billions, our solution engineers at Aspen Mesh have been stuck diagnosing a tricky Istio and Envoy performance bottleneck on their own island for the past few weeks. Though the scale and global impact of these two problems are quite different, it has presented an interesting way to correlate a global shipping event with the metaphorical nautical themes used by Istio. To elaborate on this theme, let’s switch from containers carrying dairy, and apparently everything else under the sun, to containers shuttling network packets.

To unlock the most from containers and microservices architecture, Istio (and Aspen Mesh) uses a sidecar proxy model. Adding sidecar proxies into your mesh provides a host of benefits, from uniform identity to security to metrics and advanced traffic routing. As Aspen Mesh customers range from large enterprises all the way to service providers, the performance impact of adding these sidecars is as important to us as the benefits outlined above. The performance experiment that I’m going to cover in this blog is geared toward evaluating the impact of adding sidecar proxies on the server side, the client side, or both, in high throughput scenarios.

We have encountered workloads, especially in the service provider space, where there are high request or transaction-per-second requirements for a particular service. Also, scaling up — i.e., adding more CPU/memory — is preferable to scaling out. We wanted to test the limits of sidecar proxies with regard to the maximum achievable throughput so that we can tune and optimize our model to meet the performance requirements of the wide variety of workloads used by our customers.

Throughput Test Setup

The test setup we used for this experiment was rather simple: a Fortio client and server running on Kubernetes on large AWS node instance types like burstable t3.2xlarge with 8 vCPUs and 32 GB of memory or dedicated m5.8xlarge instance types which have 32 vCPUs and 128 GB of memory. The test was running a single instance of the Fortio client and server pod with no resource constraints on their own dedicated nodes. The Fortio client was run in a mode to maximize throughput like this:
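(A representative invocation; the connection count shown is illustrative and was varied across runs, and the server URL reflects our test setup.)

fortio load -qps 0 -t 60s -c 128 http://fortio-server:8080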

The above command runs the test for 60 seconds with queries per second (QPS) 0 (i.e. maximum throughput with a varying number of simultaneous parallel connections). With this setup on a t3.2xlarge machine, we were able to achieve around 100,000 QPS. Further increasing the number of parallel connections didn’t result in throughput beyond ~100K QPS, signaling a possible CPU bottleneck. Running the same experiment on an m5.8xlarge instance, we could achieve much higher throughput around 300,000 QPS or higher depending upon the parallel connection settings.

This was sufficient proof of CPU throttling. As adding more CPUs increased the QPS, we felt that we had a reasonable baseline to start evaluating the effects of adding sidecar proxies in this setup.

Adding Sidecar Proxies on Both Ends

Next, with the same setup on t3.2xlarge instances, we added Istio sidecar proxies on both Fortio client and server pods with Aspen Mesh default settings: mTLS STRICT, access logging enabled, and the default concurrency (worker threads) of 2. With these parameters, and running the same command as before, we could only get a maximum throughput of around ~10,000 QPS.

This is a factor of 10 reduction in throughput. This was expected as we had only configured two worker threads, which were hopefully running at their maximum capacity but could not keep up with client load.

So, the logical next step for us was to increase the concurrency setting to run more worker threads to accept more connections and achieve higher throughput. In Istio and Aspen Mesh, you can set the proxy concurrency globally via the concurrency setting in proxy config under mesh config or override them via pod annotations like this:
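(A sketch of both forms; the concurrency value of 4 is illustrative.)

# Mesh-wide default, set under mesh config:
meshConfig:
  defaultConfig:
    concurrency: 4

# Per-workload override, set as a pod annotation:
metadata:
  annotations:
    proxy.istio.io/config: |
      concurrency: 4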

Note that using the value “0” for concurrency configures it to use all the available cores on the machine. We increased the concurrency setting from two to four to six and saw a steady increase in maximum throughput from 10K QPS to ~15K QPS to ~20K QPS as expected. However, these numbers were still quite low (by a factor of five) as compared to the results with no sidecar proxies.

To eliminate the CPU throttling factor, we ran the same experiment on m5.8xlarge instances with even higher concurrency settings but the maximum throughput we could achieve was still around ~20,000 QPS.

This degradation was far from acceptable, so we dug into why the throughput was low even with sufficient worker threads configured on the sidecar proxies.

Peeling the Onion

To investigate this issue, we looked at the CPU utilization metrics in the server pod and noticed that the CPU utilization as a percentage of total requested CPUs was not very high. This seemed odd as we expected the proxy worker threads to be spinning as fast as possible to achieve the maximum throughput, so we needed to investigate further to understand the root cause.

To get a better understanding of low CPU utilization, we inspected the connections received by the server sidecar proxy. Envoy’s concurrency model relies on the kernel to distribute connections between the different worker threads listening on the same socket. This means that if the number of connections received at the server sidecar proxy is less than the number of worker threads, you can never fully use all CPUs.

As this investigation was purely on the server-side, we ran the above experiment again with the Fortio client pod, but this time without the sidecar proxy injected and only the Fortio server pod with the proxy injected. We found that the maximum throughput was still limited to around ~20K QPS as before, thereby hinting at issues on the server sidecar proxy.

To investigate further, we had to look at connection level metrics reported by Envoy proxy. Later in this article, we’ll see what happens to this experiment with Envoy metrics exposed. (By default, Istio and Aspen Mesh don’t expose the connection-level metrics from Envoy.)

These metrics can be enabled in Istio version 1.8 and above by following this guide and adding the appropriate pod annotations corresponding to the metrics you want to be exposed. Envoy has many low-level metrics emitted at high resolution that can easily overwhelm your metrics backend for a moderately sized cluster, so you should enable this cautiously in production environments.

Additionally, it can be quite a journey to find the right Envoy metrics to enable, so here’s what you will need to get connection-level metrics. On the server-side pod, add the following annotation:
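(A representative form of that annotation; the exact regex is an illustrative sketch matching the two metrics described below.)

metadata:
  annotations:
    sidecar.istio.io/statsInclusionRegexps: 'listener\..*downstream_cx_total|listener\..*downstream_cx_active'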

This will enable reporting for all listeners configured by Istio, which can be a lot depending upon the number of services in your cluster, but only enable the downstream connections total counter and downstream connections active gauge metrics.

To look at these metrics, you can use your Prometheus dashboard, if it’s enabled, or port-forward to the server pod under test to port 15000 and navigate to http://localhost:15000/stats/prometheus. As there are many listeners configured by Istio, it can be tricky to find the correct one. Here’s a quick primer on how Istio sets up Envoy configuration. (You can find the complete list of Envoy listener metrics here.)
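For example (the deployment name is illustrative):

# In one terminal, forward the Envoy admin port of the pod under test:
kubectl port-forward deploy/fortio-server 15000:15000
# In another terminal, pull the connection metrics:
curl -s http://localhost:15000/stats/prometheus | grep downstream_cx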

For any inbound connections to a pod from clients outside of the pod, Istio configures a virtual inbound listener at 0.0.0.0:15006, which receives all the traffic from iptables’ redirect rules. This is the only listener that’s actually configured to receive connections from the kernel, and after the connection is received, it is matched against filter chain attributes to proxy the traffic to the correct application port on localhost. This means that even though the Fortio client above is targeting port 8080, we need to look at the total and active connections for the virtual inbound listener at 0.0.0.0:15006 instead of 0.0.0.0:8080. Looking at this metric, we found that the number of active connections were close to the configured number of simultaneous connections on the Fortio client side. This invalidated our theory about the number of connections being less than worker threads.

The next step in our debugging journey was to look at the number of connections received on each worker thread. As I had alluded to earlier, Envoy relies on the kernel to distribute the accepted connections to different worker threads, and for all the worker threads to be fully utilizing the allotted CPUs, the connections also need to be fairly balanced. Luckily, Envoy has per-worker metrics for listeners that can be enabled to understand the distribution. Since these metrics are rooted at listener.<address>.<handler>.<metric name>, the regex provided in the annotation above should also expose these metrics. The per-worker metrics looked like this:

As you can see from the above image, the connections were far from being evenly distributed among the worker threads. One thread, worker 10, had 11.5K active connections as compared to some threads which had around ~1-1.5K active connections, and others were even lower. This explains the low CPU utilization numbers as most of the worker threads just didn’t have enough connections to do useful work.

In our Envoy research, we quickly stumbled upon this issue, which very nicely sums up the problem and the various efforts that have been made to fix it.


So, next, we went looking for a solution to fix this problem. It seemed like, for the moment, our own Ever Given was stuck as some diligent worker threads struggled to find balance. We needed an excavator to start digging.

While our intrepid team tackled the problem of scaling for high-throughput workloads by adding sidecar proxies, we encountered a bottleneck not entirely unlike what the Ever Given experienced not long ago in the Suez Canal.

Luckily, we had a few more things to try, and we were ready to take a closer look at the listener metrics.

Let There Be Equality Among Threads!

After parsing through the conversations in the issue, we found the pull request that enabled a configuration option to turn on a feature to achieve better balancing across worker threads. At this point, trying this out seemed worthwhile, so we looked at how to enable this in Istio. (Note that as part of this PR, the per-worker thread metrics were added, which was useful in diagnosing this problem.)

For all the ignoble things EnvoyFilter can do in Istio, it’s useful in situations like these to quickly try out new Envoy configuration knobs without making code changes in “istiod” or the control plane. To turn the “exact balance” feature on, we created an EnvoyFilter resource like this:
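(The resource we used was along these lines; the workload label reflects our test setup, and the key part is the connection_balance_config patch on the virtual inbound listener.)

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: fortio-server-exact-balance
  namespace: default
spec:
  workloadSelector:
    labels:
      app: fortio-server      # assumed label on the server pod
  configPatches:
  - applyTo: LISTENER
    match:
      context: SIDECAR_INBOUND
      listener:
        portNumber: 15006      # virtual inbound listener
    patch:
      operation: MERGE
      value:
        connection_balance_config:
          exact_balance: {}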

With this configuration applied and with bated breath, we ran the experiment again and looked at the per-worker thread metrics. Voila! Look at the perfectly balanced connections in the image below:

Measuring the throughput with this configuration set, we could achieve around ~80,000 QPS, which is a significant improvement over the earlier results. Looking at CPU utilization, we saw that all the CPUs were fully pegged at or near 100%. This meant that we were finally seeing the CPU throttling. At this point, by adding more CPUs and a bigger machine, we could achieve much higher numbers as expected. So far so good.

As you may recall, this experiment was purely to test the effects of server sidecar proxy, so we removed the client sidecar proxy for these tests. It was now time to measure performance with both sidecars added.

Measuring the Impacts of a Client Sidecar Proxy

With this exact balancing configuration enabled on the inbound port (server side only), we ran the experiment with sidecars on both ends. We were hoping to achieve high throughputs that could only be limited by the number of CPUs dedicated to Envoy worker threads. If only things were that simple.

We found that the maximum throughput was once again capped at around ~20K QPS.

A bit disappointing, but since we then knew about the issue of connection imbalance on the server side, we reasoned that the same could happen on the client side between the application and the sidecar proxy container on localhost. First, we enabled the following metrics on the client-side proxy:
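(Roughly the following; the regex is illustrative and adds Envoy's cluster-level connection counters on top of the listener metrics.)

metadata:
  annotations:
    sidecar.istio.io/statsInclusionRegexps: 'listener\..*downstream_cx_total|listener\..*downstream_cx_active|cluster\..*upstream_cx_total|cluster\..*upstream_cx_active'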

In addition to the listener metrics, we also enabled cluster-level metrics, which emit total and active connections for any upstream cluster. We wanted to verify that the client sidecar proxy was sending a sufficient number of connections to the upstream Fortio server cluster to keep the server worker threads occupied. We found that the number of active connections mirrored the number of connections used by the Fortio client in our command. This was a good sign. Note that Envoy doesn’t report cluster-level metrics at the per-worker level, but these are all aggregated, so there’s no way for us to know how the connections were distributed on the outbound side.

Next, we inspected the listener connection statistics on the client side similar to the server side to ensure that we were not having connection imbalance issues. The outbound listeners, or the listeners set up to handle traffic originating from the application in the same pod as the sidecar proxy, are set up a bit differently in Istio as compared to the inbound side. For outbound traffic, a virtual listener “0.0.0.0:15001” is created similar to the listener on “0.0.0.0:15006,” which is the target for iptables redirect rules. Unlike the inbound side, the virtual listener hands off the connection to the more specific listener like “0.0.0.0:8080” based on the original destination address. If there are no specific matches, then the listener configuration in the virtual outbound takes effect. This can block or allow all traffic depending on your configured outbound traffic policy. In the traffic flow from the Fortio client to server, we expected the listener at “0.0.0.0:8080” to be handling connections on the client-side proxy, so we inspected connections metrics at this listener. The listener metrics looked like this:

The above image shows a connection imbalance between worker threads on the client side, similar to what we saw on the server side. In fact, the connections on the outbound client-side proxy were being handled by only one worker thread, which explains the poor throughput numbers. Having fixed this on the server side, we applied a similar EnvoyFilter configuration, with minor tweaks for context and port, to address this imbalance:
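(A rough sketch of that tweaked resource; the client workload label is assumed, the context changes to SIDECAR_OUTBOUND, and the port is the service port handling the Fortio traffic.)

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: fortio-client-exact-balance
  namespace: default
spec:
  workloadSelector:
    labels:
      app: fortio-client      # assumed label on the client pod
  configPatches:
  - applyTo: LISTENER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        portNumber: 8080       # outbound listener for the Fortio traffic
    patch:
      operation: MERGE
      value:
        connection_balance_config:
          exact_balance: {}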

Surely, applying this resource would fix our issue and we would be able to achieve high QPS with both client and server sidecar proxies with sufficient CPUs allocated to them. Well, we ran the experiment again and saw no difference in the throughput numbers. Checking the listener metrics again, we saw that even with this EnvoyFilter resource applied, only one worker thread was handling all the connections. We also tried applying the exact balance config on both virtual outbound port 15001 and outbound port 8080, but the throughput was still limited to 20K QPS.

This warranted the next round of investigations.

Original Destination Listeners, Exact Balance Issues

We went around looking in Envoy code and opened Github issues to understand why the client-side exact balance configuration was not taking effect, while the server side was working wonders. The key difference between the two listeners, other than the directionality, was that the virtual outbound listener “0.0.0.0:15001” was an original destination listener, which hands over connections to other listeners matched on the original destination address. With help from the Istio community (thanks, Yuchen Dai from Google), we found this open issue, which explains this behavior in a rather cryptic way.

Basically, the current exact balance implementation relies on connection counters per worker thread to fix the imbalance. When the original destination is enabled on the virtual outbound listener, the connection counter on the worker thread is incremented when a connection is received, but as the connection is immediately handed to the more specific listener like “0.0.0.0:8080,” it is decremented again. This quick increase and decrease in the internal count spoofs the exact balancer into thinking the balance is perfect as all these counters are always at zero. It also appears that applying the exact balance on the listener that handles the connection, “0.0.0.0:8080” in this case, but doesn’t accept the connection from the kernel has no effect due to current implementation limitations.

Fortunately, the fix for this issue is in progress, and we’ll be working with the community to get this addressed as quickly as possible. In the meantime, if you’re getting hit by these performance issues on the client side, scaling out with a lower concurrency setting is a better approach to reach higher throughput QPS numbers than scaling up with higher concurrency and worker threads. We are also working with the Istio community to provide configuration knobs for enabling exact balance in Envoy to optionally switch default settings so that everyone can benefit from our findings.

Working on this performance analysis was interesting and a challenge in its own way, like the small tractor next to the giant ship trying to make it move.

Well, maybe not exactly, but it was a learning experience for me and my team, and I’m glad we are able to share our learnings with the rest of the community as this aspect of Istio is often overlooked by the broader vendor ecosystem. We will run and publish performance numbers related to the impact of turning on various features such as mTLS, access logging and tracing in high-throughput scenarios in future blogs, so if you’re interested in this topic, subscribe to our blog to get updates or reach out to us with any questions.

Thank you to Aspen Mesh team members Pawel and Bart, who patiently and diligently ran various test scenarios, collected data, and were uncompromising in their pursuit to get the last bit of performance out of Istio and Aspen Mesh. It’s not surprising -- after all, as part of F5, taking performance seriously is just part of our DNA.


Installing Multicluster Aspen Mesh on KOPS Cluster

I recently tried installing Aspen Mesh in a multicluster configuration, and it was easier than I anticipated. In this post, I will walk you through my process. You can read the original version of this process here.

First, ensure that you have two Kubernetes clusters with the same version of Aspen Mesh installed on each of them (if you need an Aspen Mesh account, you can get a free 30-day trial here). Once you have an account, refer to the documentation for installing Aspen Mesh on your cluster.

kops get cluster

ssah-test1.dev.k8s.local        aws    us-west-2a
ssah-test2.dev.k8s.local        aws    us-west-2a

There are multiple ways to configure Aspen Mesh in a multicluster environment. In the following example, I have installed Aspen Mesh 1.9.1-am1 on both of my clusters, and the installation type is Multi-Primary on different networks.

Prerequisites for the Setup:

  • API server: The API server of each cluster must be able to access the API server of the other cluster.
  • Trust: Trust must be established between all clusters in the mesh. This is achieved by having a common root CA that generates intermediate certs for each cluster.

Configuring Trust:

I am creating an RSA certificate for my root cert. After downloading and extracting the Aspen Mesh binary, I create a certs folder and add it to the directory stack.

mkdir -p certs
pushd certs

The downloaded binary should include a tools directory for creating your certificates. Run the make command to create a root-ca folder, which will contain four files: root-ca.conf, root-cert.csr, root-cert.pem and root-key.pem. For each of your clusters, you will then generate an intermediate cert and key for the Istio CA.

make -f ../tools/certs/Makefile.selfsigned.mk root-ca
make -f ../tools/certs/Makefile.selfsigned.mk cluster1-cacerts
make -f ../tools/certs/Makefile.selfsigned.mk cluster2-cacerts

You will then have to create secrets for each of your clusters in the istio-system namespace with the files generated in the last step. These secrets are what configure trust between the clusters, since the same root-cert.pem is used to create each intermediate cert.
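The kubectl commands below use the CTX_CLUSTER1 and CTX_CLUSTER2 environment variables to refer to the contexts of the two clusters. Set them to your own context names; for example, assuming the kOps cluster names double as kubeconfig context names:

export CTX_CLUSTER1=ssah-test1.dev.k8s.local
export CTX_CLUSTER2=ssah-test2.dev.k8s.local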

kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem --context="${CTX_CLUSTER1}"

kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem --context="${CTX_CLUSTER2}"

Next, we will move on to the Aspen Mesh configuration, where we enable multicluster for istiod and give names to the network and cluster. Add the following fields to your override file, which will be used during the Helm installation/upgrade. Create a separate file for each cluster. You will also need to label the istio-system namespace in both of your clusters with the appropriate network label.

kubectl --context="${CTX_CLUSTER1}" label namespace istio-system topology.istio.io/network=network1

kubectl --context="${CTX_CLUSTER2}" label namespace istio-system topology.istio.io/network=network2

For Cluster 1

#Cluster 1

#In order to make the application service callable from any cluster, the DNS lookup must succeed in each cluster
#This provides DNS interception for all workloads with a sidecar, allowing Istio to perform DNS lookup on behalf of the application.
meshConfig:
  defaultConfig:
    proxyMetadata:
    # Enable Istio agent to handle DNS requests for known hosts
    # Unknown hosts will automatically be resolved using upstream dns servers in resolv.conf
      ISTIO_META_DNS_CAPTURE: "true"

global:
  meshID: mesh1
  multiCluster:
    # Set to true to connect two kubernetes clusters via their respective
    # ingressgateway services when pods in each cluster cannot directly
    # talk to one another. All clusters should be using Istio mTLS and must
    # have a shared root CA for this model to work.
    enabled: true
    # Should be set to the name of the cluster this installation will run in. This is required for sidecar injection
    # to properly label proxies
    clusterName: "cluster1"
    globalDomainSuffix: "local"
    # Enable envoy filter to translate `globalDomainSuffix` to cluster local suffix for cross cluster communication
    includeEnvoyFilter: false
  network: network1

For Cluster 2

#Cluster 2

#In order to make the application service callable from any cluster, the DNS lookup must succeed in each cluster
#This provides DNS interception for all workloads with a sidecar, allowing Istio to perform DNS lookup on behalf of the application.
meshConfig:
  defaultConfig:
    proxyMetadata:
    # Enable Istio agent to handle DNS requests for known hosts
    # Unknown hosts will automatically be resolved using upstream dns servers in resolv.conf
      ISTIO_META_DNS_CAPTURE: "true"

global:
  meshID: mesh1
  multiCluster:
    # Set to true to connect two kubernetes clusters via their respective
    # ingressgateway services when pods in each cluster cannot directly
    # talk to one another. All clusters should be using Istio mTLS and must
    # have a shared root CA for this model to work.
    enabled: true
    # Should be set to the name of the cluster this installation will run in. This is required for sidecar injection
    # to properly label proxies
    clusterName: "cluster2"
    globalDomainSuffix: "local"
    # Enable envoy filter to translate `globalDomainSuffix` to cluster local suffix for cross cluster communication
    includeEnvoyFilter: false
  network: network2

Now we will upgrade/install the istiod manifest with the newly added configuration from the override file. As you can see, I have separate override files for each cluster.

helm upgrade istiod manifests/charts/istio-control/istio-discovery -n istio-system --values sample_overrides-aspenmesh_2.yaml

helm upgrade istiod manifests/charts/istio-control/istio-discovery -n istio-system --values sample_overrides-aspenmesh.yaml

Check the pods in the istio-system namespace to see if all are in a running state. Be sure to delete all your application pods in your default namespace so the new configuration kicks in when new pods are spun up. You can also check that the root cert used by pods in each cluster is the same; I am using pods from the bookinfo sample application.

istioctl pc secrets details-v1-79f774bdb9-pqpjw -o json | jq '[.dynamicActiveSecrets[] | select(.name == "ROOTCA")][0].secret.validationContext.trustedCa.inlineBytes' -r | base64 -d | openssl x509 -noout -text | md5

istioctl pc secrets details-v1-79c697d759-tw2l7 -o json | jq '[.dynamicActiveSecrets[] | select(.name == "ROOTCA")][0].secret.validationContext.trustedCa.inlineBytes' -r | base64 -d | openssl x509 -noout -text |md5

Once istiod is upgraded, we will create the gateway used for communication between the two clusters by installing an east-west gateway. Use the configuration below to create a YAML file that will be used with Helm to install it in each of the clusters. I have created two YAML files, cluster1_gateway_config.yaml and cluster2_gateway_config.yaml, to be used with the respective clusters.

For Cluster 1

#This can be on separate override file as we will install a custom IGW
gateways:
  istio-ingressgateway:
    name: istio-eastwestgateway
    labels:
      app: istio-eastwestgateway
      istio: eastwestgateway
      topology.istio.io/network: network1
    ports:
    ## You can add custom gateway ports in user values overrides, but it must include those ports since helm replaces.
    # Note that AWS ELB will by default perform health checks on the first port
    # on this list. Setting this to the health check port will ensure that health
    # checks always work. https://github.com/istio/istio/issues/12503
    - port: 15021
      targetPort: 15021
      name: status-port
      protocol: TCP
    - port: 80
      targetPort: 8080
      name: http2
      protocol: TCP
    - port: 443
      targetPort: 8443
      name: https
      protocol: TCP
    - port: 15012
      targetPort: 15012
      name: tcp-istiod
      protocol: TCP
    # This is the port where sni routing happens
    - port: 15443
      targetPort: 15443
      name: tls
      protocol: TCP
    - name: tls-webhook
      port: 15017
      targetPort: 15017
    env:
      # A gateway with this mode ensures that pilot generates an additional
      # set of clusters for internal services but without Istio mTLS, to
      # enable cross cluster routing.
      ISTIO_META_ROUTER_MODE: "sni-dnat"
      ISTIO_META_REQUESTED_NETWORK_VIEW: "network1"
    serviceAnnotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb

global:
  meshID: mesh1
  multiCluster:
    # Set to true to connect two kubernetes clusters via their respective
    # ingressgateway services when pods in each cluster cannot directly
    # talk to one another. All clusters should be using Istio mTLS and must
    # have a shared root CA for this model to work.
    enabled: true
    # Should be set to the name of the cluster this installation will run in. This is required for sidecar injection
    # to properly label proxies
    clusterName: "cluster1"
    globalDomainSuffix: "local"
    # Enable envoy filter to translate `globalDomainSuffix` to cluster local suffix for cross cluster communication
    includeEnvoyFilter: false
  network: network1

For Cluster 2

gateways:
  istio-ingressgateway:
    name: istio-eastwestgateway
    labels:
      app: istio-eastwestgateway
      istio: eastwestgateway
      topology.istio.io/network: network2
    ports:
    ## You can add custom gateway ports in user values overrides, but it must include those ports since helm replaces.
    # Note that AWS ELB will by default perform health checks on the first port
    # on this list. Setting this to the health check port will ensure that health
    # checks always work. https://github.com/istio/istio/issues/12503
    - port: 15021
      targetPort: 15021
      name: status-port
      protocol: TCP
    - port: 80
      targetPort: 8080
      name: http2
      protocol: TCP
    - port: 443
      targetPort: 8443
      name: https
      protocol: TCP
    - port: 15012
      targetPort: 15012
      name: tcp-istiod
      protocol: TCP
    # This is the port where sni routing happens
    - port: 15443
      targetPort: 15443
      name: tls
      protocol: TCP
    - name: tls-webhook
      port: 15017
      targetPort: 15017
    env:
      # A gateway with this mode ensures that pilot generates an additional
      # set of clusters for internal services but without Istio mTLS, to
      # enable cross cluster routing.
      ISTIO_META_ROUTER_MODE: "sni-dnat"
      ISTIO_META_REQUESTED_NETWORK_VIEW: "network2"
    serviceAnnotations:
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
global:
  meshID: mesh1
  multiCluster:
    # Set to true to connect two kubernetes clusters via their respective
    # ingressgateway services when pods in each cluster cannot directly
    # talk to one another. All clusters should be using Istio mTLS and must
    # have a shared root CA for this model to work.
    enabled: true
    # Should be set to the name of the cluster this installation will run in. This is required for sidecar injection
    # to properly label proxies
    clusterName: "cluster2"
    globalDomainSuffix: "local"
    # Enable envoy filter to translate `globalDomainSuffix` to cluster local suffix for cross cluster communication
    includeEnvoyFilter: false
  network: network2

helm install istio-eastwestgateway manifests/charts/gateways/istio-ingress --namespace istio-system --values cluster1_gateway_config.yaml

helm install istio-eastwestgateway manifests/charts/gateways/istio-ingress --namespace istio-system --values cluster2_gateway_config.yaml

After adding the new east-west gateway, you will get an east-west gateway pod deployed in the istio-system namespace, along with a service that creates the Network Load Balancer specified in the annotations. Until Multi-Cluster/Multi-Network – Cannot use a hostname-based gateway for east-west traffic · Issue #29359 · istio/istio is fixed, you will need to resolve the IP address of the NLB for each east-west gateway and then patch it into the service as spec.externalIPs in both of your clusters. This is not an ideal situation, but it is necessary until that issue is resolved.

k get svc -n istio-system istio-eastwestgateway
NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                                                                                  PORT(S)                                                                                      AGE
istio-eastwestgateway   LoadBalancer   100.71.211.32   a927e6<TRUNCATED>.elb.us-west-2.amazonaws.com 15021:32138/TCP,80:30420/TCP,443:31450/TCP,15012:30150/TCP,15443:30476/TCP,15017:32335/TCP   8d

nslookup a927e6<TRUNCATED>.elb.us-west-2.amazonaws.com
Server:        172.23.241.180
Address:    172.23.241.180#53
Non-authoritative answer:
Name:    a927e6<TRUNCATED>.elb.us-west-2.amazonaws.com
Address: 35.X.X.X

kubectl patch svc -n istio-system istio-eastwestgateway -p '{"spec":{"externalIPs": ["35.X.X.X"]}}'

k get svc -n istio-system istio-eastwestgateway
NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP                                                                                  PORT(S)                                                                                      AGE
istio-eastwestgateway   LoadBalancer   100.71.211.32   a927e6<TRUNCATED>.elb.us-west-2.amazonaws.com,35.X.X.X   15021:32138/TCP,80:30420/TCP,443:31450/TCP,15012:30150/TCP,15443:30476/TCP,15017:32335/TCP   8d

Now that the gateway is configured to communicate, you will have to make sure the API server of each cluster is able to talk to the other cluster. You can do this in AWS by making sure the API server instances are accessible to each other, creating specific rules in their security groups if needed. We will then need to create a secret in cluster 1 that provides access to cluster 2’s API server, and vice versa, for endpoint discovery.

# Enable endpoint discovery in cluster 2 (gives cluster 2 access to cluster 1's API server)
istioctl x create-remote-secret --context="${CTX_CLUSTER1}" --name=cluster1 | kubectl apply -f - --context="${CTX_CLUSTER2}"

# Enable endpoint discovery in cluster 1 (gives cluster 1 access to cluster 2's API server)
istioctl x create-remote-secret --context="${CTX_CLUSTER2}" --name=cluster2 | kubectl apply -f - --context="${CTX_CLUSTER1}"

At this stage, the pilot (which is bundled in the istiod binary) should have the new configuration, and when you tail the logs for the pod you should see the log message “Number of remote cluster: 1”. With this version, you will also need to edit the east-west ingress Gateway in the istio-system namespace that we created above, as the selector label and the annotation added via the Helm chart are different than expected: they show “istio: ingressgateway” but should be “istio: eastwestgateway”. You can now create pods in each cluster and verify that everything is working as expected. Here is how the east-west gateway should look:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  annotations:
    meta.helm.sh/release-name: istio-eastwestgateway
    meta.helm.sh/release-namespace: istio-system
  creationTimestamp: "2021-05-13T01:56:50Z"
  generation: 2
  labels:
    app: istio-eastwestgateway
    app.kubernetes.io/managed-by: Helm
    install.operator.istio.io/owning-resource: unknown
    istio: eastwestgateway
    istio.io/rev: default
    operator.istio.io/component: IngressGateways
    release: istio-eastwestgateway
    topology.istio.io/network: network2
  name: istio-multicluster-ingressgateway
  namespace: istio-system
  resourceVersion: "6777467"
  selfLink: /apis/networking.istio.io/v1beta1/namespaces/istio-system/gateways/istio-multicluster-ingressgateway
  uid: 618b2b5b-a2bb-4b37-a4a1-7f5ab7ef03d4
spec:
  selector:
    istio: eastwestgateway
  servers:
  - hosts:
    - '*.local'
    port:
      name: tls
      number: 15443
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH




Improving Your Application with Service Mesh

Engineering + Technology = Uptime 

Have you come across the term “application value” lately? Software-first organizations are using it as a new form of currency. Businesses delivering a product or service to their customers through an application understand the growing importance of their application’s security, reliability and feature velocity. And, as applications that people use become increasingly important to enterprises, so do engineering teams and the right tools.

The Right People for the Job: Efficient Engineering Teams 

Access to engineering talent is now more important to some companies than access to capital; 61% of executives consider this a potential threat to their business. With the average developer spending more than 17 hours each week dealing with maintenance issues, such as debugging and refactoring, plus approximately four hours a week on “bad code” (representing nearly $85 billion worldwide in opportunity cost lost annually), the necessity of driving business value with applications increases. And who can help solve these puzzles? The right engineering team, in combination with the right technologies and tools. Regarding the piece of the puzzle that can be solved by your engineering team, enterprises have two options as customer demands on applications increase:

  1. Increase the size and cost of engineering teams, or  
  2. Increase your engineering efficiency.  

Couple the need to increase the efficiency of your engineering team with the challenges of growing revenue in increasingly competitive and low-margin businesses, and the importance of driving value through applications is top of mind for any business. One way to help make your team more efficient is by providing the right technologies and tools.

The Right Technology for the Job: Microservices and Service Mesh 

Using microservices architectures allows enterprises to more quickly deliver new features to customers, keeping them happy and providing them with more value over time. In addition, with microservices, businesses can more easily keep pace with the competition in their space through better application scalability, resiliency and agility. Of course, as with any shift in technology, there can be new challenges.

One challenge our customers sometimes face is difficulty with debugging or resolving problems within these microservices environments. It can be challenging to fix issues fast, especially when there are cascading failures that can cause your users to have a bad experience on your application. That’s where a service mesh can help.

Service mesh provides ways to see, identify, trace and log when errors occur and to pinpoint their sources. It brings all of your data together into a single source of truth, removing error-prone processes and enabling you to get fast, reliable information about downtime, failures and outages. More uptime means happy users and more revenue, plus the agility and stability that you need for a competitive edge.

Increasing Your Application Value  

Service mesh allows engineering teams to address many issues, but especially these three critical areas: 

  • Proactive issue detection, quick incident response, and workflows that accelerate fixing issues 
  • A unified source of multi-dimensional insights into application and infrastructure health and performance that provides context about the entire software system 
  • Line of sight into weak points in environments, enabling engineering teams to build more resilient systems in the future  

If you or your team are running Kubernetes-based applications at scale and are seeing the advantages, but know you can get more value out of them by increasing your engineering efficiency and uptime for your applications' users, it’s probably time to check out a service mesh.