Steering The Future Of Istio

I’m honored to have been chosen by the Istio community to serve on the Istio Steering Committee along with Christian Posta, Zack Butcher and Zhonghu Xu. I have been fortunate to contribute to the Istio project for nearly three years and am excited by the huge strides the project has made in solving key challenges that organizations face as they shift to cloud-native architecture. 

Maybe what’s most exciting is the future direction of the project. The core Istio community realizes and advocates that innovation in open source doesn't stop with technology; it’s just the starting point. New ways of growing the community include making contributions easier, making Working Group meetings more accessible, and turning community meetings into an open platform for end users to give their feedback. As a member of the Steering Committee, one of my main goals will be to make it easier for a diverse group of people to contribute to the project.

To share my own journey: when I started contributing to Istio, I found it intimidating to present rough ideas or proposals in an open Networking WG meeting filled with experts and leaders from Google and IBM (even though they were very welcoming). I understand how difficult it can be to get started contributing to a new community, so I want to ensure the Working Group and community meetings are a place for end users and new contributors to share ideas openly, and also to learn from industry experts. I will focus on increasing participation from diverse groups by working to make Istio the most welcoming community possible. In this vein, it will be important for the Steering Committee to further define and enforce a code of conduct that creates a safe place for all contributors.

The Istio community’s effort towards increasing open governance by ensuring no single organization has control over the future of the project has certainly been a step in the right direction with the new makeup of the steering committee. I look forward to continuing work in this area to make Istio the most open project it can be. 

Outside of code contributions, marketing and brand identity are critically important aspects of any open source project. It will be important to encourage contributions from marketing and business leaders and to ensure we recognize non-technical contributions. Addressing this is less straightforward than encouraging and crediting code commits, but a diverse, vendor-neutral marketing team in open source can create powerful ways to reach users and drive adoption, which is critical to the success of any open source project. Recent user empathy sessions and user survey forms are a great starting point, but our ability to put these learnings into action and adapt as a community will be a key driver in growing project participation.

Last, but definitely not least, I’m keen to leverage my experience and feedback from years of work with Aspen Mesh customers and broad enterprise experience to make Istio a more robust and production-ready project. 

In this vein, my fellow Aspen Mesher Jacob Delgado has worked tirelessly for many months contributing to Istio. As a result of his contributions, he has been named a co-lead for the Istio Product Security Working Group. Jacob has been instrumental in championing security best practices for the project and has also helped responsibly remediate several CVEs this year. I’m excited to see more contributors like Jacob make significant improvements to the project.

I'm humbled by the support of the community members who voted in the steering elections and chose such a talented team to shepherd Istio forward. I look forward to working with all the existing, and hopefully many new, members of the Istio community! You can always reach out to me through email, Twitter or Istio Slack for any community, technical or governance matter, or if you just want to chat about a great idea you have.

Helping Istio Sail

Around three years ago, we recognized the power of service mesh as a technology pattern to help organizations manage the next generation of software delivery challenges, and that led to us founding Aspen Mesh. And we know that efficiently managing high-scale Kubernetes applications requires the power of Istio. Having been a part of the Istio project since 0.2, we have seen countless users and customers benefit from the observability, security and traffic management capabilities that a service mesh like Istio provides. 

Our engineering team has worked closely with the Istio community over the past three years to help drive stability and add new features that make Istio increasingly easy for users to adopt. We believe in the value that open source software — and an open community — brings to the enterprise service mesh space, and we are driven to help lead that effort by setting sound technical direction. By contributing expertise, code and design ideas like VirtualService delegation and enforcing RBAC policies for Istio resources, we have focused much of our work on making Istio more enterprise-ready. One of these contributions is Istio Vet, which was created and open sourced in the early days of Aspen Mesh as a way to enhance Istio's user experience and multi-resource configuration validation. Istio Vet proved to be very valuable to users, so we decided to work closely with the Istio community to create istioctl analyze and add key configuration analysis capabilities to Istio. It’s very exciting to see that many of these ideas have now been implemented in the community and are part of the available feature set for broader consumption.

As the Istio project and its users mature, there is a greater need for open communication around product security and CVEs. Recognizing this, we were fortunate to be able to help create the early disclosure process and lead the product security working group which ensures the overall project is secure and not vulnerable to known exploits. 

We believe that an open source community thrives when users and vendors are considered equal partners. We feel privileged that we have been able to provide feedback from our customers that has helped accelerate Istio’s evolution. In addition, it has been a privilege to share the networking expertise from our F5 heritage with the Istio project as maintainers and leads of important functional areas and key working groups.

The technical leadership of Istio is a meritocracy, and we are honored that our sustained efforts have been recognized with my appointment to the Technical Oversight Committee.

As a TOC member, I am excited to work with other Istio community leaders to focus the roadmap on solving customer problems while keeping our user experience top of mind. The next most critical challenges to solve are day-two problems and beyond, where we need to ensure smoother upgrades, enhanced security and scalability of the system. We envision use cases emerging from industries like Telco and FinServ which will push the envelope of technical capabilities beyond what many can imagine.

It has been amazing to see the user growth and the maturity of the Istio project over the last three years. We firmly believe that a more diverse leadership and an open governance in Istio will further help to advance the project and increase participation from developers across the globe.

The fun is just getting started and I am honored to be an integral part of Istio’s journey! 

Aspen Mesh - Service Mesh Security and Compliance

Aspen Mesh 1.4.4 & 1.3.8 Security Update

Aspen Mesh is announcing the release of 1.4.4 and 1.3.8 (based on upstream Istio 1.4.4 and 1.3.8), both addressing a critical security issue. We strongly encourage all customers to immediately upgrade to whichever of these versions matches their currently deployed Aspen Mesh version.

The Aspen Mesh team reported this CVE to the Istio community per Istio's CVE reporting guidelines, and our team was able to further contribute to the community, and to all Istio users, by fixing this issue in upstream Istio. As this CVE is extremely easy to exploit and its risk score was deemed very high (9.0), we wanted to respond with urgency by getting a patch into upstream Istio and out to our customers as quickly as possible. This is just one of the many ways we provide value to our customers as a trusted partner, ensuring that the pieces from Istio (and Aspen Mesh) are secure and stable.

Below are details about CVE-2020-8595, steps to verify whether you’re currently vulnerable, and how to upgrade to the patched releases.

CVE Description

A bug in Istio's Authentication Policy exact path matching logic allows unauthorized access to resources without a valid JWT token. This bug affects all versions of Istio (and Aspen Mesh) that support JWT Authentication Policy with path based trigger rules (all 1.3 & 1.4 releases). The logic for the exact path match in the Istio JWT filter includes query strings or fragments instead of stripping them off before matching. This means attackers can bypass the JWT validation by appending “?” or “#” characters after the protected paths.

Example Vulnerable Istio Configuration

Consider a JWT Authentication Policy that triggers on an exact HTTP path match of "/productpage". In this example, the Authentication Policy is applied at the ingress gateway service, so any request whose path exactly matches “/productpage” requires a valid JWT token. In the absence of a valid JWT token, the request is denied and never forwarded to the productpage service. However, due to this CVE, any request made to the ingress gateway with the path "/productpage?" and no valid JWT token is not denied, but is sent along to the productpage service, thereby allowing access to a protected resource without a valid JWT token.
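The vulnerable setup described above can be sketched as an Istio 1.4-era Authentication Policy. This is an illustrative sketch only: the resource names, issuer and jwksUri below are assumptions, not taken from the original post.

```yaml
# Illustrative sketch of the vulnerable configuration.
# Names, issuer and jwksUri are assumptions for demonstration.
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: ingress-jwt
  namespace: istio-system
spec:
  targets:
  - name: istio-ingressgateway
  origins:
  - jwt:
      issuer: "auth@example.com"
      jwksUri: "https://auth.example.com/jwks.json"
      triggerRules:
      - includedPaths:
        # Affected releases match the raw path including query/fragment,
        # so "/productpage?" or "/productpage#" slips past this rule.
        - exact: /productpage
  principalBinding: USE_ORIGIN
```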


Since this vulnerability is in the Envoy filter added by Istio, you can also check the proxy image you’re using in the cluster locally to see if you’re currently vulnerable. Download and run this script to verify whether the proxy image you’re using is vulnerable. Thanks to Francois Pesce from the Google Istio team for helping us create this test script.

Upgrading to This Version

Please follow the upgrade instructions in our documentation to upgrade to these versions. Since this vulnerability affects the sidecar proxies and gateways (ingress and egress), it is important to follow the post-upgrade tasks here and perform a rolling upgrade of all your sidecar proxies and gateways.


Announcing Aspen Mesh Secure Ingress Policy

In the new Application Economy, developers are building and deploying applications far more frequently than ever before. Organizations gain this agility by shifting to microservices architectures, powered by Kubernetes and continuous integration and delivery (CI/CD). For businesses to derive value from these applications, they need to be exposed to the outside world in a secure way so that customers can access them and have a great user experience. That’s such an obvious statement, you’re probably wondering why I even bothered saying it.

Well, within most organizations, securely exposing an application to the outside world is complicated. Ports, protocols, paths, and auth requirements need to be collected. Traffic routing resources and authentication policies need to be configured. DNS entries and TLS certificates need to be created and mounted. Application teams know some of these things and platform owners know others. Pulling all of this together is painful and time consuming.

This problem is exacerbated by the lack of APIs mapping intent to the persona performing the task. Let’s take a quick look at the current landscape.

Kubernetes allows orchestration (deploying and upgrading) of applications, but it doesn't provide any way to capture application behavior like the protocols, paths and security requirements of an application's APIs. To securely expose an application to its users, platform operators currently need to capture this information from developers in private conversations and create additional Kubernetes resources like Ingress, which in turn creates the plumbing to allow traffic from outside the cluster and route it to the appropriate backend application. Alternatively, the advanced routing capabilities in Istio make it possible to control more aspects of traffic management, and at the same time allow developers to offload functionality like JWT authentication from their applications to the infrastructure.

But the missing piece in both of these scenarios is a reliable and scalable way of gathering information about the applications from the developers independent of platform operators and enabling platform operators to securely configure the resources they need to expose these applications.

Additionally, configuring the Ingress or Istio routing APIs is only part of the puzzle. Operators also need to set up domain names (DNS) and obtain domain certificates (static, or dynamic via Let’s Encrypt, for example) to secure the traffic entering their clusters. All of this requires managing a lot of moving pieces, with the possibility of failures at multiple steps along the way.

Aspen Mesh Policy Framework - Before


To solve these challenges, we are excited to announce the release of Aspen Mesh Secure Ingress Policy.

A New Way to Securely Expose Your Applications

Our goal in developing this new Secure Ingress Policy framework is to help streamline communication between application developers and platform operators. With this new feature, both personas can be productive while working together.

Here’s how it works: application developers provide a specification for their service, which they can store in their code management systems and communicate to Aspen Mesh through an Application API. This spec includes the service ports and protocols to expose via ingress, and the API paths and authentication requirements (e.g., JWT validation).

Platform operators provide a specification that defines the security and networking aspects of the platform and communicate it to Aspen Mesh via a Secure Ingress API. This spec includes certificate secrets, the domain name, and the JWKS server and issuer.
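As a purely hypothetical illustration of the split between the two personas (the actual Aspen Mesh API schemas, kinds and field names are not shown in this post, so everything below is invented for the sketch):

```yaml
# Hypothetical sketch only: kinds and field names are illustrative,
# not the real Aspen Mesh API.
# Developer-owned spec, kept next to the application code:
kind: Application
metadata:
  name: productpage
spec:
  service: productpage
  port: 9080
  protocol: http
  paths:
  - path: /productpage
    jwtValidation: required
---
# Operator-owned spec, describing the platform's security/networking:
kind: SecureIngress
metadata:
  name: prod-ingress
spec:
  domain: shop.example.com
  certificateSecret: shop-example-com-cert
  jwks:
    issuer: auth.example.com
    server: https://auth.example.com/jwks.json
```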

Aspen Mesh Policy Framework - After


Aspen Mesh takes these inputs and creates all of the necessary system resources (Istio Gateways, VirtualServices and Authentication Policies), configures DNS entries and retrieves certificates to enable secure access to the application. If you have ever configured these by hand, you know the complexity involved in getting this right. With this new feature, we want our customers to focus on what’s important to them and let Aspen Mesh take care of their infrastructure needs. Additionally, the Aspen Mesh controllers always keep the resources in sync and augment them as the Secure Ingress and Application resources are updated.

Another important benefit of mapping APIs to personas is the ability to create ownership and store configuration in the right place. Keep the application-centric specifications in code, right next to the application, so that you can review and update both as part of your normal code review process. You don’t need another process or workflow to apply configuration changes out-of-band with your code changes. And because these things live together, they can naturally be deployed at the same time, reducing misconfigurations.

The overarching goal of our APIs is to enable platform operators to retain a strategic point of control for enforcing policies while allowing application developers to move quickly and deliver customer-facing features. And most importantly, to allow our customers’ customers to use the application reliably and securely.

Today, we're natively integrated with AWS Route 53, and we expect to offer integrations with Azure and GCP in the near future. We also retrieve and renew domain certificates from Let’s Encrypt; the only thing operators need to provide is their registered email address, and the rest is handled by the Aspen Mesh control plane.

Interested in learning more about Secure Ingress Policy? Reach out to one of our experts to learn more about how you can implement Aspen Mesh’s Secure Ingress Policies at your organization.


Announcing Aspen Mesh 1.3

We’re excited to announce the release of Aspen Mesh 1.3, which is based on Istio’s latest LTS release, 1.3 (specifically tag 1.3.3). This release builds on our self-managed release series (1.2), includes all the new capabilities added by the Istio community in release 1.3 plus a host of new Aspen Mesh features, and is fully tested with production-grade support, ready for enterprise adoption.

The theme for the Aspen Mesh and Istio 1.3 releases was enhanced user experience. The release includes a user dashboard that has been redesigned for easier navigation of the service graph and cluster resources. The Aspen Mesh service graph view has been augmented to include ingress and egress services, as well as easier access to health and policy details for nodes on the graph. While a service graph is a great tool for visualizing service communication as a team, we realized that in order to quickly identify services that are experiencing problems, individual platform engineers need a view that allows them to dig deeper and gain additional insight into their services. To address this, we are releasing a new table view which provides access to additional information about clusters, namespaces and workloads, including the ingress and egress services they are communicating with and any warnings or errors for those objects as detected by our open source configuration analyzer, Istio Vet.

Aspen Mesh 1.3

The Istio community added new capabilities which make it easier for users to adopt and debug Istio, and also reduced the configuration needed to get a service mesh working in a Kubernetes environment. The full list of features and enhancements can be found in Istio’s release announcement, but a few features deserve deeper analysis.

Specifying Container Ports Is No Longer Required

Before release 1.3, Istio only intercepted inbound traffic on ports that were explicitly declared as part of the container spec in Kubernetes. This was often a cause of friction for adoption, as Kubernetes doesn’t require container ports to be specified and by default forwards traffic to any unlisted port. Making this even worse, any unlisted inbound port bypassed the sidecar proxy (instead of being blocked), which created a potential security risk, since bypassing the proxy meant no policies were being enforced. In this release, specifying container ports is no longer required, and by default all ports are intercepted for traffic and redirected to the sidecar proxy, which means this misconfiguration can no longer lead to security violations! If for some reason you would still like to explicitly specify inbound ports instead of capturing all ports (which we highly recommend), you can use the annotation “” on the pod spec.
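To make the old behavior concrete, here is a minimal sketch of the kind of container spec declaration that release 1.3 no longer requires (the pod name, image and port are illustrative, not from the original post):

```yaml
# Illustrative pod spec; names, image and port are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: reviews
  labels:
    app: reviews
spec:
  containers:
  - name: reviews
    image: example/reviews:v1
    ports:
    # Before Istio 1.3, only ports declared here were intercepted by
    # the sidecar; traffic to unlisted ports bypassed the proxy.
    # As of 1.3 this declaration is no longer required.
    - containerPort: 9080
```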

Protocol Detection

In earlier versions of Istio, all service port names were required to be explicitly named with a protocol prefix (http-, grpc-, tcp-, etc.) to declare the protocol being used by the service port. In the absence of a prefix, traffic was classified as TCP, which meant a loss in visibility (metrics/tracing). It was also possible to bypass policy if a user had configured HTTP or Layer 7 policies thinking that the application was accepting Layer 7 traffic while the mesh was classifying it as TCP. Experienced users of Kubernetes who already had a lot of existing configuration had to migrate their service definitions to add this prefix, which led to many missing configurations and an adoption burden. In release 1.3, an experimental protocol detection feature was added which doesn’t require users to prefix the service port name for HTTP traffic. Note that this feature is experimental and only works for HTTP traffic; for all other protocols you still need to add the prefix on the port names. Protocol detection is useful functionality which can reduce the configuration burden for users, but it can interact with policies and routing in unexpected ways. We are working with the Istio community to iron out these interactions and will be publishing a blog soon on recommended best practices for production usage. In the meantime, this feature is disabled by default in the Aspen Mesh release, and we encourage our customers to enable it only in staging environments. Additionally, for Aspen Mesh customers, we automatically run the service port prefix vetter and notify you if any service in the mesh has ports with missing protocol prefixes.
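The port-name convention described above can be sketched with a minimal Service (the service and app names are illustrative):

```yaml
# Illustrative Service; names and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: reviews
spec:
  selector:
    app: reviews
  ports:
  # The "http-" prefix tells Istio to treat this port as HTTP.
  # With 1.3's experimental protocol detection, HTTP can be
  # detected without the prefix.
  - name: http-reviews
    port: 9080
  # Non-HTTP protocols like gRPC still require the prefix in 1.3.
  - name: grpc-ratings
    port: 9081
```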

Mixer-less Telemetry

Earlier versions of Istio had a control plane component, Mixer, which was responsible for receiving attributes about traffic from the sidecar proxies in client and server workloads and exposing them to a telemetry backend system like Prometheus or DataDog. This architecture provided a useful abstraction layer for operators to switch out telemetry backend systems, but the component often became a choke point requiring a large amount of resources (CPU/memory), which made Istio expensive for operators to manage. In this release, an experimental feature was added which doesn’t require running Mixer to capture telemetry. In this mode, the sidecar proxies expose the metrics directly, which can be scraped by Prometheus. This feature is disabled by default and is under active development to make sure users get the same metrics with and without Mixer. This page documents how to enable and use this feature if you’re interested in trying it out.
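In this mode each sidecar serves its own Prometheus metrics (on port 15090 at /stats/prometheus in this release series), so Prometheus can scrape the pods directly. A minimal scrape-job sketch, assuming standard Prometheus Kubernetes service discovery (the job name is illustrative):

```yaml
# Minimal Prometheus scrape sketch for sidecar-exposed metrics.
# Job name is illustrative; 15090 + /stats/prometheus is the
# sidecar's Prometheus endpoint in this release series.
scrape_configs:
- job_name: envoy-stats
  metrics_path: /stats/prometheus
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only the sidecar's metrics port, which the injector
  # names http-envoy-prom.
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: http-envoy-prom
```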

Telemetry for External Services

Depending on your global settings (i.e., whether you allow any external service access or block all traffic without explicit ServiceEntries), there were gaps in telemetry when external traffic was either blocked or allowed. Visibility into external services is one of the key benefits of a service mesh, and the new functionality added in release 1.3 allows you to monitor all external service traffic in either mode. This was a highly requested feature from both our customers and other production users of Istio, and we were pleased to contribute this functionality to open source Istio. This blog documents how the augmented metrics can be used to better understand external service access. Note that all Aspen Mesh releases by default block all external service access (which we recommend) unless it is explicitly declared via ServiceEntries.
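With external access blocked by default, reaching an outside service requires an explicit ServiceEntry; a minimal sketch (the host is chosen purely for illustration):

```yaml
# Illustrative ServiceEntry allowing mesh workloads to reach an
# external host; httpbin.org is just an example.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-httpbin
spec:
  hosts:
  - httpbin.org
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 443
    name: tls
    protocol: TLS
```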

We hope that these new features simplify the configuration needed to adopt Aspen Mesh, and that the enhanced user experience makes it easy for you to navigate the complexities of a microservices environment. You can get the latest release here, or if you’re an existing customer, please follow the upgrade instructions in our documentation to switch to this version.



Aspen Mesh 1.2.7 Security Update

Aspen Mesh is announcing the release of 1.2.7, which addresses important Istio security updates. Below are the details of the security fixes, taken from the Istio 1.2.7 security update.

Security Update 

ISTIO-SECURITY-2019-005: A DoS vulnerability has been discovered by the Envoy community. 

  • CVE-2019-15226: After investigation, the Istio team has found that this issue could be leveraged for a DoS attack in Istio if an attacker uses a high quantity of very small headers.

Bug Fix

  • Fix a bug where the nodeagent was failing to start when using Citadel (Issue 15876)

Additionally, the Aspen Mesh 1.2.7 release contains bug fixes and enhancements from Istio release 1.2.6.

The Aspen Mesh 1.2.7 binaries are available for download here.

For upgrade procedures for Aspen Mesh deployments installed via Helm (helm upgrade), please visit our Getting Started page.

Simplifying Microservices Security with Incremental mTLS

Kubernetes removes much of the complexity and difficulty involved in managing and operating a microservices application architecture. Out of the box, Kubernetes gives you advanced application lifecycle management techniques like rolling upgrades, resiliency via pod replication, auto-scalers and disruption budgets, efficient resource utilization with advanced scheduling strategies and health checks like readiness and liveness probes. Kubernetes also sets up basic networking capabilities which allow you to easily discover new services getting added to your cluster (via DNS) and enables pod to pod communication with basic load balancing.

However, most of the networking capabilities provided by Kubernetes and its CNI providers are constrained to layers 3/4 (networking/transport protocols like TCP/IP) of the OSI stack. This means that any advanced networking functionality (like retries or routing) which relies on higher layers, i.e. parsing application protocols like HTTP/gRPC (layer 7) or encrypting traffic between pods using TLS (layer 5), has to be baked into the application. Relying on your applications to enforce network security is often fraught with landmines related to the close coupling of your operations/security and development teams, while also putting more burden on your application developers to own complicated infrastructure code.

Let’s explore what it takes for applications to perform TLS encryption for all inbound and outbound traffic in a Kubernetes environment. In order to achieve TLS encryption, you need to establish trust between the parties involved in communication. Establishing trust means creating and maintaining some sort of PKI infrastructure which can generate certificates, revoke them and periodically refresh them. As an operator, you now need a mechanism to provide these certificates (maybe using Kubernetes secrets?) to the running pods and update the pods when new certificates are minted. On the application side, you have to rely on OpenSSL (or its derivatives) to verify trust and encrypt traffic. The application developer team needs to handle upgrading these libraries when CVE fixes and upgrades are released. On top of all these complexities, compliance concerns may also require you to support only a minimum TLS version and a subset of ciphers, which means creating and supporting yet more configuration options in your applications. All of these challenges make it very hard for organizations to encrypt all pod network traffic on Kubernetes, whether for compliance reasons or to achieve a zero trust network model.

This is the problem that a service mesh leveraging the sidecar proxy approach is designed to solve. The sidecar proxy can initiate a TLS handshake and encrypt traffic without requiring any changes or support from the applications. In this architecture, the application pod makes a request in plain text to another application running in the Kubernetes cluster which the sidecar proxy takes over and transparently upgrades to use mutual TLS. Additionally, the Istio control plane component Citadel handles creating workload identities using the SPIFFE specification to create and renew certificates and mount the appropriate certificates to the sidecars. This removes the burden of encrypting traffic from developers and operators.

Istio provides a rich set of tools to configure mutual TLS globally (on or off) for the entire cluster, or to incrementally adopt mTLS by enabling it for namespaces or a subset of services and their clients. This is where things get a little complicated. In order to correctly configure mTLS for one service, you need to configure an Authentication Policy for that service and the corresponding DestinationRules for its clients.

Both the Authentication policy and Destination rule follow a complex set of precedence rules which must be accounted for when creating these configuration objects. For example, a namespace level Authentication policy overrides the mesh level global policy, a service level policy overrides the namespace level and a service port level policy overrides the service specific Authentication policy. Destination rules allow you to specify the client side configuration based on host names where the highest precedence is the Destination rule defined in the client namespace then the server namespace and finally the global default Destination rule. On top of that, if you have conflicting Authentication policies or Destination rules, the system behavior can be indeterminate. A mismatch in Authentication policy and Destination rule can lead to subtle traffic failures which are difficult to debug and diagnose. Aspen Mesh makes it easy to understand mTLS status and avoid any configuration errors.
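As a sketch of what a matched pair looks like for a single service (the service name and namespace are illustrative), enabling mTLS for a `reviews` service requires both of these resources:

```yaml
# Server side: require mTLS for the reviews service.
# Names and namespace are illustrative.
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: reviews-mtls
  namespace: default
spec:
  targets:
  - name: reviews
  peers:
  - mtls: {}
---
# Client side: tell callers to originate Istio mutual TLS to reviews.
# Applying the Policy above without this DestinationRule (or vice
# versa) produces the mismatched-state traffic failures described
# in the text.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-mtls
  namespace: default
spec:
  host: reviews.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```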

Editing these complex configuration files in YAML can be tricky and only compounds the problem at hand. In order to simplify how you configure these resources and incrementally adopt mutual TLS in your environment, we are releasing a new feature which enables our customers to specify a service port (via APIs or UI) and their desired mTLS state (enabled or disabled). The Aspen Mesh platform automatically generates the correct set of configurations needed (Authentication Policy and/or DestinationRules) by inspecting the current state and configuration of your cluster. You can then view the generated YAMLs, edit them as needed and store them in your CI system or apply them manually. This feature removes the hassle of learning complex Istio resources and their interaction patterns, and provides you with valid, non-conflicting and functional Istio configuration.

The customers we talk to are in various stages of migrating to a microservices architecture or a Kubernetes environment. This results in a hybrid environment, where some services are consumed by clients that are not in the mesh or are deployed outside the Kubernetes environment, so those services require a different mTLS policy. Our hosted dashboard makes it easy for users to identify services and workloads which have mTLS turned on or off, and then easily create configuration using the above workflow to change the mTLS state as needed.

If you’re an existing customer, please upgrade your cluster to our latest release (Aspen Mesh 1.1.3-am2) and login to the dashboard to start using the new capabilities.

If you’re interested in learning about Aspen Mesh and incrementally adopting mTLS in your cluster, you can sign up for a beta account here.

Announcing Aspen Mesh 1.1

Aspen Mesh release 1.1.3 is now out to address a critical security update. The Aspen Mesh release is based on security patches released in the Istio 1.1.3 release - you can read more about the update here. We recommend Aspen Mesh users running 1.1 immediately upgrade to 1.1.3.

Close on the heels of the much anticipated Istio 1.1 release, we are excited to announce the release of Aspen Mesh 1.1. Our latest release provides all the features of Istio plus the support of the Aspen Mesh platform and team, and additional features you need to operate in the enterprise.

As with previous Istio releases, the open source community has done a great job of creating a release with exciting new features, improved stability and enhanced performance. The aim of this blog is to distill what has changed in the new release and point out the few gotchas and known issues in an easy-to-consume format. We often find that there are so many changes (kudos to the community for all the hard work!) that it is difficult for users to discern which pieces they should care about and what actions they need to take on the pieces they care about. Hopefully, this blog will help address a few of these issues.

Before we delve into the specifics, let’s focus on why release 1.1 was a big milestone for the Istio community and how things were handled differently compared to previous releases:

  • Quality was the major focus of this release. If you look at the history, you will notice that it took six release candidates to get the release out. The maintainers worked diligently to resolve tricky user-identified issues and address them correctly instead of getting the release out on a predefined date. We would see constant updates/PRs (even on weekends) to address these issues, which is a testament to the dedication of the open source community.
  • User experience was a key area of focus in the community for this release. There was a new UX Working Group created to address various usability issues and to improve the user’s Istio journey from install to upgrade. We believe that this is a step in the right direction and will lead to easier Istio adoption. Aspen Mesh actively participates in these meetings with an eye on improving the experience of Istio and Aspen Mesh users.
  • Meaningful effort was put into improving the documentation, especially around consistent use of terminology.

It was great to see that the community listened to its users, addressed critical issues and didn’t rush to release 1.1. We look forward to how Project Mauve can further improve the engineering process, thereby improving the quality of Istio releases.

So, let’s move onto the exciting new features and improvements that are part of the Aspen Mesh 1.1 release.

Aspen Mesh 1.1 Features

Reduced sidecar memory usage
This was a long-standing issue that Istio users had faced when dealing with medium- to large-scale clusters. The Envoy sidecars' memory consumption grew as new services and pods were deployed in the cluster, resulting in a considerable memory footprint for each sidecar proxy. As these sidecars are part of every pod in the mesh, this can quickly impact the scheduling and memory requirements for your cluster. In release 1.1, you can expect a significant reduction in memory consumption by the sidecars. This benefit is primarily driven by reducing the set of statistics exposed by each sidecar. Previously, the sidecars were configured to expose metrics for every Envoy cluster, listener and HTTP connection manager, which would increase the number of metrics reported roughly in proportion to the number of services and pods. In release 1.1, the set of metrics is reduced to the cluster and listener managers (in addition to Istio-specific stats), which always expose a fixed set of metrics. We found in our testing that sidecar memory consumption is significantly lower compared to Aspen Mesh release 1.0.4, and we look forward to users being able to inject sidecars into more applications in their clusters.

New multi cluster support

Earlier versions of Istio supported multiple clusters via a single control plane topology. This meant that the Istio control plane would be deployed on only one cluster, which would then manage services on both local and remote clusters. Additionally, it required a flat network IP space for the pods to communicate across clusters. These restrictions limited real-world use of the multi cluster functionality, as the control plane could easily become a single point of failure and a flat IP space is not always feasible. In this release, support was added for a multiple control plane topology, which provides the desired control plane availability and imposes no restrictions on the IP layout. Networking across clusters is set up via the ingress gateways, which rely on mTLS (a common root Certificate Authority across clusters) to verify peer traffic. We are excited to see new use cases emerge for multi cluster service mesh and how enterprises can leverage Aspen Mesh to build truly resilient and highly available applications deployed across clusters.

CNI support
Istio by default sets up the pod traffic redirection to/from the sidecar proxy by injecting an init container which uses iptables under the hood. The ability to use iptables requires elevated permissions which is a hindrance to adopting Istio in various organizations due to compliance concerns. Istio and Aspen Mesh now support CNI as a new way to perform traffic redirection, removing the need for elevated permissions. It is great to see this enhancement as we think it is critical to have the principle of least privileges applied to the service mesh. We’re excited to be able to drive advanced compliance use cases with our customers over the next few months.
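If you install with Helm, this redirection mode can be toggled with a single chart value. A minimal sketch, assuming the istio_cni option exposed by the Istio 1.1 charts (verify against your chart version):

```yaml
# Enable the Istio CNI plugin so the istio-init container
# (and its elevated iptables permissions) is no longer required
istio_cni:
  enabled: true
```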

New sidecar resource
One of the biggest challenges users faced with the old releases was that all the sidecars in the mesh had configuration related to all the services in the cluster even though a particular sidecar proxy only needed to talk to a small subset of services. This resulted in excess churn as massive amounts of configuration were processed and transmitted to the sidecars with every configuration update. This caused intermittent request failures and CPU spikes in all the sidecars on any configuration change in the cluster. The 1.1 release added a new Sidecar resource to enable operators to configure the ingress and egress of each proxy. With this resource, users can control the scope and visibility of configuration distributed to the sidecars and attain better resource utilization and scalability of Istio components.
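As a rough sketch, a Sidecar resource that scopes the proxies in a namespace down to local services plus the control plane might look like this (the namespace and hosts are illustrative):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: bookinfo        # illustrative namespace
spec:
  egress:
  - hosts:
    - "./*"                  # services in the same namespace
    - "istio-system/*"       # keep the control plane reachable
```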

Apart from the aforementioned major changes, there are quite a few lesser-known enhancements in this release which can be helpful in exploring Aspen Mesh capabilities.

Enabling end-user JWT authentication by path
The Istio ingress gateway and sidecar proxies support decoding a JWT provided by the end user and passing it to the applications as an HTTP request header. This has the operational benefit of isolating authentication from application code, instead using the service mesh infrastructure layer for these critical security operations. In earlier versions of Istio you could only enable/disable this feature on a per-service or per-port basis, but not for specific HTTP paths. This was very limiting, especially for ingress gateways where you might have some paths requiring authentication and some that don't. In release 1.1, an experimental feature was added to enable end-user JWT authentication based on request path.
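As a hedged sketch of what this experimental configuration looks like (field names per Istio 1.1's authentication Policy; the issuer and paths are illustrative):

```yaml
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: ingress-jwt
  namespace: istio-system
spec:
  targets:
  - name: istio-ingressgateway
  origins:
  - jwt:
      issuer: "https://auth.example.com"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      trigger_rules:
      - excluded_paths:
        - exact: /healthz    # no JWT required for health checks
  principalBinding: USE_ORIGIN
```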

New Helm installation options
There are many new Helm installation options added in this release (in addition to the old ones) that are useful in customizing Aspen Mesh based on your needs. We often find that customer use cases are quite different and unique for every environment, so the addition of these options makes it easy to tailor the service mesh to your needs. Some of the important new options are:

  • Node selector - Many of our customers want to install the control plane components on their own nodes for better monitoring, isolation and resilience. In this release there is an easy Helm option, global.defaultNodeSelector to achieve this functionality.
  • Tracing backend address - Users often have their tracing set up and want to easily add Istio on top to work with their existing tracing system. In the older version it was quite painful to provide a different tracing backend to Istio (used to be hardcoded to “zipkin.istio-system”). This release added a new “global.tracer.zipkin.address” Helm option to enable this functionality. If you’re an Aspen Mesh customer, we automatically set this up for you so that the traces are sent to the Aspen Mesh platform where you can access them via our hosted Jaeger service.
  • Customizable proxy access log format - The sidecar proxies in the older releases performed access logging in the default Envoy format. Even though the information is great, you might have access logging set up in other systems in your environment and want to have a uniform access logging format throughout your cluster for ease of parsing, searching and tooling. This new release supports a Helm option “global.proxy.accessLogFormat” for users to easily customize the logging format based on their environment.
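Pulled together in a Helm values override, these options might look like the following sketch (all values are illustrative):

```yaml
# values-override.yaml (illustrative values)
global:
  defaultNodeSelector:
    dedicated: istio-control-plane    # schedule control plane on labeled nodes
  tracer:
    zipkin:
      address: jaeger-collector.tracing:9411   # existing tracing backend
  proxy:
    accessLogFormat: "[%START_TIME%] %REQ(:METHOD)% %REQ(:PATH)% %RESPONSE_CODE%\n"
```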

This release also added many debugging enhancements which make it easy for users to operate and debug when running an Aspen Mesh cluster. Some critical enhancements in this area were:

Istioctl enhancements
Istioctl is a tool, similar to kubectl, for performing Istio-specific operations which can aid in debugging and validating various Istio configuration and runtime issues. Several enhancements were made to this tool which are worth mentioning:

  • Verify install - Istioctl now supports an experimental command to verify the installation in your cluster. This is a great step for first time Istio users before you dive deeper into exploring all the Istio capabilities. If you’re an Aspen Mesh customer, our demo installer automatically does this step for you and lets you know if the installation was successful.
  • Static configuration validation - Istioctl supports a “validate” command for users to verify their configuration before applying it to their cluster. Using this effectively can prevent easy misconfigurations and surprises which can be hard to debug. Note that Galley now also performs validation and rejects configuration if it’s invalid in the new release. If you’re an Aspen Mesh customer, you can use this new functionality in addition to the automated runtime analysis we perform via istio-vet. We find that the static single resource validation is a good first step but an automated tool like istio-vet from Aspen Mesh which can perform runtime analysis across multiple resources is also needed to ensure a properly functioning mesh.
  • Proxy health status - Support was added to quickly inspect and verify the health status of a proxy (default port 15020), which can be very useful in debugging request failures. We often found that users struggled to understand what qualifies as a healthy Istio proxy (sidecar or gateway), and we think this can help alleviate that issue.

Along with all of these great new improvements, there are a few gotchas or unexpected behaviors you might observe especially if you’re upgrading Istio from an older version. We’ve done a thorough investigation of these potential issues and are making sure our customers have a smooth transition with our releases. However, for the broader community let’s cover a few important gotchas to be aware of:

  • Access allowed to any external services by default - The new Istio release will by default allow access to any external service. In previous releases, all external traffic was blocked and required users to explicitly allowlist external services via ServiceEntry. This decision was reached by the community to make it easier for customers to add Istio on top of their existing deployments and not break working traffic. However, we think this is a major change that can lead to security escapes if you’re upgrading to this version. With that in mind, the Aspen Mesh distribution of the release will continue to block all external traffic by default. If you want to customize this setting, the Helm option “global.outboundTrafficPolicy.mode” can be updated based on your requirement.
  • Proxy access logs disabled by default - In this Istio release, the default behavior for proxy access logging has changed and it is now turned off by default. For first time users it is very helpful to observe access logs as the traffic flows through their services in the mesh. Additionally, if you’re upgrading to a new version and find that your logs are missing, it might break debugging capabilities that you have built around it. Because of this, the Aspen Mesh distribution has the proxy access logs turned on by default. You can customize this setting by updating the Helm option “global.proxy.accessLogFile” to “/dev/stdout”.
  • Every Sidecar resource requires “istio-system” - If you’re configuring the newly available Sidecar resource, be sure to include “istio-system” as one of the allowed egress hosts. During our testing we found that in the absence of the “istio-system” namespace in the egress list, the sidecar proxies start experiencing failures communicating with the Istio control plane, which can lead to cascading failures. We are working with the community to address this issue so that users can configure this resource with minimal surprises.
  • Mixer policy checks disabled by default - Mixer policy checks were turned on by default in earlier Istio releases, which meant that the sidecar proxies and gateways would always consult Mixer in the Istio control plane to check policy, forwarding the request to the application only if the policy allowed it. This feature was seldom used but added latency due to the out-of-process network call. After much deliberation and debate in the community, this new release turned off policy checks by default. This means that if you had previously configured policy checks and were relying on Mixer to enforce them, those configurations will no longer have any effect after the upgrade. If you would like to enable them by default, set the Helm option “global.disablePolicyChecks” to false.
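For reference, a Helm values sketch that restores the older, stricter behaviors called out above (check the option names against your chart version):

```yaml
global:
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY            # block external services unless allowlisted
  proxy:
    accessLogFile: "/dev/stdout"   # re-enable proxy access logs
  disablePolicyChecks: false       # re-enable Mixer policy checks
```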

We hope this blog has made it easy to understand the scope and impact of the 1.1 release. At Aspen Mesh, we keep close tabs on the community and actively participate to make the adoption and upgrade path easier for our customers. We believe that enterprises should spend less time and effort configuring the service mesh and focus on adding business value on top.

We'll be covering subsequent topics and deep diving into how you can set up and make the most out of new 1.1 features like multi cluster. Be sure to subscribe to the Aspen Mesh blog so you don't miss out.

If you want to quickly get started with the Aspen Mesh 1.1 release, grab it here, or if you’re an existing customer, please follow the upgrade instructions in our documentation.

Distributed Tracing, Istio and Your Applications

In the microservices world, distributed tracing is slowly becoming the most important tool for debugging and understanding your application dependencies. During my recent conversations at meetups and conferences, I found a lot of interest in how distributed tracing works, but at the same time a fair amount of confusion about how tracing interacts with service meshes like Istio and Aspen Mesh. In particular, these questions came up frequently:

  • How does tracing work with Istio? What information is collected and reported in the spans?
  • Do I have to change my applications to benefit from distributed tracing in Istio?
  • If I am currently reporting spans in my application how will it interact with spans reported from Istio?

In this blog I am going to try to answer these questions. Before we get deeper into them, a quick background on why and how I ended up writing tracing-related blogs. If you follow the Aspen Mesh blog you may have noticed I wrote two blogs related to tracing: one on tracing requests to AWS services when using Istio, and a second on tracing gRPC applications with Istio.

We have a pretty small engineering team at Aspen Mesh and, as it goes in most startups, if you work frequently on a subsystem or component you quickly become (or are labeled) the resident expert. I added tracing to our microservices and integrated it with Istio in the AWS environment, and in that process uncovered various interesting interactions which I thought might be worth sharing. Over the last few months we have been using tracing very heavily to gain understanding of our microservices, and it has now become the first place we look when things break. With that, let's move on to answering the questions I mentioned above.

How does tracing work with Istio?

Istio injects a sidecar proxy (Envoy) in the Pod in which your application container is running. This sidecar proxy transparently intercepts (iptables magic) all network traffic going in and out of your application. Because of this interception the sidecar proxy is in a unique position to automatically trace all network requests (HTTP/1.1, HTTP/2.0 & gRPC).

Let's see what changes the sidecar proxy makes to an incoming request to a Pod from a client (external or another microservice). From this point on I'm going to assume tracing headers are in Zipkin format for simplicity.

  • If the incoming request doesn't have any tracing headers, the sidecar proxy creates a root span (a span where the trace, parent and span IDs are all the same) before passing the request to the application container in the same Pod.
  • If the incoming request has tracing information (which should be the case if you're using Istio ingress or your microservice is being called from another microservice with a sidecar proxy injected), the sidecar proxy extracts the span context from these headers and creates a new span sharing the same trace, span and parent IDs as the incoming headers before passing the request to the application container in the same Pod.
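For illustration, the Zipkin (B3) tracing headers the sidecar works with look something like this (the values are made up):

```text
x-request-id: 80ec2b8b-9d37-4b9f-a3a1-7a4b5c3d2e1f
x-b3-traceid: 463ac35c9f6413ad48485a3953bb6124
x-b3-spanid: a2fb4a1d1a96d312
x-b3-parentspanid: 463ac35c9f6413ad
x-b3-sampled: 1
```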

In the reverse direction, when the application container is making outbound requests (to external services or services in the cluster), the sidecar proxy in the Pod performs the following actions before making the request to the upstream service:

  • If no tracing headers are present, the sidecar proxy creates a root span and injects the span context as tracing headers into the new request.
  • If tracing headers are present, the sidecar proxy extracts the span context from the headers and creates a child span from this context. The new context is propagated as tracing headers in the request to the upstream service.

Based on the above explanation you should note that for every hop in your microservice chain you will get two spans reported from Istio, one from the client sidecar (span.kind set to client) and one from the server sidecar (span.kind set to server). All the spans created by the sidecars are automatically reported by the sidecars to the configured tracing backend systems like Jaeger or Zipkin.

Next let's look at the information reported in the spans. The spans contain the following information:

  • x-request-id: Reported as guid:x-request-id which is very useful in correlating access logs with spans.
  • upstream cluster: The upstream service to which the request is being made. If the span is tracking an incoming request to a Pod this is typically set to in.<name>. If the span is tracking an outbound request this is set to out.<name>.
  • HTTP headers: The following HTTP headers are reported when available:
    • URL
    • Method
    • User agent
    • Protocol
    • Request size
    • Response size
    • Response Flags
  • Start and end times for each span.
  • Tracing metadata: This includes the trace ID, span ID and the span kind (client or server). Apart from these the operation name is also reported for every span. The operation name is set to the configured virtual service (or route rule in v1alpha1) which affected the route or "default-route" if the default route was chosen. This is very useful in understanding which Istio route configuration is in effect for a span.

With that let's move on to the second question.

Do I have to change my application to benefit from tracing in Istio?

Yes, you will need to add logic in your application to propagate tracing headers from incoming to outgoing requests to gain full benefit from Istio's distributed tracing.

If the application container makes a new outbound request in the context of an incoming request and doesn't propagate the tracing headers from the incoming request, the sidecar proxy creates a root span for the outbound request. This means you will always see traces with only two microservices. On the other hand if the application container does propagate the tracing headers from incoming to outgoing requests, the sidecar proxy will create child spans as described above. Creation of the child spans gives you the ability to understand dependencies across multiple microservices.

There are a couple of options for propagating tracing headers in your application.

  1. Look for the tracing headers listed in the Istio docs and transfer them from incoming to outgoing requests. This method is simple and works in almost all cases. However, it has a major drawback: you cannot add custom tags to the spans, like user information, and you cannot create child spans for events in the application which you might want to report. As you are simply transferring headers without understanding the span format or context, there is limited ability to add application-specific information.
  2. The second method is setting up a tracing client in your application and using the OpenTracing APIs to propagate tracing headers from incoming to outgoing requests. I have created a sample tracing-go package which provides an easy way to set up jaeger-client-go in your applications in a way that is compatible with Istio. The following snippet should be included in the main function of your application:
       import (
         "log"

         tracing "github.com/aspenmesh/tracing-go"
         "github.com/spf13/viper"
       )

       func setupTracing() {
         // Configure Tracing
         tOpts := &tracing.Options{
           ZipkinURL:     viper.GetString("trace_zipkin_url"),
           JaegerURL:     viper.GetString("trace_jaeger_url"),
           LogTraceSpans: viper.GetBool("trace_log_spans"),
         }
         if err := tOpts.Validate(); err != nil {
           log.Fatal("Invalid options for tracing: ", err)
         }
         if tOpts.TracingEnabled() {
           tracer, err := tracing.Configure("myapp", tOpts)
           if err != nil {
             log.Fatal("Failed to configure tracing: ", err)
           }
           // Note: if this code lives outside main, return the closer
           // instead of deferring here so the tracer outlives the function.
           defer tracer.Close()
         }
       }

The key point to note is that in the tracing-go package I have set the OpenTracing global tracer to the Jaeger tracer. This enables me to use the OpenTracing APIs for propagating headers from incoming to outgoing requests like this:

    import (
      "net/http"

      ot "github.com/opentracing/opentracing-go"
      "golang.org/x/net/context/ctxhttp"
    )

    func injectTracingHeaders(incomingReq *http.Request, addr string) {
      if span := ot.SpanFromContext(incomingReq.Context()); span != nil {
        outgoingReq, _ := http.NewRequest("GET", addr, nil)
        // Inject the span context as tracing headers on the outgoing request
        ot.GlobalTracer().Inject(span.Context(), ot.HTTPHeaders, ot.HTTPHeadersCarrier(outgoingReq.Header))

        resp, err := ctxhttp.Do(incomingReq.Context(), nil, outgoingReq)
        // Do something with resp and err
        _, _ = resp, err
      }
    }
You can also use the OpenTracing APIs to set span tags or create child spans from the tracing context added by Istio like this:

    func SetSpanTag(incomingReq *http.Request, key string, value interface{}) {
      if span := ot.SpanFromContext(incomingReq.Context()); span != nil {
        span.SetTag(key, value)
      }
    }

Apart from these benefits, you don't have to deal with tracing headers directly; the tracer (in this case Jaeger) handles them for you. I strongly recommend using this approach as it sets the foundation in your application to add enhanced tracing capabilities without much overhead.

Now let's move on to the third question.

How do spans reported by Istio interact with spans created by applications?

If you want the spans reported by your application to be children of the tracing context added by Istio, you should use the OpenTracing API StartSpanFromContext instead of StartSpan. StartSpanFromContext creates a child span from the parent context if one is present; otherwise it creates a root span.

Note that in all the examples above I have used the OpenTracing Go APIs, but you should be able to use any tracing client library written in the same language as your application as long as it is OpenTracing API-compatible.

Tracing and Metrics: Getting the Most Out of Istio

Are you considering or using a service mesh to help manage your microservices infrastructure? If so, here are some basics on how a service mesh can help, the different architectural options, and tips and tricks on using some key CNCF tools that integrate well with Istio to get the most out of it.

The beauty of a service mesh is that it bundles so many capabilities together, freeing engineering teams from having to spend inordinate amounts of time managing microservices architectures. Kubernetes has solved many build and deploy challenges, but it is still time-consuming and difficult to ensure reliability and security at runtime. A service mesh handles the difficult, error-prone parts of cross-service communication such as latency-aware load balancing, connection pooling, service-to-service encryption, instrumentation, and request-level routing.

Once you have decided a service mesh makes sense to help manage your microservices, the next step is deciding which service mesh to use. There are several architectural options: the earliest model of a library approach, the node agent architecture, and the model which seems to be gaining the most traction – the sidecar model. We have also seen an evolution from data plane proxies like Envoy to service meshes such as Istio, which provide distributed control and data planes. We're active users of Istio, and believers that the sidecar architecture strikes the right balance between a robust set of features and a lightweight footprint, so let's take a look at how to get the most out of tracing and metrics with Istio.


One of the capabilities Istio provides is distributed tracing. Tracing provides service dependency analysis for your microservices and tracks requests as they travel through multiple services. It's also a great way to identify performance bottlenecks and zoom into a particular request to determine, for example, which microservice contributed to the latency of a request or which service created an error.

We use and recommend Jaeger for tracing as it has several advantages:

  • OpenTracing compatible API
  • Flexible & scalable architecture
  • Multiple storage backends
  • Advanced sampling
  • Accepts Zipkin spans
  • Great UI
  • CNCF project and active OS community


Another powerful thing you gain with Istio is the ability to collect metrics. Metrics are key to understanding historically what has happened in your applications, and when they were healthy compared to when they were not. A service mesh can gather telemetry data from across the mesh and produce consistent metrics for every hop. This makes it easier to quickly solve problems and build more resilient applications in the future.

We use and recommend Prometheus for gathering metrics for several reasons:

  • Pull model
  • Flexible query API
  • Efficient storage
  • Easy integration with Grafana
  • CNCF project and active OS community
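As a small example of what consistent mesh metrics enable, a Prometheus query along these lines (metric and label names from Istio's default telemetry) surfaces the 5xx error rate per destination service:

```promql
sum(rate(istio_requests_total{reporter="destination", response_code=~"5.."}[1m]))
by (destination_service)
```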

We also use Cortex, which is a powerful tool to enhance Prometheus. Cortex provides:

  • Long term durable storage
  • Scalable Prometheus query API
  • Multi-tenancy

Check out this webinar for a deeper look into what you can do with these tools and more.