Aspen Mesh release 1.1.3 is now out to address a critical security update. The Aspen Mesh release is based on security patches released in the Istio 1.1.3 release – you can read more about the update here. We recommend Aspen Mesh users running 1.1 immediately upgrade to 1.1.3.
Close on the heels of the much anticipated Istio 1.1 release, we are excited to announce the release of Aspen Mesh 1.1. Our latest release provides all the features of Istio plus the support of the Aspen Mesh platform and team, and additional features you need to operate in the enterprise.
As with previous Istio releases, the open source community has done a great job of creating a release with exciting new features, improved stability and enhanced performance. The aim of this blog is to distill what has changed in the new release and point out the few gotchas and known issues in an easy to consume format. We often find that there are so many changes (kudos to the community for all the hard work!) that it is difficult for users to discern what pieces they should care about and what actions they need to take on the pieces they care about. Hopefully, this blog will help address a few of these issues.
Before we delve into the specifics, let’s focus on why release 1.1 was a big milestone for the Istio community and how things were handled differently compared to previous releases:
- Quality was the major focus of this release. If you look at the history, you will notice that it took six release candidates to get the release out. The maintainers worked diligently to resolve tricky user-identified issues and address them correctly instead of getting the release out on a predefined date. We would see constant updates/PRs (even on weekends) to address these issues which is a testament to the dedication of the open source community.
- User experience was a key area of focus in the community for this release. There was a new UX Working Group created to address various usability issues and to improve the user’s Istio journey from install to upgrade. We believe that this is a step in the right direction and will lead to easier Istio adoption. Aspen Mesh actively participates in these meetings with an eye on improving the experience of Istio and Aspen Mesh users.
- Meaningful effort was put into improving the documentation, especially around consistent use of terminology.
It was great to see that the community listened to its users, addressed critical issues and didn’t rush to release 1.1. We look forward to how Project Mauve can further improve the engineering process, thereby improving the quality of Istio releases.
So, let’s move onto the exciting new features and improvements that are part of the Aspen Mesh 1.1 release.
Aspen Mesh 1.1 Features
Reduced sidecar memory usage
This was a long-standing issue that Istio users had faced when dealing with medium to large scale clusters. The Envoy sidecars’ memory consumption grew as new services and pods were deployed in the cluster resulting in a considerable memory footprint for each sidecar proxy. As these sidecars are part of every pod in the mesh this can quickly impact the scheduling and memory requirements for your cluster. In release 1.1, you can expect a significant reduction in the memory consumption by the sidecars. This benefit is primarily driven by reducing the set of statistics exposed by each sidecar. Previously, the sidecars were configured to expose metrics for every Envoy cluster, listener and HTTP connection manager which would increase the number of metrics reported roughly in proportion to the number of services and pods. In release 1.1, the set of metrics is now reduced to the cluster and listener managers (in addition to Istio specific stats) which always expose a fixed set of metrics. We found in our testing that the sidecar memory consumption is significantly lower compared to Aspen Mesh release 1.0.4 and we are looking forward to users being able to inject sidecars in more applications in their clusters.
New multi cluster support
Earlier versions of Istio supported multiple clusters via a single control plane topology. This meant that the Istio control plane would be deployed only on one cluster which would manage services on both local and remote clusters. Additionally, it required a flat network IP space for the pods to communicate across clusters. These restrictions limited real world uses of multi cluster functionality as the control plane could easily become a single point of failure and the flat IP space was not always feasible. In this release support was added for multiple control plane topology which provides the desired control plane availability and no restrictions on the IP layout. Networking across clusters is set up via the ingress gateways which rely on mTLS (common root Certificate Authority across clusters) to verify the peer traffic. We are excited to see new use cases emerge for multi cluster service mesh and how enterprises can leverage Aspen Mesh to truly build resilient and highly available applications deployed across clusters.
Istio by default sets up the pod traffic redirection to/from the sidecar proxy by injecting an init container which uses iptables under the hood. The ability to use iptables requires elevated permissions which is a hindrance to adopting Istio in various organizations due to compliance concerns. Istio and Aspen Mesh now support CNI as a new way to perform traffic redirection, removing the need for elevated permissions. It is great to see this enhancement as we think it is critical to have the principle of least privileges applied to the service mesh. We’re excited to be able to drive advanced compliance use cases with our customers over the next few months.
New sidecar resource
One of the biggest challenges users faced with the old releases was that all the sidecars in the mesh had configuration related to all the services in the cluster even though a particular sidecar proxy only needed to talk to a small subset of services. This resulted in excess churn as massive amounts of configuration were processed and transmitted to the sidecars with every configuration update. This caused intermittent request failures and CPU spikes in all the sidecars on any configuration change in the cluster. The 1.1 release added a new Sidecar resource to enable operators to configure the ingress and egress of each proxy. With this resource, users can control the scope and visibility of configuration distributed to the sidecars and attain better resource utilization and scalability of Istio components.
Apart from the aforementioned major changes, there are quite a few lesser known enhancements in this release which can be helpful in exploring Aspen Mesh capabilities.
Enabling end-user JWT authentication by path
Istio ingressgateway and sidecar proxies support decoding JWT provided by the end user and passing it to the applications as an HTTP request header. This has the operational benefit of isolating authentication from application code and instead using the service mesh infrastructure layer for these critical security operations. In earlier versions of Istio you could only enable/disable this feature on a per service or port basis but not for specific HTTP paths. This was very limiting especially for ingress gateways where you might have some paths requiring authentication and some that didn’t. In release 1.1, an experimental feature was added to enable end user JWT authentication based on request path.
New Helm installation options There are many new Helm installation options added in this release (in addition to the old ones) that are useful in customizing Aspen Mesh based on your needs. We often find that customer use cases are quite different and unique for every environment, so the addition of these options makes it easy to tailor service mesh to your needs. Some of the important new options are:
- Node selector – Many of our customers want to install the control plane components on their own nodes for better monitoring, isolation and resilience. In this release there is an easy Helm option, global.defaultNodeSelector to achieve this functionality.
- Tracing backend address – Users often have their tracing set up and want to easily add Istio on top to work with their existing tracing system. In the older version it was quite painful to provide a different tracing backend to Istio (used to be hardcoded to “zipkin.istio-system”). This release added a new “global.tracer.zipkin.address” Helm option to enable this functionality. If you’re an Aspen Mesh customer, we automatically set this up for you so that the traces are sent to the Aspen Mesh platform where you can access them via our hosted Jaeger service.
- Customizable proxy access log format – The sidecar proxies in the older releases performed access logging in the default Envoy format. Even though the information is great, you might have access logging set up in other systems in your environment and want to have a uniform access logging format throughout your cluster for ease of parsing, searching and tooling. This new release supports a Helm option “global.proxy.accessLogFormat” for users to easily customize the logging format based on their environment.
This release also added many debugging enhancements which make it easy for users to operate and debug when running an Aspen Mesh cluster. Some critical enhancements in this area were:
Istioctl is a tool similar to kubectl for performing Istio specific operations which can aid in debugging and validating various Istio configuration and runtime issues. There were several enhancements made to this tool which are worth mentioning:
- Verify install – Istioctl now supports an experimental command to verify the installation in your cluster. This is a great step for first time Istio users before you dive deeper into exploring all the Istio capabilities. If you’re an Aspen Mesh customer, our demo installer automatically does this step for you and lets you know if the installation was successful.
- Static configuration validation – Istioctl supports a “validate” command for users to verify their configuration before applying it to their cluster. Using this effectively can prevent easy misconfigurations and surprises which can be hard to debug. Note that Galley now also performs validation and rejects configuration if it’s invalid in the new release. If you’re an Aspen Mesh customer, you can use this new functionality in addition to the automated runtime analysis we perform via istio-vet. We find that the static single resource validation is a good first step but an automated tool like istio-vet from Aspen Mesh which can perform runtime analysis across multiple resources is also needed to ensure a properly functioning mesh.
- Proxy health status – Support was added to quickly inspect and verify the health status of proxy (default port 15020) which can be very useful in debugging request failures. We often found that users struggled in understanding what qualifies as a healthy Istio proxy (sidecar or gateways) and we think this can help to alleviate this issue.
Along with all of these great new improvements, there are a few gotchas or unexpected behaviors you might observe especially if you’re upgrading Istio from an older version. We’ve done a thorough investigation of these potential issues and are making sure our customers have a smooth transition with our releases. However, for the broader community let’s cover a few important gotchas to be aware of:
- Access allowed to any external services by default – The new Istio release will by default allow access to any external service. In previous releases, all external traffic was blocked and required users to explicitly whitelist external services via ServiceEntry. This decision was reached by the community to make it easier for customers to add Istio on top of their existing deployments and not break working traffic. However, we think this is a major change that can lead to security escapes if you’re upgrading to this version. With that in mind, the Aspen Mesh distribution of the release will continue to block all external traffic by default. If you want to customize this setting, the Helm option “global.outboundTrafficPolicy.mode” can be updated based on your requirement.
- Proxy access logs disabled by default – In this Istio release, the default behavior for proxy access logging has changed and it is now turned off by default. For first time users it is very helpful to observe access logs as the traffic flows through their services in the mesh. Additionally, if you’re upgrading to a new version and find that your logs are missing, it might break debugging capabilities that you have built around it. Because of this, the Aspen Mesh distribution has the proxy access logs turned on by default. You can customize this setting by updating the Helm option “global.proxy.accessLogFile” to “/dev/stdout”.
- Every Sidecar resource requires “istio-system” – If you’re configuring the newly available Sidecar resource, be sure to include “istio-system” as one of the allowed egress hosts. During our testing we found that in the absence of “istio-system” namespace, the sidecar proxies will start experiencing failures communicating to the Istio control plane which can lead to cascading failures. We are working with the community to address this issue so that users can configure this resource with minimal surprises.
- Mixer policy checks disabled by default – Mixer policy checks were turned on by default in earlier Istio releases which meant that the sidecar proxies and gateways would always consult Mixer in the Istio control plane to check policy and forward the request to the application only if the policy allowed it. This feature was seldom used but added latency due to the out-of-process network call. This new release turned off policy checks by default after much deliberation and debate in the community. What this means is if you had previously configured Policy checks and were relying on Mixer to enforce it, after the upgrade those configurations will no longer have any effect. If you would like to enable them by default, set the Helm option “global.disablePolicyChecks” to false.
We hope this blog has made it easy to understand the scope and impact of the 1.1 release. At Aspen Mesh, we keep a close tab on the community and actively participate to make the adoption and upgrade path easier for our customers. We believe that enterprises should spend less time and effort on configuring the service mesh and focus on adding business value on top.
We’ll be covering subsequent topics and deep diving into how you can set up and make the most out of new 1.1 features like multi cluster. Be sure to subscribe to the Aspen Mesh blog so you don’t miss out.