Aspen Mesh - Service Mesh Security and Complinace

Announcing Aspen Mesh 1.4.6 Security Update

Aspen Mesh is announcing the release of 1.4.6 which addresses important Istio security updates. Below are the details of the security fixes taken from Istio 1.4.6 security update.

Security Update: 

ISTIO-SECURITY-2020-003: Two Uncontrolled Resource Consumption and Two Incorrect Access Control Vulnerabilities in Envoy.

  • CVE-2020-8659: The Envoy proxy may consume excessive memory when proxying HTTP/1.1 requests or responses with many small (i.e. 1 byte) chunks. Envoy allocates a separate buffer fragment for each incoming or outgoing chunk with the size rounded to the nearest 4Kb and does not release empty chunks after committing data. Processing requests or responses with a lot of small chunks may result in extremely high memory overhead if the peer is slow or unable to read proxied data. The memory overhead could be two to three orders of magnitude more than configured buffer limits.
  • CVE-2020-8660: The Envoy proxy contains a TLS inspector that can be bypassed (not recognized as a TLS client) by a client using only TLS 1.3. Because TLS extensions (SNI, ALPN) are not inspected, those connections may be matched to a wrong filter chain, possibly bypassing some security restrictions.
  • CVE-2020-8661: The Envoy proxy may consume excessive amounts of memory when responding to pipelined HTTP/1.1 requests. In the case of illegally formed requests, Envoy sends an internally generated 400 error, which is sent to the Network::Connection buffer. If the client reads these responses slowly, it is possible to build up a large number of responses, and consume functionally unlimited memory. This bypasses Envoy’s overload manager, which will itself send an internally generated response when Envoy approaches configured memory thresholds, exacerbating the problem.
  • CVE-2020-8664: For the SDS TLS validation context in the Envoy proxy, the update callback is called only when the secret is received for the first time or when its value changes. This leads to a race condition where other resources referencing the same secret (e.g,. trusted CA) remain unconfigured until the secret’s value changes, creating a potentially sizable window where a complete bypass of security checks from the static (“default”) section can occur.
    • This vulnerability only affects the SDS implementation of Istio’s certificate rotation mechanism for Istio 1.4.5 and earlier which is only when SDS and mutual TLS are enabled. SDS is off by default and must be explicitly enabled by the operator in all versions of Istio prior to Istio 1.5. Istio’s default secret distribution implementation based on Kubernetes secret mounts is not affected by this vulnerability.

Bug Fix:

  • Fixed issue preventing Kubernetes nodes from being restarted

Minor Enhancements:

  • Allow private image registries to be specified more easily
  • Better checking of problematic names in Secure Ingress
  • Better mTLS status reporting
  • Improved classification of vetter notes

If you're already using Aspen Mesh, you can get the updates here.

 

New call-to-action
Aspen Mesh - Service Mesh Security and Complinace

Aspen Mesh 1.4.4 & 1.3.8 Security Update

Aspen Mesh is announcing the release of 1.4.4 and 1.3.8 (based on upstream Istio 1.4.4 & 1.3.8), both addressing a critical security update. All our customers are strongly encouraged to upgrade to these versions immediately based on your currently deployed Aspen Mesh version.

The Aspen Mesh team reported this CVE to the Istio community as per Istio CVE reporting guidelines, and our team was able to further contribute to the community--and for all Istio users--by fixing this issue in upstream Istio. As this CVE is extremely easy to exploit and the risk score was deemed very high (9.0), we wanted to respond  with urgency by getting a patch in upstream Istio and out to our customers as quickly as possible. This is just one of the many ways we are able to provide value to our customers as a trusted partner by ensuring that the pieces from Istio (and Aspen Mesh) are secure and stable.

Below are details about the CVE-2020-8595, steps to verify whether you’re currently vulnerable, and how to upgrade to the patched releases.

CVE Description

A bug in Istio's Authentication Policy exact path matching logic allows unauthorized access to resources without a valid JWT token. This bug affects all versions of Istio (and Aspen Mesh) that support JWT Authentication Policy with path based trigger rules (all 1.3 & 1.4 releases). The logic for the exact path match in the Istio JWT filter includes query strings or fragments instead of stripping them off before matching. This means attackers can bypass the JWT validation by appending “?” or “#” characters after the protected paths.

Example Vulnerable Istio Configuration

Configure JWT Authentication Policy triggers on an exact HTTP path match "/productpage" like this. In this example, the Authentication Policy is applied at the ingress gateway service so that any requests with the exact path matching “/productpage” requires a valid JWT token. In the absence of a valid JWT token, the request is denied and never forwarded to the productpage service. However, due to this CVE, any request made to the ingress gateway with path "/productpage?" without any valid JWT token is not denied but sent along to the productpage service, thereby allowing access to a protected resource without a valid JWT token. 

Validation

Since this vulnerability is in the Envoy filter added by Istio, you can also check the proxy image you’re using in the cluster locally to see if you’re currently vulnerable. Download and run this script to verify if the proxy image you’re using is vulnerable. Thanks to Francois Pesce from the Google Istio team in helping us create this test script.

Upgrading to This Version

Please follow upgrade instructions in our documentation to upgrade to these versions. Since this vulnerability affects the sidecar proxies and gateways (ingress and egress), it is important to follow the post-upgrade tasks here and rolling upgrade all of your sidecar proxies and gateways.


Announcing Aspen Mesh 1.4 Service Mesh

Announcing Aspen Mesh 1.4

We’re excited to announce the release of Aspen Mesh 1.4 which is based on Istio’s latest LTS release 1.4 (specific tag 1.4.3). The Aspen Mesh 1.4.3 release builds upon our latest 1.3 release along with our new Secure Ingress API, that we will be writing about in detail in a blog soon to follow, alongside the new capabilities of Istio 1.4.

Our Open Source Commitment

Aspen Mesh has been heavily involved with the Istio community for this release. A core value of Aspen Mesh is to help drive adoption of Istio. Our Istio Vet tool was the the first tool created to help analyze configuration issues and it is great to have this capability within the core open source project by using istioctl analyze. Aspen Mesh is committed to helping our customers and the larger Istio community as we will continue to add analyzers in the future based on issues that arise.

For 1.4, we are also pleased that we could help the community create istio-client-go, allowing the open source community to programmatically use and extend the Istio APIs in their software. We have deprecated the widely adopted Aspen Mesh version of our istio-client-go in favor of the the community’s version.

Automatic mTLS

Automatic mTLS is a feature that Aspen Mesh is eager about enabling in the future, but for now is disabled in upstream Istio as it is in our supported release. We are strong advocates of using mTLS for service-to-service communication, but users adopting Istio will sometimes encounter issues and will need to debug mTLS in their service mesh.

Mixer-less Telemetry

Mixer-less telemetry moves from an experimental feature into alpha in Istio 1.4. Aspen Mesh is working with the open source community to ensure feature parity with Mixer during this state of transition, however, we continue to support Istio Mixer at this time. Please see our announcement of Aspen Mesh 1.3 to see the importance of this feature.

Istio Authorization Policies

With policy being a key reason why our customers are adopting Istio, the new Authorization Policy is a step towards simplifying Policy. Istio continues to improve their APIs to better achieve a simplified model of service mesh configuration. Stay tuned for future blogs as Aspen Mesh is a strong proponent of this model.

istioctl

Along with the new analysis capabilities referenced above, istioctl also has the capability to help users with installation and upgrade along with a new experimental capability to wait for configuration to be passed to all proxies within your service mesh. At this time, we believe Helm provides the enterprise with the best way to manage deploying Istio into production, but we are always exploring and evaluating new approaches to managing your Istio deployment.

Analyzers included in Istio 1.4 help ensure that Gateways are configured properly, features that are being deprecated in Istio that are being used are given a warning message, and that all pods containing sidecars have the same, expected version of Envoy running. These are only some of the analyzers included and more information can be found here. Some of the work from Aspen Mesh’s Istio Vet was able to be ported to istioctl with the community’s help. We intend to continue porting any capabilities found in istio-vet into istioctl.

The Aspen Mesh 1.4 binaries are available for download here


Aspen Mesh - Service Mesh Security and Complinace

Aspen Mesh 1.3.6 Security Update

Aspen Mesh is announcing the release of 1.3.6 which addresses important Istio security updates. Below are the details of the security fixes taken from Istio 1.3.6 security update.

Security Update: 

ISTIO-SECURITY-2019-007: A heap overflow and improper input validation have been discovered in Envoy.

  • CVE-2019-18801: Fix a vulnerability affecting Envoy’s processing of large HTTP/2 request headers. A successful exploitation of this vulnerability could lead to a denial of service, escalation of privileges, or information disclosure.
  • CVE-2019-18802: Fix a vulnerability resulting from whitespace after HTTP/1 header values which could allow an attacker to bypass Istio’s policy checks, potentially resulting in information disclosure or escalation of privileges.

Bug Fix:

  • Fixed an issue where a duplicate listener was generated for a proxy’s IP address when using a headless TCP service. (Issue 17748)
  • Fixed an issue with the destination_service label in HTTP related metrics incorrectly falling back to request.host which can cause a metric cardinality explosion for ingress traffic. (Issue 18818)

Minor Enhancements:

  • Added support for Citadel to periodically check the root certificate remaining lifetime and rotate expiring root certificates. (Issue 17059)
  • Added PILOT_BLOCK_HTTP_ON_443 boolean environment variable to Pilot. If enabled, this flag prevents HTTP services from running on port 443 in order to prevent conflicts with external HTTP services. This is disabled by default. (Issue 16458)

Additionally, the Aspen Mesh 1.3.5 release contains bug fixes and minor enhancements from Istio release 1.3.5.  

The Aspen Mesh 1.3.6 binaries are available for download here.  


Aspen Mesh - Service Mesh Security and Complinace

Aspen Mesh 1.3.5 Security Update

Aspen Mesh is announcing the release of 1.3.5 which addresses important Istio security updates. Below are the details of the security fixes taken from the Istio 1.3.5 security update.

Security Update: 

ISTIO-SECURITY-2019-006: A DoS vulnerability has been discovered in Envoy.

  • CVE-2019-18817: An infinite loop can be triggered in Envoy if the option continue_on_listener_filters_timeout is set to True, which is the case in Istio. This vulnerability could be leveraged for a DoS attack. 

Bug Fix:

  • Fixed Envoy listener configuration for TCP headless services. (Issue #17748)
  • Fixed an issue which caused stale endpoints to remain even when a deployment was scaled to 0 replicas. (Issue #14436)
  • Fixed Pilot to no longer crash when an invalid Envoy configuration is generated. (Issue #17266)
  • Fixed an issue with the destination_service_name label not getting populated for TCP metrics related to BlackHole/Passthrough clusters. (Issue #17271)
  • Fixed an issue with telemetry not reporting metrics for BlackHole/Passthrough clusters when fall through filter chains were invoked. This occurred when explicit ServiceEntries were configured for external services. (Issue #17759)

Minor Enhancements:

  • Added support for Citadel to periodically check the root certificate remaining lifetime and rotate expiring root certificates. (Issue #17059)
  • Added PILOT_BLOCK_HTTP_ON_443 boolean environment variable to Pilot. If enabled, this flag prevents HTTP services from running on port 443 in order to prevent conflicts with external HTTP services. This is disabled by default. (Issue #16458)

Additionally, the Aspen Mesh 1.3.5 release contains bug fixes and minor enhancements from Istio release 1.3.4.  

The Aspen Mesh 1.3.5 binaries are available for download here.  


Aspen Mesh 1.3

Announcing Aspen Mesh 1.3

We’re excited to announce the release of Aspen Mesh 1.3 which is based on Istio’s latest LTS release 1.3 (specific tag version 1.3.3). This release builds on our self-managed release (1.2 series), includes all the new capabilities added by the Istio community in release 1.3 plus a host of new Aspen Mesh features, all fully tested with production grade support ready for enterprise adoption.

The theme for Aspen Mesh and Istio 1.3 release was enhanced User Experience. The release includes an enhanced user dashboard that has been redesigned for easier navigation of service graph and cluster resources. The Aspen Mesh service graph view has been augmented to include ingress and egress services as well as easier access to health and policy details for nodes on the graph. While a service graph is a great tool for visualizing service communication as a team, we realized that in order  to quickly identify services that are experiencing problems, individual platform engineers need a view that allows them to dig deeper and gain additional insight into their services. To address this, we are releasing a new table view which provides access to additional information about clusters, namespaces and workloads including the ingress and egress services they are communicating with and any warnings or errors for those objects as detected by our open source configuration analyzer Istio Vet.

Aspen Mesh 1.3

The Istio community added new capabilities which makes it easy for users to adopt and debug Istio and also reduced the configuration needed for users to get service mesh working in their Kubernetes environment. The full list of features and enhancements can be found in Istio’s release announcement, but there are few features that deserve deeper analysis.

Specifying Container Ports Is No Longer Required

Before release 1.3, Istio only intercepted inbound traffic on ports that were explicitly declared as part of the container spec in Kubernetes. This was often a cause of friction for adoption as Kubernetes doesn’t require container ports to be specified and by default forwards traffic to any unlisted port. Making this even worse, any unlisted inbound port bypassed the sidecar proxy (instead of being blocked) which created a potential security risk as bypassing the proxy meant no policies were being enforced. In this release, specifying container ports is no longer required and by default all ports are intercepted for traffic and redirected to sidecar proxy which means misconfiguration will no longer lead to security violations! If for some reason, you would still like to explicitly specify inbound ports instead of capturing all which we highly recommend) you can use the annotation “traffic.sidecar.istio.io/includeInboundPorts” on the pod spec.

Protocol Detection

In earlier versions of Istio, all service port names were required to be explicitly named with the protocol prefix (http-, grpc-, tcp-, etc) to declare the protocol being used by the service port. In the absence of a prefix, traffic was classified as TCP which meant a loss in visibility (metrics/tracing). It was also possible to bypass policy if a user  had configured HTTP or Layer 7 policies thinking that the application was accepting Layer 7 but the mesh was classifying it as TCP traffic. Experienced users of Kubernetes who already had a lot of existing configuration had to migrate their service definitions to add this prefix which lead to a lot of missing configurations and adoption burden. In release 1.3, an experimental protocol detection feature was added which doesn’t require users to prefix the service port name for HTTP traffic. Note that this feature is experimental and only works for HTTP traffic - for all other protocols you still need to add the prefix on the port names. Protocol detection is a useful functionality which can reduce configuration burden for users but it can interact with policies and routing in unexpected ways. We are working with the Istio community to iron out these interactions and will be publishing a blog soon on best recommended practices for production usage. In the meantime, this feature is disabled by default in the Aspen Mesh release and we  encourage our customers to enable this only in staging environments. Additionally, for Aspen Mesh customers, we automatically run the service port prefix vetter and notify you if any service in the mesh has ports with missing protocol prefixes.

Mixer-less Telemetry

Earlier versions of Istio had a control plane component, Mixer, which was responsible for receiving attributes about traffic from sidecar proxies in the client and server workloads and exposing it to a telemetry backend system like Prometheus or DataDog. This architecture was great for providing an abstraction layer for operators to switch out telemetry backend systems, but this component often became a choke point which required a large amount of resources (CPU/memory) which made it expensive for operators to manage Istio. In this release, an experimental feature was added which doesn’t require running Mixer to capture telemetry. In this mode, the sidecar proxies expose the metrics directly, which can be scraped by Prometheus. This feature is disabled by default and under active development to make sure users get the same metrics with and without Mixer. This page documents how to enable and use this feature  if you’re interested in trying it out.

Telemetry for External Services

Depending on your global settings i.e. whether to allow any external service access or block all traffic without explicit ServiceEntries, there were gaps in telemetry when external traffic was either blocked or allowed. Having visibility into external services is one of the key benefits of a service mesh and the new functionality added in release 1.3 allows you to monitor all external service traffic in either of the modes. It was a highly requested feature both from our customers and other production users of Istio, and we were pleased to  contribute this functionality to open source Istio. This blog documents how the augmented metrics can be used to better understand external service access. Note that all Aspen Mesh releases by default block all external service access, which we recommend, unless explicitly declared via ServiceEntries.

We hope that these new features simplify configuration needed to adopt Aspen Mesh and the enhanced User Experience makes it easy for you to navigate the complexities of a microservices environment. You can get the latest release here or if you’re an existing customer please follow the upgrade instructions in our documentation to switch to this version.

 


Aspen Mesh - Service Mesh Security and Complinace

Aspen Mesh 1.2.7 Security Update

Aspen Mesh is announcing the release of 1.2.7 which addresses important Istio security updates. Below are the details of the security fixes taken from Istio 1.2.7 security update.

Security Update 

ISTIO-SECURITY-2019-005: A DoS vulnerability has been discovered by the Envoy community. 

  • CVE-2019-15226: After investigation, the Istio team has found that this issue could be leveraged for a DoS attack in Istio if an attacker uses a high quantity of very small headers.

Bug Fix

  • Fix a bug where nodeagent was failing to start when using citadel (Issue 15876)

Additionally the Aspen Mesh 1.2.7 release contains bug fixes and enhancements from Istio release 1.2.6.  

The Aspen Mesh 1.2.7 binaries are available for download here

 For upgrading procedures of Aspen Mesh deployments installed via Helm (helm upgrade) please visit our Getting Started page.


Aspen Mesh Enterprise Service Mesh

Aspen Mesh for Self-hosted Environments

We’re excited to announce the launch of our first self-managed version of Aspen Mesh, designed for deployment to your infrastructure within any cloud or on-premises.  With Aspen Mesh 1.2.5-am1 you get the advanced functionality, the rich dashboard, and the expert support you’re used to from Aspen Mesh, all built on the open source power of Istio 1.2.5.

With this release, we're making it easy for enterprises with self-hosted environments to get all the benefits of Aspen Mesh.  This means you can now use your existing Prometheus, Grafana and Jaeger with Aspen Mesh and we no longer require customers to send data out of their clusters.  We’re deprecating prior versions of Aspen Mesh that include these hosted elements. All Aspen Mesh customers will need to upgrade to Aspen Mesh 1.2.5-am1 before October 17, 2019.

If you’d like any help with this upgrade, reach out your account rep or email us at support@aspenmesh.io.

Why the change?

We first started building Aspen Mesh in the summer of 2017, launching the first version built on Istio 0.2.4. Since then, we’ve focused on helping enterprises harness the power of service mesh by delivering an integrated solution that provided the core elements of Istio along with a hosted solution for Prometheus, Grafana, and Jaeger. 

What we’ve found is that while enterprises are looking to work with someone like Aspen Mesh to better harness the power of service mesh, they usually have existing installations of Prometheus, Grafana, and Jaeger and don’t want or need a hosted, integrated, supported solution.  And customers who wanted to get started with basic insights right away had to conduct security audits before they could send us service metrics and trace headers.

Installing Aspen Mesh in your environment

With those obstacles out of the way, it’s now much easier to get started using Aspen Mesh in your environment.  All you need is a Kubernetes cluster that has Prometheus and Helm/Tiller and you’re ready to go. Follow the detailed instructions in our Getting Started Guide and reach out to us on support@aspenmesh.io if you have any questions.  If you don’t have an account yet, sign up now so you can view our releases and documentation.

What’s next?

We’re heads down working on our next set of features that will help enterprises better take advantage of the rich telemetry available in the mesh and better harness the power of mesh policy at scale within their organizations.  Keep an eye on this space for more.

 

 


Important Security Updates in Aspen Mesh 1.1.13

Aspen Mesh is announcing the release of 1.1.13 which addresses important Istio security updates.  Below are the details of the security fixes taken from Istio 1.1.13 security update.

ISTIO-SECURITY-2019-003: An Envoy user reported publicly an issue (c.f. Envoy Issue 7728) about regular expressions matching that crashes Envoy with very large URIs.

  • CVE-2019-14993: After investigation, the Istio team has found that this issue could be leveraged for a DoS attack in Istio, if users are employing regular expressions in some of the Istio APIs: JWT, VirtualService, HTTPAPISpecBinding, QuotaSpecBinding .

ISTIO-SECURITY-2019-004: Envoy, and subsequently Istio are vulnerable to a series of trivial HTTP/2-based DoS attacks:

  • CVE-2019-9512: HTTP/2 flood using PING frames and queuing of response PING ACK frames that results in unbounded memory growth (which can lead to out of memory conditions).
  • CVE-2019-9513: HTTP/2 flood using PRIORITY frames that results in excessive CPU usage and starvation of other clients.
  • CVE-2019-9514: HTTP/2 flood using HEADERS frames with invalid HTTP headers and queuing of response RST_STREAM frames that results in unbounded memory growth (which can lead to out of memory conditions).
  • CVE-2019-9515: HTTP/2 flood using SETTINGS frames and queuing of SETTINGS ACK frames that results in unbounded memory growth (which can lead to out of memory conditions).
  • CVE-2019-9518: HTTP/2 flood using frames with an empty payload that results in excessive CPU usage and starvation of other clients.
  • See this security bulletin for more information

The 1.1.13 binaries are available for download here 

Upgrading procedures of Aspen Mesh deployments installed via Helm (helm install) please visit our Getting Started page. 

 


Announcing Aspen Mesh 1.1

Aspen Mesh release 1.1.3 is now out to address a critical security update. The Aspen Mesh release is based on security patches released in the Istio 1.1.3 release - you can read more about the update here. We recommend Aspen Mesh users running 1.1 immediately upgrade to 1.1.3.

Close on the heels of the much anticipated Istio 1.1 release, we are excited to announce the release of Aspen Mesh 1.1. Our latest release provides all the features of Istio plus the support of the Aspen Mesh platform and team, and additional features you need to operate in the enterprise.

As with previous Istio releases, the open source community has done a great job of creating a release with exciting new features, improved stability and enhanced performance. The aim of this blog is to distill what has changed in the new release and point out the few gotchas and known issues in an easy to consume format.  We often find that there are so many changes (kudos to the community for all the hard work!) that it is difficult for users to discern what pieces they should care about and what actions they need to take on the pieces they care about. Hopefully, this blog will help address a few of these issues.

Before we delve into the specifics, let’s focus on why release 1.1 was a big milestone for the Istio community and how things were handled differently compared to previous releases:

  • Quality was the major focus of this release. If you look at the history, you will notice that it took six release candidates to get the release out. The maintainers worked diligently to resolve tricky user-identified issues and address them correctly instead of getting the release out on a predefined date. We would see constant updates/PRs (even on weekends) to address these issues which is a testament to the dedication of the open source community.
  • User experience was a key area of focus in the community for this release. There was a new UX Working Group created to address various usability issues and to improve the user’s Istio journey from install to upgrade. We believe that this is a step in the right direction and will lead to easier Istio adoption. Aspen Mesh actively participates in these meetings with an eye on improving the experience of Istio and Aspen Mesh users.
  • Meaningful effort was put into improving the documentation, especially around consistent use of terminology.

It was great to see that the community listened to its users, addressed critical issues and didn’t rush to release 1.1. We look forward to how Project Mauve can further improve the engineering process, thereby improving the quality of Istio releases.

So, let’s move onto the exciting new features and improvements that are part of the Aspen Mesh 1.1 release.

Aspen Mesh 1.1 Features

Reduced sidecar memory usage
This was a long-standing issue that Istio users had faced when dealing with medium to large scale clusters. The Envoy sidecars’ memory consumption grew as new services and pods were deployed in the cluster resulting in a considerable memory footprint for each sidecar proxy. As these sidecars are part of every pod in the mesh this can quickly impact the scheduling and memory requirements for your cluster. In release 1.1, you can expect a significant reduction in the memory consumption by the sidecars. This benefit is primarily driven by reducing the set of statistics exposed by each sidecar. Previously, the sidecars were configured to expose metrics for every Envoy cluster, listener and HTTP connection manager which would increase the number of metrics reported roughly in proportion to the number of services and pods. In release 1.1, the set of metrics is now reduced to the cluster and listener managers (in addition to Istio specific stats) which always expose a fixed set of metrics. We found in our testing that the sidecar memory consumption is significantly lower compared to Aspen Mesh release 1.0.4 and we are looking forward to users being able to inject sidecars in more applications in their clusters.

New multi cluster support

Earlier versions of Istio supported multiple clusters via a single control plane topology. This meant that the Istio control plane would be deployed only on one cluster which would manage services on both local and remote clusters. Additionally, it required a flat network IP space for the pods to communicate across clusters. These restrictions limited real world uses of multi cluster functionality as the control plane could easily become a single point of failure and the flat IP space was not always feasible. In this release support was added for multiple control plane topology which provides the desired control plane availability and no restrictions on the IP layout. Networking across clusters is set up via the ingress gateways which rely on mTLS (common root Certificate Authority across clusters) to verify the peer traffic. We are excited to see new use cases emerge for multi cluster service mesh and how enterprises can leverage Aspen Mesh to truly build resilient and highly available applications deployed across clusters.   

CNI support
Istio by default sets up the pod traffic redirection to/from the sidecar proxy by injecting an init container which uses iptables under the hood. The ability to use iptables requires elevated permissions which is a hindrance to adopting Istio in various organizations due to compliance concerns. Istio and Aspen Mesh now support CNI as a new way to perform traffic redirection, removing the need for elevated permissions. It is great to see this enhancement as we think it is critical to have the principle of least privileges applied to the service mesh. We’re excited to be able to drive advanced compliance use cases with our customers over the next few months.

New sidecar resource
One of the biggest challenges users faced with the old releases was that all the sidecars in the mesh had configuration related to all the services in the cluster even though a particular sidecar proxy only needed to talk to a small subset of services. This resulted in excess churn as massive amounts of configuration were processed and transmitted to the sidecars with every configuration update. This caused intermittent request failures and CPU spikes in all the sidecars on any configuration change in the cluster. The 1.1 release added a new Sidecar resource to enable operators to configure the ingress and egress of each proxy. With this resource, users can control the scope and visibility of configuration distributed to the sidecars and attain better resource utilization and scalability of Istio components.

Apart from the aforementioned major changes, there are quite a few lesser known enhancements in this release which can be helpful in exploring Aspen Mesh capabilities.

Enabling end-user JWT authentication by path
Istio ingressgateway and sidecar proxies support decoding JWT provided by the end user and passing it to the applications as an HTTP request header. This has the operational benefit of isolating authentication from application code and instead using the service mesh infrastructure layer for these critical security operations. In earlier versions of Istio you could only enable/disable this feature on a per service or port basis but not for specific HTTP paths. This was very limiting especially for ingress gateways where you might have some paths requiring authentication and some that didn’t. In release 1.1, an experimental feature was added to enable end user JWT authentication based on request path.

New Helm installation options There are many new Helm installation options added in this release (in addition to the old ones) that are useful in customizing Aspen Mesh based on your needs. We often find that customer use cases are quite different and unique for every environment, so the addition of these options makes it easy to tailor service mesh to your needs. Some of the important new options are:

  • Node selector - Many of our customers want to install the control plane components on their own nodes for better monitoring, isolation and resilience. In this release there is an easy Helm option, global.defaultNodeSelector to achieve this functionality.
  • Tracing backend address - Users often have their tracing set up and want to easily add Istio on top to work with their existing tracing system. In the older version it was quite painful to provide a different tracing backend to Istio (used to be hardcoded to “zipkin.istio-system”). This release added a new “global.tracer.zipkin.address” Helm option to enable this functionality. If you’re an Aspen Mesh customer, we automatically set this up for you so that the traces are sent to the Aspen Mesh platform where you can access them via our hosted Jaeger service.
  • Customizable proxy access log format - The sidecar proxies in the older releases performed access logging in the default Envoy format. Even though the information is great, you might have access logging set up in other systems in your environment and want to have a uniform access logging format throughout your cluster for ease of parsing, searching and tooling. This new release supports a Helm option “global.proxy.accessLogFormat” for users to easily customize the logging format based on their environment.

This release also added many debugging enhancements which make it easy for users to operate and debug when running an Aspen Mesh cluster. Some critical enhancements in this area were:

Istioctl enhancements
Istioctl is a tool similar to kubectl for performing Istio specific operations which can aid in debugging and validating various Istio configuration and runtime issues. There were several enhancements made to this tool which are worth mentioning:

  • Verify install - Istioctl now supports an experimental command to verify the installation in your cluster. This is a great step for first time Istio users before you dive deeper into exploring all the Istio capabilities. If you’re an Aspen Mesh customer, our demo installer automatically does this step for you and lets you know if the installation was successful.
  • Static configuration validation - Istioctl supports a “validate” command for users to verify their configuration before applying it to their cluster. Using this effectively can prevent easy misconfigurations and surprises which can be hard to debug. Note that Galley now also performs validation and rejects configuration if it’s invalid in the new release. If you’re an Aspen Mesh customer, you can use this new functionality in addition to the automated runtime analysis we perform via istio-vet. We find that the static single resource validation is a good first step but an automated tool like istio-vet from Aspen Mesh which can perform runtime analysis across multiple resources is also needed to ensure a properly functioning mesh.
  • Proxy health status - Support was added to quickly inspect and verify the health status of proxy (default port 15020) which can be very useful in debugging request failures. We often found that users struggled in understanding what qualifies as a healthy Istio proxy (sidecar or gateways) and we think this can help to alleviate this issue.

Along with all of these great new improvements, there are a few gotchas or unexpected behaviors you might observe especially if you’re upgrading Istio from an older version. We’ve done a thorough investigation of these potential issues and are making sure our customers have a smooth transition with our releases. However, for the broader community let’s cover a few important gotchas to be aware of:

  • Access allowed to any external services by default - The new Istio release will by default allow access to any external service. In previous releases, all external traffic was blocked and required users to explicitly whitelist external services via ServiceEntry. This decision was reached by the community to make it easier for customers to add Istio on top of their existing deployments and not break working traffic. However, we think this is a major change that can lead to security escapes if you’re upgrading to this version. With that in mind, the Aspen Mesh distribution of the release will continue to block all external traffic by default. If you want to customize this setting, the Helm option “global.outboundTrafficPolicy.mode” can be updated based on your requirement.
  • Proxy access logs disabled by default - In this Istio release, the default behavior for proxy access logging has changed and it is now turned off by default. For first time users it is very helpful to observe access logs as the traffic flows through their services in the mesh. Additionally, if you’re upgrading to a new version and find that your logs are missing, it might break debugging capabilities that you have built around it. Because of this, the Aspen Mesh distribution has the proxy access logs turned on by default. You can customize this setting by updating the Helm option “global.proxy.accessLogFile” to “/dev/stdout”.
  • Every Sidecar resource requires “istio-system” - If you’re configuring the newly available Sidecar resource, be sure to include “istio-system” as one of the allowed egress hosts. During our testing we found that in the absence of “istio-system” namespace, the sidecar proxies will start experiencing failures communicating to the Istio control plane which can lead to cascading failures. We are working with the community to address this issue so that users can configure this resource with minimal surprises.
  • Mixer policy checks disabled by default -  Mixer policy checks were turned on by default in earlier Istio releases which meant that the sidecar proxies and gateways would always consult Mixer in the Istio control plane to check policy and forward the request to the application only if the policy allowed it. This feature was seldom used but added latency due to the out-of-process network call. This new release turned off policy checks by default after much deliberation and debate in the community. What this means is if you had previously configured Policy checks and were relying on Mixer to enforce it, after the upgrade those configurations will no longer have any effect. If you would like to enable them by default, set the Helm option “global.disablePolicyChecks” to false.

We hope this blog has made it easy to understand the scope and impact of the 1.1 release. At Aspen Mesh, we keep a close tab on the community and actively participate to make the adoption and upgrade path easier for our customers. We believe that enterprises should spend less time and effort on configuring the service mesh and focus on adding business value on top.

We'll be covering subsequent topics and deep diving into how you can set up and make the most out of new 1.1 features like multi cluster. Be sure to subscribe to the Aspen Mesh blog so you don't miss out.

If you want to quickly get started with the Aspen Mesh 1.1 release grab it here or if you’re an existing customer please follow our upgrade instructions mentioned in the documentation.