How to Capture Packets that Don't Exist

One of my favorite networking tools is Wireshark; it shows you a packet-by-packet view of what’s going on in your network. Wireshark’s packet capture view is the lowest level and most extensive you can get before you have to bust out the oscilloscope. This practice is well-established in the pre-Kubernetes world, but it has some challenges if you’re moving to a Cloud Native environment. If you are using or moving to Cloud Native, you’re going to want to use packet-level tools and techniques in any environment, so that’s why we built Aspen Mesh Packet Inspector. It’s designed to address these challenges across various environments, so you can more easily see what’s going on in your network without the complexity.  

Let me explain the challenges facing our users moving into a Kubernetes world. It’s important to note that there are two parts to a troubleshooting session based on packet capture: actually capturing the packets, and then loading them into your favorite tool. Aspen Mesh Packet Inspector enables users to capture packets even in Kubernetes. That’s the part you need to address to power all the existing tools you probably already have. Leveraging the existing tools is important as our customers have invested heavily in them. And not just monetarily – their reputation for reliable apps, services and networks depends on the reliability and usefulness of the tools, procedures and experience powered by a packet view.

What’s so hard about capturing these packets in modern app architectures on Kubernetes? The two biggest challenges are that these packets may never be actual packets that go through a switch, and even if they were, they’d be encrypted and useless. 

Outside of the Kubernetes world, there are many different approaches to capture packets. You can capture packets right on your PC to debug a local issue. For serious network debugging, you’re usually capturing packets directly on networking hardware, like a monitor port on a switch, or dedicated packet taps or brokers. But in Kubernetes, some traffic will never hit a dedicated switch or tap. Kubernetes is used to schedule multiple containers onto the same physical or virtual machine. If one container wants to talk to another container that happens to be on the same machine, then the packets exchanged between them are virtual – they're just bytes in RAM that the operating system shuffles between containers. 

There’s no guarantee that the two containers that you care about will be scheduled onto the same machine, and there’s no guarantee that they won’t be. In fact, if you know two containers are going to want to talk to each other a lot, it’s a good idea to encourage scheduling on the same node for performance: these virtual packets don’t consume any capacity on your switch and advanced techniques can accelerate container-to-container traffic inside a machine.

Customers that stake their reputation on reliability don’t like mixing “critical tool” and “no guarantee”.  They need to capture traffic right at the edge of the container. That’s what Aspen Mesh Packet Inspector does. It’s built into Carrier-Grade Aspen Mesh, a service mesh purpose built for these critical applications. 

There’s still a problem though – if you are building apps on Kubernetes, you should be encrypting traffic between pods. It’s a best practice that is also required by various standards including those behind 5G. In the past, capture tools have relied on access to the server’s encryption key to show the decrypted info. Newer encryption like TLS 1.3 has a feature called “forward secrecy” that impedes this. Forward secrecy means every connection is protected with its own temporary key that was securely created by the client and the server – if your tool wasn’t in the middle when this key was generated, it’s too late. Access to the server’s encryption key later won't work.
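To make the forward secrecy idea concrete, here is a toy sketch of an ephemeral key exchange. It is not real TLS: the prime `P`, the generator `G` and the function names are assumptions chosen for illustration, and the server's long-term key (which in TLS 1.3 only authenticates the handshake) is deliberately left out. The point is that each connection derives its own key from throwaway values, so nothing stored on the server afterwards can recreate it.

```python
import hashlib
import secrets

# Toy parameters for illustration only. Real TLS 1.3 uses standardized
# groups such as X25519, not this small prime.
P = 2**127 - 1  # a Mersenne prime
G = 3


def ephemeral_keypair():
    """Generate a fresh private/public pair that lives for one connection."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(own_private, peer_public):
    """Derive the per-connection key from the ephemeral exchange."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).hexdigest()


def handshake():
    """Run one connection setup and return (client_key, server_key)."""
    c_priv, c_pub = ephemeral_keypair()
    s_priv, s_pub = ephemeral_keypair()
    # Only c_pub and s_pub cross the wire. The private halves are thrown
    # away when the function returns, so the key cannot be rebuilt later.
    return session_key(c_priv, s_pub), session_key(s_priv, c_pub)
```

Calling `handshake()` twice gives two unrelated keys, which is why a capture tool that only obtains a long-term server key after the fact has nothing to decrypt with.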

One approach is to force a broker or tap into the middle for all connections. But that means you need a powerful (i.e. expensive) broker, and it’s a single-point-of-failure. Worse, it’s a security single-point-of-failure: everything in the network has to trust it to get in the middle of all conversations. 

Our users already have something better suited – an Aspen Mesh sidecar (built on Envoy). They’re already using a sidecar next to each container to offload encryption using strong techniques like mutual TLS with forward secrecy. Each sidecar has only one security identity for the particular app container it is protecting, so sidecars can safely authenticate each other without any trusted-box-in-the-middle games. 

That’s the second key part of Aspen Mesh Packet Inspector – because Aspen Mesh is where the plaintext-to-encrypted operation happens (right before leaving the Kubernetes pod), we can record the plaintext. We capture the plaintext and slice it into virtual packets (in a standard “pcap” format). When we feed it to a capture system like a packet broker, we use mutual TLS to protect the captured data.  Our users combine this with a secure packet broker, and get to see the plaintext that was safely and securely transported all the way from the container edge to their screen. 
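As a rough illustration of what "slicing plaintext into virtual packets" can look like, the sketch below writes a byte stream into the classic pcap file layout. This is not the Aspen Mesh implementation: the chunk size, the function name and the choice of the private-use link type 147 (LINKTYPE_USER0) are assumptions made to keep the example short. A real capture path would more likely wrap each chunk in synthesized IP and TCP headers so that existing tools can dissect it.

```python
import struct
import time

LINKTYPE_USER0 = 147  # private-use link type; an assumption for this sketch
CHUNK_SIZE = 1400     # hypothetical virtual-packet payload size


def plaintext_to_pcap(plaintext: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Slice a plaintext byte stream into records of a classic pcap file."""
    # Global header (24 bytes): magic, version 2.4, timezone offset,
    # timestamp accuracy, snapshot length, link type.
    out = bytearray(
        struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, LINKTYPE_USER0)
    )
    now = time.time()
    ts_sec, ts_usec = int(now), int((now % 1) * 1_000_000)
    for offset in range(0, len(plaintext), chunk_size):
        chunk = plaintext[offset:offset + chunk_size]
        # Record header (16 bytes): timestamp seconds, timestamp
        # microseconds, captured length, original length.
        out += struct.pack("<IIII", ts_sec, ts_usec, len(chunk), len(chunk))
        out += chunk
    return bytes(out)
```

Writing the returned bytes to a file produces something a pcap reader can open, although with this link type the payload shows up as undissected data.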

If you’re a service provider operating Kubernetes at scale, packet tapping capabilities are critical for you to be able to operate the networks effectively, securely and within regulatory and compliance standards. Aspen Mesh Packet Inspector provides the missing link in Kubernetes, providing full packet visibility for troubleshooting and meeting lawful intercept requirements.  


Aspen Mesh 1.5.10-am1 & 1.6.12-am2 Available for Envoy Security Updates

We recently released Aspen Mesh 1.5.10-am1 as well as 1.6.12-am2, which both address important security updates related to an HTTP header security vulnerability that was recently reported in Envoy. We strongly recommend that our customers update to these patched versions by following the instructions here.

At Aspen Mesh, we’re dedicated to helping keep upstream open source and cloud-native applications (including Istio and Envoy) as secure and healthy as possible. When this vulnerability was discovered, we were eager to quickly contribute to a fix in order to keep Istio’s (and Aspen Mesh’s) end-users secure.

Envoy HTTP Header Security Vulnerability

Envoy recently reported incorrect handling of duplicate HTTP headers (CVE-2020-25017). In this CVE, it was reported that Envoy was only considering the first value when multiple values were presented for a non-inline header. The logic would then ignore all following values for a non-inline header when executing matching logic. An attacker could use this exploit as a request policy bypass by including additional non-inline headers with the same key, but with different values, provided that only the first header value matched the routing rule. As a result, client requests could be routed incorrectly and Envoy would pass different headers to the application workloads than what it has in its own internal representation, leading to inconsistency.

In this blog, we will cover test cases and policy examples which our users can deploy to validate if their current version is affected and how this CVE can be exploited by attackers in their environment. 

Test Cases Based on Istio Policies

In this Envoy CVE, there was a vulnerability allowing attackers to set multiple values of a non-inline HTTP header, such as x-foo:bar and x-foo:baz, whereby the affected Envoy components would only observe the first value, x-foo:bar, in matchers, but both x-foo:bar and x-foo:baz would be forwarded to the application workload. Upstreams may take both values into consideration, resulting in an inconsistency between Envoy’s request matching and the upstream view of the request. You can find the complete list of inline HTTP headers in Envoy here for reference.

The Aspen Mesh team worked to develop a solution to trigger and verify the fix for the security vulnerability by working through several test cases. For these test cases, we developed a VirtualService policy resource that matches incoming requests carrying an x-foo header whose value is exactly baz and routes them to the httpbin service in the baz namespace. Any other request that does not match that rule is routed to the httpbin service in the default namespace.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-headers
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - match:
    - headers:
        x-foo:
          exact: baz
    route:
    - destination:
        host: httpbin.baz.svc.cluster.local
  - route:
    - destination:
        host: httpbin.default.svc.cluster.local

 

With this policy deployed, we generated client traffic with different combinations of non-inline headers to trigger this CVE, and the results are summarized in this table below:

| Test case | 1.5.7-am5 destination service | 1.5.10-am1 destination service |
| --- | --- | --- |
| curl httpbin.default:8000/get -H "x-foo: bar" -H "x-foo: baz" | httpbin.default.svc.cluster.local | httpbin.default.svc.cluster.local |
| curl httpbin.default:8000/get -H "x-foo: bar" | httpbin.default.svc.cluster.local | httpbin.default.svc.cluster.local |
| curl httpbin.default:8000/get -H "x-foo: baz" | httpbin.baz.svc.cluster.local | httpbin.baz.svc.cluster.local |
| curl httpbin.default:8000/get | httpbin.default.svc.cluster.local | httpbin.default.svc.cluster.local |
| curl httpbin.default:8000/get -H "x-foo: baz" -H "x-foo: bar" | httpbin.baz.svc.cluster.local | httpbin.default.svc.cluster.local |

 

We can see that in Aspen Mesh version 1.5.7-am5 the last request in the table (x-foo: baz followed by x-foo: bar) is routed incorrectly to the httpbin service in the baz namespace, thereby bypassing the configured policy, but in 1.5.10-am1 the same request is routed to the httpbin service in the default namespace, as we expected. This behavior change, along with the verification of the upstream commits that fixed this issue from upstream Envoy, leads us to believe that our 1.5.10-am1 release contains the required CVE fix. We ran a similar experiment on the affected Aspen Mesh version 1.6.5-am1 and the patched version 1.6.12-am2 to validate that the CVE is addressed correctly.
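The routing difference can be modeled in a few lines. The sketch below is a simplified stand-in for the matcher, not Envoy code. The vulnerable branch looks only at the first x-foo value, as the CVE describes. The patched branch assumes the fix compares against all values joined with commas. That is an assumption about the fix, but it reproduces the results in the table. The `route` function and `CASES` list are hypothetical names.

```python
BAZ = "httpbin.baz.svc.cluster.local"
DEFAULT = "httpbin.default.svc.cluster.local"


def route(headers, patched):
    """Pick a destination for a request given (name, value) header pairs.

    Mirrors the VirtualService above: x-foo exactly equal to "baz" goes
    to BAZ, everything else goes to DEFAULT.
    """
    values = [value for name, value in headers if name.lower() == "x-foo"]
    if not values:
        return DEFAULT
    # Vulnerable matcher: only the first value is considered.
    # Patched matcher (assumed): all values joined with commas are compared.
    seen = ",".join(values) if patched else values[0]
    return BAZ if seen == "baz" else DEFAULT


# The five test cases from the table, as lists of (name, value) pairs.
CASES = [
    [("x-foo", "bar"), ("x-foo", "baz")],
    [("x-foo", "bar")],
    [("x-foo", "baz")],
    [],
    [("x-foo", "baz"), ("x-foo", "bar")],
]

if __name__ == "__main__":
    for headers in CASES:
        print(headers, route(headers, patched=False), route(headers, patched=True))
```

Only the last case differs between the two modes: the first-value matcher sees baz and routes to the baz namespace, while the joined value baz,bar no longer matches exactly.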

The official remediation fixes these inconsistencies, providing a more secure Aspen Mesh installation with uniform policy enforcement. The newest Aspen Mesh binaries are available for download here.  


Doubling Down On Istio

Good startups believe deeply that something is true about the future, and organize around it.

When we founded Aspen Mesh as a startup inside of F5, my co-founders and I believed these things about the future:

  1. App developers would accelerate their pace of innovation by modularizing and building APIs between modules packaged in containers.
  2. Kubernetes APIs would become the lingua franca for describing app and infrastructure deployments and Kubernetes would be the best platform for those APIs.
  3. The most important requirement for accelerating is to preserve control without hindering modularity, and that’s best accomplished as close to the app as possible.

We built Aspen Mesh to address item 3. If you boil down reams of pitch decks, board-of-directors updates, marketing and design docs dating back to summer of 2017, that's it. That's what we believe, and I still think we're right.

Aspen Mesh is a service mesh company, and the lowest levels of our product are the open-source service mesh Istio. Istio has plenty of fans and detractors; there are plenty of legitimate gripes and more than a fair share of uncertainty and doubt (as is the case with most emerging technologies). With that in mind, I want to share why we selected Istio and Envoy for Aspen Mesh, and why we believe more strongly than ever that they're the best foundation to build on.

 

Why a service mesh at all?

A service mesh is about connecting microservices. The acceleration we're talking about relies on applications that are built out of small units (predominantly containers) that can be developed and owned by a single team. Stitching these units into an overall application requires APIs between them. APIs are the contract. Service Mesh measures and assists contract compliance. 

There's more to it than reading the 12-factor app. All these microservices have to effectively communicate to actually solve a user's problem. Communication over HTTP APIs is well supported in every language and environment so it has never been easier to get started.  However, don't let the simplicity delude: you are now building a distributed system. 

We don't believe the right approach is to demand deep networking and infrastructure expertise from everyone who wants to write a line of code.  You trade away the acceleration enabled by containers for an endless stream of low-level networking challenges (as much as we love that stuff, our users do not). Instead, you should preserve control by packaging all that expertise into a technology that lives as close to the application as possible. For Kubernetes-based applications, this is a common communication enhancement layer called a service mesh.

How close can you get? Today, we see users having the most success with Istio's sidecar container model. We forecasted that in 2017, but we believe the concept ("common enhancement near the app") will outlive the technical details.

This common layer should observe all the communication the app is making; it should secure that communication and it should handle the burdens of discovery, routing, version translation and general interoperability. The service mesh simplifies and creates uniformity: there's one metric for "HTTP 200 OK rate", and it's measured, normalized and stored the same way for every app. Your app teams don't have to write that code over and over again, and they don't have to become experts in retry storms or circuit breakers. Your app teams are unburdened of infrastructure concerns so they can focus on the business problem that needs solving.  This is true whether they write their apps in Ruby, Python, node.js, Go, Java or anything else.

That's what a service mesh is: a communication enhancement layer that lives as close to your microservice as possible, providing a common approach to controlling communication over APIs.

 

Why Istio?

Just because you need a service mesh to secure and connect your microservices doesn't mean Envoy and Istio are the only choice.  There are many options in the market when it comes to service mesh, and the market still seems to be expanding rather than contracting. Even with all the choices out there, we still think Istio and Envoy are the best choice.  Here's why.

We launched Aspen Mesh after learning some lessons with a precursor product. We took what we learned, re-evaluated some of our assumptions and reconsidered the biggest problems development teams using containers were facing. It was clear that users didn't have a handle on managing the traffic between microservices. We also saw that not many teams were using microservices in earnest yet, so we realized this problem would get more urgent as microservices adoption increased.

So, in 2017 we asked: what would characterize the technology that solves that problem?

We compared our own nascent work with other purpose-built meshes like Linkerd (in the 1.0 Scala-based implementation days) and Istio, and non-mesh proxies like NGINX and HAProxy. This was long before service mesh options like Consul, Maesh, Kuma and OSM existed. Here's what we thought was important:

  • Kubernetes First: Kubernetes is the best place to position a service mesh close to your microservice. The architecture should support VMs, but it should serve Kubernetes first.
  • Sidecar "bookend" Proxy First: To truly offload responsibility to the mesh, you need a datapath element as close as possible to the client and server.
  • Kubernetes-style APIs are Key: Configuration APIs are a key cost for users.  Human engineering time is expensive. Organizations are judicious about what APIs they ask their teams to learn. We believe Kubernetes API design and mechanics got it right. If your mesh is deployed in Kubernetes, your API needs to look and feel like Kubernetes.
  • Open Source Fundamentals: Customers will want to know that they are putting sustainable and durable technology at the core of their architecture. They don't want a technical dead-end. A vibrant open source community ensures this via public roadmaps, collaboration, public security audits and source code transparency.
  • Latency and Efficiency: These are performance keys that are more important than total throughput for modern applications.

As I look back at our documented thoughts, I see other concerns, too (p99 latency in languages with dynamic memory management, layer 7 programmability). But the above were the key items that we were willing to bet on. So it became clear that we had to place our bet on Istio and Envoy.

Today, most of that list seems obvious. But in 2017, Kubernetes hadn’t quite won. We were still supporting customers on Mesos and Docker Datacenter. The need for service mesh as a technology pattern was becoming more obvious, but back then Istio was novel - not mainstream. 

I'm feeling very good about our bets on Istio and Envoy. There have been growing pains to be sure. When I survey the state of these projects now, I see mature, but not stagnant, open source communities.  There's a plethora of service mesh choices, so the pattern is established.  Moreover the continued prevalence of Istio, even with so many other choices, convinces me that we got that part right.

 

But what about...?

While Istio and Envoy are a great fit for all those bullets, there are certainly additional considerations. As with most concerns in a nascent market, some are legitimate and some are merely noise. I'd like to address some of the most common that I hear from conversations with users.

"I hear the control plane is too complex" - We hear this one often. It’s largely a remnant of past versions of Istio that have been re-architected to provide something much simpler, but there's always more to do. We're always trying to simplify. The two major public steps that Istio has taken to remedy this include removing standalone Mixer, and co-locating several control plane functions into a single container named istiod.

However, there's some stuff going on behind the curtains that doesn't get enough attention. Kubernetes makes it easy to deploy multiple containers. Personally, I suspect the root of this complaint wasn't so much "there are four running containers when I install" but "Every time I upgrade or configure this thing, I have to know way too many details." And that is fixed by attention to quality and user-focus. Istio has made enormous strides in this area.

"Too many CRDs" - We've never had an actual user of ours take issue with the CRD count (the number of API object types it's possible to define). However, it's great to minimize the number of API objects you may have to touch to get your application running. To borrow a paraphrase of Einstein, we want to make it as simple as possible, but no simpler. The reality: Istio drastically reduced the CRD count with new telemetry integration models (from "dozens" down to 23, with only a handful involved in routine app policies). And Aspen Mesh offers a take on making it even simpler with features like SecureIngress that map CRDs to personas - each persona only needs to touch 1 custom resource to expose an app via the service mesh.

"Envoy is a resource hog" - Performance measurement is a delicate art. The first thing to check is that wherever you're getting your info from has properly configured the system-under-measurement. Istio provides careful advice and their own measurements here. Expect latency additions in the single-digit-millisecond range, knowing that you can opt parts of your application out that can't tolerate even that. Also remember that Envoy is doing work, so some CPU and memory consumption should be considered a shift or offload rather than an addition. The most recent versions of Istio do not have significantly more overhead than other service meshes, but Istio does provide twice as many features, while also being available in or integrating with many more tools and products in the market.

"Istio is only for really complicated apps" - Sure. Don’t use Istio if you are only concerned with a single cluster and want to offload one thing to the service mesh. People move to Kubernetes specifically because they want to run several different things. If you've got a Money-Making-Monolith, it makes sense to leave it right where it is in a lot of cases. There are also situations where ingress or an API gateway is all you need. But if you've got multiple apps, multiple clusters or multiple app teams then Kubernetes is a great fit, and so is a service mesh, especially as you start to run things at greater scale.

In scenarios where you need a service mesh, it makes sense to use the service mesh that gives you a full suite of features. A nice thing about Istio is you can consume it piecemeal - it does not have to be implemented all at once. So you only need mTLS and tracing now? Perfect. You can add mTLS and tracing now and have the option to add metrics, canary, traffic shifting, ingress, RBAC, etc. when you need it.

We’re excited to be on the Istio journey and look forward to continuing to work with the open source community and project to continue advancing service mesh adoption and use cases. If you have any particular question I didn’t cover, feel free to reach out to me at @notthatjenkins. And I'm always happy to chat about the best way to get started on or continue with service mesh implementation. 


Steering The Future Of Istio

I’m honored to have been chosen by the Istio community to serve on the Istio Steering Committee along with Christian Posta, Zack Butcher and Zhonghu Xu. I have been fortunate to contribute to the Istio project for nearly three years and am excited by the huge strides the project has made in solving key challenges that organizations face as they shift to cloud-native architecture. 

Maybe what’s most exciting is the future direction of the project. The core Istio community realizes and advocates that innovation in Open Source doesn't stop with technology - it’s just the starting point. New and innovative ways of growing the community include making contributions easier, Working Group meetings more accessible and community meetings an open platform for end users to give their feedback. As a member of the steering committee, one of my main goals will be to make it easier for a diverse group of people to contribute to the project.

When I started contributing to Istio, I found it intimidating to present rough ideas or proposals in an open Networking WG meeting filled with experts and leaders from Google & IBM (even though they were very welcoming). I understand how difficult it can be to get started on contributing to a new community, so I want to ensure the Working Group and community meetings are a place for end users and new contributors to share ideas openly, and also to learn from industry experts. I will focus on increasing participation from diverse groups, through working to make Istio the most welcoming community possible. In this vein, it will be important for the Steering Committee to further define and enforce a code of conduct creating a safe place for all contributors.

The Istio community’s effort towards increasing open governance by ensuring no single organization has control over the future of the project has certainly been a step in the right direction with the new makeup of the steering committee. I look forward to continuing work in this area to make Istio the most open project it can be. 

Outside of code contributions, marketing and brand identity are critically important aspects of any open source project. It will be important to encourage contributions from marketing and business leaders to ensure we recognize non-technical contributions. Addressing this is less straightforward than encouraging and crediting code commits, but a diverse, vendor-neutral marketing team in Open Source can create powerful ways to reach users and drive adoption, which is critical to the success of any open source project. Recent user empathy sessions and user survey forms are a great starting point, but our ability to put these learnings into action and adapt as a community will be a key driver in growing project participation.

Last, but definitely not least, I’m keen to leverage my experience and feedback from years of work with Aspen Mesh customers and broad enterprise experience to make Istio a more robust and production-ready project. 

In this vein, my fellow Aspen Mesher Jacob Delgado has worked tirelessly for many months contributing to Istio. As a result of his contributions, he has been named a co-lead for the Istio Product Security Working Group. Jacob has been instrumental in championing security best practices for the project and has also helped responsibly remediate several CVEs this year. I’m excited to see more contributors like Jacob make significant improvements to the project.

I'm humbled by the support of the community members who voted in the steering elections and chose such a talented team to shepherd Istio forward. I look forward to working with all the existing, and hopefully many new, members of the Istio community! You can always reach out to me through email, Twitter or Istio Slack for any community, technical or governance matter, or if you just want to chat about a great idea you have.


What Are Companies Using Service Mesh For?

We recently worked with 451 Research to identify current trends in the service mesh space. Together, we identified some key service mesh trends and patterns around how companies are adopting service mesh, and emerging use cases that are driving that adoption. Factors driving adoption include how service mesh automates and bolsters security, and a recognition of service mesh observability capabilities to ease debugging and decrease Mean Time To Resolution (MTTR). Check out this video for more from 451 Research's Senior Analyst in Application and Infrastructure Performance, Nancy Gohring, on this topic:

Who’s Using Service Mesh 

According to data and insights gathered by 451 Research, service mesh already has significant momentum, even though it is a young technology. Results from the Voice of the Enterprise: DevOps, Workloads & Key Projects 2020 survey tell us that 16% of respondents had adopted service mesh across their entire IT organizations, and 20% had adopted service mesh at the team level. Outside of those numbers, 38% of respondents also reported that they are in trials or planning to use service mesh in the future. As Kubernetes dominates the microservices landscape, the need for a service mesh to manage layer 7 communication is becoming increasingly clear. 

451 Research Service Mesh Adoption

In tandem with this growing adoption trend, the technology itself is expanding quickly. While the top driver of service mesh adoption continues to be supporting traffic management, service mesh provides many additional capabilities beyond controlling traffic. 451 found that key new capabilities the technology provides include greatly enhanced security as well as increased observability into microservices.

Service Mesh and Security

Many organizations—particularly those in highly regulated industries such as healthcare and financial services—need to comply with very demanding security and regulatory requirements. A service mesh can be used to enforce or enhance important security and compliance policies more consistently, and across teams, at an organization-wide level. A service mesh can be used to:

  • Apply security policies to all traffic at ingress, and encrypt traffic using mTLS traveling between services
  • Add Zero-Trust networking
  • Govern certificate management for authenticating identity
  • Enforce least privilege with role-based access control (RBAC)
  • Manage policies consistently, regardless of protocols and runtimes 

These capabilities are particularly important for complex microservices deployments, and allow DevOps teams to ensure a strong security posture while running in production at global scale. 

Observability and Turning Your Data into Intelligence

In addition to helping enterprises improve their security posture, a service mesh also greatly improves observability through traces and metrics that allow operators to quickly root cause any failures and ensure resilient applications. Enabling the rapid resolution of performance problems allows DevOps teams to reduce mean time to resolution (MTTR) and optimize engineering efficiency.

The broader market trends around observability and advanced analytics with open source technologies are also key to the success of companies adopting service mesh. There are challenges around managing microservices environments, and teams need better ways of identifying the sources of performance issues in order to resolve problems faster and more efficiently. Complex microservices-based applications generate very large amounts of data. Many open source projects are addressing this by making it easier for users to collect data from these environments, and advancements in analytics tools are enabling users to extract the signal from the noise, quickly directing users to the source of performance problems. 

Overcoming this challenge is why we created Aspen Mesh Rapid Resolve. It allows users to see any configuration or policy changes made within Kubernetes clusters, which are almost always the cause of failures. The Rapid Resolve timeline view makes it simple for operators to look back in time to pinpoint any changes that resulted in performance degradation.

Aspen Mesh Rapid Resolve

This enables Aspen Mesh users to identify root causes, report actions and apply fixing configurations all in one place. For example, the Rapid Resolve suite offers many new features including:

  • Restore: a smarter, machine-assisted way to effectively reduce the set of things an operator or developer has to look through to find the root cause of failure in their environment. Root causing in distributed architectures is hard. Aspen Mesh Restore immediately alerts engineers to any performance outside acceptable thresholds and makes it obvious where any configuration, application or infrastructure changes occurred that are likely to be breaking changes.
  • Replay: a one-stop shop for application troubleshooting and reducing time to recovery. Aspen Mesh Replay gives you the current and the past view of your cluster state, including microservices connectivity, traffic and service health, and relevant events like configuration changes and alerts along the way. This view is great for understanding and diagnosing cascading failures. You can easily roll back in time and detect where a failure started. It's also a good tool for sharing information in larger groups where you can track the health of your cluster visually over time.

The Future of Service Mesh

Companies strive for stability with agility, which allows them to meet the market and users where they are, and thrive even in an uncertain marketplace. According to 451 Research,

“Businesses are employing containers, Kubernetes and microservices as tools that allow them to more quickly respond to customer demands and competitive threats. However, these technologies introduce new and potentially significant management challenges. Advanced organizations have turned to service mesh to help solve some of these problems. Service mesh technology can remove infrastructure burdens from developers, enabling them to focus on creating valuable application features rather than managing the mechanics of microservices communications. But managing the communications layer isn’t the only benefit a service mesh brings to the table. Increasingly, users are recognizing the role service meshes can play in collecting and analyzing important observability data, as well as their ability to support security requirements.”

The adoption of containers, Kubernetes and service mesh is continuing to grow, and both security and observability will be key drivers that increase service mesh adoption in the coming years.

 


Aspen Mesh 1.6 Service Mesh

Announcing Aspen Mesh 1.6

We’re excited to announce the release of Aspen Mesh 1.6 which is based on Istio’s release 1.6 (specific tag 1.6.5). As a release manager for Istio 1.6, I’ve been eager for Aspen Mesh’s adoption of 1.6 as Istio continues its trend of adding many enhancements and improvements. Our commitment and relationship with the Istio community continues to flourish as our co-founder and Chief Architect Neeraj Poddar was recently appointed to the Technical Oversight Committee and I joined the Product Security Working Group, a group tasked with handling sensitive security issues within Istio. With our team members joining these groups, you can be assured that your best interests are represented as we continue to develop Aspen Mesh.

As with every new major release, we’re excited to detail the new features and capabilities offered by Istio, and also new features available within Aspen Mesh. 

Here are some key items to note for this new release:

Helm Based Installs

At Aspen Mesh, we encourage users to adopt the GitOps workflow using Helm. Istio is moving towards CLI tool based install using istioctl and the Istio Operator for installations and upgrades. While the chart structure located in the manifests/ directory shipped in Istio works with Helm, we’ve spent considerable effort in streamlining the charts to make them ready for the enterprise to ensure continuity.

However, upgrading to 1.6 will be drastic due to the structural changes made to the charts in Istio. Given the efforts we are putting into Istio, as well as the burden this places on users, our intent is to streamline this before the release of Aspen Mesh 1.7 to ease the upgrade process moving forward.

Istiod Monolith

With the change to Helm charts, users will be able to leverage Istiod and Telemetry v2. Istiod is the consolidated Istio control plane, delivered as a monolithic deployment (with the exception of Mixer). There are two key reasons we are eager to support this consolidated deployment:

  1. It significantly reduces the memory and CPU footprint of the service mesh, resulting in lower operating costs.
  2. The simplified deployment model makes it easier for operators to debug production issues when the need arises. It’s no longer necessary to look at the logs of different services to determine the root cause of a problem, but rather just your Istiod pods.

Telemetry v2

As of Aspen Mesh 1.6, we only support Telemetry v2, also known as Mixerless telemetry. While Telemetry v2 does not yet have parity with Mixer, its benefits now far outweigh the Mixer features it lacks. Don’t be alarmed: the Istio community is diligently working on having Telemetry v2 reach parity with Mixer.

Many of our users have reported Mixer related performance issues, such as high CPU load, high memory usage and even latency issues. These issues should be solved with the move towards Envoy-based filters, such as the WASM filter used by Telemetry v2. In-band and in-application, high performance C++ code should better meet the needs of large enterprises with hundreds of nodes and thousands of pods.

SDS: The Default Behavior Across Your Service Mesh

In Aspen Mesh 1.5, Secret Discovery Service (SDS) was not enabled by default for sidecar proxies across your cluster. With the Aspen Mesh 1.6 release, both gateways and workloads support SDS, allowing for better service mesh security as well as performance improvements.

For reference, beginning with Aspen Mesh 1.6, an executable istio-agent lives alongside the Envoy sidecar proxy and safely shares certificates issued to it by Istiod with Envoy. This is a change from Aspen Mesh 1.5, where Kubernetes Secrets were created by Citadel, presenting risks if the Kubernetes cluster wasn’t properly secured. One of the top benefits of SDS is that Envoy does not need to be hot-restarted when certificates are set to expire and need to be rotated.

Next Steps for You

Aspen Mesh 1.6 is available for download here. You can look at the complete list of changes in this release by visiting our release notes. If you have any questions about your upgrade path, feel free to reach out to us at support@aspenmesh.io.

If you’re new to Aspen Mesh, you can download a free 30-day trial


What is a service mesh Aspen Mesh

What’s a Service Mesh?

What is a service mesh? It’s an infrastructure layer that helps you manage the communication between your microservices.


Designed to handle a high volume of service-to-service communications using APIs, a service mesh ensures that communication among your containerized application services is fast, reliable and secure. 

A service mesh helps address many of the challenges that arise when your application is being consumed by your end users. The ability to monitor what services are communicating with each other, knowing if those communications are secure, and being able to control the service-to-service communication in your clusters is key to ensuring your applications are running securely and resiliently. You can think about service mesh as being the lexicon, API and implementation around the next tier of communication patterns for microservices.

Service Mesh Capabilities and Patterns

Some of the capabilities that a service mesh provides include service discovery, load balancing, encryption, observability, traceability, authentication and authorization, and the ability to control policy and configuration in your Kubernetes clusters. 

A service mesh sits at Layer 7, managing and securing traffic between your network and application, unlocking some patterns essential for healthy microservices. Some of these patterns include:

  • Zero-trust security that doesn’t assume a trusted perimeter
  • Tracing that shows you how and why every microservice communicated with another microservice
  • Fault injection and tolerance that lets you experimentally verify the resilience of your application
  • Advanced routing that lets you do things like A/B testing, rapid versioning and deployment and request shadowing

Check out these FAQs for answers to more general questions.

 

What Does a Service Mesh Provide?

A service mesh keeps your company’s services running the way they should. Service meshes designed for the enterprise, like Aspen Mesh, give you all the observability, security and traffic management you need, plus access to configuration and policy patterns and expert support, so you can focus on adding the most value to your business.

A service mesh can provide many benefits: Security, reliability, observability, engineering efficiency/reduced burden, more holistic insights, operational control, and better tools for your DevOps team. The four main benefits that a service mesh provides include:

  1. Observability: A service mesh takes system monitoring a step further by providing observability. Monitoring reports overall system health, while observability focuses on highly granular insights into the behavior of systems along with rich context. 
  2. Security: A service mesh provides security features aimed at securing the services inside your network and quickly identifying any compromising traffic entering your cluster. 
  3. Operational control: A service mesh allows security and platform teams to set the right macro controls to enforce access controls, while allowing developers to make customizations they need to move quickly within these guardrails.
  4. A better user experience: A service mesh removes the burden of managing infrastructure from the developer, and provides developer-friendly features. But on top of that, the security and reliability that you get from a service mesh creates a smoother, better experience for your end users while they're using your systems or application. Building trust with your customers is invaluable.

Service mesh is new enough that codified standards have yet to emerge, but there is enough experience that some best practices are becoming clear. As early adopters develop their own approaches, it is often useful to compare notes and distill best practices. We’ve seen Kubernetes emerge as the standard way to run containers for production web applications. Standards are emergent rather than forced: It’s definitely a fine art to be neither too early nor too late to agree on common APIs, protocols and concepts.

 

When Do You Need a Service Mesh?

A service mesh provides a great way to help you manage microservices. But how do you know when it's the right time to adopt one? The answer is that it depends on your needs, but many companies we've worked with start needing a service mesh when they run into one or a combination of three things:

  1. You’re starting to run too many microservices for you to effectively manage based on team size or skills
  2. You want to free up application developers from managing infrastructure so they can spend more time adding business value to applications
  3. You’re scaling or committed to scaling applications on Kubernetes

So how do you make sure that you and your end users get the most out of your applications and services? You need to have the right kind of access, security and support. If that’s true, then you’ve probably realized that microservices come with their own unique challenges, such as: 

  • Increased surface area that can be attacked 
  • Polyglot challenges 
  • Controlling access for distributed teams developing on a single application 

These are all scenarios where a service mesh shines. Service meshes are great at solving operational challenges and issues when running containers and microservices because they provide a uniform and highly observable way to secure, connect and monitor microservices. 

On a broader tech landscape level, we’ve been thinking about how microservices change the requirements from network infrastructure for a few years now. The swell of support and uptake for Istio demonstrated to us that there’s a community ready to develop and coalesce on policy specs, with a well-architected implementation to go along with it.

Thanks for reading! Check out Service Mesh University to learn more about service mesh at your own pace through an on-demand video series.


Helping Istio Sail

Around three years ago, we recognized the power of service mesh as a technology pattern to help organizations manage the next generation of software delivery challenges, and that led to us founding Aspen Mesh. And we know that efficiently managing high-scale Kubernetes applications requires the power of Istio. Having been a part of the Istio project since 0.2, we have seen countless users and customers benefit from the observability, security and traffic management capabilities that a service mesh like Istio provides. 

Our engineering team has worked closely with the Istio community over the past three years to help drive stability and add new features that make Istio increasingly easy for users to adopt. We believe in the value that open source software, and an open community, brings to the enterprise service mesh space, and we are driven to help lead that effort through setting sound technical direction. By contributing expertise, code and design ideas like Virtual Service delegation and enforcing RBAC policies for Istio resources, we have focused much of our work on making Istio more enterprise-ready. One of these contributions is Istio Vet, which was created and open sourced in the early days of Aspen Mesh as a way to enhance Istio's user experience and multi-resource configuration validation. Istio Vet proved to be very valuable to users, so we decided to work closely with the Istio community to create istioctl analyze in order to add key configuration analysis capabilities to Istio. It’s very exciting to see that many of these ideas have now been implemented in the community and are part of the available feature set for broader consumption. 

As the Istio project and its users mature, there is a greater need for open communication around product security and CVEs. Recognizing this, we were fortunate to be able to help create the early disclosure process and lead the product security working group which ensures the overall project is secure and not vulnerable to known exploits. 

We believe that an open source community thrives when users and vendors are considered as equal partners. We feel privileged that we have been able to provide feedback from our customers that has helped accelerate Istio’s evolution. In addition, it has been a privilege to share the networking expertise from our F5 heritage with the Istio project as maintainers and leads of important functional areas and key working groups.

The technical leadership of Istio is a meritocracy, and we are honored that our sustained efforts have been recognized with my appointment to the Technical Oversight Committee.

As a TOC member, I am excited to work with other Istio community leaders to focus the roadmap on solving customer problems while keeping our user experience top of mind. The next, most critical challenges to solve are day two problems and beyond, where we need to ensure smoother upgrades, enhanced security and scalability of the system. We envision use cases emerging from industries like Telco and FinServ which will push the envelope of technical capabilities beyond what many can imagine.

It has been amazing to see the user growth and the maturity of the Istio project over the last three years. We firmly believe that a more diverse leadership and an open governance in Istio will further help to advance the project and increase participation from developers across the globe.

The fun is just getting started and I am honored to be an integral part of Istio’s journey! 


How to Achieve Engineering Efficiency with a Service Mesh


As the idea for Aspen Mesh was formulating in my mind, I had the opportunity to meet with a cable provider’s engineering and operations teams to discuss the challenges they had operating their microservice architecture. When we all gathered in the large, very corporate conference room and exchanged the normal introductions, I could see that something just wasn’t right with the folks in the room. They looked like they had been hit by a truck. The reason for that is what turned this meeting into one of the most influential meetings of my life.

It turned out that the entire team had been up all night working on an outage in some of the services that were part of their guide application. We talked about the issue, how it manifested itself and what impact it had on their customers. But there was one statement that has stuck with me since: “The worst part of this 13-hour outage was that it took us 12 hours to get the right person on the phone; and only one hour to get it fixed…”

That is when I knew that a service mesh could solve this problem and increase the engineering efficiency for teams of all sizes. First, by ensuring that in day-to-day engineering and operations, experts were focused on what they were experts in. And second, when things went sideways, it was the strategic point in the stack that would have all the information needed to root-cause a problem, and would also be the place where you could rapidly restore your system.

Day-to-Day Engineering and Operations

A service mesh can play a critical role in day-to-day engineering and operations activities, by streamlining processes, reducing test environments and allowing experts to perform their duties independent of application code cycles. This allows DevOps teams to work more efficiently, by allowing developers to focus on providing value to the company’s customers through applications and operators to provide value to their customers through improved customer experience, stability and security.

The properties of a service mesh can enable your organization to run more efficiently and reduce operating costs. Here are some ways a service mesh allows you to do this:

  • Canary testing of applications in production can eliminate expensive staging environments.
  • Autoscaling of applications can ensure efficient use of resources.
  • Traffic management can eliminate duplicated coding efforts to implement retry-logic, load-balancing and service discovery.
  • Encryption and certificate management can be centralized to reduce overhead and the need to make application changes and redeployment for changing security policies.
  • Metrics and tracing give teams access to the information they need for performance and capacity planning, and can help reduce rework and over-provisioning of resources.

As organizations continue to shift-left and embrace DevOps principles, it is important to have the right tools to enable teams to move as quickly and efficiently as possible. A service mesh helps teams achieve this by moving infrastructure-like features out of the individual services and into the platform. This allows teams to leverage them in a consistent and compliant manner; it allows Devs to be Devs and Ops to be Ops, so together they can truly realize the velocity of DevOps.

Reducing Mean-Time-To-Resolution

Like it or not, outages happen. And when they do, you need to be able to root-cause the problem, develop a fix and deploy it as quickly as possible to avoid violating your customer-facing SLAs and your internal SLOs. A service mesh is a critical piece of infrastructure when it comes to reducing your MTTR and ensuring the best possible user experience for your customers. Because it sits between the container orchestration and the application, it is uniquely able to not only gather telemetry data and metrics, but also transparently implement policy and traffic management changes at run time. Here are some of the ways:

  • Metrics can be collected by the proxy in a service mesh and used to understand where problems are in the application, show which services are underperforming or using too many resources, and help inform decisions on scaling and resource optimization.
  • Layer 7 traces can be collected throughout the application and correlated together, allowing teams to see exactly where in the call flow a failure occurred.
  • Policy can allow platform teams to direct traffic — and in the case of outages, redirect traffic to other, healthier services.

All of this functionality can be collected and implemented consistently across services — and even clusters — without impacting the application or placing additional burden or requirements on application developers.

It has been said that downtime can cost an enterprise company up to $5,600 per minute. In an extreme example, let’s think back to my meeting with the cable provider. If a service mesh could have enabled their team to get the right expert on the phone in half the time, they would have cut six hours from the outage and saved $2,016,000. That’s a big number, and more importantly, all of those engineers could have been home with their families that night, instead of in front of their monitors.
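For readers who want to check the math, here is a minimal sketch of the calculation in Python. The per-minute cost and the 12 hours spent finding the right person come from the story above; the function and constant names are only illustrative.

```python
# Figures from the text: up to $5,600 per minute of downtime, and
# 12 hours spent getting the right person on the phone.
COST_PER_MINUTE_USD = 5_600
HOURS_TO_REACH_EXPERT = 12


def downtime_cost(hours, cost_per_minute=COST_PER_MINUTE_USD):
    """Cost of an outage lasting `hours` at a flat per-minute rate."""
    return hours * 60 * cost_per_minute


# Reaching the expert in half the time removes six hours of outage.
hours_saved = HOURS_TO_REACH_EXPERT / 2
savings = downtime_cost(hours_saved)
print(f"${savings:,.2f}")  # $2,016,000.00
```

A flat per-minute rate is a simplification, since real outage costs rarely scale linearly, but it is the model the number above is based on.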


Announcing Aspen Mesh 1.5


We’re excited to announce the release of Aspen Mesh 1.5 which is based on Istio’s latest LTS release 1.5 (specific tag 1.5.2). Istio 1.5 is the most feature-rich and stable release since Istio 1.1 and contains many enhancements and improvements that we are excited about. With this release, Istio continues to increase stability, performance and usability. At Aspen Mesh, we’re proud of the work that the Istio community has done and are committed to helping it become even better. We continue to represent users’ interests and improve the Istio project as maintainers and working group leads for key areas, including Networking and Product Security. It has been amazing to see the growth of the Istio community and we look forward to being a part of its inclusive and diverse culture.

Given the changes in the release, we understand that it can be challenging for users to focus on their key issues and discover how the new features and capabilities relate to their needs while adding business value. We hope this blog will make it easier for our users to digest the sheer volume of changes in Istio 1.5 and understand the reasons why we have, at times, chosen to disable or not enable certain capabilities. We evaluate features and capabilities within Istio from release to release in terms of stability, readiness and backwards compatibility, and we’re eager to enable all capabilities once they are ready for enterprise consumption.

Key updates in Istio 1.5 include: 

  • The networking APIs were moved from v1alpha3 to v1beta1, showing that they are getting closer to stabilizing; 
  • Enabling Envoy to support filters written in WebAssembly (WASM); 
  • The Istio operator continues to make major strides in installing and upgrading Istio, as well as istioctl; 
  • A new API model to secure mutual TLS (mTLS) and JWT use in your service mesh.

The list of features added or changed by Istio is long and exciting, so let’s dig deeper into the changes, the rationale behind them and how they will impact you as an Aspen Mesh customer.


New Security APIs

Istio 1.5 introduces new security APIs (PeerAuthentication, RequestAuthentication) that work in conjunction with Authorization Policy resources to create a stronger security posture for your applications. We understand that customers have widely adopted Authentication Policies, and we know the pain of incrementally adopting mTLS and debugging the issues encountered along the way. We believe that the new set of APIs is a step forward in the evolution of application security. With the older APIs being deprecated in Istio 1.5, and with the intent to remove these APIs in Istio 1.6, it is imperative that customers understand the work required to ease this transition.

To ease migration from the v1alpha1 APIs to the v1beta1 APIs, Aspen Mesh has updated our dashboard to reflect mTLS status using the new APIs, updated our Secure Ingress API to use the new security APIs, and updated our configuration generation tool to help you incrementally adopt the new APIs.


Challenges with Older APIs

While Authentication Policies worked for most of the use cases, there were architectural issues with the API choices.

Authentication Policies used to be specified at the service level, but were applied at the workload level. This often caused confusion and potential policy escapes as all other Istio resources are specified and applied at the workload level. The new API addresses this concern and is consistent with the Istio configuration model.

Authentication Policies used to handle both end user and peer authentication, which caused confusion. The new API separates them out as they are different concerns that solve different sets of problems. This logical separation should make it easier to secure your application as the possibility of misconfiguration is reduced. For example, when configuring mTLS, if there is a conflict between multiple PeerAuthentication resources, the strongest PeerAuthentication (by security enforcement) is applied. Previously, conflict resolution was not defined and behavior wasn’t deterministic.

Authentication Policies used to validate end user JWT credentials and reject any request that triggered the rule and had no or invalid/expired JWT tokens. The new API separates the Authentication and Authorization pieces into two separate composable layers. This has the benefit of expressing all allow/deny policies in a single place, i.e. Authorization resources. The downside of this approach is that if a RequestAuthentication policy is applied, but no JWT token is presented, then the request is allowed unless there's a deny Authorization policy. However, separating authentication/identity and authorization follows traditional models of securing systems, with respect to access control. If you are currently using Authentication policies to reject requests without any JWT tokens, you will need to add a deny Authorization policy for achieving parity. While Istio has greatly simplified their security model, our Secure Ingress API provides a higher level abstraction for the typical use case. Ease of adoption is one of our core tenets and we believe that Secure Ingress does just that. Feel free to reach out to us with questions about how to do this.
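To make the parity step concrete, here is a minimal sketch in Python that builds the two resources as plain dictionaries: a RequestAuthentication that validates tokens, and a deny Authorization policy that rejects requests arriving without a valid request principal. The resource kinds and field names follow the upstream Istio v1beta1 security API; the function name, the `app` label selector, and the issuer and JWKS values are hypothetical placeholders, and this is not the output of Aspen Mesh's Secure Ingress API.

```python
import json


def require_jwt_policies(name, namespace, app_label, issuer, jwks_uri):
    """Build a RequestAuthentication plus a DENY AuthorizationPolicy
    so that requests without a valid JWT are rejected.

    All argument values are placeholders chosen by the caller.
    """
    selector = {"matchLabels": {"app": app_label}}
    request_authn = {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "RequestAuthentication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": selector,
            "jwtRules": [{"issuer": issuer, "jwksUri": jwks_uri}],
        },
    }
    # On its own, RequestAuthentication only rejects *invalid* tokens.
    # This policy denies any request that has no request principal,
    # which covers requests that present no token at all.
    deny_without_jwt = {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"{name}-require-jwt", "namespace": namespace},
        "spec": {
            "selector": selector,
            "action": "DENY",
            "rules": [{"from": [{"source": {"notRequestPrincipals": ["*"]}}]}],
        },
    }
    return [request_authn, deny_without_jwt]


policies = require_jwt_policies(
    name="frontend",
    namespace="default",
    app_label="frontend",
    issuer="https://issuer.example.com",
    jwks_uri="https://issuer.example.com/.well-known/jwks.json",
)
for policy in policies:
    print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON as well as YAML manifests.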


Migration Towards New APIs

You don’t have to worry about resources defined by the existing Authentication Policy API as they will continue to work in Istio 1.5. Aspen Mesh supports the direction of the Istio community as it trends towards using workload selectors throughout its APIs. By adopting these new APIs we are hopeful that it will alleviate some application security concerns. 

With the removal of the v1alpha1 APIs in Istio 1.6 (ETA June 2020), we are focused on helping our customers migrate towards using the new APIs. Our Secure Ingress Policy has already been ported to support the new APIs, as well as our tool to help you incrementally adopt mTLS. Note that our Secure Ingress controller preserves backward compatibility by creating both RequestAuthentication and Authorization policies as needed to ensure paths protected by JWT tokens cannot be accessed without tokens.

Users upgrading to Aspen Mesh 1.5.2 will see their MeshPolicy replaced with a mesh-wide PeerAuthentication resource. Upstream Istio chose not to remove a previously installed MeshPolicy; we replace it intentionally so that users can quickly migrate their clusters towards using the new security APIs. With that in mind, istioctl now includes analyzers to help you find and manage your MeshPolicy/Policy resources, giving you a deprecation message with instructions to migrate away from your existing policies.
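As a rough illustration of what that replacement looks like, the sketch below maps a legacy MeshPolicy onto a mesh-wide PeerAuthentication. It is not the logic Aspen Mesh runs during an upgrade. It assumes the root namespace is `istio-system` (the upstream default), that an `mtls` peer without an explicit mode means strict mTLS, and that a MeshPolicy with no `mtls` peer maps to `DISABLE`; the function name is hypothetical.

```python
def mesh_policy_to_peer_authentication(mesh_policy, root_namespace="istio-system"):
    """Translate a v1alpha1 MeshPolicy dict into a mesh-wide
    v1beta1 PeerAuthentication dict (illustrative mapping only)."""
    peers = (mesh_policy.get("spec") or {}).get("peers") or []
    mode = "DISABLE"  # assumption: no mtls peer means mTLS was not required
    for peer in peers:
        if "mtls" in peer:
            # In the old API an empty `mtls: {}` block meant strict mTLS.
            mode = (peer["mtls"] or {}).get("mode", "STRICT")
            break
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # A PeerAuthentication in the root namespace with no selector
        # applies to the whole mesh.
        "metadata": {"name": "default", "namespace": root_namespace},
        "spec": {"mtls": {"mode": mode}},
    }


legacy = {
    "apiVersion": "authentication.istio.io/v1alpha1",
    "kind": "MeshPolicy",
    "metadata": {"name": "default"},
    "spec": {"peers": [{"mtls": {}}]},
}
print(mesh_policy_to_peer_authentication(legacy)["spec"])
# {'mtls': {'mode': 'STRICT'}}
```

If your MeshPolicy used permissive mode, the same mapping carries `PERMISSIVE` through unchanged, which is usually what you want while workloads are still being onboarded.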


Extending Envoy Through WASM

WASM support, as an idea, can be traced back to October 2018. A proxy ABI specification for WASM was also drafted with the idea that all proxies, not just Envoy, can adopt WASM, helping to further cement the Istio community as thought leaders in the service mesh industry. The Istio community also made major contributions to Envoy through the envoy-wasm repository, which enables WASM support and is the fork of Envoy used in Istio 1.5. There are plans to have this work incorporated into the mainline Envoy repository.

This is a major feature as it will enable users to extend Envoy in new and exciting ways.

We applaud the Istio community for its work to deliver this capability. Previously, writing a filter for Envoy required extensive knowledge of not only Envoy C++ code, but also how to build, integrate and deploy Envoy.

WebAssembly support in Envoy will allow users to create, share and extend Envoy to meet their needs. Support for creating Envoy extensions already exists with SDKs for C++, Rust, and AssemblyScript, and there are plans to increase the number of supported languages shortly. Istio is also continuing to build its supporting infrastructure around these Envoy features to easily integrate them into the Istio platform. There is also on-going work to move already existing Istio filters to WASM; in fact, Telemetry V2 is a WASM extension. Istio is eating its own dog food and the results speak for themselves. Expect future blogs around WASM capabilities in Istio as we learn more about its feature set and capabilities.

For more information, please see Istio’s WASM announcement


istiod and Helm Based Installs

Support for Helm v3-based installs continues to be a top priority for Aspen Mesh. The Aspen Mesh Helm installation process is a seamless operation for your organization. Aspen Mesh 1.5 does not support the istiod monolith yet, but continues to ship with Istio’s microservice platform.

We are actively working with the Istio community to make installations and upgrades from the microservice platform to the monolithic platform a first-class experience with minimal downtime in Istio 1.6. This will ensure that our customers will have a stable and qualified release when they migrate to istiod.

As Istio has evolved, a decision was made to simplify Istio and transform its microservice platform to that of a monolith for operational simplification and overhead reduction. The upgrade path from microservices control plane to monolith (istiod) requires a fresh install and possible downtime. We want our customers to have a smooth upgrade and experience minimal downtime disruption.


Moving Towards a Simplified Operational System (istiod)

Benefits of a monolithic approach to end users include a reduction in installation complexity and fewer services deployed, which makes it easier for users to understand what is being deployed in their Kubernetes environment, especially as the intra-communication within the control plane and the inter-communication between the control plane and data plane continue to grow in complexity. 

With the migration towards istiod, one of the key features that we anticipate will mature in Istio 1.6 is the Istio in-cluster Operator. In-cluster operators have quickly gained traction within the Kubernetes community and the Istio community has made a significant effort towards managing the lifecycle of Istio. A key feature that we are excited about is the ability to have multiple control planes, which would enable such things as canaries and potentially in-place upgrades. As the in-cluster operator continues to evolve, we will continue to evaluate it, and when we feel it is ready for enterprise use you should expect a more detailed blog to follow.


Other Features on the Horizon

As always, Istio and Aspen Mesh are constantly moving forward. Here is a preview of some new features we see on the horizon.


Mixer-less Telemetry

Many users have inquired about Mixer-less telemetry, also referred to as Telemetry V2. During testing, Aspen Mesh found an issue with TCP blackhole telemetry, others found an issue with HTTP blackhole telemetry and a few other minor issues were found. Aspen Mesh is committed to helping the Istio community test and fix issues found in Telemetry V2, but until feature parity is reached with Mixer telemetry we will continue to ship with Mixer, aka Telemetry V1. This is perhaps our most requested feature. The work surrounding it has been a Herculean effort by the Istio community, and we are eager to enable it in a future Aspen Mesh release.


Intelligent Inbound and Outbound Protocol Detection

While upstream Istio has previously enabled outbound intelligent protocol detection, Aspen Mesh has decided to disable it; the reasoning for this was detailed in our Protocol sniffing in production blog. For Istio 1.5, inbound intelligent protocol detection was also enabled. While this has the potential to be a powerful feature, Aspen Mesh believes that deterministic behavior is best defined by our end users. There have been a handful of security exploits related to protocol sniffing, and we will continue to monitor its stability and performance implications and enable it if warranted in future releases.

For this and prior releases, we recommend our users continue to prefix the port name with the appropriate supported protocol as suggested here. You can also use our vetter to discover any missing port prefixes in your service mesh.
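If you want a quick check of your own before running a vetter, the port naming convention is simple enough to sketch in Python. This is an illustration only, not the Aspen Mesh vetter. It assumes port names take the form `<protocol>` or `<protocol>-<suffix>`, and the protocol list is an assumption based on the prefixes Istio documented around the 1.5 release, so confirm it against the documentation for your version.

```python
# Assumed list of protocol prefixes; verify against your Istio version.
SUPPORTED_PROTOCOLS = (
    "grpc", "grpc-web", "http", "http2", "https",
    "mongo", "mysql", "redis", "tcp", "tls", "udp",
)


def has_protocol_prefix(port_name):
    """True if the name is `<protocol>` or `<protocol>-<suffix>`."""
    return any(
        port_name == proto or port_name.startswith(proto + "-")
        for proto in SUPPORTED_PROTOCOLS
    )


def ports_missing_prefix(service):
    """Return (port number, name) pairs for Service ports whose names
    lack a protocol prefix. `service` is a Kubernetes Service as a dict."""
    missing = []
    for port in service.get("spec", {}).get("ports", []):
        name = port.get("name", "")
        if not has_protocol_prefix(name):
            missing.append((port.get("port"), name))
    return missing


# Hypothetical Service used only to exercise the check.
service = {
    "kind": "Service",
    "metadata": {"name": "reviews"},
    "spec": {
        "ports": [
            {"name": "http-web", "port": 80},
            {"name": "metrics", "port": 9090},
        ]
    },
}
print(ports_missing_prefix(service))  # [(9090, 'metrics')]
```

Renaming the flagged port to something like `http-metrics` would make its protocol explicit instead of leaving it to detection.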


Next Steps for You

The Aspen Mesh 1.5 binaries are available for download here. If you have any questions about your upgrade path, feel free to reach out to us at support@aspenmesh.io.

If you’re new to Aspen Mesh, you can download a free 30-day trial here