Get App-focused Security from an Enterprise-class Service Mesh | On-demand Webinar

In our webinar, You've Got Kubernetes. Now you Need App-focused Security using Istio, which you can now view on demand, we teamed up with Mirantis, an industry leader in enterprise-ready Kubernetes deployment and management, to talk about security, Kubernetes, service mesh, Istio and more. If you have Kubernetes, you're off to a great start: it's a strong platform for security based on microsegmentation and network policy. But firewalls and perimeters aren't enough -- even in their modern, in-cluster form.  

As enterprises embark on the cloud journey, modernizing applications with microservices and containers running on Kubernetes is key to application portability, code reuse and automation. But along with these advantages come significant security and operational challenges, with threats at various layers of the stack. While Kubernetes platform providers like Mirantis manage security at the infrastructure, orchestration and container level, security at the application services level remains a challenge. This is where a service mesh comes in. 

Companies with a hyper-focus on security – like those in healthcare, finance, government, and other highly regulated industries – demand the highest level of security possible to thwart cyberthreats, data breaches and non-compliance issues. You can up-level your security by adding a service mesh that's able to secure thousands of connections between microservice containers, whether inside a single cluster or across the globe. Today Istio is the gold standard for an enterprise-class service mesh for building zero-trust security. But I'm not the first to say that implementing open source Istio has its challenges -- and it can cause a lot of headaches when Istio deployment and management is added to a DevOps team's workload without some forethought.  

Aspen Mesh delivers an Istio-based, security-hardened, enterprise-class service mesh that's easy to manage. Our Istio solution reduces friction between the experts in your organization because it understands your apps -- and it seamlessly integrates into your SecOps approach and certificate authority architecture. 

It’s not just about what knobs and config you adjust to get mTLS in one cluster – in our webinar we covered the architectural implications and lessons learned that’ll help you fit service mesh into your up-leveled Kubernetes security journey. It was a lively discussion with a lot of questions from attendees. Click the link below to watch the live webinar recording.

-Andrew

 

Click to watch webinar now:

On Demand Webinar | You’ve Got Kubernetes. Now you Need App-focused Security using Istio.

 The webinar gets technical as we delve into: 

  • How Istio controls North-South and East-West traffic, and how it relates to application-level traffic. 
  • How Istio secures communication between microservices. 
  • How to simplify operations and prevent security holes as the number of microservices in production grows. 
  • What is involved in hardening Istio into an enterprise-class service mesh. 
  • How mTLS provides a zero-trust-based approach to security. 
  • How Aspen Mesh uses crypto to give each container its own identity (using a framework called SPIFFE), so that when containers talk to each other through the service mesh, they prove who they are cryptographically. 
  • Secure ingress and egress, and Cloud Native packet capture. 

Aspen Mesh Leads the Way for a Secure Open Source Istio

Here at Aspen Mesh, we entrenched ourselves in the Istio project not long after its start. Recognizing Istio's potential early on, we committed to building our entire company with Istio at its core. From the early days of the project, Aspen Mesh has taken an active role in Istio -- we've been part of the community since the fall of 2017. Among our many firsts, Aspen Mesh was the first non-founding company to have someone on the Technical Oversight Committee (TOC) and to hold a release manager role, helping manage the release of Istio v1.6 in 2020.

Ensuring open source Istio continues to set the standard as the foundation for a secure enterprise-class service mesh is important to us. I hold a seat on the Istio Product Security Working Group (PSWG), where we continuously monitor and address potential Common Vulnerabilities and Exposures (CVE) reports for Istio and its dependencies, like the Envoy project. In fact, we helped create the PSWG in collaboration with other community leaders to ensure Istio remains a secure project with well-defined practices around responsible early disclosure and incident management.

Along with me, my colleague Jacob Delgado has been a tremendous contributor to Istio's security, and he currently leads the Product Security Working Group.

Aspen Mesh leads contribution to Open Source Istio

The efforts of Aspen 'Meshers' can be seen across Istio's architecture today, and we add features to open source Istio regularly. Some of the major features we've added include Elliptic Curve Cryptography (ECC) support, configuration validation (istio-vet -> Istio analyzers), custom tracing tags, and Helm v3 support. We are a top-five contributor of pull requests (PRs) to Istio. One of our primary areas of focus is helping to shape and harden Istio's security. We have responsibly reported several critical CVEs and addressed them as part of the PSWG, like the Authentication Policy Bypass CVE. You can read more about how security releases and 0-day critical CVE patches are handled in Istio in this blog authored by my colleague Jacob.

Istio Security Assessment Report findings announced in 2021

The success of the Istio project and its critical role enforcing key security policies in infrastructure across a wide swath of industries was the impetus for a comprehensive security assessment that began in 2020. To determine whether there were any security issues in the Istio code base, the project enlisted the NCC Group to conduct a third-party assessment last year, in collaboration with subject matter experts across the community.

This in-depth assessment of Istio version 1.6.5 focused on Istio's architecture as a whole, looking at security-related issues with an emphasis on key components like istiod (Pilot), the ingress and egress gateways, and Istio's overall use of Envoy as its data plane proxy. Since the report, the Product Security Working Group has issued several security releases as new vulnerabilities were disclosed, along with fixes to address concerns raised in the report. A good outcome of the report is the detailed Security Best Practices Guide developed for Istio users.

I invite you to read a summary of the Istio Security Assessment Report I compiled for the Istio community. I detail the key areas of the report and distill what it means for Istio users today and looking ahead. Whether you're a current open source Istio user, like keeping up on all things security, or you want a deep dive into Istio Security, you can get the highlights from the blog I posted for the Istio community -- and from there you can also download the entire report.

At Aspen Mesh, we build upon the security features Istio provides and address enterprise security requirements with a zero-trust based service mesh that provides security within the Kubernetes cluster, provides monitoring and alerts, and ensures highly-regulated industries maintain compliance. You can read about how we think about security in our white paper, Adopting a Zero-Trust Approach to Security for Containerized Applications.

If you'd like to talk to us about what enterprise security in a service mesh looks like, please get in touch!

-Neeraj

 


Sailing Faster with Istio

While the extraordinarily large container ship Ever Given ran aground in the Suez Canal, halting a major trade route and causing losses in the billions, our solution engineers at Aspen Mesh were stuck on their own island for the past few weeks, diagnosing a tricky Istio and Envoy performance bottleneck. Though the scale and global impact of these two problems are quite different, it presented an interesting way to connect a global shipping event with the metaphorical nautical themes used by Istio. To elaborate on this theme, let's switch from containers carrying dairy, and apparently everything else under the sun, to containers shuttling network packets.

To unlock the most from containers and a microservices architecture, Istio (and Aspen Mesh) uses a sidecar proxy model. Adding sidecar proxies into your mesh provides a host of benefits, from uniform identity to security to metrics and advanced traffic routing. As Aspen Mesh customers range from large enterprises all the way to service providers, the performance impact of adding these sidecars is as important to us as the benefits outlined above. The performance experiment that I'm going to cover in this blog is geared toward evaluating the impact of adding sidecar proxies in high-throughput scenarios, on the server side, the client side, or both.

We have encountered workloads, especially in the service provider space, where there are high requests- or transactions-per-second requirements for a particular service. Also, scaling up — i.e., adding more CPU/memory — is preferable to scaling out. We wanted to test the limits of sidecar proxies with regard to the maximum achievable throughput so that we can tune and optimize our model to meet the performance requirements of the wide variety of workloads used by our customers.

Throughput Test Setup

The test setup we used for this experiment was rather simple: a Fortio client and server running on Kubernetes on large AWS node instance types like burstable t3.2xlarge with 8 vCPUs and 32 GB of memory or dedicated m5.8xlarge instance types which have 32 vCPUs and 128 GB of memory. The test was running a single instance of the Fortio client and server pod with no resource constraints on their own dedicated nodes. The Fortio client was run in a mode to maximize throughput like this:
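The exact command isn't preserved here, but a minimal sketch of such a run (the service URL and connection count are assumptions; the connection count was varied across runs) looks like:

    fortio load -qps 0 -t 60s -c 128 http://fortio-server:8080/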

The above command runs the test for 60 seconds with queries per second (QPS) 0 (i.e. maximum throughput with a varying number of simultaneous parallel connections). With this setup on a t3.2xlarge machine, we were able to achieve around 100,000 QPS. Further increasing the number of parallel connections didn’t result in throughput beyond ~100K QPS, signaling a possible CPU bottleneck. Running the same experiment on an m5.8xlarge instance, we could achieve much higher throughput around 300,000 QPS or higher depending upon the parallel connection settings.

This was sufficient proof of CPU throttling. As adding more CPUs increased the QPS, we felt that we had a reasonable baseline to start evaluating the effects of adding sidecar proxies in this setup.

Adding Sidecar Proxies on Both Ends

Next, with the same setup on t3.2xlarge instances, we added Istio sidecar proxies on both the Fortio client and server pods with Aspen Mesh default settings: mTLS set to STRICT, access logging enabled and the default concurrency (worker threads) of two. With these parameters, and running the same command as before, we could only get a maximum throughput of around ~10,000 QPS.

This is a factor of 10 reduction in throughput. This was expected as we had only configured two worker threads, which were hopefully running at their maximum capacity but could not keep up with client load.

So, the logical next step for us was to increase the concurrency setting to run more worker threads to accept more connections and achieve higher throughput. In Istio and Aspen Mesh, you can set the proxy concurrency globally via the concurrency setting in proxy config under mesh config or override them via pod annotations like this:
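As a sketch, the per-pod override might look like the following in a Deployment's pod template (the concurrency value shown is just an example; the global equivalent lives under meshConfig.defaultConfig.concurrency):

    template:
      metadata:
        annotations:
          proxy.istio.io/config: |
            concurrency: 4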

Note that using the value “0” for concurrency configures it to use all the available cores on the machine. We increased the concurrency setting from two to four to six and saw a steady increase in maximum throughput from 10K QPS to ~15K QPS to ~20K QPS as expected. However, these numbers were still quite low (by a factor of five) as compared to the results with no sidecar proxies.

To eliminate the CPU throttling factor, we ran the same experiment on m5.8xlarge instances with even higher concurrency settings but the maximum throughput we could achieve was still around ~20,000 QPS.

This degradation was far from acceptable, so we dug into why the throughput was low even with sufficient worker threads configured on the sidecar proxies.

Peeling the Onion

To investigate this issue, we looked at the CPU utilization metrics in the server pod and noticed that the CPU utilization as a percentage of total requested CPUs was not very high. This seemed odd as we expected the proxy worker threads to be spinning as fast as possible to achieve the maximum throughput, so we needed to investigate further to understand the root cause.

To get a better understanding of low CPU utilization, we inspected the connections received by the server sidecar proxy. Envoy’s concurrency model relies on the kernel to distribute connections between the different worker threads listening on the same socket. This means that if the number of connections received at the server sidecar proxy is less than the number of worker threads, you can never fully use all CPUs.

As this investigation was purely on the server-side, we ran the above experiment again with the Fortio client pod, but this time without the sidecar proxy injected and only the Fortio server pod with the proxy injected. We found that the maximum throughput was still limited to around ~20K QPS as before, thereby hinting at issues on the server sidecar proxy.

To investigate further, we had to look at connection level metrics reported by Envoy proxy. Later in this article, we’ll see what happens to this experiment with Envoy metrics exposed. (By default, Istio and Aspen Mesh don’t expose the connection-level metrics from Envoy.)

These metrics can be enabled in Istio version 1.8 and above by following this guide and adding the appropriate pod annotations corresponding to the metrics you want to be exposed. Envoy has many low-level metrics emitted at high resolution that can easily overwhelm your metrics backend for a moderately sized cluster, so you should enable this cautiously in production environments.

Additionally, it can be quite a journey to find the right Envoy metrics to enable, so here’s what you will need to get connection-level metrics. On the server-side pod, add the following annotation:
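We won't reproduce the exact annotation from our environment, but a sketch using the proxyStatsMatcher field of the proxy config annotation (one of the mechanisms that guide describes) would look roughly like this:

    template:
      metadata:
        annotations:
          proxy.istio.io/config: |
            proxyStatsMatcher:
              inclusionRegexps:
              - 'listener\..*\.downstream_cx_(total|active)'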

This will enable reporting for all listeners configured by Istio, which can be a lot depending upon the number of services in your cluster, but only enable the downstream connections total counter and downstream connections active gauge metrics.

To look at these metrics, you can use your Prometheus dashboard, if it’s enabled, or port-forward to the server pod under test to port 15000 and navigate to http://localhost:15000/stats/prometheus. As there are many listeners configured by Istio, it can be tricky to find the correct one. Here’s a quick primer on how Istio sets up Envoy configuration. (You can find the complete list of Envoy listener metrics here.)
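For example, assuming a Deployment named fortio-server, something like this exposes the stats locally:

    kubectl port-forward deploy/fortio-server 15000:15000
    curl -s http://localhost:15000/stats/prometheus | grep downstream_cx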

For any inbound connections to a pod from clients outside of the pod, Istio configures a virtual inbound listener at 0.0.0.0:15006, which receives all the traffic from iptables’ redirect rules. This is the only listener that’s actually configured to receive connections from the kernel, and after the connection is received, it is matched against filter chain attributes to proxy the traffic to the correct application port on localhost. This means that even though the Fortio client above is targeting port 8080, we need to look at the total and active connections for the virtual inbound listener at 0.0.0.0:15006 instead of 0.0.0.0:8080. Looking at this metric, we found that the number of active connections were close to the configured number of simultaneous connections on the Fortio client side. This invalidated our theory about the number of connections being less than worker threads.

The next step in our debugging journey was to look at the number of connections received on each worker thread. As I had alluded to earlier, Envoy relies on the kernel to distribute the accepted connections to different worker threads, and for all the worker threads to be fully utilizing the allotted CPUs, the connections also need to be fairly balanced. Luckily, Envoy has per-worker metrics for listeners that can be enabled to understand the distribution. Since these metrics are rooted at listener.<address>.<handler>.<metric name>, the regex provided in the annotation above should also expose these metrics. The per-worker metrics looked like this:
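The original screenshot isn't reproduced here; instead, here is a hypothetical excerpt in the plain /stats text format, with values invented to match what we observed:

    listener.0.0.0.0_15006.worker_2.downstream_cx_active: 1193
    listener.0.0.0.0_15006.worker_5.downstream_cx_active: 1427
    listener.0.0.0.0_15006.worker_10.downstream_cx_active: 11502
    listener.0.0.0.0_15006.worker_12.downstream_cx_active: 84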

As you can see from the metrics above, the connections were far from being evenly distributed among the worker threads. One thread, worker 10, had 11.5K active connections, compared to some threads which had around ~1-1.5K active connections, and others that were even lower. This explains the low CPU utilization numbers, as most of the worker threads just didn't have enough connections to do useful work.

In our Envoy research, we quickly stumbled upon this issue, which very nicely sums up the problem and the various efforts that have been made to fix it.


So, next, we went looking for a solution to fix this problem. It seemed like, for the moment, our own Ever Given was stuck as some diligent worker threads struggled to find balance. We needed an excavator to start digging.

While our intrepid team tackled the problem of scaling for high-throughput workloads by adding sidecar proxies, we encountered a bottleneck not entirely unlike what the Ever Given experienced not long ago in the Suez Canal.

Luckily, we had a few more things to try, and we were ready to take a closer look at the listener metrics.

Let There Be Equality Among Threads!

After parsing through the conversations in the issue, we found the pull request that enabled a configuration option to turn on a feature to achieve better balancing across worker threads. At this point, trying this out seemed worthwhile, so we looked at how to enable this in Istio. (Note that as part of this PR, the per-worker thread metrics were added, which was useful in diagnosing this problem.)

For all the ignoble things EnvoyFilter can do in Istio, it’s useful in situations like these to quickly try out new Envoy configuration knobs without making code changes in “istiod” or the control plane. To turn the “exact balance” feature on, we created an EnvoyFilter resource like this:
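The resource below is a sketch rather than a copy of what we ran -- the name, namespace and workload labels are assumptions -- but it shows Envoy's connection_balance_config knob being merged into the virtual inbound listener:

    apiVersion: networking.istio.io/v1alpha3
    kind: EnvoyFilter
    metadata:
      name: fortio-server-exact-balance
      namespace: perf-test
    spec:
      workloadSelector:
        labels:
          app: fortio-server
      configPatches:
      - applyTo: LISTENER
        match:
          context: SIDECAR_INBOUND
          listener:
            portNumber: 15006
        patch:
          operation: MERGE
          value:
            connection_balance_config:
              exact_balance: {}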

With this configuration applied and with bated breath, we ran the experiment again and looked at the per-worker thread metrics. Voila! The connections were now almost perfectly balanced across the worker threads.
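As an illustrative excerpt (values invented, but representative of what we saw):

    listener.0.0.0.0_15006.worker_0.downstream_cx_active: 2051
    listener.0.0.0.0_15006.worker_1.downstream_cx_active: 2052
    listener.0.0.0.0_15006.worker_2.downstream_cx_active: 2051
    listener.0.0.0.0_15006.worker_3.downstream_cx_active: 2052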

Measuring the throughput with this configuration set, we could achieve around ~80,000 QPS, which is a significant improvement over the earlier results. Looking at CPU utilization, we saw that all the CPUs were fully pegged at or near 100%. This meant that we were finally seeing the CPU throttling. At this point, by adding more CPUs and a bigger machine, we could achieve much higher numbers as expected. So far so good.

As you may recall, this experiment was purely to test the effects of server sidecar proxy, so we removed the client sidecar proxy for these tests. It was now time to measure performance with both sidecars added.

Measuring the Impacts of a Client Sidecar Proxy

With this exact balancing configuration enabled on the inbound port (server side only), we ran the experiment with sidecars on both ends. We were hoping to achieve high throughput that would only be limited by the number of CPUs dedicated to Envoy worker threads. If only things were that simple.

We found that the maximum throughput was once again capped at around ~20K QPS.

A bit disappointing, but since we then knew about the issue of connection imbalance on the server side, we reasoned that the same could happen on the client side between the application and the sidecar proxy container on localhost. First, we enabled the following metrics on the client-side proxy:
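A sketch of that annotation, again via proxyStatsMatcher, with a cluster regex added alongside the listener one:

    template:
      metadata:
        annotations:
          proxy.istio.io/config: |
            proxyStatsMatcher:
              inclusionRegexps:
              - 'listener\..*\.downstream_cx_(total|active)'
              - 'cluster\..*\.upstream_cx_(total|active)'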

In addition to the listener metrics, we also enabled cluster-level metrics, which emit total and active connections for any upstream cluster. We wanted to verify that the client sidecar proxy was sending a sufficient number of connections to the upstream Fortio server cluster to keep the server worker threads occupied. We found that the number of active connections mirrored the number of connections used by the Fortio client in our command. This was a good sign. Note that Envoy doesn’t report cluster-level metrics at the per-worker level, but these are all aggregated, so there’s no way for us to know how the connections were distributed on the outbound side.

Next, we inspected the listener connection statistics on the client side similar to the server side to ensure that we were not having connection imbalance issues. The outbound listeners, or the listeners set up to handle traffic originating from the application in the same pod as the sidecar proxy, are set up a bit differently in Istio as compared to the inbound side. For outbound traffic, a virtual listener “0.0.0.0:15001” is created similar to the listener on “0.0.0.0:15006,” which is the target for iptables redirect rules. Unlike the inbound side, the virtual listener hands off the connection to the more specific listener like “0.0.0.0:8080” based on the original destination address. If there are no specific matches, then the listener configuration in the virtual outbound takes effect. This can block or allow all traffic depending on your configured outbound traffic policy. In the traffic flow from the Fortio client to server, we expected the listener at “0.0.0.0:8080” to be handling connections on the client-side proxy, so we inspected connections metrics at this listener. The listener metrics looked like this:
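Here is an illustrative excerpt (values invented) of what those outbound listener stats showed:

    listener.0.0.0.0_8080.worker_0.downstream_cx_active: 0
    listener.0.0.0.0_8080.worker_1.downstream_cx_active: 0
    listener.0.0.0.0_8080.worker_2.downstream_cx_active: 128
    listener.0.0.0.0_8080.worker_3.downstream_cx_active: 0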

The metrics above show a connection imbalance between worker threads similar to what we saw on the server side; here, however, the connections on the outbound client-side proxy were being handled by only one worker thread, which explains the poor throughput numbers. Having fixed this on the server side, we applied a similar EnvoyFilter configuration, with minor tweaks for context and port, to address this imbalance:
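As a sketch (again with assumed names and labels), the outbound variant differed only in the context and port:

    apiVersion: networking.istio.io/v1alpha3
    kind: EnvoyFilter
    metadata:
      name: fortio-client-exact-balance
      namespace: perf-test
    spec:
      workloadSelector:
        labels:
          app: fortio-client
      configPatches:
      - applyTo: LISTENER
        match:
          context: SIDECAR_OUTBOUND
          listener:
            portNumber: 8080
        patch:
          operation: MERGE
          value:
            connection_balance_config:
              exact_balance: {}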

Surely, applying this resource would fix our issue and we would be able to achieve high QPS with both client and server sidecar proxies with sufficient CPUs allocated to them. Well, we ran the experiment again and saw no difference in the throughput numbers. Checking the listener metrics again, we saw that even with this EnvoyFilter resource applied, only one worker thread was handling all the connections. We also tried applying the exact balance config on both virtual outbound port 15001 and outbound port 8080, but the throughput was still limited to 20K QPS.

This warranted the next round of investigations.

Original Destination Listeners, Exact Balance Issues

We went around looking in Envoy code and opened Github issues to understand why the client-side exact balance configuration was not taking effect, while the server side was working wonders. The key difference between the two listeners, other than the directionality, was that the virtual outbound listener “0.0.0.0:15001” was an original destination listener, which hands over connections to other listeners matched on the original destination address. With help from the Istio community (thanks, Yuchen Dai from Google), we found this open issue, which explains this behavior in a rather cryptic way.

Basically, the current exact balance implementation relies on connection counters per worker thread to fix the imbalance. When original destination is enabled on the virtual outbound listener, the connection counter on the worker thread is incremented when a connection is received, but as the connection is immediately handed to the more specific listener like "0.0.0.0:8080," it is decremented again. This quick increase and decrease in the internal count fools the exact balancer into thinking the balance is perfect, as all these counters are effectively always at zero. It also appears that applying exact balance on the listener that handles the connection ("0.0.0.0:8080" in this case) but doesn't accept it from the kernel has no effect, due to current implementation limitations.

Fortunately, the fix for this issue is in progress, and we'll be working with the community to get this addressed as quickly as possible. In the meantime, if you're getting hit by these performance issues on the client side, scaling out with a lower concurrency setting is a better approach to reach higher throughput than scaling up with higher concurrency and more worker threads. We are also working with the Istio community to provide configuration knobs for enabling exact balance in Envoy, and to optionally switch the default settings, so that everyone can benefit from our findings.

Working on this performance analysis was interesting and a challenge in its own way, like the small tractor next to the giant ship trying to make it move.

Well, maybe not exactly, but it was a learning experience for me and my team, and I'm glad we are able to share our learnings with the rest of the community, as this aspect of Istio is often overlooked by the broader vendor ecosystem. We will run and publish performance numbers related to the impact of turning on various features such as mTLS, access logging and tracing in high-throughput scenarios in future blogs, so if you're interested in this topic, subscribe to our blog to get updates or reach out to us with any questions.

I would like to give a special mention to my team members Pawel and Bart, who patiently and diligently ran various test scenarios, collected data and were uncompromising in their pursuit of getting the last bit of performance out of Istio and Aspen Mesh. It's not surprising: as part of F5, taking performance seriously is just part of our DNA. 



Top 9 Takeaways from IstioCon 2021

At the beginning of last year, we predicted the top three developments around service mesh in 2020 would be:

  1. A quickly growing need for service mesh
  2. Istio will be hard to beat
  3. Core service mesh use cases will emerge that will be used as models for the next wave of adopters

And we were right about all three, as evidenced by what we learned at IstioCon.

As a new community-led event, IstioCon 2021 provided the first organized opportunity for Istio’s community members to gather together on a large, worldwide scale, to present, learn and discuss the many features and benefits of the Istio service mesh. And this event was a resounding success.

With over 4,000 attendees — in its first year, and as a virtual event — IstioCon attendance exceeded expectations by multiples. The event showcased the lessons learned from running Istio in production, first-hand experiences from the Istio community, and featured maintainers from across the Istio ecosystem including Lin Sun, John Howard, Christian Posta, Neeraj Poddar, and more. With sessions presented across five days in English, as well as keynotes and sessions in Chinese, this was indeed a worldwide effort. It is well-known that the Istio community reaches far and wide, but it was fantastic to see that so many people interested in, considering, and even using Istio in production at scale were ready to show up and share.

But apart from the outstanding response of the Istio community, we were particularly excited to dig into what people are really using this service mesh for and how they’re interacting with it. So, we’ve pulled together the below curated list of top Istio trends, hot topics, and our top three list of sessions you don’t want to miss.

Top 3 Istio Service Mesh Trends to Watch

After watching each session (so you don’t have to!), we’ve distilled the top three service mesh and Istio industry takeaways that came out of IstioCon that you should keep on your radar.

1. Istio is production-ready. No longer just a shiny new object, Istio has transformed over the past few years from a nascent infrastructure technology into a microservices management technology that people are using, now, in production and at scale at real companies. We saw insightful user story presentations from T-Mobile, Airbnb, eBay, Salesforce, FICO, and more.

2. Istio is more versatile than you thought. Did you know that Istio is being used right now by users and companies to manage everything from user-facing applications like Airbnb to behind-the-scenes infrastructure like running 5G?

3. Istio and Kubernetes have a lot in common. There are lots of similarities between Istio and Kubernetes in terms of how these technologies have developed, and how they are being adopted. It's well known that Kubernetes is "the de facto standard for cloud native applications." Istio is being called "the most popular service mesh" according to the CNCF annual user survey. But more than this, the two are growing closer together in terms of the technologies themselves. We look forward to the growth of both technologies.

Top 3 Hot Topics

In addition to higher level industry trends, there were many other hot topics that surfaced as part of this conference. From security to Wasm, multicluster, integrations, policies, ORAS, and more, there is a lot going on in the service mesh marketplace that many folks may not have realized. Here are the three hot topics we’d like you to know about:

1. Multicluster. You can configure a single mesh to include multiple clusters. Using a multicluster deployment within a single mesh affords capabilities beyond those of a single-cluster deployment, including fault isolation and failover, location-aware routing, various control plane models, and team or project isolation. It was indeed a hot topic at IstioCon, with an entire workshop devoted to Istio Multicluster, plus two additional individual sessions and a dedicated office-hours session about multicluster.

2. Wasm. WebAssembly (Wasm) is a sandboxing technology that can be used to extend the Istio proxy (Envoy). The Proxy-Wasm sandbox API replaces Mixer as the primary extension mechanism in Istio. Over the past year, Wasm has come further to the forefront in terms of interest, as seen here by garnering two sessions plus its own office-hours session.

3. Security. Let’s face it, we’re all concerned about security, and with good reason. Istio has decided to face security challenges head on, and while not exactly a new topic, it’s one worth reiterating. The Istio Product Security Working Group had a session, plus we saw two more sessions featuring security as a headliner, and a dedicated office-hours session. 

Side note: there was a tie with one other hot topic: debugging Istio. If you get a chance, check out the three recorded sessions on debugging as well.

Top 3 Sessions You Will Want to Watch On-demand

Not everyone has time to watch a conference for five days in a row. And that’s ok. There are about 77 sessions we wish you could watch, but we’ve also identified the top three we think you’ll get the most out of. Check these out:

1. Using Istio to Build the Next Generation 5G Platform. As the most-watched session at this event, we have to start here. In this session, Aspen Mesh’s Co-founder and Chief Architect Neeraj Poddar and David Lenrow, Senior Principal Cloud Security Architect at Verizon, covered what 5G is and why it matters, architecture options with Istio, platform requirements, security, and more.

2. User story from Salesforce - The Salesforce Service Mesh: Our Istio Journey. In this session, Salesforce Software Architect Pratima Nambiar talked us through their background around why they needed a service mesh, their initial implementation, Istio’s value, progressive adoption of Istio, and features they are watching and expect to adopt. 

3. User story from eBay - Istio at Scale: How eBay is Building a Massive Multitenant Service Mesh Using Istio. In this session, Sudheendra Murthy covered eBay’s story, from their applications deployment to service mesh journey, scale testing, and future direction.

What’s Next for Istio?

We were excited to be part of this year's IstioCon, and it was wonderful to see the Istio community come together for this new event. As our team members have been key contributors to the Istio project over the past few years, we've had a front-row seat to the growth of the project itself along with the community.

To learn more about what the Istio project has coming up on the horizon, check out this project roadmap session. We’re looking forward to the continued growth of this open source technology, so that more companies — and people — can benefit from what it has to offer.



Doubling Down On Istio

Good startups believe deeply that something is true about the future, and organize around it.

When we founded Aspen Mesh as a startup inside of F5, my co-founders and I believed these things about the future:

  1. App developers would accelerate their pace of innovation by modularizing and building APIs between modules packaged in containers.
  2. Kubernetes APIs would become the lingua franca for describing app and infrastructure deployments and Kubernetes would be the best platform for those APIs.
  3. The most important requirement for accelerating is to preserve control without hindering modularity, and that’s best accomplished as close to the app as possible.

We built Aspen Mesh to address item 3. If you boil down reams of pitch decks, board-of-directors updates, marketing and design docs dating back to summer of 2017, that's it. That's what we believe, and I still think we're right.

Aspen Mesh is a service mesh company, and the lowest levels of our product are the open-source service mesh Istio. Istio has plenty of fans and detractors; there are plenty of legitimate gripes and more than a fair share of uncertainty and doubt (as is the case with most emerging technologies). With that in mind, I want to share why we selected Istio and Envoy for Aspen Mesh, and why we believe more strongly than ever that they're the best foundation to build on.

 

Why a service mesh at all?

A service mesh is about connecting microservices. The acceleration we're talking about relies on applications that are built out of small units (predominantly containers) that can be developed and owned by a single team. Stitching these units into an overall application requires APIs between them. APIs are the contract. A service mesh measures and assists with contract compliance. 

There's more to it than reading the 12-factor app. All these microservices have to communicate effectively to actually solve a user's problem. Communication over HTTP APIs is well supported in every language and environment, so it has never been easier to get started. However, don't let the simplicity delude you: you are now building a distributed system. 

We don't believe the right approach is to demand deep networking and infrastructure expertise from everyone who wants to write a line of code.  You trade away the acceleration enabled by containers for an endless stream of low-level networking challenges (as much as we love that stuff, our users do not). Instead, you should preserve control by packaging all that expertise into a technology that lives as close to the application as possible. For Kubernetes-based applications, this is a common communication enhancement layer called a service mesh.

How close can you get? Today, we see users having the most success with Istio's sidecar container model. We forecasted that in 2017, but we believe the concept ("common enhancement near the app") will outlive the technical details.

This common layer should observe all the communication the app is making; it should secure that communication and it should handle the burdens of discovery, routing, version translation and general interoperability. The service mesh simplifies and creates uniformity: there's one metric for "HTTP 200 OK rate", and it's measured, normalized and stored the same way for every app. Your app teams don't have to write that code over and over again, and they don't have to become experts in retry storms or circuit breakers. Your app teams are unburdened of infrastructure concerns so they can focus on the business problem that needs solving.  This is true whether they write their apps in Ruby, Python, node.js, Go, Java or anything else.

That's what a service mesh is: a communication enhancement layer that lives as close to your microservice as possible, providing a common approach to controlling communication over APIs.

 

Why Istio?

Just because you need a service mesh to secure and connect your microservices doesn't mean Envoy and Istio are the only choice.  There are many options in the market when it comes to service mesh, and the market still seems to be expanding rather than contracting. Even with all the choices out there, we still think Istio and Envoy are the best choice.  Here's why.

We launched Aspen Mesh after learning some lessons with a precursor product. We took what we learned, re-evaluated some of our assumptions and reconsidered the biggest problems development teams using containers were facing. It was clear that users didn't have a handle on managing the traffic between microservices, and we saw that few were using microservices in earnest yet, so we realized this problem would get more urgent as microservices adoption increased. 

So, in 2017 we asked: what would characterize the technology that solved that problem?

We compared our own nascent work with other purpose-built meshes like Linkerd (in the 1.0 Scala-based implementation days) and Istio, and non-mesh proxies like NGINX and HAProxy. This was long before service mesh options like Consul, Maesh, Kuma and OSM existed. Here's what we thought was important:

  • Kubernetes First: Kubernetes is the best place to position a service mesh close to your microservice. The architecture should support VMs, but it should serve Kubernetes first.
  • Sidecar "bookend" Proxy First: To truly offload responsibility to the mesh, you need a datapath element as close as possible to the client and server.
  • Kubernetes-style APIs are Key: Configuration APIs are a key cost for users.  Human engineering time is expensive. Organizations are judicious about what APIs they ask their teams to learn. We believe Kubernetes API design and mechanics got it right. If your mesh is deployed in Kubernetes, your API needs to look and feel like Kubernetes.
  • Open Source Fundamentals: Customers will want to know that they are putting sustainable and durable technology at the core of their architecture. They don't want a technical dead-end. A vibrant open source community ensures this via public roadmaps, collaboration, public security audits and source code transparency.
  • Latency and Efficiency: These performance characteristics are more important than total throughput for modern applications.

As I look back at our documented thoughts, I see other concerns, too (p99 latency in languages with dynamic memory management, layer 7 programmability). But the above were the key items that we were willing to bet on. So it became clear that we had to place our bet on Istio and Envoy. 

Today, most of that list seems obvious. But in 2017, Kubernetes hadn’t quite won. We were still supporting customers on Mesos and Docker Datacenter. The need for service mesh as a technology pattern was becoming more obvious, but back then Istio was novel - not mainstream. 

I'm feeling very good about our bets on Istio and Envoy. There have been growing pains to be sure. When I survey the state of these projects now, I see mature, but not stagnant, open source communities. There's a plethora of service mesh choices, so the pattern is established. Moreover, the continued prevalence of Istio, even with so many other choices, convinces me that we got that part right.

 

But what about...?

While Istio and Envoy are a great fit for all those bullets, there are certainly additional considerations. As with most concerns in a nascent market, some are legitimate and some are merely noise. I'd like to address some of the most common that I hear from conversations with users.

"I hear the control plane is too complex" - We hear this one often. It’s largely a remnant of past versions of Istio that have been re-architected to provide something much simpler, but there's always more to do. We're always trying to simplify. The two major public steps that Istio has taken to remedy this include removing standalone Mixer, and co-locating several control plane functions into a single container named istiod.

However, there's some stuff going on behind the curtains that doesn't get enough attention. Kubernetes makes it easy to deploy multiple containers. Personally, I suspect the root of this complaint wasn't so much "there are four running containers when I install" but "Every time I upgrade or configure this thing, I have to know way too many details."  And that is fixed by attention to quality and user-focus. Istio has made enormous strides in this area. 

"Too many CRDs" - We've never had an actual user of ours take issue with a CRD count (the set of API objects it's possible to define). However, it's great to minimize the number of API objects you may have to touch to get your application running. Stealing a paraphrasing of Einstein, we want to make it as simple as possible, but no simpler. The reality: Istio drastically reduced the CRD count with new telemetry integration models (from "dozens" down to 23, with only a handful involved in routine app policies). And Aspen Mesh offers a take on making it even simpler with features like SecureIngress that map CRDs to personas - each persona only needs to touch 1 custom resource to expose an app via the service mesh.

"Envoy is a resource hog" - Performance measurement is a delicate art. The first thing to check is that wherever you're getting your info from has properly configured the system-under-measurement.  Istio provides careful advice and their own measurements here.  Expect latency additions in the single-digit-millisecond range, knowing that you can opt parts of your application out that can't tolerate even that. Also remember that Envoy is doing work, so some CPU and memory consumption should be considered a shift or offload rather than an addition. Most recent versions of Istio do not have significantly more overhead than other service meshes, but Istio does provide twice as many feature, while also being available in or integrating with many more tools and products in the market. 

"Istio is only for really complicated apps” - Sure. Don’t use Istio if you are only concerned with a single cluster and want to offload one thing to the service mesh. People move to Kubernetes specifically because they want to run several different things. If you've got a Money-Making-Monolith, it makes sense to leave it right where it is in a lot of cases. There are also situations where ingress or an API gateway is all you need. But if you've got multiple apps, multiple clusters or multiple app teams then Kubernetes is a great fit, and so is a service mesh, especially as you start to run things at greater scale.

In scenarios where you need a service mesh, it makes sense to use the service mesh that gives you a full suite of features. A nice thing about Istio is you can consume it piecemeal - it does not have to be implemented all at once. So you only need mTLS and tracing now? Perfect. You can add mTLS and tracing now and have the option to add metrics, canary, traffic shifting, ingress, RBAC, etc. when you need it.

We’re excited to be on the Istio journey and look forward to continuing to work with the open source community and project to continue advancing service mesh adoption and use cases. If you have any particular question I didn’t cover, feel free to reach out to me at @notthatjenkins. And I'm always happy to chat about the best way to get started on or continue with service mesh implementation. 



Steering The Future Of Istio

I’m honored to have been chosen by the Istio community to serve on the Istio Steering Committee along with Christian Posta, Zack Butcher and Zhonghu Xu. I have been fortunate to contribute to the Istio project for nearly three years and am excited by the huge strides the project has made in solving key challenges that organizations face as they shift to cloud-native architecture. 

Maybe what’s most exciting is the future direction of the project. The core Istio community realizes and advocates that innovation in Open Source doesn't stop with technology - it’s just the starting point. New and innovative ways of growing the community include making contributions easier, Working Group meetings more accessible and community meetings an open platform for end users to give their feedback. As a member of the steering committee, one of my main goals will be to make it easier for a diverse group of people to more easily contribute to the project.

To share my personal journey with Istio: when I started contributing, I found it intimidating to present rough ideas or proposals in an open Networking WG meeting filled with experts and leaders from Google and IBM (even though they were very welcoming). I understand how difficult it can be to get started contributing to a new community, so I want to ensure the Working Group and community meetings are a place for end users and new contributors to share ideas openly, and also to learn from industry experts. I will focus on increasing participation from diverse groups by working to make Istio the most welcoming community possible. In this vein, it will be important for the Steering Committee to further define and enforce a code of conduct that creates a safe place for all contributors.

The Istio community’s effort towards increasing open governance by ensuring no single organization has control over the future of the project has certainly been a step in the right direction with the new makeup of the steering committee. I look forward to continuing work in this area to make Istio the most open project it can be. 

Outside of code contributions, marketing and brand identity are critically important aspects of any open source project. It will be important to encourage contributions from marketing and business leaders to ensure we recognize non-technical contributions. Addressing this is less straightforward than encouraging and crediting code commits, but a diverse, vendor-neutral marketing team in open source can create powerful ways to reach users and drive adoption, which is critical to the success of any open source project. Recent user empathy sessions and user survey forms are a great starting point, but our ability to put these learnings into action and adapt as a community will be a key driver in growing project participation.

Last, but definitely not least, I’m keen to leverage my experience and feedback from years of work with Aspen Mesh customers and broad enterprise experience to make Istio a more robust and production-ready project. 

In this vein, my fellow Aspen Mesher Jacob Delgado has worked tirelessly for many months contributing to Istio. As a result of his contributions, he has been named a co-lead for the Istio Product Security Working Group. Jacob has been instrumental in championing security best practices for the project and has also helped responsibly remediate several CVEs this year. I’m excited to see more contributors like Jacob make significant improvements to the project.

I'm humbled by the support of the community members who voted in the steering elections and chose such a talented team to shepherd Istio forward. I look forward to working with all the existing, and hopefully many new, members of the Istio community! You can always reach out to me through email, Twitter or Istio Slack for any community, technical or governance matter, or if you just want to chat about a great idea you have.


Helping Istio Sail

Around three years ago, we recognized the power of service mesh as a technology pattern to help organizations manage the next generation of software delivery challenges, and that led to us founding Aspen Mesh. And we know that efficiently managing high-scale Kubernetes applications requires the power of Istio. Having been a part of the Istio project since 0.2, we have seen countless users and customers benefit from the observability, security and traffic management capabilities that a service mesh like Istio provides. 

Our engineering team has worked closely with the Istio community over the past three years to help drive stability and add new features that make Istio increasingly easy for users to adopt. We believe in the value that open source software — and an open community — brings to the enterprise service mesh space, and we are driven to help lead that effort by setting sound technical direction. By contributing expertise, code and design ideas like Virtual Service delegation and enforcing RBAC policies for Istio resources, we have focused much of our work on making Istio more enterprise-ready. One of these contributions is Istio Vet, which was created and open sourced in the early days of Aspen Mesh as a way to enhance Istio's user experience with multi-resource configuration validation. Istio Vet proved to be very valuable to users, so we decided to work closely with the Istio community to create istioctl analyze, adding key configuration analysis capabilities to Istio. It's very exciting to see that many of these ideas have now been implemented in the community and are part of the available feature set for broader consumption. 

As the Istio project and its users mature, there is a greater need for open communication around product security and CVEs. Recognizing this, we were fortunate to be able to help create the early disclosure process and lead the product security working group which ensures the overall project is secure and not vulnerable to known exploits. 

We believe that an open source community thrives when users and vendors are considered equal partners. We feel privileged that we have been able to provide feedback from our customers that has helped accelerate Istio's evolution. In addition, it has been a privilege to share the networking expertise from our F5 heritage with the Istio project as maintainers and leads of important functional areas and key working groups.

The technical leadership of Istio is a meritocracy, and we are honored that our sustained efforts have been recognized with my appointment to the Technical Oversight Committee.

As a TOC member, I am excited to work with other Istio community leaders to focus the roadmap on solving customer problems while keeping our user experience top of mind. The next, most critical challenges to solve are day-two problems and beyond, where we need to ensure smoother upgrades, enhanced security and scalability of the system. We envision use cases emerging from industries like telco and FinServ which will push the envelope of technical capabilities beyond what many can imagine.

It has been amazing to see the user growth and the maturity of the Istio project over the last three years. We firmly believe that a more diverse leadership and an open governance in Istio will further help to advance the project and increase participation from developers across the globe.

The fun is just getting started and I am honored to be an integral part of Istio’s journey! 



Protocol Sniffing in Production

Istio 1.3 introduced a new capability to automatically sniff the protocol used when two containers communicate. This is a powerful benefit for getting started easily with Istio, but it has some tradeoffs. Aspen Mesh recommends that production deployments of Aspen Mesh (built on Istio) do not use protocol sniffing, and Aspen Mesh 1.3.3-am2 turns off protocol sniffing by default. This blog explains the tradeoffs and why we think turning off protocol sniffing is the better choice.  

What Protocol Sniffing Is

Protocol sniffing predates Istio. For our purposes, we're going to define it as examining some communication stream and classifying it as implementing one protocol (like HTTP) or another (like SSH), without additional information. For example, here are two streams from client to server; if you've ever debugged these protocols, you won't have a hard time telling them apart:

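The original capture isn't reproduced here, but illustrative openings of the two streams (hypothetical host, path and versions) make the point. An HTTP stream begins with a readable request line and headers:

    GET /index.html HTTP/1.1
    Host: example.com
    User-Agent: curl/7.68.0

An SSH stream begins with an identification string and then switches to a binary key exchange:

    SSH-2.0-OpenSSH_8.2
    <binary key-exchange bytes follow>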

In an active service mesh, the Envoy sidecars will be handling thousands of these streams a second.  The sidecar is a proxy, so it reads every byte in the stream from one side, examines it, applies policy to it and then sends it on.  In order to apply proper policy ("Send all PUTs to /create_* to create-handler.foo.svc.cluster.local"), Envoy needs to understand the bytes it is reading.  Without protocol sniffing, that's done by configuring Envoy:

  • Layer 7 (HTTP): "All streams with a destination port of 8000 are going to follow the HTTP protocol"
  • Layer 4 (SSH): "All streams with a destination port of 22 are going to follow the SSH protocol"

When Envoy sees a stream with destination port 8000, it reads each byte and runs its own HTTP protocol implementation to understand those bytes and then apply policy.  Port 22 has SSH traffic; Envoy doesn't have an SSH protocol implementation so Envoy treats it as opaque TCP traffic. In proxies this is often called "Layer 4 mode" or "TCP mode"; this is when the proxy doesn't understand the higher-level protocol inside, so it can only apply a simpler subset of policy or collect a subset of telemetry.

For instance, Envoy can tell you how many bytes went over the SSH stream, but it can't tell you anything about whether those bytes indicated a successful SSH session or not.  But since Envoy can understand HTTP, it can say "90% of HTTP requests are successful and get a 200 OK response".

Here's an analogy - I speak English but not Italian; however, I can read and write the Latin alphabet that covers both.  So I could copy an Italian message from one piece of paper to another without understanding what's inside. Suppose I was your proxy and you said, "Andrew, copy all mail correspondence into email for me" - I could do that whether you received letters from English-speaking friends or Italian-speaking ones.  Now suppose you say, "Copy all mail correspondence into email unless it has Game of Thrones spoilers in it."  I can detect spoilers in English correspondence because I actually understand what's being said, but not Italian, where I can only copy the letters from one page to the other.

If I were a proxy, I'm a layer 7 English proxy but I only support Italian in layer 4 mode.

In Aspen Mesh and Istio, the protocol for a stream is configured in the Kubernetes service.  These are the options:

  • Specify a layer 7 protocol: Start the name of the service with a layer 7 protocol that Istio and Envoy understand, for example "http-foo" or "grpc-bar".
  • Specify a layer 4 protocol: Start the name of the service with a layer 4 protocol, for example "tcp-foo".  (You also use this if you know the layer 7 protocol but it's not one that Istio and Envoy support; for example, you might name a port "tcp-ssh")
  • Don't specify protocol at all: Name it without a protocol prefix, e.g. "clients".

If you don't specify a protocol at all, then Istio has to make a choice.  Before protocol sniffing was a feature, Istio chose to treat this with layer 4 mode.  Protocol sniffing is a new behavior that says, "try reading some of it - if it looks like a protocol you know, treat it like that protocol".
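Pulling those options together, here's a sketch of a Service that uses each naming style (service and port names are hypothetical):

    apiVersion: v1
    kind: Service
    metadata:
      name: foo
    spec:
      selector:
        app: foo
      ports:
      - name: http-foo   # layer 7: Envoy parses this as HTTP
        port: 8000
      - name: tcp-ssh    # layer 4: treated as opaque TCP
        port: 22
      - name: clients    # no protocol prefix: sniffed (or layer 4 with sniffing off)
        port: 9000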

An important note here is that this sniffing applies for both passive monitoring and active management.  Istio both collects metrics and applies routing and policy. This is important because if a passive system has a sniff failure, it results only in a degradation of monitoring - details for a request may be unavailable.  But if an active system has a sniff failure, it may misapply routing or policy; it could send a request to the wrong service.

Benefits of Protocol Sniffing

The biggest benefit of protocol sniffing is that you don't have to specify the protocols.  Any communication stream can be sniffed without human intervention. If it happens to be HTTP, you can get detailed metrics on it.

That removes a significant amount of configuration burden and reduces time-to-value for your first service mesh install.  Drop it in and instantly get HTTP metrics.

Protocol Sniffing Failure Modes

However, as with most things, there is a tradeoff.  In some cases, protocol sniffing can produce results that might surprise you.  This happens when the sniffer classifies a stream differently than you or some other system would.

False Positive Match

This occurs when a protocol happens to look like HTTP, but the administrator doesn't want it to be treated by the proxy as HTTP.

One way this can happen is if the apps are speaking some custom protocol where the beginning of the communication stream looks like HTTP, but it later diverges.  Once it diverges and is no longer conforming to HTTP, the proxy has already begun treating it as HTTP and now must terminate the connection. This is one of the differences between passive sniffers and active sniffers - a passive sniffer could simply "cancel" sniffing.

Behavior Change:

  • Without sniffing: Stream is considered Layer 4 and completes fine.
  • With sniffing: Stream is considered Layer 7, and then when it later diverges, the proxy closes the stream.

False Negative Match

This occurs when the client and server think they are speaking HTTP, but the sniffer decides it isn't HTTP.  In our case, that means the sniffer downgrades to Layer 4 mode. The proxy no longer applies Layer 7 policy (like Istio's HTTP Authorization) or collects Layer 7 telemetry (like request success/failure counts).

One case where this occurs is when the client and server are both technically violating a specification, but in a way that they both understand.  A classic example in the HTTP space is line termination - technically, lines in HTTP must be terminated with CRLF (the two characters 0x0d 0x0a).  But most proxies and web servers will also accept HTTP where lines are terminated with only LF (just the 0x0a character), because some ancient clients and hacked-together UNIX tools just sent LFs.

That example is usually harmless, but a riskier one is a client that can speak something the server will treat as HTTP but the sniffer will downgrade.  This allows the client to bypass any Layer 7 policies the proxy would enforce. Istio currently applies sniffing to outbound traffic where the outbound target is unknown (as often occurs for egress traffic) or where the outbound target is a service port whose name doesn't declare a protocol.

Here's an example: I know of two non-standard behaviors that node.js' HTTP server framework allows.  The first is allowing extra spaces between the Request-URI and the HTTP-Version in the Request-Line. The second is allowing spaces in a Header field-name.
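
Here's a request that uses both quirks, with the weird parts marked (the path, host and header value are made up for illustration):

    GET /index.html     HTTP/1.1      <-- extra spaces between the Request-URI and HTTP-Version
    Host: example.com
    x-foo   bar: baz                  <-- spaces inside the Header field-name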

If I send this to a node.js server, it accepts it as a valid HTTP request (for the curious, the extra whitespace in the request line is dropped, and the whitespace in the Header field-name is kept, so the header is named "x-foo   bar"). Node.js' HTTP parser is taken from nginx, which also accepts the extra spaces. Nginx is pretty darn popular, so other web frameworks and a lot of servers accept this too. Interestingly, so does the HTTP parser in Envoy (but not the HTTP inspector that does the sniffing).

Suppose I have a situation like this:  We just added a new capability to delete in-progress orders to the beta version of our service, so we want all DELETE requests to be routed to "foo-beta" and all other normal requests routed to "foo".  We might write an Istio VirtualService to route DELETE requests like this:
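
A minimal sketch of such a VirtualService (the host and destination service names are assumed from the description above):

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: foo
    spec:
      hosts:
      - foo
      http:
      - match:
        - method:
            exact: DELETE
        route:
        - destination:
            host: foo-beta   # beta version gets DELETE requests
      - route:
        - destination:
            host: foo        # everything else goes to the stable version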

If I send a request like this, it is properly routed to foo-beta.
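
For instance, a completely ordinary request (path made up) matches the DELETE route:

    DELETE /orders/123 HTTP/1.1
    Host: foo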

But if I send one like this, I bypass the route and go to foo.  Oops!
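
For instance, a request that abuses the extra-whitespace quirk shown earlier (again, path made up).  The HTTP inspector doesn't classify it as HTTP, so the proxy forwards it as opaque TCP to the default destination, while the node.js server on the other end still treats it as a valid DELETE:

    DELETE /orders/123     HTTP/1.1    <-- extra spaces keep the sniffer from seeing HTTP
    Host: foo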

This means that clients can choose to "step around" routing if they can find requests that trick the sniffer. If those requests aren't accepted by the server at the other end, it should be OK.  However, if they are accepted by the server, bad things can happen. Additionally, you won't be able to audit or detect this case because you won't have Jaeger traces or access logs from the proxy since it thought the request wasn't HTTP.

(We investigated this particular case and ran our results past the Envoy and Istio security vulnerability teams before publishing this blog. While it didn't rise to the level of a security issue, we want it to be obvious to our users what the tradeoffs are. The benefits of protocol sniffing may be worthwhile in many cases, but most users will want to avoid it in security-sensitive applications.)

Behavior Change:

  • Without sniffing: Stream is Layer 7 and invalid requests are consistently rejected.
  • With sniffing: Some streams may be classified as Layer 4 and bypass Layer 7 routing or policy.

Recommendation

Protocol sniffing lessens the configuration burden to get started with Istio, but creates uncertainty about behaviors.  Because this uncertainty can be controlled by the client, it can be surprising or potentially hazardous. In production, I'd prefer to tell the proxy everything I know and have the proxy reject everything that doesn't look as expected.  Personally, I like my test environments to look like my prod environments ("Test Like You Fly") so I'm going to also avoid sniffing in test.

I would use protocol sniffing when I first dropped a service mesh into an evaluation scenario, when I'm at the stage of, "Let's kick the tires and see what this thing can tell me about my environment."

For this reason, Aspen Mesh recommends users don't rely on protocol sniffing in production.  All service ports should be declared with a name that specifies the protocol (things like "http-app" or "tcp-custom").  Our users will continue to receive "vet" warnings for service ports that don't comply, so they can be confident that their clusters will behave predictably.


Enterprise service mesh

Aspen Mesh Open Beta Makes Istio Enterprise-ready

As companies build modern applications, they are leveraging microservices to build and manage them more effectively. As they do, they realize that they are increasing complexity and de-centralizing ownership and control. These new challenges require a new way to monitor, manage and control microservice-based applications at runtime.

A service mesh is an emerging pattern that is helping ensure resiliency and uptime - a way to more effectively monitor, control and secure the modern application at runtime. Companies are adopting Istio as their service mesh of choice as it provides a toolbox of different features that address various microservices challenges. Istio provides a solution to many of these challenges, but leaves some critical enterprise requirements on the table. Enterprises require additional features that address observability, policy and security. With this in mind, we have built new enterprise features into a platform that runs on top of Istio, to provide all the functionality and flexibility of open source, plus the features, support and guarantees needed to power enterprise applications.

At KubeCon North America 2018, Aspen Mesh announced open beta. With Aspen Mesh you get all the features of Istio, plus:

Advanced Policy Features
Aspen Mesh provides RBAC capabilities you don’t get with Istio.

Configuration Vets
Istio Vet (an Aspen Mesh open source contribution that helps ensure correct configuration of your mesh) is built into Aspen Mesh, along with additional vet features that don’t come with open source Istio Vet.

Analytics and Alerting
The Aspen Mesh platform provides insights into key metrics (latency, error rates, mTLS status) and immediate alerts so you can take action to minimize MTTD/MTTR.

Multi-cluster/Multi-cloud
View multiple clusters that live in different clouds through a single pane of glass, so you can see what’s going on across your entire microservice architecture in one place.

Canary Deploys
Aspen Mesh Experiments lets you quickly test new versions of microservices so you can qualify new versions in a production environment without disrupting users.

An Intuitive UI
Get at-a-glance views of performance and security posture as well as the ability to see service details.

Full Support
Our team of Istio experts makes it easy to get exactly what you need out of service mesh. 

You can take advantage of these features for free by signing up for Aspen Mesh Beta access.


Technology

Why You Should Care About Istio Gateways

If you’re breaking apart the monolith, one of the huge advantages of using Istio to manage your microservices is that its ingress model is similar to that of traditional load balancers and application delivery controllers. In the load balancer world, Virtual IPs and Virtual Servers have long been concepts that enable operators to configure ingress traffic in a flexible and scalable manner (Lori Macvittie has some relevant thoughts on this).

In Istio, Gateways control the exposure of services at the edge of the mesh. Gateways let operators specify L4-L6 settings like ports and TLS. For the L7 settings of ingress traffic, Istio allows you to tie Gateways to VirtualServices. This separation makes it easy to manage traffic flow into the mesh in much the same way you would tie Virtual IPs to Virtual Servers in traditional load balancers, which lets users of legacy technologies migrate to microservices in a seamless way. It’s a natural progression for teams that are used to monoliths and edge load balancers rather than an entirely new way to think about networking configuration.
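
As a rough sketch of how the two resources split responsibilities (the host, gateway and service names here are illustrative, borrowed from Istio's bookinfo sample), the Gateway owns the L4-L6 exposure at the edge and the VirtualService binds L7 routes to it:

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: bookinfo-gateway
    spec:
      selector:
        istio: ingressgateway      # run on Istio's default ingress gateway
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP           # TLS settings would also live here, on the Gateway
        hosts:
        - "bookinfo.example.com"
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: bookinfo
    spec:
      hosts:
      - "bookinfo.example.com"
      gateways:
      - bookinfo-gateway           # attach these L7 routes to the Gateway above
      http:
      - match:
        - uri:
            prefix: /reviews
        route:
        - destination:
            host: reviews
            port:
              number: 9080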

One important thing to note is that routing traffic within a service mesh and getting external traffic into the mesh are a bit different. Within the mesh, you specify exceptions to normal traffic, since Istio by default (for compatibility with Kubernetes) allows everything to talk to everything once inside the mesh; you add a policy if you don’t want certain services communicating. Getting traffic into the mesh is a reverse proxy situation (similar to traditional load balancers) where you have to specify exactly what you want to allow in.

Earlier versions of Istio leveraged the Kubernetes Ingress resource, but the Istio v1alpha3 APIs use Gateways for richer functionality, as Kubernetes Ingress has proven insufficient for Istio applications. The Kubernetes Ingress API merges the specification of L4-L6 and L7, which makes it difficult for teams in organizations with separate trust domains (like SecOps and NetOps) to own ingress traffic management. Additionally, the Ingress API is less expressive than the routing capability Istio provides with Envoy; the only way to do advanced routing with the Kubernetes Ingress API is to add controller-specific annotations. Separate concerns and trust domains within an organization warrant a more capable way to manage ingress, which Istio provides with Gateways and VirtualServices.

Once traffic is in the mesh, it’s good to provide a separation of concerns for VirtualServices so different teams can manage traffic routing for their own services. L4-L6 specifications are generally something SecOps or NetOps care about, while L7 specifications are of most concern to cluster operators or application owners, so it’s critical to have the right separation of concerns. As we believe in the power of team-based responsibilities, we think this is an important capability. And since we believe in the power of Istio, we are submitting an RFE in the Istio community which we hope will help enable ownership semantics for traffic management within the mesh.

We’re excited Istio has hit 1.0, and we look forward to continuing to contribute to the project and community as it evolves.

Originally published at https://thenewstack.io/why-you-should-care-about-istio-gateways/