Service Mesh Insider: An Interview with Shawn Wormke

Have you ever wondered how we got to service mesh? What backgrounds, experiences and technologies led to the emergence of service mesh? 

We recently sat down with Aspen Mesh’s founder, Shawn Wormke, to get the inside scoop for you. Read on to find out the answers to these three questions:

  1. What role did your technical expertise play in how Aspen Mesh focuses on enterprise service mesh?
  2. How did your technical and business experience combine to create an enterprise startup and inform how you run a “modern” software company?
  3. What characteristics define a “modern” enterprise, and how does Aspen Mesh contribute to making it a reality?

1. What role did your technical expertise play in how Aspen Mesh focuses on the enterprise?

I started my career at Cisco working in network security and firewalls on the ASA product line and later the Firewall Services Module for the Catalyst 6500/7600 series switches and routers. Both of these products were focused on the enterprise at a time when security was starting to move up the stack and become more and more distributed throughout the network. We were watching our customers move from L2 transparent firewalls to L3/L4 firewalls that required application logic in order to “fixup” dynamic ports for protocols like FTP, SIP and H.323. Eventually that journey up the stack continued to L7 firewalls that were doing URL, header and payload inspection to enforce security policy.

At the same time that this move up the stack was happening, customers were starting to look at migrating workloads to VMs and were demanding new form factors and valuing different performance metrics. Speeds, feeds and dragstrip numbers no longer mattered most; the focus was shifting to footprint and elasticity. The result of this shift in priority was a change in mindset when it came to how enterprises were thinking about expenses. They started to shift spending from large, capacity-stranding CAPEX purchases to more frequent OPEX transactions that were aligned with a software-first approach.

It was this idea that led me to join as one of the first engineers at a small startup in Boulder, CO called LineRate Systems, which was eventually acquired by F5 Networks. The company was founded on a passion for making high performance, lightweight application delivery (aka load balancing) software that was as fast as the industry standard hardware. Our realization was that Commodity Off-The-Shelf (COTS) hardware had so much performance that, if leveraged properly, it was possible to offer the same performance at a much lower cost.

But the big idea, the one that ultimately got us noticed by F5, was that if the hardware was freely available (everyone had racks and racks of servers), we could charge our customers for a performance range and let them stamp out the software--as much as they needed--to achieve that. This removed the risk of the transaction from the customer, as they no longer had to pre-plan 3-5 years' worth of capacity. It placed the burden on the provider to deliver an efficient, API-first elastic platform and a pricing model that scaled along the same dimensions as their business needs.

After acquisition we started to use containers and eventually Kubernetes for some of our build and test infrastructure. The use of these technologies led us to realize that they were great for increasing velocity and agility, but were difficult to debug and secure. We had no record of what our test containers did or who they talked to at runtime and we had no idea what data they were accessing. If we had a way to make sense of all of this, life would be so much easier.

This led us to work on some internal projects that experimented with ideas that we all now know as service mesh. We even released a product that was the beginning of this called the Application Services Proxy, which we ultimately end-of-lifed in 2017 when we made the decision to create Aspen Mesh.

In 2018 Aspen Mesh was born as an F5 Incubation. It is a culmination of almost 20 years of solving network and security problems for some of the world's largest enterprise customers and ensuring that the form-factor, consumption and pricing models are flexible and grow along with the businesses that use it. It is an acknowledgement that disruption is happening everywhere and that an organization's agility and ability to respond to disruption is its number one business asset. Companies are realizing this agility by redefining how they deliver value to their customers as quickly as possible using technologies like cloud, containers and Kubernetes.

We know that for enterprises, agility with stability is the number one competitive advantage. Through years of experience working on enterprise products we know that companies who can meet their evolving customer needs--while staying out of the news for downtime and security breaches--will be the winners of tomorrow. Aspen Mesh’s Enterprise Service Mesh enables enterprises to rapidly deliver value to their customers in a performant, secure and compliant way.

2. How did your technical and business experience combine to build an enterprise startup and inform how you run a “modern” software company?

Throughout my career I have been part of waterfall to agile transformations, worked on products that enabled business agility and now run a team that requires that same flexibility and business agility. We need to be focused on getting product to market that shows value to our customers as quickly as possible. We rely on automation to ensure that we are focusing our human capital on the most important tasks. We rely on data to make our decisions and ensure that the data we have is trustworthy and secure.

The great thing is that we get to be the ones doing the disrupting, and not the ones getting disrupted. What this means is we get to move fast and don’t have the burden of a large enterprise decision-making process. We can be agile and make mistakes, and we are actually expected to make mistakes. We are told "no" more than we are told "yes." But, learning from those failures and making course corrections along the way is key to our success.

Over the years I have come to embrace the power of open source and what it can do to accelerate projects and the impacts (both positive and negative) it can have on your company. I believe that in the future all applications will be born from open technologies. Companies that acknowledge and embrace this will be the most successful in the new cloud-native and open source world. How you choose to do that depends on your business and business model. You can be a consumer of OSS in your SaaS platform, an open-core product, glue multiple projects together to create a product or provide support and services; but if you are not including open source in your modern software company, you will not be successful.

Over the past 10 years we have seen, and continue to see, consumption models across all verticals rapidly evolve from perpetual NCR-based sales models with annual maintenance contracts, to subscription- or consumption-based models, to fully managed SaaS offerings. I recently read an article on subscription-based banking. This shift is driven by the desire to move risk to the producer instead of the consumer. It is a realization by companies that capacity planning for 3-5 years is impossible, and that laying out that cash is a huge risk to the business that they are no longer willing to take. It is up to technology producers to provide enough value to attract customers and then continue providing value to retain them year over year.

Considering how you are going to offer your product in a way that scales with your customers' value matrix and growth plans is critical. This applies to pricing as well as product functionality and performance.

Finally, I would be remiss if I didn't mention data as a paramount consideration when running a modern software company. Insights derived from that data need to be at the center of everything you do. This goes not only for your product, but also for your internal visibility and decision-making processes.

On the product side, when dealing with large enterprises it is critical to understand what your customers are willing to give you and how much value they need to realize in return. An enterprise's first answer will often be “no” when you tell them you need to access their data to run your product, but that just means you haven’t shown them enough value to say "yes." You need to consider what data you need, how much you need of it, where it will be stored and how you are protecting it.

On the internal side you need to measure everything. The biggest challenge I have found with an early-stage, small team is taking the time to enable these measurements. It is easy to let this drop off the list when you are trying to get new features out the door and you don't yet know what you're going to do with the data. Resist that urge and force your teams to think about how they can do both, and if necessary take the time to slow down and put the measurement in. Sometimes being thoughtful early on can help you go fast later, and adding hooks to gather and analyze data is one of those times.

Operating a successful modern software company requires you to embrace all the clichés about wearing multiple hats and failing fast. It's also critical to stay agile, embrace open source, create a consumption-based offering, and rely on data, data, data and more data.

3. What characteristics define a “modern” enterprise, and how does Aspen Mesh contribute to making it a reality?

The modern enterprise craves agility and considers it to be their number one business advantage. This agility is what allows the enterprise to deliver value to customers as quickly as possible, and it is often derived from a greater reliance on technology to enable rapid speed to market. Enterprises are constantly defending against disruption from non-traditional startup companies with seemingly unlimited venture funding and no expectation of profitability. All the while, the enterprise is required to compete and deliver value while maintaining the revenue and profitability goals that their shareholders have grown to expect over years of sustained growth.

In order to remain competitive, enterprises are embracing new business models and looking for new ways to engage their customers through new digital channels. They are relying more on data and analytics to make business decisions and to make their teams and operations more efficient. Modern enterprises are embracing automation to perform mundane repetitive tasks and are turning over their workforce to gain the technical talent that allows them to compete with the smaller upstart disruptors in their space.

But agility without stability can be detrimental to an enterprise. As many recent reports have shown, enterprises can struggle with challenges around data and data security, perimeter breaches and downtime. It's easy to get caught up in the promise of the latest new technology, but moving fast and embracing new technology requires careful consideration of how it integrates into your organization, its security posture and how it scales with your business. Finding a trusted partner to accompany you on your transformation journey is key to long-term success.

Aspen Mesh is that technology partner when it comes to delivering next generation application architectures based on containers and Kubernetes. We understand the power and promise of agility and scalability that these technologies offer, but we also know that they introduce a new set of challenges for enterprises. These challenges include securing communication between services, observing and controlling service behavior, and managing the policy associated with services across large distributed organizations.

Aspen Mesh provides a fully supported service mesh that is focused on enterprise use cases that include:

  • An advanced policy framework that allows users to describe business goals that are enforced in the application’s runtime environment
  • Role-based policy management that enables organizations to create and apply policies according to their needs
  • A catalog of policies based on industry and security best practices that are created and tested by experts
  • Data analytics-based insights for enhanced troubleshooting and debugging
  • Predictive analytics to help teams detect and mitigate problems before they happen
  • Streamlined application deployment packages that provide a uniform approach to authentication and authorization, secure communications, and ingress and egress control
  • DevOps tools and workflow integration
  • A simplified user experience with improved organization and streamlined navigation to enable users to quickly find and mitigate failures and security issues
  • A consistent view of applications across multiple clouds to allow visibility from a global perspective to a hyper-local level
  • Graph visualizations of application relationships that enable teams to collaborate seamlessly on focused subsets of their infrastructure
  • Tabular representations surfacing details to find and remediate issues across multiple clusters running dozens or hundreds of services
  • A reduced-risk, scalable consumption model that allows customers to pay as they grow

Thanks for reading! We hope that helps shed some light on what goes on behind the scenes at Aspen Mesh. And if you liked this post, feel free to subscribe to our blog in order to get updates when new articles are released.


Understanding Service Mesh

The Origin of Service Mesh

In the beginning, we had packets and packet-switched networks.

Everyone on the Internet (all 30 of them) built addressing and session establishment/teardown on top of packets. Then they'd need a retransmission scheme. Then they'd build an ordered byte stream out of it.

Eventually, they realized they had all built the same thing. The RFCs for IP and TCP standardized this, operating systems provided a TCP/IP stack, so no application ever had to turn a best-effort packet network into a reliable byte stream.

We took our reliable byte streams and used them to make applications. Turns out that a lot of those applications had common patterns again — they requested things from servers, and then got responses. So, we separated these request/responses into metadata (headers) and body.

HTTP standardized the most widely deployed request/response protocol. Same story. App developers don't have to implement the mechanics of requests and responses. They can focus on the app on top.

There's a newer set of functionality that you need to build a reliable microservices application. Service discovery, versioning, zero-trust.... all the stuff popularized by the Netflix architecture, by 12-factor apps, etc. We see the same thing happening again - an emerging set of best practices that you have to build into each microservice to be successful.

So, service mesh is about putting all that functionality again into a layer, just like HTTP, TCP, packets, that's underneath your code, but creating a network for services rather than bytes.

Questions? Download The Complete Guide to Service Mesh or keep reading to find out more about what exactly a service mesh is.

What Is A Service Mesh?

A service mesh is a transparent infrastructure layer that sits between your network and application.

It’s designed to handle a high volume of service-to-service communications using application programming interfaces (APIs). A service mesh ensures that communication among containerized application services is fast, reliable and secure.

The mesh provides critical capabilities including service discovery, load balancing, encryption, observability, traceability, authentication and authorization, and the ability to control policy and configuration in your Kubernetes clusters.

Service mesh helps address many of the challenges that arise when your application is being consumed by the end user. Being able to monitor what services are communicating with each other, if those communications are secure, and being able to control the service-to-service communication in your clusters are key to ensuring applications are running securely and resiliently.
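
Because the mesh is transparent, adopting it is usually a deployment change rather than a code change. As a minimal sketch (the namespace name here is just an illustration), Istio can inject its sidecar proxy automatically into any pod created in a labeled namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: demo                    # hypothetical namespace
  labels:
    istio-injection: enabled    # Istio's injector adds the sidecar proxy to new pods here

From that point on, every service deployed into the namespace picks up the mesh's discovery, mTLS and telemetry without application changes.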

More Efficiently Managing Microservices

The self-contained, ephemeral nature of microservices comes with some serious upside, but keeping track of every single one is a challenge — especially when trying to figure out how the rest are affected when a single microservice goes down. The end result is that if you’re operating or developing in a microservices architecture, there’s a good chance part of your days are spent wondering what the hell your services are up to.

With the adoption of microservices, problems also emerge due to the sheer number of services that exist in large systems. Problems like security, load balancing, monitoring and rate limiting that had to be solved once for a monolith, now have to be handled separately for each service.

Service mesh helps address many of these challenges so engineering teams, and businesses, can deliver applications more quickly and securely.

Why You Might Care

If you’re reading this, you’re probably responsible for making sure that you and your end users get the most out of your applications and services. In order to do that, you need to have the right kind of access, security and support. That’s probably why you started down the microservices path.

If that’s true, then you’ve probably realized that microservices come with their own unique challenges, such as:

  1. Increased surface area that can be attacked
  2. Polyglot challenges
  3. Controlling access for distributed teams developing on a single application

That’s where a service mesh comes in.

A service mesh is an infrastructure layer for microservices applications that can help reduce the complexity of managing microservices and deployments by handling infrastructure service communication quickly, securely and reliably. 

Service meshes are great at solving operational challenges and issues when running containers and microservices because they provide a uniform way to secure, connect and monitor microservices. 

Here’s the point: a good service mesh keeps your company’s services running the way they should. A service mesh designed for the enterprise, like Aspen Mesh, gives you all the observability, security and traffic management you need, plus access to engineering and support, so you can focus on adding the most value to your business.

And that is good news for DevOps.

The Rise of DevOps - and How Service Mesh Is Enabling It

It’s happening, and it’s happening fast.

Companies are transforming internal orgs and product architectures along a new axis of performance. They’re finding more value in iterations, efficiency and incremental scaling, forcing them to adopt DevOps methodologies. This focus on time-to-market is driving some of the most cutting-edge infrastructure technology that we have ever seen. Technologies like containers and Kubernetes, and a focus on stable, consistent and open APIs allow small teams to make amazing progress and move at the speeds they require. These technologies have reduced the friction and time to market.

The adoption of these technologies isn’t perfect, and as companies deploy them at scale, they realize that they have inadvertently increased complexity and de-centralized ownership and control. In many cases, it’s challenging to understand the entire system.

A service mesh enables DevOps teams by helping manage this complexity. It provides autonomy and freedom for development teams through a stable and scalable platform, while simultaneously providing a way for platform teams to enforce security, policy and compliance standards.

This empowers your development teams to make choices based on the problems they are solving rather than being concerned with the underlying infrastructure. Dev teams now have the freedom to deploy code without the fear of violating compliance or regulatory guidelines, and platform teams can put guardrails in place to ensure your applications are secure and resilient.

Want to learn more? Get the Complete Guide to Service Mesh here.


From NASA to Service Mesh

The New Stack recently published a podcast featuring our CTO, Andrew Jenkins, discussing How Service Meshes Found a Former Space Dust Researcher. In the podcast, Andrew talks about how he moved from working on electrical engineering and communication protocols at NASA to software, and finally to service mesh development here at Aspen Mesh.

“My background is in electrical engineering, and I used to work a lot more on the hardware side of it, but I did get involved in communication, almost from the physical layer, and I worked on some NASA projects and things like that,” said Jenkins. “But then my career got further and further up into the software side of things, and I ended up at a company called F5 Networks. [Eventually] this ‘cloud thing’ came along, and F5 started seeing a lot of applications moving to the cloud. F5 offers their product in a version that you use in AWS, so what I was working on was an open source project to make a Kubernetes ingress controller for the F5 device. That was successful, but what we saw was that a lot of the traffic was shifting to the inside of the Kubernetes cluster. It was service-to-service communication from all these tiny things--these microservices--that were designed to be doing business logic. So this elevated the importance of communication...and that communication became very important for all of those tiny microservices to work together to deliver the final application experience for developers. So we started looking at that microservice communication inside and figuring out ways to make that more resilient, more secure and more observable so you can understand what’s going on between your applications.”

In addition, the podcast covers the evolution of service mesh, more details about tracing and logging, canaries, Kubernetes, YAML files and other surrounding technologies that extend service mesh to help simplify microservices management.

“I hope service meshes become the [default] way to deal with distributed tracing or certificate rotation. So, if you have an application, and you want it to be secure, you have to deal with all these certs, keys, etc.,” Jenkins said. “It’s not impossible, but when you have microservices, you have to do it a whole lot more times. So that’s why you get this better bang for the buck by pushing that down into that service mesh layer where you don’t have to repeat it all the time.”

To listen to the entire podcast, visit The New Stack’s post.

Interested in reading more articles like this? Subscribe to the Aspen Mesh blog:


The Complete Guide to Service Mesh

What’s Going On In The Service Mesh Universe?

Service meshes are relatively new, extremely powerful and can be complex. There’s a lot of information out there on what a service mesh is and what it can do, but it’s a lot to sort through. Sometimes, it’s helpful to have a guide. If you’ve been asking questions like “What is a service mesh?” “Why would I use one?” “What benefits can it provide?” or “How did people even come up with the idea for service mesh?” then The Complete Guide to Service Mesh is for you.

Check out the free guide to find out:

  • The service mesh origin story
  • What a service mesh is
  • Why developers and operators love service mesh
  • How a service mesh enables DevOps
  • Problems a service mesh solves

The Landscape Right Now

A service mesh overlaps, complements, and in some cases, replaces many tools that are commonly used to manage microservices. Last year was all about evaluating and trying out service meshes. But while curiosity about service mesh is still at a peak, enterprises are already in the evaluation and adoption process.

The capabilities service mesh can add to ease managing microservices applications at runtime are clearly exciting to early adopters and companies evaluating service mesh. Conversations tell us that many enterprises are already using microservices and service mesh, and many others are planning to deploy in the next six months. And if you’re not yet sure about whether or not you need a service mesh, check out the recent Gartner, 451 and IDC reports on microservices — all of which say a service mesh will be mandatory by 2020 for any organization running microservices in production.

Get Started with Service Mesh

Are you already using Kubernetes and Istio? You might be ready to get started using a service mesh. Download Aspen Mesh here or contact us to talk with a service mesh expert about getting set up for success.

Get the Guide

Fill out the form below to get your copy of The Complete Guide to Service Mesh.


Expanding Service Mesh Without Envoy

Istio uses the Envoy sidecar proxy to handle traffic within the service mesh. This article describes how to use an external proxy, F5 BIG-IP, to integrate with an Istio service mesh without having to use Envoy for the external proxy. This can provide a method to extend the service mesh to services where it is not possible to deploy an Envoy proxy.

This method could be used to secure a legacy database to only allow authorized connections from a legacy app that is running in Istio, but not allow any other applications to connect.

Securing Legacy Protocols

A common problem that customers face when deploying a service mesh is how to restrict access to an external service to a limited set of services in the mesh.  When all services can run on any nodes it is not possible to restrict access by IP address (“good container” comes from the same IP as “malicious container”).

One method of securing the connection is to isolate the egress gateway to dedicated nodes and only allow traffic to the database from those nodes. This is described in Istio's documentation:

Istio cannot securely enforce that all egress traffic actually flows through the egress gateways. Istio only enables such flow through its sidecar proxies. If attackers bypass the sidecar proxy, they could directly access external services without traversing the egress gateway. Thus, the attackers escape Istio’s control and monitoring. The cluster administrator or the cloud provider must ensure that no traffic leaves the mesh bypassing the egress gateway.

   -- https://istio.io/docs/examples/advanced-gateways/egress-gateway/#additional-security-considerations (2019-03-25)

Another method would be to use mesh expansion to install Envoy onto the VM that is hosting your database. In this scenario the Envoy proxy on the database server would validate requests prior to forwarding them to the database.

The third method, which we will cover here, is to deploy a BIG-IP to act as an egress device that is external to the service mesh. This is a hybrid of mesh expansion and multicluster mesh.

Mesh Expansion Without Envoy

Under the covers Envoy is using mutual TLS to secure communication between proxies.  To participate in the mesh, the proxy must use certificates that are trusted by Istio; this is how VM mesh expansion and multicluster service mesh are configured with Envoy.  To use an alternate proxy we need to have the ability to use certificates that are trusted by Istio.

Example of Extending Without Envoy

The following example serves as a proof of concept for extending the mesh. We will create a TCP-based "echo" service that lives outside of the service mesh. The goal is to restrict access so that only authorized "good containers" can connect to the "echo" service via the BIG-IP. The steps involved are:

  1. Retrieve/Create certificates trusted by Istio
  2. Configure external proxy (BIG-IP) to use trusted certificates and only trust Istio certificates
  3. Add policy to external proxy to only allow “good containers” to connect
  4. Register BIG-IP device as a member of the Istio service mesh
  5. Verify that “good container” can connect to “echo” and “bad container” cannot

First we install a set of certificates on the BIG-IP that Envoy will trust and configure the BIG-IP to only allow connections from Istio.  The certs could either be pulled directly from Kubernetes (similar to setting up mesh expansion) or generated by a common CA that is trusted by Istio (similar to multicluster service mesh).

Once the certs are retrieved/generated we install them onto the proxy (the BIG-IP) and configure the device to only trust client-side certificates that are generated by Istio.

To enable a policy that validates the identity of the "good container", we inspect the X509 Subject Alternative Name field of the client certificate and extract the SPIFFE name that contains the identity of the container.

Once the external proxy is configured we can register the device using “istioctl register” (similar to mesh expansion).

To verify that our test scenario is working we will have two namespaces, "default" and "trusted". Connections from "trusted" will be allowed and connections from "default" will be rejected. From each namespace we create a pod and run the command "nc bigip.default.svc.cluster.local 9000". Looking at our BIG-IP logs we can verify that our policy (iRule) worked:

Mar 25 18:56:39 ip-10-1-1-7 info tmm5[17954]: Rule /Common/log_cert <CLIENTSSL_CLIENTCERT>: allowing: spiffe://cluster.local/ns/trusted/sa/sleep
Mar 25 18:57:00 ip-10-1-1-7 info tmm2[17954]: Rule /Common/log_cert <CLIENTSSL_CLIENTCERT>: rejecting spiffe://cluster.local/ns/default/sa/default

Connection from our “good container”

/ # nc bigip.default.svc.cluster.local 9000
hi
HI

Connection from our “bad container”

# nc bigip.default.svc.cluster.local 9000

In the case of the "bad container" we are unable to connect. The "nc" (netcat) command simulates a very basic TCP client. A more realistic example would be connecting to an external database that contains sensitive data. In the "good" example the service echoes back the capitalized input ("hi" becomes "HI").

Just One Example

In this article we looked at expanding a service mesh without Envoy.  This was focused on egress TCP traffic, but it could be expanded to:

  • Using BIG-IP as an SNI proxy instead of NGINX
  • Securing inbound traffic using mTLS and/or JWT tokens
  • Using BIG-IP as an ingress gateway
  • Using ServiceEntry/DestinationRules instead of a registered service (see the sketch below)
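
As a rough sketch of that last item (the hostname and port simply mirror the echo example above and are illustrative, not a tested configuration), a ServiceEntry makes the external echo service addressable from inside the mesh and a DestinationRule tells the sidecars to originate Istio mutual TLS toward it:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: bigip-echo
spec:
  hosts:
  - bigip.example.com           # illustrative hostname for the external BIG-IP
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 9000
    name: tcp-echo
    protocol: TCP
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: bigip-echo
spec:
  host: bigip.example.com
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL        # present Istio-issued client certs so the BIG-IP can verify workload identity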

If you want to see the process in action, check out this short video walkthrough.

https://youtu.be/83GdmwTvWLI

Let me know in the comments whether you're interested in any of these use cases or have come up with your own. Thank you!


Why Service Meshes, Orchestrators Are Do or Die for Cloud Native Deployments

The self-contained, ephemeral nature of microservices comes with some serious upside, but keeping track of every single one is a challenge, especially when trying to figure out how the rest are affected when a single microservice goes down. The end result is that if you’re operating or developing in a microservices architecture, there’s a good chance part of your days are spent wondering what the hell your services are up to.

With the adoption of microservices, problems also emerge due to the sheer number of services that exist in large systems. Problems like security, load balancing, monitoring and rate limiting that had to be solved once for a monolith, now have to be handled separately for each service.

The good news is that engineers love a good challenge. And almost as quickly as they are creating new problems with microservices, they are addressing those problems with emerging microservices tools and technology patterns. Maybe the emergence of microservices is just a smart play by engineers to ensure job security.

Today’s cloud native darling, Kubernetes, eases many of the challenges that come with microservices. Auto-scheduling, horizontal scaling and service discovery solve the majority of build-and-deploy problems you’ll encounter with microservices.

What Kubernetes leaves unsolved is a few key containerized application runtime issues. That’s where a service mesh steps in. Let’s take a look at what Kubernetes provides, and how Istio adds to Kubernetes to solve the microservices runtime issues.

Kubernetes Solves Build-and-Deploy Challenges

Managing microservices at runtime is a major challenge. A service mesh helps alleviate this challenge by providing observability, control and security for your containerized applications. Aspen Mesh is the fully supported distribution of Istio that makes service mesh simple and enterprise-ready.

Kubernetes supports a microservice architecture by enabling developers to abstract away the functionality of a set of pods, and expose services to other developers through a well-defined API. Kubernetes enables L4 load balancing, but it doesn’t help with higher-level problems, such as L7 metrics, traffic splitting, rate limiting and circuit breaking.
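
For example, the kind of L7 traffic splitting that Kubernetes alone can't express becomes a declarative weight in an Istio VirtualService. A minimal sketch (service and subset names are illustrative, and the subsets would be defined in a matching DestinationRule):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews                     # illustrative in-mesh service
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90                # keep 90% of traffic on the current version
    - destination:
        host: reviews
        subset: v2
      weight: 10                # canary 10% of traffic to the new version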

Service Mesh Addresses Challenges of Managing Traffic at Runtime

Service mesh helps address many of the challenges that arise when your application is being consumed by the end user. Being able to monitor what services are communicating with each other, if those communications are secure and being able to control the service-to-service communication in your clusters are key to ensuring applications are running securely and resiliently.

Istio also provides a consistent view across a microservices architecture by generating uniform metrics throughout. It removes the need to reconcile different types of metrics emitted by various runtime agents, or add arbitrary agents to gather metrics for legacy un-instrumented apps. It adds a level of observability across your polyglot services and clusters that is unachievable at such a fine-grained level with any other tool.

Istio also adds a much deeper level of security. While Kubernetes only provides basic secret distribution and control-plane certificate management, Istio provides mTLS capabilities so you can encrypt traffic on the wire and ensure your service-to-service communications are secure.
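
As a hedged illustration (resource names and API versions vary across Istio releases), turning on mutual TLS mesh-wide in the Istio versions current at the time of writing looked roughly like this:

apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default                 # the mesh-wide authentication policy must be named "default"
spec:
  peers:
  - mtls: {}                    # require Istio mutual TLS for service-to-service calls

Clients in the mesh would also need a DestinationRule with ISTIO_MUTUAL TLS mode so their sidecars originate mutual TLS.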

A Match Made in Heaven

Pairing Kubernetes with a service mesh like Istio gives you the best of both worlds, and since Istio was made to run on Kubernetes, the two work together seamlessly. You can use Kubernetes to manage all of your build and deploy needs while Istio takes care of the important runtime issues.

Kubernetes has matured to a point that most enterprises are using it for container orchestration. Currently, there are 74 CNCF-certified Kubernetes service providers, which is a testament to a large and growing market. I see Istio as an extension of Kubernetes and a next step to solving more challenges in what feels like a single package.

Already, Istio is quickly maturing and is starting to see more adoption in the enterprise. It’s likely that in 2019 we will see Istio emerge as the service mesh standard for enterprises in much the same way Kubernetes has emerged as the standard for container orchestration.


API Gateway vs Service Mesh

One of the recurring questions we get when talking to people about a service mesh is, "How is it different from an API gateway?" It's a good question. The overlap between API gateway and service mesh patterns is significant. They can both handle service discovery, request routing, authentication, rate limiting and monitoring. But there are differences in architectures and intentions. A service mesh's primary purpose is to manage internal service-to-service communication, while an API Gateway is primarily meant for external client-to-service communication.

API Gateway and Service Mesh: Do You Need Both?

You may be wondering if you need both an API gateway and a service mesh. Today you probably do, but as service mesh evolves, we believe it will incorporate much of what you get from an API gateway today.

The main purpose of an API gateway is to accept traffic from outside your network and distribute it internally. The main purpose of a service mesh is to route and manage traffic within your network. A service mesh can work with an API gateway to efficiently accept external traffic then effectively route that traffic once it's in your network. The combination of these technologies can be a powerful way to ensure application uptime and resiliency, while ensuring your applications are easily consumable.

In a deployment with an API gateway and a service mesh, incoming traffic from outside the cluster would first be routed through the API gateway, then into the mesh. The API gateway could handle authentication, edge routing and other edge functions, while the service mesh provides fine-grained observability and control of your architecture.

The interesting thing to note is that service mesh technologies are quickly evolving and are starting to take on some of the functions of an API gateway. A great example is the introduction of the Istio v1alpha3 routing API, which is available in Aspen Mesh 1.0. Prior to this, Istio used the Kubernetes Ingress resource, which is pretty basic, so it made sense to pair it with an API gateway for better functionality. But the increased functionality introduced by the v1alpha3 API has made it easier to manage large applications and to work with protocols other than HTTP, something that previously required an API gateway to do effectively.
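
To make that concrete, here is a rough sketch of the v1alpha3 API handling edge traffic (the hostname and backend service are placeholders): a Gateway describes how traffic enters the mesh, and a VirtualService binds routes to it.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: web-gateway
spec:
  selector:
    istio: ingressgateway       # use Istio's default ingress gateway deployment
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "shop.example.com"        # placeholder external hostname
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: web
spec:
  hosts:
  - "shop.example.com"
  gateways:
  - web-gateway
  http:
  - route:
    - destination:
        host: frontend          # illustrative in-cluster service
        port:
          number: 8080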

What The Future Holds

The v1alpha3 API provides a good example of how a service mesh is reducing the need for API gateway capabilities. As the cloud native space evolves and more organizations move to using Docker and Kubernetes to manage their microservice architectures, it seems highly likely that service mesh and API gateway functionality will merge. In the next few years, we believe that standalone API gateways will be used less and less as much of their functionality will be absorbed by service mesh.

If you have any questions about service mesh along the way, feel free to reach out.


The Road Ahead for Service Mesh

This is the third in a blog series covering how we got to a service mesh, why we decided on the type of mesh we did and where we see the future of the space.

If you’re struggling to manage microservices as architectures continue to become more complex, there’s a good chance you’ve at least heard of service mesh. For the purposes of this blog, I’ll assume you’re familiar with the basic tenets of a service mesh.

We believe that service mesh is advancing microservice communication to a new level that is unachievable with the one-off solutions that were previously being used. Things like DNS provide some capabilities like service discovery, but don't provide fast retries, load balancing, tracing and health monitoring. The old approach also requires that you cobble together several tools each time, when it's possible to bundle them all together in a reusable layer.

While it's possible to accomplish much of what a service mesh manages with individual tools and processes, it's manual and time consuming. The images below give a good idea of how a mesh simplifies the management of microservices.

Right Around the Corner

So what's in the immediate future? I think we'll see the technology quickly mature and add more capabilities as standard features, as enterprises realize the efficiency gains created by a mesh and look to implement it as the standard for managing microservice architectures. Offerings like Istio are not yet ready for production deployments, but the roadmap is progressing quickly and it seems we'll be at v1 in short order. Security is a feature provided by service mesh, but for most enterprises it's a major consideration, and I see policy enforcement and monitoring options becoming more robust for enterprise production deployments. A feature I see on the near horizon, and one that will provide tremendous value, is an analytics platform that surfaces insights from the huge amount of telemetry data in a service mesh. I think an emerging value proposition we'll see is that the mesh allows you to gain and act on data that will let you more efficiently manage your entire architecture.

Further Down the Road

There is a lot of discussion on what's on the immediate horizon for service mesh, but what is more interesting is considering what the long term will bring. My guess is that we ultimately arrive at the mesh being an embedded value-add in a platform. Microservices are clearly the way of the future, so organizations are going to demand an effortless way to manage them. They'll want something automated, running in the background, that never has to be thought about. This is probably years down the road, but I do believe service mesh will eventually be a ubiquitous technology that is a fully managed, plug-and-play part of the platform. It will be interesting to see new ways of using the technology to manage infrastructure, services and applications.

We’re excited to be part of the journey, and are inspired by the ideas in the Istio community and how users are leveraging service mesh to solve direct problems created by the explosion of microservices and also find new efficiencies with it. Our goal is to make the implementation of a mesh seamless with your existing technology and provide enhanced features, knowledge and support to take the burden out of managing microservices. We’re looking forward to the road ahead and would love to work with you to make your microservices journey easier.


Service Mesh Architectures

If you are building your software and teams around microservices, you’re looking for ways to iterate faster and scale flexibly. A service mesh can help you do that while maintaining (or enhancing) visibility and control. In this blog, I’ll talk about what’s actually in a Service Mesh and what considerations you might want to make when choosing and deploying one.

So, what is a service mesh? How is it different from what’s already in your stack? A service mesh is a communication layer that rides on top of request/response unlocking some patterns essential for healthy microservices. A few of my favorites:

  • Zero-trust security that doesn’t assume a trusted perimeter
  • Tracing that shows you how and why every microservice talked to another microservice
  • Fault injection and tolerance that lets you experimentally verify the resilience of your application (see the sketch after this list)
  • Advanced routing that lets you do things like A/B testing, rapid versioning and deployment and request shadowing
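
As a minimal sketch of the fault-injection item above (the service name is illustrative, and field names shift slightly between Istio versions), a mesh can inject a fixed delay into a fraction of requests so you can verify that your timeouts and retries behave as intended:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-delay
spec:
  hosts:
  - ratings                     # illustrative in-mesh service
  http:
  - fault:
      delay:
        percent: 10             # affect 10% of requests
        fixedDelay: 5s          # hold them for five seconds
    route:
    - destination:
        host: ratings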

Why a New Term?

Looking at that list, you may think “I can do all of that without a Service Mesh”, and you’re correct. The same logic applies to sliding window protocols or request framing. But once there’s an emerging standard that does what you want, it’s more efficient to rely on that layer instead of implementing it yourself. Service Mesh is that emerging layer for microservices patterns.

Service mesh is still nascent enough that codified standards have yet to emerge, but there is enough experience that some best practices are beginning to become clear. As the bleeding-edge leaders develop their own approaches, it is often useful to compare notes and distill best practices. We've seen Kubernetes emerge as the standard way to run containers for production web applications. My favorite standards are emergent rather than forced: it's definitely a fine art to be neither too early nor too late to agree on common APIs, protocols and concepts.

Think about the history of computer networking. After the innovation of best-effort packet-switched networks, we found out that many of us were creating virtual circuits over them - using handshaking, retransmission and internetworking to turn a pile of packets into an ordered stream of bytes. For the sake of interoperability and simplicity, a "best practice" stream-over-packets emerged: TCP (the introduction of RFC 675 does a good job of explaining what it layers on top of). There are alternatives - I've used the Licklider Transmission Protocol in space networks where distributed congestion control is neither necessary nor efficient. Your browser might already be using QUIC. Standardizing on TCP, however, freed a generation of programmers from fiddling with implementations of sliding windows, retries, and congestion collapse (well, except for those packetheads that implemented it).

Next, we found a lot of request/response protocols running on top of TCP. Many of these eventually migrated to HTTP (or sequels like HTTP/2 or gRPC). If you can factor your communication into “method, metadata, body”, you should be looking at an HTTP-like protocol to manage framing, separate metadata from body, and address head-of-line blocking. This extends beyond just browser apps - databases like Mongo provide HTTP interfaces because the ubiquity of HTTP unlocks a huge amount of tooling and developer knowledge.

You can think about service mesh as being the lexicon, API and implementation around the next tier of communication patterns for microservices.

OK, so where does that layer live? You have a couple of choices:

  • In a Library that your microservices applications import and use.
  • In a Node Agent or daemon that services all of the containers on a particular node/machine.
  • In a Sidecar container that runs alongside your application container.

Library

The library approach is the original. It is simple and straightforward. In this case, each microservice application includes library code that implements service mesh features. Libraries like Hystrix and Ribbon would be examples of this approach.

This works well for apps that are exclusively written in one language by the teams that run them (so that it’s easy to insert the libraries). The library approach also doesn’t require much cooperation from the underlying infrastructure - the container runner (like Kubernetes) doesn’t need to be aware that you’re running a Hystrix-enhanced app.

There is some work on multilanguage libraries (reimplementations of the same concepts). The challenge here is the complexity and effort involved in replicating the same behavior over and over again.

We see very limited adoption of the library model in our user base because most of our users are running applications written in many different languages (polyglot), and are also running at least a few applications that aren’t written by them so injecting libraries isn’t feasible.

This model has an advantage in work accounting: the code performing work on behalf of the microservice is actually running in that microservice. The trust boundary is also small - you only have to trust calling a library in your own process, not necessarily a remote service somewhere out over the network. That code only has as many privileges as the one microservice it is performing work on behalf of. That work is also performed in the context of the microservice, so it’s easy to fairly allocate resources like CPU time or memory for that work - the OS probably does it for you.

Node Agent

The node agent model is the next alternative. In this architecture, there's a separate agent (often a userspace process) running on every node, servicing a heterogeneous mix of workloads. For purposes of our comparison, it's the opposite of the library model: it doesn't care about the language of your application but it serves many different microservice tenants.

Linkerd's recommended deployment in Kubernetes works like this, as do F5's Application Services Proxy (ASP) and the Kubernetes default kube-proxy.

Since you need one node agent on every node, this deployment requires some cooperation from the infrastructure - this model doesn’t work without a bit of coordination. By analogy, most applications can’t just choose their own TCP stack, guess an ephemeral port number, and send or receive TCP packets directly - they delegate that to the infrastructure (operating system).

Instead of good work accounting, this model emphasizes work resource sharing - if a node agent allocates some memory to buffer data for my microservice, it might turn around and use that buffer for data for your service in a few seconds. This can be very efficient, but there’s an avenue for abuse. If my microservice asks for all the buffer space, the node agent needs to make sure it gives your microservice a shot at buffer space first. You need a bit more code to manage this for each shared resource.

Another work resource that benefits from sharing is configuration information. It’s cheaper to distribute one copy of the configuration to each node, than to distribute one copy of the configuration to each pod on each node.

A lot of the functionality that containerized microservices rely on is provided by a Node Agent or something topologically equivalent. Think about kubelet initializing your pod, your favorite CNI daemon like flanneld, or, stretching your brain a bit, even the operating system kernel itself as following this node agent model.

Sidecar

Sidecar is the new kid on the block. This is the model used by Istio with Envoy. Conduit also uses a sidecar approach. In Sidecar deployments, you have one adjacent container deployed for every application container. For a service mesh, the sidecar handles all the network traffic in and out of the application container.

This approach is in between the library and node agent approaches for many of the tradeoffs I discussed so far. For instance, you can deploy a sidecar service mesh without having to run a new agent on every node (so you don't need infrastructure-wide cooperation to deploy that shared agent), but you'll be running multiple copies of an identical sidecar. Another take on this: I can install one service mesh for a group of microservices, and you could install a different one, and (with some implementation-specific caveats) we don't have to coordinate. This is powerful in the early days of service mesh, where you and I might share the same Kubernetes cluster but have different goals, require different feature sets, or have different tolerances for bleeding-edge vs. tried-and-true.

Sidecar is advantageous for work accounting, especially in some security-related aspects. Here’s an example: suppose I’m using a service mesh to provide zero-trust style security. I want the service mesh to verify both ends (client and server) of a connection cryptographically. Let’s first consider using a node agent: When my pod wants to be the client of another server pod, the node agent is going to authenticate on behalf of my pod. The node agent is also serving other pods, so it must be careful that another pod cannot trick it into authenticating on my pod’s behalf. If we think about the sidecar case, my pod’s sidecar does not serve other pods. We can follow the principle of least privilege and give it the bare minimum it needs for the one pod it is serving in terms of authentication keys, memory and network capabilities.

So, from the outside the sidecar has the same privileges as the app it is attached to. On the other hand, the sidecar needs to intervene between the app and the outside. This creates some security tension: you want the sidecar to have as little privilege as possible, but you need to give it enough privilege to control traffic to/from the app. For example, in Istio, the init container responsible for setting up the sidecar has the NET_ADMIN permission currently (to set up the iptables rules necessary). That initialization uses good security practices - it does the minimum amount necessary and then goes away, but everything with NET_ADMIN represents attack surface. (Good news - smart people are working on enhancing this further.)
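
For a concrete picture, here is an abbreviated, illustrative view of what sidecar injection adds to a pod spec; the image names are placeholders and the exact arguments differ by Istio version:

apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # illustrative application pod
spec:
  initContainers:
  - name: istio-init            # sets up iptables redirection, then exits
    image: istio/proxy_init     # placeholder image reference
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]      # the elevated privilege discussed above
  containers:
  - name: app
    image: my-app:latest        # your unchanged application container
  - name: istio-proxy           # the Envoy sidecar that handles traffic in and out
    image: istio/proxyv2        # placeholder image reference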

Once the sidecar is attached to the app, it’s very proximate from a security perspective. Not as close as a function call in your process (like library) but usually closer than calling out to a multi-tenant node agent. When using Istio in Kubernetes your app container talks to the sidecar over a loopback interface inside of the network namespace shared with your pod - so other pods and node agents generally can’t see that communication.

Most Kubernetes clusters have more than one pod per node (and therefore more than one sidecar per node). If each sidecar needs to know “the entire config” (whatever that means for your context), then you’ll need more bandwidth to distribute that config (and more memory to store copies of it). So it can be powerful to limit the scope of configuration that you have to give to each sidecar - but again there’s an opposing tension: something (in Istio’s case, Pilot) has to spend more effort computing that reduced configuration for each sidecar.
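
One way this plays out in practice: recent Istio releases expose a Sidecar resource that limits which services each sidecar receives configuration for. A minimal sketch (the namespace is illustrative):

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: team-a             # illustrative namespace
spec:
  egress:
  - hosts:
    - "./*"                     # only services in this namespace...
    - "istio-system/*"          # ...plus the control plane namespace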

Other things that happen to be replicated across sidecars accrue a similar bill. Good news - the container runtimes will reuse things like container disk images when they’re identical and you’re using the right drivers, so the disk penalty is not especially significant in many cases, and memory like code pages can also often be shared. But each sidecar’s process-specific memory will be unique to that sidecar so it’s important to keep this under control and avoid making your sidecar “heavy weight” by doing a bunch of replicated work in each sidecar.

Service Meshes relying on sidecar provide a good balance between a full set of features, and a lightweight footprint.

Will the Node Agent or Sidecar Model Prevail?

I think you’re likely to see some of both. Now seems like a perfect time for sidecar service mesh: nascent technology, fast iteration and gradual adoption. As service mesh matures and the rate-of-change decreases, we’ll see more applications of the node agent model.

Advantages of the node agent model are particularly important as service mesh implementations mature and clusters get big:

  • Less overhead (especially memory) for things that could be shared across a node
  • Easier to scale distribution of configuration information
  • A well-built node agent can efficiently shift resources from serving one application to another

Sidecar is a novel way of providing services (like a high-level communication proxy a la Service Mesh) to applications. It is especially well-adapted for containers and Kubernetes. Some of its greatest advantages include:

  • Can be gradually added to an existing cluster without central coordination
  • Work performed for an app is accounted to that app
  • App-to-sidecar communication is easier to secure than app-to-agent

What’s Next?

As Shawn talked about in his post, we've been thinking for a few years now about how microservices change the requirements they place on network infrastructure. The swell of support and uptake for Istio demonstrated to us that there's a community ready to develop and coalesce on policy specs, with a well-architected implementation to go along with it.

Istio is advancing state-of-the-art microservices communication, and we’re excited to help make that technology easy to operate, reliable, and well-suited for your team’s workflow in private cloud, public cloud or hybrid.

Feel free to reach out to us with any questions.


The Path to Service Mesh

When we talk to people about service mesh, there are a few questions we’re always asked. These questions range from straightforward questions about the history of our project, to deep technical questions on why we made certain decisions for our product and architecture.

To answer those questions we’ll bring you a three-part blog series on our Aspen Mesh journey and why we chose to build on top of Istio.

To begin, I’ll focus on one of the questions I’m most commonly asked.

Why did you decide to focus on service mesh and what was the path that led you there?

LineRate Systems: High-Performance Software Only Load Balancing

The journey starts with a small Boulder startup called LineRate Systems and the acquisition of that company by F5 Networks in 2013. Besides having one of the smartest and most talented engineering teams I have ever had the privilege of being part of, LineRate built a lightweight, high-performance, software-only L7 proxy. When I say high performance, I am talking about turning a server you already had in your datacenter 5 years ago into a fully featured proxy pushing 20+ Gbps and 200,000+ HTTP requests/second.

While the performance was eye-catching and certainly opened doors for our customers, our hypothesis was that customers wanted to pay for capacity, not hardware. That insight would turn out to be LineRate’s key value proposition. This simple concept would allow customers the ability to change the way that they consumed and deployed load balancers in front of their applications.

To fulfill that need we delivered a product and business model that allowed our customers to replicate the software as many times as needed across COTS hardware, letting them get peak performance regardless of how many instances they used. If a customer needed more capacity they simply upgraded their subscription tier and deployed more copies of the product until they reached the bandwidth, request rate or transaction rates the license allowed.

This was attractive, and we had some success there, but soon we had a new insight…

Efficiency Over Performance

It became apparent to us that application architectures were changing and the value curve for our customers was changing along with them. We noticed in conversations with leading-edge teams that they were talking about concepts like efficiency, agility, velocity, footprint and horizontal scale. We also started to hear from innovators in the space about this new technology called Docker, and how it was going to change the way that applications and services were delivered.

The more we talked to these teams and thought about how we were developing our own internal applications the more we realized that a shift was happening. Teams were fundamentally changing how they were delivering their applications, and the result was our customers were beginning to care less about raw performance and more about distributed proxies. There were many benefits to this shift including reducing the failure domains of applications, increased flexibility in deployments and the ability for applications to store their proxy and network configuration as code alongside their application.

At the same time containers and container orchestration systems were just starting to come on the scene, so we went to work on delivering our LineRate product in a container with a new control plane and thinking deeply about how people would be delivering applications using these new technologies in the future.

These early conversations in 2015 drove us to think about what application delivery would look like in the future…

That Idea that Just Won’t Go Away

As we thought more about the future of application delivery, we began to focus on the concept of policy and network services in a cloud-native distributed application world. Even though we had many different priorities and projects to work on, the idea of a changing application landscape, cloud-native applications and DevOps based delivery models remained in the forefront of our minds.

There just has to be a market for something new in this space.

We came up with multiple projects that for various reasons never came to fruition. We lovingly referred to them as v1.0, v1.5, and v2.0. Each of these projects had unique approaches to solving challenges in distributed application architectures (microservices).

So we thought as big as we could. A next-gen ADC architecture: a control plane that’s totally API-driven and separate from the data plane. The data plane comes in any form you can think of: purpose-built hardware, software-on-COTS, or cloud-native components that live right near a microservice (like a service mesh). This infinitely scalable architecture smooths out all tradeoffs and works perfectly for any organization of any size doing any kind of work. Pretty ambitious, huh? We had fallen into the trap of being all things to all users.

Next, we refined our approach in “1.5”, and we decided to define a policy language… The key was defining that open-source policy interface and connecting that seamlessly to the datapath pieces that get the work done. In a truly open platform, some of those datapath pieces are open source too. There were a lot of moving parts that didn’t all fall into place at once; and in hindsight we should have seen some of them coming … The market wasn’t there yet, we didn’t have expertise in open source, and we had trouble describing what we were doing and why.

But the idea just kept burning in the back of our minds, and we didn’t give up…

For Version 2.0, we devised a plan that could help F5’s users who were getting started on their container journey. The technology was new and the market was just starting to mature, but we decided that customers would take three steps on their microservice journey:

  1. Experimenting - Testing applications in containers on a laptop, server or cloud instance.
  2. Production Planning - Identifying what technology is needed to start to enable developers to deploy container-based applications in production.
  3. Operating at Scale - Focus on increasing the observability, operability and security of container applications to reduce the mean-time-to-discovery (MTTD) and mean-time-to-resolution (MTTR) of outages.

We decided there was nothing we could do for experimenting customers, but for production planning, we could create an open source connector for container orchestration environments and BIG-IP. We called this the BIG-IP Container Connector, and we were able to solve existing F5 customers’ problems and start talking to them about the next step in their journey. The container connector team continues to this day to bridge the gap between ADC-as-you-know-it and fast-changing container orchestration environments.

We also started to work on a new lightweight containerized proxy called the Application Services Proxy, or ASP. Like Linkerd and Envoy, it was designed to help microservices talk to each other efficiently, flexibly and observably. Unlike Linkerd and Envoy, it didn’t have any open source community associated with it. We thought about our open source strategy and what it meant for the ASP.

At the same time, a change was taking place within F5…

Aspen Mesh - An F5 Innovation

As we worked on our go to market plans for ASP, F5 changed how it invests in new technologies and nascent markets through incubation programs. These two events, combined with the explosive growth in the container space, led us to the decision to commit to building a product on top of an existing open source service mesh. We picked Istio because of its attractive declarative policy language, scalable control-plane architecture and other things that we’ll cover in more depth as we go.

With a plan in place it was time to pitch our idea for the incubator to the powers that be. Aspen Mesh is the result of that pitch and the end of one journey, and the first step on a new one…

Parts two and three of this series will focus on why we decided to use Istio for our service mesh core and what you can expect to see over the coming months as we build the most fully supported enterprise service mesh on the market.