The Road Ahead for Service Mesh

This is the third in a blog series covering how we got to a service mesh, why we decided on the type of mesh we did, and where we see the future of the space.

If you’re struggling to manage microservices as architectures continue to become more complex, there’s a good chance you’ve at least heard of service mesh. For the purposes of this blog, I’ll assume you’re familiar with the basic tenets of a service mesh.

We believe that service mesh is advancing microservice communication to a new level that is unachievable with the one-off solutions that were previously being used. Things like DNS provide some capabilities like service discovery, but don’t provide fast retries, load balancing, tracing and health monitoring. The old approach also requires that you cobble together several things each time when it’s possible to bundle it all together in a reusable tool.

While it’s possible to accomplish much of what a service mesh manages with individual tools and processes, it’s manual and time consuming. A mesh bundles those capabilities into one reusable layer and greatly simplifies the management of microservices.

Right Around the Corner

So what’s in the immediate future? I think we’ll see the technology quickly mature and add more capabilities as standard features, as enterprises realize the efficiency gains a mesh creates and move to adopt one as the standard way to manage microservice architectures. Offerings like Istio are not yet ready for production deployments, but the roadmap is progressing quickly and it seems we’ll reach v1 in short order. Service mesh already provides security features, but for most enterprises security is a major consideration, and I expect policy enforcement and monitoring options to become robust enough for enterprise production deployments. A feature I see on the near horizon, one that will provide tremendous value, is an analytics platform that surfaces insights from the huge amount of telemetry data flowing through a service mesh. The emerging value proposition is that the mesh lets you capture, and act on, data that helps you manage your entire architecture more efficiently.

Further Down the Road

There is a lot of discussion on what’s on the immediate horizon for service mesh, but what is more interesting is considering what the long term will bring. My guess is that we ultimately come to a mesh being an embedded value add in a platform. Microservices are clearly the way of the future, so organizations are going to demand an effortless way to manage them. They’ll want something automated, running in the background that never has to be thought about. This is probably years down the road, but I do believe service mesh will eventually be a ubiquitous technology that is a fully managed plug and play config. It will be interesting to see new ways of using the technology to manage infrastructure, services and applications.

We’re excited to be part of the journey, and are inspired by the ideas in the Istio community and how users are leveraging service mesh to solve direct problems created by the explosion of microservices and also find new efficiencies with it. Our goal is to make the implementation of a mesh seamless with your existing technology and provide enhanced features, knowledge and support to take the burden out of managing microservices. We’re looking forward to the road ahead and would love to work with you to make your microservices journey easier.


Service Mesh Architectures

If you are building your software and teams around microservices, you’re looking for ways to iterate faster and scale flexibly. A service mesh can help you do that while maintaining (or enhancing) visibility and control. In this blog, I’ll talk about what’s actually in a Service Mesh and what considerations you might want to make when choosing and deploying one.

So, what is a service mesh? How is it different from what’s already in your stack? A service mesh is a communication layer that rides on top of request/response, unlocking patterns essential for healthy microservices. A few of my favorites:

  • Zero-trust security that doesn’t assume a trusted perimeter
  • Tracing that shows you how and why every microservice talked to another microservice
  • Fault injection and tolerance that lets you experimentally verify the resilience of your application
  • Advanced routing that lets you do things like A/B testing, rapid versioning and deployment and request shadowing

Why a new term?

Looking at that list, you may think “I can do all of that without a Service Mesh”, and you’re correct. The same logic applies to sliding window protocols or request framing. But once there’s an emerging standard that does what you want, it’s more efficient to rely on that layer instead of implementing it yourself. Service Mesh is that emerging layer for microservices patterns.

Service mesh is still nascent enough that codified standards have yet to emerge, but there is enough experience that some best practices are beginning to become clear. As the bleeding-edge leaders develop their own approaches, it is often useful to compare notes and distill best practices. We’ve seen Kubernetes emerge as the standard way to run containers for production web applications. My favorite standards are emergent rather than forced: It’s definitely a fine art to be neither too early nor too late to agree on common APIs, protocols and concepts.

Think about the history of computer networking. After the innovation of best-effort packet-switched networks, we found out that many of us were creating virtual circuits over them - using handshaking, retransmission and internetworking to turn a pile of packets into an ordered stream of bytes. For the sake of interoperability and simplicity, a “best practice” stream-over-packets emerged: TCP (the Introduction of RFC675 does a good job of explaining what it layers on top of). There are alternatives - I’ve used the Licklider Transmission Protocol in space networks where distributed congestion control is neither necessary nor efficient. Your browser might already be using QUIC. Standardizing on TCP, however, freed a generation of programmers from fiddling with implementations of sliding windows, retries, and congestion collapse (well, except for those packetheads that implemented it).

Next, we found a lot of request/response protocols running on top of TCP. Many of these eventually migrated to HTTP (or sequels like HTTP/2 or gRPC). If you can factor your communication into “method, metadata, body”, you should be looking at an HTTP-like protocol to manage framing, separate metadata from body, and address head-of-line blocking. This extends beyond just browser apps - databases like Mongo provide HTTP interfaces because the ubiquity of HTTP unlocks a huge amount of tooling and developer knowledge.

You can think about service mesh as being the lexicon, API and implementation around the next tier of communication patterns for microservices.

OK, so where does that layer live? You have a couple of choices:

  • In a Library that your microservices applications import and use.
  • In a Node Agent or daemon that services all of the containers on a particular node/machine.
  • In a Sidecar container that runs alongside your application container.

Library

The library approach is the original. It is simple and straightforward. In this case, each microservice application includes library code that implements service mesh features. Libraries like Hystrix and Ribbon would be examples of this approach.

This works well for apps that are exclusively written in one language by the teams that run them (so that it’s easy to insert the libraries). The library approach also doesn’t require much cooperation from the underlying infrastructure - the container runner (like Kubernetes) doesn’t need to be aware that you’re running a Hystrix-enhanced app.

There is some work on multilanguage libraries (reimplementations of the same concepts). The challenge here is the complexity and effort involved in replicating the same behavior over and over again.

We see very limited adoption of the library model in our user base because most of our users are running applications written in many different languages (polyglot), and are also running at least a few applications that aren’t written by them so injecting libraries isn’t feasible.

This model has an advantage in work accounting: the code performing work on behalf of the microservice is actually running in that microservice. The trust boundary is also small - you only have to trust calling a library in your own process, not necessarily a remote service somewhere out over the network. That code only has as many privileges as the one microservice it is performing work on behalf of. That work is also performed in the context of the microservice, so it’s easy to fairly allocate resources like CPU time or memory for that work - the OS probably does it for you.

Node Agent

The node agent model is the next alternative. In this architecture, there’s a separate agent (often a userspace process) running on every node, servicing a heterogeneous mix of workloads. For purposes of our comparison, it’s the opposite of the library model: it doesn’t care about the language of your application but it serves many different microservice tenants.

Linkerd’s recommended deployment in Kubernetes works like this. As do F5’s Application Service Proxy (ASP) and the Kubernetes default kube-proxy.

Since you need one node agent on every node, this deployment requires some cooperation from the infrastructure - this model doesn’t work without a bit of coordination. By analogy, most applications can’t just choose their own TCP stack, guess an ephemeral port number, and send or receive TCP packets directly - they delegate that to the infrastructure (operating system).

Instead of good work accounting, this model emphasizes work resource sharing - if a node agent allocates some memory to buffer data for my microservice, it might turn around and use that buffer for data for your service in a few seconds. This can be very efficient, but there’s an avenue for abuse. If my microservice asks for all the buffer space, the node agent needs to make sure it gives your microservice a shot at buffer space first. You need a bit more code to manage this for each shared resource.

Another work resource that benefits from sharing is configuration information. It’s cheaper to distribute one copy of the configuration to each node, than to distribute one copy of the configuration to each pod on each node.

A lot of the functionality that containerized microservices rely on is provided by a node agent or something topologically equivalent. Think about kubelet initializing your pod, your favorite CNI daemon like flanneld, or, stretching your brain a bit, even the operating system kernel itself as following this node agent model.

Sidecar

Sidecar is the new kid on the block. This is the model used by Istio with Envoy. Conduit also uses a sidecar approach. In Sidecar deployments, you have one adjacent container deployed for every application container. For a service mesh, the sidecar handles all the network traffic in and out of the application container.

This approach is in between the library and node agent approaches for many of the tradeoffs I discussed so far. For instance, you can deploy a sidecar service mesh without having to run a new agent on every node (so you don’t need infrastructure-wide cooperation to deploy that shared agent), but you’ll be running multiple copies of an identical sidecar. Another take on this: I can install one service mesh for a group of microservices, and you could install a different one, and (with some implementation-specific caveats) we don’t have to coordinate. This is powerful in the early days of service mesh, where you and I might share the same Kubernetes cluster but have different goals, require different feature sets, or have different tolerances for bleeding-edge vs. tried-and-true.

Sidecar is advantageous for work accounting, especially in some security-related aspects. Here’s an example: suppose I’m using a service mesh to provide zero-trust style security. I want the service mesh to verify both ends (client and server) of a connection cryptographically. Let’s first consider using a node agent: When my pod wants to be the client of another server pod, the node agent is going to authenticate on behalf of my pod. The node agent is also serving other pods, so it must be careful that another pod cannot trick it into authenticating on my pod’s behalf. If we think about the sidecar case, my pod’s sidecar does not serve other pods. We can follow the principle of least privilege and give it the bare minimum it needs for the one pod it is serving in terms of authentication keys, memory and network capabilities.

So, from the outside the sidecar has the same privileges as the app it is attached to. On the other hand, the sidecar needs to intervene between the app and the outside. This creates some security tension: you want the sidecar to have as little privilege as possible, but you need to give it enough privilege to control traffic to/from the app. For example, in Istio, the init container responsible for setting up the sidecar currently has the NET_ADMIN capability (to set up the necessary iptables rules). That initialization uses good security practices - it does the minimum amount necessary and then goes away, but everything with NET_ADMIN represents attack surface. (Good news - smart people are working on enhancing this further.)
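For a concrete sense of what that NET_ADMIN capability is used for: the init container installs NAT rules inside the pod’s network namespace that transparently redirect traffic through the sidecar. A simplified sketch - the real istio-iptables script handles many more cases (excluding the proxy’s own UID, loopback, configurable port lists), and port 15001 is Envoy’s default listener at the time of writing:

```shell
# Simplified sketch of the redirect istio-init sets up in the pod's netns.
# Send all outbound TCP from the pod through Envoy's listener on 15001.
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
# Redirect inbound TCP the same way, so the sidecar sees both directions.
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15001
```

Because these rules live in the pod’s own network namespace, they affect only that pod - which is exactly the least-privilege property the sidecar model is after.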

Once the sidecar is attached to the app, it’s very proximate from a security perspective. Not as close as a function call in your process (like library) but usually closer than calling out to a multi-tenant node agent. When using Istio in Kubernetes your app container talks to the sidecar over a loopback interface inside of the network namespace shared with your pod - so other pods and node agents generally can’t see that communication.

Most Kubernetes clusters have more than one pod per node (and therefore more than one sidecar per node). If each sidecar needs to know “the entire config” (whatever that means for your context), then you’ll need more bandwidth to distribute that config (and more memory to store copies of it). So it can be powerful to limit the scope of configuration that you have to give to each sidecar - but again there’s an opposing tension: something (in Istio’s case, Pilot) has to spend more effort computing that reduced configuration for each sidecar.

Other things that happen to be replicated across sidecars accrue a similar bill. Good news - the container runtimes will reuse things like container disk images when they’re identical and you’re using the right drivers, so the disk penalty is not especially significant in many cases, and memory like code pages can also often be shared. But each sidecar’s process-specific memory will be unique to that sidecar so it’s important to keep this under control and avoid making your sidecar “heavy weight” by doing a bunch of replicated work in each sidecar.

Service Meshes relying on sidecar provide a good balance between a full set of features, and a lightweight footprint.

Will the node agent or sidecar model prevail?

I think you’re likely to see some of both. Now seems like a perfect time for sidecar service mesh: nascent technology, fast iteration and gradual adoption. As service mesh matures and the rate-of-change decreases, we’ll see more applications of the node agent model.

Advantages of the node agent model are particularly important as service mesh implementations mature and clusters get big:

  • Less overhead (especially memory) for things that could be shared across a node
  • Easier to scale distribution of configuration information
  • A well-built node agent can efficiently shift resources from serving one application to another

Sidecar is a novel way of providing services (like a high-level communication proxy a la Service Mesh) to applications. It is especially well-adapted for containers and Kubernetes. Some of its greatest advantages include:

  • Can be gradually added to an existing cluster without central coordination
  • Work performed for an app is accounted to that app
  • App-to-sidecar communication is easier to secure than app-to-agent

What’s next?

As Shawn talked about in his post, we’ve been thinking about how microservices change the requirements from network infrastructure for a few years now. The swell of support and uptake for Istio demonstrated to us that there’s a community ready to develop and coalesce on policy specs, with a well-architected implementation to go along with it.

Istio is advancing state-of-the-art microservices communication, and we’re excited to help make that technology easy to operate, reliable, and well-suited for your team’s workflow in private cloud, public cloud or hybrid.


Using AWS Services from Istio Service Mesh with Go

This is a quick blog on how we use AWS services from inside of an Istio Service Mesh. Why does it matter that you’re inside the mesh? Because the service mesh wants to manage all the traffic in/out of your application. This means it needs to be able to inspect the traffic and parse it if it is HTTP. Nothing too fancy here, just writing it down in case it can save you a few keystrokes.

Our example is for programs written in Go.

Step 1: Define an Egress Rule

You need an egress rule to allow the application to talk to the AWS service at all. Here’s an example egress rule to allow DynamoDB:
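A sketch of what that looks like, using the EgressRule resource from the Istio 0.x line this post was written against (newer Istio versions replace EgressRule with ServiceEntry; the name and namespace here are placeholders, and you should substitute your own region):

```yaml
apiVersion: config.istio.io/v1alpha2
kind: EgressRule
metadata:
  name: dynamodb-egress
  namespace: default
spec:
  destination:
    service: dynamodb.us-west-2.amazonaws.com
  ports:
    - port: 443
      protocol: https
```

With `protocol: https` on port 443, the sidecar will originate TLS to DynamoDB, which is what lets us hand it plain HTTP in the next step.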

Step 2: Delegate Encryption to the Sidecar

This part is the trick we were missing. If you want to get maximum service mesh benefits, you need to pass unencrypted traffic to the sidecar. The sidecar will inspect it, apply policy and encrypt it before egressing to the AWS service (in our case Dynamo).

Don’t worry, your traffic is not going out on any real wires unencrypted. Only the loopback wire from your app container to the sidecar. In Kubernetes, this is its own network namespace so even other containers on the same system cannot see it unencrypted.

package awswrapper

import (
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/endpoints"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/golang/glog"

	"github.com/you/repo/pkg/tracing"
)

type Config struct {
	InMesh   bool
	Endpoint string // e.g. http://dynamodb.us-west-2.amazonaws.com
	Label    string // Used in logging messages to identify this session
}

// istioEgressEPResolver resolves the standard AWS endpoint for a service,
// then pins the port to 443 so the sidecar's egress rule is matched.
func istioEgressEPResolver(service, region string, optFns ...func(*endpoints.Options)) (endpoints.ResolvedEndpoint, error) {
	ep, err := endpoints.DefaultResolver().EndpointFor(service, region, optFns...)
	if err != nil {
		return ep, err
	}
	ep.URL = ep.URL + ":443"
	return ep, nil
}

func AwsConfig(cfg Config) *aws.Config {
	config := aws.NewConfig().
		WithEndpoint(cfg.Endpoint)
	if cfg.InMesh {
		glog.Infof("Using http for AWS for %s", cfg.Label)
		config = config.WithDisableSSL(true).
			WithEndpointResolver(endpoints.ResolverFunc(istioEgressEPResolver))
	}
	return config
}

func AwsSession(label string, cfg *aws.Config) (*session.Session, error) {
	sess, err := session.NewSession(cfg)
	if err != nil {
		return nil, err
	}
	// This has to be the first handler, before core.SendHandler, which
	// performs the operation of sending the request over the wire.
	// Send handlers are invoked after signing of the request is complete,
	// which means the tracing headers are not signed. Signing the tracing
	// headers would cause request failures, because Istio changes those
	// headers and signature validation would then fail.
	sess.Handlers.Send.PushFront(addTracingHeaders)
	sess.Handlers.Send.PushBack(func(r *request.Request) {
		glog.V(6).Infof("%s: %s %s://%s%s",
			label,
			r.HTTPRequest.Method,
			r.HTTPRequest.URL.Scheme,
			r.HTTPRequest.URL.Host,
			r.HTTPRequest.URL.Path,
		)
	})
	// This handler is added after core.SendHandler so that the tracing
	// headers can be removed after the send. This matters for retries: the
	// request is signed again, and if it still carried tracing headers the
	// retry's signature validation would fail once Istio updates them.
	sess.Handlers.Send.PushBack(removeTracingHeaders)
	return sess, nil
}

AwsConfig() is the core - you need to make a new aws.Session with these options.

The first option, WithDisableSSL(true), tells the AWS libraries to not use HTTPS and instead just speak plain HTTP. This is very bad if you are not in the mesh. But, since we are in the mesh, we’re only going to speak plain HTTP over to the sidecar, which will convert HTTP into HTTPS and send it out over the wire. In Kubernetes, the sidecar is in an isolated network namespace with your app pod, so there’s no chance for other pods or processes to snoop this plaintext traffic.

When you set the first option, the library will try to talk to http://dynamodb.<region>.amazonaws.com on port 80 (hey, you asked it to disable SSL). But that’s not what we want - we want to act like we’re talking to port 443 so that the right egress rule gets invoked and the sidecar encrypts traffic. That’s what istioEgressEPResolver is for.

We do it this way for a little bit of belts-and-suspenders safety - we really want to avoid ever accidentally speaking HTTP to dynamo. Here are the various failure scenarios:

  • Our service is in Istio, and the user properly configured InMesh=true: everything works and is HTTPS via the sidecar.
  • Our service is not in Istio, and the user properly configured InMesh=false: everything works and is HTTPS via the AWS go library.
  • Our service is not in Istio, but oops! the user set InMesh=true: the initial request goes out to dynamo on port 443 as plain HTTP. Dynamo rejects it, so we know it’s broken before sending a bunch of data via plain HTTP.
  • Our service is in Istio, but oops! the user set InMesh=false: the sidecar rejects the traffic as it is already-encrypted HTTPS that it can’t make any sense of.

OK, now you’ve got an aws.Session instance ready to go. Pass it to your favorite AWS service interface and go:

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"

	"github.com/you/repo/pkg/awswrapper"
)

type Dynamo struct {
	Session *session.Session
	Db      *dynamodb.DynamoDB
}

func NewWithConfig(cfg *aws.Config) (*Dynamo, error) {
	sess, err := awswrapper.AwsSession("Test", cfg)
	if err != nil {
		return nil, err
	}
	dyn := &Dynamo{
		Session: sess,
		Db:      dynamodb.New(sess),
	}
	return dyn, nil
}

p.s. What’s up with addTracingHeaders() and removeTracingHeaders()? Check out Neeraj’s post. While you’re at it, you can add just a few more lines and get great end-to-end distributed tracing.


The Path to Service Mesh

When we talk to people about service mesh, there are a few questions we’re always asked. These questions range from straightforward questions about the history of our project, to deep technical questions on why we made certain decisions for our product and architecture.

To answer those questions we’ll bring you a three-part blog series on our Aspen Mesh journey and why we chose to build on top of Istio.

To begin, I’ll focus on one of the questions I’m most commonly asked.

Why did you decide to focus on service mesh and what was the path that led you there?

LineRate Systems: High-Performance Software Only Load Balancing

The journey starts with a small Boulder startup called LineRate Systems and the acquisition of that company by F5 Networks in 2013. Besides having one of the smartest and most talented engineering teams I have ever had the privilege of being part of, LineRate built a lightweight, high-performance, software-only L7 proxy. When I say high performance, I am talking about turning a server you already had in your datacenter 5 years ago into a fully featured proxy pushing 20+ Gbps and 200,000+ HTTP requests per second.

While the performance was eye-catching and certainly opened doors for our customers, our hypothesis was that customers wanted to pay for capacity, not hardware. That insight would turn out to be LineRate’s key value proposition. This simple concept would allow customers the ability to change the way that they consumed and deployed load balancers in front of their applications.

To fulfill that need we delivered a product and business model that allowed our customers to replicate the software as many times as needed across COTS hardware, allowing them to get peak performance regardless of how many instances they used. If a customer needed more capacity they simply upgraded their subscription tier and deployed more copies of the product until they reached the bandwidth, request rate or transaction rates the license allowed.

This was attractive, and we had some success there, but soon we had a new insight…

Efficiency Over Performance

It became apparent to us that application architectures were changing and the value curve for our customers was changing along with them. We noticed in conversations with leading-edge teams that they were talking about concepts like efficiency, agility, velocity, footprint and horizontal scale. We also started to hear from innovators in the space about this new technology called Docker, and how it was going to change the way that applications and services were delivered.

The more we talked to these teams and thought about how we were developing our own internal applications the more we realized that a shift was happening. Teams were fundamentally changing how they were delivering their applications, and the result was our customers were beginning to care less about raw performance and more about distributed proxies. There were many benefits to this shift including reducing the failure domains of applications, increased flexibility in deployments and the ability for applications to store their proxy and network configuration as code alongside their application.

At the same time containers and container orchestration systems were just starting to come on the scene, so we went to work on delivering our LineRate product in a container with a new control plane and thinking deeply about how people would be delivering applications using these new technologies in the future.

These early conversations in 2015 drove us to think about what application delivery would look like in the future…

That Idea that Just Won’t Go Away

As we thought more about the future of application delivery, we began to focus on the concept of policy and network services in a cloud-native distributed application world. Even though we had many different priorities and projects to work on, the idea of a changing application landscape, cloud-native applications and DevOps based delivery models remained in the forefront of our minds.

There just has to be a market for something new in this space.

We came up with multiple projects that for various reasons never came to fruition. We lovingly referred to them as v1.0, v1.5, and v2.0. Each of these projects had unique approaches to solving challenges in distributed application architectures (microservices).

So we thought as big as we could. A next-gen ADC architecture: a control plane that’s totally API-driven and separate from the data plane. The data plane comes in any form you can think of: purpose-built hardware, software-on-COTS, or cloud-native components that live right near a microservice (like a service mesh). This infinitely scalable architecture smooths out all tradeoffs and works perfectly for any organization of any size doing any kind of work. Pretty ambitious, huh? We had fallen into the trap of being all things to all users.

Next, we refined our approach in “1.5”, and we decided to define a policy language… The key was defining that open-source policy interface and connecting that seamlessly to the datapath pieces that get the work done. In a truly open platform, some of those datapath pieces are open source too. There were a lot of moving parts that didn’t all fall into place at once; and in hindsight we should have seen some of them coming … The market wasn’t there yet, we didn’t have expertise in open source, and we had trouble describing what we were doing and why.

But the idea just kept burning in the back of our minds, and we didn’t give up…

For Version 2.0, we devised a plan that could help F5’s users who were getting started on their container journey. The technology was new and the market was just starting to mature, but we decided that customers would take three steps on their microservice journey:

  1. Experimenting - Testing applications in containers on a laptop, server or cloud instance.
  2. Production Planning - Identifying what technology is needed to start to enable developers to deploy container-based applications in production.
  3. Operating at Scale - Focus on increasing the observability, operability and security of container applications to reduce the mean-time-to-discovery (MTTD) and mean-time-to-resolution (MTTR) of outages.

We decided there was nothing we could do for experimenting customers, but for production planning, we could create an open source connector for container orchestration environments and BIG-IP. We called this the BIG-IP Container Connector, and we were able to solve existing F5 customers’ problems and start talking to them about the next step in their journey. The container connector team continues to this day to bridge the gap between ADC-as-you-know-it and fast-changing container orchestration environments.

We also started to work on a new lightweight containerized proxy called the Application Services Proxy, or ASP. Like Linkerd and Envoy, it was designed to help microservices talk to each other efficiently, flexibly and observably. Unlike Linkerd and Envoy, it didn’t have any open source community associated with it. We thought about our open source strategy and what it meant for the ASP.

At the same time, a change was taking place within F5…

Aspen Mesh - An F5 Innovation

As we worked on our go to market plans for ASP, F5 changed how it invests in new technologies and nascent markets through incubation programs. These two events, combined with the explosive growth in the container space, led us to the decision to commit to building a product on top of an existing open source service mesh. We picked Istio because of its attractive declarative policy language, scalable control-plane architecture and other things that we’ll cover in more depth as we go.

With a plan in place it was time to pitch our idea for the incubator to the powers that be. Aspen Mesh is the result of that pitch and the end of one journey, and the first step on a new one…

Parts two and three of this series will focus on why we decided to use Istio for our service mesh core and what you can expect to see over the coming months as we build the most fully supported enterprise service mesh on the market.


Top 3 Reasons to Manage Microservices with Service Mesh


Building microservices is easy; operating a microservice architecture is hard. Many companies are successfully using tools like Kubernetes for deploys, but they still face runtime challenges. This is where the service mesh comes in. It greatly simplifies the managing of containerized applications and makes it easier to monitor and secure microservice-based applications. So what are the top 3 reasons to use a supported service mesh? Here’s my take.

Security

Since a service mesh operates in the data plane, it’s possible to apply a common security policy across the mesh, which provides much greater security than stitching controls across the multiple layers of an environment like Kubernetes. A service mesh secures inter-service communications so you know what a service is talking to and whether that communication can be trusted.

Observability

Most failures in the microservices space occur during the interactions between services, so a view into those transactions helps teams better manage architectures to avoid failures. A service mesh provides a view into what is happening when your services interact with each other. The mesh also greatly improves tracing capabilities and provides the ability to add tracing without touching all of your applications.

Simplicity

A service mesh is not a new technology; rather, it’s a bundling together of several existing technologies in a package that makes managing the infrastructure layer much simpler. There are existing solutions that cover some of what a mesh does - take DNS, for example. It’s a good way to do service discovery when you don’t care about the source trying to discover the service. If all you need from service discovery is to find the service and connect to it, DNS is sufficient, but it doesn’t give you fast retries or health monitoring. When you want to ask more advanced questions, you need a service mesh. You can cobble things together to address much of what a service mesh addresses, but why would you, when you could just use a service mesh that provides a one-time, reusable packaging?

There are certainly many more advantages to managing microservices with a service mesh, but I think the above 3 are major selling points where organizations that are looking to scale their microservice architecture would find the greatest benefit. No doubt there will also be expanded capabilities in the future such as analytics dashboards that provide easy to consume insights from the huge amount of data in a service mesh. I’d love to hear other ideas you might have on top reasons to use service mesh, hit me up @zjory.