Service Mesh For App Owners

How Service Mesh Can Benefit Your Applications

You’ve heard the buzz about service mesh, and if you're like most App Owners, that means you have a lot of questions. Is it worthwhile for your company to adopt? What business outcomes does a service mesh provide? Can it help you better manage your microservices? What measurements of success should you think about when you’re considering or using a service mesh?

To start with, here are five key considerations for evaluating service mesh:

  1. Consider how a service mesh supports your organization's strategic vision and objectives
  2. Have someone in your organization take inventory of your technical requirements and your current systems
  3. Identify resources needed (internal or external) for implementation – all the way through to running your service mesh
  4. Consider how timing, cost and expertise will impact the success of your service mesh implementation
  5. Design a plan to implement, run, measure and improve over time

Business Outcomes From a Service Mesh

As an App Owner, you’re ultimately on the hook for business outcomes at your company. When you're considering adding new tech to your stack, consider your strategies first. What do you plan to accomplish, and how do you intend to make those accomplishments become a reality? 

Whatever your answers may be, if you're using microservices, a service mesh is worth investigating. It has the potential to help you get from where you are to where you want to be -- more securely, and faster.

But apart from just reaching your goals faster and more securely, a service mesh can offer a lot of additional benefits. Here are a few:

  • Decreasing risk
  • Optimizing cost
  • Driving better application behavior
  • Progressive delivery 
  • Gaining a competitive advantage

Decreasing Risk

Risk analysis. Security. Compliance. These topics are priority one if you want to stay out of the news. A service mesh can help provide your company with better -- and provable -- security and compliance.

Security & Compliance

Everyone’s asking a good question: What does it take to achieve security in cloud native environments?

We know that there are a lot of benefits in cloud-native architectures: greater scalability, resiliency and separation of concerns. But new patterns also bring new challenges like ephemerality and new security threats.

With an enterprise service mesh, you get access to observability into security status, end-to-end encryption, compliance features and more. Here are a few security features you can expect from a service mesh:

  • mTLS status at-a-glance: Easily understand the security posture of every service in your cluster
  • Incremental mTLS: Control exactly what’s encrypted in your cluster at the service or namespace level
  • Fine-grained RBAC: Enforce least privilege so that overprivileged access does not become a security concern for your organization
  • Egress control: Understand and control exactly what your services are talking to outside your clusters

Optimizing Cost

Every business needs cost optimizations. How do you choose which are going to make an impact and which aren’t? Which are most important? Which are you going to use?

As you know, one aspect to consider is talent. Your business does better when your people are working on new features and functionality rather than spending too much of their time on bug fixes. Tools like a service mesh can help boost your development team’s productivity, so they spend more time on new business value and differentiators and less on bug fixes and maintenance.

But internal resources aren’t the only thing to consider. Without end users, your company wouldn’t exist. It’s becoming increasingly important to provide a better user experience for both your stakeholders and your customers.

A service mesh helps applications that run on microservice architectures rather than monolithic ones. Microservices make it easier to build and maintain applications, and they bring greater agility, faster time to market and more uptime.

A service mesh can help you get the ideal mix of these cost savings and uptime.

Driving Better Application Behavior 

What happens when a new application needs to be exposed to the internet? You need to consider how to secure it, how to integrate it into your existing user-facing APIs, how you'll upgrade it and a host of other concerns. You're embracing microservices, so you might be doing this a lot. You want to drive better application behavior. Our advice here? Use a service mesh policy framework to do this consistently, organization-wide.

Policy is simply a term for describing the way a system responds when something happens. A service mesh can help you improve your company’s policies by allowing you to: 

  1. Provide a clean interface specification between application teams who make new functionality and the platform operators who make it impactful to your users
  2. Make disparate microservices act as a resilient system through controlling how services communicate with each other and external systems and managing it through a single control plane
  3. Allow engineers to easily implement policies that can be mapped to application behavior outcomes, making it easy to ensure great end user experiences

An enterprise service mesh like Aspen Mesh enables each subject-matter expert in your organization to specify policies that enable you to get the intended behavior out of your applications and easily understand what that behavior will be. You can specify, from a business objective level, how you want your application to respond when something happens and use your service mesh to implement that.

Progressive Delivery

Continuous delivery has been a driving force behind software development, testing and deployment for years, and CI/CD best-practices are evolving with the advent of new technologies like Kubernetes and Istio. Progressive delivery, a term coined by James Governor, is a new approach to continuous delivery that includes “a new basket of skills and technologies… such as canarying, feature flags, [and] A/B testing at scale”.  

Progressive delivery decouples the line of business (LOB) from IT by allowing the business to say when it’s acceptable for new code to hit the customer. This means the business can put guardrails around the customer experience by decoupling dev cycles from service activation.

With progressive delivery:

  • Deployment is not the same as release
  • Service activation is not the same as deployment
  • The developer can deploy a service, you can ship the service, but that doesn't mean you're activating it for all users

Progressive delivery provides a better developer experience and also allows you to limit the blast radius of new deployments with feature flags, canary deploys and traffic mirroring. 
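As a rough illustration of the difference between deploying code and activating it, the sketch below gates an already-deployed code path behind a percentage rollout. The flag name, the helper functions and the hashing scheme are hypothetical choices made for this example; they are not taken from any particular service mesh or feature-flag product.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_active(flag: str, user_id: str, percent: int) -> bool:
    """Return True if the deployed feature is activated for this user."""
    return rollout_bucket(flag, user_id) < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    # The code is deployed for everyone, but only activated for about 10%.
    active = sum(is_active("new-checkout", u, 10) for u in users)
    print(f"{active} of {len(users)} users see the new code path")
```

Because the bucket is derived from a hash rather than a random draw, a given user stays in or out of the rollout as the percentage grows, which keeps the blast radius predictable.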

Gaining A Competitive Advantage

To stay ahead of your competition, you need an edge. Companies of many sizes, across industries, benefit from microservices or a service mesh. Enterprise companies evaluating or using a service mesh come in lots of different flavors -- those who are just starting, going through or have completed a digital transformation, companies shifting from monoliths to microservices, and even organizations already using microservices who are working to identify areas for improvement.

Service Mesh Success Measurements

How do you plan to measure success with your service mesh? Since service mesh is new and evolving, it can be difficult to know what to look for in order to get a real pulse on how well it’s working for your company.

Start by asking some questions like these:

  1. Saving Resources: Is your team more efficient with a service mesh? How much more time are they able to spend on feature and function development rather than bug fixes and maintenance?
  2. Your Users' Experience: Do you have a complete picture of your customers' experience and know the most valuable places to improve? How much more successful are deployments to production?
  3. Increasing Efficiency: How much time do you spend figuring out which microservice is causing an issue? Does your service mesh save you time here?

These are just a few ways to think about how your service mesh is working for you, as well as a built-in way to identify areas to improve over time. As with any really useful application, it's not just a one-and-done implementation. You'll have greater success by integrating measurement, iteration and improvement into your digital transformation and service mesh strategies.

Interested in learning more about service mesh? Check out the eBook Getting the Most Out of Your Service Mesh.


What a Service Mesh Provides

If you’re like most people with a finger in the tech-world pie, you’ve heard of a service mesh. And you know what a service mesh is. And now you’re wondering what it can solve for you.

A service mesh is an infrastructure layer for microservices applications that can help reduce the complexity of managing microservices and deployments by handling infrastructure service communication quickly, securely and reliably. Service meshes are great at solving operational challenges and issues when running containers and microservices because they provide a uniform way to secure, connect and monitor microservices. 

A good service mesh keeps your company’s services running the way they should, giving you and your team access to the powerful tools that you need, plus access to engineering and support, so you can focus on adding the most value to your business.

Want to learn more about this? Check out the free Complete Guide to Service Mesh.

Next, let’s dive into three key areas where a service mesh can really help: observability, security and operational control.

Observability

Are you interested in taking your system monitoring a step further? A service mesh provides monitoring plus observability. While monitoring reports overall system health, observability focuses on highly granular insights into the behavior of systems along with rich context.

Deep System Insights

Kubernetes seemed like the way to rapid iteration and quick development sprints, but the promise and the reality of managing containerized applications at scale are two very different things.

Docker and Kubernetes enable you to more easily build and deploy apps. But it’s often difficult to understand how those apps are behaving once deployed. A service mesh provides tracing and telemetry that make it easier to understand your system and quickly find the root cause of any problems.

An Intuitive UI

A service mesh is uniquely positioned to gather a trove of important data from your services. The sidecar approach places an Envoy sidecar proxy alongside the application in every pod in your cluster, which then surfaces telemetry data up to the Istio control plane. This is great, but it also means a mesh will gather more data than is useful. The key is surfacing only the data you need to confirm the health and security status of your services. A good UI solves this problem, and it also lowers the barrier for the engineering team, making it easier for more members of the team to understand and control the services in your organization’s architecture.

Security

A service mesh provides security features aimed at securing the services inside your network and quickly identifying any compromising traffic entering your cluster. A service mesh can help you more easily manage security through mTLS, ingress and egress control, and more.

mTLS and Why it Matters

Securing microservices is hard. There are a multitude of tools that address microservices security, but service mesh is the most elegant solution for addressing encryption of on-the-wire traffic within the network.

Service mesh provides defense with mutual TLS (mTLS) encryption of the traffic between your services. The mesh can automatically encrypt and decrypt requests and responses, removing that burden from the application developer. It can also improve performance by prioritizing the reuse of existing, persistent connections, reducing the need for the computationally expensive creation of new ones. With service mesh, you can secure traffic over the wire and also make strong identity-based authentication and authorizations for each microservice.
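To give a sense of the burden the mesh removes, here is a minimal sketch of what an application would otherwise set up itself to require client certificates on the server side. It uses Python's standard ssl module; the function name and the optional file-path parameters are assumptions made for illustration, and a real mesh sidecar handles certificate issuance and rotation in ways this sketch does not attempt.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that also demands a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # "Mutual" TLS: the server verifies the client's certificate as well.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA bundle used to verify the certificates that clients present.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # The server's own identity, presented to clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Every service would need an equivalent of this, plus a way to distribute and renew the certificates, which is the work a mesh takes off the application developer.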

We see a lot of value in this for enterprise companies. With a good service mesh, you can see whether mTLS is enabled and working between each of your services and get immediate alerts if security status changes.

Ingress & Egress Control

Service mesh adds a layer of security that allows you to monitor and address compromising traffic as it enters the mesh. Istio integrates with Kubernetes as an ingress controller and takes care of load balancing for ingress. This allows you to add a level of security at the perimeter with ingress rules. Egress control allows you to see and manage external services and control how your services interact with them.

Operational Control

A service mesh allows security and platform teams to set the right macro controls to enforce access controls, while allowing developers to make customizations they need to move quickly within these guardrails.

RBAC

A strong Role Based Access Control (RBAC) system is arguably one of the most critical requirements in large engineering organizations, since even the most secure system can be easily circumvented by overprivileged users or employees. Restricting privileged users to the least privileges necessary to perform their job responsibilities, ensuring access to systems is set to “deny all” by default, and ensuring proper documentation of roles and responsibilities is in place are among the most critical security concerns in the enterprise.
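The "deny all by default" idea can be sketched in a few lines. The rule shape, role names and paths below are hypothetical and only illustrate the principle; they are not the policy model of Istio or Aspen Mesh.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AllowRule:
    role: str
    method: str
    path_prefix: str


def is_allowed(rules: Iterable[AllowRule], role: str, method: str, path: str) -> bool:
    """Deny by default: a request passes only if some rule explicitly allows it."""
    return any(
        rule.role == role
        and rule.method == method
        and path.startswith(rule.path_prefix)
        for rule in rules
    )


# Hypothetical least-privilege rules: readers can only read, admins can also delete.
RULES = [
    AllowRule("order-reader", "GET", "/orders"),
    AllowRule("order-admin", "GET", "/orders"),
    AllowRule("order-admin", "DELETE", "/orders"),
]
```

With an empty rule list nothing is permitted, so forgetting to write a rule fails closed rather than open.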

We’ve worked to solve this challenge by providing Istio Vet, which is designed to warn you of incorrect or incomplete configuration of your service mesh, and provide guidance to fix it. Istio Vet prevents misconfigurations by refusing to allow them in the first place. Global Istio configuration resources require a different solution, which is addressed by the Traffic Claim Enforcer solution.

The Importance of Policy Frameworks

As companies embrace DevOps and microservice architectures, their teams are moving more quickly and autonomously than ever before. The result is a faster time to market for applications, but more risk to the business. The responsibility of understanding and managing the company’s security and compliance needs is now shifted left to teams that may not have the expertise or desire to take on this burden.

Service mesh makes it easy to control policy and understand how policy settings will affect application behavior. In addition, analytics insights help you get the most out of policy through monitoring, vetting and policy violation analytics so you can quickly understand the best actions to take.

Policy frameworks allow you to securely and efficiently deploy microservices applications while limiting risk and unlocking DevOps productivity. Key to this innovation is the ability to synthesize business-level goals, regulatory or legal requirements, operational metrics, and team-level rules into high performance service mesh policy that sits adjacent to every application.

A good service mesh keeps your company’s services running the way they should, giving you observability, security and operational control plus access to engineering and support, so you are free to focus on adding more value to your business.


Protocol Sniffing Service Mesh

Protocol Sniffing in Production

Istio 1.3 introduced a new capability to automatically sniff the protocol used when two containers communicate. This makes it much easier to get started with Istio, but it has some tradeoffs.  Aspen Mesh recommends that production deployments of Aspen Mesh (built on Istio) do not use protocol sniffing, and Aspen Mesh 1.3.3-am2 turns off protocol sniffing by default. This blog explains the tradeoffs and why we think turning off protocol sniffing is the better choice.

What Protocol Sniffing Is

Protocol sniffing predates Istio. For our purposes, we're going to define it as examining some communication stream and classifying it as implementing one protocol (like HTTP) or another (like SSH), without additional information. For example, here are two streams from client to server. If you've ever debugged these protocols, you won't have a hard time telling them apart:

[Figure: two example streams from client to server]
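To make the definition concrete, here is a minimal sketch of a sniffer that looks only at the first bytes of a stream. The regular expressions and the sample byte strings are simplifications written for this example; Envoy's actual inspection logic is more involved.

```python
import re

# An HTTP/1.x request starts with "<METHOD> <target> HTTP/<major>.<minor>".
HTTP_REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/\d\.\d\r?\n")
# An SSH peer opens with an identification string such as "SSH-2.0-...".
SSH_BANNER = re.compile(rb"^SSH-\d+\.\d+-")


def sniff(first_bytes: bytes) -> str:
    """Classify a stream from its opening bytes alone."""
    if SSH_BANNER.match(first_bytes):
        return "ssh"
    if HTTP_REQUEST_LINE.match(first_bytes):
        return "http"
    return "unknown"


if __name__ == "__main__":
    print(sniff(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
    print(sniff(b"SSH-2.0-OpenSSH_8.9\r\n"))
    print(sniff(b"\x16\x03\x01\x02\x00"))
```

The third sample is neither protocol as far as this sketch is concerned, so it falls through to "unknown".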

In an active service mesh, the Envoy sidecars will be handling thousands of these streams a second.  The sidecar is a proxy, so it reads every byte in the stream from one side, examines it, applies policy to it and then sends it on.  In order to apply proper policy ("Send all PUTs to /create_* to create-handler.foo.svc.cluster.local"), Envoy needs to understand the bytes it is reading.  Without protocol sniffing, that's done by configuring Envoy:

  • Layer 7 (HTTP): "All streams with a destination port of 8000 are going to follow the HTTP protocol"
  • Layer 4 (SSH): "All streams with a destination port of 22 are going to follow the SSH protocol"

When Envoy sees a stream with destination port 8000, it reads each byte and runs its own HTTP protocol implementation to understand those bytes and then apply policy.  Port 22 has SSH traffic; Envoy doesn't have an SSH protocol implementation so Envoy treats it as opaque TCP traffic. In proxies this is often called "Layer 4 mode" or "TCP mode"; this is when the proxy doesn't understand the higher-level protocol inside, so it can only apply a simpler subset of policy or collect a subset of telemetry.

For instance, Envoy can tell you how many bytes went over the SSH stream, but it can't tell you anything about whether those bytes indicated a successful SSH session or not.  But since Envoy can understand HTTP, it can say "90% of HTTP requests are successful and get a 200 OK response".
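The difference in available telemetry can be sketched as follows. The port table mirrors the two examples above; the function name and record layout are hypothetical and far simpler than what a real proxy collects.

```python
# Hypothetical static configuration: destination port -> how to treat the stream.
PORT_PROTOCOLS = {8000: "http", 22: "tcp"}


def telemetry(dest_port: int, response_bytes: bytes) -> dict:
    """Return what a proxy could report about one response on this port."""
    # Layer 4: the byte count is always available.
    record = {"bytes": len(response_bytes)}
    # Layer 7: only if the port is configured as HTTP can we read the status.
    if PORT_PROTOCOLS.get(dest_port, "tcp") == "http":
        status_line = response_bytes.split(b"\r\n", 1)[0]
        parts = status_line.split(b" ", 2)
        if len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1].isdigit():
            record["status"] = int(parts[1])
    return record
```

For port 22 the record only ever contains a byte count, which is the "subset of telemetry" that Layer 4 mode implies.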

Here's an analogy - I speak English but not Italian; however, I can read and write the Latin alphabet that covers both.  So I could copy an Italian message from one piece of paper to another without understanding what's inside. Suppose I was your proxy and you said, "Andrew, copy all mail correspondence into email for me" - I could do that whether you received letters from English-speaking friends or Italian-speaking ones.  Now suppose you say, "Copy all mail correspondence into email unless it has Game of Thrones spoilers in it."  I can detect spoilers in English correspondence because I actually understand what's being said, but not Italian, where I can only copy the letters from one page to the other.

If I were a proxy, I'd be a layer 7 English proxy that only supports Italian in layer 4 mode.

In Aspen Mesh and Istio, the protocol for a stream is configured through the port name in the Kubernetes service.  These are the options:

  • Specify a layer 7 protocol: Start the name of the service port with a layer 7 protocol that Istio and Envoy understand, for example "http-foo" or "grpc-bar".
  • Specify a layer 4 protocol: Start the name of the service port with a layer 4 protocol, for example "tcp-foo".  (You also use this if you know the layer 7 protocol but it's not one that Istio and Envoy support; for example, you might name a port "tcp-ssh".)
  • Don't specify protocol at all: Name the port without a protocol prefix, e.g. "clients".

If you don't specify a protocol at all, then Istio has to make a choice.  Before protocol sniffing was a feature, Istio chose to treat such ports in layer 4 mode.  Protocol sniffing is a new behavior that says, "try reading some of it - if it looks like a protocol you know, treat it like that protocol".
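The decision described above can be written down as a small function. This is a sketch under stated assumptions: the prefix sets contain only the protocols named in this post, and the function and return values are hypothetical rather than Istio's own code.

```python
# Only the prefixes mentioned in this post; the real list is longer.
L7_PROTOCOLS = {"http", "grpc"}
L4_PROTOCOLS = {"tcp"}


def mode_for_port(port_name: str, sniffing_enabled: bool) -> str:
    """Decide how a proxy would treat a service port based on its name."""
    prefix = port_name.split("-", 1)[0]
    if prefix in L7_PROTOCOLS:
        return f"layer7:{prefix}"
    if prefix in L4_PROTOCOLS:
        return "layer4"
    # No recognized prefix: sniff if allowed, otherwise fall back to layer 4.
    return "sniff" if sniffing_enabled else "layer4"


if __name__ == "__main__":
    for name in ("http-foo", "grpc-bar", "tcp-ssh", "clients"):
        print(name, mode_for_port(name, False), mode_for_port(name, True))
```

Only the unprefixed port changes behavior when sniffing is switched on; explicitly named ports are treated the same either way.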

An important note here is that this sniffing applies for both passive monitoring and active management.  Istio both collects metrics and applies routing and policy. This is important because if a passive system has a sniff failure, it results only in a degradation of monitoring - details for a request may be unavailable.  But if an active system has a sniff failure, it may misapply routing or policy; it could send a request to the wrong service.

Benefits of Protocol Sniffing

The biggest benefit of protocol sniffing is that you don't have to specify the protocols.  Any communication stream can be sniffed without human intervention. If it happens to be HTTP, you can get detailed metrics on it.

That removes a significant amount of configuration burden and reduces time-to-value for your first service mesh install.  Drop it in and instantly get HTTP metrics.

Protocol Sniffing Failure Modes

However, as with most things, there is a tradeoff.  In some cases, protocol sniffing can produce results that might surprise you.  This happens when the sniffer classifies a stream differently than you or some other system would.

False Positive Match

This occurs when a protocol happens to look like HTTP, but the administrator doesn't want it to be treated by the proxy as HTTP.

One way this can happen is if the apps are speaking some custom protocol where the beginning of the communication stream looks like HTTP, but it later diverges.  Once it diverges and is no longer conforming to HTTP, the proxy has already begun treating it as HTTP and now must terminate the connection. This is one of the differences between passive sniffers and active sniffers - a passive sniffer could simply "cancel" sniffing.

Behavior Change:

  • Without sniffing: Stream is considered Layer 4 and completes fine.
  • With sniffing: Stream is considered Layer 7, and then when it later diverges, the proxy closes the stream.

False Negative Match

This occurs when the client and server think they are speaking HTTP, but the sniffer decides it isn't HTTP.  In our case, that means the sniffer downgrades to Layer 4 mode. The proxy no longer applies Layer 7 policy (like Istio's HTTP Authorization) or collects Layer 7 telemetry (like request success/failure counts).

One case where this occurs is when the client and server are both technically violating a specification but in a way that they both understand.  A classic example in the HTTP space is line termination - technically, lines in HTTP must be terminated with a CRLF, the two characters 0x0d 0x0a.  But most proxies and web servers will also accept HTTP where lines are only terminated with LF (just the 0x0a character), because some ancient clients and hacked-together UNIX tools just sent LFs.
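Here is a small sketch of the line-termination difference, with one splitter that insists on CRLF and one that tolerates a bare LF. Both functions are illustrative stand-ins written for this post, not code from any real proxy or server.

```python
from typing import List


def split_lines_strict(raw: bytes) -> List[bytes]:
    """Accept only CRLF line endings; reject any bare LF."""
    lines = raw.split(b"\r\n")
    if any(b"\n" in line for line in lines):
        raise ValueError("bare LF line terminator")
    return lines


def split_lines_lenient(raw: bytes) -> List[bytes]:
    """Accept either CRLF or a bare LF as the line ending."""
    return [line.rstrip(b"\r") for line in raw.split(b"\n")]
```

A strict reader and a lenient reader agree on well-formed input and disagree only on the sloppy kind, which is exactly where a sniffer and a server can part ways.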

That example is usually harmless. A riskier case is a client that sends something the server will treat as HTTP but the sniffer will downgrade to Layer 4.  This allows the client to bypass any Layer 7 policies the proxy would enforce. Istio currently applies sniffing to outbound traffic where the outbound target is unknown (this often occurs for Egress traffic) or the outbound target is a service port without a protocol annotation.

Here's an example: I know of two non-standard behaviors that node.js' HTTP server framework allows.  The first is allowing extra spaces between the Request-URI and the HTTP-Version in the Request-Line. The second is allowing spaces in a Header field-name.  Here's an example with the weird parts highlighted:

If I send this to a node.js server, it accepts it as a valid HTTP request (for the curious, the extra whitespace in the request line is dropped, and the whitespace in the Header field-name is included so the header is named "x-foo   bar"). Node.js' HTTP parser is taken from nginx which also accepts the extra spaces. Nginx is pretty darn popular so other web frameworks and a lot of servers accept this. Interestingly, so does the HTTP parser in Envoy (but not the HTTP inspector).

Suppose I have a situation like this:  We just added a new capability to delete in-progress orders to the beta version of our service, so we want all DELETE requests to be routed to "foo-beta" and all other normal requests routed to "foo".  We might write an Istio VirtualService to route DELETE requests like this:

If I send a request like this, it is properly routed to foo-2 (the beta version in this example).

But if I send one like this, I bypass the route and go to foo-1 (the normal version).  Oops!

This means that clients can choose to "step around" routing if they can find requests that trick the sniffer. If those requests aren't accepted by the server at the other end, it should be OK.  However, if they are accepted by the server, bad things can happen. Additionally, you won't be able to audit or detect this case because you won't have Jaeger traces or access logs from the proxy since it thought the request wasn't HTTP.
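The bypass can be modeled in a few lines. This is a deliberately simplified sketch: the strict regular expression stands in for the sniffer, the whitespace-tolerant parser stands in for the server, and the request path is made up. None of it is Envoy's or node.js' actual code.

```python
import re

# Stand-in sniffer: exactly one space between the request-line fields.
STRICT_REQUEST_LINE = re.compile(rb"^[A-Z]+ [^ ]+ HTTP/1\.[01]\r\n")


def sniff_is_http(stream: bytes) -> bool:
    return STRICT_REQUEST_LINE.match(stream) is not None


def lenient_method(stream: bytes) -> str:
    """What a tolerant server sees: fields split on runs of whitespace."""
    request_line = stream.split(b"\r\n", 1)[0]
    method, _target, _version = request_line.split()
    return method.decode("ascii")


def route(stream: bytes) -> str:
    """DELETE goes to foo-2, but only if the proxy recognized the stream as HTTP."""
    if sniff_is_http(stream) and lenient_method(stream) == "DELETE":
        return "foo-2"
    # Not recognized as HTTP: no Layer 7 routing, so the default applies.
    return "foo-1"


NORMAL = b"DELETE /orders/42 HTTP/1.1\r\nHost: foo\r\n\r\n"
# Extra spaces before the HTTP version and inside a header field name.
ODD = b"DELETE /orders/42   HTTP/1.1\r\nHost: foo\r\nx-foo   bar: 1\r\n\r\n"

if __name__ == "__main__":
    print(route(NORMAL), lenient_method(NORMAL))
    print(route(ODD), lenient_method(ODD))
```

Both requests look like a DELETE to the tolerant parser, yet only the first one is routed as a DELETE, because the second never registers as HTTP with the stricter sniffer.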

(We investigated this particular case and ran our results past the Envoy and Istio security vulnerability teams before publishing this blog. While it didn't rise to the level of security issue, we want it to be obvious to our users what the tradeoffs are. While the benefits of protocol sniffing may be worthwhile in many cases, most users will want to avoid protocol sniffing in security-sensitive applications.)

Behavior Change:

  • Without sniffing: Stream is Layer 7 and invalid requests are consistently rejected.
  • With sniffing: Some streams may be classified as Layer 4 and bypass Layer 7 routing or policy.

Recommendation

Protocol sniffing lessens the configuration burden to get started with Istio, but creates uncertainty about behaviors.  Because this uncertainty can be controlled by the client, it can be surprising or potentially hazardous. In production, I'd prefer to tell the proxy everything I know and have the proxy reject everything that doesn't look as expected.  Personally, I like my test environments to look like my prod environments ("Test Like You Fly") so I'm going to also avoid sniffing in test.

I would use protocol sniffing when I first dropped a service mesh into an evaluation scenario, when I'm at the stage of, "Let's kick the tires and see what this thing can tell me about my environment."

For this reason, Aspen Mesh recommends users don't rely on protocol sniffing in production.  All service ports should be declared with a name that specifies the protocol (things like "http-app" or "tcp-custom").  Our users will continue to receive "vet" warnings for service ports that don't comply, so they can be confident that their clusters will behave predictably.
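A check in the spirit of those warnings might look like the sketch below. It is not the actual "vet" implementation; the function name, the message format and the prefix list (limited to the protocols named in this post) are assumptions for illustration. The input is a plain dictionary shaped like a Kubernetes Service manifest.

```python
from typing import List

# Only the prefixes mentioned in this post; a real check would know more.
KNOWN_PREFIXES = {"http", "grpc", "tcp"}


def unnamed_protocol_ports(service: dict) -> List[str]:
    """Return one warning per service port whose name lacks a protocol prefix."""
    warnings = []
    svc_name = service.get("metadata", {}).get("name", "<unnamed>")
    for port in service.get("spec", {}).get("ports", []):
        name = port.get("name", "")
        if name.split("-", 1)[0] not in KNOWN_PREFIXES:
            label = name or "no name"
            warnings.append(
                f"{svc_name}: port {port.get('port')} ({label}) has no protocol prefix"
            )
    return warnings


EXAMPLE_SERVICE = {
    "metadata": {"name": "foo"},
    "spec": {
        "ports": [
            {"name": "http-app", "port": 8000},
            {"name": "clients", "port": 9000},
            {"name": "tcp-custom", "port": 22},
        ]
    },
}

if __name__ == "__main__":
    for line in unnamed_protocol_ports(EXAMPLE_SERVICE):
        print(line)
```

Running a check like this in CI would catch an undeclared port before it reaches a cluster where sniffing is turned off.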


Aspen Mesh - Getting the Most Out of Your Service Mesh

How to Get the Most Out of Your Service Mesh

You’ve been hearing about service mesh. You have an idea of what it does and how it can help you manage your microservices. But what happens once you have one? How do you get as much out of it as you can?

Let’s start with a quick review of what a service mesh is, why you would need one, then move on to how to get the most out of your service mesh.

What's a Service Mesh?

  1. A transparent infrastructure layer that sits between your network and application, helping with communications between your microservices

  2. Potentially your next game-changing decision

A service mesh is designed to handle a high volume of service-to-service communication using application programming interfaces (APIs). It ensures that communication among containerized application services is fast, reliable and secure. The mesh provides critical capabilities including service discovery, load balancing, encryption, observability, traceability, authentication and authorization, and write-once, run anywhere policy for microservices in your Kubernetes clusters.

Service meshes also address challenges that arise when your application is being consumed by an end user. The first key capability is monitoring the health of services provided to the end user, and then tracing problems with that health quickly to the correct microservice. Next, you'll need to ensure communication is secure and resilient.

When Do You Need a Service Mesh?

We’ve been having lots of discussions with people spread across the microservices, Kubernetes and service mesh adoption curves. And while it’s clear that many enterprise organizations are at least considering microservices, many are still waiting to see best practices emerge before deciding on their own path forward. That means the landscape changes as needs evolve.

As an example, more organizations are looking to microservices for brownfield deployments, whereas – even a couple of years ago – almost everyone only considered building microservices architectures for greenfield. This tells us that as microservices technology and tooling continue to evolve, it’s becoming more feasible for non-unicorn companies to effectively and efficiently decompose the monolith into microservices.

Think about it this way: in the past six months, the top three reasons we’ve heard people say they want to implement service mesh are:

  1. Observability – to better understand the behavior of Kubernetes clusters 
  2. mTLS – to add cluster-wide service encryption
  3. Distributed Tracing – to simplify debugging and speed up root cause analysis

Gauging the current state of the cloud-native infrastructure space, there’s no doubt that there’s still more exploration and evaluation of tools like Kubernetes and Istio. But the gap is definitely closing. Companies are closely watching the leaders in the space to see how they are implementing and what benefits and challenges they are facing. As more organizations successfully adopt these new technologies, it’s becoming obvious that while there’s a skills gap and new complexity that must be accounted for, the outcomes around increased velocity, better resiliency and improved customer experience mandate that many organizations actively map their own path with microservices. This will help to ensure that they are not left behind by the market leaders in their space.

Getting the Most Out of Your Service Mesh

In order to really stay ahead of the competition, you need to know best practices about getting the most out of your service mesh, recommendations from industry experts about how to measure your success, and ways to think about how to keep getting even more out of your technology.

But what do you want out of a service mesh? Since you’re reading this, there’s a good chance you’re responsible for making sure that your end users get the most out of your applications. That’s probably why you started down the microservices path in the first place.

If that’s true, then you’ve probably realized that microservices come with their own unique challenges, such as:

  • Increased surface area that can be attacked
  • Polyglot challenges
  • Controlling access for distributed teams developing towards a single application

That’s where a service mesh comes in. Service meshes are great at solving operational challenges and issues when running containers and microservices because they provide a uniform way to secure, connect and monitor microservices. 

TL;DR: a good service mesh keeps your company’s services running the way they should, giving you the observability, security and traffic management capabilities you need to effectively manage and control containerized applications so you can focus on adding the most value to your business.

When Service Mesh is a Win/Win

A service mesh is a tool that can help entire organizations work together for better outcomes. In other words, service mesh is the ultimate DevOps enabler.

Here are a few highlights of the value a service mesh provides across teams:

  • Observability: take system monitoring a step further. Monitoring reports overall system health, while observability provides highly granular insights into the behavior of systems along with rich context
  • Security and Decreased Risk: better secure the services inside your network and quickly identify any compromising traffic entering your clusters
  • Operational Control: allow security and platform teams to set the right macro controls to enforce access controls, while allowing developers to make customizations they need to move quickly within defined guardrails
  • Increase Efficiency with a Developer Toolbox: remove the burden of managing infrastructure from the developer and provide developer-friendly features such as distributed tracing and easy canary deploys 

What’s the Secret to Getting the Most Out of Your Service Mesh?

There are a lot of things you can do to get more out of your service mesh. Here are three high level tactics to start with:

  1. Align on service mesh goals with your teams
  2. Choose the service mesh that can be broadly deployed to address your company's needs
  3. Measure your service mesh success over time in order to identify and make improvements

Still looking for more info about this? Check out the eBook: Getting the Most Out of Your Service Mesh.
