On Silly Animals and Gray Codes

I love Information Theory. This is a random rumination on surprise.  

Helm (v2) is a templating engine and release manager for Kubernetes.  Basically it lets you leverage the combined knowledge of experts on how you should configure container software, but still gives you nerd knobs you can tweak as needed. When Helm deploys software, that deployment is called a release. You can name your releases, like ingress-controller-for-prod.  You'll use this name later: "Hey, Helm, how is ingress-controller-for-prod doing?" or "Hey, Helm, delete all the stuff you made for ingress-controller-for-prod."

If you don't name a release, Helm will make up a release name for you. It's a combination of an adjective and an animal:

"Monicker ships with a couple of word lists that were written and approved by a group of giggling school children (and their dad). We built a lighthearted list based on animals and descriptive words (mostly adjectives)."

So if you don't pick a name, Helm will pick one for you. You might get jaunty ferret or gauche octopus. Helm could have decided to pick unique identifiers, say UUIDs, so instead of jaunty ferret you get 9fa485b1-6e8b-47c4-baa1-3923394382a5 or e0c2def3-bc94-44ff-b702-985d4eb38ded. To Helm itself, the UUIDs would be fine. To the humans, though, I argue 9fa485b1-6e8b-47c4-baa1-3923394382a5 is a bad option because our brains aren't good handlers of long strings like 9fa485b1-6e8b-47c4-baa1-3932394382a5; it's hard to say 9fa485b1-6e8b-47e4-baa1-3923394382a5 and you're not even going to notice that I've actually subtly mixed up digits in 9fa485b1-6e8b-47c4-baa1-3923393482a5 through this entire paragraph.  But if I had mixed up jaunty ferret and jumpy ferret you at least stand a chance. This is true even though the bitwise difference between the inputs that generated jaunty ferret and jumpy ferret is actually smaller than my UUID tricks.

Humans are awful at handling arbitrarily long numbers. We can't fake them well. We get dazzled by them. We are miserable at comparing even short numbers; sometimes people die as a result.

So, if you're building identifiers into a system, you should consider if those are going to be seen by humans. And if so, I think you should make those identifiers suitable for humans: distinctive and pronounceable.

I've seen this used elsewhere; Docker does it for container names (but scientists and hackers instead of animals).  Netlify and GitHub will do it for project names.  LastPass has a "Pronounceable" option, and pwgen walks a fine line; it explicitly trades a little entropy to avoid users "simply writ[ing] the password on a piece of paper taped to the monitor..." in the hell that is modern user/password management. I've also worked with a respected support organization that does this for customer issues (and all the humans seemed to be massively more effective IMing/emailing/Wiki-writing/chatting in the hall about names instead of 10-digit numbers).

Aspen Mesh does this in a few places. The first benefit is some great GIFs. On our team, if Randy asks you to fix something in the "singing clams" object, he'll Slack you a GIF as well. The second benefit is distinctiveness - after you've seen a GIF of singing clams, the likelihood you accidentally delete the boasting aardvark object is basically nil. The likelihood that your dreams are haunted by singing clams is an entirely different concern.


So I argue that replacing numbers with pronounceable and memorable human-language identifiers is great when we need things to be distinguishable and possible to remember. Humans are too easily tricked by subtle changes in long numbers.

An added bonus is that at Aspen Mesh we bring some of our most meaningful cluster names to life. Our first development cluster, our first production cluster and our first customer cluster all have a special place in our hearts. Naturally, we took those cluster names and made them into Aspen Mesh mascots:

  • jaunty-ferret
  • gauche-octopus
  • jolly-bat

Our cluster names make it easier for us to get development work done, and come with the added bonus of making the office more fun. If you want a set of these awesome cluster animals, leave a comment or tweet us @AspenMesh and we’ll send you a sticker pack. 


Why You Want Idempotency Anyway

We've been talking about how you can use a service mesh to do progressive delivery lately.  Progressive delivery fundamentally is about decoupling software delivery from user activation of said software.  Once decoupled, the user activation portion is under business control. It's early days, but the promise here is that software engineering can build new stuff as fast as they can, and your customer success team (already keeping a finger on the pulse of the users and market) can independently choose when to introduce what new features (and associated risk) to whom.

We've demonstrated how you can use Flagger (a progressive delivery Kubernetes operator) to command a service mesh to do canary deploys.  These help you activate functionality for only a subset of traffic and then progressively increase that subset as long as the new functionality is healthy.  We're also working on an enhancement to Flagger to do traffic mirroring. I think this is really cool because it lets you try out a feature activation without actually exposing any users to the impact.  It's a "pre-stage" to a canary deployment: Send one copy to the original service, one copy to the canary, and check if the canary responds as well as the original.

There's a caveat we bring up when we talk about this, however: idempotency.  You can only do traffic mirroring if you're OK with duplicating requests and sending one to the primary and one to the canary.  If your app and infrastructure are OK with these duplicated requests, the requests are said to be idempotent.

Idempotency

Idempotency is the ability to apply an operation more than once and not change the result.  In math, we'd say that:

f(f(a)) = f(a)

An example of a mathematical function that's idempotent is ABS(), the absolute value.

ABS(-3.7) = 3.7
ABS(ABS(3.7)) = ABS(3.7) = 3.7

We can repeat taking the absolute value of something as many times as we want; it won't change any more after the first time.  Similar things in your math toolbox: CEIL(), FLOOR(), ROUND(). But SQRT() is not in general idempotent: SQRT(16) = 4, SQRT(SQRT(16)) = 2, and so on.
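
If you want to convince yourself at a keyboard, here's a quick, throwaway Python check (my snippet, not part of the original toolbox):

import math

x = -3.7
assert abs(abs(x)) == abs(x)                        # ABS is idempotent
assert math.floor(math.floor(x)) == math.floor(x)   # so are FLOOR, CEIL and ROUND
assert math.sqrt(math.sqrt(16)) != math.sqrt(16)    # SQRT is not: 2.0 vs 4.0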

For web apps, idempotent requests are those that can be processed twice without causing something invalid to happen.  So read-only operations like HTTP GETs are always idempotent (as long as they're actually only reads). Some kinds of writes are also idempotent: suppose I have a request that says "Set andrews_timezone to MDT".  If that request gets processed twice, my timezone gets set to MDT twice. That's OK. The first one might have changed it from PDT to MDT, and the second one "changes" it from MDT to MDT, so no change. But in the end, my timezone is MDT and so I'm good.

An example of a not-idempotent request is one that says "Deduct $100 from andrews_account".  If we apply that request twice, then my account will actually have $200 deducted from it and I don't want that.  You remember those e-commerce order pages that say "Don't click reload or you may be billed twice"? They need some idempotency!
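
To make the difference concrete, here's a small sketch in Python; the names and the starting balance are made up for illustration:

state = {"andrews_timezone": "PDT", "andrews_account": 500}

def set_timezone(tz):
    # Idempotent: applying it twice leaves the same result as applying it once.
    state["andrews_timezone"] = tz

def deduct(amount):
    # Not idempotent: applying it twice deducts twice.
    state["andrews_account"] -= amount

set_timezone("MDT"); set_timezone("MDT")   # timezone is MDT either way
deduct(100); deduct(100)                   # account is now 300, not 400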

This is important for traffic mirroring because we're going to duplicate the request and send one copy to the primary and one to the canary.  While idempotency is great for enabling this traffic mirroring case, I'm here to tell you why it's a great thing to have anyway, even if you're never going to do progressive delivery.

Exactly-Once Delivery and Invading Generals

There's a fundamental tension that emerges if you have distributed systems that communicate over an unreliable channel.  You can never be sure that the other side received your message exactly once.  I'll retell the parable of the Invading Generals as it was told to me the first time I was sure I had designed a system that solved this paradox.

There is a very large invading army that has camped in a valley for the night.  The defenders are split into two smaller armies that surround the invading army; one set of defenders on the eastern edge of the valley and one on the west.  If both defending armies attack simultaneously, they will defeat the invaders. However, if only one attacks, it will be too small; the invaders will defeat it, and then turn to the remaining defenders at their leisure and defeat them as well.

The general for the eastern defenders needs to coordinate with the general for the western defenders for simultaneous attack.  Both defenders can send spies with messages through the valley, as many as they want. Each spy has a 10% chance of being caught by the invaders and killed before his message is delivered, and a 90% chance of successfully delivering the message.  You are the general for the eastern defenders: what message will you send to the western defenders to guarantee you both attack simultaneously and defeat the invaders?

Turns out, there is no guaranteed safe approach.  Let's go through it. First, let's send this message:

"Western General, I will attack at dawn and you must do the same."

- Eastern General

There's a 90% chance that your spy will get through, but there's a 10% chance that he won't; then only you will attack and you will lose to the invaders.  The problem statement said you can send as many spies as you want; we must be able to do better!

OK, let's send lots of spies with the same message.  Then our probability of success is 1-0.1^n, where n is the number of spies.  So we can asymptotically approach 100% probability that the other side agrees, but we can never be sure.
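
If you want the numbers, here's a few lines of Python (my arithmetic, not the generals'):

# Probability that at least one of n independent spies gets through,
# with a 10% chance of capture each.
for n in range(1, 6):
    print(n, 1 - 0.1 ** n)   # roughly 0.9, 0.99, 0.999, ...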

How about this message:

"Western General, I am prepared to attack at dawn.  Send me a spy confirming that you have received this message so I know you will also attack.  If I don't receive confirmation, I won't attack because my army will be defeated if I attack alone."

- Eastern General

Now, if you don't receive a spy back from the western general you'll send another, and another, until you get a response.  But.... put yourself in the shoes of the western general. How does the western general know that you'll receive the confirmation spy?  Should the western army attack at dawn? What if the confirmation spy was caught and now only the western army attacks, ensuring defeat?

The western general could send lots of confirmation spies, so there is a high probability that at least one gets through.  But they can't guarantee with 100% probability that one gets through.

The western general could also send this response:

"Eastern General, we have received your spy.  We are also prepared to attack at dawn. We will be defeated if you do not also attack, and I know you won't attack if you don't know that we have received your message.  Please send back a spy confirming that you have received my confirmation or else we will not attack because we will be destroyed."

 

- Western General

A confirmation of a confirmation! (In networking ARQ terms, an ACK-of-an-ACK.)  Again, this reduces the probability of failure but cannot provide guarantees: we can keep shifting uncertainty between the Eastern and Western generals but never eliminate it.

Engineering Approaches

Okay, we can't know for sure that our message is delivered exactly once (regardless of service mesh or progressive delivery or any of that), so what are we going to do?  There are a few approaches:

• Retry naturally-idempotent requests

• Uniquify requests

• Conditional updates

• Others

Retry Naturally-Idempotent Requests

If you have a request that is naturally idempotent, like getting the temperature on a thermostat, the end user can just repeat it if they didn't get the response they want.

Uniquify Requests

Another approach is to make requests unique at the client, and then have all the other services avoid processing the same unique request twice.  One way to do this is to invent a UUID at the client and then have servers remember all the UUIDs they've already seen. My deduction request would then look like:

This is unique request f41182d1-f4b2-49ec-83cc-f5a8a06882aa.
If you haven't seen this request before, deduct $100 from andrews_account.

Then you can submit this request as many times as you want to the processor, and the processor can check if it's handled "f41182d1-f4b2-49ec-83cc-f5a8a06882aa" before.  There are a few caveats here.

First, you have to have a way to generate unique identifiers.  UUIDs are pretty good, but theoretically there's an extremely small possibility of UUID collision; practically, there are a couple of minor foot-guns to watch out for, like generating UUIDs on two VMs or containers that both have fake virtual MAC addresses that match.  You can also have the server make the unique identifier for you (it could be an auto-generated primary key in a database that is guaranteed to be unique).

Second, your server has to remember all the UUIDs that you have processed.  Typically you put these in a database (maybe using the UUID as a primary key anyway).  If the record of processed UUIDs is different than the action you take when processing, there's still a "risk window": you might commit a UUID and then fail to process it, or you might process it and fail to commit the UUID.  Algorithms like two-phase commit and Paxos can help close the risk window.
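
Here's a minimal sketch of that server-side check, assuming a SQL table whose primary key is the request UUID so that remembering the UUID and applying the effect commit in the same transaction (the table and function names are illustrative):

import sqlite3, uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (request_id TEXT PRIMARY KEY, amount INTEGER)")

def deduct_once(request_id, amount):
    try:
        with db:  # one transaction: remember the UUID and apply the effect together
            db.execute("INSERT INTO processed VALUES (?, ?)", (request_id, amount))
            # ... apply the actual $100 deduction here, inside the same transaction ...
    except sqlite3.IntegrityError:
        pass  # duplicate UUID: we've already processed this request, so do nothing

req = str(uuid.uuid4())
deduct_once(req, 100)
deduct_once(req, 100)  # safe to retry: the second call is a no-op

Because the UUID is the primary key, "have I seen this?" and "record that I've seen this" can't drift apart, which is one way to shrink that risk window.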

Conditional Updates

Another approach is to include information in the request about what things looked like when the client sent the request, so that the server can abort the request if something has changed.  This includes the case that the "change" is a duplicate request and we've already processed it.

For instance, suppose the last entry in my bank ledger is transaction number 563.

Then I would make my request look like:

As long as the last transaction in andrews_account is number 563,
Create entry 564: Deduct $100 from andrews_account

If this request gets duplicated, the first copy will succeed and the second will fail.  After the first is applied, the last transaction in andrews_account is number 564, so the duplicated request no longer matches its precondition:

As long as the last transaction in andrews_account is number 563,
Create entry 564: Deduct $100 from andrews_account

In this case the server could respond to the first copy with "Success" and to the second copy with a soft failure like "Already committed", or just tell the client to read and notice that its update already happened.  MongoDB, AWS DynamoDB and others support these kinds of conditional updates.
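
A minimal sketch of a conditional update, again with illustrative names; the point is that the write carries its precondition ("the last transaction is 563") and the store rejects it when the precondition no longer holds:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ledger (account TEXT PRIMARY KEY, last_txn INTEGER, balance INTEGER)")
db.execute("INSERT INTO ledger VALUES ('andrews_account', 563, 500)")

def deduct_conditionally(expected_txn, amount):
    with db:
        cur = db.execute(
            "UPDATE ledger SET last_txn = ?, balance = balance - ? "
            "WHERE account = 'andrews_account' AND last_txn = ?",
            (expected_txn + 1, amount, expected_txn))
    return cur.rowcount == 1  # True only if the precondition still held

print(deduct_conditionally(563, 100))  # True: entry 564 created, $100 deducted
print(deduct_conditionally(563, 100))  # False: the duplicate fails the precondition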

Others

There are many practical approaches to this problem.  I recommend doing some initial reasoning about idempotency, and then try to shift as much functionality as you can to the database or persistent state layer you're using.  While I gave a quick tour of some of the things involved in idempotency, there are a lot of other tricks like write-ahead journalling, conflict-free replicated data types and others that can enhance reliability.

Conclusion

Traffic mirroring is a great way to exercise canaries in a production environment before exposing them to your users.  Mirroring makes a duplicate of each request and sends one copy to the primary, one copy to the new canary version of your microservice.  This means that you must use mirroring only for idempotent requests: requests that can be applied twice without causing something erroneous to happen.

This caveat probably exists even if you aren't doing traffic mirroring, because networks fail.  The Eastern General and Western General can never really be sure their messages are delivered exactly once; there will always be a case where they may have to retry.  I think you want to build in idempotency wherever possible, and then you should use traffic mirroring to test your canary deployments.


Important Security Updates in Aspen Mesh 1.1.13

Aspen Mesh is announcing the release of 1.1.13, which addresses important Istio security updates.  Below are the details of the security fixes, taken from the Istio 1.1.13 security update.

ISTIO-SECURITY-2019-003: An Envoy user publicly reported an issue (cf. Envoy Issue 7728) about regular expression matching that crashes Envoy with very large URIs.

  • CVE-2019-14993: After investigation, the Istio team has found that this issue could be leveraged for a DoS attack in Istio, if users are employing regular expressions in some of the Istio APIs: JWT, VirtualService, HTTPAPISpecBinding, QuotaSpecBinding.

ISTIO-SECURITY-2019-004: Envoy, and subsequently Istio are vulnerable to a series of trivial HTTP/2-based DoS attacks:

  • CVE-2019-9512: HTTP/2 flood using PING frames and queuing of response PING ACK frames that results in unbounded memory growth (which can lead to out of memory conditions).
  • CVE-2019-9513: HTTP/2 flood using PRIORITY frames that results in excessive CPU usage and starvation of other clients.
  • CVE-2019-9514: HTTP/2 flood using HEADERS frames with invalid HTTP headers and queuing of response RST_STREAM frames that results in unbounded memory growth (which can lead to out of memory conditions).
  • CVE-2019-9515: HTTP/2 flood using SETTINGS frames and queuing of SETTINGS ACK frames that results in unbounded memory growth (which can lead to out of memory conditions).
  • CVE-2019-9518: HTTP/2 flood using frames with an empty payload that results in excessive CPU usage and starvation of other clients.
  • See this security bulletin for more information

The 1.1.13 binaries are available for download here 

For procedures to upgrade Aspen Mesh deployments installed via Helm (helm install), please visit our Getting Started page.

 


Why Is Policy Hard?

Aspen Mesh spends a lot of time talking to users about policy, even if we don’t always start out calling it that. A common pattern we see with clients is:

  1. Concept: "Maybe I want this service mesh thing"
  2. Install: "Ok, I've got Aspen Mesh installed, now what?"
  3. Observe: "Ahhh! Now I see how my microservices are communicating.  Hmmm, what's that? That pod shouldn't be talking to that database!"
  4. Act: "Hey mesh, make sure that pod never talks to that database"

The Act phase is interesting, and there’s more to it than might be obvious at first glance. In this blog, I propose we work through some thought experiments to delve into how a service mesh can help you act on insights from the mesh.

First, put yourself in the shoes of the developer that just found out their test pod is accidentally talking to the staging database. (Ok, you're working from home today so you don't have to put on shoes; the cat likes sleeping on your shoeless feet better anyways.) You want to control the behavior of a narrow set of software for which you're the expert; you have local scope and focus.

Next, put on the shoes of a person responsible for operating many applications; people we talk to often have titles that include Platform, SRE, Ops, Infra. Each day they’re diving into different applications so being able to rapidly understand applications is key. A consistent way of mapping across applications, datacenters, clouds, etc. is critical. Your goal is to reduce "snowflake architecture" in favor of familiarity to make it easier when you do have to context switch.

Now let's change into the shoes of your org's Compliance Officer. You’re on the line for documenting and proving that your platform is continually meeting compliance standards. You don't want to be the head of the “Department of No”, but what’s most important to you is staying out of the headlines. A great day at work for you is when you've got clarity on what's going on across lots of apps, databases, external partners, every source of data your org touches AND you can make educated tradeoffs to help the business move fast with the right risk profile. You know it’s ridiculous to be involved in every app change, so you need separation-of-concerns.

I'd argue that all of these people have policy concerns. They want to be able to specify their goals at a suitably high level and leave the rote and repetitive portions to an automated system.  The challenging part is there's only one underlying system ("the kubernetes cluster") that has to respond to each of these disparate personas.

So, to me policy is about transforming a bunch of high-level behavioral prescriptions into much lower-level versions through progressive stages. Useful real-world policy systems do this in a way that is transparent and understandable to all users, and minimizes the time humans spend coordinating. Here's an example "day-in-the-life" of a policy:

At the top is the highest-level goal: "Devs should test new code without fear". Computers are hopeless at implementing something that abstract. At the bottom is a rule suitable for a computer like a firewall to implement, something like blocking traffic to 4.3.2.1.

The layers in the middle are where a bad policy framework can really hurt. Some personas (the hypothetical Devs) want to instantly jump to the bottom - they're the "4.3.2.1" in the above example. Other personas (the hypothetical Compliance Officer) live way up top, going down a few layers but not getting to the bottom on a day-to-day basis.

I think the best policy frameworks help each persona:

  • Quickly find the details for the layer they care about right now.
  • Help them understand "Where did this come from?" (connect to higher layers).
  • Help them answer "Is this doing what I want?" (trace to lower layers).
  • Know "Where do I go to change this?" (edit/create policy).

As an example, let's look at iptables, one of the firewalling/packet mangling frameworks for Linux.  This is at that bottom layer in my example stack - very low-level packet processing that I might look at if I'm an app developer and my app's traffic isn't doing what I'd expect.  Here's an example dump:


root@kafka-0:/# iptables -n -L -v --line-numbers -t nat
Chain PREROUTING (policy ACCEPT 594K packets, 36M bytes)
num   pkts bytes target     prot opt in out   source destination
1     594K 36M ISTIO_INBOUND  tcp -- * * 0.0.0.0/0            0.0.0.0/0

Chain INPUT (policy ACCEPT 594K packets, 36M bytes)
num   pkts bytes target     prot opt in out   source destination

Chain OUTPUT (policy ACCEPT 125K packets, 7724K bytes)
num   pkts bytes target     prot opt in out   source destination
1      12M 715M ISTIO_OUTPUT  tcp -- * * 0.0.0.0/0            0.0.0.0/0

Chain POSTROUTING (policy ACCEPT 12M packets, 715M bytes)
num   pkts bytes target     prot opt in out   source destination

Chain ISTIO_INBOUND (1 references)
num   pkts bytes target     prot opt in out   source destination
1        0 0 RETURN     tcp -- * *     0.0.0.0/0 0.0.0.0/0            tcp dpt:22
2     594K 36M RETURN     tcp -- * *   0.0.0.0/0 0.0.0.0/0            tcp dpt:15020
3        2 120 ISTIO_IN_REDIRECT  tcp -- * * 0.0.0.0/0            0.0.0.0/0

Chain ISTIO_IN_REDIRECT (1 references)
num   pkts bytes target     prot opt in out   source destination
1        2 120 REDIRECT   tcp -- * *     0.0.0.0/0 0.0.0.0/0            redir ports 15006

Chain ISTIO_OUTPUT (1 references)
num   pkts bytes target     prot opt in out   source destination
1      12M 708M ISTIO_REDIRECT  all -- * lo 0.0.0.0/0           !127.0.0.1
2        7 420 RETURN     all -- * *     0.0.0.0/0 0.0.0.0/0            owner UID match 1337
3        0 0 RETURN     all -- * *     0.0.0.0/0 0.0.0.0/0            owner GID match 1337
4     119K 7122K RETURN     all -- * *   0.0.0.0/0 127.0.0.1
5        4 240 ISTIO_REDIRECT  all -- * * 0.0.0.0/0            0.0.0.0/0

Chain ISTIO_REDIRECT (2 references)
num   pkts bytes target     prot opt in out   source destination
1      12M 708M REDIRECT   tcp -- * *   0.0.0.0/0 0.0.0.0/0            redir ports 15001


This allows me to quickly understand a lot of details about what is happening at this layer. Each rule specification is on the right-hand side and is relatively intelligible to the personas that operate at this layer. On the left, I get "pkts" and "bytes" - this is a count of how many packets have triggered each rule, helping me answer "Is this doing what I want it to?". There's even more information here if I'm really struggling: I can log the individual packets that are triggering a rule, or mark them in a way that I can capture them with tcpdump.  

Finally, furthest on the left in the "num" column is a line number, which is necessary if I want to modify or delete rules or add new ones before/after; this is a little bit of help for "Where do I go to change this?". I say a little bit because in most systems that I'm familiar with, including the one I grabbed that dump from, iptables rules are produced by some program or higher-level system; they're not written by a human. So if I just added a rule, it would only apply until that higher-level system intervened and changed the rules (in my case, until a new Pod was created, which can happen at any time). I need help navigating up a few layers to find the right place to effect the change.

iptables lets you organize groups of rules into your own chains; in this case, the name of each chain (ISTIO_*) is a hint that Istio produced it, so I know which higher layer to examine next.

For a much different example, how about the Kubernetes CI Robot (from Prow)? If you've ever made a PR to Kubernetes or many other CNCF projects, you likely interacted with this robot. It's an implementer of policy; in this case, the policies around changing source code for Kubernetes.  One of the policies it manages is compliance with the Contributor's License Agreement; contributors agree to grant some Intellectual Property rights surrounding their contributions. If k8s-ci-robot can't confirm that everything is alright, it will add a comment to your PR telling you what needs to happen before the PR can merge.

This is much different than firewall policy, but I say it's still policy, and I think the same principles apply. Let's explore. If you had to diagram the policy around this, it would start at the top with the legal principle that Kubernetes wants to make sure all the software under its umbrella has free and clear IP terms. Stepping down a layer, the Kubernetes project decided to satisfy that requirement by requiring a CLA for any contributions. And so on until we get to the bottom layer: the code that implements the CLA check.

As an aside, the code that implements the CLA check is actually split into two halves: first there's a CI job that actually checks the commits in the PR against a database of signed CLAs, and then there's code that takes the result of that job and posts helpful information for users to resolve any issues. That's not visible or important at that top layer of abstraction (the CNCF lawyers shouldn't care).

This policy structure is easy to navigate. If your CLA check fails, the comment from the robot has great links. If you're an individual contributor you can likely skip up a layer, sign the CLA and move on. If you're contributing on behalf of a company, the links will take you to the document you need to send to your company's lawyers so they can sign on behalf of the company.

So those are two examples of policy. You probably encounter many other ones every day from corporate travel policy to policies (written, unwritten or communicated via email missives) around dirty dishes.

It's easy to focus on the technical capabilities of the lowest levels of your system. But I'd recommend that you don't lose focus on the operability of your system. It’s important that it be transparent and easy to understand. Both iptables and the k8s-ci-robot are transparent. The k8s-ci-robot has an additional feature: it knows you're probably wondering "Where did this come from?" and it answers that question for you. This helps you and your organization navigate the layers of policy.

When implementing service mesh to add observability, resilience and security to your Kubernetes clusters, it’s important to consider how to set up policy in a way that can be navigated by your entire team. With that end in mind, Aspen Mesh is building a policy framework for Istio that makes it easy to implement policy and understand how it will affect application behavior.



Securing Containerized Applications With Service Mesh

The self-contained, ephemeral nature of microservices comes with some serious upside, but keeping track of every single one is a challenge, especially when trying to figure out how the rest are affected when a single microservice goes down. The end result is that if you’re operating or developing a microservices architecture, there’s a good chance part of your day is spent wondering what your services are up to.

With the adoption of microservices, problems also emerge due to the sheer number of services that exist in large systems. Problems like security, load balancing, monitoring and rate limiting that had to be solved once for a monolith, now have to be handled separately for each service.

The technology aimed at addressing these microservice challenges has been rapidly evolving:

  1. Containers facilitate the shift from monolith to microservices by enabling independence between applications and infrastructure.
  2. Container orchestration tools solve microservices build and deploy issues, but leave many unsolved runtime challenges.
  3. Service mesh addresses runtime issues including service discovery, load balancing, routing and observability.

Securing services with a service mesh

A service mesh provides an advanced toolbox that lets users add security, stability and resiliency to containerized applications. One of the more common applications of a service mesh is bolstering cluster security. There are 3 distinct capabilities provided by the mesh that enable platform owners to create a more secure architecture.

Traffic Encryption  

As a platform operator, I need to provide encryption between services in the mesh. I want to leverage mTLS to encrypt traffic between services. I want the mesh to automatically encrypt and decrypt requests and responses, so I can remove that burden from my application developers. I also want it to improve performance by prioritizing the reuse of existing connections, reducing the need for the computationally expensive creation of new ones. I also want to be able to understand and enforce how services are communicating and prove it cryptographically.

Security at the Edge

As a platform operator, I want Aspen Mesh to add a layer of security at the perimeter of my clusters so I can monitor and address compromising traffic as it enters the mesh. I can use the built-in power of Kubernetes as an ingress controller to add security with ingress rules such as whitelisting and blacklisting. I can also apply service mesh route rules to manage compromising traffic at the edge. I also want control over egress so I can dictate that our network traffic does not go places it shouldn't (blacklist by default and only talk to what you whitelist).

Role Based Access Control (RBAC)

As the platform operator, it’s important that I am able to enforce least privilege so the developers on my platform only have access to what they need, and nothing more. I want to enable controls so app developers can write policy for their apps and only their apps, so they can move quickly without impacting other teams. I want to use the same RBAC framework that I am familiar with to provide fine-grained RBAC within my service mesh.

How a service mesh adds security

You’re probably thinking to yourself: traffic encryption and fine-grained RBAC sound great, but how does a service mesh actually get me to them? Service meshes that leverage a sidecar approach are uniquely positioned to intercept and encrypt data. A sidecar proxy is a prime insertion point to ensure that every service in a cluster is secured and monitored in real time. Let’s explore some details around why sidecars are a great place for security.

Sidecar is a great place for security

Securing applications and infrastructure has always been daunting, in part because the adage really is true: you are only as secure as your weakest link.  Microservices are an opportunity to improve your security posture but can also cut the other way, presenting challenges around consistency.  For example, the best organizations use the principle of least privilege: an app should only have the minimum amount of permissions and privilege it needs to get its job done.  That's easier to apply where a small, single-purpose microservice has clear and narrowly-scoped API contracts.  But there's a risk that as application count increases (lots of smaller apps), this principle can be unevenly applied. Microservices, when managed properly, increase feature velocity and enable security teams to fulfill their charter without becoming the Department of No.

There's tension: Move fast, but don't let security coverage slip through the cracks.  Prefer many smaller things to one big monolith, but secure each and every one.  Let each team pick the language of their choice, but protect them with a consistent security policy.  Encourage app teams to debug, observe and maintain their own apps but encrypt all service-to-service communication.

A sidecar is a great way to balance these tensions with an architecturally sound security posture.  Sidecar-based service meshes like Istio and Linkerd 2.0 put their datapath functionality into a separate container and then situate that container as close to the application they are protecting as possible.  In Kubernetes, the sidecar container and the application container live in the same Kubernetes Pod, so the communication path between sidecar and app is protected inside the pod's network namespace; by default it isn't visible to the host or other network namespaces on the system.  The app, the sidecar and the operating system kernel are involved in communication over this path.  Compared to putting the security functionality in a library, using a sidecar adds the surface area of kernel loopback networking inside of a namespace, instead of just kernel memory management.  This is additional surface area, but not much.

The major drawbacks of library approaches are consistency and sprawl in polyglot environments.  If you have a few different languages or application frameworks and take the library approach, you have to secure each one.  This is not impossible, but it's a lot of work.  For each different language or framework, you get or choose a TLS implementation (perhaps choosing between OpenSSL and BoringSSL).  You need a configuration layer to load certificates and keys from somewhere and safely pass them down to the TLS implementation.  You need to reload these certs and rotate them.  You need to evaluate "information leakage" paths: does your config parser log errors in plaintext (so it by default might print the TLS key to the logs)?  Is it OK for app core dumps to contain these keys?  How often does your organization require re-keying on a connection?  By bytes or time or both?  Minimum cipher strength?  When a CVE in OpenSSL comes out, what apps are using that version and need updating?  Who on each app team is responsible for updating OpenSSL, and how quickly can they do it?  How many apps have a certificate chain built into them for consuming public websites even if they are internal-only?  How many Dockerfiles will you need to update the next time a public signing authority has to revoke one?  slowloris?

Your organization can do all this work.  In fact, parts probably already have - above is our list of painful app security experiences but you probably have your own additions.  It is a lot of cross-organizational effort and process to get it right.  And you have to get it right everywhere, or your weakest link will be exploited.  Now with microservices, you have even more places to get it right.  Instead, our advice is to focus on getting it right once in the sidecar, and then distributing the sidecar everywhere, and get back to adding business value instead of duplicating effort.

There are some interesting developments on the horizon like the use of kernel TLS to defer bulk and some asymmetric crypto operations to the kernel.  That's great:  Implementations should change and evolve.  The first step is providing a good abstraction so that apps can delegate to lower layers. Once that's solid, it's straightforward to move functionality from one layer to the next as needed by use case, because you don't perturb the app any more.  As precedent, consider TCP Segmentation Offload, which lets the network card manage splitting app data into the correct size for each individual packet.  This task isn't impossible for an app to do, but it turns out to be wasted effort.  By deferring TCP segmentation to the kernel, it left the realm of the app.  Then, kernels, network drivers, and network cards were free to focus on the interoperability and semantics required to perform TCP segmentation at the right place.  That's our position for this higher-level service-to-service communication security: move it outside of the app to the sidecar, and then let sidecars, platforms, kernels and networking hardware iterate.

Envoy Is a Great Sidecar

We use Envoy as our sidecar because it's lightweight, has some great features and good API-based configurability.  Here are some of our favorite parts about Envoy:

  • Configurable TLS Parameters: Envoy exposes all the TLS configuration points you'd expect (cipher strength, protocol versions, curves).  The advantage to using Envoy is that they're configured the same way for every app using the sidecar.
  • Mutual TLS: Typically TLS is used to authenticate the server to the client, and to encrypt communication.  What's missing is authenticating the client to the server - if you do this, then the server knows what is talking to it.  Envoy supports this bi-directional authentication out of the box, and it can easily be incorporated into a SPIFFE system.  In today's complex cloud datacenters, you're better off if you trust things based on cryptographic proof of what they are, instead of network perimeter protection of where they called from.
  • BoringSSL: This fork of OpenSSL removed huge amounts of code like implementations of obsolete ciphers and cleaned up lots of vestigial implementation details that had repeatedly been the source of security vulnerabilities.  It's a good default choice if you don't need any OpenSSL-specific functionality because it's easier to get right.
  • Security Audit: A security audit can't prove the absence of vulnerabilities but it can catch mistakes that demonstrate either architectural weaknesses or implementation sloppiness.  Envoy's security audit did find issues but in our opinion indicated a high level of security health.
  • Fuzzed and Bountied: Envoy is continuously fuzzed (exposed to malformed input to see if it crashes) and covered by Google's Patch Reward security bug bounty program.
  • Good API Granularity: API-based configuration doesn't mean "just serialize/deserialize your internal state and go."  Careful APIs thoughtfully map to the "personas" of what's operating them (even if those personas are other programs).  Envoy's xDS APIs in our experience partition routing behavior from cluster membership from secrets.  This makes it easy to make well-partitioned controllers.  A knock-on benefit is that it is easy in our experience to debug and test Envoy because config constructs usually map pretty clearly to code constructs.
  • No garbage collector: There are great languages with automatic memory management like Go that we use every day.  But we find languages like C++ and Rust provide predictable and optimizable tail latency.
  • Native Extensibility via Filters: Envoy has layer 4 and layer 7 extension points via filters that are written in C++ and linked into Envoy.
  • Scripting Extensibility via Lua: You can write Lua scripts as extension points as well.  This is very convenient for rapid prototyping and debugging.

One of these benefits deserves an even deeper dive in a security-oriented discussion.  The API granularity of Envoy is based on a scheme called "xDS" which we think of as follows:  Logically split the Envoy config API based on the user of that API.  The user in this case is almost always some other program (not a human), for instance a Service Mesh control plane element.

For instance, in xDS listeners ("How should I get requests from users?") are separated from clusters ("What pods or servers are available to handle requests to the shoppingcart service?").  The "x" in "xDS" is replaced with whatever functionality is implemented ("LDS" for listener discovery service).  Our favorite security-related partitioning is that the Secret Discovery Service can be used for propagating secrets to the sidecars independent of the other xDS APIs.

Because SDS is separate, the control plane can implement the Principle of Least Privilege: nothing outside of SDS needs to handle or have access to any private key material.

Mutual TLS is a great enhancement to your security posture in a microservices environment.  We see mutual TLS adoption as gradual - almost any real-world app will have some containerized microservices ready to join the service mesh and mTLS on day one.  But practically speaking, many of these will depend on mesh-external services, containerized or not.  It is possible in most cases to integrate these services into the same trust domain as the service mesh, and oftentimes these components can even participate in client TLS authentication so you get true mutual TLS.

In our experience, this happens by gradually expanding the "circle" of things protected with mutual TLS.  First, stateless containerized business logic, next in-cluster third party services, finally external state stores like bare metal databases.  That's why we focus on making the state of mTLS easy to understand in Aspen Mesh, and provide assistants to help you detect configuration mishaps.

What lives outside the sidecar?

You need a control plane to configure all of these sidecars.  In some simple cases it may be tempting to do this with some CI integration to generate configs plus DNS-based discovery.  This is viable but it's hard to do rapid certificate rotation.  Also, it leaves out more dynamic techniques like canaries, progressive delivery and A/B testing.  For this reason, we think most real-world applications will include an online control plane that should:

  • Disseminate configuration to each of the sidecars with a scalable approach.
  • Rotate sidecar certificates rapidly to reduce the value to an attacker of a one-time exploit of an application.
  • Collect metadata on what is communicating with what.

A good security posture means you should be automating some work on top of the control plane. We think these things are important (and built them into Aspen Mesh):

  • Organizing information to help humans narrow in on problems quickly.
  • Warning on potential misconfigurations.
  • Alerting when unhealthy communication is observed.
  • Inspecting the firehose of metadata for surprises - these patterns could be application bugs or security issues or both.

If you’re considering or going down the Kubernetes path, you should be thinking about the unique security challenges that come with microservices running in a Kubernetes cluster. Kubernetes solves many of these, but there are some critical runtime issues that a service mesh can make easier and more secure. If you would like to talk about how the Aspen Mesh platform and team can address your specific security challenge, feel free to find some time to chat with us.


To Multicluster, or Not to Multicluster: Solving Kubernetes Multicluster Challenges with a Service Mesh

If you are going to be running multiple clusters for dev and organizational reasons, it’s important to understand your requirements and decide whether you want to connect these in a multicluster environment and, if so, to understand various approaches and associated tradeoffs with each option.

Kubernetes has become the container orchestration standard, and many organizations are currently running multiple clusters. But while communication issues within clusters are largely solved, communication across clusters is still a major challenge for most organizations.

Service mesh helps to address multicluster challenges. Start by identifying what you want, then shift to how to get it. We recommend understanding your specific communication use case, identifying your goals, then creating an implementation plan.

Multicluster offers a number of benefits:

  • Single pane of glass
  • Unified trust domain
  • Independent fault domains
  • Intercluster traffic
  • Heterogeneous/non-flat network

Which can be achieved with various approaches:

  • Independent clusters
  • Common management
  • Cluster-aware service routing through gateways
  • Flat network
  • Split-horizon Endpoints Discovery Service (EDS)

If you have decided to multicluster, your next move is deciding the best implementation method and approach for your organization. A service mesh like Istio can help, and when used properly can make multicluster communication painless.

Read the full article here on InfoQ’s site.

 


Using Kubernetes RBAC to Control Global Configuration In Istio

Why Configuration Matters

If I'm going to get an error for my code, I like to get an error as soon as possible.  Unit test failures are better than integration test failures. I prefer compiler errors to unit test failures - that's what makes TypeScript great.  Going even further, a syntax highlighter is a very proximate feedback of errors - if that keyword doesn't turn green, I can fix the spelling with almost no conscious burden.

Shifting from coding to configuration, I like narrow configuration systems that make it easy to specify a valid configuration and tell me as quickly as possible when I've done something wrong.  In fact, my dream configuration specification wouldn't allow me to specify something invalid. Remember Newspeak from 1984?  A language so narrow that expressing ungood thoughts becomes impossible.  Apparently, I like my programming and configuration languages to be dystopic and Orwellian.

If you can't further narrow the language of your configuration without giving away core functionality, the next best option is to narrow the scope of configuration.  I think about this in reverse - if I am observing some behavior from the system and say to myself, "Huh, that's weird", how much configuration do I have to look at before I understand?  It's great if this is a small and expanding ring - look at config that's very local to this particular object, then the next layer up (in Kubernetes, maybe the rest of the namespace), and so on to what is hopefully a very small set of global config.  Ever tried to debug a program with global variables all over the place? Not fun.

My three principles of ideal config:

  1. Narrow: Like Newspeak, don't allow me to even think of invalid configuration.
  2. Scope: Only let me affect a small set of things associated with my role.
  3. Time: Tell me as early as possible if it's broken.

Declarative config is readily narrow.  The core philosophy of declarative config is saying what you want, not how you get it.  For example, "we'll meet at the Denver Zoo at noon" is declarative. If instead I specify driving directions to the Denver Zoo, I'm taking a much more imperative approach.  What if you want to bike there? What if there is road construction and a detour is required? The value of declarative config is that if we focus on what we want, instead of how to get it, it's easier for me to bring my controller (my car's GPS) and you to bring yours (Google Maps in the Bike setting).

On the other hand, a big part of configuration is pulling together a bunch of disparate pieces of information from a bunch of different roles (think humans, other configuration systems and controllers) at the last moment, just before the system actually starts running.  Some amount of flexibility is required here.

Does Cloud Native Get Us To Better Configuration?

I think a key reason for the popularity of Kubernetes is that it has a great syntax for specifying what a healthy, running microservice looks like.  Its syntax is powerful enough in all the right places to be practical for infrastructure.

Service meshes like Istio robustly connect all the microservices running in your cluster.  They can adaptively route L7 traffic, provide end-to-end mTLS based encryption, and provide circuit breaking and fault injection.  The long feature list is great, but it's not surprising that the result is a somewhat complex set of configuration resources. It's a natural result of the need for powerful syntax to support disparate use cases coupled with rapid development.

Enabling Fine-grained RBAC with Traffic Claim Enforcer

At Aspen Mesh, we found users (including ourselves) spending too much time understanding misconfiguration.  The first way we addressed that problem was with Istio Vet, which is designed to warn you of probably incorrect or incomplete config, and provide guidance to fix it.  Sometimes we know enough that we can prevent the misconfiguration by refusing to allow it in the first place.  For some Istio config resources, we do that using a solution we call Traffic Claim Enforcer.

There are four Istio configuration resources that have global implications: VirtualService, Gateway, ServiceEntry and DestinationRule.  Whenever you create one of these resources, you create it in a particular namespace. They can affect how traffic flows through the service mesh to any target they specify, even if that target isn't in the current namespace.  This surfaces a scope anti-pattern - if I'm observing weird behavior for some service, I have to examine potentially all DestinationRules in the entire Kubernetes cluster to understand why.

That might work in the lab, but we found it to be a serious problem for applications running in production.  Not only is it hard to understand the current config state of the system, it's also easy to break. It’s important to have guardrails that make it so the worst thing I can mess up when deploying my tiny microservice is my tiny microservice.  I don't want the power to mess up anything else, thank you very much. I don't want sudo. My platform lead really doesn't want me to have sudo.

Traffic Claim Enforcer is an admission webhook that waits for a user to configure one of those resources with global implications and, before allowing it, will check:

  1. Does the resource have a narrow scope that affects only local things?
  2. Is there a TrafficClaim that grants the resource the broader scope requested?

A TrafficClaim is a new Kubernetes custom resource we defined that exists solely to narrow and define the scope of resources in a namespace.  Here are some examples:

kind: TrafficClaim
apiVersion: networking.aspenmesh.io/v1alpha3
metadata:
  name: allow-public
  namespace: cluster-public
claims:
# Anything on www.example.com
- hosts: [ "www.example.com" ]

# Only specific paths on foo.com, bar.com
- hosts: [ "foo.com", "bar.com" ]
  ports: [ 80, 443, 8080 ]
  http:
    paths:
      exact: [ "/admin/login" ]
      prefix: [ "/products" ]

# An external service controlled by ServiceEntries
- hosts: [ "my.external.com" ]
  ports: [ 80, 443, 8080, 8443 ]
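
To make those two checks concrete, here is a deliberately simplified, hypothetical sketch of the decision an admission webhook could make; the data shapes and helper names are assumptions for illustration, not the actual Traffic Claim Enforcer implementation (which also scopes ports and paths):

def allowed(resource_namespace, resource_hosts, traffic_claims):
    # traffic_claims: {namespace: set of hosts granted by TrafficClaims in that namespace}
    for host in resource_hosts:
        # Check 1: targets local to the resource's own namespace are implicitly claimed.
        if host.endswith("." + resource_namespace + ".svc.cluster.local"):
            continue
        # Check 2: otherwise a TrafficClaim in this namespace must grant the host.
        if host in traffic_claims.get(resource_namespace, set()):
            continue
        return False  # the webhook rejects the resource
    return True

print(allowed("cluster-public", ["www.example.com"],
              {"cluster-public": {"www.example.com"}}))  # True: claimed above
print(allowed("team-a", ["www.example.com"], {}))        # False: no claim grants it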

TrafficClaims are controlled by Kubernetes Role-Based Access Control (RBAC).  Generally, the same roles or people that create namespaces and set up projects would also create TrafficClaims for those namespaces that need power to define service mesh traffic policy outside of their namespace scope.  Rule 1 about local scope above can be explained as "every namespace has an implied TrafficClaim for namespace-local traffic policy", to avoid requiring a boilerplate TrafficClaim.

A pattern we use is to put global config into a namespace like "istio-public" - that's the only place that needs TrafficClaims for things like public DNS names.  Or you might have a couple of namespaces like "istio-public-prod" and "istio-public-dev" or similar. It’s up to you.

Traffic Claim Enforcer does not prevent you from thinking of invalid config, but it does help to limit scope. If I'm trying to understand what happens when traffic goes to my microservice, I no longer have to examine every DestinationRule in the system.  I only have to examine the ones in my namespace, and maybe some others that have special TrafficClaims (and hopefully keep that list small).

Traffic Claim Enforcer also provides an early failure for config problems.  Without it, it is easy to create conflicting DestinationRules, even in separate namespaces. This is a problem that Istio Vet will tell you about but cannot fix - it doesn't know which one should have priority. If you define TrafficClaims, then Traffic Claim Enforcer can prevent the conflicting configuration from being created at all.

Hat tip to my colleague, Brian Marshall, who developed the initial public spec for TrafficClaims.  The Istio community is undertaking a great deal of work to scope/manage config aimed at improving system scalability.  We made Traffic Claim Enforcer with a focus on scoping to improve the human config experience as it was a need expressed by several of our users.  We're optimistic that the way Traffic Claim Enforcer helps with human concerns will complement the system scalability side of things.

If you want to give Traffic Claim Enforcer a spin, it's included as part of Aspen Mesh.  By default it doesn't enforce anything, so out-of-the-box it is compatible with Istio. You can turn it on globally or on a namespace-by-namespace basis.

Click play below to check it out in action!

https://youtu.be/47HzynDsD8w

 



Why You Should Care About Istio Gateways

If you’re breaking apart the monolith, one of the huge advantages to using Istio to manage your microservices is that it enables a configuration that leverages an ingress model similar to traditional load balancers and application delivery controllers. In the load balancer world, Virtual IPs and Virtual Servers have long been used as concepts that enable operators to configure ingress traffic in a flexible and scalable manner (Lori Macvittie has some relevant thoughts on this).

In Istio, Gateways control the exposure of services at the edge of the mesh. Gateways allow operators to specify L4-L6 settings like port and TLS settings. For L7 settings of the ingress traffic, Istio allows you to tie Gateways to VirtualServices. This separation makes it easy to manage traffic flow into the mesh in much the same way you would tie Virtual IPs to Virtual Servers in traditional load balancers. This enables users of legacy technologies to migrate to microservices in a seamless way. It’s a natural progression for teams that are used to monoliths and edge load balancers rather than an entirely new way to think about networking configuration.

One important thing to note is that routing traffic within a service mesh and getting external traffic into the mesh are a bit different. Within the mesh, you specify exceptions from normal traffic, since Istio by default (for compatibility with Kubernetes) allows everything to talk to everything once inside the mesh. It’s necessary to add a policy if you don’t want certain services communicating. Getting traffic into the mesh is a reverse proxy situation (similar to traditional load balancers) where you have to specify exactly what you want allowed in.

Earlier versions of Istio leveraged the Kubernetes Ingress resource, but the Istio v1alpha3 APIs leverage Gateways for richer functionality, as Kubernetes Ingress has proven insufficient for Istio applications. The Kubernetes Ingress API merges the specification for L4-L6 and L7, which makes it difficult for different teams in organizations with separate trust domains (like SecOps and NetOps) to own ingress traffic management. Additionally, the Ingress API is less expressive than the routing capability that Istio provides with Envoy. The only way to do advanced routing in the Kubernetes Ingress API is to add annotations for different ingress controllers. Separate concerns and trust domains within an organization warrant the need for a more capable way to manage ingress, which is provided by Istio Gateways and VirtualServices.

Once the traffic is in the mesh, it’s good to be able to provide a separation of concerns for VirtualServices so different teams can manage traffic routing for their services. L4-L6 specifications are generally something that SecOps or NetOps might be concerned with. L7 specifications are of most concern to cluster operators or application owners. So it’s critical to have the right separation of concerns. As we believe in the power of team-based responsibilities, we think this is an important capability. And since we believe in the power of Istio, we are submitting an RFE in the Istio community which we hope will help enable ownership semantics for traffic management within the mesh.

We’re excited Istio has hit 1.0, and we look forward to continuing to contribute to the project and community as it evolves.

Originally published at https://thenewstack.io/why-you-should-care-about-istio-gateways/

Service Mesh Architectures

If you are building your software and teams around microservices, you’re looking for ways to iterate faster and scale flexibly. A service mesh can help you do that while maintaining (or enhancing) visibility and control. In this blog, I’ll talk about what’s actually in a Service Mesh and what considerations you might want to make when choosing and deploying one.

So, what is a service mesh? How is it different from what’s already in your stack? A service mesh is a communication layer that rides on top of request/response, unlocking some patterns essential for healthy microservices. A few of my favorites:

  • Zero-trust security that doesn’t assume a trusted perimeter
  • Tracing that shows you how and why every microservice talked to another microservice
  • Fault injection and tolerance that lets you experimentally verify the resilience of your application
  • Advanced routing that lets you do things like A/B testing, rapid versioning and deployment and request shadowing

Why a new term?

Looking at that list, you may think “I can do all of that without a Service Mesh”, and you’re correct. The same logic applies to sliding window protocols or request framing. But once there’s an emerging standard that does what you want, it’s more efficient to rely on that layer instead of implementing it yourself. Service Mesh is that emerging layer for microservices patterns.

Service mesh is still nascent enough that codified standards have yet to emerge, but there is enough experience that some best practices are beginning to become clear. As the bleeding-edge leaders develop their own approaches, it is often useful to compare notes and distill best practices. We’ve seen Kubernetes emerge as the standard way to run containers for production web applications. My favorite standards are emergent rather than forced: it’s definitely a fine art to be neither too early nor too late to agree on common APIs, protocols and concepts.

Think about the history of computer networking. After the innovation of best-effort packet-switched networks, we found out that many of us were creating virtual circuits over them - using handshaking, retransmission and internetworking to turn a pile of packets into an ordered stream of bytes. For the sake of interoperability and simplicity, a “best practice” stream-over-packets emerged: TCP (the Introduction of RFC675 does a good job of explaining what it layers on top of). There are alternatives - I’ve used the Licklider Transmission Protocol in space networks where distributed congestion control is neither necessary nor efficient. Your browser might already be using QUIC. Standardizing on TCP, however, freed a generation of programmers from fiddling with implementations of sliding windows, retries, and congestion collapse (well, except for those packetheads that implemented it).

Next, we found a lot of request/response protocols running on top of TCP. Many of these eventually migrated to HTTP (or sequels like HTTP/2 or gRPC). If you can factor your communication into “method, metadata, body”, you should be looking at an HTTP-like protocol to manage framing, separate metadata from body, and address head-of-line blocking. This extends beyond just browser apps - databases like Mongo provide HTTP interfaces because the ubiquity of HTTP unlocks a huge amount of tooling and developer knowledge.

You can think about service mesh as being the lexicon, API and implementation around the next tier of communication patterns for microservices.

OK, so where does that layer live? You have a couple of choices:

  • In a Library that your microservices applications import and use.
  • In a Node Agent or daemon that services all of the containers on a particular node/machine.
  • In a Sidecar container that runs alongside your application container.

Library

The library approach is the original. It is simple and straightforward. In this case, each microservice application includes library code that implements service mesh features. Libraries like Hystrix and Ribbon would be examples of this approach.

This works well for apps that are exclusively written in one language by the teams that run them (so that it’s easy to insert the libraries). The library approach also doesn’t require much cooperation from the underlying infrastructure - the container runner (like Kubernetes) doesn’t need to be aware that you’re running a Hystrix-enhanced app.

There is some work on multilanguage libraries (reimplementations of the same concepts). The challenge here is the complexity and effort involved in replicating the same behavior over and over again.

We see very limited adoption of the library model in our user base because most of our users are running applications written in many different languages (polyglot), and are also running at least a few applications that aren’t written by them, so injecting libraries isn’t feasible.

This model has an advantage in work accounting: the code performing work on behalf of the microservice is actually running in that microservice. The trust boundary is also small - you only have to trust calling a library in your own process, not necessarily a remote service somewhere out over the network. That code only has as many privileges as the one microservice it is performing work on behalf of. That work is also performed in the context of the microservice, so it’s easy to fairly allocate resources like CPU time or memory for that work - the OS probably does it for you.

Node Agent

The node agent model is the next alternative. In this architecture, there’s a separate agent (often a userspace process) running on every node, servicing a heterogeneous mix of workloads. For purposes of our comparison, it’s the opposite of the library model: it doesn’t care about the language of your application, but it serves many different microservice tenants.

Linkerd’s recommended deployment in Kubernetes works like this, as do F5’s Application Service Proxy (ASP) and Kubernetes’ default kube-proxy.

Since you need one node agent on every node, this deployment requires some cooperation from the infrastructure - this model doesn’t work without a bit of coordination. By analogy, most applications can’t just choose their own TCP stack, guess an ephemeral port number, and send or receive TCP packets directly - they delegate that to the infrastructure (operating system).
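In Kubernetes terms, that cooperation usually means running the agent as a DaemonSet so the scheduler guarantees exactly one copy per node. A minimal sketch of the pattern (the names, namespace, image and port are placeholders, not any particular mesh’s real manifest):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mesh-node-agent            # placeholder name
  namespace: mesh-system           # placeholder namespace
spec:
  selector:
    matchLabels:
      app: mesh-node-agent
  template:
    metadata:
      labels:
        app: mesh-node-agent
    spec:
      hostNetwork: true            # expose the agent on the node so local pods can reach it
      containers:
      - name: proxy
        image: registry.example.com/mesh-node-agent:latest   # placeholder image
        ports:
        - containerPort: 4140      # port workloads on this node send traffic through (illustrative)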

Instead of good work accounting, this model emphasizes work resource sharing - if a node agent allocates some memory to buffer data for my microservice, it might turn around and use that buffer for data for your service in a few seconds. This can be very efficient, but there’s an avenue for abuse. If my microservice asks for all the buffer space, the node agent needs to make sure it gives your microservice a shot at buffer space first. You need a bit more code to manage this for each shared resource.

Another work resource that benefits from sharing is configuration information. It’s cheaper to distribute one copy of the configuration to each node than to distribute one copy to each pod on each node.

A lot of the functionality that containerized microservices rely on is provided by a node agent or something topologically equivalent. Think about kubelet initializing your pod, your favorite CNI daemon like flanneld, or, stretching your brain a bit, even the operating system kernel itself as following this node agent model.

Sidecar

Sidecar is the new kid on the block. This is the model used by Istio with Envoy. Conduit also uses a sidecar approach. In Sidecar deployments, you have one adjacent container deployed for every application container. For a service mesh, the sidecar handles all the network traffic in and out of the application container.
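Schematically, a sidecar deployment just means the pod spec carries a second container next to the application. In Istio, the Envoy (istio-proxy) container is added for you by sidecar injection rather than written by hand; the names and images below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: reviews-v1                 # placeholder name
  labels:
    app: reviews
spec:
  containers:
  - name: app                      # your application container
    image: registry.example.com/reviews:v1          # placeholder image
    ports:
    - containerPort: 9080
  - name: mesh-proxy               # the sidecar: handles all traffic in and out of the app container
    image: registry.example.com/mesh-proxy:latest   # placeholder image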

This approach is in between the library and node agent approaches for many of the tradeoffs I discussed so far. For instance, you can deploy a sidecar service mesh without having to run a new agent on every node (so you don’t need infrastructure-wide cooperation to deploy that shared agent), but you’ll be running multiple copies of an identical sidecar. Another take on this: I can install one service mesh for a group of microservices, and you could install a different one, and (with some implementation-specific caveats) we don’t have to coordinate. This is powerful in the early days of service mesh, where you and I might share the same Kubernetes cluster but have different goals, require different feature sets, or have different tolerances for bleeding-edge vs. tried-and-true.

Sidecar is advantageous for work accounting, especially in some security-related aspects. Here’s an example: suppose I’m using a service mesh to provide zero-trust style security. I want the service mesh to verify both ends (client and server) of a connection cryptographically. Let’s first consider using a node agent: When my pod wants to be the client of another server pod, the node agent is going to authenticate on behalf of my pod. The node agent is also serving other pods, so it must be careful that another pod cannot trick it into authenticating on my pod’s behalf. If we think about the sidecar case, my pod’s sidecar does not serve other pods. We can follow the principle of least privilege and give it the bare minimum it needs for the one pod it is serving in terms of authentication keys, memory and network capabilities.

So, from the outside, the sidecar has the same privileges as the app it is attached to. On the other hand, the sidecar needs to intervene between the app and the outside. This creates some security tension: you want the sidecar to have as little privilege as possible, but you need to give it enough privilege to control traffic to/from the app. For example, in Istio, the init container responsible for setting up the sidecar currently has the NET_ADMIN capability (to set up the necessary iptables rules). That initialization uses good security practices - it does the minimum amount necessary and then goes away, but everything with NET_ADMIN represents attack surface. (Good news - smart people are working on enhancing this further.)
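For reference, here’s roughly what that looks like in an injected pod spec (a trimmed sketch; the image tag is illustrative and the iptables arguments are omitted):

  initContainers:
  - name: istio-init               # runs once to program the iptables redirection, then exits
    image: docker.io/istio/proxy_init:1.0.0          # tag illustrative
    securityContext:
      capabilities:
        add:
        - NET_ADMIN                # the one elevated capability needed to write iptables rules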

Once the sidecar is attached to the app, it’s very proximate from a security perspective. Not as close as a function call in your process (like library) but usually closer than calling out to a multi-tenant node agent. When using Istio in Kubernetes your app container talks to the sidecar over a loopback interface inside of the network namespace shared with your pod - so other pods and node agents generally can’t see that communication.

Most Kubernetes clusters have more than one pod per node (and therefore more than one sidecar per node). If each sidecar needs to know “the entire config” (whatever that means for your context), then you’ll need more bandwidth to distribute that config (and more memory to store copies of it). So it can be powerful to limit the scope of configuration that you have to give to each sidecar - but again there’s an opposing tension: something (in Istio’s case, Pilot) has to spend more effort computing that reduced configuration for each sidecar.

Other things that happen to be replicated across sidecars accrue a similar bill. Good news - container runtimes will reuse things like container disk images when they’re identical and you’re using the right drivers, so the disk penalty is not especially significant in many cases, and memory like code pages can also often be shared. But each sidecar’s process-specific memory will be unique to that sidecar, so it’s important to keep this under control and avoid making your sidecar “heavyweight” by doing a bunch of replicated work in each sidecar.

Service meshes relying on sidecars provide a good balance between a full set of features and a lightweight footprint.

Will the node agent or sidecar model prevail?

I think you’re likely to see some of both. Now seems like a perfect time for sidecar service mesh: nascent technology, fast iteration and gradual adoption. As service mesh matures and the rate-of-change decreases, we’ll see more applications of the node agent model.

Advantages of the node agent model are particularly important as service mesh implementations mature and clusters get big:

  • Less overhead (especially memory) for things that could be shared across a node
  • Easier to scale distribution of configuration information
  • A well-built node agent can efficiently shift resources from serving one application to another

Sidecar is a novel way of providing services (like a high-level communication proxy a la Service Mesh) to applications. It is especially well-adapted for containers and Kubernetes. Some of its greatest advantages include:

  • Can be gradually added to an existing cluster without central coordination
  • Work performed for an app is accounted to that app
  • App-to-sidecar communication is easier to secure than app-to-agent

What’s next?

As Shawn talked about in his post, we’ve been thinking about how microservices change the requirements from network infrastructure for a few years now. The swell of support and uptake for Istio demonstrated to us that there’s a community ready to develop and coalesce on policy specs, with a well-architected implementation to go along with it.

Istio is advancing state-of-the-art microservices communication, and we’re excited to help make that technology easy to operate, reliable, and well-suited for your team’s workflow in private cloud, public cloud or hybrid.


Using AWS Services from Istio Service Mesh with Go

This is a quick blog on how we use AWS services from inside of an Istio Service Mesh. Why does it matter that you’re inside the mesh? Because the service mesh wants to manage all the traffic in/out of your application. This means it needs to be able to inspect the traffic and parse it if it is HTTP. Nothing too fancy here, just writing it down in case it can save you a few keystrokes.

Our example is for programs written in Go.

Step 1: Define an Egress Rule

You need an egress rule to allow the application to talk to the AWS service at all. Here’s an example egress rule to allow DynamoDB:
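Roughly, it looks like this (a sketch using the pre-1.0 EgressRule API; on the v1alpha3 APIs a ServiceEntry plays the same role, and the resource name and region here are illustrative):

apiVersion: config.istio.io/v1alpha2
kind: EgressRule
metadata:
  name: dynamodb-egress            # illustrative name
spec:
  destination:
    service: dynamodb.us-west-2.amazonaws.com   # adjust for your region
  ports:
  - port: 443
    protocol: https                # the sidecar originates TLS; the app speaks plain HTTP to port 443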

Step 2: Delegate Encryption to the Sidecar

This part is the trick we were missing. If you want to get maximum service mesh benefits, you need to pass unencrypted traffic to the sidecar. The sidecar will inspect it, apply policy and encrypt it before egressing to the AWS service (in our case Dynamo).

Don’t worry, your traffic is not going out on any real wires unencrypted. Only the loopback wire from your app container to the sidecar. In Kubernetes, this is its own network namespace so even other containers on the same system cannot see it unencrypted.

package awswrapper

import (
    "net/http"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/endpoints"
    "github.com/aws/aws-sdk-go/aws/request"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/golang/glog"

    "github.com/you/repo/pkg/tracing"
)

type Config struct {
    InMesh   bool
    Endpoint string // http://dynamodb.us-west-2.amazonaws.com
    Label    string // Used in logging messages to identify
}

func istioEgressEPResolver(service, region string, optFns ...func(*endpoints.Options)) (endpoints.ResolvedEndpoint, error) {
    ep, err := endpoints.DefaultResolver().EndpointFor(service, region, optFns...)
    if err != nil {
        return ep, err
    }
    ep.URL = ep.URL + ":443"
    return ep, nil
}

func AwsConfig(cfg Config) *aws.Config {
    config := aws.NewConfig().
        WithEndpoint(cfg.Endpoint)
    if cfg.InMesh {
        glog.Infof("Using http for AWS for %s", cfg.Label)
        config = config.WithDisableSSL(true).
            WithEndpointResolver(endpoints.ResolverFunc(istioEgressEPResolver))
    }
    return config
}

func AwsSession(label string, cfg *aws.Config) (*session.Session, error) {
    sess, err := session.NewSession(cfg)
    if err != nil {
        return nil, err
    }
    // This has to be the first handler, before core.SendHandler, which
    // performs the operation of sending the request over the wire.
    // Send handlers are invoked after request signing is complete, which means
    // the tracing headers are not signed. Signing the tracing headers would
    // cause request failures, because Istio changes those headers and
    // signature validation then fails.
    sess.Handlers.Send.PushFront(addTracingHeaders)
    sess.Handlers.Send.PushBack(func(r *request.Request) {
        glog.V(6).Infof("%s: %s %s://%s%s",
            label,
            r.HTTPRequest.Method,
            r.HTTPRequest.URL.Scheme,
            r.HTTPRequest.URL.Host,
            r.HTTPRequest.URL.Path,
        )
    })
    // This handler is added after core.SendHandler so that the tracing headers
    // can be removed before a retry. On a retry the request is signed again,
    // and if the request headers still contained tracing headers the retry's
    // signature validation would fail, because Istio updates those headers.
    sess.Handlers.Send.PushBack(removeTracingHeaders)
    return sess, nil
}

AwsConfig() is the core - it builds the aws.Config with the right options, which you then use to make a new aws.Session.

The first option, WithDisableSSL(true), tells the AWS libraries to not use HTTPS and instead just speak plain HTTP. This is very bad if you are not in the mesh. But, since we are in the mesh, we’re only going to speak plain HTTP over to the sidecar, which will convert HTTP into HTTPS and send it out over the wire. In Kubernetes, the sidecar is in an isolated network namespace with your app pod, so there’s no chance for other pods or processes to snoop this plaintext traffic.

When you set the first option, the library will try to talk to http://dynamodb.<region>.amazonaws.com on port 80 (hey, you asked it to disable SSL). But that’s not what we want - we want to act like we’re talking to port 443 so that the right egress rule gets invoked and the sidecar encrypts traffic. That’s what istioEgressEPResolver is for: it appends :443 to the resolved endpoint URL, so the plain-HTTP request still targets port 443.

We do it this way for a little bit of belts-and-suspenders safety - we really want to avoid ever accidentally speaking unencrypted HTTP to Dynamo over the open internet. Here are the various failure scenarios:

  • Our service is in Istio, and the user properly configured InMesh=true: everything works and is HTTPS via the sidecar.
  • Our service is not in Istio, and the user properly configured InMesh=false: everything works and is HTTPS via the AWS go library.
  • Our service is not in Istio, but oops! the user set InMesh=true: the initial request goes out to dynamo on port 443 as plain HTTP. Dynamo rejects it, so we know it’s broken before sending a bunch of data via plain HTTP.
  • Our service is in Istio, but oops! the user set InMesh=false: the sidecar rejects the traffic as it is already-encrypted HTTPS that it can’t make any sense of.

OK, now you’ve got an aws.Config and a helper that turns it into an aws.Session. Pass the session to your favorite AWS service interface and go:

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"

    "github.com/you/repo/pkg/awswrapper"
)

type Dynamo struct {
    Session *session.Session
    Db      *dynamodb.DynamoDB
}

func NewWithConfig(cfg *aws.Config) (*Dynamo, error) {
    sess, err := awswrapper.AwsSession("Test", cfg)
    if err != nil {
        return nil, err
    }
    dyn := &Dynamo{
        Session: sess,
        Db:      dynamodb.New(sess),
    }
    return dyn, nil
}

p.s. What’s up with addTracingHeaders() and removeTracingHeaders()? Check out Neeraj’s post. While you’re at it, you can add just a few more lines and get great end-to-end distributed tracing.