Lately we’ve been talking about how you can use a service mesh to do progressive delivery.  Progressive delivery is fundamentally about decoupling software delivery from user activation of that software.  Once decoupled, the user activation portion is under business control.  It’s early days, but the promise here is that software engineering can build new stuff as fast as they can, and your customer success team (already keeping a finger on the pulse of the users and market) can independently choose when to introduce which new features (and their associated risk) to whom.

We’ve demonstrated how you can use Flagger (a progressive delivery Kubernetes operator) to command a service mesh to do canary deploys.  These help you activate functionality for only a subset of traffic and then progressively increase that subset as long as the new functionality is healthy.  We’re also working on an enhancement to Flagger to do traffic mirroring. I think this is really cool because it lets you try out a feature activation without actually exposing any users to the impact.  It’s a “pre-stage” to a canary deployment: Send one copy to the original service, one copy to the canary, and check if the canary responds as well as the original.

There’s a caveat we bring up when we talk about this, however: idempotency.  You can only do traffic mirroring if you’re OK with duplicating requests, sending one to the primary and one to the canary.  If your app and infrastructure can safely process these duplicate requests, the requests are said to be idempotent.

Idempotency

Idempotency is the property that applying an operation more than once doesn’t change the result.  In math, we’d say that:

f(f(a)) = f(a)

An example of a mathematical function that’s idempotent is ABS(), the absolute value.

ABS(-3.7) = 3.7
ABS(ABS(3.7)) = ABS(3.7) = 3.7

We can repeat taking the absolute value of something as many times as we want; it won’t change any more after the first time.  Similar things in your math toolbox: CEIL(), FLOOR(), ROUND().  But SQRT() is not in general idempotent: SQRT(16) = 4, SQRT(SQRT(16)) = 2, and so on.
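
The same behavior is easy to check in code.  Here’s a quick sketch in Python using the standard math module:

    import math

    x = -3.7

    # abs() is idempotent: applying it twice gives the same result as applying it once.
    assert abs(abs(x)) == abs(x)

    # ceil(), floor(), and round() behave the same way.
    assert math.floor(math.floor(x)) == math.floor(x)

    # sqrt() is not idempotent: each application keeps changing the value.
    print(math.sqrt(16))             # 4.0
    print(math.sqrt(math.sqrt(16)))  # 2.0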

For web apps, idempotent requests are those that can be processed twice without causing something invalid to happen.  So read-only operations like HTTP GETs are always idempotent (as long as they’re actually only reads). Some kinds of writes are also idempotent: suppose I have a request that says “Set andrews_timezone to MDT”.  If that request gets processed twice, my timezone gets set to MDT twice. That’s OK. The first one might have changed it from PDT to MDT, and the second one “changes” it from MDT to MDT, so no change. But in the end, my timezone is MDT and so I’m good.

An example of a non-idempotent request is one that says “Deduct $100 from andrews_account”.  If we apply that request twice, then my account will actually have $200 deducted from it, and I don’t want that.  You remember those e-commerce order pages that say “Don’t click reload or you may be billed twice”?  They need some idempotency!
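
To make the difference concrete, here’s a minimal sketch in Python (the handler names and in-memory “database” are purely illustrative):

    # Idempotent write: processing it twice leaves the same end state.
    def set_timezone(user, timezone):
        user["timezone"] = timezone       # "MDT" stays "MDT" on a replay

    # Non-idempotent write: every replay changes the state again.
    def deduct(account, amount):
        account["balance"] -= amount      # a duplicate deducts another $100

    user = {"timezone": "PDT"}
    set_timezone(user, "MDT")
    set_timezone(user, "MDT")             # duplicate request: timezone is still MDT

    account = {"balance": 500}
    deduct(account, 100)
    deduct(account, 100)                  # duplicate request: $200 gone instead of $100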

This is important for traffic mirroring because we’re going to duplicate the request and send one copy to the primary and one to the canary.  While idempotency is great for enabling this traffic mirroring case, I’m here to tell you why it’s a great thing to have anyway, even if you’re never going to do progressive delivery.

Exactly-Once Delivery and Invading Generals

There’s a fundamental tension that emerges if you have distributed systems that communicate over an unreliable channel.  You can never be sure that the other side received your message exactly once.  I’ll retell the parable of the Invading Generals as it was told to me the first time I was sure I had designed a system that solved this paradox.

There is a very large invading army that has camped in a valley for the night.  The defenders are split into two smaller armies that surround the invading army; one set of defenders on the eastern edge of the valley and one on the west.  If both defending armies attack simultaneously, they will defeat the invaders. However, if only one attacks, it will be too small; the invaders will defeat it, and then turn to the remaining defenders at their leisure and defeat them as well.

The general for the eastern defenders needs to coordinate with the general for the western defenders for simultaneous attack.  Both defenders can send spies with messages through the valley, as many as they want. Each spy has a 10% chance of being caught by the invaders and killed before his message is delivered, and a 90% chance of successfully delivering the message.  You are the general for the eastern defenders: what message will you send to the western defenders to guarantee you both attack simultaneously and defeat the invaders?

Turns out, there is no guaranteed safe approach.  Let’s go through it. First, let’s send this message:

“Western General, I will attack at dawn and you must do the same.”

– Eastern General

There’s a 90% chance that your spy will get through, but there’s a 10% chance that he won’t; in that case only you will attack, and you will lose to the invaders.  The problem statement said you can send as many spies as you want, so we must be able to do better!

OK, let’s send lots of spies with the same message.  Then the probability that at least one gets through is 1 - 0.1^n, where n is the number of spies; with three spies, that’s 1 - 0.001 = 99.9%.  So we can asymptotically approach 100% probability that the other side got the message, but we can never be sure.

How about this message:

“Western General, I am prepared to attack at dawn.  Send me a spy confirming that you have received this message so I know you will also attack.  If I don’t receive confirmation, I won’t attack because my army will be defeated if I attack alone.”

– Eastern General

Now, if you don’t receive a spy back from the western general you’ll send another, and another, until you get a response.  But… put yourself in the shoes of the western general.  How does the western general know that you’ll receive the confirmation spy?  Should the western army attack at dawn?  What if the confirmation spy was caught and now only the western army attacks, ensuring defeat?

The western general could send lots of confirmation spies, so there is a high probability that at least one gets through.  But they can’t guarantee with 100% probability that one gets through.

The western general could also send this response:

“Eastern General, we have received your spy.  We are also prepared to attack at dawn. We will be defeated if you do not also attack, and I know you won’t attack if you don’t know that we have received your message.  Please send back a spy confirming that you have received my confirmation or else we will not attack because we will be destroyed.”

– Western General

A confirmation of a confirmation! (In networking ARQ terms, an ACK-of-an-ACK.)  Again, this can reduce the risk but cannot provide a guarantee: we can keep shifting the uncertainty between the Eastern and Western generals, but we can never eliminate it.

Engineering Approaches

Okay, we can’t know for sure that our message is delivered exactly once (regardless of service mesh or progressive delivery or any of that), so what are we going to do?  There are a few approaches:

• Retry naturally-idempotent requests

• Uniqueify requests

• Conditional updates

• Others

Retry Naturally-Idempotent Requests

If you have a request that is naturally idempotent, like getting the temperature on a thermostat, the end user can just repeat it if they didn’t get the response they want.
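
A minimal retry loop might look like the sketch below; read_temperature() is a hypothetical stand-in for whatever read-only call your client makes, and it’s only safe to retry blindly because the call changes nothing on the server:

    import time

    def read_temperature():
        # Stand-in for the real network call; read-only, so repeating it is harmless.
        return 21.5

    def read_with_retries(attempts=3, delay=1.0):
        for attempt in range(attempts):
            try:
                return read_temperature()
            except (TimeoutError, ConnectionError):
                if attempt == attempts - 1:
                    raise                          # out of retries, surface the error
                time.sleep(delay * (attempt + 1))  # simple linear backoff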

Uniqueify Requests

Another approach is to make requests unique at the client, and then have all the other services avoid processing the same unique request twice.  One way to do this is to invent a UUID at the client and then have servers remember all the UUIDs they’ve already seen. My deduction request would then look like:

This is unique request f41182d1-f4b2-49ec-83cc-f5a8a06882aa.
If you haven’t seen this request before, deduct $100 from andrews_account.

Then you can submit this request as many times as you want to the processor, and the processor can check whether it has handled “f41182d1-f4b2-49ec-83cc-f5a8a06882aa” before.  There are a few caveats here.

First, you have to have a way to generate unique identifiers.  UUIDs are pretty good, but theoretically there’s an extremely small possibility of a UUID collision; practically, there are a couple of minor foot-guns to watch out for, like generating UUIDs on two VMs or containers that both have fake virtual MAC addresses that match.  You can also have the server make the unique identifier for you (it could be an auto-generated primary key in a database that is guaranteed to be unique).

Second, your server has to remember all the UUIDs it has processed.  Typically you put these in a database (maybe using the UUID as a primary key anyway).  If the record of processed UUIDs is separate from the action you take when processing, there’s still a “risk window”: you might commit a UUID and then fail to process it, or you might process it and fail to commit the UUID.  Algorithms like two-phase commit and Paxos can help close the risk window.
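
Here’s a minimal sketch of the idea using Python and SQLite, with the processed-UUID record and the account update committed in the same transaction so the risk window stays closed (the table and column names are made up for this example):

    import sqlite3
    import uuid

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE processed (request_id TEXT PRIMARY KEY);
        CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER);
        INSERT INTO accounts VALUES ('andrews_account', 500);
    """)

    def deduct(request_id, account, amount):
        try:
            with db:  # one transaction: either both rows change or neither does
                db.execute("INSERT INTO processed VALUES (?)", (request_id,))
                db.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                           (amount, account))
            return "processed"
        except sqlite3.IntegrityError:
            return "duplicate"  # primary key rejects a request_id we've already seen

    req = str(uuid.uuid4())                     # generated once, at the client
    print(deduct(req, "andrews_account", 100))  # processed
    print(deduct(req, "andrews_account", 100))  # duplicate; balance deducted only once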

Conditional Updates

Another approach is to include information in the request about what things looked like when the client sent it, so that the server can abort the request if something has changed.  This includes the case where the “change” is a duplicate request that we’ve already processed.

For instance, suppose the most recent entry in my bank ledger is transaction number 563.  Then I would make my request look like:

As long as the last transaction in andrews_account is number 563,
Create entry 564: Deduct $100 from andrews_account

If this request gets duplicated, the first copy will succeed and the second will fail.  After the first copy is processed, the latest entry in andrews_account is number 564 (the $100 deduction), so the duplicated request’s precondition no longer holds and it will fail:

As long as the last transaction in andrews_account is number 563,
Create entry 564: Deduct $100 from andrews_account

In this case the server could respond to the first copy with “Success” and to the second copy with a soft failure like “Already committed”, or just tell the client to re-read and notice that its update already happened.  MongoDB, Amazon DynamoDB, and others support these kinds of conditional updates.
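
Here’s a minimal sketch of that conditional update in Python with SQLite (the schema is made up to mirror the illustrative ledger above; MongoDB and DynamoDB expose the same idea through their own conditional-write syntax):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER, last_entry INTEGER)")
    db.execute("INSERT INTO accounts VALUES ('andrews_account', 500, 563)")
    db.commit()

    def deduct_if_unchanged(account, expected_last_entry, amount):
        with db:
            # The UPDATE matches only if entry 563 is still the latest transaction;
            # a duplicate request sees last_entry = 564 and changes nothing.
            cur = db.execute(
                """UPDATE accounts
                      SET balance = balance - ?, last_entry = last_entry + 1
                    WHERE name = ? AND last_entry = ?""",
                (amount, account, expected_last_entry))
        return "success" if cur.rowcount == 1 else "already committed"

    print(deduct_if_unchanged("andrews_account", 563, 100))  # success: entry 564, $100 deducted
    print(deduct_if_unchanged("andrews_account", 563, 100))  # already committed: no change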

Others

There are many practical approaches to this problem.  I recommend doing some initial reasoning about idempotency, and then trying to shift as much of this functionality as you can to the database or persistent-state layer you’re using.  While I gave a quick tour of some of the things involved in idempotency, there are a lot of other tricks, like write-ahead journaling and conflict-free replicated data types, that can enhance reliability.

Conclusion

Traffic mirroring is a great way to exercise canaries in a production environment before exposing them to your users.  Mirroring makes a duplicate of each request and sends one copy to the primary, one copy to the new canary version of your microservice.  This means that you must use mirroring only for idempotent requests: requests that can be applied twice without causing something erroneous to happen.

This caveat probably exists even if you aren’t doing traffic mirroring, because networks fail.  The Eastern General and the Western General can never really be sure their messages are delivered exactly once; there will always be cases where they may have to retry.  I think you want to build in idempotency wherever possible, and then you should use traffic mirroring to test your canary deployments.