Pando - The Aspen Mesh Blog



Distributed tracing with Istio in AWS

Posted on Jan 30, 2018 by Neeraj Poddar

Everybody loves tracing! Am I right?

If you attended KubeCon (my bad, CloudNativeCon!) 2017 in Austin or saw any of the keynotes posted online, you would have noticed a recurring theme: the benefits of tracing. Istio and service mesh were hot topics, and many sessions discussed how Istio provides distributed tracing out of the box, making it easier for application developers to integrate tracing into their systems.

Indeed, a great benefit of using a service mesh is getting more visibility into and understanding of your applications. Since this is a tech post (I remember categorizing it as such), let’s dig deeper into how Istio provides application tracing.

When using Istio, a sidecar Envoy proxy is automatically injected next to your applications (in Kubernetes this means adding containers to the application Pod). This sidecar proxy intercepts all traffic and can add or augment tracing headers on the requests entering and leaving the application container. The sidecar proxy also handles asynchronous reporting of spans to tracing backends like Jaeger, Zipkin, etc. Sounds pretty awesome!

One thing that applications do need to implement is propagating tracing headers from incoming to outgoing requests, as mentioned in this Istio guide. Simple enough, right? Well, it’s about to get interesting.

Before we proceed further, first a little background on why I’m writing this blog. We here at Aspen Mesh offer a supported enterprise service mesh built on open source Istio. Not only do we offer a service mesh product, but we also use it in our production SaaS platform hosted in AWS (isn’t that something?).

I was tasked with propagating tracing headers in our applications so that we get nice hierarchical traces graphing the relationships between our microservices. As we are hosted in AWS, many of our microservices make outgoing requests to AWS services. During this exercise, I found some interesting interactions between adding tracing headers and using Istio with AWS services, so I decided to share my experience. This blog describes the various iterations I went through to get it all working together.

The application in question for this post is a simple web server. When it receives an HTTP request, it makes an outbound DynamoDB query to fetch an item. As it is deployed in the Istio service mesh, the sidecar proxy automatically adds tracing headers to the incoming request. I wanted to propagate the tracing headers from the incoming request to the DynamoDB query request so that all the traces are tied together.

First Iteration

In order to achieve this, I decided to pass a custom function as a request option to the AWS DynamoDB API, which allows you to augment request headers before they are transmitted over the wire. In the snippet below I’m using the AWS Go SDK’s dynamo.GetItemWithContext to fetch an item and passing AddTracingHeaders as the request.Option. Note that the AddTracingHeaders method uses the standard opentracing API to inject headers from an input context.

func AddTracingHeaders() awsrequest.Option {
  return func(req *awsrequest.Request) {
    if span := ot.SpanFromContext(req.Context()); span != nil {
      ot.GlobalTracer().Inject(
        span.Context(),
        ot.HTTPHeaders,
        ot.HTTPHeadersCarrier(req.HTTPRequest.Header))
    }
  }
}

// ctx is the incoming request's context as received from the mesh
func makeDynamoQuery(ctx context.Context) {
  // Note that AddTracingHeaders is passed as awsrequest.Option
  result, err := dynamo.GetItemWithContext(ctx, ..., AddTracingHeaders())
  // Do something with result
}

OK, time to test this solution! The new version compiles, and I verified locally that it is able to fetch items from DynamoDB. After deploying the new version in production with Istio (sidecar injected), I’m hoping to see the traces nicely tied together. Indeed, the traces look much better. But wait, all of the responses from DynamoDB are now HTTP Status Code 400. Bummer!

Looking at the error messages from the AWS SDK, we are getting AccessDeniedException, which according to the AWS docs indicates that the signature is not valid. Adding tracing headers seems to have broken signature validation. This is odd yet interesting: in my dev environment (without the sidecar proxy) the DynamoDB requests worked fine, but in production they stopped working. Typical developer nightmare!

Digging into the AWS SDK package, I found that the client code signs every request, including its headers, with a few hardcoded exceptions. The difference between the earlier and the new version is the addition of tracing headers to the request, which now get signed and then handed to the sidecar proxy. Istio’s sidecar proxy (in this case Envoy) changes these tracing headers (as it should!) before sending the request to the DynamoDB service, which breaks signature validation at the server.

So, to get this fixed we need to ensure that the tracing headers are added after the request is signed but before it is sent out by the AWS SDK. This is getting more complicated, but still doable.

Second Iteration

I couldn’t find an easy way to whitelist these tracing headers and prevent them from getting signed. But the AWS session package provides a very flexible API for adding custom handlers which get invoked at various stages of the request lifecycle. Additionally, a session handler has the benefit of being applied to all AWS service requests (not just DynamoDB) which use that session. Perfect!

Here’s the AddTracingHeaders method above added as a session handler:

sess, err := session.NewSession(cfg)

// Add AddTracingHeaders as the first Send handler. This is important as one
// of the default Send handlers does the work of sending the request.
// AddTracingHeaders() is called here because it returns the handler function.
sess.Handlers.Send.PushFront(AddTracingHeaders())

This looks promising. Testing showed that the first request to the AWS DynamoDB service is successful (200 OK!). Traces look good too! We are getting somewhere, so it is time to test some failure scenarios.

I added an Istio fault injection rule to return an HTTP 500 error on outgoing DynamoDB requests to exercise the AWS SDK’s retry logic. Snap! I am receiving HTTP Status Code 400 with the AccessDeniedException error again on every retry.

Looking at the AWS request send logic, it appears that on retryable errors the code makes a copy of the previous request, signs it and then invokes the Send handlers. This means that on retries the previously added tracing headers get signed (i.e. the earlier problem is back, hence the 400s) before the AddTracingHeaders handler adds the tracing headers again.

Now that we understand the issue, the solution we came up with is to add the tracing headers after the request is signed and before it is sent out, just like the earlier implementation. In addition, to make retries work we now need to remove these headers after the request is sent, so that the re-signing and re-invocation of AddTracingHeaders are handled correctly.
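The retry behavior described above can be modeled with a toy simulation. Everything here is hypothetical and has nothing to do with the real SDK types: request, handler, signRequest and runAttempts are names invented for this sketch, the "signature" is just a snapshot of the headers present at signing time, and the same request object is reused across attempts instead of being copied (a copy would carry the same headers, so the effect is the same).

```go
package main

// request models just enough of an SDK request for this sketch.
type request struct {
	headers map[string]string
	signed  map[string]string // headers covered by the signature, as signed
}

type handler func(*request)

const traceHeader = "x-b3-spanid"

// signRequest snapshots every header currently on the request.
func signRequest(r *request) {
	r.signed = make(map[string]string, len(r.headers))
	for k, v := range r.headers {
		r.signed[k] = v
	}
}

func addTracing(r *request)    { r.headers[traceHeader] = "app-span" }
func removeTracing(r *request) { delete(r.headers, traceHeader) }

// signatureValid mimics the sidecar rewriting the trace header on the wire,
// then the server checking that every signed header arrived unchanged.
func signatureValid(r *request) bool {
	wire := make(map[string]string, len(r.headers))
	for k, v := range r.headers {
		wire[k] = v
	}
	if _, ok := wire[traceHeader]; ok {
		wire[traceHeader] = "proxy-span"
	}
	for k, v := range r.signed {
		if wire[k] != v {
			return false
		}
	}
	return true
}

// runAttempts signs and sends the same request several times, as a retry
// loop would, and reports whether each attempt passed signature validation.
func runAttempts(attempts int, withRemove bool) []bool {
	r := &request{headers: map[string]string{"host": "dynamodb.example.invalid"}}
	var results []bool
	send := func(r *request) { results = append(results, signatureValid(r)) }

	// Send handlers in order: add headers, send, optionally clean up.
	handlers := []handler{addTracing, send}
	if withRemove {
		handlers = append(handlers, removeTracing)
	}
	for i := 0; i < attempts; i++ {
		signRequest(r) // a retry signs whatever headers are still present
		for _, h := range handlers {
			h(r)
		}
	}
	return results
}
```

In this model, runAttempts(2, false) passes the first attempt and fails the second, because the leftover tracing header gets signed on the retry. runAttempts(2, true) passes both attempts, because the cleanup handler strips the header before the next signing pass.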

Final Iteration

Here’s what the final working version looks like:

func injectFromContextIntoHeader(ctx context.Context, header http.Header) {
  if span := ot.SpanFromContext(ctx); span != nil {
    ot.GlobalTracer().Inject(
      span.Context(),
      ot.HTTPHeaders,
      ot.HTTPHeadersCarrier(header))
  }
}

func AddTracingHeaders() awsrequest.Option {
  return func(req *awsrequest.Request) {
    injectFromContextIntoHeader(req.Context(), req.HTTPRequest.Header)
  }
}

// This is a bit odd, inject tracing headers into an empty header map so that
// we can remove them from the request.
func RemoveTracingHeaders(req *awsrequest.Request) {
  header := http.Header{}
  injectFromContextIntoHeader(req.Context(), header)
  for k := range header {
    req.HTTPRequest.Header.Del(k)
  }
}

sess, err := session.NewSession(cfg)

// Add AddTracingHeaders as the first Send handler. AddTracingHeaders() is
// called here because it returns the handler function.
sess.Handlers.Send.PushFront(AddTracingHeaders())

// PushBack is used here so that this handler runs after the request has
// been sent.
sess.Handlers.Send.PushBack(RemoveTracingHeaders)

Agreed, the above solution looks far from elegant, but it does work. I hope this post helps if you are in a similar situation.

If you have a better solution, feel free to reach out to me at neeraj@aspenmesh.io.
