Distributed Tracing, Istio and Your Applications

In the microservices world, distributed tracing is quickly becoming the most important tool for debugging and understanding your application dependencies. During my recent conversations at meetups and conferences, I found there was a lot of interest in how distributed tracing works, but at the same time a fair amount of confusion about how tracing interacts with service meshes like Istio and Aspen Mesh. In particular, these questions came up frequently:

  • How does tracing work with Istio? What information is collected and reported in the spans?
  • Do I have to change my applications to benefit from distributed tracing in Istio?
  • If I am currently reporting spans in my application, how will they interact with the spans reported by Istio?

In this blog I am going to try to answer these questions. Before we get deeper into them, a quick bit of background on how I ended up writing tracing-related blogs. If you follow the Aspen Mesh blog you may have noticed I have written two blogs related to tracing: one on tracing requests to AWS services when using Istio, and the second on tracing gRPC applications with Istio.

We have a pretty small engineering team at Aspen Mesh and, as it goes in most startups, if you work frequently on a sub-system or component you quickly become (or are labeled) the resident expert. I added tracing to our microservices and integrated it with Istio in the AWS environment, and in that process uncovered various interesting interactions which I thought might be worth sharing. Over the last few months we have been using tracing heavily to understand our microservices, and it has become the first place we look when things break. With that, let's move on to answering the questions I mentioned above.

How does tracing work with Istio?

Istio injects a sidecar proxy (Envoy) in the Pod in which your application container is running. This sidecar proxy transparently intercepts (iptables magic) all network traffic going in and out of your application. Because of this interception the sidecar proxy is in a unique position to automatically trace all network requests (HTTP/1.1, HTTP/2.0 & gRPC).

Let's see what changes the sidecar proxy makes to an incoming request to a Pod from a client (external or another microservice). From this point on I'm going to assume tracing headers are in Zipkin format for simplicity.

  • If the incoming request doesn't have any tracing headers, the sidecar proxy will create a root span (a span where the trace, parent and span IDs are all the same) before passing the request to the application container in the same Pod.
  • If the incoming request has tracing information (which should be the case if you're using Istio ingress or your microservice is being called from another microservice with a sidecar proxy injected), the sidecar proxy will extract the span context from these headers and create a new sibling span (same trace, span and parent ID as the incoming headers) before passing the request to the application container in the same Pod.

In the reverse direction, when the application container makes outbound requests (to external services or services in the cluster), the sidecar proxy in the Pod performs the following actions before forwarding the request to the upstream service:

  • If no tracing headers are present, the sidecar proxy creates a root span and injects the span context as tracing headers into the new request.
  • If tracing headers are present, the sidecar proxy extracts the span context from the headers and creates a child span from this context. The new context is propagated as tracing headers in the request to the upstream service.

Based on the above explanation, note that for every hop in your microservice chain you will get two spans reported from Istio: one from the client sidecar (span.kind set to client) and one from the server sidecar (span.kind set to server). All the spans created by the sidecars are automatically reported to the configured tracing backend, such as Jaeger or Zipkin.

Next let's look at the information reported in the spans. The spans contain the following information:

  • x-request-id: Reported as guid:x-request-id which is very useful in correlating access logs with spans.
  • upstream cluster: The upstream service to which the request is being made. If the span is tracking an incoming request to a Pod this is typically set to in.<name>. If the span is tracking an outbound request this is set to out.<name>.
  • HTTP properties: The following request and response attributes are reported when available:
    • URL
    • Method
    • User agent
    • Protocol
    • Request size
    • Response size
    • Response Flags
  • Start and end times for each span.
  • Tracing metadata: This includes the trace ID, span ID and the span kind (client or server). Apart from these the operation name is also reported for every span. The operation name is set to the configured virtual service (or route rule in v1alpha1) which affected the route or "default-route" if the default route was chosen. This is very useful in understanding which Istio route configuration is in effect for a span.

With that let's move on to the second question.

Do I have to change my application to benefit from tracing in Istio?

Yes, you will need to add logic in your application to propagate tracing headers from incoming to outgoing requests to gain full benefit from Istio's distributed tracing.

If the application container makes a new outbound request in the context of an incoming request and doesn't propagate the tracing headers from the incoming request, the sidecar proxy creates a root span for the outbound request. This means you will always see traces spanning only two microservices. On the other hand, if the application container does propagate the tracing headers from incoming to outgoing requests, the sidecar proxy creates child spans as described above. Creating child spans gives you the ability to understand dependencies across multiple microservices.

There are a couple of options for propagating tracing headers in your application.

  1. Look for the tracing headers listed in the Istio docs and transfer them from incoming to outgoing requests (a sketch of this manual propagation appears after this list). This method is simple and works in almost all cases. However, it has a major drawback: you cannot add custom tags to the spans, such as user information, and you cannot create child spans for application events you might want to report. As you are simply transferring headers without understanding the span formats or contexts, there is limited ability to add application-specific information.
  2. The second method is setting up a tracing client in your application and using the OpenTracing APIs to propagate tracing headers from incoming to outgoing requests. I have created a sample tracing-go package which provides an easy way to set up jaeger-client-go in your application in a way that is compatible with Istio. The following snippet shows the setup; call setupTracing from the main function of your application and defer-close the returned tracer:
       import (
         "io"
         "log"

         "github.com/spf13/viper"

         "github.com/aspenmesh/tracing-go"
       )

       // setupTracing configures the global tracer and returns a closer which
       // the caller (main) should defer-close on shutdown.
       func setupTracing() io.Closer {
         // Configure tracing
         tOpts := &tracing.Options{
           ZipkinURL:     viper.GetString("trace_zipkin_url"),
           JaegerURL:     viper.GetString("trace_jaeger_url"),
           LogTraceSpans: viper.GetBool("trace_log_spans"),
         }
         if err := tOpts.Validate(); err != nil {
           log.Fatal("Invalid options for tracing: ", err)
         }
         var tracer io.Closer
         if tOpts.TracingEnabled() {
           var err error
           tracer, err = tracing.Configure("myapp", tOpts)
           if err != nil {
             log.Fatal("Failed to configure tracing: ", err)
           }
         }
         return tracer
       }
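
Before moving on to the details of the second option, here is what the first option looks like in practice. This is a minimal sketch, assuming the Zipkin/B3 header names listed in the Istio docs; copyTracingHeaders is a hypothetical helper, not part of any library:

   import "net/http"

   // Tracing headers to propagate verbatim, per the Istio docs (Zipkin/B3 format).
   var tracingHeaders = []string{
     "x-request-id",
     "x-b3-traceid",
     "x-b3-spanid",
     "x-b3-parentspanid",
     "x-b3-sampled",
     "x-b3-flags",
     "x-ot-span-context",
   }

   // copyTracingHeaders transfers the tracing headers from the incoming
   // request to an outgoing one without inspecting their contents.
   func copyTracingHeaders(incomingReq, outgoingReq *http.Request) {
     for _, h := range tracingHeaders {
       if v := incomingReq.Header.Get(h); v != "" {
         outgoingReq.Header.Set(h, v)
       }
     }
   }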

Coming back to the second option, the key point to note is that in the tracing-go package I have set the OpenTracing global tracer to the Jaeger tracer. This enables me to use the OpenTracing APIs for propagating headers from incoming to outgoing requests like this:

   import (
     "net/http"

     "golang.org/x/net/context/ctxhttp"

     ot "github.com/opentracing/opentracing-go"
   )

   func injectTracingHeaders(incomingReq *http.Request, addr string) {
     if span := ot.SpanFromContext(incomingReq.Context()); span != nil {
       outgoingReq, _ := http.NewRequest("GET", addr, nil)

       // Serialize the span context into the outgoing request's headers
       ot.GlobalTracer().Inject(
         span.Context(),
         ot.HTTPHeaders,
         ot.HTTPHeadersCarrier(outgoingReq.Header))

       resp, err := ctxhttp.Do(incomingReq.Context(), nil, outgoingReq)
       if err != nil {
         return
       }
       defer resp.Body.Close()
       // Do something with resp
     }
   }

You can also use the OpenTracing APIs to set span tags or create child spans from the tracing context added by Istio like this:

   func SetSpanTag(incomingReq *http.Request, key string, value interface{}) {
     if span := ot.SpanFromContext(incomingReq.Context()); span != nil {
       span.SetTag(key, value)
     }
   }

Apart from these benefits, you don't have to deal with tracing headers directly; the tracer (in this case Jaeger) handles that for you. I strongly recommend using this approach as it sets the foundation in your application for adding enhanced tracing capabilities without much overhead.

Now let's move on to the third question.

How do spans reported by Istio interact with spans created by applications?

If you want the spans reported by your application to be child spans of the tracing context added by Istio, you should use the OpenTracing API StartSpanFromContext instead of StartSpan. StartSpanFromContext creates a child span from the parent context if one is present, and otherwise creates a root span.
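
Here is a minimal sketch of that pattern, assuming a span has already been extracted into the incoming request's context (as in the earlier examples); processOrder, the "validate-order" operation name and the tag value are hypothetical:

   import (
     "net/http"

     ot "github.com/opentracing/opentracing-go"
   )

   func processOrder(incomingReq *http.Request) {
     // StartSpanFromContext returns a child span if the request context
     // already carries a span extracted from the Istio-propagated headers,
     // and a root span otherwise.
     span, ctx := ot.StartSpanFromContext(incomingReq.Context(), "validate-order")
     defer span.Finish()

     span.SetTag("component", "order-service")

     // Pass ctx to downstream calls so any further spans join the same trace.
     _ = ctx
   }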

Note that in all the examples above I have used the OpenTracing Go APIs, but you should be able to use any tracing client library written in the same language as your application, as long as it is OpenTracing API compatible.


Tracing gRPC with Istio

At Aspen Mesh we love gRPC. Most of our public facing and many internal APIs use it. To give you a brief background in case you haven’t heard about it (which would be difficult given gRPC’s belle-of-the-ball status), it is a new, highly efficient and optimized Remote Procedure Call (RPC) framework. It is based on the battle-tested protocol buffers serialization format and the HTTP/2 network protocol.

Using the HTTP/2 protocol, gRPC applications benefit from request multiplexing, efficient connection utilization and a host of other enhancements over protocols like HTTP/1.1, which is very well documented here. Additionally, protocol buffers are an easy and extensible way of serializing structured data in a binary format, which in itself gives you significant performance improvements over text-based formats. Combining the two results in a low-latency and highly scalable RPC framework, which is in essence what gRPC is. Additionally, the growing ecosystem gives you the ability to write your applications in many supported languages (C++, Java, Go, etc.) along with an extensive set of third-party libraries to use.

Apart from the benefits I listed above, what I like most about gRPC is the simplicity and intuitiveness with which you can specify your RPCs (using the protobuf IDL) and how a client application can invoke methods on the server application as if it were a local function call. A lot of the code (service descriptions and handlers, client methods, etc.) gets auto-generated for you, making it very convenient to use.

Now that I have laid out some background, let’s turn our attention to the main topic of this blog. Here I’m going to cover how to add tracing in your applications built on gRPC, especially if you’re using Istio or Aspen Mesh.

Tracing is great for debugging and understanding your application’s behavior. The key to making sense of all the tracing data is being able to correlate spans from different microservices which are related to a single client request.

To achieve this, all microservices in your application should propagate tracing headers. If you’re using a service mesh like Istio or Aspen Mesh, the ingress and sidecar proxies automatically add the appropriate tracing headers and report the spans to a tracing collector backend like Jaeger or Zipkin. The only thing left for applications to do is propagate tracing headers from incoming requests (which the sidecar or ingress proxy adds) to any outgoing requests they make to other microservices.

Propagating Headers from gRPC to gRPC Requests

The easiest way to propagate tracing headers is to use the grpc opentracing middleware library’s client interceptors. This can be used if your application makes a new outbound gRPC request upon receiving an incoming request. Here’s the sample code to correctly propagate tracing headers from the incoming to the outgoing request:

  import (
    "golang.org/x/net/context"

    "github.com/golang/glog"
    grpc_opentracing "github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing"
    ot "github.com/opentracing/opentracing-go"
    "google.golang.org/grpc"
  )

  // ctx is the incoming gRPC request's context
  // addr is the address for the new outbound request
  func createGRPCConn(ctx context.Context, addr string) (*grpc.ClientConn, error) {
    var opts []grpc.DialOption
    opts = append(opts, grpc.WithStreamInterceptor(
      grpc_opentracing.StreamClientInterceptor(
        grpc_opentracing.WithTracer(ot.GlobalTracer()))))
    opts = append(opts, grpc.WithUnaryInterceptor(
      grpc_opentracing.UnaryClientInterceptor(
        grpc_opentracing.WithTracer(ot.GlobalTracer()))))
    // The Istio sidecar handles mTLS, so the application dials in plaintext.
    opts = append(opts, grpc.WithInsecure())
    conn, err := grpc.DialContext(ctx, addr, opts...)
    if err != nil {
      glog.Error("Failed to connect to application addr: ", err)
      return nil, err
    }
    return conn, nil
  }

Pretty simple, right?

Adding the opentracing client interceptors ensures that making any new unary or streaming gRPC request on the client connection injects the correct tracing headers. If the passed context has the tracing headers present (which should be the case if you are using Aspen Mesh or Istio and passing the incoming request’s context), then the new span is created as the child span of the span present in the passed context. On the other hand if the context has no tracing information, a new root span is created for the outbound request.
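
To tie it together, here is a hedged usage sketch; the pb import path, NewGreeterClient, SayHello and HelloRequest are placeholders for whatever client your protobuf definitions generate:

  import (
    "golang.org/x/net/context"

    "google.golang.org/grpc"

    // pb is a placeholder for your generated protobuf package
    pb "github.com/example/yourapp/proto"
  )

  // callUpstream makes a unary call on the traced connection. Passing the
  // incoming request's ctx lets the interceptor create a child span and
  // inject its headers into this outbound call.
  func callUpstream(ctx context.Context, conn *grpc.ClientConn) error {
    client := pb.NewGreeterClient(conn)
    _, err := client.SayHello(ctx, &pb.HelloRequest{Name: "tracing"})
    return err
  }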

Propagating Headers from gRPC to HTTP Requests

Now let’s look at the scenario if your application makes a new outbound HTTP/1.1 request upon receiving a new incoming gRPC request. Here’s the sample code to accomplish header propagating in this case:

  import (
    "net/http"

    "golang.org/x/net/context"
    "golang.org/x/net/context/ctxhttp"

    ot "github.com/opentracing/opentracing-go"
  )

  // ctx is the incoming gRPC request's context
  // addr is the address of the application being requested
  func makeNewRequest(ctx context.Context, addr string) {
    if span := ot.SpanFromContext(ctx); span != nil {
      req, _ := http.NewRequest("GET", addr, nil)

      // Serialize the span context into the outgoing request's HTTP headers
      ot.GlobalTracer().Inject(
        span.Context(),
        ot.HTTPHeaders,
        ot.HTTPHeadersCarrier(req.Header))

      resp, err := ctxhttp.Do(ctx, nil, req)
      if err != nil {
        return
      }
      defer resp.Body.Close()
      // Do something with resp
    }
  }

This is the standard pattern for serializing tracing headers from an incoming request’s (HTTP or gRPC) context.

Great! So far we have been able to use libraries or standard utility code to get what we want.

Propagating Headers When Using gRPC-Gateway

One of the libraries commonly used in gRPC applications is the grpc-gateway library, which exposes gRPC services as RESTful JSON APIs. This is very useful when you want to consume gRPC from clients like curl or a web browser which don’t understand it, or when you want to maintain a RESTful architecture. More details on how to expose RESTful APIs using grpc-gateway can be found in this great blog. I highly encourage you to read it if you’re new to this architecture.

When you start using grpc-gateway and want to propagate tracing headers, there are a few very interesting interactions worth mentioning. The grpc-gateway documentation states that all IANA permanent HTTP headers are prefixed with grpcgateway- and added as request headers. This is great, but since tracing headers like x-b3-traceid, x-b3-spanid, etc. are not IANA-recognized permanent HTTP headers, they are not copied over to gRPC requests when grpc-gateway proxies HTTP requests. This means that as soon as you add grpc-gateway to your application, the header propagation logic will stop working.

Isn’t that typical? You add one awesome thing which breaks the current working setup. No worries, I have a solution for you!

Here’s a way to ensure you don’t lose the tracing information when proxying between HTTP and gRPC using grpc-gateway:

  import (
    "net/http"
    "golang.org/x/net/context"
    "google.golang.org/grpc/metadata"
    "github.com/grpc-ecosystem/grpc-gateway/runtime"
  )

  const (
  	prefixTracerState  = "x-b3-"
  	zipkinTraceID      = prefixTracerState + "traceid"
  	zipkinSpanID       = prefixTracerState + "spanid"
  	zipkinParentSpanID = prefixTracerState + "parentspanid"
  	zipkinSampled      = prefixTracerState + "sampled"
  	zipkinFlags        = prefixTracerState + "flags"
  )

  var otHeaders = []string{
  	zipkinTraceID,
  	zipkinSpanID,
  	zipkinParentSpanID,
  	zipkinSampled,
  	zipkinFlags}

  func injectHeadersIntoMetadata(ctx context.Context, req *http.Request) metadata.MD {
  	pairs := []string{}
  	for _, h := range otHeaders {
  		if v := req.Header.Get(h); len(v) > 0 {
  			pairs = append(pairs, h, v)
  		}
  	}
  	return metadata.Pairs(pairs...)
  }

  type annotator func(context.Context, *http.Request) metadata.MD

  func chainGrpcAnnotators(annotators ...annotator) annotator {
  	return func(c context.Context, r *http.Request) metadata.MD {
  		mds := []metadata.MD{}
  		for _, a := range annotators {
  			mds = append(mds, a(c, r))
  		}
  		return metadata.Join(mds...)
  	}
  }

  // Main function of your application. Insert tracing headers into gRPC
  // metadata using annotators
  func run() {
    ...
	  annotators := []annotator{injectHeadersIntoMetadata}

	  gwmux := runtime.NewServeMux(
		  runtime.WithMetadata(chainGrpcAnnotators(annotators...)),
	  )
    ...
  }

In the code above, I have used the runtime.WithMetadata API provided by the grpc-gateway library. This API is useful for reading attributes from the HTTP request and adding them to the gRPC metadata, which is exactly what we want! A little bit more work, but still using the APIs exposed by the library.

The injectHeadersIntoMetadata annotator looks for the tracing headers in the HTTP request and appends them to the gRPC metadata, thereby ensuring that the tracing headers can be further propagated from gRPC to outbound requests using the techniques mentioned in the previous sections.

Another interesting thing you might have observed is the chainGrpcAnnotators wrapper function. The runtime.WithMetadata API only allows a single annotator to be added, which might not be enough for all scenarios. In our case, we had a tracing annotator (like the one shown above) and an authentication annotator which appended auth data from the HTTP request to the gRPC metadata. Using chainGrpcAnnotators allows you to add multiple annotators, and the wrapper function joins the metadata from the various annotators into a single metadata object for the request.
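
To illustrate that chaining, here is a hedged sketch of a second annotator; the injectAuthIntoMetadata name and the authorization header are purely illustrative, and the annotator type comes from the snippet above:

  import (
    "net/http"

    "golang.org/x/net/context"
    "google.golang.org/grpc/metadata"
  )

  // A hypothetical authentication annotator that copies auth data from the
  // HTTP request into the gRPC metadata.
  func injectAuthIntoMetadata(ctx context.Context, req *http.Request) metadata.MD {
    if auth := req.Header.Get("authorization"); auth != "" {
      return metadata.Pairs("authorization", auth)
    }
    return metadata.MD{}
  }

  // Both annotators can then be chained in run():
  //   annotators := []annotator{injectHeadersIntoMetadata, injectAuthIntoMetadata}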