Advancing the promise of service mesh: Why I work at Aspen Mesh

The themes and content expressed are mine alone, with helpful insight and thoughts from my colleagues, and are about software development in a business setting.

I’ve been working at Aspen Mesh for a little over a month and during that time numerous people have asked me why I chose to work here, given the opportunities in Boulder and the Front Range.

To answer that question, I need to talk a bit about my background. I’ve been a professional software developer for about 13 years now. During that time I’ve primarily worked on the back-end for distributed systems and have seen numerous approaches to the same problems with various pros and cons. When I take a step back, though, a lot of the major issues that I’ve seen are related to deployment and configuration around service communication:
  • How do I add a new service to an already existing system?
  • How big, in scope, should these services be?
  • Do I use a message broker?
  • How do I handle discoverability, high availability and fault tolerance?
  • How, and in what format, should data be exchanged between services?
  • How do I audit the system when the system inevitably comes under scrutiny?

I’ve seen many different approaches to these problems. In fact, there are so many approaches, some orthogonal and some similar, that software developers can easily get lost. While the themes are constant, it is time-consuming for developers to get up to speed with all of these technologies. There isn’t a single solution that solves every common problem seen in the back-end; I’m sure the same applies to the front-end as well. It’s hard to truly understand the pros and cons of an approach until you have a working system; and if, once you do, you realize that the cons outweigh the pros, it may be difficult and costly to get back to where you started (see sunk cost fallacy and opportunity cost). Conversely, analysis paralysis is also costly to an organization, both in terms of capital—software developers are not cheap—and an inability to quickly adapt to market pressures, be it customer needs and requirements or a competitor that is disrupting the market.

Yet the hype cycle continues. There is always a new shiny thing taking the software world by storm. You see it in discussions of languages, frameworks, databases, messaging protocols, architectures, ad infinitum. Separating the wheat from the chaff is something developers must do to ensure they are able to meet their obligations. But with the signal-to-noise ratio being low at times and with looming deadlines, not all possibilities can be explored.

So as software developers, we have an obligation to do our due diligence and to deliver software that provides customer value; software that helps customers get their work done and doesn’t impede them, but enables them. Most customers don’t care which languages you use, which databases you use, how you build your software or what software process methodology you adhere to, if any. They just want the software you provide to enable them to do their work. In fact, that sentiment is so strong that slogans have been made around it.

So what do customers care about, generally speaking? They care about access to their data: how they can view it, modify it and draw value from it. It should look and feel modern, but even that isn’t a strict requirement. It should be simple for a novice to use, yet provide enough advanced capability that your most advanced users can teach you something new about the tool you’ve created. This is information technology, after all. Technology for technology’s sake is not a useful outcome.

Any work that detracts from adding customer value needs to be deprioritized, as there is always more work to do than hours in the day. As developers, it’s our job to be knee deep in the weeds, so it’s easy to lose sight of that; unit testing, automation, language choice, cloud provider, software process methodology, etc. absolutely matter, but they are a means to an end.

With that in mind, let’s create a goal: application developers should be application developers.

Not DevOps engineers, or SREs or CSRs, or any of the myriad other roles they are often asked to take on. I’ve seen my peers happiest when they are solving difficult problems and challenging themselves, not when they are figuring out what magic configuration setting is breaking the platform. Command over their domain, and the ability and permission to “fix it,” are important to almost every appdev.

If developers are expensive to hire, train, replace and keep, then they need to be enabled to do their job to the best of their ability. If a distributed, microservices platform has your team solving issues in the fashion of Sherlock Holmes solving his latest mystery, then perhaps you need a different approach.

Enter Istio and Aspen Mesh

It’s hard to know where the industry is with respect to the Hype Cycle for technologies like microservices, container orchestration, service mesh and a myriad of other choices; this isn’t an exact science where we can empirically take measurements. Most companies have older, but proven, systems built on LAMP or Java application servers or monoliths or applications that run on a big iron system. Those aren’t going away anytime soon, and developers will need to continue to support and add new features and capabilities to these applications.

Any new technology must provide a path for people to migrate their existing systems to something new.

If you have decided to move, or are already moving, towards a microservice architecture (even if you have a monolith today), implementing a service mesh should be among the possibilities explored. If you already have a microservice architecture that leverages gRPC or HTTP, and you're using Kubernetes, then the benefits of a service mesh can be quickly realized. It's easy to sign up for our beta, install Aspen Mesh and the sample bookinfo application, and see things in action. Once I did, I became a true believer. Not being coupled to a particular cloud provider, but being flexible and able to choose where and how things are deployed, empowers developers and companies to make their own choices.

Over the past month I’ve been able to write application code and get it delivered faster than ever before; that is in large part due to the platform my colleagues have built on top of Kubernetes and Istio. I’ve been impressed by how easy a well-built cloud-native architecture can make things, and learning more about where Aspen Mesh, Istio and Kubernetes are heading gives me confidence that community and adoption will continue to grow.

As someone who has dealt with distributed systems issues continuously throughout his career, I know managing and troubleshooting a distributed system can be exhausting. I just want to enable others, even Aspen Mesh as we dogfood our own software, to do their jobs. To enable developers to add value and solve difficult problems. To enable a company to monitor its systems, whether a mission-critical application or a simple CRUD one, to help ensure high uptime and responsiveness. To enable systems to be easily auditable when compliance personnel have GDPR, PCI DSS or HIPAA concerns. To enable developers to quickly diagnose issues within their own system, fix them and monitor the change. To enable developers to understand how their services are communicating with each other--whether it’s an n-tier system or a spider’s web--and how requests propagate through their system.

The value of Istio and the benefits of Aspen Mesh in solving these challenges are what drew me here. The opportunities are abundant and fruitful. I get to program in Go, in a SaaS environment, on a small team with a solid architecture. I am looking forward to becoming a part of the larger CNCF community. With microservices and cloud computing no longer being niche--which I’d argue hasn’t been the case for years--and with businesses adopting these new technology patterns quickly, I feel as if I made the right long-term career choice.


Top 3 Service Mesh Developments in 2019

Last year was about service mesh evaluation, trialing — and even hype.

While interest in service mesh as a technology pattern was very high, it was mostly about evaluation and did not see widespread adoption. The capabilities a service mesh adds to ease managing microservice-based applications at runtime are obvious, but the technology still needs to reach maturity before gaining widespread production adoption.

What we can say is service mesh adoption should evolve from the hype stage in a very real way this year.

What can we expect to see in 2019?

  1. The evolution and coalescing of service mesh as a technology pattern;
  2. The evolution of Istio as the way enterprises choose to implement service mesh;
  3. Clear use cases that lead to wider adoption.

The Evolution of Service Mesh

There are several architectural options when it comes to service mesh, but undoubtedly, the sidecar architecture will see the most widespread usage in 2019. The sidecar proxy as an architectural pattern, and more specifically Envoy as the technology, have emerged as clear winners for how the majority will implement service mesh.

Considering control plane service meshes, we have seen the space coalesce around leveraging sidecar proxies. Linkerd, with its merging of Conduit and release of Linkerd 2, got on the sidecar train. And the original sidecar control plane mesh, Istio, certainly has the most momentum in the cloud-native space. A look at the Istio GitHub repo shows:

  • 14,500 stars;
  • 6,400 commits;
  • 300 contributors.

And if these numbers don’t clearly demonstrate the momentum of the project, just consider the number of companies building around Istio:

  • Aspen Mesh;
  • Avi Networks;
  • Cisco;
  • OpenShift;
  • NGINX;
  • Rancher;
  • Tufin Orca;
  • Tigera;
  • Twistlock;
  • VMware.

The Evolution of Istio

So the big question is: where is the Istio project headed in 2019? I should start with the disclaimer that the following are all guesses; well-informed guesses, but guesses nonetheless.

Community Growth

Now that Istio has hit 1.0, the number of contributors outside the core Google and IBM teams is starting to grow. I’d hazard the guess that Istio will be truly stable around 1.3, sometime in June or July. Once the project gets to the point it is usable at scale in production, I think you’ll really see it take off.

Emerging Vendor Landscape

At Aspen Mesh, we placed our bet on Istio 18 months ago. It seems increasingly clear that Istio will win service mesh in much the same way Kubernetes has won container orchestration.

Istio is a powerful toolbox that directly addresses many microservices challenges that are being solved with multiple manual processes, or are not being solved at all. The power of the open source community surrounding it also seems to be a factor that will lead to widespread adoption. As this becomes clearer, the number of companies building on Istio and building Istio integrations will increase.

Istio Will Join the Cloud Native Computing Foundation

Total guess here, but I’d bet on this happening in 2019. CNCF has proven to be an effective steward of cloud-native open source projects. I think CNCF stewardship will be a key to widespread adoption, which in turn will be key to the long-term success of Istio. We shall see what the project founders decide, but this move will benefit everyone once the Istio project reaches the point where joining the CNCF makes sense.

Real-World Use Cases Are Key To Spreading Adoption

Service mesh is still a nascent market, and in the next 12-24 months we should see the market expand past just the early adopters. But for those who have been paying attention, the why of a service mesh has largely been answered. The why is also certain to evolve, but for now, the reasons to implement a service mesh are clear. I think large parts of the how are falling into place, but more will emerge as service mesh encounters real-world use cases in 2019.

I think what remains unanswered is: “what are the real-world benefits I am going to see when I put this into practice?” This is not a new question around an emerging technology, nor will the way it gets answered be anything new: it will be through use cases. I can’t emphasize enough how use cases based on actual users will be key.

Service mesh is a powerful toolbox, but only a small swath of users will care about how cool the tech is. The rest will want to know what problems it solves.

I predict 2019 will be the year of service mesh use cases, which will naturally emerge as the number of adopters increases and they begin to talk about the value they are getting from a service mesh.

Some Final Thoughts

If you are already using a service mesh, you understand the value it brings. If you’re considering a service mesh, pay close attention to this space; a growing number of use cases will make the real-world value proposition clearer. And if you’re not yet decided on whether or not you need a service mesh, check out the recent Gartner, 451 and IDC reports on microservices — all of which say a service mesh will be mandatory by 2020 for any organization running microservices in production.


Inline yaml Editing with yq

So you're working hard at building a solid Kubernetes cluster — maybe using kops to create a new instance group — and BAM, you are presented with an editor session to edit the details of that shiny new instance group. No biggie; you just need to add a simple little detailedInstanceMonitoring: true to the spec and you are good to go.

Okay, now you need to do this several times a day to test the performance of the latest build, and this is just one of several steps to get the cluster up and running. You want to automate building that cluster as much as possible, but every time you get to the step to create that instance group, BAM, there it is again: your favorite editor, and you have to add that same line every time.

Standard practice is to use cluster templating but there are times when you need something more lightweight. Enter yq.

yq is great for digging through yaml files, but it also has an in-place merge function that can modify a file directly, just like any editor. And kops, along with several other command line tools, honors the EDITOR environment variable, so you can automate your yaml editing along with the rest of your cluster handiwork.

Making it work

The first roadblock is that you can pass command line options via the EDITOR environment variable, but the file being edited must be the last argument (it is appended by kops when it invokes the editor). yq, however, wants the file to be edited first, followed by a merge file with instructions on editing it (more on that below). To get around this, I use a little bash script that invokes yq with the last two command line arguments reordered (I'll call the file yq-merge-editor.sh):

#!/usr/bin/env bash

if [[ $# != 2 ]]; then
    echo "Usage: $0 <merge file (supplied by script)> <file being edited (supplied by invoker of EDITOR)>"
    exit 1
fi

yq merge --inplace --overwrite "$2" "$1"

In the above script, the merge command tells yq we want to merge yaml files, and --inplace says to edit the first file in place. The --overwrite option instructs yq to overwrite existing sections of the file if they are defined in the merge file. $2 is the file to be edited and $1 is the merge file (the reverse of the order in which the script receives them). Other useful options are documented in the yq merge documentation.

Example 1: Turning on detailed instance monitoring

The next step is to create a patch file containing the edit you want to perform. In this example, we will turn on detailed instance monitoring which is a useful way to get more metrics from your nodes. Here's the merge file (we will call this file ig-monitoring.yaml):

spec:
  detailedInstanceMonitoring: true
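Conceptually, the merge this produces can be sketched in a few lines of Python. This is an illustration of merge-with-overwrite semantics, not yq's actual implementation, and the machineType and maxSize values are made up for the example:

```python
# Sketch of merge-with-overwrite semantics (illustrative only, not
# yq's real code): keys from the merge file are layered onto the
# target document, and merge-file values win on conflicts.
def merge_overwrite(target, patch):
    """Recursively merge `patch` into `target`; patch values win."""
    result = dict(target)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are maps: recurse so sibling keys survive.
            result[key] = merge_overwrite(result[key], value)
        else:
            # Scalars (and lists) are simply overwritten or added.
            result[key] = value
    return result

# A stand-in for the instance group spec kops hands to the editor
# (field values here are hypothetical):
spec = {"spec": {"machineType": "t3.medium", "maxSize": 3}}

# The ig-monitoring.yaml merge file, expressed as a dict:
patch = {"spec": {"detailedInstanceMonitoring": True}}

merged = merge_overwrite(spec, patch)
print(merged)
# Existing keys (machineType, maxSize) survive; the new key is added.
```

The key property is that the merge file only needs to name the fields you want to change; everything else in the spec is left untouched.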

To put it all together, you can invoke kops with a custom editor command:

EDITOR="./yq-merge-editor.sh ./ig-monitoring.yaml" kops edit instancegroups nodes

That's it! kops creates a temporary file and invokes your editor script which invokes yq. yq edits the temporary file in-place and kops takes the edited output and moves on.
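If you want to see the mechanism in isolation, the whole flow can be simulated without kops or yq installed. The snippet below is a hypothetical stand-in: a fake spec file plays the role of the kops temp file, and a trivial editor script that appends the monitoring line stands in for yq-merge-editor.sh:

```shell
#!/usr/bin/env bash
# Simulate the kops editor flow end to end (stand-ins only; in real
# use kops supplies the temp file and yq-merge-editor.sh does the merge).
set -euo pipefail

workdir=$(mktemp -d)

# The spec kops would write to a temp file and hand to $EDITOR:
cat > "$workdir/spec.yaml" <<'EOF'
spec:
  machineType: t3.medium
EOF

# A minimal "editor" that appends the monitoring line, standing in
# for yq-merge-editor.sh plus its merge file:
cat > "$workdir/editor.sh" <<'EOF'
#!/usr/bin/env bash
echo '  detailedInstanceMonitoring: true' >> "$1"
EOF
chmod +x "$workdir/editor.sh"

# kops effectively does this: run $EDITOR on the temp file,
# then read the edited result back.
EDITOR="$workdir/editor.sh"
"$EDITOR" "$workdir/spec.yaml"

cat "$workdir/spec.yaml"
```

The only contract the editor command has to honor is "take the file as the last argument and modify it in place", which is exactly what the yq wrapper script provides.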

Example 2: Temporarily add nodes

Say you want to temporarily add capacity to your cluster while performing some maintenance. This is a temporary change, so there's no need to update your cluster's configuration permanently. The following patch file will update the min and max node counts in an instance group:

spec:
  maxSize: 25
  minSize: 25

Then invoke the same script from above followed by a kops update:

EDITOR="./yq-merge-editor.sh ig-nodes-25.yaml" kops edit instancegroups nodes
kops update cluster $NAME --yes

These tips should make it easier to build lots of happy clusters!