Helping Istio Sail

Around three years ago, we recognized the power of service mesh as a technology pattern to help organizations manage the next generation of software delivery challenges, and that insight led us to found Aspen Mesh. We also knew that efficiently managing Kubernetes applications at scale requires the power of Istio. Having been a part of the Istio project since 0.2, we have seen countless users and customers benefit from the observability, security and traffic management capabilities that a service mesh like Istio provides.

Our engineering team has worked closely with the Istio community over the past three years to help drive stability and add new features that make Istio increasingly easy for users to adopt. We believe in the value that open source software — and an open community — brings to the enterprise service mesh space, and we are driven to help lead that effort by setting sound technical direction. By contributing expertise, code and design ideas like Virtual Service delegation and enforcing RBAC policies for Istio resources, we have focused much of our work on making Istio more enterprise-ready. One of these contributions is Istio Vet, which was created and open sourced in the early days of Aspen Mesh to enhance Istio's user experience and validate multi-resource configurations. Istio Vet proved so valuable to users that we worked closely with the Istio community to create istioctl analyze, adding key configuration analysis capabilities to Istio itself. It’s very exciting to see that many of these ideas have now been implemented in the community and are part of the feature set available for broader consumption.
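For readers who have not used it, `istioctl analyze` inspects Istio configuration — either live in a cluster or from local manifests — and reports potential problems before they bite. A typical invocation looks like this (the file name is a hypothetical placeholder):

```shell
# Analyze the live Istio configuration in the current cluster's
# default namespace and report any detected issues.
istioctl analyze

# Analyze local YAML manifests without contacting a cluster.
istioctl analyze --use-kube=false my-config.yaml
```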

As the Istio project and its users mature, there is a greater need for open communication around product security and CVEs. Recognizing this, we were fortunate to help create the early disclosure process and to lead the product security working group, which ensures the overall project is secure and not vulnerable to known exploits.

We believe that an open source community thrives when users and vendors are considered equal partners. We feel privileged that we have been able to provide feedback from our customers that has helped accelerate Istio’s evolution. In addition, it has been a privilege to share the networking expertise from our F5 heritage with the Istio project as maintainers and leads of important functional areas and key working groups.

The technical leadership of Istio is a meritocracy, and we are honored that our sustained efforts have been recognized with my appointment to the Technical Oversight Committee.

As a TOC member, I am excited to work with other Istio community leaders to focus the roadmap on solving customer problems while keeping our user experience top of mind. The next and most critical challenges to solve are day-two problems and beyond, where we need to ensure smoother upgrades, enhanced security and greater scalability of the system. We envision use cases emerging from industries like telco and FinServ that will push the envelope of technical capabilities beyond what many can imagine.

It has been amazing to see the user growth and the maturity of the Istio project over the last three years. We firmly believe that more diverse leadership and open governance in Istio will further help advance the project and increase participation from developers across the globe.

The fun is just getting started and I am honored to be an integral part of Istio’s journey! 


How to Achieve Engineering Efficiency with a Service Mesh

As the idea for Aspen Mesh was forming in my mind, I had the opportunity to meet with a cable provider’s engineering and operations teams to discuss the challenges they faced operating their microservice architecture. When we all gathered in the large, very corporate conference room and exchanged the usual introductions, I could see that something just wasn’t right with the folks in the room. They looked like they had been hit by a truck. The reason for that turned this meeting into one of the most influential meetings of my life.

It turned out that the entire team had been up all night working on an outage in some of the services that were part of their guide application. We talked about the issue, how it manifested itself and what impact it had on their customers. But one statement has stuck with me ever since: “The worst part of this 13-hour outage was that it took us 12 hours to get the right person on the phone, and only one hour to get it fixed…”

That is when I knew that a service mesh could solve this problem and increase engineering efficiency for teams of all sizes. First, in day-to-day engineering and operations, it ensures that experts stay focused on their areas of expertise. And second, when things go sideways, it is the strategic point in the stack that holds all the information needed to root-cause a problem — and also the place from which you can rapidly restore your system.

Day-to-Day Engineering and Operations

A service mesh can play a critical role in day-to-day engineering and operations by streamlining processes, reducing test environments and letting experts perform their duties independent of application code cycles. This helps DevOps teams work more efficiently: developers focus on delivering value to customers through applications, while operators deliver value through improved customer experience, stability and security.

The properties of a service mesh can enable your organization to run more efficiently and reduce operating costs. Here are some ways a service mesh allows you to do this:

  • Canary testing of applications in production can eliminate expensive staging environments.
  • Autoscaling of applications can ensure efficient use of resources.
  • Traffic management can eliminate duplicated coding efforts to implement retry-logic, load-balancing and service discovery.
  • Encryption and certificate management can be centralized to reduce overhead and the need to make application changes and redeployment for changing security policies.
  • Metrics and tracing give teams access to the information they need for performance and capacity planning, and can help reduce rework and over-provisioning of resources.
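To make the canary and traffic-management points above concrete, here is a minimal sketch of an Istio VirtualService that splits traffic between two versions of a service. The service name `guide` and the subsets `v1`/`v2` are hypothetical placeholders, and the example assumes Istio is installed with matching subsets defined in a DestinationRule:

```shell
# Route 95% of traffic to the stable version and 5% to a canary,
# with no changes to application code.
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: guide
spec:
  hosts:
    - guide
  http:
    - route:
        - destination:
            host: guide
            subset: v1
          weight: 95
        - destination:
            host: guide
            subset: v2
          weight: 5
EOF
```

Shifting the weights toward `v2` as confidence grows lets the canary graduate in production, which is what makes a separate staging environment unnecessary.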

As organizations continue to shift left and embrace DevOps principles, it is important to have the right tools to enable teams to move as quickly and efficiently as possible. A service mesh helps teams achieve this by moving infrastructure-like features out of individual services and into the platform, where teams can leverage them in a consistent and compliant manner. It allows Devs to be Devs and Ops to be Ops, so together they can truly realize the velocity of DevOps.

Reducing Mean-Time-To-Resolution

Like it or not, outages happen. And when they do, you need to be able to root-cause the problem, develop a fix and deploy it as quickly as possible to avoid violating your customer-facing SLAs and your internal SLOs. A service mesh is a critical piece of infrastructure when it comes to reducing your mean-time-to-resolution (MTTR) and ensuring the best possible user experience for your customers. Sitting between the container orchestration layer and the application, it is uniquely positioned not only to gather telemetry data and metrics, but also to transparently implement policy and traffic management changes at runtime. Here are some of the ways:

  • Metrics can be collected by the proxy in a service mesh and used to understand where problems are in the application, show which services are underperforming or using too many resources, and help inform decisions on scaling and resource optimization.
  • Layer 7 traces can be collected throughout the application and correlated together, allowing teams to see exactly where in the call flow a request failed.
  • Policy can allow platform teams to direct traffic — and in the case of outages, redirect traffic to other, healthier services.
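As one illustration of the policy point above, Istio's outlier detection can automatically eject failing endpoints so traffic flows to healthier ones during an incident. This is a minimal sketch; the host name `guide` is a hypothetical placeholder, and the thresholds shown are illustrative rather than recommended values:

```shell
# Temporarily eject any endpoint that returns 3 consecutive 5xx
# errors, checking every 30s, so traffic is automatically shifted
# to the remaining healthy endpoints.
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: guide
spec:
  host: guide
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 60s
EOF
```

Because this lives in the platform layer, a platform team can apply or tune it in the middle of an outage without redeploying the application.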

All of this functionality can be collected and implemented consistently across services — and even clusters — without impacting the application or placing additional burden or requirements on application developers.

It has been said that downtime can cost an enterprise company up to $5,600 per minute. In an extreme example, let’s think back to my meeting with the cable provider. If a service mesh could have enabled their team to get the right expert on the phone in half the time, they would have saved six hours of downtime: 360 minutes at $5,600 per minute, or $2,016,000. That’s a big number, and more importantly, all of those engineers could have been home with their families that night, instead of in front of their monitors.