Virtual Host Routing with Logical DNS Names

Let me describe a common service mesh scenario...

You've deployed your application and it is happily consuming some external resources on the 'net. For example, say that reviews.default.svc.cluster.local is communicating with external service redis-12.eu-n-3.example.com. But you need to switch to a new external service redis-db-4.eu-n-1.example.com. You're using a service mesh, right? The light bulb goes on — how about we just redirect all traffic from redis-12.eu-n-3.example.com to redis-db-4.eu-n-1.example.com? That certainly will work; add or modify a few resources and voila, traffic is re-routed and with zero downtime!

Only now there's a new problem — your system is looking less like the tidy cluster you started with and more like a bowl of spaghetti!

What if we used a neutral name for the database? How about db.default.svc.cluster.local? We might start with the same mechanism for re-routing traffic: from db.default.svc.cluster.local to redis-12.eu-n-3.example.com. Then when we needed to make the above change we just need to update the configuration to route traffic from db.default.svc.cluster.local to redis-db-4.eu-n-1.example.com. Done and again with zero downtime!

This is Virtual Host Routing to a Logical DNS Name. Virtual Host Routing is traditionally a server-side concept — a server responding to requests for one or more virtual servers. With a service mesh, it's fairly common to also apply this routing to the client side, redirecting traffic destined for one service to another service.

To give you a bit more context, a "logical name" is defined as a placeholder name that is mapped to a physical name when a request is made. An application might be configured to talk to its database at db.default.svc.cluster.local which is then mapped to redis-12.eu-n-3.example.com in one cluster and redis-db-4.eu-n-1.example.com in another.
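As a rough illustration of that mapping, here is a minimal Python sketch. The cluster names cluster-a and cluster-b and the resolve function are hypothetical and exist only for this example; in a real mesh the mapping lives in mesh configuration, not in application code.

```python
# Hypothetical per-cluster mapping from a logical name to a physical name.
# The cluster names are made up for illustration.
LOGICAL_TO_PHYSICAL = {
    "cluster-a": {"db.default.svc.cluster.local": "redis-12.eu-n-3.example.com"},
    "cluster-b": {"db.default.svc.cluster.local": "redis-db-4.eu-n-1.example.com"},
}


def resolve(cluster: str, logical_name: str) -> str:
    """Return the physical name for a logical name in the given cluster.

    Names without a mapping are returned unchanged.
    """
    return LOGICAL_TO_PHYSICAL.get(cluster, {}).get(logical_name, logical_name)
```

The application only ever asks for db.default.svc.cluster.local; which physical server answers depends on the cluster it runs in.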

Common practice is to use configuration to supply the DNS names to an application (add a DB_HOST environment variable set directly to redis-12.eu-n-3.example.com or redis-db-4.eu-n-1.example.com). By setting the configuration to a physical server, it's harder to redirect the traffic later.
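One way an application could lean on the logical name is sketched below in Python. The DB_HOST variable comes from the example above; the database_host helper and its default value are assumptions for illustration, not part of any particular framework.

```python
import os

# Logical name used when no override is configured (assumed default).
DEFAULT_DB_HOST = "db.default.svc.cluster.local"


def database_host(environ=None) -> str:
    """Return the database host, preferring DB_HOST when it is set."""
    env = os.environ if environ is None else environ
    return env.get("DB_HOST", DEFAULT_DB_HOST)
```

With this shape, the deployment configuration never has to mention a physical server, so redirecting traffic later is a mesh change rather than an application change.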

Best Practices

What are some best practices for working with external services? Processes like restricting outbound traffic and TLS origination can have a significant impact. The best practices listed below are not required, but this post is written assuming these practices are being followed.

Restricting Outbound Traffic

The outbound traffic policy determines if external services must be declared. A common setting for this policy is ALLOW_ANY, which lets any application running in your cluster communicate with any external service. We recommend that the outbound traffic policy is set to REGISTRY_ONLY, which requires that external services are defined explicitly. For security, the Aspen Mesh distribution of Istio has REGISTRY_ONLY by default.

If you are using a different Istio distribution, or if you want to explicitly set the outbound traffic policy, restrict outbound traffic by adding the following to your values file when deploying the istio chart:

global:
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY
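
To make the difference between the two modes concrete, here is a simplified Python model of the decision. It is only a sketch of the policy's effect, not how the proxy implements it; the outbound_allowed function and the registry set are hypothetical.

```python
def outbound_allowed(mode: str, registry: set, host: str) -> bool:
    """Model whether traffic to an external host is permitted.

    registry stands in for the hosts declared to the mesh
    (for example through service entries).
    """
    if mode == "ALLOW_ANY":
        return True
    if mode == "REGISTRY_ONLY":
        return host in registry
    raise ValueError(f"unknown outbound traffic policy mode: {mode}")
```

Under REGISTRY_ONLY, an undeclared host is refused until someone explicitly adds it, which is the property we want for security.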

TLS Origination

If an application communicates directly over HTTPS to upstream services, the service mesh can't inspect the traffic and it has no idea if requests are failing (it's all just encrypted traffic to the service mesh). The proxy is just routing bits. By having the proxy do "TLS origination", the service mesh sees both requests and responses and can even do some intelligent routing based on the content of the requests.

We'll use the rest of this blog to step through how to configure your application to communicate over just HTTP (change https://... configuration to just http://...).

How to Set Up Virtual Host Routing to a Logical DNS Name

Service

A logical DNS name must still be resolvable. Otherwise the service mesh won't attempt to route traffic to it. In the yaml below, we are defining a DNS name of httpbin.default.svc.cluster.local so that we can route traffic to it.

apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  ports:
  - port: 443
    name: https
  - port: 80
    name: http

ServiceEntry

A service entry indicates that we have services running in our cluster that need to communicate to the outside Internet. The actual host (physical name) is listed (httpbin.org in this example). Note that because we have the proxy doing TLS origination (just plain http between the application and the proxy), port 443 lists a protocol of HTTP (instead of HTTPS).

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.org
  ports:
  - number: 443
    name: http-port-for-tls-origination
    protocol: HTTP
  resolution: DNS
  location: MESH_EXTERNAL

VirtualService

A virtual service defines a set of rules to apply when traffic is routed to a specific host. In this example when traffic is routed to the /foo endpoint of httpbin.default.svc.cluster.local, the following rules are applied:

  1. Rewrite the URI from /foo to /get
  2. Rewrite the HOST header from httpbin.default.svc.cluster.local to httpbin.org
  3. Re-route the traffic to httpbin.org

Note that just re-routing the traffic is not sufficient for the server to handle our requests. The HOST header is how a server understands how to process a request.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin.default.svc.cluster.local
  http:
  - match:
    - uri:
        prefix: /foo
    rewrite:
      uri: /get
      authority: httpbin.org
    route:
    - destination:
        host: httpbin.org
        port:
          number: 443
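
If it helps to see the three rules as plain logic, here is a small Python model of what this virtual service does to a request. It is a sketch under simplifying assumptions (only host and path are modeled, and the Request class and apply_rule function are hypothetical), not a description of the proxy internals.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Request:
    host: str  # value of the HOST header
    path: str  # request URI path


def apply_rule(req: Request) -> Tuple[Request, Optional[Tuple[str, int]]]:
    """Apply the /foo rule and return (rewritten request, destination).

    The destination is None when the rule does not match.
    """
    if req.host == "httpbin.default.svc.cluster.local" and req.path.startswith("/foo"):
        # Replace the matched prefix /foo with /get and rewrite the HOST header.
        new_path = "/get" + req.path[len("/foo"):]
        return Request("httpbin.org", new_path), ("httpbin.org", 443)
    return req, None
```

A request for /foo on the logical name comes out as a request for /get with a HOST header of httpbin.org, bound for port 443 of httpbin.org.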

DestinationRule

A destination rule defines policies that are applied to traffic after routing has occurred. In this case we define policies for traffic going to port 443 of httpbin.org. The above configuration is routing plain HTTP traffic to port 443. The following destination rule indicates that this traffic should be sent over HTTPS via TLS (the proxy will do TLS origination).

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin.org
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
    portLevelSettings:
    - port:
        number: 443
      tls:
        mode: SIMPLE # initiates HTTPS when accessing httpbin.org

Testing with a simple pod

That's it! You can now deploy a service and configure it to talk to http://httpbin.default.svc.cluster.local/foo and traffic will get re-routed to https://httpbin.org/get. Let's test it out...

1. Create a pod (just for testing; typically you use deployments to create and manage pods):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test-container
    image: pstauffer/curl
    command: ["/bin/sleep", "3650d"]

$ kubectl apply -f pod.yaml

The above pod just sleeps for 10 years. Not very interesting by itself but it also provides the curl command that we can use for testing.

2. Curl the logical name:

$ kubectl exec -c test-container test-pod -it -- \
    curl -v http://httpbin.default.svc.cluster.local/foo

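Here is the expected output (the response body was removed for brevity). Note that this capture appears to have been taken in a namespace named virtual-host-routing, so that name shows up in the hostname where the examples above use default: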

*   Trying 100.66.72.128...
* TCP_NODELAY set
* Connected to httpbin.virtual-host-routing.svc.cluster.local (100.66.72.128) port 80 (#0)
> GET /foo HTTP/1.1
> Host: httpbin.virtual-host-routing.svc.cluster.local
> User-Agent: curl/7.60.0
> Accept: */*
>
< HTTP/1.1 200 OK
< date: Mon, 23 Sep 2019 22:04:08 GMT
< content-type: application/json
< content-length: 916
< x-amzn-requestid: 1743ed99-df5b-41c2-aa46-9662e10be674
< cache-control: public, max-age=86400
< x-envoy-upstream-service-time: 219
< server: envoy
<
* Connection #0 to host httpbin.virtual-host-routing.svc.cluster.local left intact
...

And with that, you should be set!

Virtual Host Routing to a Logical DNS Name can be a useful tool, allowing a server to communicate with external services without needing to specify the physical DNS name of the external service. And a service mesh makes it easy, enhancing your capabilities and keeping things rational (no offense to spaghetti lovers!).

If you enjoyed learning about (and trying out!) this topic, subscribe to our blog to get updates when new articles are posted.


Microservice Security and Compliance in Highly Regulated Industries: Threat Modeling

The year is 2019, and the number of reported data breaches is up 54% compared to midyear 2018, putting 2019 on track to be the “worst year on record,” according to RiskBased Security research. Nearly 31 million records have been exposed in the 13 most significant data breaches of the first half of this year. Exposed documents included personal health information (PHI), personally identifiable information (PII) and financial data. Most of these data breaches were caused by one common flaw: poor technical and human controls that could have easily been mitigated if an essential security process were followed. This simple and essential security process is known as threat modeling.

What is threat modeling?

Threat modeling is the process of identifying and communicating potential risks and threats, then creating countermeasures to respond to those threats. Threat modeling can be applied to multiple areas such as software, systems, networks, and business processes. When threat modeling, you must ask and answer questions about the systems you are working to protect. 

Per OWASP, threat model methodologies answer one or more of the following questions: 

  • What are we building?
    • Outputs:
      • Architecture diagrams
      • Dataflow transitions
      • Data classifications
  • What can go wrong?
    • To best answer this question, organizations typically brainstorm or use structures such as STRIDE, CAPEC or Kill Chains to help determine primary threats that apply to your systems and organization. 
    • Outputs:
      • A list of the main threats that apply to your system.
  • What are we going to do about that?
    • Outputs:
      • Actionable tasks to address your findings.
  • Did we do an acceptable job?
    • Review the quality, feasibility, process, and planning of the work you have done.

These questions require that you step out of your day-to-day responsibilities and holistically consider systems and processes surrounding them. When done right, threat modeling provides a clear view of the project requirements and helps justify security efforts in language everyone in the organization can understand.

Who should be included in threat modeling?

The short answer is, everyone. Threat modeling should not be conducted in a silo by just the security team but should be worked on by a diverse group made up of representatives across the organization. Representatives should include application owners, administrators, architects, developers, product team members, security engineers, data engineers, and even users. Everyone should come together to ask questions, flag concerns and discuss solutions.

A security checklist is essential

In addition to asking and answering general system and process questions, a security checklist should be used for facilitating these discussions. Without a defined and agreed-upon list, your team may overlook critical security controls and won’t be able to evaluate and continually improve standards.

Here’s a simple example of a security checklist:

Authentication and Authorization

☐ Are actors required to authenticate so that there is a guarantee of non-repudiation?

☐ Do all operations in the system require authorization?

Access Control

☐ Is access granted in a role-based fashion?

☐ Are all access decisions relevant at the time the request is performed?

Trust Boundaries

☐ Can you clearly identify where the levels of trust change in your model?

☐ Can you map those to authentication, authorization and access control?

Accounting and Auditing

☐ Are all operations being logged?

☐ Can you guarantee that no PII, ePHI or secrets are being logged?

☐ Are all audit logs adequately tagged?  

When should I start threat modeling? 

“The sooner the better, but never too late.” - OWASP

How often should threat modeling occur?

Threat modeling should occur during system design, and anytime systems or processes change. Ideally, threat modeling is tightly integrated into your development methodology and is performed for all new features and modifications prior to those changes being implemented. By tightly integrating with your development process, you can catch and address issues early in the development lifecycle before they’re expensive and time-consuming to resolve.

Threat modeling: critical for a secure and compliant microservice environment

Securing distributed microservice systems is difficult. The attack surface is substantially larger than that of an equivalent single-system architecture, and it is often much more difficult to fully comprehend all of the ways data flows through the system. Given that microservices can be short-lived and replaced on a moment's notice, the complexity can quickly compound. This is why it is critical that threat modeling is tightly integrated into your development process as early as possible.

Aspen Mesh makes it easier to implement security controls determined during threat modeling

Threat modeling is only one step in a series of steps required to secure your systems. Thankfully, Aspen Mesh makes it trivial to implement security and compliance controls with little to no custom development required, thus allowing you to achieve your security and compliance goals with ease. If you would like to discuss the most effective way for your organization to secure its microservice environments, grab some time to talk through your use case and how Aspen Mesh can help solve your security concerns.


Why You Want Idempotency Anyway

We've been talking about how you can use a service mesh to do progressive delivery lately.  Progressive delivery fundamentally is about decoupling software delivery from user activation of said software.  Once decoupled, the user activation portion is under business control. It's early days, but the promise here is that software engineering can build new stuff as fast as they can, and your customer success team (already keeping a finger on the pulse of the users and market) can independently choose when to introduce what new features (and associated risk) to whom.

We've demonstrated how you can use Flagger (a progressive delivery Kubernetes operator) to command a service mesh to do canary deploys.  These help you activate functionality for only a subset of traffic and then progressively increase that subset as long as the new functionality is healthy.  We're also working on an enhancement to Flagger to do traffic mirroring. I think this is really cool because it lets you try out a feature activation without actually exposing any users to the impact.  It's a "pre-stage" to a canary deployment: Send one copy to the original service, one copy to the canary, and check if the canary responds as well as the original.
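
To picture what mirroring does to a single request, here is a deliberately simplified Python sketch. The mirror function and the idea of comparing responses directly are assumptions for illustration; a real mesh typically judges the canary through metrics rather than by comparing response bodies.

```python
def mirror(request, primary, canary):
    """Send one copy of the request to each service.

    Returns the primary's response (the only one the user sees) and a flag
    saying whether the canary answered the same way.
    """
    primary_response = primary(request)
    try:
        canary_matches = canary(request) == primary_response
    except Exception:
        # A failing canary must never affect the user-facing response.
        canary_matches = False
    return primary_response, canary_matches
```

Notice that both services process the request, which is exactly where the caveat in the next paragraph comes from.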

There's a caveat we bring up when we talk about this, however: Idempotency.  You can only do traffic mirroring if you're OK with duplicating requests, and sending one to the primary and one to the canary.  If your app and infrastructure are OK with duplicating these requests, the requests are said to be idempotent.

Idempotency

Idempotency is the ability to apply an operation more than once and not change the result.  In math, we'd say that:

f(f(a)) = f(a)

An example of a mathematical function that's idempotent is ABS(), the absolute value.

ABS(-3.7) = 3.7
ABS(ABS(3.7)) = ABS(3.7) = 3.7

We can repeat taking the absolute value of something as many times as we want; it won't change any more after the first time.  Similar things in your math toolbox: CEIL(), FLOOR(), ROUND(). But SQRT() is not in general idempotent: SQRT(16) = 4, but SQRT(SQRT(16)) = 2, and so on.
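
If you want to check the definition yourself, here is a small Python sketch that tests f(f(a)) = f(a) on a handful of sample values. The helper name is_idempotent_on is made up for this example, and passing on a few samples is evidence, not a proof.

```python
import math


def is_idempotent_on(f, samples):
    """Return True if f(f(x)) == f(x) for every sample value x."""
    return all(f(f(x)) == f(x) for x in samples)


SAMPLES = [-3.7, 0.2, 2.5, 16.0]

# abs, ceil, floor and round settle after one application.
idempotent_results = {
    f.__name__: is_idempotent_on(f, SAMPLES)
    for f in (abs, math.ceil, math.floor, round)
}

# sqrt keeps changing the value: sqrt(16) is 4 but sqrt(sqrt(16)) is 2.
sqrt_result = is_idempotent_on(math.sqrt, [16.0, 2.5])
```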

For web apps, idempotent requests are those that can be processed twice without causing something invalid to happen.  So read-only operations like HTTP GETs are always idempotent (as long as they're actually only reads). Some kinds of writes are also idempotent: suppose I have a request that says "Set andrews_timezone to MDT".  If that request gets processed twice, my timezone gets set to MDT twice. That's OK. The first one might have changed it from PDT to MDT, and the second one "changes" it from MDT to MDT, so no change. But in the end, my timezone is MDT and so I'm good.

An example of a not-idempotent request is one that says "Deduct $100 from andrews_account".  If we apply that request twice, then my account will actually have $200 deducted from it and I don't want that.  You remember those e-commerce order pages that say "Don't click reload or you may be billed twice"? They need some idempotency!
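
Here is the same contrast as a Python sketch. The state dictionary and the starting balance of 500 are invented for illustration; only the two request types come from the examples above.

```python
def set_timezone(state: dict, tz: str) -> dict:
    """Idempotent write: applying it again leaves the state unchanged."""
    new_state = dict(state)
    new_state["andrews_timezone"] = tz
    return new_state


def deduct(state: dict, amount: int) -> dict:
    """Non-idempotent write: every application changes the balance again."""
    new_state = dict(state)
    new_state["andrews_account"] -= amount
    return new_state


start = {"andrews_timezone": "PDT", "andrews_account": 500}

tz_once = set_timezone(start, "MDT")
tz_twice = set_timezone(tz_once, "MDT")   # same state as tz_once

paid_once = deduct(start, 100)            # balance is 400
paid_twice = deduct(paid_once, 100)       # balance is 300, not what I wanted
```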

This is important for traffic mirroring because we're going to duplicate the request and send one copy to the primary and one to the canary.  While idempotency is great for enabling this traffic mirroring case, I'm here to tell you why it's a great thing to have anyway, even if you're never going to do progressive delivery.

Exactly-Once Delivery and Invading Generals

There's a fundamental tension that emerges if you have distributed systems that communicate over an unreliable channel.  You can never be sure that the other side received your message exactly once.  I'll retell the parable of the Invading Generals as it was told to me the first time I was sure I had designed a system that solved this paradox.

There is a very large invading army that has camped in a valley for the night.  The defenders are split into two smaller armies that surround the invading army; one set of defenders on the eastern edge of the valley and one on the west.  If both defending armies attack simultaneously, they will defeat the invaders. However, if only one attacks, it will be too small; the invaders will defeat it, and then turn to the remaining defenders at their leisure and defeat them as well.

The general for the eastern defenders needs to coordinate with the general for the western defenders for simultaneous attack.  Both defenders can send spies with messages through the valley, as many as they want. Each spy has a 10% chance of being caught by the invaders and killed before his message is delivered, and a 90% chance of successfully delivering the message.  You are the general for the eastern defenders: what message will you send to the western defenders to guarantee you both attack simultaneously and defeat the invaders?

Turns out, there is no guaranteed safe approach.  Let's go through it. First, let's send this message:

"Western General, I will attack at dawn and you must do the same."

- Eastern General

There's a 90% chance that your spy will get through, but there's a 10% chance that he won't, only you will attack and you will lose to the invaders.  The problem statement said you have infinite spies, we must be able to do better!

OK, let's send lots of spies with the same message.  Then our probability of success is 1-0.1^n, where n is the number of spies.  So we can asymptotically approach 100% probability that the other side agrees, but we can never be sure.
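
The formula is easy to play with in Python. This sketch uses exact fractions so that rounding does not hide the point; the function name delivery_probability is just for this example.

```python
from fractions import Fraction


def delivery_probability(n: int, p_caught: Fraction = Fraction(1, 10)) -> Fraction:
    """Probability that at least one of n spies delivers the message."""
    return 1 - p_caught ** n


one_spy = delivery_probability(1)        # 9/10
three_spies = delivery_probability(3)    # 999/1000

# However many spies we send, the probability stays strictly below 1.
never_certain = all(delivery_probability(n) < 1 for n in range(1, 200))
```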

How about this message:

"Western General, I am prepared to attack at dawn.  Send me a spy confirming that you have received this message so I know you will also attack.  If I don't receive confirmation, I won't attack because my army will be defeated if I attack alone."

- Eastern General

Now, if you don't receive a spy back from the western general you'll send another, and another, until you get a response.  But.... put yourself in the shoes of the western general. How does the western general know that you'll receive the confirmation spy?  Should the western army attack at dawn? What if the confirmation spy was caught and now only the western army attacks, ensuring defeat?

The western general could send lots of confirmation spies, so there is a high probability that at least one gets through.  But they can't guarantee with 100% probability that one gets through.

The western general could also send this response:

"Eastern General, we have received your spy.  We are also prepared to attack at dawn. We will be defeated if you do not also attack, and I know you won't attack if you don't know that we have received your message.  Please send back a spy confirming that you have received my confirmation or else we will not attack because we will be destroyed."

- Western General

A confirmation of a confirmation! (In networking ARQ terms, an ACK-of-an-ACK).  Again, this can reduce the probability of failure but cannot provide guarantees: we can keep shifting uncertainty between the Eastern and Western generals but never eliminate it.

Engineering Approaches

Okay, we can't know for sure that our message is delivered exactly once (regardless of service mesh or progressive delivery or any of that), so what are we going to do?  There are a few approaches:

• Retry naturally-idempotent requests

• Uniqueify requests

• Conditional updates

• Others

Retry Naturally-Idempotent Requests

If you have a request that is naturally idempotent, like getting the temperature on a thermostat, the end user can just repeat it if they didn't get the response they want.

Uniqueify Requests

Another approach is to make requests unique at the client, and then have all the other services avoid processing the same unique request twice.  One way to do this is to invent a UUID at the client and then have servers remember all the UUIDs they've already seen. My deduction request would then look like:

This is unique request f41182d1-f4b2-49ec-83cc-f5a8a06882aa.
If you haven't seen this request before, deduct $100 from andrews_account.

Then you can submit this request as many times as you want to the processor, and the processor can check if it's handled "f41182d1-f4b2-49ec-83cc-f5a8a06882aa" before.  There are a few caveats here.

First, you have to have a way to generate unique identifiers.  UUIDs are pretty good, but theoretically there's an extremely small possibility of UUID collision; practically, there are a couple of minor foot-guns to watch out for, like generating UUIDs on two VMs or containers that both have fake virtual MAC addresses that match.  You can also have the server make the unique identifier for you (it could be an auto-generated primary key in a database that is guaranteed to be unique).

Second, your server has to remember all the UUIDs that you have processed.  Typically you put these in a database (maybe using UUID as a primary key anyway).  If the record of processed UUIDs is stored separately from the action you take when processing, there's still a "risk window": you might commit a UUID and then fail to process it, or you might process it and fail to commit the UUID.  Algorithms like two-phase commit and Paxos can help close the risk window.
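
Here is a minimal in-memory Python sketch of the idea. The Processor class, new_request_id helper and starting balance are hypothetical; a real server would keep the seen IDs in durable storage and would have to commit the ID and the deduction together to avoid the risk window described above.

```python
import uuid


def new_request_id() -> str:
    """Generate a random (version 4) UUID on the client."""
    return str(uuid.uuid4())


class Processor:
    """Processes deductions, ignoring request IDs it has already seen."""

    def __init__(self, balance: int):
        self.balance = balance
        self.seen = set()

    def deduct(self, request_id: str, amount: int) -> bool:
        """Apply the deduction once; return False for a duplicate request."""
        if request_id in self.seen:
            return False
        # In this toy version both updates happen in memory in one step.
        self.seen.add(request_id)
        self.balance -= amount
        return True
```

The client can now retry the same request as often as it likes: only the first copy changes the balance.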

Conditional Updates

Another approach is to include information in the request about what things looked like when the client sent the request, so that the server can abort the request if something has changed.  This includes the case that the "change" is a duplicate request and we've already processed it.

For instance, maybe my bank ledger currently ends at transaction number 563.

Then I would make my request look like:

As long as the last transaction in andrews_account is number 563,
Create entry 564: Deduct $100 from andrews_account

If this request gets duplicated, the first will succeed and the second will fail.  After the first, the last transaction in the ledger is number 564, so the duplicated request no longer matches and will fail:

As long as the last transaction in andrews_account is number 563,
Create entry 564: Deduct $100 from andrews_account

In this case the server could respond to the first copy with "Success" and the second copy with a soft failure like "Already committed" or just tell the client to read and notice that its update already happened.  MongoDB, AWS DynamoDB and others support these kinds of conditional updates.
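
The ledger example can be sketched in a few lines of Python. The Ledger class and its append_if_last method are invented for illustration and are not the API of any of the databases mentioned; they only show the shape of a conditional update.

```python
class Ledger:
    """A toy ledger that only accepts an entry if the caller's view is current."""

    def __init__(self, last_number: int):
        self.last_number = last_number
        self.entries = []

    def append_if_last(self, expected_last: int, description: str) -> bool:
        """Create the next entry only if expected_last is still the latest."""
        if self.last_number != expected_last:
            # Something changed since the client looked, possibly because
            # a duplicate of this very request was already processed.
            return False
        self.last_number += 1
        self.entries.append((self.last_number, description))
        return True


ledger = Ledger(last_number=563)
first = ledger.append_if_last(563, "Deduct $100 from andrews_account")
second = ledger.append_if_last(563, "Deduct $100 from andrews_account")
```

The first copy creates entry 564; the duplicate still expects 563 to be the latest entry, so it is rejected and the account is only charged once.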

Others

There are many practical approaches to this problem.  I recommend doing some initial reasoning about idempotency, and then trying to shift as much functionality as you can to the database or persistent state layer you're using.  While I gave a quick tour of some of the things involved in idempotency, there are a lot of other tricks like write-ahead journalling, conflict-free replicated data types and others that can enhance reliability.

Conclusion

Traffic mirroring is a great way to exercise canaries in a production environment before exposing them to your users.  Mirroring makes a duplicate of each request and sends one copy to the primary, one copy to the new canary version of your microservice.  This means that you must use mirroring only for idempotent requests: requests that can be applied twice without causing something erroneous to happen.

This caveat probably exists even if you aren't doing traffic mirroring, because networks fail.  The Eastern General and Western General can never really be sure their messages are delivered exactly once; there will always be a case where they may have to retry.  I think you want to build idempotency wherever possible, and then you should use traffic mirroring to test your canary deployments.


Aspen Mesh Enterprise Service Mesh

Aspen Mesh for Self-hosted Environments

We’re excited to announce the launch of our first self-managed version of Aspen Mesh, designed for deployment to your infrastructure within any cloud or on-premises.  With Aspen Mesh 1.2.5-am1 you get the advanced functionality, the rich dashboard, and the expert support you’re used to from Aspen Mesh, all built on the open source power of Istio 1.2.5.

With this release, we're making it easy for enterprises with self-hosted environments to get all the benefits of Aspen Mesh.  This means you can now use your existing Prometheus, Grafana and Jaeger with Aspen Mesh and we no longer require customers to send data out of their clusters.  We’re deprecating prior versions of Aspen Mesh that include these hosted elements. All Aspen Mesh customers will need to upgrade to Aspen Mesh 1.2.5-am1 before October 17, 2019.

If you’d like any help with this upgrade, reach out to your account rep or email us at support@aspenmesh.io.

Why the change?

We first started building Aspen Mesh in the summer of 2017, launching the first version built on Istio 0.2.4. Since then, we’ve focused on helping enterprises harness the power of service mesh by delivering an integrated solution that provided the core elements of Istio along with a hosted solution for Prometheus, Grafana, and Jaeger. 

What we’ve found is that while enterprises are looking to work with someone like Aspen Mesh to better harness the power of service mesh, they usually have existing installations of Prometheus, Grafana, and Jaeger and don’t want or need a hosted, integrated, supported solution.  And customers who wanted to get started with basic insights right away had to conduct security audits before they could send us service metrics and trace headers.

Installing Aspen Mesh in your environment

With those obstacles out of the way, it’s now much easier to get started using Aspen Mesh in your environment.  All you need is a Kubernetes cluster that has Prometheus and Helm/Tiller and you’re ready to go. Follow the detailed instructions in our Getting Started Guide and reach out to us at support@aspenmesh.io if you have any questions.  If you don’t have an account yet, sign up now so you can view our releases and documentation.

What’s next?

We’re heads down working on our next set of features that will help enterprises better take advantage of the rich telemetry available in the mesh and better harness the power of mesh policy at scale within their organizations.  Keep an eye on this space for more.


Service Mesh Insider: An Interview with Shawn Wormke

Have you ever wondered how we got to service mesh? What backgrounds, experiences and technologies led to the emergence of service mesh? 

We recently put together an interview with Aspen Mesh’s Founder, Shawn Wormke, to get the inside scoop for you. Read on to find out the answers to these three questions:

  1. What role did your technical expertise play in how Aspen Mesh focuses on enterprise service mesh?
  2. Describe how your technical and business experience combined to create an enterprise startup and inform your understanding of how to run a “modern” software company?
  3. What characteristics define a “modern” enterprise, and how does Aspen Mesh contribute to making it a reality?

1. What role did your technical expertise play in how Aspen Mesh focuses on the enterprise?

I started my career at Cisco working in network security and firewalls on the ASA product line and later the Firewall Services Module for the Catalyst 6500/7600 series switches and routers. Both of these products were focused on the enterprise at a time when security was starting to move up the stack and become more and more distributed throughout the network. We were watching our customers move from L2 transparent firewalls to L3/L4 firewalls that required application logic in order to “fixup” dynamic ports for protocols like FTP, SIP and H.323. Eventually that journey up the stack continued to L7 firewalls that were doing URL, header and payload inspection to enforce security policy.

At the same time that this move up the stack was happening, customers were starting to look at migrating workloads to VMs and were demanding new form factors and valuing different performance metrics. Speeds, feeds and dragstrip numbers were no longer important; the focus was shifting to footprint and elasticity. The result of this shift in priority was a change in mindset when it came to how enterprises were thinking about expenses. They started to think about shifting expenses from large, capacity-stranding CAPEX purchases to more frequent OPEX transactions that were aligned with a software-first approach.

It was this idea that led me to join as one of the first engineers at a small startup in Boulder, CO called LineRate Systems which was eventually acquired by F5 Networks. The company was founded on a passion for making high performance, lightweight application delivery (aka load balancing) software that was as fast as the industry standard hardware. Our realization was that Commodity Off the Shelf (COTS) hardware had so much performance that if leveraged properly it was possible to offer the same performance at a much lower cost.

But the big idea, the one that ultimately got us noticed by F5, was that if the hardware was freely available (everyone had racks and racks of servers), we could charge our customers for a performance range and let them stamp out the software--as much as they needed--to achieve that. This removed the risk of the transaction from the customer as they no longer had to pre-plan 3-5 years' worth of capacity.  It placed the burden on the provider to deliver an efficient and API-first elastic platform and a pricing model that scaled along the same dimensions as their business needs.

After acquisition we started to use containers and eventually Kubernetes for some of our build and test infrastructure. The use of these technologies led us to realize that they were great for increasing velocity and agility, but were difficult to debug and secure. We had no record of what our test containers did or who they talked to at runtime and we had no idea what data they were accessing. If we had a way to make sense of all of this, life would be so much easier.

This led us to work on some internal projects that experimented with ideas that we all now know as service mesh. We even released a product that was the beginning of this called the Application Services Proxy, which we ultimately end-of-lifed in 2017 when we made the decision to create Aspen Mesh.

In 2018 Aspen Mesh was born as an F5 Incubation. It is a culmination of almost 20 years of solving network and security problems for some of the world's largest enterprise customers and ensuring that the form-factor, consumption and pricing models are flexible and grow along with the businesses that use it. It is an acknowledgement that disruption is happening everywhere and that an organization’s agility and ability to respond to disruption is its number one business asset. Companies are realizing this agility by redefining how they deliver value to their customers as quickly as possible using technologies like cloud, containers and Kubernetes.

We know that for enterprises, agility with stability is the number one competitive advantage. Through years of experience working on enterprise products we know that companies who can meet their evolving customer needs--while staying out of the news for downtime and security breaches--will be the winners of tomorrow. Aspen Mesh’s Enterprise Service Mesh enables enterprises to rapidly deliver value to their customers in a performant, secure and compliant way.

2. Describe how your technical and learned business experience combine to build an enterprise startup and inform your understanding of how best to run a “modern” software company?

Throughout my career I have been part of waterfall to agile transformations, worked on products that enabled business agility and now run a team that requires that same flexibility and business agility. We need to be focused on getting product to market that shows value to our customers as quickly as possible. We rely on automation to ensure that we are focusing our human capital on the most important tasks. We rely on data to make our decisions and ensure that the data we have is trustworthy and secure.

The great thing is that we get to be the ones doing the disrupting, and not the ones getting disrupted. What this means is we get to move fast and don’t have the burden of a large enterprise decision-making process. We can be agile and make mistakes, and we are actually expected to make mistakes. We are told "no" more than we are told "yes." But, learning from those failures and making course corrections along the way is key to our success.

Over the years I have come to embrace the power of open source and what it can do to accelerate projects and the impacts (both positive and negative) it can have on your company. I believe that in the future all applications will be born from open technologies. Companies that acknowledge and embrace this will be the most successful in the new cloud-native and open source world. How you choose to do that depends on your business and business model. You can consume OSS in your SaaS platform, build an open-core product, glue multiple projects together to create a product, or provide support and services. But if you are not including open source in your modern software company, you will not be successful.

Over the past 10 years we have seen and continue to see consumption models across all verticals rapidly evolve from perpetual NCR-based sales models with annual maintenance contracts to subscription- or consumption-based models to fully managed SaaS-based offerings. I recently read an article on subscription-based banking. This is driven by the desire to shift the risk to the producer instead of the consumer. It is a realization by companies that capacity planning for 3-5 years is impossible, and that laying out that cash is a huge risk to the business that they are no longer willing to take. It is up to technology producers to provide enough value to attract customers and then continue providing value to retain them year over year.

Considering how you are going to offer your product in a way that scales with your customers' value matrix and growth plans is critical. This applies to pricing as well as product functionality and performance.

Finally, I would be negligent if I didn’t mention data as a paramount consideration when running a modern software company. Insights derived from that data need to be at the center of everything you do. This goes not only for your product, but also your internal visibility and decision making processes. 

On the product side, when dealing with large enterprises it is critical to understand what your customers are willing to give you and how much value they need to realize in return. An enterprise's first answer will often be “no” when you tell them you need to access their data to run your product, but that just means you haven’t shown them enough value to say "yes." You need to consider what data you need, how much you need of it, where it will be stored and how you are protecting it.

On the internal side you need to measure everything. The biggest challenge I have found with an early-stage, small team is taking the time to enable these measurements. It is easy for this work to drop off the list when you are trying to get new features out the door and you don’t yet know what you're going to do with the data. Resist that urge and force your teams to think about how they can do both, and if necessary take the time to slow down and put the measurements in. Sometimes being thoughtful early on can help you go fast later, and adding hooks to gather and analyze data is one of those times.

Operating a successful modern software company requires you embrace all the cliches about wearing multiple hats and failing fast. It's also critical to focus on being agile, embrace open source, create a consumption based offering, and rely on data, data, data and more data.

3. What characteristics define a “modern” enterprise, and how does Aspen Mesh contribute to making it a reality?

The modern enterprise craves agility and considers it to be its number one business advantage. This agility is what allows the enterprise to deliver value to customers as quickly as possible, and it is often derived from a greater reliance on technology to enable rapid speed to market. Enterprises are constantly defending against disruption from non-traditional startup companies with seemingly unlimited venture funding and no expectation of profitability. All the while the enterprise is required to compete and deliver value while maintaining the revenue and profitability goals that their shareholders have grown to expect over years of sustained growth.

In order to remain competitive, enterprises are embracing new business models and looking for new ways to engage their customers through new digital channels. They are relying more on data and analytics to make business decisions and to make their teams and operations more efficient. Modern enterprises are embracing automation to perform mundane repetitive tasks and are turning over their workforce to gain the technical talent that allows them to compete with the smaller upstart disruptors in their space.

But agility without stability can be detrimental to an enterprise. As witnessed by many recent reports, enterprises can struggle with challenges around data and data security, perimeter breaches and downtime. It's easy to get caught up in the promise of the latest new technology, but moving fast and embracing new technology requires careful consideration for how it integrates into your organization, its security posture and how it scales with your business. Finding a trusted partner to accompany you on your transformation journey is key to long-term success.

Aspen Mesh is that technology partner when it comes to delivering next generation application architectures based on containers and Kubernetes. We understand the power and promise of agility and scalability that these technologies offer, but we also know that they introduce a new set of challenges for enterprises. These challenges include securing communication between services, observing and controlling service behavior and problems, and managing the policy associated with services across large distributed organizations.

Aspen Mesh provides a fully supported service mesh that is focused on enterprise use cases that include:

  • An advanced policy framework that allows users to describe business goals that are enforced in the application’s runtime environment
  • Role based policy management that enables organizations to create and apply policies according to their needs
  • A catalog of policies based on industry and security best practices that are created and tested by experts
  • Data analytics-based insights for enhanced troubleshooting and debugging
  • Predictive analytics to help teams detect and mitigate problems before they happen
  • Streamlined application deployment packages that provide a uniform approach to authentication and authorization, secure communications, and ingress and egress control
  • DevOps tools and workflow integration
  • A simplified user experience with improved organization and streamlined navigation to enable users to quickly find and mitigate failures and security issues
  • A consistent view of applications across multiple clouds to allow visibility from a global perspective to a hyper-local level
  • Graph visualizations of application relationships that enable teams to collaborate seamlessly on focused subsets of their infrastructure
  • Tabular representations surfacing details to find and remediate issues across multiple clusters running dozens or hundreds of services
  • A reduced-risk scalable consumption model that allows customers to pay as they grow

Thanks for reading! We hope that helps shed some light on what goes on behind the scenes at Aspen Mesh. And if you liked this post, feel free to subscribe to our blog in order to get updates when new articles are released.