Microservice Security and Compliance in Highly Regulated Industries: Threat Modeling

The year is 2019, and the number of reported data breaches is up 54% compared to midyear 2018, putting this year on track to be the “worst year on record,” according to RiskBased Security research. Nearly 31 million records were exposed in the 13 most significant data breaches of the first half of the year. The exposed records included protected health information (PHI), personally identifiable information (PII), and financial data. Most of these breaches shared one common flaw: poor technical and human controls that could have easily been mitigated if an essential security process had been followed. That simple and essential security process is known as threat modeling.

What is threat modeling?

Threat modeling is the process of identifying and communicating potential risks and threats, then creating countermeasures to respond to those threats. Threat modeling can be applied to multiple areas such as software, systems, networks, and business processes. When threat modeling, you must ask and answer questions about the systems you are working to protect. 

Per OWASP, threat model methodologies answer one or more of the following questions: 

  • What are we building?
    • Outputs:
      • Architecture diagrams
      • Dataflow transitions
      • Data classifications
  • What can go wrong?
    • To best answer this question, teams typically brainstorm or use structured frameworks such as STRIDE, CAPEC or Kill Chains to determine the primary threats that apply to their systems and organization. 
    • Outputs:
      • A list of the main threats that apply to your system.
  • What are we going to do about that?
    • Outputs:
      • Actionable tasks to address your findings.
  • Did we do an acceptable job?
    • Review the quality, feasibility, process, and planning of the work you have done.

These questions require that you step out of your day-to-day responsibilities and holistically consider systems and processes surrounding them. When done right, threat modeling provides a clear view of the project requirements and helps justify security efforts in language everyone in the organization can understand.

Who should be included in threat modeling?

The short answer is, everyone. Threat modeling should not be conducted in a silo by just the security team but should be worked on by a diverse group made up of representatives across the organization. Representatives should include application owners, administrators, architects, developers, product team members, security engineers, data engineers, and even users. Everyone should come together to ask questions, flag concerns and discuss solutions.

A security checklist is essential

In addition to asking and answering general system and process questions, a security checklist should be used for facilitating these discussions. Without a defined and agreed-upon list, your team may overlook critical security controls and won’t be able to evaluate and continually improve standards.

Here’s a simple example of a security checklist:

Authentication and Authorization

☐ Are actors required to authenticate so that there is a guarantee of non-repudiation?

☐ Do all operations in the system require authorization?

Access Control

☐ Is access granted in a role-based fashion?

☐ Are all access decisions relevant at the time the request is performed?

Trust Boundaries

☐ Can you clearly identify where the levels of trust change in your model?

☐ Can you map those to authentication, authorization and access control?

Accounting and Auditing

☐ Are all operations being logged?

☐ Can you guarantee there is no PII, ePHI or secrets being logged?

☐ Are all audit logs adequately tagged?  

When should I start threat modeling? 

“The sooner the better, but never too late.” - OWASP

How often should threat modeling occur?

Threat modeling should occur during system design, and anytime systems or processes change. Ideally, threat modeling is tightly integrated into your development methodology and is performed for all new features and modifications prior to those changes being implemented. By tightly integrating with your development process, you can catch and address issues early in the development lifecycle before they’re expensive and time-consuming to resolve.

Threat modeling: critical for a secure and compliant microservice environment

Securing distributed microservice systems is difficult. The attack surface is substantially larger than that of an equivalent single-system architecture, and it is often much more difficult to fully comprehend all of the ways data flows through the system. Given that microservices can be short-lived and replaced at a moment's notice, the complexity can quickly compound. This is why it is critical that threat modeling be tightly integrated into your development process as early as possible.

Aspen Mesh makes it easier to implement security controls determined during threat modeling

Threat modeling is only one step in a series of steps required to secure your systems. Thankfully, Aspen Mesh makes it straightforward to implement security and compliance controls with little to no custom development required, allowing you to achieve your security and compliance goals with ease. If you would like to discuss the most effective way for your organization to secure its microservice environments, grab some time to talk through your use case and how Aspen Mesh can help solve your security concerns.


Microservice Security and Compliance in Highly Regulated Industries: Zero Trust Security

Zero Trust Security

Security is the most critical part of your application to get right. Failing to secure your users’ data can be very expensive and can cause customers to lose faith in your ability to protect their sensitive information. A recent IBM-sponsored study found that the average cost of a data breach is $3.92 million, with healthcare the most expensive industry at an average of $6.45 million per breach. What may be more surprising is that the average time to identify and contain a breach is 279 days, while the average lifecycle of a malicious attack, from breach to containment, is 314 days.

Traditionally, network security has been based on maintaining a strong perimeter to thwart attackers, commonly known as the moat-and-castle approach. This approach is no longer effective in a world where employees expect access to applications and data from anywhere in the world, on any device. That shift is forcing organizations to evolve rapidly to stay competitive, and it has left many engineering teams scrambling to keep up with employee expectations. Meeting them often means rearchitecting systems and services, which is difficult, time-consuming, and error-prone.

In 2010, Forrester coined the term “Zero Trust,” flipping existing security models on their heads by changing how we think about cyberthreats. The new model assumes you have already been compromised but may not yet be aware of it. A couple of years later, Google announced it had implemented Zero Trust in its networking infrastructure with much success. Fast forward to 2019, and plans to adopt this new paradigm have spread across industries like wildfire, driven largely by massive data breaches and stricter regulatory requirements.

Here are the key Zero Trust Networking Principles:

  • Networks should always be considered hostile. 
    • Just because you’re inside the “castle” does not make you safe.
  • Network locality is not sufficient for deciding trust in a network.
    • Just because you know the person next to you in the “castle” doesn’t mean you should trust them.
  • Every device, user, and request is authenticated and authorized.
    • Ensure that every person entering the “castle” has been properly identified and is allowed to enter.
  • Network policies must be dynamic and calculated from as many sources of data as possible. 
    • Ask as many people as possible when validating if someone is allowed to enter the “castle”.

Transitioning to Zero Trust Networking can dramatically improve your security posture, but until recent years it was a time-consuming and difficult task that required extensive security knowledge within engineering teams, along with sophisticated internal tooling to manage workload certificates and service-level authentication and authorization. Thankfully, service mesh technologies such as Istio let us implement Zero Trust Networking across our microservices and clusters with little effort and minimal service disruption, without requiring your team to be security experts.

Zero Trust Networking With Istio

Istio provides the following features that help us implement Zero Trust Networking in our infrastructure:

  • Service Identities
  • Mutual Transport Layer Security (mTLS)
  • Role Based Access Control (RBAC) 
  • Network Policy

Service Identities

One of the key Zero Trust Networking principles requires that “every device, user, and request is authenticated and authorized.” Istio implements this foundational principle by issuing secure identities to services, much like how application users are issued an identity. This identity is often referred to as an SVID (SPIFFE Verifiable Identity Document) and is used to identify services across the mesh so they can be authenticated and authorized to perform actions. Service identities can take different forms depending on the platform Istio is deployed on. For example:

  • Kubernetes: Istio can use Kubernetes service accounts.
  • Amazon Web Services (AWS): Istio can use AWS IAM user and role accounts.
  • Google Kubernetes Engine (GKE): Istio can use Google Cloud Platform (GCP) service accounts.
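
On Kubernetes, for instance, the identity comes from the pod's service account. The sketch below uses hypothetical names, and exact behavior depends on your Istio version and mesh configuration; it shows a workload bound to a dedicated service account, from which Istio derives a SPIFFE identity such as spiffe://cluster.local/ns/payments/sa/billing-service:

```yaml
# Hypothetical Kubernetes service account that Istio uses as the
# service's identity. With mTLS enabled, the sidecar is issued a
# certificate encoding a SPIFFE ID derived from this account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: billing-service
  namespace: payments
---
# The workload references the service account so its pods run
# under that identity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-service
  namespace: payments
spec:
  selector:
    matchLabels:
      app: billing-service
  template:
    metadata:
      labels:
        app: billing-service
    spec:
      serviceAccountName: billing-service
      containers:
        - name: app
          image: example.com/billing-service:1.0
```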

Mutual Transport Layer Security (mTLS)

To support secure service identities and to protect data in transit, Istio provides mTLS for encrypting service-to-service communication and achieving non-repudiation for requests. This layer of security reduces the likelihood of a successful man-in-the-middle (MITM) attack by requiring all parties in a request to have valid certificates that trust each other. Certificate generation, distribution, and rotation are handled automatically by a secure Istio service called Citadel.
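
As a sketch of what enabling this looks like, recent Istio releases express the mTLS requirement with a PeerAuthentication resource (older releases used a MeshPolicy; the resource names and API versions here are illustrative):

```yaml
# Hypothetical mesh-wide policy requiring mTLS for all workloads.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applies mesh-wide when placed in the root namespace
spec:
  mtls:
    mode: STRICT            # reject any plaintext service-to-service traffic
```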

Role Based Access Control (RBAC)

Authorization is a critical part of any secure system and is required for a successful Zero Trust Networking implementation. Istio provides flexible and highly performant RBAC via centralized policy management, so you can easily define what services are allowed to communicate and what endpoints services and users are allowed to communicate with. This makes the implementation of the principle of least privilege (PoLP) simple and reduces the development teams’ burden of creating and maintaining authorization specific code.
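
For illustration, a policy like the following allows only the frontend's identity to call GET endpoints on a billing service. The names are hypothetical; recent Istio releases use the AuthorizationPolicy API shown here, while early releases expressed RBAC with ServiceRole and ServiceRoleBinding resources:

```yaml
# Hypothetical policy: only the frontend's identity may issue GET
# requests to workloads labeled app: billing-service.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: billing-allow-frontend
  namespace: payments
spec:
  selector:
    matchLabels:
      app: billing-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/web/sa/frontend"]
      to:
        - operation:
            methods: ["GET"]
```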

Network Policy

With Istio’s centralized policy management, you can enforce networking rules at runtime. Common examples include, but are not limited to, the following:

  • Whitelisting and blacklisting access to services, so that access is only granted to certain actors.
  • Rate limiting traffic, to ensure a bad actor does not cause a Denial of Service attack.
  • Redirecting requests, to enforce that certain actors go through proper channels when making their requests.
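
As one example, blocking access from a particular source can be expressed with a DENY policy like the sketch below (hypothetical names; rate limiting and request redirection are configured through separate Istio mechanisms, such as quotas and VirtualService routing rules):

```yaml
# Hypothetical deny rule: block requests originating in the "sandbox"
# namespace from reaching any workload in the "payments" namespace.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-sandbox
  namespace: payments
spec:
  action: DENY
  rules:
    - from:
        - source:
            namespaces: ["sandbox"]
```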

Cyber Attacks Mitigated by Zero Trust Networking With Istio

The following are example attacks that can be mitigated:

  1. Service Impersonation - A bad actor is able to gain access to the private network for your applications, pretends to be an authorized service, and starts making requests for sensitive data.
  2. Unauthorized Access - A legitimate service makes requests for sensitive data that it is not authorized to obtain. 
  3. Packet Sniffing - A bad actor gains access to your applications private network and captures sensitive data from legitimate requests going over the network.
  4. Data Exfiltration - A bad actor sends sensitive data out of the protected network to a destination of their choosing.

Applying Zero Trust Networking in Highly Regulated Industries

To combat increased high profile cyber attacks, regulations and standards are evolving to include stricter controls to enforce that organizations follow best practices when processing and storing sensitive data. 

The most common technical requirements across regulations and standards are:

  • Authentication - verify the identity of the actor seeking access to protected data.
  • Authorization - verify the actor is allowed to access the requested protected data.
  • Accounting - mechanisms for recording and examining activities within the system.
  • Data Integrity - protecting data from being altered or destroyed in an unauthorized manner.

As you may have noticed, applying Zero Trust Networking within your application infrastructure not only increases your security posture and helps mitigate cyber attacks, it also addresses control requirements set forth in regulations and standards such as HIPAA, PCI DSS, GDPR, and FISMA.

Use Istio to Achieve Zero Trust the Easy Way

High-profile data breaches are at an all-time high, cost an average of $3.92 million, and take an average of 314 days from breach to containment. Implementing Zero Trust Networking with Istio to secure your microservice architecture at scale is simple, requires little effort, and can be completed with minimal service disruption. If you would like to discuss the most effective way for your organization to achieve zero trust, grab some time to talk through your use case and how Aspen Mesh can help solve your security concerns.


From NASA to Service Mesh

The New Stack recently published a podcast featuring our CTO, Andrew Jenkins, discussing How Service Meshes Found a Former Space Dust Researcher. In the podcast, Andrew talks about how he moved from working on electrical engineering and communication protocols at NASA to software and, finally, service mesh development here at Aspen Mesh.

“My background is in electrical engineering, and I used to work a lot more on the hardware side of it, but I did get involved in communication, almost from the physical layer, and I worked on some NASA projects and things like that,” said Jenkins. “But then my career got further and further up into the software side of things, and I ended up at a company called F5 Networks. [Eventually] this ‘cloud thing’ came along, and F5 started seeing a lot of applications moving to the cloud. F5 offers their product in a version that you use in AWS, so what I was working on was an open source project to make a Kubernetes ingress controller for the F5 device. That was successful, but what we saw was that a lot of the traffic was shifting to the inside of the Kubernetes cluster. It was service-to-service communication from all these tiny things--these microservices--that were designed to be doing business logic. So this elevated the importance of communication...and that communication became very important for all of those tiny microservices to work together to deliver the final application experience for developers. So we started looking at that microservice communication inside and figuring out ways to make that more resilient, more secure and more observable so you can understand what’s going on between your applications.”

In addition, the podcast covers the evolution of service mesh, more details about tracing and logging, canaries, Kubernetes, YAML files and other surrounding technologies that extend service mesh to help simplify microservices management.

“I hope service meshes become the [default] way to deal with distributed tracing or certificate rotation. So, if you have an application, and you want it to be secure, you have to deal with all these certs, keys, etc.,” Jenkins said. “It’s not impossible, but when you have microservices, you do not have to do it a whole lot more times. So that’s why you get this better bang for the buck by pushing that down into that service mesh layer where you don’t have to repeat it all the time.”

To listen to the entire podcast, visit The New Stack’s post.

Interested in reading more articles like this? Subscribe to the Aspen Mesh blog:


The Complete Guide to Service Mesh

What’s Going On In The Service Mesh Universe?

Service meshes are relatively new, extremely powerful and can be complex. There’s a lot of information out there on what a service mesh is and what it can do, but it’s a lot to sort through. Sometimes, it’s helpful to have a guide. If you’ve been asking questions like “What is a service mesh?” “Why would I use one?” “What benefits can it provide?” or “How did people even come up with the idea for service mesh?” then The Complete Guide to Service Mesh is for you.

Check out the free guide to find out:

  • The service mesh origin story
  • What a service mesh is
  • Why developers and operators love service mesh
  • How a service mesh enables DevOps
  • Problems a service mesh solves

The Landscape Right Now

A service mesh overlaps, complements, and in some cases, replaces many tools that are commonly used to manage microservices. Last year was all about evaluating and trying out service meshes. But while curiosity about service mesh is still at a peak, enterprises are already in the evaluation and adoption process.

The capabilities service mesh can add to ease managing microservices applications at runtime are clearly exciting to early adopters and companies evaluating service mesh. Conversations tell us that many enterprises are already using microservices and service mesh, and many others are planning to deploy in the next six months. And if you’re not yet sure about whether or not you need a service mesh, check out the recent Gartner, 451 and IDC reports on microservices — all of which say a service mesh will be mandatory by 2020 for any organization running microservices in production.

Get Started with Service Mesh

Are you already using Kubernetes and Istio? You might be ready to get started using a service mesh. Download Aspen Mesh here or contact us to talk with a service mesh expert about getting set up for success.

Get the Guide

Fill out the form below to get your copy of The Complete Guide to Service Mesh.


Security

A Service Mesh Helps Simplify PCI DSS Compliance

PCI DSS is an information security standard for organizations that handle credit card data. The requirements are largely around developing and maintaining secure systems and applications and providing appropriate levels of access — things a service mesh can make easier.

However, building a secure, reliable and PCI DSS-compliant microservice architecture at scale is a difficult undertaking, even when using a service mesh.

The standard comprises 12 separate requirements, each with its own sub-requirements. Additionally, some of the requirements are vague and left up to the designated Qualified Security Assessor (QSA) to interpret based on the design in question.

Meeting these requirements involves:

  • Controlling what services can talk to each other;
  • Guaranteeing non-repudiation for actors making requests;
  • Building accurate real-time and historical diagrams of cardholder data flows across systems and networks when services can be added, removed or updated at a team’s discretion.

Achieving PCI DSS compliance at scale can be simplified by implementing a uniform layer of infrastructure between services and the network that provides your operations team with centralized policy management and decouples it from feature development and release processes, regardless of scale and release velocity. This layer of infrastructure is commonly referred to as a service mesh. A service mesh provides many features that simplify compliance management, such as fine-grained control over service communication, traffic encryption, non-repudiation via service-to-service authentication with strong identity assertions, and rich telemetry and reporting.

Below are some of the key PCI DSS requirements listed in the 3.2.1 version of the requirements document, where a service mesh helps simplify the implementation of both controls and reporting:

  • Requirement 1: Install and maintain a firewall configuration to protect cardholder data
    • The first requirement focuses on firewall and router configurations that ensure cardholder data is only accessed when it should be and only by authorized sources.
  • Requirement 6: Develop and maintain secure systems and applications
    • The applicable portions of this requirement focus on encrypting network traffic using strong cryptography and restricting user access to URLs and functions.
  • Requirement 7: Restrict access to cardholder data by business need to know.
    • Arguably one of the most critical requirements in PCI DSS, since even the most secure system can be easily circumvented by overprivileged employees. This requirement focuses on restricting privileged users to the least privileges necessary to perform their job responsibilities, ensuring access to systems is set to “deny all” by default, and ensuring proper documentation detailing roles and responsibilities is in place.
  • Requirement 8: Identify and authenticate access to system components
    • Building on the foundation of requirement 7, this requirement focuses on ensuring all users have a unique ID; controlling the creation, deletion and modification of identifier objects; revoking access; utilizing strong cryptography during credential transmission; and verifying user identity before modifying authentication credentials.
  • Requirement 10: Track and monitor all access to network resources and cardholder data
    • This requirement puts a heavy emphasis on designing and implementing a secure, reliable and accurate audit trail of all events within the environment. This includes capturing all individual user access to cardholder data, invalid logical access attempts and intersystem communication logs. All audit trail entries should include: user identification, type of event, date and time, success or failure indication, origin of event and the identity of the affected data or resource.

Let’s review how Aspen Mesh, the enterprise-ready service mesh built on Istio, helps simplify both the implementation and reporting of controls for the above requirements.

‘Auditable’ Real-time and Historical Dataflows

Keeping records of how data flows through your system is one of the key requirements of PCI DSS compliance (1.1.3), as well as a security best practice. With Aspen Mesh, you can see a live view of your service-to-service communications and retrieve historical configurations detailing what services were communicating, their corresponding security configuration (e.g. whether or not mutual TLS was enabled, certificate thumbprint and validity period, internal IP address and protocol) and what the entire cluster configuration was at that point in time (minus secrets, of course).

Encrypting Traffic and Achieving Non-Repudiation for Requests

Defense in depth is an industry best practice when securing sensitive data. Aspen Mesh can automatically encrypt and decrypt service requests and responses without development teams having to write custom logic per service. This helps reduce the amount of non-functional feature development work teams are required to do prior to delivering their services in a secure and compliant environment. The mesh also provides non-repudiation for requests by authenticating clients via mutual TLS and through a key management system that automates key and certificate generation, distribution and rotation, so you can be sure requests are actually coming from who they say they’re coming from. Encrypting traffic and providing non-repudiation is key to implementing controls that ensure sensitive data, such as a Primary Account Number (PAN), are protected from unauthorized actors sniffing traffic, and to providing definitive proof for auditing events as required by PCI DSS requirements 10.3.1, 10.3.5, and 10.3.6.

Strong Access Control and Centralized Policy Management

Aspen Mesh provides flexible and highly performant Role-Based Access Control (RBAC) via centralized policy management. RBAC allows you to easily implement and report on controls for requirements 7 and 8 – Implement Strong Access Control Measures. With policy control, you can easily define what services are allowed to communicate, what methods services can call, rate limit requests, and define and enforce quotas.
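
One way requirement 7's “deny all” by default posture can be expressed in the underlying Istio policy model is sketched below (hypothetical namespace; an empty ALLOW policy matches no requests, so everything is denied unless another policy explicitly allows it):

```yaml
# Hypothetical default-deny posture for a cardholder-data namespace.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: cardholder-data
spec: {}   # matches nothing, so all requests are denied by default
```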

Centralized Tracing, Monitoring and Alerting

One of the most difficult non-functional features to implement at scale is consistent and reliable application tracing. With Aspen Mesh, you get reliable and consistent in-depth tracing between all services within the mesh, configurable real-time dashboards, the ability to create criteria driven alerts and the ability to retain your logs for at least one year — which exceeds requirements that dictate a minimum of three months data be immediately available for analysis for PCI DSS requirements 10.1, 10.2-10.2.4, 10.3.1-10.3.6 and 10.7.

Aspen Mesh Makes it Easier to Implement and Scale a Secure and PCI DSS Compliant Microservice Environment

Managing a microservice architecture at scale is a serious challenge without the right tools. Having to ensure each service follows the proper organizational secure communication, authentication, authorization and monitoring policies needed to comply with PCI DSS is not easy to achieve.

Achieving PCI DSS compliance involves addressing a number of different things around firewall configuration, developing applications securely and fine-grained RBAC. These are all distinct development efforts that can be hard to achieve individually, but even harder to achieve as a coordinated team. The good news is, with the help of Aspen Mesh, your engineering team can spend less time building and maintaining non-functional yet essential features, and more time building features that provide direct value to your customers.

 

Originally posted on The New Stack

service mesh

How The Service Mesh Space Is Like Preschool

I have a four-year-old son who recently started attending full-day preschool. It has been fascinating to watch his interests shift from playing with stuffed animals and pushing a corn popper to playing with his science set (w00t for the STEM lab!) and riding his bike. The other kids in school are definitely informing his view of what cool new toys he needs. Undoubtedly, he could still make do with the popper and stuffed animals (he may sleep with Lambie until he's ten), but as he progresses, his desire to explore new things increases.

Watching the community around service mesh develop is similar to watching my son's experience in preschool (if you're willing to make the stretch with me). People have come together in a new space to learn about cool new things, and as excited as they are, they don't completely understand the cool new things. Just as in preschool, there are a ton of bright minds that are eager to soak up new knowledge and figure out how to put it to good use.

Another parallel between my son and many of the people we talk to in the service mesh space is that they both have a long and broad list of questions. In the case of my son, it's awesome because they're questions like: "Is there a G in my name?" "What comes after Sunday?" "Does God live in the sky with the unicorns?" The questions we get from prospects and clients on service mesh are a bit different but equally interesting. It would take more time than anybody wants to spend to cover all these questions, but I thought it might be interesting to cover the top 3 questions we get from users evaluating service mesh.

What do I get with a service mesh?

We like getting this question because the answer to it is a good one. You get a toolbox that gives you a myriad of different capabilities. At a high level, what you get is observability, control and security of your microservice architecture. The features that a service mesh provides include:

  • Load balancing
  • Service discovery
  • Ingress and egress control
  • Distributed tracing
  • Metrics collection and visualization
  • Policy and configuration enforcement
  • Traffic routing
  • Security through mTLS

When do I need a service mesh?

You don't need 1,000 microservices for a service mesh to make sense. If you have nicknames for your monoliths, you're probably a ways away from needing a service mesh. And you probably don't need one if you only have two services, but if you have a few services and plan to continue down the microservices path, it's easier to get started sooner. We believe containers and Kubernetes will be the way companies build infrastructure in the future, and waiting to hop on that train will only be a competitive disadvantage. Generally, we find that the answer to this question hinges on whether or not you are committed to cloud native. Service meshes like Aspen Mesh work seamlessly with cloud native tools, so the barrier to entry is low, and running cloud native applications is much easier with the help of a service mesh.

What existing tools does service mesh allow me to replace?

This answer all depends on what functionality you want. Here's a look at tools that service mesh overlaps, what it provides and what you'll need to keep old tools for.

API gateway
Not yet. A service mesh replaces some of the functionality of an API gateway, but does not yet cover all of the ingress and API management features an API gateway provides. Chances are API gateways and service meshes will converge in the future.

Tracing Tools
You get tracing capabilities as part of Istio. If you are using distributed tracing tools such as Jaeger or Zipkin, you no longer need to continue managing them separately as they are part of the Istio toolbox. With Aspen Mesh's hosted SaaS platform, we offer managed Jaeger so you don't even need to deploy or manage them.

Metrics Tools
Just like tracing, a metrics monitoring tool is included as part of Istio. Istio leverages Prometheus to query metrics, and you have the option of visualizing them through the Prometheus UI or using Grafana dashboards. With Aspen Mesh's hosted SaaS platform, we offer managed Prometheus and Grafana so you don't even need to deploy or manage them.

Load Balancing
Yep. Envoy, the sidecar proxy used by Istio, provides load balancing functionality such as automatic retries, circuit breaking, global rate limiting, request shadowing and zone-aware load balancing. You can use a service mesh in place of tools like HAProxy or NGINX for ingress load balancing.
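
As a sketch, Envoy's load balancing and circuit-breaking behavior is configured through Istio's DestinationRule resource (hypothetical service name; field names vary slightly across Istio versions):

```yaml
# Hypothetical DestinationRule: load balancing strategy plus circuit
# breaking (outlier detection) for a "reviews" service.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN     # alternatives: ROUND_ROBIN, RANDOM, PASSTHROUGH
    outlierDetection:
      consecutiveErrors: 5   # eject a host after 5 consecutive errors
      interval: 30s          # how often hosts are scanned
      baseEjectionTime: 60s  # minimum ejection duration
```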

Security tools
Istio provides mTLS capabilities that address some important microservices security concerns. Istio's identity system is built on the SPIFFE framework, so if you're using SPIRE, Istio may be able to take its place. An important thing to note is that while a service mesh adds several important security features, it is not the end-all-be-all for microservices security. It’s important to also consider a strategy around network security.

If you have little ones and would be interested in comparing notes on the fantastic questions they ask, let’s chat. I'd also love to talk anything service mesh. We have been helping a broad range of customers get started with Aspen Mesh and make the most out of it for their use case. We’d be happy to talk about any of those experiences and best practices to help you get started on your service mesh journey. Leave a comment here or hit me up @zjory.


Aspen Mesh Enterprise Service Mesh

Enabling the Financial Services Shift to Microservices

Financial services has historically been an industry riddled with barriers to entry. Challengers found it difficult to break through low margins and tightening regulations. However, large enterprises that once dominated the market are now facing disruption from smaller, leaner fintech companies that are eating away at the value chain. These disruptors are marked by technological agility, specialization and customer-centric UX. To remain competitive, financial services firms are reconsidering their cumbersome technical architectures and transforming them into something more adaptable. A recent survey of financial institutions found that ~85% consider their core technology to be too rigid and slow. Consequently, ~80% are expected to replace their core banking systems within the next five years.

Emerging regulations meant to address the new digital payment economy, such as PSD2 regulations in Europe, will require banks to adopt a new way to operate and deliver. Changes like PSD2 are aimed at bringing banking into the open API economy, driving interoperability and integration through open standards. To become a first class player in this new world of APIs, integration, and open data, financial services firms will need the advantages provided by microservices.

Microservices provide 3 key advantages for financial services

Enhanced Security

Modern fintech requirements create challenges for the established security infrastructure. Features like digital wallets, robo-advisory and blockchain mandate new security mechanisms. Microservices follow a best practice of creating a separate identity service, which addresses these new requirements.

Faster Delivery

Rapidly bringing new features to market is a cornerstone of successful fintech companies. Microservices make it easier for different application teams to independently deliver new functionality to meet emerging customer demands. Microservices also scale well to accommodate greater numbers of users and transactions.

Seamless Integration

The integration layer in a modern fintech solution needs a powerful set of APIs to communicate with other services, both internally and externally. This API layer is notoriously challenging to manage in a large monolithic application. Microservices make the API layer much easier to manage and secure through isolation, scalability and resilience.

Service mesh makes it easier to manage a complex microservice architecture

In the face of rapidly changing customer, business and regulatory requirements, microservices help financial services companies quickly respond to these changes. But this doesn’t come for free. Companies take on increased operational overhead during the shift to microservices – technologies such as a service mesh can help manage that.

Service mesh provides a bundle of features around observability, security, and control that are crucial to managing microservices at scale. Previously existing solutions like DNS and configuration management provide some capabilities such as service discovery, but don’t provide fast retries, load balancing, tracing and health monitoring. The old approach to managing microservices requires that you cobble together several different solutions each time a problem arises, but a service mesh bundles it all together in a reusable package. While it’s possible to accomplish some of what a service mesh manages with individual tools and processes, it’s manual and time consuming.

Competition from innovative fintech startups, along with ever-increasing customer expectations, means established financial services players must change the way they deliver offerings and do business with their customers. Delivering on these new requirements is difficult with legacy systems. Financial services firms need a software architecture that’s fit for purpose – agile, adaptable, highly scalable, reliable and robust. Microservices make this possible, and a service mesh makes microservices manageable at scale.


Microservices challenges

How Service Mesh Addresses 3 Major Microservices Challenges

I was recently reading the Global Microservices Trends report by Dimensional Research and found myself thinking "a service mesh could help with that." So I thought I would cover three of those challenges and how a service mesh addresses them. Respondents cited in the report make it clear that microservices are gaining widespread adoption. It's also clear that along with the myriad of benefits they bring, there are tough challenges that come as part of the package. The report shows:

  • 91% of enterprises are using microservices or have plans to
  • 99% of users report challenges with using microservices

Major Microservices Challenges

The report identifies a range of challenges companies are facing.

Companies are seeing a mix of technology and organizational challenges. I'll focus on the technological challenges a service mesh solves, but it's worth noting that a service mesh brings uniformity, making it possible to achieve the same view across teams, which can reduce the need for certain specialized skills.

Each additional microservice increases the operational challenges

Not with a service mesh! Infrastructure services were traditionally implemented as discrete appliances, which meant going to the actual appliance to get the service. Each appliance is unique, which makes monitoring, scaling, and providing high availability for each one hard. A service mesh instead provides monitoring, scalability, and high availability through APIs, delivering these services inside the compute cluster itself without any additional appliances. This flexible framework removes much of the operational complexity associated with modern applications: implementing a service mesh means adding new microservices doesn't have to add complexity.

It is harder to identify the root cause of performance issues

The service mesh toolbox gives you a couple of things that help solve this problem:

Distributed Tracing
Tracing provides service dependency analysis for different microservices and tracks requests as they travel through multiple microservices. It’s also a great way to identify performance bottlenecks and zoom into a particular request to determine which microservice contributed to the latency of a request or which service created an error.

Metrics Collection
Another powerful thing you gain with service mesh is the ability to collect metrics. Metrics are key to understanding historically what has happened in your applications, and when they were healthy compared to when they were not. A service mesh can gather telemetry data from across the mesh and produce consistent metrics for every hop. This makes it easier to quickly solve problems and build more resilient applications in the future.

Differing development languages and frameworks

Another major challenge that report respondents noted was maintaining a distributed architecture in a polyglot world. When making the move from monolith to microservices, many companies struggle with the reality that to make things work, they have to use different languages and tools. Large enterprises can be especially affected by this as they have many large, distributed teams. Service mesh provides uniformity through programming-language agnosticism, which addresses inconsistencies in a polyglot world where different teams, each with its own microservice, are likely to be using different programming languages and frameworks. A mesh also provides a uniform, application-wide point for introducing visibility and control into the application runtime, moving service communication out of the realm of implied infrastructure to where it can be easily seen, monitored, managed and controlled.

Microservices are cool, but service mesh makes them ice cold. If you're on the microservices journey and are finding it difficult to manage the infrastructure challenges, a service mesh may be the right answer. Let us know if you have any questions on how to get the most out of service mesh; our engineering team is always available to talk.


Observability, or "Knowing What Your Microservices Are Doing"

Microservicin’ ain’t easy, but it’s necessary. Breaking your monolith down into smaller pieces is a must in a cloud native world, but it doesn’t automatically make everything easier. Some things actually become more difficult. An obvious area where it adds complexity is communications between services; observability into service to service communications can be hard to achieve, but is critical to building an optimized and resilient architecture.

The idea of monitoring has been around for a while, but observability has become increasingly important in a cloud native landscape. Monitoring aims to give an idea of the overall health of a system, while observability aims to provide insights into the behavior of systems. Observability is about data exposure and easy access to information which is critical when you need a way to see when communications fail, do not occur as expected or occur when they shouldn’t. The way services interact with each other at runtime needs to be monitored, managed and controlled. This begins with observability and the ability to understand the behavior of your microservice architecture.

A primary microservices challenge is trying to understand how individual pieces of the overall system are interacting. A single transaction can flow through many independently deployed microservices or pods, and discovering where performance bottlenecks have occurred provides valuable information.

It depends who you ask, but many considering or implementing a service mesh say that the number one feature they are looking for is observability. There are many other features a mesh provides, but those are for another blog. Here, I’m going to cover the top observability features provided by a service mesh.

Tracing

One of the most important things to know about your microservices architecture is specifically which microservices are involved in an end-user transaction. When many teams are deploying dozens of microservices, all independently of one another, it’s difficult to understand the dependencies across your services. Service mesh provides uniformity, which means tracing is programming-language agnostic, addressing inconsistencies in a polyglot world where different teams, each with its own microservice, can be using different programming languages and frameworks.

Distributed tracing is great for debugging and understanding your application’s behavior. The key to making sense of all the tracing data is being able to correlate spans from different microservices which are related to a single client request. To achieve this, all microservices in your application should propagate tracing headers. If you’re using a service mesh like Aspen Mesh, which is built on Istio, the ingress and sidecar proxies automatically add the appropriate tracing headers and report the spans to a tracing collector backend. Istio provides distributed tracing out of the box, making it easy to integrate tracing into your system. Propagating tracing headers in an application can provide nice hierarchical traces that graph the relationship between your microservices. This makes it easy to understand what is happening when your services interact and if there are any problems.
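The propagation step is the one piece the application owns. Here is a minimal Python sketch: the header list is the B3/tracing set Istio's proxies generate, and copying them from each inbound request to any outbound requests is all a service needs to do to keep spans correlated:

```python
# Tracing headers generated by Istio's proxies; the application only
# needs to copy them from inbound requests to its outbound requests.
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
]

def propagate_headers(inbound_headers):
    """Build the outbound header set that keeps spans in one trace."""
    return {h: inbound_headers[h] for h in TRACE_HEADERS if h in inbound_headers}
```

Everything else, including generating span IDs and reporting spans to the collector, is handled by the mesh.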

Metrics

A service mesh can gather telemetry data from across the mesh and produce consistent metrics for every hop. Deploying your service traffic through the mesh means you automatically collect metrics that are fine-grained and provide high-level application information, since they are reported for every service proxy. Telemetry is automatically collected from any service pod, providing network and L7 protocol metrics. Service mesh metrics provide a consistent view by generating uniform metrics throughout. You don’t have to worry about reconciling different types of metrics emitted by various runtime agents, or add arbitrary agents to gather metrics for legacy apps. It’s also no longer necessary to rely on the development process to properly instrument the application to generate metrics. The service mesh sees all the traffic, even into and out of legacy “black box” services, and generates metrics for all of it.

Valuable metrics that a service mesh gathers and standardizes include:

  • Success Rates
  • Request Volume
  • Request Duration
  • Request Size
  • Request and Error Counts
  • Latency
  • HTTP Error Codes

These metrics make it simpler to understand what is going on across your architecture and how to optimize performance.
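As an illustration of how such standardized metrics get used, here is a hypothetical Python sketch that rolls per-request telemetry up into the kind of summary a mesh dashboard reports. The input format is an assumption for the example: a list of (HTTP status, latency in ms) tuples:

```python
def summarize(requests):
    """Summarize per-request telemetry: request volume, success rate,
    and 95th-percentile latency (nearest-rank). Assumes a non-empty
    list of (http_status, latency_ms) tuples."""
    volume = len(requests)
    # Count anything below 500 as a success, as mesh dashboards
    # commonly do for server-side success rate.
    successes = sum(1 for status, _ in requests if status < 500)
    latencies = sorted(ms for _, ms in requests)
    p95 = latencies[min(volume - 1, int(0.95 * volume))]
    return {
        "request_volume": volume,
        "success_rate": successes / volume,
        "p95_latency_ms": p95,
    }
```

Because the mesh emits these metrics uniformly for every hop, the same roll-up works for every service, old or new.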

Most failures in the microservices space occur during the interactions between services, so a view into those transactions helps teams better manage architectures to avoid failures. Observability provided by a service mesh makes it much easier to see what is happening when your services interact with each other, making it easier to build a more efficient, resilient and secure microservice architecture.


The Road Ahead for Service Mesh

This is the third in a blog series covering how we got to a service mesh, why we decided on the type of mesh we did and where we see the future of the space.

If you’re struggling to manage microservices as architectures continue to become more complex, there’s a good chance you’ve at least heard of service mesh. For the purposes of this blog, I’ll assume you’re familiar with the basic tenets of a service mesh.

We believe that service mesh is advancing microservice communication to a new level that is unachievable with the one-off solutions that were previously being used. Things like DNS provide some capabilities like service discovery, but don’t provide fast retries, load balancing, tracing and health monitoring. The old approach also requires that you cobble together several things each time when it’s possible to bundle it all together in a reusable tool.

While it’s possible to accomplish much of what a service mesh manages with individual tools and processes, doing so is manual and time consuming.


Right Around the Corner

So what’s in the immediate future? I think we’ll see the technology quickly mature and add more capabilities as standard features, as enterprises realize the efficiency gains created by a mesh and look to implement it as the standard for managing microservice architectures. Offerings like Istio are not yet ready for production deployments, but the roadmap is progressing quickly and it seems we’ll be at v1 in short order. Security is a feature provided by service mesh, but for most enterprises it’s a major consideration, and I see policy enforcement and monitoring options becoming more robust for enterprise production deployments. A feature I see on the near horizon, and one that will provide tremendous value, is an analytics platform that surfaces insights from the huge amount of telemetry data in a service mesh. I think an emerging value proposition we’ll see is that the mesh allows you to gain and act on data that lets you more efficiently manage your entire architecture.

Further Down the Road

There is a lot of discussion on what’s on the immediate horizon for service mesh, but what is more interesting is considering what the long term will bring. My guess is that we’ll ultimately see the mesh become an embedded value-add in a platform. Microservices are clearly the way of the future, so organizations are going to demand an effortless way to manage them. They’ll want something automated, running in the background, that never has to be thought about. This is probably years down the road, but I do believe service mesh will eventually be a ubiquitous technology that is fully managed and plug-and-play. It will be interesting to see new ways of using the technology to manage infrastructure, services and applications.

We’re excited to be part of the journey, and are inspired by the ideas in the Istio community and how users are leveraging service mesh to solve direct problems created by the explosion of microservices and also find new efficiencies with it. Our goal is to make the implementation of a mesh seamless with your existing technology and provide enhanced features, knowledge and support to take the burden out of managing microservices. We’re looking forward to the road ahead and would love to work with you to make your microservices journey easier.