Zero-trust Security for your Microservices Architecture

The number of security breaches reported in the United States increased 10%, growing to 2,932 in 2021 compared to 2,645 in 2020. Overall, the healthcare sector experienced the most incidents, accounting for 14% of reported breaches. However, when economic sectors are broken out into their component risk groups, financial services and software providers are the top two most breached business groups, with healthcare practitioners’ offices coming in third. 

Vulnerabilities have increased by a noticeable margin, and 2021 can now be credited with the most disclosures on record. Risk Based Security’s VulnDB® team aggregated 28,695 vulnerabilities that were disclosed during 2021. That total is the highest number on record – versus 23,269 vulnerabilities that were disclosed during 2020. 

Breaches and Vulnerabilities Continue to Increase

Diagram: Breaches and vulnerabilities are still increasing [1]

What does this tell us? Despite significant technological advancements, security is still hard. A single phishing email, missed patch, or misconfiguration can let the bad guys in to wreak havoc or steal data. For companies that are moving to the cloud and the cloud-native architecture of microservices and containerized applications, it’s even harder. Now, in addition to the perimeter and the network itself, the myriad connections between microservice containers also need to be protected.  

With microservices, the surface area available for attack has increased exponentially, putting data at greater risk. Moreover, network-related problems like access control, load balancing, and monitoring that had to be solved once for a monolith application now must be handled separately for each service within a cluster, as well as between clusters. Emerging today to address security in this environment is the convergence of the Zero-Trust approach to network security and service mesh technology. A service mesh combines security and operations capabilities into a transparent infrastructure layer that sits between the containerized application and the network.  

This paper examines the tenets of Zero-Trust security and how a service mesh enables Zero Trust in the microservices environment. It also looks at how Zero-Trust capabilities can help organizations address and demonstrate compliance with stringent industry compliance regulations. The context of our discussion is containerized applications that are managed in Kubernetes clusters. 

Zero-Trust Security Today

According to IBM Security, the average cost of a breach in 2021 at organizations that had not deployed zero trust was $5.04 million. Today, only about a third of organizations have a zero-trust approach, and 43% have no plans to deploy one.

Traditionally, network security has been based on a strong perimeter that helps thwart attackers, an approach commonly known as perimeter-based security. With a secure perimeter constructed of firewalls, you trust the internal network by default and, by extension, anyone who is already there. Unfortunately, this has never been a reliably effective strategy. More importantly, it is becoming even less effective in a world where employees expect access to applications and data from anywhere in the world, on any device. In fact, most security professionals consider other types of threats, such as insider threats, to be among the most serious threats to company data, which has driven the development of new ways to address these challenges.

In 2010, Forrester Research coined the term “Zero Trust” and overturned the perimeter-based security model with a new principle: “never trust, always verify.” This means no individual or machine is trusted by default, whether inside or outside the network. Another Zero-Trust precept: “assume you’ve been compromised but may not yet be aware of it.” With the time to identify and contain a breach running at 287 days, and an average cost per breach of $9.05M in 2021 [2], that’s not an unsafe assumption.

Zero Trust Authentication Methodology for a Service Mesh

In the figure below, we can see a modern Zero Trust authentication methodology for a service mesh. New authentication methods may include a combination of human and machine signals, multifactor authentication, as well as the need to verify every access attempt before app and data permissions are granted.

Zero Trust Networking Principles

  • Networks should always be considered hostile
  • Network locality is not sufficient for deciding trust in a network
  • Every device, user, and request should be authenticated and authorized
  • Network policies must be dynamic and calculated from as many sources of data as possible

In 2013, Google began its transition to implementing Zero Trust in its networking infrastructure with much success and has made the results of its efforts public as BeyondCorp [3]. Fast forward to 2022, and plans to adopt this paradigm have spread across industries like wildfire in response to massive data breaches, stricter regulatory compliance requirements, and vulnerabilities such as Log4j/Log4Shell [4].

According to Risk Based Security: “It is important to call out that log4j is a popular logging framework in Java. This means it’s used in an extraordinary number of things. How many? Back in 2015, Oracle bragged that Java was running on 13 billion devices. Given the popularity of the library in question, Log4Shell might impact a substantial proportion of them, including IoT, ICS, medical, and more. It may not impact all of them, but even a small percentage would mean a staggering number of devices. How prevalent is log4j? It’s used in the Mars 2020 Helicopter mission among other things.”

Mitigating Cyberattacks Against Containerized Applications

Zero-Trust Networking within a service mesh can mitigate these attacks:

  • Service impersonation.
    A bad actor gains access to the private network for your applications, pretends to be an authorized service, and starts making requests for sensitive data.
  • Unauthorized access.
    A legitimate service makes requests for sensitive data that it is not authorized to obtain.
  • Packet sniffing.
    A bad actor gains access to your application’s private network and captures sensitive data from legitimate requests going over the network.
  • Data exfiltration.
    A bad actor sends sensitive data out of the protected network to a destination of their choosing.

Security Within the Kubernetes Cluster

While there are myriad Zero-Trust networking solutions available for protecting the perimeter and the operation of corporate networks, there are many new miles of connections within the microservices environment that also need protection. A service mesh provides critical security capabilities such as observability, which helps reduce mean time to detect (MTTD) and mean time to resolve (MTTR) incidents, as well as ways to implement and manage encryption, authentication, authorization, policy control, and configuration in Kubernetes clusters.

Simplifying Microservices Security with Incremental mTLS

Kubernetes removes much of the complexity and difficulty involved in managing and operating a microservices application architecture. Kubernetes also sets up basic networking capabilities. However, most of the networking capabilities provided by Kubernetes are constrained to the lower levels of the networking stack.

That means advanced networking functionality, including transport layer security (TLS) encryption, must be built into the application. Burdening the application (and your developers) with enabling TLS encryption for all inbound and outbound traffic within a Kubernetes environment adds complexity. It involves establishing trust, managing certificates, verifying trust, and processing encryption and decryption, none of which relates to the application’s function.

This is the problem that a service mesh leveraging the sidecar approach solves by offloading network functions from the microservice. The sidecar approach puts data path functionality into a separate container and then situates that container as close to the application it is protecting as possible. In Kubernetes, the sidecar container and the application container live in the same Kubernetes pod, so the communication path between sidecar and app is protected inside the pod’s network namespace; by default, it isn’t visible to the host or other network namespaces on the system.

The sidecar can initiate mutual TLS (mTLS), encrypt service-to-service traffic, and achieve non-repudiation for requests without requiring any changes or support from the applications. This layer of security reduces the likelihood of a successful man-in-the-middle (MITM) attack by requiring all parties in a request to have valid certificates that trust each other.
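To make concrete what the sidecar takes off the application's plate, here is a minimal sketch of the TLS settings a service would otherwise have to manage itself, using only the Python standard library. The function names and the optional certificate paths are illustrative assumptions, not part of Istio or Aspen Mesh, and a real mesh proxy also verifies workload identities and rotates certificates, which this sketch does not attempt.

```python
import ssl
from typing import Optional


def server_mtls_context(cert: Optional[str] = None,
                        key: Optional[str] = None,
                        ca: Optional[str] = None) -> ssl.SSLContext:
    """Server-side context that rejects clients lacking a trusted certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a client certificate is what makes the handshake "mutual".
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca:
        ctx.load_verify_locations(cafile=ca)   # CA that signs client certs
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx


def client_mtls_context(cert: Optional[str] = None,
                        key: Optional[str] = None,
                        ca: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context that verifies the server and presents its own cert."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca:
        ctx.load_verify_locations(cafile=ca)   # CA that signs server certs
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx
```

Every service would need both contexts, plus a way to obtain and renew the certificate files, which is the work the sidecar performs on the application's behalf.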

Istio provides a control plane with a rich set of tools for configuring mTLS globally (on or off) for the entire cluster or incrementally, enabling mTLS for a subset of services for organizations operating in a hybrid environment.
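As a rough illustration of incremental rollout, the sketch below builds Istio PeerAuthentication resources as plain Python dictionaries and prints one as JSON, which kubectl accepts alongside YAML. The helper name, the "payments" namespace, and the choice of modes are assumptions made for this example; it assumes the default root namespace, istio-system, for the mesh-wide policy.

```python
import json
from typing import Dict, Optional

VALID_MODES = {"STRICT", "PERMISSIVE", "DISABLE"}


def peer_authentication(name: str, namespace: str, mode: str,
                        match_labels: Optional[Dict[str, str]] = None) -> dict:
    """Build a PeerAuthentication manifest for a mesh, namespace, or workload."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mTLS mode: {mode}")
    spec: dict = {"mtls": {"mode": mode}}
    if match_labels:
        # A selector narrows the policy to matching workloads only.
        spec["selector"] = {"matchLabels": match_labels}
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Mesh-wide: accept both plaintext and mTLS while services are migrated.
mesh_wide = peer_authentication("default", "istio-system", "PERMISSIVE")
# One namespace (hypothetical) that is ready to require mTLS everywhere.
payments_strict = peer_authentication("default", "payments", "STRICT")

print(json.dumps(payments_strict, indent=2))
```

Starting permissive at the mesh level and tightening one namespace at a time lets teams in a hybrid environment migrate without breaking clients that cannot yet present certificates.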

Managing Identity, Certificates, and Authorization in Service Mesh

One of the key Zero-Trust security principles requires that “every device, user, and request is authenticated and authorized.” The service mesh addresses this principle by issuing secure identities to services, much like how application users are issued an identity. This identity is often referred to as an SVID (SPIFFE Verifiable Identity Document) and is used to identify services across the mesh so they can be authenticated and authorized to perform actions. In addition to handling workload identities, the service mesh creates and renews certificates and mounts the appropriate certificates to the sidecars.
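In Istio, a workload's identity is carried in its certificate as a SPIFFE ID of the form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>. The sketch below parses such an ID so that an authorization check can reason about the namespace and service account behind a request. The class and function names, and the example identity, are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadIdentity:
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Parse an Istio-style SPIFFE ID into its three components."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    parts = parsed.path.strip("/").split("/")
    # Expected path layout: ns/<namespace>/sa/<service-account>
    if (len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa"
            or not parts[1] or not parts[3]):
        raise ValueError(f"unexpected SPIFFE path: {parsed.path!r}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])


identity = parse_spiffe_id("spiffe://cluster.local/ns/payments/sa/checkout")
print(identity)
```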

The Istio control plane centralizes policy management and enforces networking rules at runtime. However, this is where things get a little complicated. Correctly configuring mTLS for one service, for example, may require configuring an authentication policy for that service and the corresponding clients.

The authentication policy follows a complex set of precedence rules which must be accounted for when creating these configuration objects. For example, a namespace-level authentication policy overrides the mesh-level global policy, and a service-level policy overrides the namespace level. Moreover, a service port-level policy overrides the service-specific authentication policy.
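The precedence rules above can be sketched as a small lookup that returns the most specific mode that has been set. This is a simplified model written for illustration, not Istio's actual resolution code, and the function and parameter names are assumptions.

```python
from typing import Optional


def effective_mtls_mode(mesh: str,
                        namespace: Optional[str] = None,
                        workload: Optional[str] = None,
                        port: Optional[str] = None) -> str:
    """Return the mTLS mode that applies, preferring the narrowest scope."""
    # Check from most specific to least specific: port, service, namespace.
    for mode in (port, workload, namespace):
        if mode is not None:
            return mode
    return mesh  # fall back to the mesh-level global policy


# The namespace policy overrides the mesh-wide default.
print(effective_mtls_mode("PERMISSIVE", namespace="STRICT"))
# A port-level policy overrides everything above it.
print(effective_mtls_mode("STRICT", workload="STRICT", port="DISABLE"))
```

Walking through a resolution like this before applying a policy helps explain why a service may still accept plaintext even though the mesh-level policy says otherwise.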

Access Control and Enforcing the Level of Least Privilege

In addition to applying a Zero-Trust approach to the network connections within the Kubernetes cluster, the service mesh adds controls over traffic ingress and egress at the perimeter. Allowed user behavior is addressed with role-based access control (RBAC). With these controls, the Zero-Trust philosophy of “trust no one, authenticate everyone” stays in force by providing enforceable least privilege access to services in the mesh.
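Least privilege between services can be pictured as a set of allow rules evaluated for every request. The following is a deliberately simplified model, assuming only allow rules keyed on caller identity, HTTP method, and path prefix; real mesh authorization policies support more conditions and actions. All names and the example identity are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class AllowRule:
    principal: str             # caller identity, e.g. a SPIFFE-style name
    methods: FrozenSet[str]    # HTTP methods the caller may use
    path_prefix: str           # paths the caller may reach


def is_allowed(rules: Sequence[AllowRule],
               principal: str, method: str, path: str) -> bool:
    """Permit a request only if some allow rule matches it."""
    if not rules:
        # No policy attached: mirrors the mesh default of allowing traffic.
        return True
    return any(
        rule.principal == principal
        and method in rule.methods
        and path.startswith(rule.path_prefix)
        for rule in rules
    )


rules = [AllowRule("cluster.local/ns/web/sa/frontend",
                   frozenset({"GET"}), "/orders")]
print(is_allowed(rules, "cluster.local/ns/web/sa/frontend", "GET", "/orders/42"))
print(is_allowed(rules, "cluster.local/ns/web/sa/frontend", "DELETE", "/orders/42"))
```

Once any allow rule exists for a service, everything not explicitly matched is refused, which is how a legitimate but over-curious service is kept away from data it has no need to read.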

Ingress Control

It’s important to note that routing traffic within a service mesh and allowing external traffic into the mesh function differently. Within the mesh, policies specify exceptions from normal traffic since Istio by default (in compatibility with Kubernetes) allows everything to talk to everything once inside the mesh.

Getting traffic into the service mesh works in reverse (similar to traditional load balancers and application delivery controllers). That means specifying exactly what traffic is allowed in so that your services can safely connect with APIs and databases both within the organization and on the internet. With traditional load balancers, virtual IPs and virtual servers have long been used as concepts that enable operators to configure ingress traffic in a flexible and scalable manner.

Similarly, Istio gateways control exposure of services at the edge, enabling monitoring and employing routing rules to address traffic as it enters the mesh. This works much in the same way that tying virtual IPs to virtual servers works with traditional load balancers. Gateways also leverage the built-in capabilities of Kubernetes as an ingress controller to add security with ingress rules such as whitelisting and blacklisting.
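The whitelisting and blacklisting idea at the edge can be sketched with the standard ipaddress module. This is an illustration of the decision logic only, with hypothetical function and parameter names and documentation-range addresses; it is not how a gateway is configured.

```python
import ipaddress
from typing import Iterable


def ingress_permitted(source_ip: str,
                      allow_cidrs: Iterable[str],
                      deny_cidrs: Iterable[str] = ()) -> bool:
    """Admit a source address only if it is allowed and not explicitly denied."""
    addr = ipaddress.ip_address(source_ip)
    # An explicit deny always wins over an allow.
    if any(addr in ipaddress.ip_network(cidr) for cidr in deny_cidrs):
        return False
    # Anything not explicitly allowed is rejected.
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_cidrs)


print(ingress_permitted("203.0.113.7", ["203.0.113.0/24"]))
print(ingress_permitted("203.0.113.7", ["203.0.113.0/24"], ["203.0.113.7/32"]))
```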

Egress Control

Egress is also a key security concern. It’s important to be cautious about what data is allowed to leave a cluster because most security breaches include some type of egress exploit — typically by data exfiltration. This can be carried out by malware executing a command to extract data and transmit it to an unauthorized IP address or by an unauthorized person who intentionally or unwittingly extracts the data and shares it with an unauthorized third party or moves it to an insecure system. Both types of exploits are hard to detect because the data flow looks like business-as-usual network traffic.

The service mesh enables control over how traffic is routed from services in the mesh to external services. By default, Istio lets the sidecar proxy pass through all requests to services that are not configured within the mesh, but it also provides egress traffic controls, including whitelisting and blacklisting. For example, individual services can be configured to control access to external services or, alternatively, to bypass the sidecar for a specific range of IP addresses. Again, this can get complicated as the Kubernetes environment and the service mesh grow. Other Istio egress capabilities include providing gateways for traffic control, managing TLS origination, and supporting Kubernetes-native egress services.
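Istio exposes this choice through its outbound traffic policy, whose two modes are ALLOW_ANY (the pass-through default) and REGISTRY_ONLY (only destinations known to the mesh). The sketch below models that decision. The function name is hypothetical, registered_hosts stands in for externally declared hosts, and the wildcard matching is a simplification.

```python
from typing import Iterable


def egress_allowed(host: str, mode: str,
                   registered_hosts: Iterable[str]) -> bool:
    """Decide whether a workload may call an external host."""
    if mode == "ALLOW_ANY":
        return True                      # pass everything through
    if mode != "REGISTRY_ONLY":
        raise ValueError(f"unknown outbound traffic policy mode: {mode}")
    host = host.lower()
    for pattern in registered_hosts:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # "*.example.com" matches subdomains but not "example.com" itself.
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


known = ["api.partner.example", "*.storage.example"]
print(egress_allowed("eu.storage.example", "REGISTRY_ONLY", known))
print(egress_allowed("attacker.example", "REGISTRY_ONLY", known))
```

Under the stricter mode, a compromised workload that tries to send data to an unknown destination is blocked instead of blending in with normal traffic.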

Role-Based Access Control

 

Role-Based Access Control Methodology for a Service Mesh

 

Because even the most secure system can be easily circumvented by over-privileged users, it’s important to use a proven strategy for access control. In systems security, role-based access control, or role-based security, provides the ability to enforce the principle of least privilege in an organization. As an advanced access control, it restricts network access based on individuals’ roles within an organization. For enhanced security, different access levels are granted to different authorized users within a network based on what they need to do to perform job responsibilities.

As a key security element for Kubernetes clusters and service meshes, RBAC provides important features such as:

  • Delivering more consistent access management
  • Providing enforceable least privilege
  • Enabling an authentication mechanism for users with different roles
  • Restricting user or user group operations
  • Restricting operations performed by processes inside pods
  • Controlling resource visibility
  • Maximizing operational efficiency
  • Reducing HR and administrative work and IT support

Kubernetes RBAC enables control over how unique, authorized user or user group permission levels are defined in a Kubernetes cluster. The service mesh extends those capabilities, enabling fine-grained control.
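For reference, a Kubernetes Role is a list of additive rules naming API groups, resources, and verbs. The sketch below shows a read-only Role as a Python dictionary and a simplified check against it. The role name, namespace, and helper functions are assumptions for illustration, and the check ignores features such as resource names and cluster-scoped roles.

```python
from typing import List

pod_reader = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "pod-reader", "namespace": "payments"},
    "rules": [
        # "" is the core API group, where pods live.
        {"apiGroups": [""], "resources": ["pods"],
         "verbs": ["get", "list", "watch"]},
    ],
}


def _matches(values: List[str], wanted: str) -> bool:
    """A rule field matches if it lists the value or the '*' wildcard."""
    return "*" in values or wanted in values


def role_permits(role: dict, api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule in the role grants the requested action."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in role.get("rules", [])
    )


print(role_permits(pod_reader, "", "pods", "list"))
print(role_permits(pod_reader, "", "pods", "delete"))
```

Because rules only ever grant access, least privilege comes from keeping each role's rule list as short as the job requires.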

While Kubernetes RBAC can help you meet compliance needs, additional benefits can be gained from a service mesh to achieve more fine-grained RBAC. However, for this to work as intended, the service mesh must be configured correctly.

Aspen Mesh provides two tools that address the complexity of access control and least-privilege enforcement to help you achieve a Zero-Trust security posture. Istio Vet (also known as Istio Analyze) is designed to prevent misconfigurations in the service mesh by refusing to allow them in the first place. In addition, Istio Vet warns about incorrect or incomplete service mesh configuration and provides resolution guidance for any issues it finds.

Organizations using global Istio configuration resources can take advantage of the Aspen Mesh-developed tool, Traffic Claim Enforcer. Global configuration resources can affect how traffic flows through the service mesh to any specified target, which requires accurate configuration of resource namespaces to make sure traffic can get to intended destinations. Namespace misconfigurations are difficult to troubleshoot. Traffic Claim Enforcer works with Kubernetes RBAC to help avoid invalid configurations and to provide an early failure for configuration problems for easier, faster detection. Traffic Claim Enforcer can be invoked globally or on a namespace-by-namespace basis.

Monitoring, Alerting and Observability

Monitoring and alerting are key components to successfully meet security requirements and demonstrate industry compliance. A service mesh takes system monitoring a step further by providing observability. Monitoring reports overall system health, while observability focuses on highly granular insights into the behavior of systems, via consistent, in-depth tracing between all services within the mesh.

For example, a critical thing to know for security and regulatory compliance is which microservices are involved in an end-user transaction. With many teams deploying dozens of microservices independently, it can be difficult to understand the dependencies across services. Distributed tracing, made possible by the service mesh, addresses this by automatically adding tracing headers to transactions and then reporting the spans to a tracing collector.
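One practical detail: in Envoy-based meshes such as Istio, the proxies generate and report spans, but each application is still expected to copy the trace headers from an inbound request onto the outbound calls it makes, so that the spans join into a single trace. The sketch below shows that forwarding step. The header list reflects commonly used B3 and W3C Trace Context names and should be treated as an assumption to check against your tracing backend; the function name is hypothetical.

```python
from typing import Dict, Mapping

# Headers commonly forwarded for distributed tracing (assumed list).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
    "traceparent",
    "tracestate",
)


def propagate_trace_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Pick the trace headers out of an inbound request for outbound reuse."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


inbound = {
    "X-Request-Id": "req-123",
    "X-B3-TraceId": "463ac35c9f6413ad",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "Content-Type": "application/json",
}
print(propagate_trace_headers(inbound))
```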

Just as important is the ability to create criteria-driven alerts. For example, during the development process, the service mesh can create warnings of potential misconfigurations that would affect security as well as connectivity and performance. At runtime, security alerts are issued when unhealthy communication is observed, allowing rapid response and troubleshooting.

Keeping records of how data flows through the Kubernetes cluster is one of the key requirements of compliance as well as a security best practice. Some service meshes can provide a view of service-to-service communications and can retrieve historical configurations detailing what services were communicating, their corresponding security configuration (e.g., whether mutual TLS was enabled, certificate thumbprint and validity period, internal IP address, and protocol) and what the entire cluster configuration was at that point in time (minus Kubernetes secrets, of course).
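As a rough picture of what such a record might contain, the sketch below defines a hypothetical structure for one observed service-to-service connection and serializes it as a JSON line for an audit log. The field names and sample values are assumptions drawn from the items listed above, not the schema of any particular service mesh.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CommunicationRecord:
    timestamp: str          # when the connection was observed (ISO 8601)
    source: str             # calling service
    destination: str        # called service
    source_ip: str          # internal IP address of the caller
    protocol: str           # e.g. "http" or "grpc"
    mtls_enabled: bool      # whether mutual TLS protected the connection
    cert_thumbprint: str    # fingerprint of the presented certificate
    cert_not_after: str     # end of the certificate validity period


def to_audit_line(record: CommunicationRecord) -> str:
    """Serialize a record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = CommunicationRecord(
    timestamp="2022-03-01T12:00:00+00:00",
    source="frontend.web",
    destination="checkout.payments",
    source_ip="10.0.12.34",
    protocol="http",
    mtls_enabled=True,
    cert_thumbprint="example-thumbprint",
    cert_not_after="2022-03-02T12:00:00+00:00",
)
print(to_audit_line(record))
```

Note that the record holds certificate metadata only, never key material or secrets.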

Achieving Compliance in Highly Regulated Industries

To combat the increase in high-profile cyberattacks, regulations and standards are evolving to include stricter controls. The aim is to enforce security best practices when organizations process, store, and transmit sensitive data. These include the Payment Card Industry Data Security Standard (PCI DSS) for cardholder data, the EU’s General Data Protection Regulation (GDPR) for personally identifiable information (PII), and the Health Insurance Portability and Accountability Act (HIPAA) for electronic Protected Health Information (ePHI).

In this paper, we have covered how employing a service mesh to achieve Zero-Trust security in a Kubernetes environment addresses authentication, authorization, and accounting. Transport encryption via mTLS in the service mesh addresses the data integrity requirement. Moreover, the service mesh removes the burden of addressing these security requirements from the development team, allowing them to focus on functions that provide direct value to customers.

The most common technical requirements across regulations and standards are:

  • Authentication – Verify the identity of the actor seeking access to protected data.
  • Authorization – Verify the actor is allowed to access the requested protected data.
  • Accounting – Provide mechanisms for recording and examining activities within the system.
  • Data Integrity – Protect data from being altered or destroyed in an unauthorized manner.

Why Aspen Mesh

Aspen Mesh can help you to achieve a Zero-Trust security posture by applying the concepts and features discussed in this paper. Aspen Mesh is an agnostic enterprise- and production-ready service mesh that extends the capabilities of Istio to address enterprise security and regulatory compliance needs. It also provides an intuitive hosted user interface and dashboard that make it easier to deploy, monitor, and configure these features. Aspen Mesh includes:

Easy mTLS. The dashboard makes it easy for users to identify services and workloads which have mTLS turned on or off and then easily create a configuration to change the mTLS state as needed. This allows services on the mesh to be consumed by clients outside the Kubernetes environment.

Enhanced ingress. With Secure Ingress from Aspen Mesh, operators define a secure ingress object and developers define an application object. Aspen Mesh takes care of the rest, creating the objects that Istio expects.

Enhanced egress. Aspen Mesh enables observing what egress points are in use, how frequently they are accessed, and how healthy they are. It also surfaces idle egress policies that can be turned off.

Enhanced RBAC. Istio Vet helps prevent RBAC misconfigurations in the service mesh while Traffic Claim Enforcer helps avoid invalid traffic configurations. (See the RBAC section above for details.)

Secure by default configuration. Aspen Mesh implements a secure by default posture by setting communication and security switches to “on.” For example, while Istio supports mTLS encryption for all services in a cluster, Aspen Mesh turns it on by default. Likewise, security features like egress control and protocol sniffing are on by default.

Advanced policy and configuration options. Aspen Mesh includes a policy framework that simplifies specifying, measuring, and enforcing security policies, along with alerts that identify configuration errors.

Take the Next Step

At Aspen Mesh, we are here to help you achieve a Zero-Trust security posture for containerized applications at your organization. Contact us to start a conversation. Our hands-on Istio experts are available to guide your team, whether you have a service mesh in pre-prod or production. Learn ways we can help, and see our Professional Services and 24/7 Support plans.

Read a recent white paper authored by an Aspen Mesh Solutions Engineer and Istio expert: Solve Istio Security Risks and Get a Handle on Regulatory Compliance 

If you have Istio installed, you can get a complimentary Istio Health Check. Learn how here: See What’s Working (and What’s Not) in Your OS Istio Pre-Prod or Prod Environment

References
[1] Risk Based Security: 2021 Mid Year Report. Data Breach QuickView
https://pages.riskbasedsecurity.com/hubfs/Reports/2021/2021%20Mid%20Year%20Data%20Breach%20QuickView%20Report.pdf
[1] Risk Based Security: 2021 Year End Report. Data Breach QuickView
https://pages.riskbasedsecurity.com/hubfs/Reports/2021/2021%20Year%20End%20Data%20Breach%20QuickView%20Report.pdf
[1] Risk Based Security: 2020 Year End Report. Vulnerability QuickView
https://pages.riskbasedsecurity.com/hubfs/Reports/2020/2020%20Year%20End%20Vulnerability%20QuickView%20Report.pdf
[1] Risk Based Security: 2021 Year End Report. Vulnerability QuickView
https://pages.riskbasedsecurity.com/hubfs/Reports/2021/2021%20Year%20End%20Vulnerability%20QuickView%20Report.pdf
[2] “Cost of a Data Breach Report 2021.” Ponemon Institute & IBM Security, 2021
https://www.ibm.com/security/data-breach
[3] Google BeyondCorp
https://cloud.google.com/beyondcorp/
[4] Risk Based Security: Log4Shell Vulnerability. Log4Shell: log4j Vulnerability, Attack Surface, Variant and Remediation
https://www.riskbasedsecurity.com/2021/12/14/log4shell-log4j-vulnerability-attack-surface-variant-and-remediation/

 


