Simplifying Microservices Security with Incremental mTLS

Kubernetes removes much of the complexity involved in managing and operating a microservices application architecture. Out of the box, Kubernetes gives you advanced application lifecycle management techniques like rolling upgrades, resiliency via pod replication, auto-scalers and disruption budgets, efficient resource utilization with advanced scheduling strategies, and health checks like readiness and liveness probes. Kubernetes also sets up basic networking capabilities that let you easily discover new services added to your cluster (via DNS) and enable pod-to-pod communication with basic load balancing.

However, most of the networking capabilities provided by Kubernetes and its CNI providers are constrained to layers 3 and 4 of the OSI stack (network and transport protocols like TCP/IP). This means that any advanced networking functionality that relies on higher layers, such as retries or routing based on parsing application protocols like HTTP/gRPC (layer 7) or encrypting traffic between pods using TLS (layer 5), has to be baked into the application. Relying on your applications to enforce network security is fraught with landmines: it tightly couples your operations/security and development teams, and it burdens your application developers with owning complicated infrastructure code.

Let’s explore what it takes for applications to perform TLS encryption for all inbound and outbound traffic in a Kubernetes environment. In order to achieve TLS encryption, you need to establish trust between the parties involved in communication. To establish trust, you need to create and maintain some sort of public key infrastructure (PKI) which can generate certificates, revoke them and periodically refresh them. As an operator, you now need a mechanism to provide these certificates (maybe use Kubernetes secrets?) to the running pods and update the pods when new certificates are minted. On the application side, you have to rely on OpenSSL (or its derivatives) to verify trust and encrypt traffic. The application development team needs to handle upgrading these libraries when CVE fixes and upgrades are released. In addition to all these complexities, compliance concerns may also require that you support only a minimum TLS version (or higher) and a restricted subset of ciphers, which requires creating and supporting more configuration options in your applications. All of these challenges make it very hard for organizations to encrypt all pod network traffic on Kubernetes, whether it’s for compliance reasons or to achieve a zero trust network model.
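To make the operator side of that burden concrete, here is a minimal sketch of handing certificates to a pod through a Kubernetes Secret. The names orders and orders-tls, the image, and the mount path are hypothetical and exist only for this illustration.

```yaml
# Hypothetical pod for illustration; names, image and paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: orders
  namespace: default
spec:
  containers:
  - name: orders
    image: example.com/orders:1.0   # placeholder image
    volumeMounts:
    - name: tls-certs
      mountPath: /etc/tls           # the application must load tls.crt and tls.key from here
      readOnly: true
  volumes:
  - name: tls-certs
    secret:
      secretName: orders-tls        # a kubernetes.io/tls Secret the operator creates and rotates
```

Even with the Secret mounted, the application still has to terminate TLS itself and pick up renewed certificates, which is exactly the work described above.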

This is the problem that a service mesh leveraging the sidecar proxy approach is designed to solve. The sidecar proxy can initiate a TLS handshake and encrypt traffic without requiring any changes or support from the applications. In this architecture, the application pod makes a plain-text request to another application running in the Kubernetes cluster; the sidecar proxy intercepts the request and transparently upgrades it to mutual TLS. Additionally, the Istio control plane component Citadel creates workload identities based on the SPIFFE specification, issues and renews certificates, and mounts the appropriate certificates into the sidecars. This removes the burden of encrypting traffic from developers and operators.

Istio provides a rich set of tools to configure mutual TLS globally (on or off) for the entire cluster, or to adopt mTLS incrementally by enabling it for individual namespaces or for a subset of services and their clients. This is where things get a little complicated. In order to correctly configure mTLS for one service, you need to configure an Authentication Policy for that service and the corresponding DestinationRules for its clients.
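As a rough illustration of that pairing, the sketch below enables mTLS for a single service using Istio’s v1alpha1 authentication API and a v1alpha3 DestinationRule. The service name orders, the default namespace, and the resource names are hypothetical and chosen only for this example.

```yaml
# Server side: require mTLS for inbound traffic to the (hypothetical) orders service.
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "orders-mtls"
  namespace: "default"
spec:
  targets:
  - name: orders
  peers:
  - mtls: {}
---
# Client side: tell sidecars that call orders to originate mTLS.
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "orders-mtls"
  namespace: "default"
spec:
  host: "orders.default.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```

If only one of the two resources is applied, clients and server disagree about whether to use TLS, and requests are likely to fail.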

Both the Authentication policy and Destination rule follow a complex set of precedence rules which must be accounted for when creating these configuration objects. For example, a namespace level Authentication policy overrides the mesh level global policy, a service level policy overrides the namespace level policy, and a service port level policy overrides the service specific Authentication policy. Destination rules allow you to specify the client side configuration based on host names. The Destination rule defined in the client namespace has the highest precedence, followed by the one in the server namespace, and finally the global default Destination rule. On top of that, if you have conflicting Authentication policies or Destination rules, the system behavior can be indeterminate. A mismatch between an Authentication policy and a Destination rule can lead to subtle traffic failures which are difficult to debug and diagnose. Aspen Mesh makes it easy to understand mTLS status and avoid any configuration errors.
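To illustrate one of these precedence rules, the following sketch shows a service level policy overriding a namespace level policy. The payments namespace and the legacy-billing service are hypothetical names used only here.

```yaml
# Namespace-level policy: it must be named "default" and has no targets.
# Requires mTLS for every service in the (hypothetical) payments namespace.
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "default"
  namespace: "payments"
spec:
  peers:
  - mtls: {}
---
# Service-level policy: takes precedence over the namespace policy for
# legacy-billing only. With no peers listed, this service accepts plain text.
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "legacy-billing-plaintext"
  namespace: "payments"
spec:
  targets:
  - name: legacy-billing
---
# Matching client-side rule so callers do not attempt TLS to legacy-billing.
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "legacy-billing-plaintext"
  namespace: "payments"
spec:
  host: "legacy-billing.payments.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: DISABLE
```

Every exception of this kind needs its own matching client side rule, which is where mismatches tend to creep in.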

Editing these complex configuration files in YAML can be tricky and only compounds the problem at hand. In order to simplify how you configure these resources and incrementally adopt mutual TLS in your environment, we are releasing a new feature which enables our customers to specify a service port (via APIs or UI) and their desired mTLS state (enabled or disabled). The Aspen Mesh platform automatically generates the correct set of configurations needed (Authentication policy and/or Destination rules) by inspecting the current state and configuration of your cluster. You can then view the generated YAMLs, edit them as needed, and store them in your CI system or apply them manually. This feature removes the hassle of learning complex Istio resources and their interaction patterns, and provides you with valid, non-conflicting and functional Istio configuration.
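For a sense of what port level configuration looks like in upstream Istio, here is a hand-written sketch that enables mTLS on a single service port. It is not the output of the Aspen Mesh feature, and the inventory service, port 8080, and resource names are hypothetical.

```yaml
# Server side: require mTLS only on port 8080 of the (hypothetical) inventory service.
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "inventory-port-8080-mtls"
  namespace: "default"
spec:
  targets:
  - name: inventory
    ports:
    - number: 8080
  peers:
  - mtls: {}
---
# Client side: originate mTLS only for that port.
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "inventory-port-8080-mtls"
  namespace: "default"
spec:
  host: "inventory.default.svc.cluster.local"
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 8080
      tls:
        mode: ISTIO_MUTUAL
```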

Customers that we talk to are in various stages of migrating to a microservices architecture or Kubernetes environment. The result is often a hybrid environment where some services are consumed by clients that are not in the mesh or are deployed outside the Kubernetes environment, so those services require a different mTLS policy. Our hosted dashboard makes it easy for users to identify services and workloads which have mTLS turned on or off, and then create configuration using the above workflow to change the mTLS state as needed.
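One way upstream Istio can accommodate such a service during a migration is permissive mode, in which the sidecar accepts both plain-text and mutual TLS traffic. The sketch below assumes a hypothetical public-api service that still has clients outside the mesh.

```yaml
# Accept both plain text (out-of-mesh clients) and mTLS (in-mesh clients)
# for the hypothetical public-api service.
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "public-api-permissive"
  namespace: "default"
spec:
  targets:
  - name: public-api
  peers:
  - mtls:
      mode: PERMISSIVE
```

Once all clients have joined the mesh, the mode line can be removed to require mTLS.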

If you’re an existing customer, please upgrade your cluster to our latest release (Aspen Mesh 1.1.3-am2) and log in to the dashboard to start using the new capabilities.

If you’re interested in learning about Aspen Mesh and incrementally adopting mTLS in your cluster, you can sign up for a beta account here.


Leveraging Service Mesh To Address HIPAA Security Requirements

Building a product utilizing a distributed microservice architecture for the healthcare industry, while following the requirements set forth in the Health Insurance Portability and Accountability Act (HIPAA), is hard. Trust me, I have felt the pain. I’ve spent the majority of my career building, securing and ensuring compliance for products in highly regulated industries, including healthcare. Everyone needs healthcare, and the industry is growing at a rapid pace as new advancements are made. This is great for our health and wellbeing, but it starts to pose new challenges for organizations that process and store sensitive data such as Personally Identifiable Information (PII) and Electronic Protected Health Information (ePHI). What used to be a system of paper charts in manila envelopes stored in filing cabinets is now a large interconnected system where patient medications, x-rays, surgeries, diagnoses and other health-related data are transferred between internal and external entities. This advancement has allowed physicians to quickly provide other entities with your entire medical history, so you receive the best care possible, as quickly as possible. But this exchange does not come without risk. Anytime you make something more accessible, you also introduce new attack surfaces and points of failure, allowing data to be leaked and increasing the possibility of malicious attacks.

The HIPAA Security Rule was created to help address this new risk. It mandates that organizations that process or store ePHI follow certain safeguards to protect sensitive data.

The technical safeguard standards introduced by the Security Rule include:

  • Authentication - verification of the identity of the actor seeking access to protected data.
  • Authorization - verification that the actor is allowed to access the requested protected data.
  • Audit Controls - mechanisms for recording and examining activities pertaining to protected data within the system.
  • Data Integrity - protecting the data from being altered or destroyed in an unauthorized manner.

Implementing these safeguards may seem like an obvious thing to do when processing or storing sensitive data, but all too often they are overlooked or deemed too difficult, expensive and/or time-consuming to implement with available resources. No matter the reason, this is a violation in the eyes of the U.S. Department of Health and Human Services (HHS) Office for Civil Rights (OCR) and can result in fines up to $1.5 million a year for each violation and can even result in criminal charges. Fortunately, a service mesh helps address many of these standards in a way that requires less effort than building custom controls, and is also less error-prone.

Let’s take a look at how you can leverage Aspen Mesh, the enterprise-ready service mesh built on Istio, to easily implement controls to address many of these standards that would otherwise require significant development effort and expertise.

Authentication
As briefly discussed, authentication is the verification of the identity of the actor seeking access to protected data. With Aspen Mesh, you can configure mesh-wide service-to-service authentication and end-user authentication with little effort. In fact, if you use the recommended default Aspen Mesh installation, it will enable mesh-wide mTLS automatically without requiring any code changes.
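For reference, upstream Istio 1.1 expresses mesh-wide mTLS with a pair of resources along the following lines. This is a sketch of the upstream convention; the exact resources that the Aspen Mesh installation applies are not shown in this article and may differ.

```yaml
# Mesh-wide server-side policy: cluster-scoped and named "default".
apiVersion: "authentication.istio.io/v1alpha1"
kind: "MeshPolicy"
metadata:
  name: "default"
spec:
  peers:
  - mtls: {}
---
# Mesh-wide client-side default: originate mTLS to all in-cluster hosts.
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  host: "*.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```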

Now that you have service-to-service authentication and transport encryption enabled, the next step is to enable end-user authentication.

Below is an example of how you would enable end-user authentication on a Patient Check-in Service using an external Identity Management Service that supports JWTs (e.g. Azure Active Directory B2C, Amazon Cognito, Auth0, Okta, GSuite), so reception personnel can securely log in and check in patients as they arrive.

1. You’re going to need to make note of the JWT Issuer and JWK URI from your User Directory Service.
2. Create and apply a Policy called patients-checkin-user-auth that configures end user authentication to the Patient Check-in Service using your JWT supported Identity Management Service of choice.

apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "patients-checkin-user-auth"
spec:
  targets:
  - name: patient-checkin
  peers:
  - mtls:
  origins:
  - jwt:
      issuer: "<REPLACE_WITH_YOUR_JWT_SUPPORTED_IDENTITY_MANAGEMENT_SERVICE_ISSUER>"
      jwksUri: "<REPLACE_WITH_YOUR_JWT_SUPPORTED_IDENTITY_MANAGEMENT_SERVICE_USER_DIRECTORY_JWKS_URI>"
  principalBinding: USE_ORIGIN

3. Ensure that the Patient Check-in frontend application places the JWT in the Authorization header of HTTP requests to the backend services.
4. That’s it!

Authorization
Aspen Mesh provides flexible and fine-grained Role-Based Access Control (RBAC) via centralized policy management. With policy control, you can easily define what services are allowed to communicate, what methods services can call, rate limit requests and define and enforce quotas.

Below is a simple example of how a Patient Check-in Service can make GET, PUT, and POST requests to the Patients Service but can’t make DELETE requests, while the Admin Service can make GET, POST, PUT, and DELETE requests to the Patients Service.

1. Create a ServiceRole called patient-service-querier which allows making GET, PUT, and POST requests to the Patients Service.

apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: patient-service-querier
  namespace: default
spec:
  rules:
  - services: ["patients.default.svc.cluster.local"]
    methods: ["GET", "PUT", "POST"]

2. Create another ServiceRole called patients-admin that allows GET, POST, PUT, and DELETE requests to the Patients Service.

apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
  name: patients-admin
  namespace: default
spec:
  rules:
  - services: ["patients.default.svc.cluster.local"]
    methods: ["GET", "POST", "PUT", "DELETE"]

3. Create a ServiceRoleBinding called bind-patient-service-querier which assigns the patient-service-querier role to the cluster.local/ns/default/sa/patient-check-in service account, which represents the Patient Check-in Service.

apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: bind-patient-service-querier
  namespace: default
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/patient-check-in"
  roleRef:
    kind: ServiceRole
    # Must match the metadata.name of the ServiceRole created in step 1.
    name: "patient-service-querier"

4. Lastly, we’ll create another ServiceRoleBinding called bind-patient-service-admin which assigns the patients-admin role to the cluster.local/ns/default/sa/admin service account, which represents the Admin Service.

apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: bind-patient-service-admin
  namespace: default
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/admin"
  roleRef:
    kind: ServiceRole
    # Must match the metadata.name of the ServiceRole created in step 2.
    name: "patients-admin"

As you can see, you can quickly and effectively add Authorization between services in your mesh without any custom development work.
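One prerequisite is worth noting: in upstream Istio 1.1, ServiceRoles and ServiceRoleBindings are only enforced where authorization has been switched on. Whether your Aspen Mesh installation already does this is an assumption to verify. A minimal sketch that enables RBAC for just the default namespace used in the examples above could look like this:

```yaml
# Enable Istio RBAC enforcement, limited to the default namespace.
# The singleton ClusterRbacConfig must be named "default".
apiVersion: "rbac.istio.io/v1alpha1"
kind: ClusterRbacConfig
metadata:
  name: default
spec:
  mode: 'ON_WITH_INCLUSION'
  inclusion:
    namespaces: ["default"]
```

Limiting enforcement to one namespace first lets you confirm the roles behave as intended before widening the scope. Once enforcement is on, requests that no role permits are denied.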

Audit Controls
Keeping audit records of data access is one of the key requirements for HIPAA compliance, as well as a security best practice. With Aspen Mesh, you get a single source of truth with in-depth tracing between all services within the mesh. Traces can be accessed and exported via the ‘Tracing’ tab on the Aspen Mesh Dashboard or API. You may still need to add the corresponding audit logs for specific actions that happen within a service to comply with all of the requirements, but at least you have reduced the amount of engineering effort spent on non-functional but essential tasks.

Data Integrity
Data integrity and confidentiality are arguably two of the most critical requirements of HIPAA. If sensitive data such as medications, blood type or allergies is modified by or leaked to an unauthorized user, it could be detrimental to the patient. With Aspen Mesh you can quickly and easily enable transport encryption, service-to-service authentication, authorization and monitoring so you can more easily comply with HIPAA requirements and protect patient data.

Aspen Mesh Makes it Easier to Implement and Scale a Secure HIPAA Compliant Microservice Environment
Building a HIPAA compliant microservice architecture at scale is a serious challenge without the right tools. Having to ensure each service adheres to both organizational and regulatory compliance requirements is not an easy task.  

Achieving HIPAA compliance involves addressing a number of technical security requirements such as network protection, encryption and key management, identification and authorization of users and auditing access to systems. These are all distinct development efforts that can be hard to achieve individually, but even harder to achieve as a coordinated team. The good news is, with the help of Aspen Mesh, your engineering team can spend less time building and maintaining non-functional yet essential features, and more time building features that provide direct value to your customers.