
March 5, 2019

Running Stateful Apps with Service Mesh: Kubernetes Cassandra with Istio mTLS Enabled

Aspen Mesh and Cassandra

Cassandra is a popular, highly performant, distributed NoSQL database built for heavy workloads. It is fully integrated into many mainstay cloud and cloud-native architectures. At companies such as Netflix and Spotify, Cassandra clusters provide continuous availability, fault tolerance, resiliency and scalability.

Critical and sensitive data is sent to and from a Cassandra database.  When deployed in a Kubernetes environment, ensuring the data is secure and encrypted is a must.  Understanding data patterns and performance latencies across nodes becomes essential, as your Cassandra environment spans multiple datacenters and cloud vendors.

A service mesh provides service visibility, distributed tracing, and mTLS encryption.  

While it’s true that Cassandra provides its own TLS encryption, one of the compelling features of Istio is the ability to uniformly administer mTLS for all of your services. With a service mesh, you can set up an easy and consistent policy where Istio automatically manages certificate rotation. Pulling Cassandra into a service mesh pairs the capabilities of the two technologies in a way that makes running stateful services much easier.
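
As a concrete example, if you only wanted mTLS for the Cassandra namespace rather than mesh-wide, a namespace-scoped authentication Policy would look roughly like the sketch below (illustrative only; later in this post we rely on the mesh-wide MeshPolicy default instead):

apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: default
  namespace: cassandra
spec:
  peers:
  - mtls: {}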

In this blog, I’ll cover the steps necessary to configure Istio with mTLS enabled in a Kubernetes Cassandra environment. We collected some information from the Istio community, did some testing ourselves and pieced together a workable solution. One of the benefits you get with Aspen Mesh is our Istio expertise from running Istio in production for the past 18 months. We are tightly engaged with the Istio community and continually testing and working out the kinks of upstream Istio. We’re here to help you with your service mesh path to production!

Let’s consider how Cassandra operates. To achieve continuous availability, Cassandra uses a “ring” communication approach, meaning each node communicates continually with the other existing nodes. For node consensus, a node sends metadata to several peers using a protocol called Gossip. The receiving nodes then “gossip” to all of the additional nodes. This Gossip protocol is similar to a TCP three-way handshake, and all of the metadata, like heartbeat state, node status, location, etc., is messaged across nodes via IP address:port.
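
For reference, these are the ports involved as they appear in a stock cassandra.yaml (the values below are the Cassandra defaults; your image may template them differently):

# cassandra.yaml (defaults)
storage_port: 7000            # unencrypted inter-node (gossip) traffic
ssl_storage_port: 7001        # inter-node traffic when Cassandra's own TLS is enabled
native_transport_port: 9042   # CQL clients
# JMX (used by nodetool) listens on 7199 and is configured in cassandra-env.sh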

In a Kubernetes deployment, Cassandra nodes are deployed as a StatefulSet to ensure the allocated number of Cassandra nodes is available at all times. Persistent volumes are associated with the Cassandra StatefulSet, and a headless service is created to ensure a stable network ID. This allows Kubernetes to restart a pod on another node and reattach its state seamlessly.
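
For illustration, the storage piece of such a StatefulSet typically includes a volume claim template along these lines (a sketch; the storage class and size are placeholders, and the abbreviated manifest later in this post omits it for brevity):

  volumeClaimTemplates:
  - metadata:
      name: cassandra-data          # mounted by the google-samples image at /cassandra_data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard    # assumption: use a storage class that exists in your cluster
      resources:
        requests:
          storage: 1Gi              # placeholder size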

Now, here’s where it gets tricky. When implementing an Istio service mesh with mTLS enabled, the Envoy sidecar intercepts all of the traffic from the Cassandra nodes, verifies where it’s coming from, decrypts it and sends the payload to the Cassandra pod over an internal loopback address. The Cassandra nodes, however, are all listening on their pod IPs for gossip, while Envoy forwards only to 127.0.0.1, where Cassandra isn’t listening. Let’s walk through how to solve this issue.
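
Once the sidecars are injected (which we’ll do below), you can see this behavior for yourself by port-forwarding to Envoy’s admin port on a Cassandra pod and dumping its configuration; the same technique shows up in the comments at the end of this post:

$ kubectl -n cassandra port-forward cassandra-0 15000   # run in a separate terminal
$ curl -s localhost:15000/config_dump                   # Envoy admin API; look for the inbound listeners bound to the pod IP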

Setting up the Mesh:

We used the cassandra:v13 image from the Google samples repo for our Kubernetes Cassandra environment. There are a few things you’ll need to ensure are included in the Cassandra manifest at deployment time. The Cassandra Service needs to be headless (that is, clusterIP: None), and it needs to expose the additional ports and port names that the Cassandra service uses to communicate:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  namespace: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  # The tcp- prefix on the port names tells Istio to treat this traffic as plain TCP
  - name: tcp-client
    port: 9042
  - name: tcp-intra-node
    port: 7000
  - name: tcp-tls-intra-node
    port: 7001
  - name: tcp-jmx
    port: 7199
  selector:
    app: cassandra

The next step is to tell each Cassandra node to listen on the loopback address that Envoy forwards to.

This image, by default, sets Cassandra’s listen address to the Kubernetes pod IP. The listen address needs to be set to the localhost loopback address instead, which allows the Envoy sidecar to pass communication through to the Cassandra nodes.

To enable this, you will need to change Cassandra’s configuration, i.e. cassandra.yaml, or the startup script that generates it.

We did this by adding a substitution to our Kubernetes Cassandra manifest, based on the workaround discussed in the related Istio issue:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: cassandra
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        command: [ "/usr/bin/dumb-init", "/bin/bash", "-c", "sed -i 's/^CASSANDRA_LISTEN_ADDRESS=.*/CASSANDRA_LISTEN_ADDRESS=\"127.0.0.1\"/' /run.sh && /run.sh" ]
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042

This simple change uses sed to patch the Cassandra startup script so that Cassandra listens on localhost.

If you’re not using the google-samples/cassandra container, you should modify your Cassandra config or container to set the listen_address to 127.0.0.1. For some containers, this may already be the default.
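
If you manage cassandra.yaml directly rather than through a startup script, the relevant settings would look roughly like this (a sketch; the broadcast address stays on the pod IP, which the google-samples image derives from the Kubernetes downward API, so peers still advertise a routable address):

# cassandra.yaml (sketch)
listen_address: 127.0.0.1      # bind where the Envoy sidecar forwards decrypted traffic
broadcast_address: <pod IP>    # placeholder: typically injected via the downward API (e.g. a POD_IP env var)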

You’ll need to remove any ServiceEntry or VirtualService resources associated with the Cassandra deployment, as no additional routing entries or rules are necessary. Nothing external needs to communicate with the cluster: Cassandra is now inside the mesh, and traffic simply passes through to each node.

Since the Cassandra Service is headless (clusterIP: None), a DestinationRule does not need to be added. When there is no clusterIP assigned, Istio defines the load-balancing mode as PASSTHROUGH by default.
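
Purely for illustration, that implicit behavior corresponds roughly to the DestinationRule sketched below; you do not need to apply it, and note that if you ever do create a per-host DestinationRule while mesh-wide mTLS is on, you must also spell out the ISTIO_MUTUAL TLS mode:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: cassandra              # illustrative sketch only; not applied in this setup
  namespace: cassandra
spec:
  host: cassandra.cassandra.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL       # keeps client sidecars using mTLS if such a rule exists
    loadBalancer:
      simple: PASSTHROUGH      # what Istio defaults to for headless services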

If you are using Aspen Mesh, the global MeshPolicy has mTLS enabled by default, so no changes are necessary. You can verify this yourself:

$ kubectl edit meshpolicy default -o yaml
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
.
. #edited out
.
spec:
  peers:
  - mtls: {}
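
If you’re on upstream Istio and mesh-wide mTLS isn’t already enabled, the usual approach (per the Istio 1.1 authentication docs) is a default MeshPolicy plus a mesh-wide DestinationRule, roughly as sketched here:

apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default
spec:
  peers:
  - mtls: {}
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: default
  namespace: istio-system
spec:
  host: "*.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL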

Finally, create a Cassandra namespace, enable automatic sidecar injection and deploy Cassandra.

$ kubectl create namespace cassandra
$ kubectl label namespace cassandra istio-injection=enabled
$ kubectl -n cassandra apply -f <Cassandra-manifest>.yaml
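
You can confirm that the injection label took effect (it shows up as an extra column on the namespace) with:

$ kubectl get namespace cassandra -L istio-injection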

Here is the output that shows the Cassandra nodes running with Istio sidecars.

$ kubectl get pods -n cassandra                                                                                   
NAME                     READY     STATUS    RESTARTS   AGE
cassandra-0              2/2       Running   0          22m
cassandra-1              2/2       Running   0          21m
cassandra-2              2/2       Running   0          20m
cqlsh-5d648594cb-86rq9   2/2       Running   0          2h
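
The 2/2 READY count shows that the injected istio-proxy sidecar is running alongside the cassandra container in each pod; you can confirm the container names (which should list both cassandra and istio-proxy) with:

$ kubectl -n cassandra get pod cassandra-0 -o jsonpath='{.spec.containers[*].name}'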

Here is the output validating mTLS is enabled.

$ istioctl authn tls-check cassandra.cassandra.svc.cluster.local

 
HOST:PORT            STATUS   SERVER   CLIENT   AUTHN POLICY   DESTINATION RULE
cassandra...:7000    OK       mTLS     mTLS     default/       default/istio-system

Here is the output validating that the Cassandra nodes are communicating with each other and have formed a healthy ring, with tokens distributed across the nodes.

$ kubectl exec -it -n cassandra cassandra-0 -c cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
UN  100.96.1.225  129.92 KiB  32      71.8%             f65e8c93-85d7-4b8b-ae82-66f26b36d5fd  Rack1-K8Demo
UN  100.96.3.51   157.68 KiB  32      55.4%             57679164-f95f-45f2-a0d6-856c62874620  Rack1-K8Demo
UN  100.96.4.59   142.07 KiB  32      72.8%             cc4d56c7-9931-4a9b-8d6a-d7db8c4ea67b  Rack1-K8Demo
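
As a final functional check, you can connect with cqlsh from another pod inside the mesh (using the cqlsh client pod from the listing above; your pod name will differ):

$ kubectl -n cassandra exec -it cqlsh-5d648594cb-86rq9 -- cqlsh cassandra-0.cassandra.cassandra.svc.cluster.local 9042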

If this is a solution that can make things easier in your environment, sign up for the free Aspen Mesh Beta.  It will guide you through an automated Istio installation, then you can install Cassandra using the manifest covered in this blog, which can be found here.   

4 thoughts on “Running Stateful Apps with Service Mesh: Kubernetes Cassandra with Istio mTLS Enabled”

  1. Thank you for the Awesome article!!

    Currently, I’m deploying the same https://gist.github.com/andrewjjenkins/491cf40e0e412c06c491356a22307a43 (with the last two deployments, i.e. cqlsh and cqlsh-noistio) in a GKE cluster with Istio 1.1.0; however, my nodetool status query is failing with the error below:

    kubectl exec -it -n cassandra cassandra-0 -c cassandra -- nodetool status

    nodetool: Failed to connect to '127.0.0.1:7199' - ConnectIOException: 'non-JRMP server at remote endpoint'.
    command terminated with exit code 1

    By the way, the readiness probe uses the same nodetool command, and therefore the readiness check is also failing. I tried adding the DestinationRule for the K8s API server as mentioned in the Istio documentation to bypass TLS, but it is not helping.

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: "api-server"
      labels:
        app: istio-security
        chart: security-1.0.6
        release: istio
        heritage: Tiller
    spec:
      host: "kubernetes.default.svc.cluster.local"
      trafficPolicy:
        tls:
          mode: DISABLE

    Any help will be much appreciated!!

    1. Neeraj,
      Great to see you are interested in the service mesh.

      For this particular architecture, we do not need a DestinationRule, as we enabled mTLS mesh-wide in the MeshPolicy. If mTLS were conflicting with Cassandra’s gossip heartbeat, it would cause the pod to fail, and you likely would not be able to issue a kubectl exec to the Cassandra node.

      It looks like you are successfully issuing a kubectl exec command to the cassandra-0 node. It appears that it is the nodetool command that is having issues, stating it is unable to connect to cassandra.

      One potential reason could be that there is not enough allocated memory and Cassandra is not running properly. Using GKE n1-standard-2 (2 vCPU, 7.5 GB memory) nodes or larger should be sufficient.

      Feel free to contact support@aspenmesh.io. We’ll dive in and take a look.

  2. I’ve been trying to replicate this setup for a small environment, and have been struggling with a few things. I’m not sure if there’s been changes to istio or perhaps some details I missed or were left out. How did you overcome these challenges:

    * first node in statefulset can’t talk to itself, since istio doesn’t allow a host to talk to itself? (https://github.com/istio/istio/issues/12551)
    * cassandra expects the actual src IP address to match, but istio proxies and causes the src IP to be “127.0.0.1”?

    1. Hi Andrew, we agree that both of your points are valid, but we have not seen this impact our Cassandra environments. We would love to know more about your configuration. Feel free to reach out directly. (support@aspenmesh.io or joe@aspenmesh.io)

      Regarding your first point:
      By doing a port-forward to a Cassandra node and looking at the Envoy config_dump, it appears that Envoy is assigning the node IPs as “inbound” rather than “outbound”. This is the crux of issue #12551 and will impact most clustered, stateful apps.

      In our environment though, the Cassandra nodes either do not try to connect to themselves or are smart enough to recognize the local node. The gossip is able to communicate and all three Cassandra pods successfully start. (See below.)

      cassandra-0 2/2 Running 1 91d 100.96.3.67
      cassandra-1 2/2 Running 2 91d 100.96.4.74
      cassandra-2 2/2 Running 3 91d 100.96.1.241

      $ kubectl -n cassandra port-forward cassandra-0 15000
      Then looking at the config_dump:
      .
      .
      .
      "dynamic_active_listeners": [
        {
          "version_info": "2019-05-13T23:04:14Z/94",
          "listener": {
            "name": "100.96.3.67_9042",
            "address": {
              "socket_address": {
                "address": "100.96.3.67",
                "port_value": 9042
              }
      .
      .
      .
        {
          "name": "envoy.tcp_proxy",
          "config": {
            "stat_prefix": "inbound|9042|tcp-client|cassandra.cassandra.svc.cluster.local",
            "cluster": "inbound|9042|tcp-client|cassandra.cassandra.svc.cluster.local",

      Regarding the second point, it is correct that the nodes identify themselves with the src IP while Envoy passes the connection through as 127.0.0.1, but again this did not seem to be a problem in our environment.

      $ kubectl -n cassandra exec -it cassandra-0 nodetool status
      Datacenter: DC1-K8Demo
      ======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
      UN  100.96.1.241  151.04 KiB  32      55.4%             57679164-f95f-45f2-a0d6-856c62874620  Rack1-K8Demo
      UN  100.96.3.67   158.03 KiB  32      72.8%             cc4d56c7-9931-4a9b-8d6a-d7db8c4ea67b  Rack1-K8Demo
      UN  100.96.4.74   126.22 KiB  32      71.8%             f65e8c93-85d7-4b8b-ae82-66f26b36d5fd  Rack1-K8Demo

      $ kubectl -n cassandra exec -ti cqlsh-5d648594cb-86rq9 bash
      # cqlsh 100.96.3.67 9042
      Connected to K8Demo at 100.96.3.67:9042.
      [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
      Use HELP for help.
      cqlsh>
      cqlsh> use system_schema;
      cqlsh:system_schema> select keyspace_name,table_name from tables where keyspace_name = 'system';

      keyspace_name | table_name
      ---------------+--------------------------
      system | IndexInfo
      system | available_ranges
      system | batches
      .
      .
      .
