Overcome the limitations of Kubernetes’ Autoscaler (HPA) to achieve reliable, dynamic autoscaling

Making HPA not suck and live up to its potential


This paper will cover: 

  • How Kubernetes Horizontal Pod Autoscaler (HPA) works 
  • Biggest obstacles users face today using HPA 
  • How custom metrics adapters are used to try to overcome limitations 
  • An approach that optimizes HPA’s functionality to make it possible to dynamically autoscale reliably and without expensive overprovisioning 

Common scaling problem with Kubernetes:
A retailer with a large e-commerce site struggles with huge peaks in demand followed by big drops. These can be caused by a number of business events, such as a large sale, new product introduction, or times like weekends and evenings. Predicting these peaks and valleys is difficult, since they are usually business driven, not driven by technology.  

Kubernetes’ Horizontal Pod Autoscaler (HPA) isn’t handling these swings in traffic the way they hoped — it often fails to scale up the correct number of replicas fast enough, causing performance degradation. They need a solution that can learn the patterns of their e-commerce site and have capacity ready for any traffic spike in real time. Their goal is a highly responsive dynamic scaling solution that meets demand and gives them peace of mind that their application is always performant. 

About Kubernetes Horizontal Pod Autoscaler (HPA)

Kubernetes can autoscale your applications based on metric thresholds such as CPU and memory utilization using the Kubernetes Horizontal Pod Autoscaler (HPA) resource. HPA plays a critical role in ensuring that your application can always handle current demand, meet your SLOs, and optimize the amount of resources it uses.  

Note that adding or removing replicas is known as horizontal scaling; adding or removing resources (e.g., CPU or memory) to existing pods is referred to as vertical scaling. 

In practice, most people find that the default HPA falls short of their needs. To understand why this is, we must first understand how the HPA works. 

How HPA works

HPA is implemented as a periodic control loop. During each iteration, the HPA controller queries the Resource Metrics API for pod metrics, or a custom metrics API for custom metrics (more on this in a bit). The HPA controller collects the metrics for all pods that match the HPA’s selector specification and takes the mean of those metrics to determine how many replicas there should be. 

The formula to determine the number of replicas that should be running is pretty simple:

dR = ceil(cR × (cM / dM))

dR = desired replicas
cR = current replicas
cM = current metric value
dM = desired metric value

The number of replicas of a pod is then adjusted to the desired replicas value. The HPA controller adds or removes replicas incrementally to avoid peaks or dips in performance, and to lessen the impact of fluctuating metrics. 
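As a sketch, the calculation above can be expressed in a few lines of Python (variable names follow the formula’s definitions; this is an illustration, not the controller’s actual implementation):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """Compute the HPA's desired replica count: dR = ceil(cR * (cM / dM))."""
    return math.ceil(current_replicas * (current_metric / desired_metric))

# Example: 4 replicas averaging 90% CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, 90, 60))
```

Because of the ceiling, the calculation always rounds up, so a small overshoot in the metric is enough to trigger an extra replica.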

The HPA can also be configured to use a custom metrics API to provide custom metrics to the HPA controller. The HPA controller uses custom metrics as raw metrics, meaning no utilization is calculated.  A custom metrics API is useful for scaling on non-pod metrics, such as HTTP metrics. 
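For illustration, a minimal `autoscaling/v2` HPA scaling on a custom per-pod metric might look roughly like the following sketch (the metric name `http_requests_per_second` is a hypothetical metric that a custom metrics adapter would need to serve):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Pods              # a custom per-pod metric, not a resource metric
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue  # raw value averaged across pods; no utilization is calculated
          averageValue: "100"
```

Note the `AverageValue` target type: as described above, the custom metric is treated as a raw value rather than a utilization percentage.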

More details on how HPA works can be found in the Kubernetes documentation. 

The Challenges with HPA

HPA is a powerful construct in Kubernetes that promises to help deliver on the value of cloud-native applications by ensuring there are always the right number of replicas of a pod to meet an application’s SLOs and resource utilization requirements. However, HPA currently has several limitations, and in practice users aren’t realizing the value they hoped for from it.

Pod-level metrics don’t reflect performance of different containers 

By default, metrics are retrieved for a pod, not a container. Pods often have multiple containers, such as an API container plus a logging sidecar (this is almost always the case in Istio). Since the HPA controller gathers pod-level metrics, the results can be skewed, because performance characteristics usually differ significantly between containers within the same pod. 

Note that there is experimental support for container-level metrics; this feature is currently in alpha status.
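With that alpha feature enabled, a metric entry can target a single container rather than the whole pod. A sketch of the relevant HPA fragment, assuming a container named `app` running alongside a sidecar:

```yaml
metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app        # only this container's CPU usage is averaged,
      target:               # so the sidecar no longer skews the calculation
        type: Utilization
        averageUtilization: 80
```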

Threshold-based metrics don’t fit many cases

Autoscaling relies on some threshold value of a metric being crossed. For example, you could define an HPA that scales out or in when a threshold of 80% CPU utilization is crossed:
kubectl autoscale deployment my-app --cpu-percent=80 --min=3 --max=6 

In this example, the deployment ‘my-app’ will be scaled out, up to a maximum of 6 replicas, when CPU utilization exceeds 80%. On the surface, this seems like a good way to make sure that CPU utilization stays at an acceptable level during spikes in traffic. 

However, due to the way that autoscaling works, this often doesn’t address performance requirements acceptably. When the 80% CPU utilization threshold is crossed, Kubernetes will add replicas based on the HPA policy defined, wait for a period of time (default is 15 seconds) and remeasure the metric. If the threshold is still exceeded, Kubernetes will add more replicas, remeasure the metric, and so on, until the metric drops below the threshold, or the maximum number of replicas is reached. 
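The pace of these incremental steps can be tuned through the HPA’s `behavior` field. As a sketch, the following configuration allows the replica count to double every 15 seconds during scale-up while scaling down slowly and conservatively (the values here are illustrative, not recommendations):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 100          # allow adding up to 100% more replicas...
        periodSeconds: 15   # ...every 15 seconds
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 1            # then remove at most 1 pod per minute
        periodSeconds: 60
```

Even with aggressive scale-up policies, though, scaling remains reactive: nothing happens until the threshold has already been crossed.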

Often, increases in traffic that cause performance drops are sustained. This may be due to business events such as sales or promotions, onboarding new users, or a few other unanticipated causes. Due to the incremental additions of replicas by Kubernetes, by the time enough replicas are added to handle the sustained increase in load, it may be too late and significant degradation or outages have occurred. 

Autoscaling Algorithm is too simplistic to be reliable

The algorithm used to determine the desired number of replicas for a deployment is quite simple and can produce inaccurate calculations in many situations. Since the algorithm is a simple ratio of the current metric value to the desired metric value across target pods, several scenarios can skew the calculation: 

  • Pods that are not in a Ready state (Initializing, Failed, etc.) 
  • Pods that are missing metrics for some reason 
  • Multiple metrics with incongruent units (utilization vs raw metrics for example) 

This often leads to more replicas than needed, or even worse, not enough replicas, which can cause an outage. 

Custom Metrics Adapters require Ops implementation and still fall short

Kubernetes supports custom metrics adapters to supply custom metrics to the HPA controller, to extend HPA metrics beyond pod CPU and memory. These adapters must follow Kubernetes’ custom metrics API specification. By implementing a custom metrics adapter, you can define autoscaling rules for any arbitrary metric, such as HTTP throughput or custom metrics emitted by an application. A common custom metrics adapter is a Prometheus adapter, allowing you to write PromQL queries to fetch metrics from a Prometheus server. 
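As an illustration of the kind of configuration involved, a Prometheus adapter rule that exposes an HTTP request rate as a per-second custom metric might look roughly like this (the series name `http_requests_total` is an assumed application metric):

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"          # expose http_requests_total...
      as: "${1}_per_second"            # ...as http_requests_per_second
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Every such rule is something your team has to write, validate, and maintain alongside the adapter itself.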

Using custom metrics adapters is a powerful way to define autoscaling rules for metrics that are meaningful to meeting application SLOs. But they also introduce a lot of operational overhead. In order to capture meaningful metrics, you’ll probably need to build your own adapter. This requires your DevOps or Platform Engineering team to develop, test, certify, operate, patch, and then support a critical piece of infrastructure for applications. After all this additional work, the fact remains that your custom metrics adapters are still threshold-based, creating the same challenges as the built-in Kubernetes metrics adapter. 

Hard to Define Meaningful SLIs as Metrics

It can be difficult to define what metrics, either built-in pod metrics or custom metrics, truly indicate when your application needs to scale out or in. The solution is a complex combination of SLIs, container metrics, supporting infrastructure like network and storage, as well as external dependencies like database or queue connections. And unless your application has been running in production for a while, any starting point for these indicators is an educated guess at best. 

Most developers understand behavioral indicators that an application needs to be scaled, like “database queries become slow during certain times of the day” or “my application uses more memory than I expected when the number of requests grows.” But they probably can’t translate these behaviors into actionable metrics and indicators that can be encoded into YAML. Often, developers take a blanket approach to ensure they maintain their SLOs: they define very conservative thresholds in the hope that it covers them no matter what may happen. This inevitably leads to over-provisioned, under-utilized platform resources, essentially dismissing the power inherent in a cloud platform to dynamically optimize app performance.

Aspen Mesh Optimizes Predictive Scaling with HPA

At Aspen Mesh, we believe the Horizontal Pod Autoscaler is a powerful and necessary feature of a cloud-native platform. Unfortunately, the challenges of HPA often outweigh the value, and HPA users frequently choose to overprovision applications rather than rely on HPA to help meet SLOs. At Aspen Mesh we are making it possible to take full advantage of HPA’s capabilities. We are using the HPA API to give you what you need to turn on HPA and ensure it’s optimized, without locking you into a proprietary solution.  

To summarize, the shortcomings of HPA are centered around three areas:  

  1. Threshold-based, reactive scale events 
  2. The lack of knowledge of when to scale
  3. Primitive metrics used to target autoscaling 

To address these challenges and to make HPA the valuable component of a cloud-native platform that it promises to be, Aspen Mesh is introducing Predictive Scaling for Kubernetes and Istio. Predictive Scaling leverages the HPA API, using machine learning and the rich telemetry data available from your applications to model their behavior and give insight into when applications should be scaled out or in. 

Aspen Mesh’s Predictive Scaling addresses each of the critical shortcomings of HPA 

  1. Uses ML models to learn and predict your applications’ behaviors over time, obviating thresholds and reactive scaling. 
  2. Gives you insight into your applications’ behaviors over time so that you don’t have to take a guess at a starting point for thresholds and dial them in over time. 
  3. Leverages all telemetry data available for your application, including HTTP metrics, application metrics, Pod and container metrics, and platform metrics, to create a 360º view of your application. No more relying on one or two unrelated metrics. Over time we will add trace and log data to our models. 

Request Early Access now to get full access to the Aspen App Intelligence Platform and be the first to try Aspen Mesh’s Predictive Scaling and other solutions as we release them. Getting started takes just a few minutes, and you’ll get new insight into your app’s behavior on Day One. And don’t hesitate to reach out to have a conversation with us anytime. 

Kubernetes and Istio are powerful but hard to handle: This is where Aspen App Intelligence Platform comes in



You have Kubernetes and Istio, and you understand the power of these platforms. Kubernetes allows you to operate your applications at cloud scale, and Istio gives your applications rich traffic management and security capabilities. There are still a lot of things that Kubernetes and Istio don’t do well, leaving you to come up with workarounds on your own. 

At Aspen Mesh we have talked to many companies that are far along in their cloud-native journey. As we listen to these early cloud-native adopters, the discussion inevitably turns to how Kubernetes and Istio are core to their platform yet hard for their development teams to access.

These open source technologies are not turnkey solutions, and DevOps leaders talk about needing to build their own solution to access Istio data and functionality because the tools available are hard to implement.

Today it’s virtually impossible for app owners to take advantage of Istio to meet their SLOs without becoming Istio experts.  

As we hear about what’s keeping modern app teams from achieving their goals, the same problems come up over and over. Below, we share how we are tackling these big problems today.

The Biggest Kubernetes and Istio Problems for High-Performing Teams

Autoscaling in Kubernetes isn’t very useful.

You can only scale on container-based thresholds like CPU and memory usage, or build your own custom metrics adapter. The scaling algorithm in Kubernetes isn’t a good fit for production workloads: by the time Kubernetes scales to enough replicas to handle the load, it is often too late, leaving you with degraded performance or an outage. 

Using Istio for continuous delivery sounds great — but is risky in practice.

Using Istio to perform canary-style and progressive rollouts is a powerful way to make continuous delivery of your applications possible. However, with Istio, the decision to move traffic between versions is based on basic metrics – like error rates and latency.

Today there are many continuous delivery tools that support canary deployments, progressive delivery and A/B testing by leveraging the routing functionality of your service mesh. However, these tools only monitor HTTP indicators to determine if another instance of your new version should be deployed. They don’t consider real behavior and usage of your application. Istio and your CD tool of choice aren’t able to make migration decisions based on meaningful data, like your application’s behavior.

Invaluable Kubernetes and Istio data and functionality is not accessible to your App Dev Team.

Software development has changed dramatically in the last several years. New development methods and programs are necessary to realize the value of cloud. The move to a modern app environment introduces a whole host of challenges and complexity that were not present before.

Developers increasingly own the whole lifecycle of their application; their responsibilities extend beyond what was traditionally expected of them. Their ability to see what’s really happening that is impacting an application’s behavior is limited. Application owners’ error budgets and SLOs aren’t being met. All of this means growing frustration for cloud operators and developers who are unable to manage their workloads at scale and meet their SLOs.

You’re not confident you know what’s going on in all your clusters.

Most people have multiple Kubernetes clusters with Istio configured in a multi-cluster topology. However, it is difficult to manage multiple clusters, see what traffic is going in and out of each of them, and replicate configuration across them. Trying to correlate data between multiple tabs in your APM tool is tedious and error-prone.

The Aspen App Intelligence Platform

We understand the challenges and complexities of cloud-native transformation.  

Aspen Mesh has years of experience helping customers adopt, operationalize, and support Istio. Our heritage is shaping early service mesh code and pioneering mesh implementation in large enterprises. We have helped one of the world’s largest, most complex microservices environments achieve unprecedented app performance.  

We believe operationalizing Kubernetes and Istio is crucial for cloud-native transformation – and few understand the intricacies of these technologies better than us. The Aspen App Intelligence Platform is a cloud-based solution we have created to allow app teams to harness the power and data of Kubernetes, service mesh, and our machine learning models – making it possible to unlock new capabilities for app owners. Our goal is to enable developers to move faster and safer to accelerate your cloud-native journey. 

If you’re part of a high-performing team that’s driving toward flawless app performance at scale and you want more from your cloud native environment, we want to hear from you. We are working on big problems in new ways – from predictive autoscaling and continuous rollouts to empowering dev teams with the ability to build SLO assurance into their apps.  

You can join our Early Access user program to receive an account at no charge for a year, and full access to all of our solutions. This is an opportunity to share what you’re striving to achieve and work with us to build incredible solutions.

How we are shaping our Predictive Autoscaling Solution

Kubernetes autoscaling, in the form of Horizontal Pod Autoscalers, provides the ability for Kubernetes to add or remove replicas to your deployment if certain criteria are met. These criteria are specified as thresholds for simple container metrics like memory and CPU utilization. However, there are many other indicators of how your application should be scaled beyond memory and CPU, for example HTTP or application-emitted metrics. 

Kubernetes HPA incrementally adds replicas. After each replica is added, Kubernetes waits for a period of time, re-measures the threshold, and adds another replica if the threshold is still crossed. It continues this process until the metric falls below the threshold. Often, spikes in traffic or utilization are immediate and sustained; the Kubernetes method of autoscaling often doesn’t add enough replicas in time to address the spike. 

Due to these limitations, many people choose to overprovision their clusters and workloads because they realize Kubernetes HPA doesn’t work very well. Our Predictive Autoscaling solution addresses these issues and makes autoscaling an essential part of your applications’ lifecycle.

Using machine learning models, our platform learns your application’s behavior over time. When Predictive Autoscaling predicts that a scaling event is approaching, it will either alert you to the recommended number of replicas or automatically scale your deployment to the right number of replicas.

Our Progressive Rollout Solution eliminates manual testing. 

Continuous Delivery leverages the power of Kubernetes and your service mesh to incrementally add new versions of your application without any downtime. This helps deliver on the CD promise of quickly deploying new versions or patches at any time — rather than scheduling maintenance windows and ‘batching’ updates – without any interruption in performance.  

Today the lack of insight into what optimal app behavior looks like and what other factors drive behavior prevents application owners from feeling confident the new version is fit to be fully deployed. You must resort to endless manual testing and intervention to verify the new version is ready. Fear keeps teams from fully embracing a continuous delivery strategy — and squanders a key cloud-native platform value proposition. 

Aspen Mesh’s intelligent Progressive Rollout solution learns the behavior of your application through machine learning models. With this knowledge, it provides feedback to your CD tool about your application’s behavior: is the new version behaving correctly and should the deployment be promoted, or should it be rolled back? You now have more confidence in your application’s worthiness to be deployed without manual intervention, and you are taking full advantage of how a cloud-native platform enables seamless continuous delivery. 

App owners can access Kubernetes and Istio with no learning curve.

The Aspen App Intelligence Platform includes 360º App Performance Insight for App Owners, which gives app owners an unprecedented view of their applications. Traditional APM dashboards are built by and optimized for infrastructure owners, not application owners. Today an app owner is forced to switch between multiple browser tabs to find meaningful data, and then must somehow correlate that data. Aspen Mesh puts the information you need to manage your applications and SLOs at your fingertips, instead of a bunch of line charts and dashboards that don’t offer meaningful, actionable information. 

Aspen Mesh also makes recommendations to optimize your workloads. You shouldn’t have to wait until performance degrades and SLOs fail to tune your applications for optimal performance. Using telemetry data from Kubernetes and your service mesh, Aspen Mesh proactively gives recommendations for tuning before there is an issue. 

Manage multiple clusters from one clear data view.

Often people have multiple Kubernetes clusters for their workloads and use Istio’s multi-cluster topologies to form a single logical service mesh across clusters. This is a very powerful technique that can address many needs, such as failover and high availability, locality-based load balancing, and cloud cost optimization. 

Regardless of the reasons for having a multi-cluster configuration, it introduces a whole set of new challenges and complexity, like managing SLOs across clusters, visualizing and verifying correct traffic flow between clusters, and managing configuration parity between clusters. And as more clusters are added these challenges and complexities grow exponentially. 

Our multi-cluster management solution gives a 360º view of your applications across all your clusters. You have the ability to visualize, manage, and audit your workloads and data wherever they are running without having to jump between browser tabs and command line tools. Within a single view, you can visualize and validate traffic between clusters, to which cluster traffic is being sent, performance characteristics across clusters, and much more. 

What’s next? Let’s have a conversation: you can email me or schedule a time to talk.

I recommend you request access to our Early Access program; there’s no obligation, and you get full access to the Aspen App Intelligence Platform for a year at no charge for your entire team. Learn more about Early Access and the 360° Performance Insight for App Owners that is available on Day One. Better yet, let’s meet and we’ll share what we are working on with you – and you can tell us what you want to achieve. Contact Us.