Usage scenario
You have deployed some workloads to your Kubernetes cluster. So far, you have only configured a fixed number of replicas for each workload. However, you want your workloads to scale automatically based on the current load. To do this, you want to explore the capabilities of Horizontal Pod Autoscalers (HPA) in Kubernetes.
What are Horizontal Pod Autoscalers (HPA)?
With Deployments, you can define a fixed replica count for the managed ReplicaSet. This number might be too small or too large for the current load. For example, if you have a Deployment with three replicas and the current load is very high, you should scale up the Deployment to five or more replicas. If the load is low, you could scale down the Deployment to two replicas. This is where Horizontal Pod Autoscalers (HPA) come into play. With a Horizontal Pod Autoscaler, you define the minimum and the maximum number of replicas for a Deployment. The HPA then automatically scales the Deployment up or down within these bounds based on the current load.
This automatic scaling has to be configured. The scaling is based on a metric, such as CPU or memory usage. The HPA periodically checks the metric's current value and compares it to a target value. If the current value is higher than the target value, the HPA scales up the Deployment. If it is lower, the HPA scales down the Deployment. For example, you can define a target CPU utilization of 50%. Utilization is measured relative to the CPU requests of the Pods and averaged across all Pods of the Deployment, so the Pods must declare CPU requests. If the average CPU utilization is higher than 50%, the HPA adds replicas. If it is below 50%, the HPA removes replicas. Read more about metrics in the official documentation.
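The comparison between the current and the target value can be sketched with the replica formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The following Python snippet is only an illustration of that formula. The function name is hypothetical, and the tolerance of 0.1 is assumed to be the cluster default. The real controller also accounts for Pods that are not ready, missing metrics, and stabilization windows, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Return the replica count suggested by the documented HPA formula.

    Simplified sketch: ignores readiness, missing metrics, and
    stabilization windows. The 0.1 tolerance is an assumed default.
    """
    ratio = current_value / target_value
    # If the ratio is close enough to 1.0, the replica count stays unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Three replicas at 80% average CPU utilization, target 50%: scale up.
print(desired_replicas(3, 80, 50))  # 5
# Four replicas at 20% average CPU utilization, target 50%: scale down.
print(desired_replicas(4, 20, 50))  # 2
# 52% is within the tolerance around 50%: no change.
print(desired_replicas(3, 52, 50))  # 3
```

The result of this calculation is still subject to the minimum and maximum replica counts configured in the HPA.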
A Horizontal Pod Autoscaler is called horizontal because it scales the workload by adding or removing Pods. This is in contrast to vertical scaling, where the resources of each Pod are increased or decreased. For example, you can give a Pod more CPU and memory by raising the requests and limits in the resources section of the Pod definition.
Defining a Horizontal Pod Autoscaler (HPA)
To define a Horizontal Pod Autoscaler, you create a YAML manifest. The following manifest defines a Horizontal Pod Autoscaler named hello-kyma-hpa that scales a Deployment named hello-kyma:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-kyma-hpa
spec:
  minReplicas: 3
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-kyma
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
The minReplicas and maxReplicas fields define the minimum and the maximum number of replicas for the Deployment. The HPA never scales below three or above five replicas here, regardless of the load. The scaleTargetRef field identifies the Deployment that should be scaled. The metrics field defines the metric that is used for scaling. In this case, the metric is CPU usage. The target field defines the target value for the metric. In this case, the target is an average CPU utilization of 50% across all Pods, measured relative to their CPU requests.
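To make the interplay of these fields concrete, the following Python sketch applies the values from the manifest (minReplicas 3, maxReplicas 5, averageUtilization 50) to some invented CPU readings. The per-Pod CPU request of 200 millicores, the usage numbers, and the function names are assumptions made for illustration only. The sketch also leaves out the controller's tolerance and stabilization behavior.

```python
import math

# Values taken from the hello-kyma-hpa manifest.
MIN_REPLICAS = 3
MAX_REPLICAS = 5
TARGET_UTILIZATION = 50  # percent of the CPU request


def average_utilization(usage_millicores, request_millicores):
    """Average CPU usage across Pods as a percentage of the CPU request."""
    total_request = request_millicores * len(usage_millicores)
    return 100 * sum(usage_millicores) / total_request


def bounded_replicas(usage_millicores, request_millicores):
    """Replica count after applying the target and the min/max bounds."""
    current = len(usage_millicores)
    utilization = average_utilization(usage_millicores, request_millicores)
    wanted = math.ceil(current * utilization / TARGET_UTILIZATION)
    # The HPA never leaves the configured [minReplicas, maxReplicas] range.
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))


# Hypothetical readings: three Pods, each requesting 200m CPU.
print(bounded_replicas([180, 190, 170], 200))  # 90% average: capped at 5
print(bounded_replicas([120, 130, 140], 200))  # 65% average: 4
print(bounded_replicas([20, 30, 10], 200))     # 10% average: held at 3
```

In the first case the formula alone would ask for six replicas, but maxReplicas caps the result at five. In the last case it would ask for one replica, but minReplicas keeps three running.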