Usage scenario
You have deployed workloads to your Kubernetes cluster. So far, you have only configured a fixed number of replicas for each workload. However, you want your workloads to scale automatically based on the current load. To achieve this, you want to explore the capabilities of Horizontal Pod Autoscalers (HPAs) in Kubernetes.
What are HorizontalPodAutoscalers?
With Deployments, you define a fixed replica count for the managed ReplicaSet. That number might be too small or too large for the current load. For example, if a Deployment runs three replicas and the load becomes very high, you might want to scale it up to five or more replicas. When the load drops, you might want to scale it down to two replicas. This is where HPAs come into play. With an HPA, you define the minimum and the maximum number of replicas for a Deployment, and the HPA automatically scales the Deployment up or down within that range based on the current load.
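A Deployment with a fixed replica count, as described above, might look like the following minimal sketch. The labels and the image reference are hypothetical placeholders. Only the name hello-kyma is taken from the HPA example later in this section.

```yaml
# Hypothetical manifest: a Deployment pinned to a fixed replica count.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-kyma
spec:
  replicas: 3                # fixed: stays at 3 regardless of load
  selector:
    matchLabels:
      app: hello-kyma        # hypothetical label
  template:
    metadata:
      labels:
        app: hello-kyma
    spec:
      containers:
        - name: hello-kyma
          image: example.com/hello-kyma:1.0   # hypothetical image reference
```

If an HPA later manages this Deployment, the Kubernetes documentation recommends removing spec.replicas from the manifest. Otherwise, every time you reapply the manifest, the Deployment is reset to the fixed count and the HPA has to scale it again.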
You configure this automatic scaling based on a metric, such as CPU or memory usage. The HPA periodically checks the metric's current value and compares it to a target value. If the current value is higher than the target value, the HPA scales up the Deployment; if it is lower, the HPA scales the Deployment down. For example, you can set a target average CPU utilization of 50% across the Deployment's Pods. If the current average CPU utilization is above 50%, the HPA adds Pods. If it is below 50%, the HPA removes Pods, but never goes below the configured minimum or above the configured maximum. Read more about metrics in Container resource metrics.
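The comparison between current and target value is proportional rather than a fixed step. The Kubernetes documentation describes the core calculation as follows. The worked lines use illustrative numbers only: 3 Pods averaging 75% CPU and 5 Pods averaging 20% CPU, each against a 50% target.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\text{Scale up: } \left\lceil 3 \times \frac{75}{50} \right\rceil = \lceil 4.5 \rceil = 5
\qquad
\text{Scale down: } \left\lceil 5 \times \frac{20}{50} \right\rceil = \lceil 2 \rceil = 2
\]
```

The result is then clamped to the configured minimum and maximum. With minReplicas: 3, as in the hello-kyma-hpa example, the scale-down result of 2 becomes 3. By default, the HPA also skips scaling when the ratio of current to target value is within a tolerance of 0.1 around 1.0.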
An HPA is called horizontal because it scales out: it changes the number of Pods by adding or removing them. This is in contrast to vertical scaling, where the resources assigned to each Pod are increased or decreased instead. For example, you scale a Pod vertically by raising the CPU and memory values in the resources section of its container definitions.
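For contrast, vertical scaling means changing values like the ones in the following container excerpt. This is a minimal sketch: the container name and all CPU and memory values are hypothetical.

```yaml
# Hypothetical excerpt of a Pod spec. Vertical scaling raises or lowers
# these per-container values; it does not change how many Pods run.
containers:
  - name: hello-kyma           # hypothetical container name
    resources:
      requests:
        cpu: 250m              # e.g. raised from 100m when scaling vertically
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```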
Defining a HorizontalPodAutoscaler
To define an HPA, you create a YAML file. The following YAML file defines an HPA that scales the Deployment named hello-kyma:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-kyma-hpa
spec:
  minReplicas: 3
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-kyma
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

The minReplicas and maxReplicas fields define the minimum and the maximum number of replicas for the Deployment. The scaleTargetRef field identifies the Deployment to scale, in this case hello-kyma. The metrics field lists the metrics used for scaling; here, the only metric is CPU usage. The target field defines the target value for the metric. In this case, averageUtilization: 50 means an average CPU utilization of 50% across all Pods, measured relative to the CPU requested by the Pods' containers. For this reason, the containers of the target Deployment must define a CPU request; otherwise, the HPA cannot compute the utilization and takes no action for this metric.
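Besides CPU, memory usage can also drive scaling. The following sketch extends the metrics field of hello-kyma-hpa with a memory target; the 70% value is an illustrative assumption, not a recommendation. As with CPU, a Utilization target for memory requires the containers to define a memory request. When several metrics are listed, the HPA computes a desired replica count for each metric and uses the largest.

```yaml
# Sketch: spec excerpt of hello-kyma-hpa with an added memory target.
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # illustrative value
```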
