Master Kubernetes resource management with Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Learn when to use each and how to configure them for optimal performance.
Kubernetes provides three powerful autoscaling mechanisms to optimize resource utilization and costs. Understanding when and how to use each is crucial for running efficient cloud-native applications.
HPA scales the number of pod replicas based on observed metrics like CPU, memory, or custom metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
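When load is spiky, an HPA can add and remove replicas in quick succession. The autoscaling/v2 API accepts an optional behavior stanza to damp this. The sketch below extends the web-app-hpa example; the window and policy values are illustrative assumptions, not tuned recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      # Wait for 5 minutes of consistently lower recommendations before shrinking.
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1            # remove at most one pod...
          periodSeconds: 60   # ...per minute (assumed value)
    scaleUp:
      # React to load increases immediately.
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100          # at most double the replica count...
          periodSeconds: 60   # ...per minute (assumed value)
```

A scale-down window of 300 seconds matches the documented default; it is written out here only to make the asymmetry with scale-up visible.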
Use Cases:
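VPA adjusts CPU and memory requests/limits for pods without changing replica count. In "Auto" mode it typically applies new values by evicting and recreating pods, so avoid pairing it with an HPA that scales the same workload on CPU or memory utilization.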
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: web-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
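A lower-risk way to start with VPA is recommendation-only mode: the autoscaler computes suggested requests and publishes them in the object's status, but never evicts pods. The sketch below assumes the VPA components are installed in the cluster; the object name web-app-vpa-recommend is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-recommend   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    # "Off" means: calculate recommendations only, change nothing.
    updateMode: "Off"
```

The recommendations can then be read with kubectl describe vpa web-app-vpa-recommend and copied into the Deployment's resource requests by hand.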
Use Cases:
Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes.
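# Cluster Autoscaler is configured through flags on its own Deployment.
# The cluster-autoscaler-status ConfigMap in kube-system is written by the
# autoscaler to report its state; it is not a configuration input.
# Fragment of the autoscaler's container spec (placeholders in angle brackets):
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=<provider>
      - --nodes=3:10:<node-group-name>  # min:max:node group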
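When Cluster Autoscaler removes an underutilized node, it drains the pods on it and respects PodDisruptionBudgets while doing so. A minimal sketch of a budget for the web-app workload follows; the app: web-app label and the minAvailable value are assumptions, since the article does not show the Deployment's labels.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb        # hypothetical name
spec:
  # Keep at least two pods running during voluntary disruptions
  # such as a node drain triggered by scale-down.
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app         # assumed pod label
```

A budget that is too strict (for example, minAvailable equal to the replica count) can block scale-down entirely, so it should leave room for at least one pod to move.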
Use Cases:
Here's a complete setup for a production web application:
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3  # initial count; the HPA below manages it afterwards
  selector:    # required by apps/v1; must match the template labels
    matchLabels:
      app: api-server  # label name is illustrative
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: myapp/api:v1.0.0
          resources:
            requests:
              cpu: 200m  # HPA utilization is measured against requests
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 512Mi
---
# HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Set up alerts for:
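One condition that is commonly worth alerting on is an HPA that stays pinned at its maximum, since it can no longer absorb extra load. The sketch below assumes the Prometheus Operator (for the PrometheusRule resource) and kube-state-metrics (for the kube_horizontalpodautoscaler_* metrics) are installed; the rule name, duration, and severity are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-server-autoscaling-alerts   # hypothetical name
spec:
  groups:
    - name: autoscaling
      rules:
        - alert: HPAAtMaxReplicas
          # Fires when current replicas have reached maxReplicas.
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="api-server-hpa"}
              >= on (namespace, horizontalpodautoscaler)
            kube_horizontalpodautoscaler_spec_max_replicas{horizontalpodautoscaler="api-server-hpa"}
          for: 15m                      # assumed duration
          labels:
            severity: warning
          annotations:
            summary: api-server-hpa has been at maxReplicas for 15 minutes
```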
Choosing the right autoscaling strategy depends on your application architecture, traffic patterns, and cost requirements. HPA is the most common choice, but VPA and Cluster Autoscaler have their place in specific scenarios.
Before releasing an autoscaling change, define pre-deploy checks, rollout gates, and rollback triggers. Track p95 latency, error rate, and cost per request for at least 24 hours after deployment. If any of them regresses from baseline, revert and record the decision in the runbook.
Keep the operating model simple under pressure: one owner per change, one decision channel, and clear stop conditions. Review alert quality regularly to remove noise and ensure on-call engineers can distinguish urgent failures from routine variance.
Repeatability is the goal. Convert successful interventions into standard operating procedures and version them in the repository so future responders can execute the same flow without ambiguity.