Master Kubernetes resource management with Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Learn when to use each and how to configure them for optimal performance.

Kubernetes Autoscaling: HPA vs VPA vs Cluster Autoscaler

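Kubernetes offers three autoscaling mechanisms to optimize resource utilization and cost: HPA, which is built into the core API, plus VPA and Cluster Autoscaler, which are usually installed as add-ons or enabled through your cloud provider. Understanding when and how to use each is crucial for running efficient cloud-native applications.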

Understanding the Three Autoscalers

Horizontal Pod Autoscaler (HPA)

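HPA scales the number of pod replicas based on observed metrics such as CPU, memory, or custom metrics. Resource metrics come from the Metrics API (typically served by metrics-server), and a Utilization target is measured as a percentage of each container's resource request, so the target pods must declare requests.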

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
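
To make the targets above concrete, here is the replica calculation described in the Kubernetes HPA documentation, followed by a worked example. The starting point of 4 replicas at 90% average CPU is an assumed scenario for illustration, not a measurement.

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% Worked example against the 70% CPU target:
% 4 replicas averaging 90% CPU utilization
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
```

When several metrics are listed, as with CPU and memory here, HPA computes a desired replica count for each metric and uses the largest one.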

Use Cases:

Stateless applications
When you need to scale based on traffic
Applications that can handle multiple instances

Vertical Pod Autoscaler (VPA)

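VPA adjusts the CPU and memory requests of a pod's containers (scaling limits proportionally by default) without changing the replica count. It is installed separately from core Kubernetes, and in "Auto" mode it typically applies a new recommendation by evicting and recreating pods.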

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

Use Cases:

Stateful applications
When pod count is fixed
Optimizing resource allocation
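
Because applying a recommendation usually means restarting pods, a cautious first step is to run VPA in recommendation-only mode and review its suggestions before letting it act. The sketch below assumes the VPA components are already installed in the cluster; the object name is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-recommender   # hypothetical name for this example
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"             # compute recommendations only; pods are never evicted or resized
```

The recommendations appear in the object's status and can be read with `kubectl describe vpa web-app-vpa-recommender`.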

Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster: it adds nodes when pods cannot be scheduled for lack of resources and removes nodes that stay underutilized.

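# Fragment of the cluster-autoscaler Deployment: node bounds are flags, not ConfigMap keys
# (cluster-autoscaler-status is status output). Provider and group name are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=3:10:my-node-group  # min:max:node-group-name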

Use Cases:

Dynamic workloads
Cost optimization
Cloud environments (AWS, GCP, Azure)
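
When Cluster Autoscaler removes a node it has to evict the pods running there, and it respects PodDisruptionBudgets while doing so. A minimal sketch of a budget for the web-app workload follows; the object name, the `app: web-app` label, and the `minAvailable` value are assumptions for illustration.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb          # hypothetical name
spec:
  minAvailable: 2            # keep at least 2 pods running during voluntary evictions
  selector:
    matchLabels:
      app: web-app           # assumption: the web-app pods carry this label
```

A budget that is too strict (for example, `minAvailable` equal to the replica count) can prevent scale-down entirely, so leave room for at least one pod to move.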

Best Practices

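Don't let HPA and VPA act on the same metric - If both scale on CPU or memory they conflict; pair VPA with an HPA that uses custom or external metrics instead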
Start with HPA - It's simpler and works for most use cases
Monitor metrics - Use Prometheus and Grafana
Set appropriate thresholds - Avoid thrashing
Test scaling behavior - Load test your applications
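
One way to avoid thrashing is the `behavior` field of the `autoscaling/v2` HPA, which limits how fast replicas are added or removed. The sketch below reuses the web-app HPA from earlier; the window and policy values are illustrative assumptions, not tuned recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most 1 pod per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # react to load increases immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60               # at most double the replica count per minute
```

Scaling up quickly and down slowly is a common choice because a brief dip in traffic then does not discard capacity that is needed again moments later.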

Real-World Example

Here's a complete setup for a production web application:

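# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3            # initial count; the HPA manages replicas after creation
  selector:              # required field: must match the pod template labels
    matchLabels:
      app: api-server    # label chosen for this example
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: myapp/api:v1.0.0
        resources:
          requests:
            cpu: 200m    # HPA CPU utilization is measured against this request
            memory: 256Mi
          limits:
            cpu: 1000m
            memory: 512Mi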
---
# HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Monitoring and Alerts

Set up alerts for:

Scaling events
Resource exhaustion
Failed scaling attempts
Cost anomalies
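
Two of these alerts can be sketched as Prometheus alerting rules. This assumes Prometheus scrapes kube-state-metrics, which exposes the `kube_horizontalpodautoscaler_*` and `kube_pod_status_phase` metrics used below; the group name, alert names, durations, and severities are hypothetical choices.

```yaml
groups:
- name: autoscaling-alerts              # hypothetical group name
  rules:
  - alert: HPAMaxedOut
    # The HPA has been pinned at maxReplicas, so it cannot absorb more load.
    expr: |
      kube_horizontalpodautoscaler_status_current_replicas
        == kube_horizontalpodautoscaler_spec_max_replicas
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} has run at max replicas for 15 minutes"
  - alert: PodsPendingTooLong
    # Pods stuck in Pending often mean node capacity was not added in time.
    expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Pods in {{ $labels.namespace }} have been Pending for 10 minutes"
```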

Conclusion

Choosing the right autoscaling strategy depends on your application architecture, traffic patterns, and cost requirements. HPA is the most common starting point for pod-level scaling, VPA suits workloads whose replica count is fixed, and Cluster Autoscaler complements both by adjusting node capacity underneath them.

Production Notes

Before rolling out an autoscaling change, define pre-deploy checks, rollout gates, and rollback triggers. Track p95 latency, error rate, and cost per request for at least 24 hours after deployment. If any of them regresses from baseline, revert quickly and document the decision in the runbook.

Keep the operating model simple under pressure: one owner per change, one decision channel, and clear stop conditions. Review alert quality regularly to remove noise and ensure on-call engineers can distinguish urgent failures from routine variance.

Repeatability is the goal. Convert successful interventions into standard operating procedures and version them in the repository so future responders can execute the same flow without ambiguity.
