Autoscaling Microservices in Kubernetes: Vertical Scaling
VPA Modes
In the previous example, we used the Auto mode.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stateless-flask-vpa
  namespace: stateless-flask
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: stateless-flask
  updatePolicy:
    # This is the mode we used
    updateMode: "Auto"
There are four modes available in the Vertical Pod Autoscaler (VPA). Each mode defines how VPA manages the resource requests and limits of your Pods.
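As a rough sketch, the mode is selected through the same updatePolicy.updateMode field used in the manifest above. The fragment below assumes the four modes are the ones listed in the upstream VPA documentation (Off, Initial, Recreate, and Auto); the one-line comments are brief summaries, not full definitions.

```yaml
# Fragment of a VerticalPodAutoscaler spec; only updateMode changes between modes.
spec:
  updatePolicy:
    # Assumed set of accepted values, per the upstream VPA documentation:
    #   "Off"      - compute recommendations only, never change Pods
    #   "Initial"  - apply recommendations only when a Pod is created
    #   "Recreate" - apply recommendations by evicting and recreating Pods
    #   "Auto"     - let the VPA choose how to apply recommendations
    updateMode: "Auto"
```

Switching modes is then a one-line change to the manifest, followed by re-applying it to the cluster.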
Auto
In Auto mode, the VPA automatically updates the requests and limits of Pods during both scale-up and scale-down events.
It adjusts resources in both directions without manual intervention:
- Scale-up: When the VPA detects that a Pod is consistently using more CPU or memory than requested, it increases the Pod’s resource requests and limits.
- Scale-down: When a Pod is underusing its allocated resources, the VPA can reduce its requests and limits. This frees up cluster resources for other workloads.
Auto mode is useful when you want resource requirements to adjust dynamically without manual tuning. In current VPA releases, a new recommendation is applied by evicting the Pod and letting its controller recreate it with the updated values, so this mode is best suited to stateless applications that run several replicas and handle restarts gracefully. Workloads that cannot tolerate any Pod restarts are a poor fit.
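Because Auto mode changes resources in both directions, it can help to bound what the VPA is allowed to set. The manifest below is a minimal sketch that extends the earlier example with a resourcePolicy section; the minAllowed and maxAllowed values are illustrative assumptions, not figures from the original example, and should be sized for your own workload.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stateless-flask-vpa
  namespace: stateless-flask
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: stateless-flask
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      # "*" applies this policy to every container in the Pod
      - containerName: "*"
        # Lower bound for scale-down (assumed, illustrative values)
        minAllowed:
          cpu: 100m
          memory: 128Mi
        # Upper bound for scale-up (assumed, illustrative values)
        maxAllowed:
          cpu: "1"
          memory: 1Gi
        # Adjust both requests and limits, which is the default behavior
        controlledValues: RequestsAndLimits
```

With bounds like these, the VPA should keep its recommendations inside the given range, which guards against a scale-down that starves the container or a scale-up that no node can satisfy.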
Initial
In Initial mode, the VPA updates the requests
