Optimizing Microservices in EKS with Horizontal and Vertical Pod Autoscalers: 2024

Microservices architecture has become a preferred approach for building scalable and maintainable applications. When running microservices on Amazon Elastic Kubernetes Service (EKS), optimizing performance and resource usage is crucial.

Kubernetes provides two autoscaling mechanisms that help achieve this: the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA).

In this blog, we will explore how to optimize microservices in EKS using both HPA and VPA, with examples.

Understanding Autoscalers

Horizontal Pod Autoscaler (HPA)

The HPA scales the number of pods in a deployment based on observed CPU, memory, or custom metrics.

It is ideal for scaling horizontally, i.e., increasing the number of running instances (pods) of a microservice when demand increases.

Vertical Pod Autoscaler (VPA)

The VPA adjusts the CPU and memory requests and limits for a pod. It is useful for scaling vertically by adjusting the amount of resources allocated to each pod, ensuring that each pod has enough resources to handle its workload efficiently.

Why Use Both HPA & VPA?

While HPA increases the number of pods to manage load spikes, VPA ensures each pod has the optimal resource configuration. Used together, HPA handles increased requests by adding more pods, and VPA handles varying resource needs by adjusting the resource requests and limits of existing pods.

Using both ensures that we are neither over-provisioning resources (wasting cost) nor under-provisioning (leading to performance bottlenecks).

Setting Up EKS with HPA and VPA

Create an EKS Cluster

To create a new EKS cluster, follow the steps in the official Amazon EKS documentation.
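As one option, a cluster can be created with eksctl from a config file. A minimal sketch follows; the cluster name, region, instance type, and node count are illustrative assumptions, not recommendations:

```yaml
# Illustrative eksctl config -- name, region, and node sizes are assumptions.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-microservices-cluster   # hypothetical cluster name
  region: us-east-1                # choose your region
managedNodeGroups:
  - name: default-workers
    instanceType: t3.medium
    desiredCapacity: 3
```

Saved as cluster.yaml, this would be applied with `eksctl create cluster -f cluster.yaml`.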

Deploy Microservices

Deploy your microservices using Kubernetes manifests. For example, here's a sample deployment file for a simple microservice:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-microservice
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-microservice
  template:
    metadata:
      labels:
        app: my-microservice
    spec:
      containers:
      - name: my-microservice
        image: my-microservice:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1"

Configure Horizontal Pod Autoscaler (HPA)

To enable horizontal scaling, configure the HPA to automatically adjust the number of pods based on CPU utilization. Here's an example HPA configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-microservice-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-microservice
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

In the above configuration:

  • minReplicas: The minimum number of pods to run.
  • maxReplicas: The maximum number of pods to scale to.
  • averageUtilization: The average CPU utilization threshold for scaling. Here, it is set to 60%, meaning that if CPU utilization goes above 60%, new pods will be added.
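The autoscaling/v2 API also supports an optional behavior section for tuning how quickly the HPA reacts. As a hedged sketch, the fragment below (added to the HPA spec above) slows scale-down; the 300-second window is an illustrative value, not a recommendation:

```yaml
# Illustrative addition to the HPA spec -- the window value is an assumption.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low usage before removing pods
```

A stabilization window like this helps avoid "flapping", where pods are removed and re-added in quick succession during bursty traffic.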

Configure Vertical Pod Autoscaler (VPA)

Next, configure VPA to adjust CPU and memory requests dynamically. Here's an example VPA configuration:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-microservice-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-microservice
  updatePolicy:
    updateMode: "Auto"

With updateMode set to "Auto", VPA automatically adjusts the CPU and memory requests for your microservice pods based on their actual usage.
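To keep VPA's recommendations within safe bounds, an optional resourcePolicy can be added to the VPA spec. A sketch follows; the min/max bounds are illustrative assumptions that you would tune for your own workload:

```yaml
# Illustrative resourcePolicy for the VPA above -- the bounds are assumptions.
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: my-microservice
      minAllowed:
        cpu: "250m"
        memory: "256Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
```

This prevents VPA from shrinking requests below what the service needs to start, or growing them beyond what a single node can reasonably host.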

Monitoring and Adjustments

Once HPA and VPA are in place, monitor the application's performance. Tools like Amazon CloudWatch and Prometheus can help track resource usage and ensure that autoscaling is working as expected.

Example Use Case

Suppose you have an e-commerce microservice handling order processing. On days like a bank holiday, traffic surges and strains the system.

With HPA enabled, EKS automatically adds more pods to handle the surge. At the same time, VPA adjusts CPU and memory requests for each pod as they receive varying loads.

During normal traffic, you might run 3 pods, each using about 500m CPU. During peak traffic, the HPA scales to 8 pods while the VPA dynamically increases memory allocation to ensure smooth processing.

Best Practices

  1. Set Realistic Metrics: Use accurate metrics to ensure autoscalers react appropriately. Use the Kubernetes Metrics Server or custom metrics to get insights into your service's actual needs.
  2. Monitor Resource Usage: Continuously monitor your cluster to avoid over-provisioning, which could lead to higher costs.
  3. Simulate Load: Before using autoscalers in production, simulate heavy load conditions to ensure HPA and VPA respond adequately.
  4. Resource Requests and Limits: Carefully set resource requests and limits in the deployment spec to guide the autoscalers effectively.

Conclusion

Optimizing microservices in Amazon EKS using the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) ensures that your applications can scale efficiently, handling varying loads while controlling costs. By leveraging HPA and VPA together, you can optimize both pod scaling and resource allocation, leading to a more resilient and cost-effective microservices architecture.

FAQs

Q: What is the Difference Between Horizontal & Vertical Pod Autoscalers?

Ans:

  • Horizontal Pod Autoscaler (HPA) scales the number of pods based on resource usage or custom metrics. It increases or decreases the number of pod replicas.
  • Vertical Pod Autoscaler (VPA) adjusts the resource requests (CPU and memory) for pods, ensuring that each pod has sufficient resources for the workload. It scales by adjusting resources within a pod rather than increasing the number of pods.

Q: Can We Use HPA and VPA Together in the Same Deployment?

Ans: Yes, HPA and VPA can be used together in the same deployment. HPA adjusts the number of pods based on demand, while VPA adjusts the resource requests of each pod. However, careful configuration is needed to ensure they don't conflict.
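One common way to reduce that conflict is to let HPA own the CPU signal and restrict VPA to memory via controlledResources. The fragment below is a sketch of how that might look in the VPA spec, assuming a container named my-microservice:

```yaml
# Illustrative: VPA manages memory only, leaving the CPU signal to the HPA.
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: my-microservice
      controlledResources: ["memory"]
```

With this split, the HPA's CPU-utilization target and the VPA's recommendations no longer tug on the same metric.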

Q: When Should We Use Only HPA?

Ans: Use only HPA when our application needs to handle sudden increases in requests or traffic and scaling the number of instances (pods) is the most efficient way to handle the load. HPA is most effective when our microservice can be distributed across multiple pods without performance bottlenecks.

Q: When Should We Use Only VPA?

Ans: Use only VPA when the workload within our pods fluctuates and can't be handled simply by adding more pods. If our microservice frequently uses more CPU or memory than expected, VPA adjusts the resource requests to prevent performance degradation or out-of-memory errors.

Q: Does VPA Kill Running Pods to Update Resource Requests?

Ans: Yes, when VPA adjusts the resource requests, it may kill and recreate pods with new configurations. To control this disruption, we can configure the VPA update mode:

  • Auto: Automatically applies recommendations by restarting pods.
  • Initial: Adjusts resources only when a pod is first created.
  • Off: VPA makes recommendations but does not automatically apply them.

Q: Can We Scale Down the Pods With HPA When Idle?

Ans: Yes, HPA can scale down to the minReplicas value specified in the configuration when resource usage is low or idle. For example, if minReplicas: 1 is set, it can reduce the deployment to a single pod under minimal load conditions.
