Vertical Pod Autoscaling in Kubernetes

Horizontal Pod Autoscaling vs Vertical Pod Autoscaling

In our last blog on autoscaling, Horizontal Pod Autoscaling in Kubernetes with Prometheus, we started off by looking at horizontal autoscaling of Kubernetes pods and how we can allow HPAs to ingest metrics from Prometheus.

When additional capacity is needed, horizontal scaling gives us additional copies of the same computational unit. Instead of making a single unit handle more requests, the load per unit is reduced as requests are distributed across a larger set.

When one first thinks about what vertical autoscaling might mean, one might assume that a vertical pod autoscaler would be analogous to vertical scaling of a host machine or VM – in other words, increasing the amount of resource on that machine. If a VM has 4GB of memory and is using 3.8GB of it, then make an additional 2GB available to that machine.

This might make sense in, for example, the vSphere world, where we can set resource pools for VMs - but this model doesn't quite translate to the Kubernetes world.

After all, Kubernetes is only a scheduler that sits on top of a host. The Kubelet determines and reports the amount of resource a host has installed, and using these values and the reported resource required by running workloads, the scheduler determines if a workload can fit onto a node.

What form could vertical pod autoscaling take in theory?

Well, since the scheduler can only work effectively if workloads report requests and limits, setting requests that reflect real-life application usage allows the scheduler to guarantee that amount of resource, out of the pool available in the cluster, for that application on a specific node. This may prevent workloads from scheduling if there isn't a hole on any of the cluster nodes big enough to fit the application, but it also guarantees that existing workloads can run on the cluster without overwhelming available resource and bringing nodes down.
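
For example, a pod that declares explicit requests gives the scheduler the information it needs to reserve that slice of a node for it (the names and values below are purely illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: nginx:1.17
    resources:
      requests:
        # the scheduler will only place this pod on a node with this much spare capacity
        cpu: 250m
        memory: 256Mi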

While it's possible to have VPA and HPA target the same workload, because VPA works exclusively with CPU and memory resources, you shouldn't use those same metrics for horizontal scaling. Since the VPA and HPA controllers are not aware of each other (at the moment), both controllers may try to apply incompatible changes to the workload. VPA can be complementary to horizontal autoscaling, but you must assign non-computational metrics (for example custom or external metrics) to the HPA.
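
As a sketch of how the two can coexist, an HPA driven by a custom per-pod metric (here a hypothetical requests-per-second metric served through something like the Prometheus adapter) can sit alongside a VPA that owns CPU and memory; the metric name, target value and workload name are all assumptions:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # scale on a custom per-pod metric rather than CPU/memory, which the VPA owns
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "50"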

With that out of the way, let’s take a dive into how the VPA works.

VPA Modes

VPA has several operational modes, depending on how aggressively you would like pods to be updated with new request values:

Off – the VPA only calculates and publishes recommendations in its status; no changes are made to pods.
Initial – recommended requests are applied only when pods are created; running pods are never touched.
Recreate – recommended requests are applied at pod creation, and running pods are evicted and recreated when the recommendation drifts significantly from their current requests.
Auto – currently equivalent to Recreate; it is intended to take advantage of in-place updates once Kubernetes supports them.
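
For example, if you only want recommendations without any pods being touched, a minimal VPA in "Off" mode looks something like this (the target deployment name is made up):

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: recommend-only-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # "Off" still produces recommendations in the VPA status, but never evicts or mutates pods
    updateMode: "Off"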

VPA Components

Unlike the HPA controller, the components which realise vertical pod autoscaling aren’t installed in Kube by default, so we will need to install the components which make up the VPA architecture. There are three controllers that implement vertical pod autoscaling:

Recommender

The initial task required for autoscaling is to ingest metrics and determine the current usage of the workload. Based on current and past metrics for resources, the recommender will determine a “recommended” set of CPU and memory values for each container.

By default, the source of these metrics will be metrics-server. As metrics-server is designed to store metrics in memory, it only provides metrics for the last 10 minutes. To give the recommender a longer history of the service it is monitoring, you can also plug in historical metrics from a time-series database like Prometheus.
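
For reference, the upstream recommender is pointed at Prometheus through its command-line flags; the following is only a sketch of the relevant container args (flag names taken from the autoscaler project, so verify them against the VPA version you deploy):

containers:
- name: recommender
  image: k8s.gcr.io/vpa-recommender:0.5.0
  args:
  # assumed flags: bootstrap the recommender's history from Prometheus instead of checkpoints
  - --storage=prometheus
  - --prometheus-address=http://prometheus.monitoring.svc
  - --history-length=8d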

Updater

As detection of what the "correct" requests should be is delegated to the recommender, the updater controller compares the current and recommended requests of each deployment that has a corresponding vertical pod autoscaler object. If the update mode of the deployment's VPA object is set to Auto or Recreate, the updater facilitates the creation of new pods that contain the new recommended requests. It doesn't directly update pods with new values; instead, it instructs the Kube API that a particular pod should be evicted from the cluster, and relies on other controllers in the Kube control plane to create the replacement pod and make sure it has the new desired request values.
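
One easy way to observe the updater in action is to watch for the eviction events it emits (reason EvictedByVPA, which also shows up in the updater logs later in this post):

# list the eviction events emitted by the VPA updater in a given namespace
kubectl get events -n dev --field-selector reason=EvictedByVPA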

Admission Controller

If you've already brushed up on admission controllers and their purpose, you won't be surprised to learn that the VPA also includes a mutating admission webhook. If a VPA object's mode is set to Auto, Recreate or Initial, this webhook injects the current request values generated by the recommender at the time a pod is admitted to the cluster.
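
Once the admission controller is installed (covered below), you can confirm its self-registration by listing mutating webhook configurations; the object name shown here is the one the upstream controller registers, so treat it as an assumption and check your cluster:

kubectl get mutatingwebhookconfigurations
# assumed name registered by the upstream VPA admission controller
kubectl describe mutatingwebhookconfiguration vpa-webhook-config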

VPA Object

Let’s take a look at a VPA object:

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  annotations:
  name: test-vpa
  namespace: dev
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      maxAllowed:
        memory: 1Gi
      minAllowed:
        memory: 500Mi
  targetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: test
  updatePolicy:
    updateMode: Recreate

There are a couple of interesting things here.

Using resourcePolicy, we can put boundaries on how wildly the recommender can vary the CPU and memory resources for the pod by assigning the minimum and maximum values the recommender is allowed to set. How you set these boundaries depends on how many containers are running in a single pod.

If a pod only has a single container, then setting a wildcard value should be fine:

containerName: '*'

If you want to apply this to pods with sidecars, then, of course, you will need to set boundary values on a per-container basis, as sketched below. It doesn't make sense to apply a wildcard value, as this will apply the same memory boundaries to every single container - and I'm sure you don't need 500Mi-1Gi for every single container in your pod!
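
Here is a sketch of what per-container boundaries might look like for a pod running an application container plus a sidecar; the container names and values are made up:

resourcePolicy:
  containerPolicies:
  - containerName: app              # hypothetical main container
    minAllowed:
      memory: 500Mi
    maxAllowed:
      memory: 1Gi
  - containerName: logging-sidecar  # hypothetical sidecar with much smaller needs
    minAllowed:
      memory: 50Mi
    maxAllowed:
      memory: 128Mi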

If the recommender controller is able to ingest metrics, it will generate recommendations at roughly five-minute intervals and write them into the status block of the VPA object. Four different bounds are generated:

lowerBound – the minimum resources the container should have; running below this is likely to hurt performance.
target – the recommended request values the VPA will apply.
uncappedTarget – the recommendation before any resourcePolicy constraints are taken into account.
upperBound – the maximum resources the recommender considers sensible; anything above this is likely wasted.

For example:

status:
  conditions:
  - lastTransitionTime: 2019-06-12T14:29:00Z
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: test
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      uncappedTarget:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 644m
        memory: 1Gi

Installing

In order to use VPA, you should be running a version of Kube which supports mutating admission webhooks on the API server. This means a cluster that is at least version 1.9. In addition, mutating webhooks need to be enabled on the API server by including MutatingAdmissionWebhook as a value in the --admission-control flag. The order in which admission controllers are listed in this flag matters, so check the Using Admission Controllers documentation.
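
A quick sanity check that your API server exposes the admission registration API that mutating webhooks depend on:

kubectl api-versions | grep admissionregistration.k8s.io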

There are also a couple of VPA-specific requirements:

While VPA version 0.3 requires Kube version 1.9 or over, VPA versions 0.4 and 0.5 require a cluster that is on version 1.11 or over.

For the recommender to be able to ingest pod metrics, metrics-server also needs to be running on the cluster (and, as mentioned above, Prometheus can be plugged in to give the recommender a longer metric history).

Metrics-server isn’t typically installed on the cluster by default, but the easiest way to install it is using Helm:

helm install --namespace kube-system \
  --name metrics-server stable/metrics-server

Alternatively, Minikube includes it as a bundled add-on:

minikube addons enable metrics-server

You can confirm metrics-server is operating as expected when you are able to view current resource consumption using kubectl top:

kubectl top nodes

Which outputs:

NAME      CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
master0   411m         13%       980Mi           17%

You can then install the VPA controllers. The official VPA repo has a bash script you can run to install the Kube resources for each controller. Alternatively, you can clone the Helm chart that we've made available as part of this blog post:

git clone git@github.com:livewyer-ops/verticalpodautoscaler.git

The only thing you may need to configure in the Helm chart is the location of your Prometheus instance. This is applied using the prometheus.url value in values.yaml:

prometheus:
 url: http://prometheus.monitoring.svc

Now you can install the Helm chart:

helm install --namespace kube-system \
  --name vpa .

When you install the Helm chart, you will see pods for the three VPA controllers:

kubectl get pods -n kube-system

Which outputs:

NAME                                                 READY     STATUS    RESTARTS   AGE
autoscale-vpa-admissioncontroller-74d489d767-hnp9c   1/1       Running   0          26m
autoscale-vpa-recommender-5944df6c7f-4zht4           1/1       Running   0          26m
autoscale-vpa-updater-cd668b489-jqc6b                1/1       Running   0          26m
metrics-server-77fddcc57b-c2mzc                      1/1       Running   3          7d21h

There is a secret called vpa-tls-certs which is mounted into the admission controller and contains a cert bundle.

In the install script, this bundle is generated using a shell script, but in the Helm chart we have the luxury of the Sprig library, so this process is scripted using template functions:

{{- $altNames := list ( printf "%s.%s" (include "vpa.name" .) .Release.Namespace ) ( printf "%s.%s.svc" (include "vpa.name" .) .Release.Namespace ) -}}
{{- $ca := genCA "vpa-ca" 3650 -}}
{{- $server := genSignedCert ( include "vpa.name" . ) nil $altNames 3650 $ca -}}

The ca object generated by genCA will contain a CA certificate and key, so we can embed these in place in the Helm template:

caCert.pem: {{ b64enc $ca.Cert }}
caKey.pem: {{ b64enc $ca.Key }}

The server object will contain a certificate and key signed by the CA we just created, which we can also embed in the Helm template:

serverCert.pem: {{ b64enc $server.Cert }}
serverKey.pem: {{ b64enc $server.Key }}
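
If you want to double-check the generated bundle once the chart is installed, you can inspect the resulting secret directly (assuming the kube-system namespace used throughout this post):

kubectl describe secret vpa-tls-certs -n kube-system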

We can also confirm that the admission controller is registered as a mutating webhook by checking its log:

I0617 13:36:44.847929       7 v1beta1_fetcher.go:84] Initial VPA v1beta1 synced successfully
I0617 13:36:44.851884       7 config.go:62] client-ca-file=-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
I0617 13:36:54.877379       7 config.go:131] Self registration as MutatingWebhook succeeded.

How do I use Vertical Pod Autoscaling?

Once the three controllers are deployed, we are ready to start using VPA.

For this demonstration, we will use the modified php-apache container from the Horizontal Pod Autoscaler Walkthrough. If you’re not familiar with that example, this apache image is modified to create some additional computational load when its index.html is accessed.

First of all, deploy the php-apache image and a service for the php-apache pods. A VPA object is also deployed:

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa
  namespace: dev
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
---

apiVersion: v1
kind: Service
metadata:
  labels:
    run: php-apache
  name: php-apache
  namespace: dev
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: php-apache
  type: ClusterIP
---

apiVersion: apps/v1
kind: Deployment
metadata:
  generation: 1
  labels:
    run: php-apache
  name: php-apache
  namespace: dev
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      run: php-apache
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - image: gcr.io/google_containers/hpa-example
        imagePullPolicy: Always
        name: php-apache
        resources:
          requests:
            cpu: 1m

This should generate a deployment, service, and VPA object for you:

verticalpodautoscaler.autoscaling.k8s.io/php-apache-vpa created
service/php-apache created
deployment.apps/php-apache created
kubectl get pods -n dev
NAME                              READY     STATUS    RESTARTS   AGE
php-apache-59759c4b98-rczhn       1/1       Running   0          37s
php-apache-59759c4b98-swvcv       1/1       Running   0          37s

In order to use VPA, it seems to be a requirement that a targeted workload run at least two replicas. In my testing, the updater was unable to evict in cases where there was only a single replica deployed, as indicated by this log:

I0617 13:39:48.819919       6 pods_eviction_restriction.go:209] too few replicas for ReplicaSet dev/php-apache-7bfdf49c69. Found 1 live pods
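
This threshold lives in the updater rather than in the VPA object; if you are happy for single-replica workloads to be evicted, the upstream updater exposes a flag for it. The snippet below is only a sketch of the updater container args, so verify the flag against your VPA version:

containers:
- name: updater
  image: k8s.gcr.io/vpa-updater:0.5.0
  args:
  # assumed flag: allow eviction even when only one replica is running
  - --min-replicas=1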

In order to demonstrate the usage of the VPA and make it more likely that pod eviction would take place, the pod for this test deployment is assigned a default CPU request value of 1 millicore:

kubectl get pod -n dev \
  -o=custom-columns=NAME:.metadata.name,PHASE:.status.phase,CPU-REQUEST:.spec.containers\[0\].resources.requests.cpu
NAME                              PHASE     CPU-REQUEST
php-apache-7bfdf49c69-gf2jd       Running   1m
php-apache-7bfdf49c69-pjb98       Running   1m

Once the workload is deployed on the cluster, the recommender will detect the new VPA object and fetch the metrics available for the pods and containers it targets via the metrics API. When this is completed, the recommender will update the status block of the VPA specification with its recommendations:

kubectl describe vpa php-apache-vpa -n dev
Name:         php-apache-vpa
Namespace:    dev
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling.k8s.io/v1beta2","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"php-apache-vpa","namespace":"dev"},"spec...
API Version:  autoscaling.k8s.io/v1beta2
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2019-06-17T11:29:25Z
  Generation:          4
  Resource Version:    288396
  Self Link:           /apis/autoscaling.k8s.io/v1beta2/namespaces/dev/verticalpodautoscalers/php-apache-vpa
  UID:                 2954b910-90f3-11e9-aae4-080027655ff0
Spec:
  Resource Policy:
    Container Policies:
      Container Name:  *
      Max Allowed:
        Memory:  1Gi
  Target Ref:
    API Version:  extensions/v1beta1
    Kind:         Deployment
    Name:         php-apache
  Update Policy:
    Update Mode:  Recreate
Status:
  Conditions:
    Last Transition Time:  2019-06-17T11:30:04Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  php-apache
      Lower Bound:
        Cpu:     25m
        Memory:  262144k
      Target:
        Cpu:     25m
        Memory:  262144k
      Uncapped Target:
        Cpu:     25m
        Memory:  262144k
      Upper Bound:
        Cpu:     5291m
        Memory:  1Gi

Because we have sent no load to the php-apache replicas yet, the metrics reported for these pods are minimal, and so the recommended resources are the minimum the recommender will set: 25m for CPU and 262144k (roughly 256Mi) for memory.

Now let's generate some load on php-apache. Open a new terminal window and run a Busybox container in the same namespace:

kubectl run -i --tty load-generator -n dev --image=busybox:1.27 /bin/sh

When you have an interactive shell with Busybox, run a looped wget that is targeted towards our php-apache pods. If successful, you should see OK! flooding the output log:

while true; do wget -q -O- http://php-apache.dev.svc.cluster.local; done

Wait a few minutes for the recommender to run again, then perform another kubectl describe to check the current recommendation. After a minute or so the recommended CPU target has increased to ~587m:

kubectl describe vpa php-apache-vpa -n dev
Name:         php-apache-vpa
Namespace:    dev
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling.k8s.io/v1beta2","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"php-apache-vpa","namespace":"dev"},"spec...
API Version:  autoscaling.k8s.io/v1beta2
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2019-06-17T15:55:53Z
  Generation:          4
  Resource Version:    329553
  Self Link:           /apis/autoscaling.k8s.io/v1beta2/namespaces/dev/verticalpodautoscalers/php-apache-vpa
  UID:                 62a76e03-9118-11e9-aae4-080027655ff0
Spec:
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         php-apache
Status:
  Conditions:
    Last Transition Time:  2019-06-17T15:56:08Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  php-apache
      Lower Bound:
        Cpu:     25m
        Memory:  262144k
      Target:
        Cpu:     587m
        Memory:  262144k
      Uncapped Target:
        Cpu:     587m
        Memory:  262144k
      Upper Bound:
        Cpu:     17662m
        Memory:  664103245
Events:          <none>

We can see from the recommender's logs that when a recommendation is generated, the result is written to the VPA object on the cluster in the form of a patch request:

I0617 15:45:08.894873       1 metrics_client.go:69] 30 podMetrics retrieved for all namespaces
I0617 15:45:08.897170       1 cluster_feeder.go:376] ClusterSpec fed with #60 ContainerUsageSamples for #30 containers
I0617 15:45:08.897298       1 recommender.go:183] ClusterState is tracking 30 PodStates and 1 VPAs
I0617 15:45:08.898760       1 request.go:897] Request Body: [{"op":"add","path":"/status","value":{"recommendation":{"containerRecommendations":[{"containerName":"php-apache","target":{"cpu":"627m","memory":"262144k"},"lowerBound":{"cpu":"187m","memory":"262144k"},"upperBound":{"cpu":"46399m","memory":"993517772"},"uncappedTarget":{"cpu":"627m","memory":"262144k"}}]},"conditions":[{"type":"RecommendationProvided","status":"True","lastTransitionTime":"2019-06-17T15:37:08Z"}]}}]
I0617 15:45:08.924479       1 round_trippers.go:405] PATCH https://10.96.0.1:443/apis/autoscaling.k8s.io/v1beta2/namespaces/dev/verticalpodautoscalers/php-apache-vpa 200 OK in 24 milliseconds

After a couple of minutes, you should see the pods get recreated. Check the CPU requests on these new pods; they should match the target recommendation:

kubectl get pod -n dev -o=custom-columns=NAME:.metadata.name,PHASE:.status.phase,CPU-REQUEST:.spec.containers\[0\].resources.requests.cpu
NAME                              PHASE     CPU-REQUEST
load-generator-66fb94857f-d4q2b   Running   <none>
php-apache-7bfdf49c69-p5klm       Running   587m
php-apache-7bfdf49c69-xp56w       Running   587m

If we check the logs for the updater, we can see that it has noticed the new VPA-targeted pods and evicted them from the cluster:

I0617 13:36:48.818914       6 api.go:99] Initial VPA synced successfully
I0617 13:36:48.819475       6 reflector.go:131] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:196
I0617 13:36:48.819532       6 reflector.go:169] Listing and watching *v1.Pod from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:196
I0617 13:40:48.819590       6 update_priority_calculator.go:118] pod accepted for update php-apache-7bfdf49c69-p86nb with priority 2.62144001099e+11
I0617 13:40:48.819673       6 update_priority_calculator.go:118] pod accepted for update php-apache-7bfdf49c69-8b9m9 with priority 2.62144001099e+11
I0617 13:40:48.819692       6 updater.go:147] evicting pod php-apache-7bfdf49c69-p86nb
I0617 13:40:48.844718       6 updater.go:205] Event(v1.ObjectReference{Kind:"Pod", Namespace:"dev", Name:"php-apache-7bfdf49c69-p86nb", UID:"bbb1f53c-9103-11e9-aae4-080027655ff0", APIVersion:"v1", ResourceVersion:"306480", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
I0617 13:41:48.819643       6 update_priority_calculator.go:118] pod accepted for update php-apache-7bfdf49c69-8b9m9 with priority 2.62144001099e+11
I0617 13:41:48.819682       6 update_priority_calculator.go:118] pod accepted for update php-apache-7bfdf49c69-j9zl4 with priority 2.62144001099e+11
I0617 13:41:48.819689       6 updater.go:147] evicting pod php-apache-7bfdf49c69-8b9m9
I0617 13:41:48.833147       6 updater.go:205] Event(v1.ObjectReference{Kind:"Pod", Namespace:"dev", Name:"php-apache-7bfdf49c69-8b9m9", UID:"618670f6-9105-11e9-aae4-080027655ff0", APIVersion:"v1", ResourceVersion:"308483", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
I0617 13:42:07.321571       6 reflector.go:357] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/v1beta1_fetcher.go:80: Watch close - *v1beta1.VerticalPodAutoscaler total 4 items received

Once the pods have been evicted and the replica set that manages these php-apache pods notices they are missing, it will send a request to the kube-api to create two replacement pods. These requests are intercepted by the mutating webhook admission controller. Because these two pods are managed by a VPA with recommendations set, and the VPA is configured to allow the admission controller to mutate these pods, the admission controller injects the recommended resources into the pod spec:

I0617 15:36:20.118442       6 server.go:62] Admitting pod {php-apache-59759c4b98-% php-apache-59759c4b98- dev    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[pod-template-hash:59759c4b98 run:php-apache] map[] [{apps/v1 ReplicaSet php-apache-59759c4b98 a76f33fd-9115-11e9-aae4-080027655ff0 0xc0004d8127 0xc0004d8128}] nil [] }
I0617 15:36:20.118961       6 recommendation_provider.go:108] updating requirements for pod php-apache-59759c4b98-%.
I0617 15:36:20.119104       6 recommendation_provider.go:97] Let's choose from 1 configs for pod dev/php-apache-59759c4b98-%
I0617 15:36:20.119156       6 recommendation_provider.go:68] no matching recommendation found for container php-apache
I0617 15:36:20.119224       6 server.go:259] Sending patches: [{add /spec/containers/0/resources {map[] map[]}} {add /spec/containers/0/resources/requests map[]} {add /metadata/annotations map[vpaUpdates:Pod resources updated by php-apache-vpa: container 0: ]}]

Admittedly I did have some trouble getting the admission controller to work. I initially suspected that the configmap containing the cert bundle had an invalid configuration, but when I checked the API server log, I noticed that the webhook wasn’t being called - I had a vpa-webhook service in the same namespace as the admission controller pod, but the selectors were misconfigured, so there was no endpoint:

W0617 14:06:49.864288       1 dispatcher.go:70] Failed calling webhook, failing open vpa.k8s.io: failed calling webhook "vpa.k8s.io": Post https://vpa-webhook.kube-system.svc:443/?timeout=30s: dial tcp 10.102.151.220:443: connect: connection refused
E0617 14:06:49.864322       1 dispatcher.go:71] failed calling webhook "vpa.k8s.io": Post https://vpa-webhook.kube-system.svc:443/?timeout=30s: dial tcp 10.102.151.220:443: connect: connection refused
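
If you run into something similar, checking whether the vpa-webhook service actually has endpoints behind it (and what its selector is) is a quick way to spot the mismatch:

# the selector shown for the service should match the admission controller pod's labels
kubectl get svc vpa-webhook -n kube-system -o wide
kubectl get endpoints vpa-webhook -n kube-system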

Now check the pods altered by the admission controller to see what modifications have taken place. In addition to the CPU and memory requests, an annotation has been added to indicate that this pod has been altered by the VPA. Note as well that no resource limits are set.

kubectl get pod php-apache-59759c4b98-7z86g -n dev -o yaml --export
apiVersion: v1
kind: Pod
metadata:
  annotations:
    vpaUpdates: 'Pod resources updated by php-apache-vpa: container 0: cpu request,
      memory request'
  creationTimestamp: null
  generateName: php-apache-59759c4b98-
  labels:
    pod-template-hash: 59759c4b98
    run: php-apache
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: php-apache-59759c4b98
    uid: a76f33fd-9115-11e9-aae4-080027655ff0
  selfLink: /api/v1/namespaces/dev/pods/php-apache-59759c4b98-7z86g
spec:
  containers:
  - image: gcr.io/google_containers/hpa-example
    imagePullPolicy: Always
    name: php-apache
    resources:
      requests:
        cpu: 627m
        memory: 262144k

Interestingly, these changes are not propagated to the deployment that manages these pods:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: null
  generation: 1
  labels:
    run: php-apache
  name: php-apache
  selfLink: /apis/extensions/v1beta1/namespaces/dev/deployments/php-apache
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      run: php-apache
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: php-apache
    spec:
      containers:
      - image: gcr.io/google_containers/hpa-example
        imagePullPolicy: Always
        name: php-apache
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

So these request injections from the admission controller aren't meant to replace the default requests that you set in your deployment. If you redeploy an application to the cluster multiple times a day, because of the way changes are applied to resources in Kube, it makes sense not to set any requests in the deployment and to defer these to the VPA. This way, your current VPA-controlled requests will be preserved when the live deployment on the cluster is patched with changes using kubectl apply.

VPA Completion Reward

And that brings us to the end of our experimentation with VPAs. If you’ve made it this far we think you deserve a pint!

But if you’re thirsty for more Kubernetes scaling content, check out our Using KEDA Autoscaling with Prometheus and Redis blog.