Empowering Vertical Pod Autoscaler with Python Scripting and Prometheus Metrics

Welcome to the world of dynamic scaling and resource optimization! In this article, we dive into the fascinating realm of the Vertical Pod Autoscaler (VPA), the magic wand that ensures your Kubernetes workloads are neither underfed nor overallocated. But hold on tight, because we're not stopping there! We'll take things up a notch by introducing the dynamic duo of Prometheus metrics and Python scripting, unleashing a symphony of intelligence that tunes your application's resource allocation to perfection.

First things first, let's meet our star player: the Vertical Pod Autoscaler. VPA is like your personal fitness trainer for Kubernetes pods, tirelessly monitoring resource usage and dynamically adjusting their limits and requests. It ensures your pods always have just the right allocation of resources to flex their muscles without wasting precious memory or CPU cycles. Say goodbye to manual tweaking and hello to automated efficiency!

But what if we told you that you can supercharge this already impressive autoscaler with the power of Python scripting? Python, the beloved language of automation enthusiasts, lends its prowess to the mix, empowering you to write custom scripts that scrape invaluable metrics from Prometheus. Prometheus, the titan of monitoring, captures a wealth of performance data, giving you insights into the inner workings of your applications.

With Python scripting and Prometheus metrics in hand, you become the maestro of resource orchestration. You can extract vital information about your application's resource consumption, utilization patterns, and even predict future needs. Armed with this knowledge, you can dynamically fine-tune the Vertical Pod Autoscaler's Custom Resources (CRs), ensuring that your pods never miss a beat and always scale harmoniously with demand.

This article focuses on a specific usage scenario: an application that cannot scale horizontally, where each workload runs exactly one replica. The pods of this application are stopped and started once a day, typically during the application's backup phase. As a result, resource allocation for these pods happens only at start-up time, using the Vertical Pod Autoscaler's "initial" mode.

In this scenario, the limitation of horizontal scaling necessitates alternative approaches to ensure optimal resource allocation. By utilizing the "initial" mode of the Vertical Pod Autoscaler, resources can be allocated solely during the start-up phase, based on the maximum requirements of the application. This allows for more efficient resource provisioning, even in situations where horizontal scaling is not feasible.

We will explore the intricacies of this usage scenario, shedding light on how the Vertical Pod Autoscaler's "initial" mode can be employed to address resource allocation challenges in such environments. We also delve into the role of Python scripting and Prometheus metrics, which play a crucial role in fine-tuning the initial resource allocation for the pods. By leveraging these tools, we can optimize the performance of the application, taking into account its unique scaling constraints.

So, get ready to explore the marriage of VPA, Prometheus metrics and Python scripting—a triumphant trifecta that brings unparalleled intelligence and efficiency to your Kubernetes deployments. It's time to scale smart, scale dynamically, and unlock the full potential of your applications. Let's dive into the world of adaptive resource management and embark on a journey towards peak performance!


Part I – Deploy VPA to the Cluster

The Vertical Pod Autoscaler (VPA) is a component of Kubernetes that enables automatic vertical scaling of pods based on their resource usage patterns. While horizontal scaling involves adding or removing instances of a pod, vertical scaling adjusts the resource allocations (CPU and memory) for individual pods.

VPA constantly monitors the resource usage of pods and collects data to determine their optimal resource requirements. By analyzing metrics such as CPU utilization, memory usage, and other relevant indicators, the VPA can dynamically adjust the resource limits and requests of pods. While this is a powerful feature brought by the VPA, note that we won’t directly leverage it in our scenario.

The VPA can operate in different update modes: "Off," "Initial," "Recreate," and "Auto." These modes define when and how the VPA applies its recommendations to the workload. The "Initial" mode used in our scenario assigns resource requests only when a pod is created and does not change them afterwards, which is exactly what we need.

In order to deploy the VPA, the following prerequisites have to be fulfilled (more details here):

- kubectl should be connected to the cluster on which you want to install VPA;

- the Metrics Server must be deployed in your cluster (read more about Metrics Server);

- if you already have another version of VPA installed in your cluster, you have to tear down the existing installation first with:

./hack/vpa-down.sh

Then, you can install it by doing the following:

1) Clone the Repo:

git clone https://github.com/kubernetes/autoscaler.git

2) Navigate to the “vertical-pod-autoscaler” directory within the repo, and issue the following:

./hack/vpa-up.sh

Once the above is done, the VPA components should be running as 3 deployments in the “kube-system” namespace.
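If you prefer to confirm this from Python rather than with kubectl, one option is to compare the Deployments found in “kube-system” with the three expected components. The sketch below assumes the default names from the upstream manifests (vpa-recommender, vpa-updater, vpa-admission-controller) and that the kubernetes Python client is installed and can load your kubeconfig. Both helper names are hypothetical.

```python
EXPECTED_VPA_DEPLOYMENTS = {
    "vpa-recommender",
    "vpa-updater",
    "vpa-admission-controller",
}


def missing_vpa_components(deployment_names):
    """Return the expected VPA deployments that are absent from the given names."""
    return sorted(EXPECTED_VPA_DEPLOYMENTS - set(deployment_names))


def list_kube_system_deployments():
    """Fetch the deployment names in kube-system (requires cluster access)."""
    # Imported lazily so the pure helper above works without the client library.
    from kubernetes import client, config

    config.load_kube_config()
    deployments = client.AppsV1Api().list_namespaced_deployment("kube-system")
    return [d.metadata.name for d in deployments.items]


# Usage against a live cluster:
#   print(missing_vpa_components(list_kube_system_deployments()))
```

An empty list means all three components were found. This checks only that the Deployments exist, not that their pods are ready.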

To have the VPA manage a specific Kubernetes object, such as a Deployment or StatefulSet, you need to create a VPA custom resource in the same namespace as that object. This custom resource links the target object to the VPA, so that the VPA can monitor its resource usage and compute the allocation (memory, in our case) for it. Because the resource is namespaced and names individual containers, it gives you fine-grained control over which workloads are managed and within which bounds.


Below you can find an example of such a resource:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "example-deployment"
  updatePolicy:
    updateMode: "Initial"
  resourcePolicy:
    containerPolicies:
    - containerName: "example-container"
      minAllowed:
        cpu: "250m"
        memory: "1000Mi"
      maxAllowed:
        cpu: "500m"
        memory: "2000Mi"


The YAML example provided represents a VerticalPodAutoscaler (VPA) Custom Resource (CR) configuration. Let's break down the different elements:

- targetRef: Identifies the target deployment that the VPA will apply to. In this example, the target deployment is named "example-deployment" and belongs to the "apps/v1" API version.

- updatePolicy: Specifies the update mode for the VPA. In this case, the update mode is set to "Initial," indicating that resource allocations will be adjusted only during the start-up phase.

- resourcePolicy: Defines the resource policies for the VPA.

- containerPolicies: Specifies the container-specific policies within the VPA. In this example, there is one container policy defined.

- containerName: Identifies the name of the container to which the policy applies. In this case, it is "example-container."

- minAllowed: Sets the lower bound for the VPA's recommendation. The VPA will not assign the container CPU or memory requests below these values.

- maxAllowed: Sets the upper bound for the VPA's recommendation. The VPA will not assign the container CPU or memory requests above these values.

By providing these configurations in the VPA CR, you define the target deployment, specify the update mode, and bound the recommended resources for a specific container within the deployment. This allows the VPA to adjust the resource allocations while staying within limits that you control.
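The effect of minAllowed and maxAllowed can be pictured as a clamp on whatever value the recommender proposes. The snippet below is only an illustration of that bounding behaviour, with values in MiB and a hypothetical function name. It is not the VPA's own code.

```python
def clamp_recommendation(recommended_mib, min_allowed_mib, max_allowed_mib):
    """Bound a recommended memory value (in MiB) by the policy limits."""
    if min_allowed_mib > max_allowed_mib:
        raise ValueError("minAllowed must not exceed maxAllowed")
    return max(min_allowed_mib, min(recommended_mib, max_allowed_mib))


# With the memory policy from the example above (1000Mi to 2000Mi):
for recommended in (600, 1400, 3200):
    print(recommended, "->", clamp_recommendation(recommended, 1000, 2000))
```

A proposal of 600 is raised to 1000, 1400 passes through unchanged, and 3200 is capped at 2000.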

While the above set-up can probably cover most usage scenarios, we observe one notable limitation. Once the “minAllowed” and “maxAllowed” values are set in the VPA CR, they remain static. In other words, any change in the memory or CPU demands of the application beyond the defined thresholds will not be taken into account automatically. As a result, the allocated resources may no longer reflect the current needs of the application, leading to waste or to insufficient resources.



Part II – Leverage Prometheus for threshold calculations

We mentioned earlier that we won’t fully leverage the calculation power embedded in VPA. This is mainly because our application’s pods have one replica each and cannot be restarted by VPA during their lifecycle. On top of that, the memory consumption can vary greatly from day to day, so the “minAllowed” and “maxAllowed” thresholds can become outdated.

Therefore, in order to cover our specific scenario, we have decided to query Prometheus directly for the maximum memory usage over the last 15 days.

Prometheus is an open-source monitoring and alerting system designed for collecting and analyzing time-series data. It is widely used in modern cloud-native environments and provides a flexible and scalable solution for monitoring applications and infrastructure.

As an example, consider the following query, written in PromQL:

max(max_over_time(container_memory_working_set_bytes{namespace="your-namespace", container="your-container", image!="", container_name!="POD"}[15d])) /1024/1024


“container_memory_working_set_bytes” is a metric exposed by the kubelet (through cAdvisor) that represents the amount of memory, in bytes, currently in use by a container. It specifically measures the working set, which is the portion of memory actively used by the container's processes.

The working set is derived as the container's total memory usage minus the inactive file-backed cache, so it approximates the memory that cannot simply be reclaimed under pressure. It is also the figure the kubelet looks at when making eviction decisions, which makes it a good basis for sizing memory requests.

Taken as a whole, the query retrieves the working set memory usage for a specific container within a namespace, calculates the maximum value over a 15-day period, and then divides it by 1024 twice to convert it from bytes to mebibytes (MiB). This value is in turn used by the Python script to patch the “minAllowed” value in the CR.
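Since the same query is needed for every container, it can be convenient to generate it from parameters. Here is a minimal sketch, assuming the label names used in the query above. The function name and the window parameter are hypothetical.

```python
def build_peak_memory_query(namespace, container, window="15d"):
    """Return PromQL for the peak working-set memory of a container, in MiB."""
    selector = (
        'namespace="{ns}", container="{c}", image!="", container_name!="POD"'
    ).format(ns=namespace, c=container)
    return (
        "max(max_over_time(container_memory_working_set_bytes{"
        + selector
        + "}["
        + window
        + "])) / 1024 / 1024"
    )


print(build_peak_memory_query("your-namespace", "your-container"))
```

Building the selector separately keeps the quoting of label values in one place, which is where hand-concatenated queries tend to break.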

But why the maximum value over a 15-day period?

Well, 15 days happens to be the period for which we store the Prometheus data. Also, for our current usage scenario, the maximum usage over the last 15 days has proven to be a sufficient memory allocation to accommodate the demands of our application. Last but not least, the VPA recommendations are calculated using a decaying histogram of weighted samples from the metrics server, where newer samples are assigned higher weights and older samples decay, so they affect the recommendations less and less. CPU is calculated using the 90th percentile of all CPU samples, and memory is calculated using the 90th percentile peak over the 8-day window. Source here. A percentile-based estimate can, by design, sit below a rare peak, whereas the 15-day maximum cannot.
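To see why the maximum can be the safer choice for a workload with occasional spikes, compare it with a 90th percentile on a small set of made-up numbers. This uses a plain nearest-rank percentile for illustration. It does not reproduce the VPA's decaying histogram, and the sample values are invented.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical daily peak memory usage in MiB, with one unusually heavy day.
daily_peaks_mib = [900, 950, 920, 980, 940, 1010, 930, 960, 990, 1800]

p90 = percentile(daily_peaks_mib, 90)
peak = max(daily_peaks_mib)
print("90th percentile:", p90, "MiB")
print("maximum:", peak, "MiB")
```

With these numbers the 90th percentile is 1010 MiB while the true peak is 1800 MiB, so a pod sized from the percentile alone would have been short of memory on the heavy day.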

Our approach with the custom estimation led to fewer OOM kills overall.

For “maxAllowed,” feel free to use a static threshold that is high enough to accommodate the most demanding usage patterns, or adjust it via Python as well.
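If you choose to derive both bounds in Python, one simple option is to add headroom on top of the observed peak. In the sketch below, the 1.5 factor, the 256 MiB floor, and the function name are arbitrary assumptions for illustration, not values from our set-up.

```python
import math


def memory_bounds(peak_mib, headroom=1.5, floor_mib=256):
    """Return (minAllowed, maxAllowed) as Kubernetes quantity strings."""
    # Never go below the floor, even if the observed peak is tiny.
    min_mib = max(int(peak_mib), floor_mib)
    max_mib = math.ceil(min_mib * headroom)
    return "{}Mi".format(min_mib), "{}Mi".format(max_mib)


print(memory_bounds(1340))
```

The "Mi" suffix matters: a bare number in a memory field is interpreted as bytes, not mebibytes.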



Part III – Patching with Python

In this part we will use the Kubernetes Python Client, together with the Prometheus HTTP API queried through the requests library, to reach our allocation goals. Roughly, we have created a Python script that runs daily alongside the jobs that perform the backup and restart the pods. Below you’ll find some example Python code that performs the tasks described above.


1) Query Prometheus for the memory usage of your container. This function will return the maximum memory usage of the container.

import sys

import requests

PROMETHEUS_URL = "http://prometheus-server.monitoring.svc.cluster.local"

def calculate_prom_mem(self, promql, container):
    # The query is built here; the promql argument is not used.
    # It returns the peak working-set memory of the container over 15 days, in MiB.
    query = (
        'max(max_over_time(container_memory_working_set_bytes{'
        'namespace="your-namespace", container="%s", image!="", container_name!="POD"'
        '}[15d])) /1024/1024' % container
    )
    retry_count = 0
    while retry_count < 3:
        try:
            # Run the instant query against the Prometheus HTTP API.
            query_result = requests.get(
                PROMETHEUS_URL + "/api/v1/query", params={"query": query}, timeout=30
            )
            result_json = query_result.json()["data"]["result"]
            # Each sample is [timestamp, "value"]; keep the values as whole MiB.
            result_arr = [int(float(result["value"][1])) for result in result_json]
            return max(result_arr)
        except Exception as ex:
            retry_count += 1
            print("Failure number %s when querying Prometheus: %s" % (retry_count, ex))
            if retry_count == 3:
                print("UNABLE TO GET " + container + " MAX MEM REQUEST. STOPPING THE SCRIPT.")
                sys.exit(1)



2) The function create_vpa is responsible for creating a VerticalPodAutoscaler (VPA) custom resource in Kubernetes:

import kubernetes

# Coordinates of the VPA custom resource; the namespace is a placeholder.
# The client configuration must be loaded beforehand,
# e.g. with kubernetes.config.load_incluster_config().
group = "autoscaling.k8s.io"
version = "v1"
plural = "verticalpodautoscalers"
namespace = "your-namespace"

def create_vpa(self, subject, container, kind, min_allowed, max_allowed, update_mode):
    name = container + "-vpa"
    # Build the manifest as a dict so that every value is quoted correctly.
    body = {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": container,
                        "controlledResources": ["memory"],
                        "maxAllowed": {"memory": max_allowed},
                        "minAllowed": {"memory": min_allowed},
                    }
                ]
            },
            "targetRef": {"apiVersion": "apps/v1", "kind": kind, "name": subject},
            "updatePolicy": {"updateMode": update_mode},
        },
    }
    retry_count = 0
    while retry_count < 3:
        try:
            kubernetes.client.CustomObjectsApi().create_namespaced_custom_object(
                group, version, namespace, plural, body
            )
            return
        except Exception as ex:
            retry_count += 1
            print("Failure number %s when creating VPA %s: %s" % (retry_count, name, ex))
            if retry_count == 3:
                # Give up after the third failure and let the caller handle the error.
                raise


The function takes several parameters: subject (the name of the target object), container (the name of the container within the target object), kind (the kind of the target object, e.g., Deployment), min_allowed (the minimum allowed memory for the container), max_allowed (the maximum allowed memory for the container), and update_mode (the update mode for the VPA).

Inside the function, there is a retry mechanism that allows for multiple attempts in case of failures during the VPA creation process.

The function generates the JSON body for the VPA custom resource using the provided parameters. It sets the target object, container policies, resource policies (specifying memory as the controlled resource), and the update policy. It then attempts to create the VPA using the Kubernetes API, specifically the create_namespaced_custom_object method from the kubernetes.client.CustomObjectsApi().

If the creation is successful, the function returns. Otherwise, it retries the creation up to three times, printing an error message each time.
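The same retry pattern appears in every function of the script, so it could also be factored into a small helper. The version below is a hypothetical refactoring, not part of our original script. It re-raises the last error once the attempts are used up.

```python
import time


def with_retries(action, attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as ex:
            print("Failure number %s: %s" % (attempt, ex))
            if attempt == attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            time.sleep(delay_seconds)


# Example: wrap any zero-argument callable.
#   with_retries(lambda: api.create_namespaced_custom_object(...))
```

A short delay between attempts gives a briefly unavailable API server time to recover.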

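You will define min_allowed from the output of the calculate_prom_mem function. Since that function returns a plain number of mebibytes, format it as a Kubernetes quantity (for example "1500Mi") before passing it in; a bare number would be interpreted as bytes.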


3) Lastly, if the calculated min_allowed value changes, you can recreate the CR object with the updated value: delete the existing object with the function below, then call create_vpa again.

def delete_vpa(self, container):
    name = container + "-vpa"
    retry_count = 0
    while retry_count < 3:
        try:
            # group, version, namespace and plural identify the VPA custom resource.
            kubernetes.client.CustomObjectsApi().delete_namespaced_custom_object(
                group, version, namespace, plural, name
            )
            return
        except Exception as ex:
            retry_count += 1
            print("Failure number %s when deleting VPA %s: %s" % (retry_count, name, ex))
            if retry_count == 3:
                # Give up after the third failure and let the caller handle the error.
                raise


The function takes a single parameter container, which represents the name of the container associated with the VPA. It initializes a retry counter variable and generates the name of the VPA based on the provided container name.

Inside the function, there is a retry mechanism that allows for multiple attempts in case of failures during the deletion process. The function calls the delete_namespaced_custom_object method from the kubernetes.client.CustomObjectsApi() to delete the VPA based on the provided group, version, namespace, plural, and VPA name.

If the deletion is successful, the function returns. Otherwise, it retries the deletion up to three times, printing an error message each time.
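To avoid deleting and recreating the object when nothing has changed, the stored value can be compared with the newly computed one first. The sketch below works on the dictionary that CustomObjectsApi().get_namespaced_custom_object returns for a VPA. Both function names are hypothetical, and the comparison is a plain string match, so "1024Mi" and "1Gi" would count as different.

```python
def current_min_memory(vpa, container):
    """Return the minAllowed memory of a container from a VPA object, or None."""
    policies = (
        vpa.get("spec", {}).get("resourcePolicy", {}).get("containerPolicies", [])
    )
    for policy in policies:
        if policy.get("containerName") == container:
            return policy.get("minAllowed", {}).get("memory")
    return None


def needs_recreation(vpa, container, new_min_allowed):
    """True when the stored minAllowed differs from the newly computed one."""
    return current_min_memory(vpa, container) != new_min_allowed


# A trimmed-down VPA object with made-up values.
example_vpa = {
    "spec": {
        "resourcePolicy": {
            "containerPolicies": [
                {"containerName": "example-container", "minAllowed": {"memory": "1000Mi"}}
            ]
        }
    }
}
print(needs_recreation(example_vpa, "example-container", "1200Mi"))
```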

Of course, you’d have to set up some functionality that checks whether the Prometheus service is up before querying it, or whether the VPA object actually exists before attempting to remove it. But these topics are beyond our scope.
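For the existence check, one possible starting point is to read the object and treat a 404 response as "not there". This is only a sketch under assumptions: the API object is passed in as a parameter (in practice a kubernetes.client.CustomObjectsApi instance), and the error is recognized by its status attribute, which the client's ApiException carries. The function name is hypothetical.

```python
def vpa_exists(api, group, version, namespace, plural, name):
    """Return True if the VPA object can be read, False if the API answers 404."""
    try:
        api.get_namespaced_custom_object(group, version, namespace, plural, name)
        return True
    except Exception as ex:
        # The Kubernetes client's ApiException exposes the HTTP code as .status.
        if getattr(ex, "status", None) == 404:
            return False
        # Anything else (permissions, network) is a real error.
        raise
```

Passing the API object in, rather than creating it inside the function, also makes the check easy to exercise with a stub in tests.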



In this article, we explored the powerful combination of the Vertical Pod Autoscaler (VPA) and Python scripting to optimize resource allocation in Kubernetes. We leverage Python to query Prometheus, collect relevant metrics, and use them to adjust the VPA Custom Resources. This allows us to dynamically allocate resources based on our custom needs.

While highlighting the benefits of this approach, we acknowledge the limitations of its scope, and the fact that its application will be quite narrow. Nevertheless, it serves as a good example of what can be achieved when the right combination of technologies is implemented effectively.
