Welcome to the world of dynamic scaling and resource optimization! In this article, we dive into the fascinating realm of the Vertical Pod Autoscaler (VPA), the magic wand that ensures your Kubernetes workloads are neither underfed nor overallocated. But hold on tight, because we're not stopping there! We'll take things up a notch by introducing the dynamic duo of Prometheus metrics and Python scripting, unleashing a symphony of intelligence that tunes your application's resource allocation to perfection.
First
things first, let's meet our star player: the Vertical Pod Autoscaler. VPA is
like your personal fitness trainer for Kubernetes pods, tirelessly monitoring
resource usage and dynamically adjusting their limits and requests. It ensures
your pods always have just the right allocation of resources to flex their
muscles without wasting precious memory or CPU cycles. Say goodbye to manual
tweaking and hello to automated efficiency!
But
what if we told you that you can supercharge this already impressive autoscaler
with the power of Python scripting? Python, the beloved language of automation
enthusiasts, lends its prowess to the mix, empowering you to write custom
scripts that scrape invaluable metrics from Prometheus. Prometheus, the titan
of monitoring, captures a wealth of performance data, giving you insights into
the inner workings of your applications.
With
Python scripting and Prometheus metrics in hand, you become the maestro of
resource orchestration. You can extract vital information about your application's
resource consumption, utilization patterns, and even predict future needs.
Armed with this knowledge, you can dynamically fine-tune the Vertical Pod
Autoscaler's Custom Resources (CRs), ensuring that your pods never miss a beat
and always scale harmoniously with demand.
This
article focuses on a specific usage scenario involving an application that
cannot scale horizontally and each pod has exactly one replica. The pods of
this application undergo a daily process where they are stopped and started,
typically during the backup phase of the application. As a result, the resource
allocation for these pods is only performed at start-up time using the Vertical
Pod Autoscaler's "initial" mode.
In
this scenario, the limitation of horizontal scaling necessitates alternative
approaches to ensure optimal resource allocation. By utilizing the
"initial" mode of the Vertical Pod Autoscaler, resources can be
allocated solely during the start-up phase, based on the maximum requirements
of the application. This allows for more efficient resource provisioning, even
in situations where horizontal scaling is not feasible.
We
will explore the intricacies of this usage scenario, shedding light on how the
Vertical Pod Autoscaler's "initial" mode can be employed to address
resource allocation challenges in such environments. We also delve into the
role of Python scripting and Prometheus metrics, which play a crucial role in
fine-tuning the initial resource allocation for the pods. By leveraging these
tools, we can optimize the performance of the application, taking into account
its unique scaling constraints.
So,
get ready to explore the marriage of VPA, Prometheus metrics and Python
scripting—a triumphant trifecta that brings unparalleled intelligence and efficiency
to your Kubernetes deployments. It's time to scale smart, scale dynamically,
and unlock the full potential of your applications. Let's dive into the world
of adaptive resource management and embark on a journey towards peak
performance!
Part
I – Deploy VPA to the Cluster
The
Vertical Pod Autoscaler (VPA) is a component of Kubernetes that enables
automatic vertical scaling of pods based on their resource usage patterns.
While horizontal scaling involves adding or removing instances of a pod,
vertical scaling adjusts the resource allocations (CPU and memory) for
individual pods.
VPA
constantly monitors the resource usage of pods and collects data to determine
their optimal resource requirements. By analyzing metrics such as CPU
utilization, memory usage, and other relevant indicators, the VPA can
dynamically adjust the resource limits and requests of pods. While this is a
powerful feature brought by the VPA, note that we won’t directly leverage it in
our scenario.
This
functionality can operate in different update modes: "Off," "Initial,"
"Recreate," and "Auto." These modes define when and how the VPA applies its
resource recommendations. The "Initial" mode, specifically mentioned in the
scenario, applies recommendations only when pods are created (i.e. at
start-up), which is exactly what our usage scenario needs.
In order to deploy the VPA, the following prerequisites have to be fulfilled (more details here):
- kubectl should be
connected to the cluster you want to install VPA;
- the metrics server must be deployed in your cluster. Read more about Metrics Server;
- If you already have
another version of VPA installed in your cluster, you have to tear down the
existing installation first with:
./hack/vpa-down.sh
Then,
you can install it by doing the following:
1)
Clone the Repo:
git clone https://github.com/kubernetes/autoscaler.git
2) Navigate to the “vertical-pod-autoscaler” directory
within the repo, and issue the following:
./hack/vpa-up.sh
Once the above is done,
the VPA components should be running as 3 deployments in the “kube-system”
namespace.
To associate a specific
Kubernetes object, such as a Deployment or StatefulSet, with the Vertical Pod
Autoscaler and have its memory allocation calculated by the VPA, it is
necessary to create a VPA custom resource at the namespace level. This custom
resource serves as a link between the targeted object and the VPA, enabling the
VPA to monitor and adjust the memory allocation for that particular object. By
creating the VPA custom resource within the desired namespace, you establish
the connection that allows the VPA to dynamically optimize the memory usage of
the associated object based on its resource utilization patterns. This approach
facilitates fine-grained control over resource allocation, ensuring efficient
memory utilization and enhancing the overall performance of the targeted
Kubernetes object within the specified namespace.
Below you can find an
example of such a resource:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "example-deployment"
  updatePolicy:
    updateMode: "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: "example-container"
        minAllowed:
          cpu: "250m"
          memory: "1000Mi"
        maxAllowed:
          cpu: "500m"
          memory: "2000Mi"
The YAML example provided
represents a VerticalPodAutoscaler (VPA) Custom Resource (CR) configuration.
Let's break down the different elements:
- targetRef:
Identifies the target deployment that the VPA will apply to. In this example,
the target deployment is named "example-deployment" and belongs to
the "apps/v1" API version.
- updatePolicy:
Specifies the update mode for the VPA. In this case, the update mode is set to
"Initial," indicating that resource allocations will be adjusted only
during the start-up phase.
- resourcePolicy:
Defines the resource policies for the VPA.
- containerPolicies:
Specifies the container-specific policies within the VPA. In this example,
there is one container policy defined.
- containerName:
Identifies the name of the container to which the policy applies. In this case,
it is "example-container."
- minAllowed: Sets
the minimum allowed resource limits for the container. It defines the minimum
CPU and memory values that the VPA should not go below during resource
allocation adjustments.
- maxAllowed:
Specifies the maximum allowed resource limits for the container. It defines the
maximum CPU and memory values that the VPA should not exceed during resource
allocation adjustments.
By providing these
configurations in the VPA CR, you can define the target deployment, specify the
update mode, and set the minimum and maximum resource limits for a specific
container within the deployment. This allows the VPA to make informed decisions
and adjust the resource allocations accordingly for optimized performance.
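Since Part III will manipulate this CR from Python, it may help to see the same object expressed as a plain Python dict, which is the form accepted by the Kubernetes Python client's custom-objects API. This is simply a transcription of the YAML above, not an official client type:

```python
# The example VPA CR from above, transcribed as a Python dict
example_vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "example-vpa"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                      "name": "example-deployment"},
        "updatePolicy": {"updateMode": "Initial"},
        "resourcePolicy": {"containerPolicies": [{
            "containerName": "example-container",
            "minAllowed": {"cpu": "250m", "memory": "1000Mi"},
            "maxAllowed": {"cpu": "500m", "memory": "2000Mi"}}]},
    },
}

print(example_vpa["spec"]["updatePolicy"]["updateMode"])  # Initial
```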
While the above set-up
can probably cover most of the usage scenarios, we observe one notable
limitation. Once the “minAllowed” and “maxAllowed” values are set
in the VPA CR, they remain static. In other words, any changes in the memory or
CPU demands of the application, past the defined thresholds, will not be taken
into account automatically. As a result, the allocated resources will not
accurately reflect the current needs of the application, potentially leading to
inefficiencies or insufficient resources for optimal performance.
Part
II – Leverage Prometheus for threshold calculations
We
mentioned earlier that we won’t fully leverage the calculation power embedded
in VPA. This is mainly because our application’s pods have one replica each and
cannot be directly restarted by VPA during their lifetime cycle. On top of
that, the memory consumption can greatly vary from day to day, and “minAllowed”
and “maxAllowed” thresholds can become outdated.
Therefore,
in order to cover our specific scenario, we have decided to directly query
Prometheus for the maximum memory usage for the last 15 days.
Prometheus
is an open-source monitoring and alerting system designed for collecting and
analyzing time-series data. It is widely used in modern cloud-native
environments and provides a flexible and scalable solution for monitoring
applications and infrastructure.
You
can take as an example, the query below written in PromQL:
max(max_over_time(container_memory_working_set_bytes{namespace="your-namespace", container="your-container", image!="", container_name!="POD"}[15d])) /1024/1024
“container_memory_working_set_bytes”
is a metric in Kubernetes that represents the amount of memory (in bytes)
currently in use by a container. It specifically measures the working set
memory usage, which refers to the portion of memory actively used by a
container's processes.
The
working set memory corresponds roughly to the container's total memory usage
minus the inactive file cache: the portion of memory the kernel cannot easily
reclaim under memory pressure. It is also the value the kernel considers when
making OOM decisions, which makes it a sound basis for sizing the container's
memory allocation.
In
its entirety, the query retrieves the memory working set usage for a specific
container within a namespace, calculates the maximum value over a 15-day
period, and then converts it to megabytes. This will be in turn used by the
Python script to patch the “minAllowed” value in the CR.
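To make the parsing step concrete, here is a small, self-contained sketch of how such a query result can be reduced to a single MiB value. The sample response mimics the shape of the Prometheus instant-query API (/api/v1/query); the helper name max_mem_mib is ours, not part of any library:

```python
import math

def max_mem_mib(prom_response):
    """Reduce a Prometheus instant-query response to its largest value, truncated to whole MiB."""
    results = prom_response["data"]["result"]
    # Each result's 'value' is a [timestamp, "value-as-string"] pair
    return max(math.trunc(float(r["value"][1])) for r in results)

# Sample response shaped like the Prometheus HTTP API output
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {}, "value": [1696500000, "1843.7"]},
        ],
    },
}

print(max_mem_mib(sample))  # 1843
```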
But
why the maximum value over a 15-day period?
Well,
15 days happens to be the period for which we store the Prometheus data. Also,
for our current usage scenario, the maximum usage over the last 15 days has
proven to be enough memory allocation to accommodate the demands of our
application. Last but not least, the VPA recommendations are calculated using
decaying histogram of weighted samples from the metrics server, where the newer
samples are assigned higher weights; older samples are decaying and hence
affect less and less with reference to the recommendations. CPU is calculated
using the 90th percentile of all CPU samples, and memory is calculated using
the 90th percentile peak over the 8-day window. Source here.
Our
approach with the custom estimation led to less overall OOMkills due to
resource exhaustion.
For
“maxAllowed” feel free to use a static threshold that is reasonably high
to accommodate the most demanding usage patterns, or adjust it as well via
Python.
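If you prefer to derive "maxAllowed" from the same measurement rather than hardcode it, a simple headroom multiplier is one option. This is a sketch of our own (the helper name and the 1.5 factor are arbitrary examples, not part of VPA) and should be tuned to your workload:

```python
import math

def derive_max_allowed(observed_max_mib, headroom=1.5):
    """Return a maxAllowed memory string with a safety margin on top of the observed maximum."""
    return "%dMi" % math.ceil(observed_max_mib * headroom)

print(derive_max_allowed(1843))  # 2765Mi
print(derive_max_allowed(1000))  # 1500Mi
```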
Part
III – Patching with Python
In
this part we will leverage the power of the
Kubernetes Python Client
and
Prometheus
client_python
libraries in order to reach our
allocation goals. Roughly, we have created a Python script that runs daily
along with the other functionalities that perform backup and restarts the pods.
Below you’ll find some example Python code that performs the tasks described
above.
1)
Query Prometheus for the memory usage of your container. This function will return
the maximum memory usage of the container.
# requires: import math, requests
def calculate_prom_mem(self, container):
    """Return the maximum working-set memory (MiB) of 'container' over the last 15 days."""
    retry_count = 0
    #---Define query for the container's max working-set memory, converted to MiB
    query = ('max(max_over_time(container_memory_working_set_bytes'
             '{namespace="your-namespace", container="' + container + '", '
             'image!="", container_name!="POD"}[15d])) /1024/1024')
    while retry_count < 3:
        try:
            #---Run the instant query against the Prometheus HTTP API
            query_result = requests.get(
                "http://prometheus-server.monitoring.svc.cluster.local" + '/api/v1/query',
                params={'query': query})
            result_json = query_result.json()['data']['result']
            #---Collect the truncated values from all returned series
            result_arr = []
            for result in result_json:
                # result['value'] is a [timestamp, "value-as-string"] pair
                result_arr.append(math.trunc(float(result['value'][1])))
            return max(result_arr)
        except Exception as ex:
            print("Failure number %s when querying max memory for %s: %s" % (retry_count, container, str(ex)))
            retry_count += 1
            if retry_count == 3:
                raise
2)
The function create_vpa is responsible for creating a VerticalPodAutoscaler
(VPA) custom resource in Kubernetes:
def create_vpa(self, subject, container, kind, min_allowed, max_allowed, update_mode):
    retry_count = 0
    name = container + "-vpa"
    # Build the VPA custom resource as a dict (avoids hand-assembling JSON strings)
    body = {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": kind, "name": subject},
            "updatePolicy": {"updateMode": update_mode},
            "resourcePolicy": {"containerPolicies": [{
                "containerName": container,
                "controlledResources": ["memory"],
                "minAllowed": {"memory": min_allowed},
                "maxAllowed": {"memory": max_allowed}}]}}}
    while retry_count < 3:
        try:
            #CREATE (group, version, namespace, plural are defined elsewhere in the script)
            kubernetes.client.CustomObjectsApi().create_namespaced_custom_object(group, version, namespace, plural, body)
            return
        except Exception as ex:
            print("Failure number %s when creating VPA: %s\n" % (retry_count, str(ex)))
            retry_count += 1
            if retry_count == 3:
                raise
The
function takes several parameters: subject (the name of the target
object), container (the name of the container within the target object),
kind (the kind of the target object, e.g., Deployment), min_allowed
(the minimum allowed memory for the container), max_allowed (the maximum
allowed memory for the container), and update_mode (the update mode for
the VPA).
Inside
the function, there is a retry mechanism that allows for multiple attempts in
case of failures during the VPA creation process.
The
function generates the JSON body for the VPA custom resource using the provided
parameters. It sets the target object, container policies, resource policies
(specifying memory as the controlled resource), and the update policy. It then
attempts to create the VPA using the Kubernetes API, specifically the
create_namespaced_custom_object method from the
kubernetes.client.CustomObjectsApi().
If
the creation is successful, the function returns. Otherwise, it retries the
creation up to three times, printing an error message each time.
You
will define min_allowed as the output of the calculate_prom_mem function.
3)
Lastly, if the min_allowed calculated memory changes, you can recreate
the CR object with the updated value, using the function below.
def delete_vpa(self, container):
    retry_count = 0
    name = container + "-vpa"
    while retry_count < 3:
        try:
            #DELETE the VPA custom object by name
            kubernetes.client.CustomObjectsApi().delete_namespaced_custom_object(group, version, namespace, plural, name)
            return
        except Exception as ex:
            print("Failure number %s when deleting VPA: %s\n" % (retry_count, str(ex)))
            retry_count += 1
            if retry_count == 3:
                raise
The
function takes a single parameter container, which represents the name of the
container associated with the VPA. It initializes a retry counter variable and
generates the name of the VPA based on the provided container name.
Inside
the function, there is a retry mechanism that allows for multiple attempts in
case of failures during the deletion process. The function calls the
delete_namespaced_custom_object method from the
kubernetes.client.CustomObjectsApi() to delete the VPA based on the provided
group, version, namespace, plural, and VPA name.
If
the deletion is successful, the function returns. Otherwise, it retries the
deletion up to three times, printing an error message each time.
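Putting the pieces together, the daily run only needs to recreate the CR when the freshly calculated minimum has drifted from the one currently set. A small helper like the sketch below can make that decision; the name needs_update and the 10% default tolerance are our own assumptions, and the delete/create functions above would run only when it returns True:

```python
def needs_update(current_min_mib, observed_min_mib, tolerance=0.10):
    """Decide whether the VPA CR should be recreated with a new minAllowed value.

    Returns True when the observed minimum deviates from the currently
    configured one by more than the given relative tolerance.
    """
    if current_min_mib <= 0:
        return True  # no sane current value: always (re)create
    drift = abs(observed_min_mib - current_min_mib) / current_min_mib
    return drift > tolerance

print(needs_update(1000, 1050))  # False (5% drift, within tolerance)
print(needs_update(1000, 1200))  # True  (20% drift)
```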
Of
course, you’d have to set up some functionality that checks whether the
Prometheus service is up before querying it, or check if the VPA object
actually exists before attempting to remove it. But these topics are beyond our
scope.
Summary:
In
this article, we explored the powerful combination of the Vertical Pod
Autoscaler (VPA) and Python scripting to optimize resource allocation in
Kubernetes. We leverage Python to query Prometheus, collect relevant metrics,
and use them to adjust the VPA Custom Resources. This allows us to dynamically
allocate resources based on our custom needs.
While
highlighting the benefits of this approach, we acknowledge the limitations of
its scope, and the fact that its application will be quite narrow.
Nevertheless, this serves as a prime example of the tremendous outcomes that
can be achieved when the right combination of technologies is implemented
effectively.
Oct, 2023 Yalos Team