Improve docs for HorizontalPodAutoscaler #30711

Merged 1 commit on Dec 4, 2021
---
reviewers:
- jszczepkowski
- justinsb
- directxman12
title: HorizontalPodAutoscaler Walkthrough
content_type: task
weight: 100
min-kubernetes-server-version: 1.23
---

<!-- overview -->

A [HorizontalPodAutoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/)
(HPA for short)
automatically updates a workload resource (such as
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
This is different from _vertical_ scaling, which for Kubernetes would mean
assigning more resources (for example: memory or CPU) to the Pods that are already
running for the workload.

If the load decreases, and the number of Pods is above the configured minimum,
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
or other similar resource) to scale back down.
This document walks you through an example of enabling HorizontalPodAutoscaler to
automatically manage scale for an example web app. This example workload is Apache
httpd running some PHP code.

## {{% heading "prerequisites" %}}

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} If you're running an older
release of Kubernetes, refer to the version of the documentation for that release (see
[available documentation versions](/docs/home/supported-doc-versions/)).

To follow this walkthrough, you also need to use a cluster that has a
[Metrics Server](https://github.com/kubernetes-sigs/metrics-server#readme) deployed and configured.
The Kubernetes Metrics Server collects resource metrics from
the {{<glossary_tooltip term_id="kubelet" text="kubelets">}} in your cluster, and exposes those metrics
through the [Kubernetes API](/docs/concepts/overview/kubernetes-api/),
using an [APIService](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) to add
new kinds of resource that represent metric readings.

To learn how to deploy the Metrics Server, see the
[metrics-server documentation](https://github.com/kubernetes-sigs/metrics-server#deployment).

<!-- steps -->

## Run and expose php-apache server

To demonstrate a HorizontalPodAutoscaler, you will first make a custom container image that uses
the `php-apache` image from Docker Hub as its starting point. The `Dockerfile` is ready-made for you,
and has the following content:

```dockerfile
FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php
```

It defines an index.php page which performs some CPU intensive computations:
This code defines a simple `index.php` page that performs some CPU intensive computations,
in order to simulate load in your cluster.

```php
<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>
```

Once you have made that container image, start a Deployment that runs a container using the
image you made, and expose it as a {{< glossary_tooltip term_id="service">}}
using the following manifest:

{{< codenew file="application/php-apache.yaml" >}}

To do so, run the following command:

```shell
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
```

The output is similar to:

```
deployment.apps/php-apache created
service/php-apache created
```

## Create the HorizontalPodAutoscaler {#create-horizontal-pod-autoscaler}

Now that the server is running, create the autoscaler using `kubectl`. The
[`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands#autoscale) subcommand,
part of `kubectl`, helps you do this.
You will shortly run a command that creates a HorizontalPodAutoscaler that maintains
between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that
you created in the first step of these instructions.

Roughly speaking, the HPA {{<glossary_tooltip text="controller" term_id="controller">}} will increase and decrease
the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%.
The Deployment then updates the ReplicaSet - this is part of how all Deployments work in Kubernetes -
and then the ReplicaSet either adds or removes Pods based on the change to its `.spec`.
See [Algorithm details](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details
on the algorithm.
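
As a quick sketch of the core arithmetic behind that rule (not the controller itself, which
also handles tolerances, Pod readiness, and missing metrics), using figures that appear later
in this walkthrough:

```shell
# Sketch of the HPA ratio calculation from "Algorithm details":
#   desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )
# With 1 replica at 305% average CPU against a 50% target:
awk -v current=1 -v usage=305 -v target=50 'BEGIN {
  d = current * usage / target    # 6.1
  if (d > int(d)) d = int(d) + 1  # take the ceiling
  print d                         # prints 7
}'
```

This matches the scale-up to 7 replicas you will observe below.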


Create the HorizontalPodAutoscaler:

```shell
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```

The output is similar to:

```
horizontalpodautoscaler.autoscaling/php-apache autoscaled
```

You can check the current status of the newly-made HorizontalPodAutoscaler, by running:

```shell
# You can use "hpa" or "horizontalpodautoscaler"; either name will work.
kubectl get hpa
```

The output is similar to:
```
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache/scale 0% / 50% 1 10 1 18s
```

(if you see other HorizontalPodAutoscalers with different names, that means they already existed,
and that isn't usually a problem).

Please note that the current CPU consumption is 0% as there are no clients sending requests to the server
(the ``TARGET`` column shows the average across all the Pods controlled by the corresponding deployment).
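
If you want to script against that output, one sketch (assuming the column layout shown
above; for real automation, structured output such as `-o jsonpath` is more robust) is:

```shell
# Extract the current utilization from a captured `kubectl get hpa` data line.
# The TARGET column reads like "0% / 50%": current value first, then the target.
line='php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s'
current=$(printf '%s\n' "$line" | awk '{ gsub(/%/, "", $3); print $3 }')
echo "$current"   # prints 0
```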

## Increase the load {#increase-load}

Next, see how the autoscaler reacts to increased load.
To do this, you'll start a different Pod to act as a client. The container within the client Pod
runs in an infinite loop, sending queries to the php-apache service.

```shell
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```


Now run:
```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```

Within a minute or so, you should see the higher CPU load; for example:

```
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache/scale 305% / 50% 1 10 1 3m
```

and then, more replicas. For example:
```
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache/scale 305% / 50% 1 10 7 3m
```

Here, CPU consumption has increased to 305% of the request.
As a result, the Deployment was resized to 7 replicas:
```shell
kubectl get deployment php-apache
```

You should see the replica count matching the figure from the HorizontalPodAutoscaler:
```
NAME READY UP-TO-DATE AVAILABLE AGE
php-apache 7/7 7 7 19m
```

{{< note >}}
It may take a few minutes to stabilize the number of replicas. Since the amount
of load is not controlled in any way it may happen that the final number of replicas
will differ from this example.
{{< /note >}}

## Stop generating load {#stop-load}

To finish the example, stop sending the load.

In the terminal where you created the Pod that runs a `busybox` image, terminate
the load generation by typing `<Ctrl> + C`.

Then verify the result state (after a minute or so):

```shell
# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch
```

The output is similar to:

```
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache/scale 0% / 50% 1 10 1 11m
```

and the Deployment also shows that it has scaled down:

```shell
kubectl get deployment php-apache
```
The output is similar to:

```
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache 1/1 1 1 27m
```

Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.
{{< note >}}
Autoscaling the replicas may take a few minutes.
{{< /note >}}
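
Part of why scale-down is gradual is the HPA controller's downscale stabilization window
(5 minutes by default). With the `autoscaling/v2` API you can tune this through the
`spec.behavior` field. The values below are examples for experimentation only; check the
HorizontalPodAutoscaler API reference for your cluster version:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      # Wait only 60 seconds (rather than the default 300) before acting
      # on a lower replica recommendation.
      stabilizationWindowSeconds: 60
```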

<!-- discussion -->

```
Conditions:
Events:
```

For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first,
`AbleToScale`, indicates whether or not the HPA is able to fetch and update scales, as well as
whether or not any backoff-related conditions would prevent scaling. The second, `ScalingActive`,
indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and
is able to calculate desired scales. When it is `False`, it generally indicates problems with
fetching metrics. Finally, the last condition, `ScalingLimited`, indicates that the desired scale
was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that
you may wish to raise or lower the minimum or maximum replica count constraints on your
HorizontalPodAutoscaler.
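
If you want to script a health check against these conditions, one sketch is below. It assumes
you have already captured the conditions as JSON (for example via
`kubectl get hpa php-apache -o jsonpath='{.status.conditions}'`); the sample here is simplified,
since real condition objects carry additional fields such as `reason` and `message`:

```shell
# Simplified sample of HPA status conditions (illustrative values only):
conditions='[{"type":"AbleToScale","status":"True"},{"type":"ScalingActive","status":"True"},{"type":"ScalingLimited","status":"False"}]'

# List any condition whose status is not "True".
printf '%s\n' "$conditions" \
  | tr '}' '\n' \
  | grep -o '"type":"[^"]*","status":"[^"]*"' \
  | grep -v '"status":"True"'
```

With the sample above, this prints the single unhealthy-looking entry,
`"type":"ScalingLimited","status":"False"`.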

## Quantities

All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using
a special whole-number notation known in Kubernetes as a
{{< glossary_tooltip term_id="quantity" text="quantity" >}}. For example, `10500m` would be
written as `10.5` in decimal notation. The metrics APIs
will return whole numbers without a suffix when possible, and will generally return
quantities in milli-units otherwise. This means you might see your metric value fluctuate
between `1` and `1500m`, or `1` and `1.5` when written in decimal notation.
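
For illustration, here is a small sketch of the conversion (this helper is hypothetical, not
part of Kubernetes or `kubectl`, and handles only plain integers and the `m` milli-suffix,
which is what these metrics use):

```shell
# Convert a quantity that may use the "m" (milli) suffix to decimal notation.
to_decimal() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%g\n", v / 1000 }' ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

to_decimal 1500m    # prints 1.5
to_decimal 10500m   # prints 10.5
to_decimal 1        # prints 1
```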

## Other possible scenarios

### Creating the autoscaler declaratively

Instead of using the `kubectl autoscale` command to create a HorizontalPodAutoscaler imperatively, you
can use the following manifest to create it declaratively:

{{< codenew file="application/hpa/php-apache.yaml" >}}

Then, create the autoscaler by executing the following command:

```shell
kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
```