Commit f2883eb

Jetstream Maxtext Module (#719)
* first commit
* terraform fmt
* Update README.md
* prometheus adapter module in main
* remove apply.sh
* typo
* terraform fmt
* large cleanup and validation
* moved fields and made module variables consistent with example variables
* parameterized accelerator selectors
* parameterize metrics scrape interval
* fmt
* fmt
* load parameters parameterization and multiple hpa resources
* fmt
* parameterized model name
* update readme and validators
* changes to jetstream module deployment readme
* terraform fmt
* accelerator_memory_used_percentage -> memory_used_percentage
* changes to READMEs
* tweaks
* metrics port optional
* sample tfvars no longer includes autoscaling config
* example autoscaling config
* Update README.md
* Update README.md
* Update README.md
* strengthen hpa config validation
* More updates to readmes
* tweak to readme
* typo
* missing kubectl apply
* typos
1 parent 0d7231d commit f2883eb

File tree

26 files changed

+705 −810 lines changed

modules/custom-metrics-stackdriver-adapter/README.md

+25-25
````diff
@@ -2,32 +2,9 @@
 
 Adapted from https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
 
-## Usage
+## Installation via bash, gcloud, and kubectl
 
-To use this module, include it from your terraform main:
-
-```
-module "custom_metrics_stackdriver_adapter" {
-  source = "./path/to/custom-metrics-stackdriver-adapter"
-}
-```
-
-For a workload identity enabled cluster, some additional configuration is
-needed:
-
-```
-module "custom_metrics_stackdriver_adapter" {
-  source = "./path/to/custom-metrics-stackdriver-adapter"
-  workload_identity = {
-    enabled    = true
-    project_id = "<PROJECT_ID>"
-  }
-}
-```
-
-## Bash equivalent of this module
-
-Assure the following are set before running:
+Assure the following environment variables are set:
 - PROJECT_ID: Your GKE project ID
 - WORKLOAD_IDENTITY: Is workload identity federation enabled in the target cluster?
 
@@ -63,3 +40,26 @@ kubectl apply -f apiservice_v1beta2.custom.metrics.k8s.io.yaml.tftpl
 kubectl apply -f apiservice_v1beta1.external.metrics.k8s.io.yaml.tftpl
 kubectl apply -f clusterrolebinding_external-metrics-reader.yaml.tftpl
 ```
+
+## Installation via Terraform
+
+To use this as a module, include it from your terraform main:
+
+```
+module "custom_metrics_stackdriver_adapter" {
+  source = "./path/to/custom-metrics-stackdriver-adapter"
+}
+```
+
+For a workload identity enabled cluster, some additional configuration is
+needed:
+
+```
+module "custom_metrics_stackdriver_adapter" {
+  source = "./path/to/custom-metrics-stackdriver-adapter"
+  workload_identity = {
+    enabled    = true
+    project_id = "<PROJECT_ID>"
+  }
+}
+```
````
New file: README for the Jetstream Maxtext module (+171 lines)

This module deploys Jetstream Maxtext to a cluster. If `prometheus_port` is set, a [PodMonitoring CR](https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#gmp-pod-monitoring) will be deployed to scrape metrics and export them to Google Cloud Monitoring. See the [deployment template](./templates/deployment.yaml.tftpl) for the command line args passed by default. For additional configuration, reference the [MaxText base config file](https://github.com/google/maxtext/blob/main/MaxText/configs/base.yml) for a list of configurable command line args and their explanations.

## Installation via bash and kubectl

Ensure the following environment variables are set:
- MODEL_NAME: The name of your LLM (as of the writing of this README, valid options are "gemma-7b", "llama2-7b", "llama2-13b")
- PARAMETERS_PATH: Where to find the parameters for your LLM; slashes must be escaped because the value is used as a sed replacement string (if using the checkpoint-converter it will be "gs:\/\/$BUCKET_NAME\/final\/unscanned\/gemma_7b-it\/0\/checkpoints\/0\/items", where $BUCKET_NAME is the same one used in the checkpoint-converter)
- (optional) METRICS_PORT: Port to emit custom metrics on
- (optional) TPU_TOPOLOGY: Topology of TPU chips used by Jetstream (default: "2x4")
- (optional) TPU_TYPE: Type of TPUs used (default: "tpu-v5-lite-podslice")
- (optional) TPU_CHIP_COUNT: Number of TPU chips requested; the product of the TPU_TOPOLOGY dimensions (default: "8")
- (optional) MAXENGINE_SERVER_IMAGE: Maxengine server container image
- (optional) JETSTREAM_HTTP_SERVER_IMAGE: Jetstream HTTP server container image
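TPU_CHIP_COUNT can be derived from TPU_TOPOLOGY by multiplying its dimensions; a minimal sketch, assuming the `NxM` (or `NxMxK`) topology format shown above:

```shell
# Multiply the dimensions of the topology string, e.g. "2x4" -> 8.
TPU_TOPOLOGY="2x4"
TPU_CHIP_COUNT=$(echo "$TPU_TOPOLOGY" | awk -Fx '{ p = 1; for (i = 1; i <= NF; i++) p *= $i; print p }')
echo "$TPU_CHIP_COUNT"  # prints 8
```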
```
# Default images; slashes are escaped because these values are used as sed
# replacement strings below.
if [ -z "$MAXENGINE_SERVER_IMAGE" ]; then
  MAXENGINE_SERVER_IMAGE="us-docker.pkg.dev\/cloud-tpu-images\/inference\/maxengine-server:v0.2.2"
fi

if [ -z "$JETSTREAM_HTTP_SERVER_IMAGE" ]; then
  JETSTREAM_HTTP_SERVER_IMAGE="us-docker.pkg.dev\/cloud-tpu-images\/inference\/jetstream-http:v0.2.2"
fi

if [ -z "$TPU_TOPOLOGY" ]; then
  TPU_TOPOLOGY="2x4"
fi

if [ -z "$TPU_TYPE" ]; then
  TPU_TYPE="tpu-v5-lite-podslice"
fi

if [ -z "$TPU_CHIP_COUNT" ]; then
  TPU_CHIP_COUNT="8"
fi

if [ -z "$MODEL_NAME" ]; then
  echo "Must provide MODEL_NAME in environment" 1>&2
  exit 2
fi

if [ -z "$PARAMETERS_PATH" ]; then
  echo "Must provide PARAMETERS_PATH in environment" 1>&2
  exit 2
fi

# Work on temporary copies of the templates.
JETSTREAM_MANIFEST=$(mktemp)
cp ./templates/deployment.yaml.tftpl "$JETSTREAM_MANIFEST"

PODMONITORING_MANIFEST=$(mktemp)
cp ./templates/podmonitoring.yaml.tftpl "$PODMONITORING_MANIFEST"

# Substitute in place (GNU sed). Appending sed output back onto the file it
# reads from would duplicate the manifest.
if [ -n "$METRICS_PORT" ]; then
  sed -i "s/\${metrics_port}/$METRICS_PORT/g" "$PODMONITORING_MANIFEST"
  sed -i "s/\${metrics_port_arg}/prometheus_port=$METRICS_PORT/g" "$JETSTREAM_MANIFEST"

  kubectl apply -f "$PODMONITORING_MANIFEST"
else
  sed -i "s/\${metrics_port_arg}//g" "$JETSTREAM_MANIFEST"
fi

sed -i \
  -e "s/\${tpu-type}/$TPU_TYPE/g" \
  -e "s/\${tpu-topology}/$TPU_TOPOLOGY/g" \
  -e "s/\${tpu-chip-count}/$TPU_CHIP_COUNT/g" \
  -e "s/\${maxengine_server_image}/$MAXENGINE_SERVER_IMAGE/g" \
  -e "s/\${jetstream_http_server_image}/$JETSTREAM_HTTP_SERVER_IMAGE/g" \
  -e "s/\${model_name}/$MODEL_NAME/g" \
  -e "s/\${load_parameters_path_arg}/$PARAMETERS_PATH/g" \
  "$JETSTREAM_MANIFEST"

kubectl apply -f "$JETSTREAM_MANIFEST"
```
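As an aside, the sed substitutions above simply fill the `${...}` placeholders in the `.tftpl` templates; a stand-alone sketch of the mechanism, using a hypothetical one-line template in place of the real deployment file:

```shell
# Hypothetical one-line template standing in for deployment.yaml.tftpl.
TEMPLATE='args: ["model_name=${model_name}", "${metrics_port_arg}"]'
echo "$TEMPLATE" \
  | sed "s/\${model_name}/gemma-7b/g" \
  | sed "s/\${metrics_port_arg}/prometheus_port=9090/g"
# prints: args: ["model_name=gemma-7b", "prometheus_port=9090"]
```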
## (Optional) Autoscaling Components

Applying the following resources to your cluster will enable you to scale the number of Jetstream server pods with custom or system metrics:
- Metrics Adapter (either [Prometheus-adapter](https://github.com/kubernetes-sigs/prometheus-adapter) (recommended) or [CMSA](https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter)): makes metrics from the Google Cloud Monitoring API visible to resources within the cluster.
- [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/): reads metrics and sets the maxengine-server deployment's replica count accordingly.

### Metrics Adapter

#### Custom Metrics Stackdriver Adapter

Follow the [Custom-metrics-stackdriver-adapter README](https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main/modules/custom-metrics-stackdriver-adapter/README.md) to install without Terraform.

Once installed, the values of the following metrics can be used as averageValues in a HorizontalPodAutoscaler (HPA):
- Jetstream metrics (i.e. any metric prefixed with "jetstream_")
- "memory_used" (the current sum of memory usage across all accelerators used by a node, in bytes; note this value can be extremely large since the unit of measurement is bytes)

#### Prometheus Adapter

Follow the [Prometheus-adapter README](https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main/modules/prometheus-adapter/README.md) to install without Terraform. A few notes:

This module installs the prometheus-community/prometheus-adapter Helm chart, whose values file requires "CLUSTER_NAME" to be replaced with your cluster name in order to properly filter metrics. This is a consequence of differing cluster name schemes between GKE and standard Kubernetes clusters. If the cluster name isn't already known, it can be recovered as follows. For GKE clusters, strip everything up to and including the last underscore from the current context: `kubectl config current-context | awk -F'_' '{ print $NF }'`. For other clusters, the cluster name is simply `kubectl config current-context`.

Set the PROMETHEUS_HELM_VALUES_FILE environment variable as follows:

```
PROMETHEUS_HELM_VALUES_FILE=$(mktemp)
sed "s/\${cluster_name}/$CLUSTER_NAME/g" ../templates/values.yaml.tftpl > "$PROMETHEUS_HELM_VALUES_FILE"
```

Once installed, the values of the following metrics can be used as averageValues in a HorizontalPodAutoscaler (HPA):
- Jetstream metrics (i.e. any metric prefixed with "jetstream_")
- "memory_used_percentage" (the percentage of total accelerator memory used across all accelerators used by a node)

### Horizontal Pod Autoscalers

Run the following for each HPA; ensure the following are set before running:
- ADAPTER: The adapter currently installed in the cluster, either 'custom-metrics-stackdriver-adapter' or 'prometheus-adapter'
- MIN_REPLICAS: Lower bound for the number of Jetstream replicas
- MAX_REPLICAS: Upper bound for the number of Jetstream replicas
- METRIC: The metric whose value will be compared against the average value; can be any metric listed above
- AVERAGE_VALUE: Average value used for calculating the replica count; see the [HPA algorithm docs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details
```
if [ -z "$ADAPTER" ]; then
  echo "Must provide ADAPTER in environment" 1>&2
  exit 2
fi

if [ -z "$MIN_REPLICAS" ]; then
  echo "Must provide MIN_REPLICAS in environment" 1>&2
  exit 2
fi

if [ -z "$MAX_REPLICAS" ]; then
  echo "Must provide MAX_REPLICAS in environment" 1>&2
  exit 2
fi

# Jetstream metrics are read from the pods themselves; accelerator memory
# metrics are read as external metrics from Google Cloud Monitoring.
if [[ $METRIC =~ ^jetstream_.* ]]; then
  METRICS_SOURCE_TYPE="Pods"
  METRICS_SOURCE="pods"
elif [ "$METRIC" == "memory_used" ] && [ "$ADAPTER" == "custom-metrics-stackdriver-adapter" ]; then
  METRICS_SOURCE_TYPE="External"
  METRICS_SOURCE="external"
  METRIC="kubernetes.io|node|accelerator|${METRIC}"
elif [ "$METRIC" == "memory_used_percentage" ] && [ "$ADAPTER" == "prometheus-adapter" ]; then
  METRICS_SOURCE_TYPE="External"
  METRICS_SOURCE="external"
else
  echo "Must provide valid METRIC for ${ADAPTER} in environment" 1>&2
  exit 2
fi

if [ -z "$AVERAGE_VALUE" ]; then
  echo "Must provide AVERAGE_VALUE in environment" 1>&2
  exit 2
fi

echo "apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jetstream-hpa-$(uuidgen)
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: maxengine-server
  minReplicas: ${MIN_REPLICAS}
  maxReplicas: ${MAX_REPLICAS}
  metrics:
  - type: ${METRICS_SOURCE_TYPE}
    ${METRICS_SOURCE}:
      metric:
        name: ${METRIC}
      target:
        type: AverageValue
        averageValue: ${AVERAGE_VALUE}
" | kubectl apply -f -
```
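For illustration, a run with ADAPTER=prometheus-adapter, MIN_REPLICAS=1, MAX_REPLICAS=5, METRIC=jetstream_prefill_backlog_size (a hypothetical "jetstream_"-prefixed metric name, standing in for whichever Jetstream metric you scrape) and AVERAGE_VALUE=10 would apply an HPA equivalent to:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jetstream-hpa-<uuid>   # suffix generated by uuidgen
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: maxengine-server
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: jetstream_prefill_backlog_size
      target:
        type: AverageValue
        averageValue: 10
```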
New file: the module's Terraform entrypoint (+113 lines)

```hcl
/**
 * Copyright 2024 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

locals {
  deployment_template               = "${path.module}/templates/deployment.yaml.tftpl"
  service_template                  = "${path.module}/templates/service.yaml.tftpl"
  podmonitoring_template            = "${path.module}/templates/podmonitoring.yaml.tftpl"
  cmsa_jetstream_hpa_template       = "${path.module}/templates/custom-metrics-stackdriver-adapter/hpa.jetstream.yaml.tftpl"
  prometheus_jetstream_hpa_template = "${path.module}/templates/prometheus-adapter/hpa.jetstream.yaml.tftpl"
}

resource "kubernetes_manifest" "jetstream-deployment" {
  count = 1
  manifest = yamldecode(templatefile(local.deployment_template, {
    maxengine_server_image      = var.maxengine_deployment_settings.maxengine_server_image
    jetstream_http_server_image = var.maxengine_deployment_settings.jetstream_http_server_image
    model_name                  = var.maxengine_deployment_settings.model_name
    load_parameters_path_arg    = var.maxengine_deployment_settings.parameters_path
    metrics_port_arg            = var.maxengine_deployment_settings.metrics_port != null ? format("prometheus_port=%d", var.maxengine_deployment_settings.metrics_port) : "",
    tpu-topology                = var.maxengine_deployment_settings.accelerator_selectors.topology
    tpu-type                    = var.maxengine_deployment_settings.accelerator_selectors.accelerator
    tpu-chip-count              = var.maxengine_deployment_settings.accelerator_selectors.chip_count
  }))
}

resource "kubernetes_manifest" "jetstream-service" {
  count    = 1
  manifest = yamldecode(file(local.service_template))
}

resource "kubernetes_manifest" "jetstream-podmonitoring" {
  count = var.maxengine_deployment_settings.metrics_port != null ? 1 : 0
  manifest = yamldecode(templatefile(local.podmonitoring_template, {
    metrics_port            = var.maxengine_deployment_settings.metrics_port != null ? var.maxengine_deployment_settings.metrics_port : "",
    metrics_scrape_interval = var.maxengine_deployment_settings.metrics_scrape_interval
  }))
}

module "custom_metrics_stackdriver_adapter" {
  count  = var.hpa_config.metrics_adapter == "custom-metrics-stackdriver-adapter" ? 1 : 0
  source = "../custom-metrics-stackdriver-adapter"
  workload_identity = {
    enabled    = true
    project_id = var.project_id
  }
}

module "prometheus_adapter" {
  count  = var.hpa_config.metrics_adapter == "prometheus-adapter" ? 1 : 0
  source = "../prometheus-adapter"
  credentials_config = {
    kubeconfig = {
      path : "~/.kube/config"
    }
  }
  project_id = var.project_id
  config_file = templatefile("${path.module}/templates/prometheus-adapter/values.yaml.tftpl", {
    cluster_name = var.cluster_name
  })
}

resource "kubernetes_manifest" "prometheus_adapter_hpa_custom_metric" {
  for_each = {
    for index, rule in var.hpa_config.rules :
    index => {
      index                = index
      target_query         = rule.target_query
      average_value_target = rule.average_value_target
    }
    if var.maxengine_deployment_settings.custom_metrics_enabled && var.hpa_config.metrics_adapter == "prometheus-adapter"
  }

  manifest = yamldecode(templatefile(local.prometheus_jetstream_hpa_template, {
    index                   = each.value.index
    hpa_type                = try(each.value.target_query, "")
    hpa_averagevalue_target = try(each.value.average_value_target, 1)
    hpa_min_replicas        = var.hpa_config.min_replicas
    hpa_max_replicas        = var.hpa_config.max_replicas
  }))
}

resource "kubernetes_manifest" "cmsa_hpa_custom_metric" {
  for_each = {
    for index, rule in var.hpa_config.rules :
    index => {
      index                = index
      target_query         = rule.target_query
      average_value_target = rule.average_value_target
    }
    if var.maxengine_deployment_settings.custom_metrics_enabled && var.hpa_config.metrics_adapter == "custom-metrics-stackdriver-adapter"
  }

  manifest = yamldecode(templatefile(local.cmsa_jetstream_hpa_template, {
    index                   = each.value.index
    hpa_type                = try(each.value.target_query, "")
    hpa_averagevalue_target = try(each.value.average_value_target, 1)
    hpa_min_replicas        = var.hpa_config.min_replicas
    hpa_max_replicas        = var.hpa_config.max_replicas
  }))
}
```
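A caller might wire this module up roughly as follows. This is a sketch: the field names are taken from the references in the file above, but the exact variable schemas live in the module's variables.tf, which is not shown in this commit view, and the module path and metric name are hypothetical.

```hcl
module "jetstream_maxtext" {
  source       = "./path/to/jetstream-maxtext" # hypothetical path
  project_id   = "<PROJECT_ID>"
  cluster_name = "<CLUSTER_NAME>"

  maxengine_deployment_settings = {
    model_name                  = "gemma-7b"
    parameters_path             = "gs://<BUCKET_NAME>/final/unscanned/gemma_7b-it/0/checkpoints/0/items"
    maxengine_server_image      = "us-docker.pkg.dev/cloud-tpu-images/inference/maxengine-server:v0.2.2"
    jetstream_http_server_image = "us-docker.pkg.dev/cloud-tpu-images/inference/jetstream-http:v0.2.2"
    metrics_port                = 9090
    metrics_scrape_interval     = "10s" # assumed format
    custom_metrics_enabled      = true
    accelerator_selectors = {
      accelerator = "tpu-v5-lite-podslice"
      topology    = "2x4"
      chip_count  = 8
    }
  }

  hpa_config = {
    metrics_adapter = "prometheus-adapter"
    min_replicas    = 1
    max_replicas    = 5
    rules = [{
      target_query         = "jetstream_prefill_backlog_size" # hypothetical metric
      average_value_target = 10
    }]
  }
}
```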
An HPA template (judging by the `kubernetes.io|node|accelerator` metric prefix, the custom-metrics-stackdriver-adapter variant of `hpa.jetstream.yaml.tftpl`) is updated:

```diff
@@ -1,8 +1,8 @@
 apiVersion: autoscaling/v2
 kind: HorizontalPodAutoscaler
 metadata:
-  name: jetstream-hpa
-  namespace: ${namespace}
+  name: jetstream-hpa-${index}
+  namespace: default
 spec:
   scaleTargetRef:
     apiVersion: apps/v1
@@ -20,12 +20,11 @@ spec:
         type: AverageValue
         averageValue: ${hpa_averagevalue_target}
 %{ else }
-  - type: Pods
-    pods:
+  - type: External
+    external:
       metric:
-        name: kubernetes.io|node|accelerator|memory_used
+        name: kubernetes.io|node|accelerator|${hpa_type}
       target:
         type: AverageValue
         averageValue: ${hpa_averagevalue_target}
-%{ endif }
-
+%{ endif }
```
