This module deploys JetStream MaxText to a cluster. If `prometheus_port` is set, a [PodMonitoring CR](https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#gmp-pod-monitoring) will be deployed to scrape metrics and export them to Google Cloud Monitoring. See the [deployment template](./templates/deployment.yaml.tftpl) for the command line args passed by default. For additional configuration, see the [MaxText base config file](https://github.com/google/maxtext/blob/main/MaxText/configs/base.yml) for a list of configurable command line args and their explanations.

## Installation via bash and kubectl

Ensure the following environment variables are set:
 - MODEL_NAME: The name of your LLM (as of this writing, valid options are "gemma-7b", "llama2-7b", "llama2-13b")
 - PARAMETERS_PATH: Where to find the parameters for your LLM (if using the checkpoint-converter this will be "gs:\/\/$BUCKET_NAME\/final\/unscanned\/gemma_7b-it\/0\/checkpoints\/0\/items", where $BUCKET_NAME is the one used by the checkpoint-converter; the slashes are escaped because the value is substituted into the manifest with sed)
 - (optional) METRICS_PORT: Port to emit custom metrics on
 - (optional) TPU_TOPOLOGY: Topology of the TPU slice used by JetStream (default: "2x4")
 - (optional) TPU_TYPE: Type of TPUs used (default: "tpu-v5-lite-podslice")
 - (optional) TPU_CHIP_COUNT: Number of TPU chips requested; this is the product of the dimensions of TPU_TOPOLOGY
 - (optional) MAXENGINE_SERVER_IMAGE: MaxEngine server container image
 - (optional) JETSTREAM_HTTP_SERVER_IMAGE: JetStream HTTP server container image

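Since TPU_CHIP_COUNT is just the product of the topology dimensions, it can be computed rather than set by hand. A minimal sketch, assuming a topology string of the form "NxM" or "NxMxK":

```shell
# Compute the chip count as the product of the topology dimensions,
# e.g. "2x4" -> 2*4 = 8.
TPU_TOPOLOGY="${TPU_TOPOLOGY:-2x4}"
TPU_CHIP_COUNT=$(( $(echo "$TPU_TOPOLOGY" | tr 'x' '*') ))
echo "$TPU_CHIP_COUNT"
```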
```
if [ -z "$MAXENGINE_SERVER_IMAGE" ]; then
  MAXENGINE_SERVER_IMAGE="us-docker.pkg.dev\/cloud-tpu-images\/inference\/maxengine-server:v0.2.2"
fi

if [ -z "$JETSTREAM_HTTP_SERVER_IMAGE" ]; then
  JETSTREAM_HTTP_SERVER_IMAGE="us-docker.pkg.dev\/cloud-tpu-images\/inference\/jetstream-http:v0.2.2"
fi

if [ -z "$TPU_TOPOLOGY" ]; then
  TPU_TOPOLOGY="2x4"
fi

if [ -z "$TPU_TYPE" ]; then
  TPU_TYPE="tpu-v5-lite-podslice"
fi

if [ -z "$TPU_CHIP_COUNT" ]; then
  TPU_CHIP_COUNT="8"
fi

if [ -z "$MODEL_NAME" ]; then
  echo "Must provide MODEL_NAME in environment" 1>&2
  exit 2
fi

if [ -z "$PARAMETERS_PATH" ]; then
  echo "Must provide PARAMETERS_PATH in environment" 1>&2
  exit 2
fi

JETSTREAM_MANIFEST=$(mktemp)
cat ./templates/deployment.yaml.tftpl > "$JETSTREAM_MANIFEST"

PODMONITORING_MANIFEST=$(mktemp)
cat ./templates/podmonitoring.yaml.tftpl > "$PODMONITORING_MANIFEST"

# Substitute template variables in place so the manifests contain no
# leftover ${...} placeholders when they are applied.
if [ -n "$METRICS_PORT" ]; then
  sed -i "s/\${metrics_port}/$METRICS_PORT/g" "$PODMONITORING_MANIFEST"
  sed -i "s/\${metrics_port_arg}/prometheus_port=$METRICS_PORT/g" "$JETSTREAM_MANIFEST"

  kubectl apply -f "$PODMONITORING_MANIFEST"
else
  sed -i "s/\${metrics_port_arg}//g" "$JETSTREAM_MANIFEST"
fi

sed -i \
  -e "s/\${tpu-type}/$TPU_TYPE/g" \
  -e "s/\${tpu-topology}/$TPU_TOPOLOGY/g" \
  -e "s/\${tpu-chip-count}/$TPU_CHIP_COUNT/g" \
  -e "s/\${maxengine_server_image}/$MAXENGINE_SERVER_IMAGE/g" \
  -e "s/\${jetstream_http_server_image}/$JETSTREAM_HTTP_SERVER_IMAGE/g" \
  -e "s/\${model_name}/$MODEL_NAME/g" \
  -e "s/\${load_parameters_path_arg}/$PARAMETERS_PATH/g" \
  "$JETSTREAM_MANIFEST"

kubectl apply -f "$JETSTREAM_MANIFEST"
```
## (Optional) Autoscaling Components

Applying the following resources to your cluster will enable you to scale the number of JetStream server pods with custom or system metrics:
 - Metrics Adapter (either [Prometheus-adapter](https://github.com/kubernetes-sigs/prometheus-adapter) (recommended) or [CMSA](https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter)): makes metrics from the Google Cloud Monitoring API visible to resources within the cluster.
 - [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/): reads metrics and sets the maxengine-server deployment's replica count accordingly.

### Metrics Adapter

#### Custom Metrics Stackdriver Adapter

Follow the [Custom-metrics-stackdriver-adapter README](https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main/modules/custom-metrics-stackdriver-adapter/README.md) to install without Terraform.

Once installed, the following metrics can be used as an `averageValue` target in a HorizontalPodAutoscaler (HPA):
 - JetStream metrics (i.e. any metric prefixed with "jetstream_")
 - "memory_used" (the current sum of memory usage across all accelerators used by a node, in bytes; note this value can be extremely large since the unit of measurement is bytes)

#### Prometheus Adapter

Follow the [Prometheus-adapter README](https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main/modules/prometheus-adapter/README.md) to install without Terraform. A few notes:

This module uses the prometheus-community/prometheus-adapter Helm chart as part of the install process. The chart has a values file in which "CLUSTER_NAME" must be replaced with your cluster name in order to properly filter metrics; this is a consequence of differing cluster name schemes between GKE and standard Kubernetes clusters. If the cluster name isn't already known, it can be determined as follows. For GKE clusters, remove everything up to and including the last underscore of the current context: `kubectl config current-context | awk -F'_' '{ print $NF }'`. For other clusters, the cluster name is simply `kubectl config current-context`.

Set the PROMETHEUS_HELM_VALUES_FILE environment variable as follows:

```
PROMETHEUS_HELM_VALUES_FILE=$(mktemp)
sed "s/\${cluster_name}/$CLUSTER_NAME/g" ../templates/values.yaml.tftpl > "$PROMETHEUS_HELM_VALUES_FILE"
```

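To illustrate the GKE case, a small sketch (the context string below is a made-up example of GKE's `gke_<project>_<location>_<cluster>` naming):

```shell
# In a live cluster one would pipe `kubectl config current-context`
# into awk; here a hypothetical GKE context string stands in for it.
CONTEXT="gke_my-project_us-central1_my-cluster"
CLUSTER_NAME=$(echo "$CONTEXT" | awk -F'_' '{ print $NF }')
echo "$CLUSTER_NAME"   # -> my-cluster
```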
Once installed, the following metrics can be used as an `averageValue` target in a HorizontalPodAutoscaler (HPA):
 - JetStream metrics (i.e. any metric prefixed with "jetstream_")
 - "memory_used_percentage" (the percentage of total accelerator memory used across all accelerators used by a node)

### Horizontal Pod Autoscalers

The following should be run for each HPA; ensure the following are set before running:
 - ADAPTER: The adapter currently in the cluster, either 'custom-metrics-stackdriver-adapter' or 'prometheus-adapter'
 - MIN_REPLICAS: Lower bound for the number of JetStream replicas
 - MAX_REPLICAS: Upper bound for the number of JetStream replicas
 - METRIC: The metric whose value will be compared against the average value; can be any metric listed above
 - AVERAGE_VALUE: Average value used for calculating the replica count; see the [docs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for details

```
if [ -z "$ADAPTER" ]; then
  echo "Must provide ADAPTER in environment" 1>&2
  exit 2
fi

if [ -z "$MIN_REPLICAS" ]; then
  echo "Must provide MIN_REPLICAS in environment" 1>&2
  exit 2
fi

if [ -z "$MAX_REPLICAS" ]; then
  echo "Must provide MAX_REPLICAS in environment" 1>&2
  exit 2
fi

if [[ "$METRIC" =~ ^jetstream_.* ]]; then
  METRICS_SOURCE_TYPE="Pods"
  METRICS_SOURCE="pods"
elif [ "$METRIC" = "memory_used" ] && [ "$ADAPTER" = "custom-metrics-stackdriver-adapter" ]; then
  METRICS_SOURCE_TYPE="External"
  METRICS_SOURCE="external"
  METRIC="kubernetes.io|node|accelerator|${METRIC}"
elif [ "$METRIC" = "memory_used_percentage" ] && [ "$ADAPTER" = "prometheus-adapter" ]; then
  METRICS_SOURCE_TYPE="External"
  METRICS_SOURCE="external"
else
  echo "Must provide valid METRIC for ${ADAPTER} in environment" 1>&2
  exit 2
fi

if [ -z "$AVERAGE_VALUE" ]; then
  echo "Must provide AVERAGE_VALUE in environment" 1>&2
  exit 2
fi

echo "apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jetstream-hpa-$(uuidgen)
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: maxengine-server
  minReplicas: ${MIN_REPLICAS}
  maxReplicas: ${MAX_REPLICAS}
  metrics:
  - type: ${METRICS_SOURCE_TYPE}
    ${METRICS_SOURCE}:
      metric:
        name: ${METRIC}
      target:
        type: AverageValue
        averageValue: ${AVERAGE_VALUE}
" | kubectl apply -f -
```
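As an illustration, the script above could be run with values like the following; the metric name and numbers are illustrative assumptions, not recommendations:

```shell
# Hypothetical settings for a Prometheus-adapter based HPA; any
# "jetstream_"-prefixed metric exported by the server can be used.
ADAPTER="prometheus-adapter"
MIN_REPLICAS=1
MAX_REPLICAS=4
METRIC="jetstream_prefill_backlog_size"   # assumed metric name
AVERAGE_VALUE="10"
```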