Now you can use it in a [Kubernetes job](../jetstream/maxtext/single-host-inference).

## Jetstream + MaxText
```
--bucket_name: [string] The GCS bucket name to store checkpoints, without gs://.
--inference_server: [string] The name of the inference server that serves your model. (Optional) (default=jetstream-maxtext)
--model_path: [string] The model path.
--model_name: [string] The model name, e.g. llama-2, llama-3, gemma.
--huggingface: [bool] The model is from Hugging Face. (Optional) (default=False)
--quantize_type: [string] The type of quantization. (Optional)
--quantize_weights: [bool] The checkpoint is to be quantized. (Optional) (default=False)
--input_directory: [string] The input directory, likely a GCS bucket path.
--output_directory: [string] The output directory, likely a GCS bucket path.
--meta_url: [string] The URL from Meta. (Optional)
--version: [string] The version of the repository. (Optional) (default=main)
```
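
For orientation, here is a hypothetical invocation showing how these flags fit together. This is a sketch only: the entrypoint name `checkpoint_converter.sh` and every value below are placeholders, not taken from this README; the actual entrypoint and arguments come from the Kubernetes job manifest linked above.

```
# Sketch only: entrypoint name and all values are illustrative placeholders.
./checkpoint_converter.sh \
  --bucket_name=my-ckpt-bucket \
  --inference_server=jetstream-maxtext \
  --model_name=llama-2 \
  --model_path=meta-llama/Llama-2-7b \
  --input_directory=gs://my-ckpt-bucket/input \
  --output_directory=gs://my-ckpt-bucket/maxtext/llama-2-7b
```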

## Jetstream + PyTorch/XLA

```
--inference_server: [string] The name of the inference server that serves your model.
--model_path: [string] The model path.
--model_name: [string] The model name, e.g. llama-2, llama-3, gemma.
--quantize_weights: [bool] The checkpoint is to be quantized. (Optional) (default=False)
--quantize_type: [string] The type of quantization. Available quantize types: {"int8", "int4"} x {"per_channel", "blockwise"}. (Optional) (default=int8_per_channel)
--version: [string] The version of the repository to override, e.g. jetstream-v0.2.2, jetstream-v0.2.3. (Optional) (default=main)
--input_directory: [string] The input directory, likely a GCS bucket path. (Optional)
--output_directory: [string] The output directory, likely a GCS bucket path.
--huggingface: [bool] The model is from Hugging Face. (Optional) (default=False)
```
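
Similarly, a hypothetical quantized conversion for this server. The entrypoint name, the `jetstream-pytorch` server name, and all values below are assumptions for illustration, not values taken from this README.

```
# Sketch only: entrypoint name and all values are illustrative placeholders.
./checkpoint_converter.sh \
  --inference_server=jetstream-pytorch \
  --model_name=llama-3 \
  --model_path=meta-llama/Meta-Llama-3-8B \
  --quantize_weights=True \
  --quantize_type=int8_blockwise \
  --output_directory=gs://my-ckpt-bucket/pytorch/llama-3-8b
```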

## Llama3.1-405B Checkpoint Conversion

The Llama3.1-405B model needs ~2000 GB of client memory to download and convert the checkpoint. The machine type and boot disk size specified below provide enough capacity for the model download and conversion. Checkpoint conversion for the 405B model supports only weights downloaded from Meta.

## Agree to the Meta Terms and Conditions
Go to https://www.llama.com/llama-downloads/ to acknowledge the terms and conditions. Select `Llama3.1: 405B & 8B` as the model(s) you will request.
Copy the provided Meta URL to use in your manifest file.
## Create a GCS bucket to store the checkpoint

```
BUCKET_NAME=<your bucket>

gcloud storage buckets create gs://$BUCKET_NAME
```
## Configure a service account for Storage Object access
Configure a Kubernetes service account to act as an IAM service account.
Create an IAM service account for your application:

```
gcloud iam service-accounts create jetstream-pathways
```
Add an IAM policy binding for your IAM service account to manage Cloud Storage:
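
A minimal sketch of such a binding, assuming the bucket created earlier, the `jetstream-pathways` service account, and the `roles/storage.objectUser` role; your setup may call for a different role or member format.

```
# Sketch: grant the IAM service account object access on the bucket.
# PROJECT is your project ID; the role choice here is an assumption.
gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME \
  --member "serviceAccount:jetstream-pathways@${PROJECT}.iam.gserviceaccount.com" \
  --role roles/storage.objectUser
```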

## Create a node pool

Create a node pool with machine type `m1-ultramem-160`. You may need to request `m1` quota in your project.

```
CLUSTER=<your cluster>
ZONE=<your zone>
PROJECT=<project>

gcloud container node-pools create m1-pool \
  --cluster "${CLUSTER}" \
  --zone "${ZONE}" \
  --machine-type m1-ultramem-160 \
  --num-nodes 1 \
  --disk-size 3000 \
  --project "${PROJECT}" \
  --scopes cloud-platform
```
In `checkpoint-converter.yaml`, replace `BUCKET_NAME` with the name of the GCS bucket that you created earlier, without gs://.
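
As one convenient way to do this substitution from the shell (optional; assumes GNU sed and the `BUCKET_NAME` variable set earlier):

```
sed -i "s|BUCKET_NAME|${BUCKET_NAME}|g" checkpoint-converter.yaml
```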

Parameter descriptions:

```
--bucket_name: [string] The GCS bucket name to store checkpoints, without gs://.
--inference_server: [string] The name of the inference server that serves your model. (e.g. jetstream-maxtext)
--meta_url: [string] The URL from Meta.
--model_name: [string] The model name. (e.g. llama-2, llama-3, llama-3.1)
--model_path: [string] The model path. For Llama models, install the CLI via `pip install llama-stack` and run `llama model list --show-all` to find the Model Descriptor to use. (e.g. Llama3.1-405B-Instruct:bf16-mp16)
--output_directory: [string] The output directory. (e.g. gs://bucket_name/maxtext/llama3.1-405b)
--quantize_type: [string] The type of quantization. (e.g. int8)
--quantize_weights: [bool] The checkpoint is to be quantized. (e.g. True)
```
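
To find the Model Descriptor for `--model_path`, use the `llama` CLI as noted above:

```
pip install llama-stack
llama model list --show-all
```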
Apply the manifest:

```
kubectl apply -f checkpoint-converter.yaml
```
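
You can also stream the converter's logs directly with kubectl. This assumes the Job is named `checkpoint-converter`, which matches the pod name prefix used in the log query below.

```
kubectl logs -f -l job-name=checkpoint-converter
```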

The checkpoint conversion job takes around 9-10 hours to complete. To monitor the progress of the checkpoint download and conversion, open [GCP Log Explorer](https://console.cloud.google.com/logs/query) and enter the following query:

```
resource.type="k8s_container"
resource.labels.project_id="PROJECT_ID"
resource.labels.location="LOCATION"
resource.labels.cluster_name="CLUSTER_NAME"
resource.labels.namespace_name="default"
resource.labels.pod_name:"checkpoint-converter-"
```
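
If you prefer the CLI, roughly the same query can be run with `gcloud logging read` (replace the placeholders first):

```
gcloud logging read 'resource.type="k8s_container" AND resource.labels.cluster_name="CLUSTER_NAME" AND resource.labels.pod_name:"checkpoint-converter-"' \
  --project PROJECT_ID \
  --limit 50
```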
Once completed, you will see a log similar to:

```
# bf16 checkpoint
Completed unscanning checkpoint to gs://output_directory/bf16/unscanned/checkpoints/0/items

# int8 checkpoint
Completed quantizing model llama3.1-405b with int8 to gs://output_directory/int8
```