Commit 48850b2

Merge pull request #2 from volatilemolotov/README
Minor README fixes
2 parents 3328e8e + ed1e48d commit 48850b2

1 file changed: +12 -12 lines changed

tutorials-and-examples/skypilot/dws-and-kueue/README.md

@@ -82,7 +82,7 @@ Server Version: v1.30.6-gke.1596000
 ```
 If not you can change the version in terraform with the `kubectl_version` variable
 ## Install and configure Kueue
-1. Install Kueue from the official manifest. Note that --server-side switch . Without it the client cannot render the CRDs because of annotation size limitations.
+1. Install Kueue from the official manifest. Note that `--server-side` switch . Without it the client cannot render the CRDs because of annotation size limitations.
 ```bash
 VERSION=v0.7.0
 kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
@@ -182,7 +182,7 @@ Note: The following clouds were disabled because they were not included in allow
 ✔ Kubernetes
 ```
 ## Configure and Run SkyPilot Job
-For SkyPilot to create pods with the necessary pod config we need to add the following config to train_dws.yaml.
+For SkyPilot to create pods with the necessary pod config we need to add the following config to `train_dws.yaml`.
 ```yaml
 experimental:
 config_overrides:
@@ -268,10 +268,10 @@ This section details how to fine-tune Gemma 2B for SQL generation on GKE Autopil
 - A GKE cluster configured with SkyPilot
 - HuggingFace account with access to Gemma model
 
-###Fine-tuning Implementation
+### Fine-tuning Implementation
 The [finetune.py](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/skypilot_dws_kueue/tutorials-and-examples/skypilot/dws-and-kueue/finetune.yaml) script uses QLoRA with 4-bit quantization to fine-tune Gemma 2B on SQL generation tasks.
 
-###Configure GCS Storage Access
+### Configure GCS Storage Access
 The infrastructure Terraform configuration in [main.tf](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/skypilot_dws_kueue/tutorials-and-examples/skypilot/dws-and-kueue/main.tf) includes Workload Identity and GCS bucket setup:
 ```
 module "skypilot-workload-identity" {
@@ -289,7 +289,7 @@ module "skypilot-workload-identity" {
 }
 
 ```
-1. 1. Get project and service account details
+1. Get project and service account details
 ```
 terraform output project_id
 terraform output service_account
@@ -311,10 +311,10 @@ kubectl annotate serviceaccount skypilot-service-account --namespace default iam
 ```
 terraform output model_bucket_name
 ```
-5. Update gcsfuse configuration in finetune.yaml and sever.yaml
+5. Update gcsfuse configuration in `finetune.yaml` and `sever.yaml`
 Replace the [BUCKET_NAME](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/skypilot_dws_kueue/tutorials-and-examples/skypilot/dws-and-kueue/finetune.yaml#L27)
 
-###Fine-tune the Model
+### Fine-tune the Model
 1. Set up HuggingFace access:
 Finetune script needs a HuggingFace token and to sign the licence consent agreement. Follow instructions on the following link: Get access to the [model](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-vllm#model-access)
 ```
@@ -336,8 +336,8 @@ Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00
 ✓ Job finished (status: SUCCEEDED).
 ```
 
-###Serve the Model
-Next, run the finetuned model with the serve.yaml and serve cli
+### Serve the Model
+Next, run the finetuned model with the `serve.yaml` and serve cli
 ```
 sky serve up serve.yaml
 ```
@@ -408,7 +408,7 @@ terraform destroy -var-file=your_environment.tfvar
 ```
 ## Troubleshooting
 
-1.If Kueue install gives the error:
+1. If Kueue install gives the error:
 ```
 the CustomResourceDefinition "workloads.kueue.x-k8s.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
 ```
@@ -435,7 +435,7 @@ Hint: sky show-gpus to list available accelerators.
 ```
 Make sure you added `autoscaling: gke` to the sky config in step [Install SkyPilot](#install-skypilot)
 
-5. Permission denied when trying to write to the mounted gcsfuse volume.
+4. Permission denied when trying to write to the mounted gcsfuse volume.
 
 Make sure you added `uid=1000,gid=1000` to the `mountOptions:` YAML inside of the task yaml file. SkyPilot by default uses 1000 gid and uid
 ```
@@ -446,4 +446,4 @@ volumes:
 volumeAttributes:
 bucketName: MODEL_BUCKET_NAME
 mountOptions: "implicit-dirs,uid=1000,gid=1000"
-```
+```
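
Reviewer's sketch, not part of the commit: the renumbered troubleshooting entry says `mountOptions` must contain `uid=1000,gid=1000`. A check along the following lines could flag a missing option before a job is launched. The function names `has_mount_option` and `check_gcsfuse_options` are hypothetical, and the options string is passed in by hand rather than parsed from the task YAML.

```shell
#!/bin/sh
# Hypothetical helpers (not from the README or SkyPilot).

# has_mount_option OPTIONS NAME
# Succeeds if NAME is exactly one of the comma-separated OPTIONS.
has_mount_option() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_gcsfuse_options OPTIONS
# Prints "ok" if both uid=1000 and gid=1000 are present,
# otherwise prints the missing options and returns 1.
check_gcsfuse_options() {
  missing=""
  for opt in uid=1000 gid=1000; do
    has_mount_option "$1" "$opt" || missing="$missing $opt"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

# Value taken from the last hunk of the diff.
check_gcsfuse_options "implicit-dirs,uid=1000,gid=1000"
```

Wrapping the string in commas before matching keeps `uid=10000` from being mistaken for `uid=1000`.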
