
Fine-tuning

Fine-tune a Gemma Instruction Tuned model using a processed Flipkart catalog. The dataset used for fine-tuning was generated by Llama 3.1 on Vertex AI. The fine-tuned model can be deployed with an inference serving engine.

The resulting fine-tuned model is Built with Meta Llama 3.1, using the data prepared by the Llama 3.1 on Vertex AI API.

Prerequisites

  • This guide was developed to be run on the playground AI/ML platform. If you are using a different environment, the scripts and manifests will need to be modified for that environment.
  • A bucket containing the prepared data from the Data Preparation example

NOTE: If you did not execute the data preparation example, follow these instructions to load the dataset into the bucket.

Preparation

  • Clone the repository and change directory to the guide directory

    git clone https://github.com/GoogleCloudPlatform/accelerated-platforms && \
    cd accelerated-platforms/use-cases/model-fine-tuning-pipeline/fine-tuning/pytorch
  • Ensure that your MLP_ENVIRONMENT_FILE is configured

    cat ${MLP_ENVIRONMENT_FILE} && \
    source ${MLP_ENVIRONMENT_FILE}

    You should see the various environment variables populated with values specific to your environment.
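If you want to fail fast rather than discover a missing value mid-run, a small pre-flight check can help. This is a hypothetical helper, not part of the guide's scripts, and the variable names passed to it are just examples drawn from the commands used later in this guide:

```shell
# Hypothetical pre-flight check: print every variable from the list
# that is empty or unset after sourcing MLP_ENVIRONMENT_FILE.
# Uses bash indirect expansion (${!var}), so run it with bash.
check_mlp_vars() {
  local missing=0
  local var
  for var in "$@"; do
    if [ -z "${!var}" ]; then
      echo "Missing: ${var}"
      missing=1
    fi
  done
  return "${missing}"
}

# Usage: check_mlp_vars MLP_PROJECT_ID MLP_CLUSTER_NAME MLP_KUBERNETES_NAMESPACE
```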

Access token variables

  • Set HF_TOKEN to your HuggingFace access token. Go to https://huggingface.co/settings/tokens, click Create new token, provide a token name, select Read as the token type, and click Create token.

    HF_TOKEN=
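Before building anything, you can sanity-check the token. This helper is a hypothetical addition, not part of the guide's scripts; it calls Hugging Face's whoami-v2 endpoint, which returns your account details when the token is valid (requires network access):

```shell
# Hypothetical helper: quick sanity check of HF_TOKEN against the
# Hugging Face whoami-v2 token-introspection endpoint.
check_hf_token() {
  if [ -z "${HF_TOKEN}" ]; then
    echo "HF_TOKEN is not set"
    return 1
  fi
  curl -s --fail -H "Authorization: Bearer ${HF_TOKEN}" \
    https://huggingface.co/api/whoami-v2
}

# Usage: check_hf_token && echo "token looks valid"
```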

Build the container image

  • Build the container image using Cloud Build and push the image to Artifact Registry

    cd src
    sed -i -e "s|^serviceAccount:.*|serviceAccount: projects/${MLP_PROJECT_ID}/serviceAccounts/${MLP_BUILD_GSA}|" cloudbuild.yaml
    gcloud beta builds submit \
    --config cloudbuild.yaml \
    --gcs-source-staging-dir gs://${MLP_CLOUDBUILD_BUCKET}/source \
    --project ${MLP_PROJECT_ID} \
    --substitutions _DESTINATION=${MLP_FINE_TUNING_IMAGE}
    cd ..

Run the job

  • Accept Gemma model terms

    To get access to the Gemma models for this example, you must first sign the license consent agreement. Follow these instructions:

    • Access the model consent page on Kaggle.com
    • Select Request Access
    • Select Verify via Hugging Face and continue
    • Accept the model terms
  • Verify your HF_TOKEN is valid and that you have agreed to the Gemma model terms.

    git clone https://token:${HF_TOKEN}@huggingface.co/google/gemma-2-9b-it /tmp/test

    NOTE: If you get the following message, check that your HF_TOKEN is valid and that you have accepted the model terms.

    remote: Access to model google/gemma-2-9b-it is restricted. You must have access to it and be authenticated to access it. Please log in.
    fatal: Authentication failed for 'https://huggingface.co/google/gemma-2-9b-it/'
    
  • Get credentials for the GKE cluster

    gcloud container fleet memberships get-credentials ${MLP_CLUSTER_NAME} --project ${MLP_PROJECT_ID}
  • Create a Kubernetes secret with your HuggingFace token

    kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=${HF_TOKEN} \
    --dry-run=client -o yaml | kubectl apply -n ${MLP_KUBERNETES_NAMESPACE} -f -
  • Configure the job

    | Variable | Description | Example |
    | --- | --- | --- |
    | ACCELERATOR | Type of GPU accelerator to use (l4, a100, h100) | l4 |
    | DATA_BUCKET_DATASET_PATH | The path to the generated prompt data used for fine-tuning | dataset/output/training |
    | EXPERIMENT | If MLflow is enabled, the experiment ID used in MLflow | experiment- |
    | HF_BASE_MODEL_NAME | The Hugging Face path of the base model to fine-tune | google/gemma-2-9b-it |
    | MLFLOW_ENABLE | Enable MLflow (empty also disables it) | true/false |
    | MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING | If MLflow is enabled, track system-level metrics (CPU/memory/GPU) | true/false |
    | MLFLOW_TRACKING_URI | If MLflow is enabled, the tracking server URI | http://mlflow-tracking-service.ml-team:5000 |
    | MODEL_PATH | The output folder path for the fine-tuned model. This location will be used by the inference serving engine and model evaluation | /model-data/model-gemma2/experiment |
    | TRAIN_BATCH_SIZE | The number of training examples processed in a single training iteration | 1 |

    ACCELERATOR="l4"
    DATA_BUCKET_DATASET_PATH="dataset/output/training"
    EXPERIMENT="finetune-experiment"
    HF_BASE_MODEL_NAME="google/gemma-2-9b-it"
    MLFLOW_ENABLE="true"
    MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING="true"
    MLFLOW_TRACKING_URI="http://mlflow-tracking-svc:5000"
    MODEL_PATH="/model-data/model-gemma2/experiment"
    TRAIN_BATCH_SIZE="1"
    sed \
    -i -e "s|V_DATA_BUCKET|${MLP_DATA_BUCKET}|" \
    -i -e "s|V_EXPERIMENT|${EXPERIMENT}|" \
    -i -e "s|V_MODEL_NAME|${HF_BASE_MODEL_NAME}|" \
    -i -e "s|V_IMAGE_URL|${MLP_FINE_TUNING_IMAGE}|" \
    -i -e "s|V_KSA|${MLP_FINE_TUNING_KSA}|" \
    -i -e "s|V_MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING|${MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING}|" \
    -i -e "s|V_MLFLOW_ENABLE|${MLFLOW_ENABLE}|" \
    -i -e "s|V_MLFLOW_TRACKING_URI|${MLFLOW_TRACKING_URI}|" \
    -i -e "s|V_MODEL_BUCKET|${MLP_MODEL_BUCKET}|" \
    -i -e "s|V_MODEL_PATH|${MODEL_PATH}|" \
    -i -e "s|V_TRAINING_DATASET_PATH|${DATA_BUCKET_DATASET_PATH}|" \
    -i -e "s|V_TRAIN_BATCH_SIZE|${TRAIN_BATCH_SIZE}|" \
    manifests/fine-tune-${ACCELERATOR}-dws.yaml
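The substitution pattern above can be tried on a throwaway file first. The manifest fragment and values below are made up for illustration, not the real fine-tune-*-dws.yaml; note the use of | as the sed s-command delimiter so that values containing slashes (image URLs, model paths) do not need escaping:

```shell
# Made-up manifest fragment with the same V_* placeholder convention.
cat > /tmp/demo-manifest.yaml <<'EOF'
image: V_IMAGE_URL
model: V_MODEL_NAME
EOF

# Example values (hypothetical image URL).
DEMO_IMAGE="us-docker.pkg.dev/demo-project/repo/finetune:latest"
DEMO_MODEL="google/gemma-2-9b-it"

# In-place substitution; | delimiters avoid escaping the slashes.
sed -i \
  -e "s|V_IMAGE_URL|${DEMO_IMAGE}|" \
  -e "s|V_MODEL_NAME|${DEMO_MODEL}|" \
  /tmp/demo-manifest.yaml

cat /tmp/demo-manifest.yaml
```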
  • Create the provisioning request and job

    kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} apply -f manifests/provisioning-request-${ACCELERATOR}.yaml
    kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} apply -f manifests/fine-tune-${ACCELERATOR}-dws.yaml
  • Verify the completion of the job

    In the Google Cloud console, go to the Logs Explorer page and run the following query to verify that the job has completed.

    labels."k8s-pod/app"="finetune-job"
    textPayload: "finetune - INFO - ### Completed ###"
  • After the fine-tuning job completes successfully, the model bucket should contain a checkpoint folder.

    gcloud storage ls gs://${MLP_MODEL_BUCKET}/${MODEL_PATH}

Observability

Besides the logs and metrics provided by Google Cloud Observability, it's also important to track the fine-tuning job and its results.

There are many existing options for this. As an example, we use MLflow Tracking to keep track of the ML workloads. MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later visualizing the results.

When you use the playground configuration, MLflow Tracking is installed for you.

You can run the following command to get the URL:

echo -e "\n${MLP_KUBERNETES_NAMESPACE} MLFlow Tracking URL: ${MLP_MLFLOW_TRACKING_NAMESPACE_ENDPOINT}\n"

Read this playground README section for more info.

Note: You can set the variable MLFLOW_ENABLE to false or leave it empty to disable MLflow Tracking.
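A minimal sketch of that gating behavior, assuming the job treats an unset or empty MLFLOW_ENABLE the same as "false" (this is an illustration, not the job's actual code):

```shell
# Hypothetical gate: only the exact string "true" enables tracking.
mlflow_enabled() {
  [ "${MLFLOW_ENABLE}" = "true" ]
}

MLFLOW_ENABLE=""
if mlflow_enabled; then
  echo "MLflow Tracking is enabled"
else
  echo "MLflow Tracking is disabled"
fi
```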

MLflow Tracking is protected by IAP. After you log in, you should see a page similar to the following.

mlflow-home

All successful experiments should appear. If you click into a completed run, you can see an overview page with metric tabs.

mlflow-model-experiment