Fine-tune a Gemma Instruction Tuned model using a Flipkart processed catalog. The dataset used for fine-tuning was generated by Llama 3.1 on Vertex AI. The fine-tuned model can be deployed with an inference serving engine.

The resulting fine-tuned model is Built with Meta Llama 3.1, using the data prepared by the Llama 3.1 on Vertex AI API.
- This guide was developed to be run on the playground AI/ML platform. If you are using a different environment, the scripts and manifests will need to be modified for that environment.
- A bucket containing the prepared data from the Data Preparation example
NOTE: If you did not execute the data preparation example, follow these instructions to load the dataset into the bucket.
- Clone the repository and change directory to the guide directory

  ```sh
  git clone https://github.com/GoogleCloudPlatform/accelerated-platforms && \
  cd accelerated-platforms/use-cases/model-fine-tuning-pipeline/fine-tuning/pytorch
  ```
- Ensure that your `MLP_ENVIRONMENT_FILE` is configured

  ```sh
  cat ${MLP_ENVIRONMENT_FILE} && \
  source ${MLP_ENVIRONMENT_FILE}
  ```

  You should see the various variables populated with the information specific to your environment.
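  As a quick sanity check, you can also confirm that a few of the variables this guide relies on are set; this check is an addition to the original steps:

  ```sh
  # Each of these should print a non-empty value; ${VAR:?} aborts with an error if VAR is unset.
  echo "${MLP_PROJECT_ID:?}" "${MLP_CLUSTER_NAME:?}" "${MLP_KUBERNETES_NAMESPACE:?}"
  ```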
- Set `HF_TOKEN` to your HuggingFace access token. Go to https://huggingface.co/settings/tokens, click `Create new token`, provide a token name, select `Read` as the token type, and click `Create token`.

  ```sh
  HF_TOKEN=
  ```
- Build the container image using Cloud Build and push the image to Artifact Registry

  ```sh
  cd src
  sed -i -e "s|^serviceAccount:.*|serviceAccount: projects/${MLP_PROJECT_ID}/serviceAccounts/${MLP_BUILD_GSA}|" cloudbuild.yaml
  gcloud beta builds submit \
  --config cloudbuild.yaml \
  --gcs-source-staging-dir gs://${MLP_CLOUDBUILD_BUCKET}/source \
  --project ${MLP_PROJECT_ID} \
  --substitutions _DESTINATION=${MLP_FINE_TUNING_IMAGE} \
  cd ..
  ```
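  Optionally, you can confirm the image landed in Artifact Registry; this is an added sanity check, and it assumes `${MLP_FINE_TUNING_IMAGE}` includes a tag:

  ```sh
  # Describe the image Cloud Build pushed to Artifact Registry.
  gcloud artifacts docker images describe ${MLP_FINE_TUNING_IMAGE} --project ${MLP_PROJECT_ID}
  ```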
- Accept the Gemma model terms

  To get access to the Gemma models for this example, you must first sign the license consent agreement. Follow these instructions:

  - Access the model consent page on Kaggle.com
  - Select `Request Access`
  - Select `Verify via Hugging Face` and continue
  - Accept the model terms
- Verify your `HF_TOKEN` is valid and that you have agreed to the Gemma model terms

  ```sh
  git clone https://token:${HF_TOKEN}@huggingface.co/google/gemma-2-9b-it /tmp/test
  ```

  NOTE: If you get the following message, check your HF token and agreement.

  ```
  remote: Access to model google/gemma-2-9b-it is restricted. You must have access to it and be authenticated to access it. Please log in.
  fatal: Authentication failed for 'https://huggingface.co/google/gemma-2-9b-it/'
  ```
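  The clone above is only a permissions check; assuming you don't need the files, you can remove the test checkout:

  ```sh
  # Clean up the throwaway clone created by the verification step.
  rm -rf /tmp/test
  ```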
- Get credentials for the GKE cluster

  ```sh
  gcloud container fleet memberships get-credentials ${MLP_CLUSTER_NAME} --project ${MLP_PROJECT_ID}
  ```
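  To confirm the credentials work, any read-only call against the cluster should succeed; this check assumes kubectl and the GKE auth plugin are installed:

  ```sh
  # Listing namespaces is a cheap connectivity and authentication check.
  kubectl get namespaces
  ```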
- Create a Kubernetes secret with your HuggingFace token

  ```sh
  kubectl create secret generic hf-secret \
  --from-literal=hf_api_token=${HF_TOKEN} \
  --dry-run=client -o yaml | kubectl apply -n ${MLP_KUBERNETES_NAMESPACE} -f -
  ```
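  You can verify the secret exists without printing the token value; this is an added sanity check:

  ```sh
  # Shows the secret's metadata and key names, but not the decoded data.
  kubectl describe secret hf-secret -n ${MLP_KUBERNETES_NAMESPACE}
  ```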
- Configure the job

  | Variable | Description | Example |
  | --- | --- | --- |
  | ACCELERATOR | Type of GPU accelerator to use (l4, a100, h100) | l4 |
  | DATA_BUCKET_DATASET_PATH | The path where the generated prompt data is for fine-tuning. | dataset/output/training |
  | EXPERIMENT | If MLflow is enabled, the experiment ID used in MLflow | experiment- |
  | HF_BASE_MODEL_NAME | The Hugging Face path to the base model for fine-tuning. | google/gemma-2-9b-it |
  | MLFLOW_ENABLE | Enable MLflow; empty will also disable | true/false |
  | MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING | If MLflow is enabled, track system level metrics (CPU/Memory/GPU) | true/false |
  | MLFLOW_TRACKING_URI | If MLflow is enabled, the tracking server URI | http://mlflow-tracking-service.ml-team:5000 |
  | MODEL_PATH | The output folder path for the fine-tuned model. This location will be used by the inference serving engine and model evaluation. | /model-data/model-gemma2/experiment |
  | TRAIN_BATCH_SIZE | The number of training examples processed in a single iteration of the model's training process | 1 |

  ```sh
  ACCELERATOR="l4"
  DATA_BUCKET_DATASET_PATH="dataset/output/training"
  EXPERIMENT="finetune-experiment"
  HF_BASE_MODEL_NAME="google/gemma-2-9b-it"
  MLFLOW_ENABLE="true"
  MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING="true"
  MLFLOW_TRACKING_URI="http://mlflow-tracking-svc:5000"
  MODEL_PATH="/model-data/model-gemma2/experiment"
  TRAIN_BATCH_SIZE="1"
  ```
  ```sh
  sed \
  -i -e "s|V_DATA_BUCKET|${MLP_DATA_BUCKET}|" \
  -i -e "s|V_EXPERIMENT|${EXPERIMENT}|" \
  -i -e "s|V_MODEL_NAME|${HF_BASE_MODEL_NAME}|" \
  -i -e "s|V_IMAGE_URL|${MLP_FINE_TUNING_IMAGE}|" \
  -i -e "s|V_KSA|${MLP_FINE_TUNING_KSA}|" \
  -i -e "s|V_MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING|${MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING}|" \
  -i -e "s|V_MLFLOW_ENABLE|${MLFLOW_ENABLE}|" \
  -i -e "s|V_MLFLOW_TRACKING_URI|${MLFLOW_TRACKING_URI}|" \
  -i -e "s|V_MODEL_BUCKET|${MLP_MODEL_BUCKET}|" \
  -i -e "s|V_MODEL_PATH|${MODEL_PATH}|" \
  -i -e "s|V_TRAINING_DATASET_PATH|${DATA_BUCKET_DATASET_PATH}|" \
  -i -e "s|V_TRAIN_BATCH_SIZE|${TRAIN_BATCH_SIZE}|" \
  manifests/fine-tune-${ACCELERATOR}-dws.yaml
  ```
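  Before submitting, it can be worth spot-checking that every `V_*` placeholder was substituted; a leftover `V_` token usually means one of the variables above was empty:

  ```sh
  # Should print nothing if all placeholders in the manifest were replaced.
  grep "V_" manifests/fine-tune-${ACCELERATOR}-dws.yaml
  ```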
- Create the provisioning request and job

  ```sh
  kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} apply -f manifests/provisioning-request-${ACCELERATOR}.yaml
  kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} apply -f manifests/fine-tune-${ACCELERATOR}-dws.yaml
  ```
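  While the requested capacity is being provisioned, the job's pod will sit in `Pending`; you can watch it from the cluster side (an added convenience, not part of the original steps):

  ```sh
  # Watch the fine-tuning pod move from Pending to Running once capacity is available.
  kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} get pods --watch
  ```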
- Verify the completion of the job

  In the Google Cloud console, go to the Logs Explorer page and run the following query to see the completion of the job:

  ```
  labels."k8s-pod/app"="finetune-job"
  textPayload: "finetune - INFO - ### Completed ###"
  ```
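  If you prefer the CLI, roughly the same query can be run with `gcloud logging read`; the quoting below is one possible form, so adjust it for your shell:

  ```sh
  # CLI equivalent of the Logs Explorer query above.
  gcloud logging read \
    'labels."k8s-pod/app"="finetune-job" AND textPayload:"finetune - INFO - ### Completed ###"' \
    --project ${MLP_PROJECT_ID} --limit 1
  ```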
- After the fine-tuning job is successful, the model bucket should have a checkpoint folder created.

  ```sh
  gcloud storage ls gs://${MLP_MODEL_BUCKET}/${MODEL_PATH}
  ```
Besides the logs and metrics provided by Google Cloud Observability, it's also important to track the fine-tuning job and its results.

There are many existing options for this. As an example, this guide uses MLflow Tracking to keep track of the ML workloads. MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later visualizing the results.
When you use the playground configuration, MLflow Tracking has been installed for you.
You can run the following command to get the URL:

```sh
echo -e "\n${MLP_KUBERNETES_NAMESPACE} MLFlow Tracking URL: ${MLP_MLFLOW_TRACKING_NAMESPACE_ENDPOINT}\n"
```
Read this playground README section for more info.
Note: You can set the variable `MLFLOW_ENABLE` to `false` or leave it empty to disable MLflow Tracking.
MLflow Tracking is protected by IAP. After you log in, you should see the MLflow Tracking UI. All successful experiments should appear there, and if you click into a completed run, you can see an overview page with metric tabs.