
Fine-tuning

Fine-tune a Gemma Instruction Tuned model using a processed Flipkart catalog. The dataset used for fine-tuning was generated by Llama 3.1 on Vertex AI. The fine-tuned model can be deployed with an inference serving engine.

The resulting fine-tuned model is Built with Meta Llama 3.1, using the data prepared by the Llama 3.1 on Vertex AI API.

Prerequisites

  • This guide was developed to be run on the playground AI/ML platform. If you are using a different environment, the scripts and manifests will need to be modified for that environment.
  • A bucket containing the prepared data from the Data Preparation example

NOTE: If you did not execute the data preparation example, follow these instructions to load the dataset into the bucket.

Preparation

  • Clone the repository and change directory to the guide directory

    git clone https://github.com/GoogleCloudPlatform/accelerated-platforms && \
    cd accelerated-platforms/use-cases/model-fine-tuning-pipeline/fine-tuning/pytorch
  • Ensure that your MLP_ENVIRONMENT_FILE is configured

    cat ${MLP_ENVIRONMENT_FILE} && \
    source ${MLP_ENVIRONMENT_FILE}

    You should see the various environment variables populated with values specific to your environment.
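If you want to fail fast rather than discover a missing value mid-run, a small pre-flight check can help. This is a hypothetical helper, not part of the guide's scripts, and the variable names passed to it are just examples drawn from the commands used later in this guide:

```shell
# Hypothetical pre-flight check: print every variable from the list
# that is empty or unset after sourcing MLP_ENVIRONMENT_FILE.
# Uses bash indirect expansion (${!var}), so run it with bash.
check_mlp_vars() {
  local missing=0
  local var
  for var in "$@"; do
    if [ -z "${!var}" ]; then
      echo "Missing: ${var}"
      missing=1
    fi
  done
  return "${missing}"
}

# Usage: check_mlp_vars MLP_PROJECT_ID MLP_CLUSTER_NAME MLP_KUBERNETES_NAMESPACE
```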

Access token variables

  • Set HF_TOKEN to your HuggingFace access token. Go to https://huggingface.co/settings/tokens, click Create new token, provide a token name, select Read as the token type, and click Create token.

    HF_TOKEN=
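Before building anything, you can sanity-check the token. This helper is a hypothetical addition, not part of the guide's scripts; it calls Hugging Face's whoami-v2 endpoint, which returns your account details when the token is valid (requires network access):

```shell
# Hypothetical helper: quick sanity check of HF_TOKEN against the
# Hugging Face whoami-v2 token-introspection endpoint.
check_hf_token() {
  if [ -z "${HF_TOKEN}" ]; then
    echo "HF_TOKEN is not set"
    return 1
  fi
  curl -s --fail -H "Authorization: Bearer ${HF_TOKEN}" \
    https://huggingface.co/api/whoami-v2
}

# Usage: check_hf_token && echo "token looks valid"
```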

Build the container image

  • Build the container image using Cloud Build and push the image to Artifact Registry

    cd src
    sed -i -e "s|^serviceAccount:.*|serviceAccount: projects/${MLP_PROJECT_ID}/serviceAccounts/${MLP_BUILD_GSA}|" cloudbuild.yaml
    gcloud beta builds submit \
    --config cloudbuild.yaml \
    --gcs-source-staging-dir gs://${MLP_CLOUDBUILD_BUCKET}/source \
    --project ${MLP_PROJECT_ID} \
    --substitutions _DESTINATION=${MLP_FINE_TUNING_IMAGE}
    cd ..

Run the job

  • Accept Gemma model terms

    To get access to the Gemma models for this example, you must first sign the license consent agreement. Follow these instructions:

    • Access the model consent page on Kaggle.com
    • Select Request Access
    • Select Verify via Hugging Face and continue
    • Accept the model terms
  • Verify your HF_TOKEN is valid and that you have agreed to the Gemma model terms.

    git clone https://token:${HF_TOKEN}@huggingface.co/google/gemma-2-9b-it /tmp/test

    NOTE: If you get the following message, check that your HF_TOKEN is valid and that you have accepted the model terms.

    remote: Access to model google/gemma-2-9b-it is restricted. You must have access to it and be authenticated to access it. Please log in.
    fatal: Authentication failed for 'https://huggingface.co/google/gemma-2-9b-it/'
    
  • Get credentials for the GKE cluster

    gcloud container fleet memberships get-credentials ${MLP_CLUSTER_NAME} --project ${MLP_PROJECT_ID}
  • Create a Kubernetes secret with your HuggingFace token

    kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=${HF_TOKEN} \
    --dry-run=client -o yaml | kubectl apply -n ${MLP_KUBERNETES_NAMESPACE} -f -
  • Configure the job

    | Variable | Description | Example |
    | --- | --- | --- |
    | ACCELERATOR | Type of GPU accelerator to use (l4, a100, h100) | l4 |
    | DATA_BUCKET_DATASET_PATH | The path to the generated prompt data used for fine-tuning | dataset/output/training |
    | EXPERIMENT | If MLflow is enabled, the experiment ID used in MLflow | experiment- |
    | HF_BASE_MODEL_NAME | The Hugging Face path of the base model to fine-tune | google/gemma-2-9b-it |
    | MLFLOW_ENABLE | Enable MLflow (empty also disables it) | true/false |
    | MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING | If MLflow is enabled, track system-level metrics (CPU/memory/GPU) | true/false |
    | MLFLOW_TRACKING_URI | If MLflow is enabled, the tracking server URI | http://mlflow-tracking-service.ml-team:5000 |
    | MODEL_PATH | The output folder path for the fine-tuned model. This location will be used by the inference serving engine and model evaluation | /model-data/model-gemma2/experiment |
    | TRAIN_BATCH_SIZE | The number of training examples processed in a single training iteration | 1 |

    ACCELERATOR="l4"
    DATA_BUCKET_DATASET_PATH="dataset/output/training"
    EXPERIMENT="finetune-experiment"
    HF_BASE_MODEL_NAME="google/gemma-2-9b-it"
    MLFLOW_ENABLE="true"
    MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING="true"
    MLFLOW_TRACKING_URI="http://mlflow-tracking-svc:5000"
    MODEL_PATH="/model-data/model-gemma2/experiment"
    TRAIN_BATCH_SIZE="1"
    sed \
    -i -e "s|V_DATA_BUCKET|${MLP_DATA_BUCKET}|" \
    -i -e "s|V_EXPERIMENT|${EXPERIMENT}|" \
    -i -e "s|V_MODEL_NAME|${HF_BASE_MODEL_NAME}|" \
    -i -e "s|V_IMAGE_URL|${MLP_FINE_TUNING_IMAGE}|" \
    -i -e "s|V_KSA|${MLP_FINE_TUNING_KSA}|" \
    -i -e "s|V_MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING|${MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING}|" \
    -i -e "s|V_MLFLOW_ENABLE|${MLFLOW_ENABLE}|" \
    -i -e "s|V_MLFLOW_TRACKING_URI|${MLFLOW_TRACKING_URI}|" \
    -i -e "s|V_MODEL_BUCKET|${MLP_MODEL_BUCKET}|" \
    -i -e "s|V_MODEL_PATH|${MODEL_PATH}|" \
    -i -e "s|V_TRAINING_DATASET_PATH|${DATA_BUCKET_DATASET_PATH}|" \
    -i -e "s|V_TRAIN_BATCH_SIZE|${TRAIN_BATCH_SIZE}|" \
    manifests/fine-tune-${ACCELERATOR}-dws.yaml
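The substitution pattern above can be tried on a throwaway file first. The manifest fragment and values below are made up for illustration, not the real fine-tune-*-dws.yaml; note the use of | as the sed s-command delimiter so that values containing slashes (image URLs, model paths) do not need escaping:

```shell
# Made-up manifest fragment with the same V_* placeholder convention.
cat > /tmp/demo-manifest.yaml <<'EOF'
image: V_IMAGE_URL
model: V_MODEL_NAME
EOF

# Example values (hypothetical image URL).
DEMO_IMAGE="us-docker.pkg.dev/demo-project/repo/finetune:latest"
DEMO_MODEL="google/gemma-2-9b-it"

# In-place substitution; | delimiters avoid escaping the slashes.
sed -i \
  -e "s|V_IMAGE_URL|${DEMO_IMAGE}|" \
  -e "s|V_MODEL_NAME|${DEMO_MODEL}|" \
  /tmp/demo-manifest.yaml

cat /tmp/demo-manifest.yaml
```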
  • Create the provisioning request and job

    kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} apply -f manifests/provisioning-request-${ACCELERATOR}.yaml
    kubectl --namespace ${MLP_KUBERNETES_NAMESPACE} apply -f manifests/fine-tune-${ACCELERATOR}-dws.yaml
  • Verify the completion of the job

    In the Google Cloud console, go to the Logs Explorer page and run the following query to verify that the job has completed.

    labels."k8s-pod/app"="finetune-job"
    textPayload: "finetune - INFO - ### Completed ###"
  • After the fine-tuning job completes successfully, the model bucket should contain a checkpoint folder.

    gcloud storage ls gs://${MLP_MODEL_BUCKET}/${MODEL_PATH}

Observability

Besides the logs and metrics provided by Google Cloud Observability, it's also important to track the fine-tuning job and its results.

There are many existing options for this. As an example, we use MLflow Tracking to keep track of the ML workloads. MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later visualizing the results.

When you use the playground configuration, MLflow Tracking is installed for you.

You can run the following command to get the URL:

echo -e "\n${MLP_KUBERNETES_NAMESPACE} MLFlow Tracking URL: ${MLP_MLFLOW_TRACKING_NAMESPACE_ENDPOINT}\n"

Read this playground README section for more info.

Note: You can set the variable MLFLOW_ENABLE to false or leave it empty to disable MLflow Tracking.
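A minimal sketch of that gating behavior, assuming the job treats an unset or empty MLFLOW_ENABLE the same as "false" (this is an illustration, not the job's actual code):

```shell
# Hypothetical gate: only the exact string "true" enables tracking.
mlflow_enabled() {
  [ "${MLFLOW_ENABLE}" = "true" ]
}

MLFLOW_ENABLE=""
if mlflow_enabled; then
  echo "MLflow Tracking is enabled"
else
  echo "MLflow Tracking is disabled"
fi
```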

MLflow Tracking is protected by IAP. After you log in, you should see a page similar to the following.

mlflow-home

All successful experiments should appear. If you click into a completed run, you can see an overview page with metric tabs.

mlflow-model-experiment