
This guide is deprecated and is no longer maintained.

# Quick start guide

This guide covers some common tasks related to using RAPIDS on the GCP AI Platform. Note that strings containing '[YOUR_XXX]' indicate items you will need to supply based on your specific resource names and environment.

## Create a Notebook using the RAPIDS environment

Motivation: We would like to create a GCP notebook instance with the RAPIDS 0.18 release.

Workflow: We will create a notebook instance using the RAPIDS 0.18 [Experimental] environment.

  1. Log into your GCP console.
    1. Select AI-Platform -> Notebooks.
    2. Select "New Instance" -> "RAPIDS 0.18 [Experimental]".
      1. Select 'Install NVIDIA GPU driver automatically for me'.
      2. Click 'Create'.
      3. Once JupyterLab is running, you will have Jupyter notebooks with RAPIDS installed, and RAPIDS notebook examples under tutorials/RapidsAi.

To create an instance with A100s:

  1. Select "New Instance" -> "Customize instance"
  2. Select the us-central1 region
  3. Select the "RAPIDS 0.18 [Experimental]" environment
  4. Choose an a2-highgpu machine type (for 1, 2, 4, or 8 A100s) or a2-megagpu (for 16 A100s)
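
The same instance can also be created from the command line. Below is a minimal sketch using gcloud's Notebooks support; the `--vm-image-family` value for the RAPIDS 0.18 [Experimental] image is an assumption, so verify the exact family name in your console before running.

```shell
# Sketch: create a Notebooks instance with a single A100 from the CLI.
# The vm-image-family below is an assumption -- confirm the family name
# published for the RAPIDS [Experimental] environment in your console.
gcloud notebooks instances create rapids-notebook \
  --location=us-central1-a \
  --vm-image-project=deeplearning-platform-release \
  --vm-image-family=rapids-latest-gpu-experimental \
  --machine-type=a2-highgpu-1g \
  --install-gpu-driver
```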

## Install RAPIDS on a pre-made Notebook

Motivation: We have an existing GCP notebook that we wish to update to support RAPIDS functionality.

Workflow: We will create a notebook instance and run a shell script that installs a Jupyter kernel, allowing us to run RAPIDS-based tasks.

  1. Log into your GCP console.
    1. Select AI-Platform -> Notebooks.
    2. Select "New Instance" -> "Python 3 (CUDA Toolkit 11.0)" with 1 NVIDIA Tesla T4.
      1. Select 'Install NVIDIA GPU driver automatically for me'.
      2. Click 'Create'.
    3. Once JupyterLab is running:
      1. Open a new terminal.
      2. Run:
        ```shell
        RAPIDS_VER=21.06
        CUDA_VER=11.0
        # Download the pre-built conda-pack archive for this RAPIDS/CUDA combination.
        wget -q https://data.rapids.ai/conda-pack/rapidsai/rapids${RAPIDS_VER}_cuda${CUDA_VER}_py3.8.tar.gz
        # The target environment directory must exist before unpacking into it.
        mkdir -p /opt/conda/envs/rapids_py38
        tar -xzf rapids${RAPIDS_VER}_cuda${CUDA_VER}_py3.8.tar.gz -C /opt/conda/envs/rapids_py38
        conda activate rapids_py38
        # conda-unpack rewrites the hard-coded path prefixes inside the unpacked env.
        conda-unpack
        # Register the environment as a Jupyter kernel.
        ipython kernel install --user --name=rapids_py38
        ```
      3. Once completed, you will have a new kernel named 'rapids_py38' in your Jupyter notebooks, with RAPIDS installed.
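
As a quick sanity check, you can confirm that cuDF imports and can reach the GPU from the new environment (a minimal sketch, assuming the steps above completed successfully):

```shell
# Verify that cuDF works inside the newly unpacked environment.
conda activate rapids_py38
python -c "import cudf; print(cudf.Series([1, 2, 3]).sum())"
```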

## Deploy a custom RAPIDS training container utilizing the 'airline dataset', and initiate a training job with support for HyperParameter Optimization (HPO)

Motivation: We would like to utilize GCP's AI Platform to train a custom model using RAPIDS.

Workflow: Install the required libraries and authentication components for GCP, configure a storage bucket for persistent data, build our custom training container, upload the container, and launch a training job with HPO.

  1. Install GCP 'gcloud' SDK
    1. See: https://cloud.google.com/sdk/install
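    2. For example, on Linux or macOS the documented quick installer can be used (a sketch; see the link above for platform-specific instructions):

       ```shell
       # Interactive installer for the gcloud SDK.
       curl https://sdk.cloud.google.com | bash
       # Restart the shell so gcloud is on PATH, then authenticate.
       exec -l $SHELL
       gcloud init
       ```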
  2. Configure gcloud authorization for docker on your build machine
    1. See: https://cloud.google.com/container-registry/docs/advanced-authentication
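    2. In most cases this amounts to registering gcloud as a Docker credential helper:

       ```shell
       # Register gcloud as a credential helper for gcr.io registries.
       gcloud auth configure-docker
       ```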
  3. Configure a Google Cloud Storage bucket that will provide an output location, for example with gsutil as sketched below.
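     A minimal sketch, assuming the us-central1 region:

     ```shell
     # Create a regional bucket to hold training data and job output.
     gsutil mb -l us-central1 gs://[YOUR_GOOGLE_STORAGE_BUCKET]
     ```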
  4. Pull or build training containers and upload to GCR
    1. Pull
      1. Find the appropriate container: Here
      2. $ docker tag <image> gcr.io/[YOUR_PROJECT_NAME]/rapids_training_container:latest
      3. $ docker push gcr.io/[YOUR_PROJECT_NAME]/rapids_training_container:latest
    2. Build (from the root of this repository)
      1. $ docker build --tag gcr.io/[YOUR_PROJECT_NAME]/rapids_training_container:latest --file common/docker/Dockerfile.training.unified .
      2. $ docker push gcr.io/[YOUR_PROJECT_NAME]/rapids_training_container:latest
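     Optionally, confirm that the image landed in GCR (a sketch using the standard gcloud command):

     ```shell
     # List tags for the training container to confirm the push succeeded.
     gcloud container images list-tags gcr.io/[YOUR_PROJECT_NAME]/rapids_training_container
     ```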
  5. Training via GCP UI
    1. A quick note regarding GCP's cloudml-hypertune library
      1. This library interacts with the GCP AI Platform's HPO process by reporting the required optimization metrics to the system after each training iteration:

         ```python
         import hypertune

         hpt = hypertune.HyperTune()
         hpt.report_hyperparameter_tuning_metric(
             hyperparameter_metric_tag='hpo_accuracy',
             metric_value=accuracy)
         ```
      2. For our purposes, the 'hyperparameter_metric_tag' should always correspond to the 'Metric to optimize' element passed to a job deployment.
    2. Training Algorithm
      1. From the GCP console, select 'Jobs' -> 'New training job' -> 'Custom code training'
      2. Choose 'Select a container image from the Container Registry'
      3. Set 'Master image' to 'gcr.io/[YOUR_PROJECT_NAME]/rapids_training_container:latest'
      4. Set 'Job directory' to 'gs://[YOUR_GOOGLE_STORAGE_BUCKET]'
    3. Algorithm Arguments
      1. Ex:

         ```text
         --train
         --do-hpo
         --cloud-type=GCP
         --data-input-path=gs://[YOUR STORAGE BUCKET]
         --data-output-path=gs://[YOUR STORAGE BUCKET]/training_output
         --data-name=airline_20000000.orc
         ```

         _Screenshot: Argument Settings_
      2. With Hypertune
        1. Enter the hypertune parameters. Ex:
          1.  Argument name: hpo-max-depth
             Type: Integer
             Min: 2
             Max: 8
          2.  Argument name: hpo-num-est
             Type: Integer
             Min: 100
             Max: 200
          3.  Argument name: hpo-max-features
             Type: Double
             Min: 0.2
             Max: 0.6
        2. Enter an optimizing metric. Ex:
          1.  Metric to optimize: hpo_accuracy
             Goal: Maximize
             Max trials: 20
             Max parallel trials: 5
             Algorithm: Bayesian optimization
             Early stopping: True
            _Screenshot: Hypertune Settings_
    4. Job Settings
      1.  Job ID: my-test-job
         Region: us-central1
      2. Scale Tier
        1. Select 'CUSTOM' -> 'Use Compute Engine Machine Types'
        2. Master Node
          1. Ex. n1-standard-8
        3. Accelerator
          1. Ex. V100 or T4. K80s are not supported.

         _Screenshot: Cluster Spec_
      3. Select 'Done', and launch your training job.
  6. Training via gcloud job submission
    1. Update your training configuration based on 'example_config.json'
      1. Ex:

         ```json
         {
             "trainingInput": {
                 "args": [
                     "--train",
                     "--do-hpo",
                     "--cloud-type=GCP",
                     "--data-input-path=gs://[YOUR STORAGE BUCKET]",
                     "--data-output-path=gs://[YOUR STORAGE BUCKET]/training_output",
                     "--data-name=airline_20000000.orc"
                 ],
                 "hyperparameters": {
                     "enableTrialEarlyStopping": true,
                     "goal": "MAXIMIZE",
                     "hyperparameterMetricTag": "hpo_accuracy",
                     "maxParallelTrials": 1,
                     "maxTrials": 2,
                     "params": [
                         {
                             "maxValue": 200,
                             "minValue": 100,
                             "parameterName": "hpo-num-est",
                             "type": "INTEGER"
                         },
                         {
                             "maxValue": 17,
                             "minValue": 9,
                             "parameterName": "hpo-max-depth",
                             "type": "INTEGER"
                         },
                         {
                             "maxValue": 0.6,
                             "minValue": 0.2,
                             "parameterName": "hpo-max-features",
                             "type": "DOUBLE"
                         }
                     ]
                 },
                 "jobDir": "gs://[YOUR PROJECT NAME]/training_output",
                 "masterConfig": {
                     "imageUri": "gcr.io/[YOUR PROJECT NAME]/rapids_training_container:latest",
                     "acceleratorConfig": {
                         "count": "1",
                         "type": "NVIDIA_TESLA_T4"
                     }
                 },
                 "masterType": "n1-standard-8",
                 "region": "us-west1",
                 "scaleTier": "CUSTOM"
             }
         }
         ```
      2. For more information, see:
        1. https://cloud.google.com/sdk/gcloud/reference/ai-platform/jobs/submit/training
    2. Run your training job
      1. $ gcloud ai-platform jobs submit training [YOUR_JOB_NAME] --config ./example_config.json
    3. Monitor your training job
      1. $ gcloud ai-platform jobs stream-logs [YOUR_JOB_NAME]
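
To check overall job state, including completed HPO trials, you can also describe the job (a sketch using the standard gcloud command):

```shell
# Show job status; trial results appear here once HPO trials complete.
gcloud ai-platform jobs describe [YOUR_JOB_NAME]
```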