Commit 68d8589

Author: Robert Bailey

Set the default GKE cluster type for ray to GKE Autopilot. (#618)
Also add instructions to use a standard cluster if preferred.
1 parent b4203e5 commit 68d8589

3 files changed, +56 -15 lines changed

applications/ray/README.md

+47 -11
@@ -3,22 +3,58 @@
 This repository contains a Terraform template for running [Ray](https://www.ray.io/) on Google Kubernetes Engine.
 See the [Ray on GKE](/ray-on-gke/) directory to see additional guides and references.

+## Prerequisites
+
+1. GCP Project with following APIs enabled
+- container.googleapis.com
+- iap.googleapis.com (required when using authentication with Identity Aware Proxy)
+
+2. A functional GKE cluster.
+- To create a new standard or autopilot cluster, follow the instructions in [`infrastructure/README.md`](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/infrastructure/README.md)
+- Alternatively, you can set the `create_cluster` variable to true in `workloads.tfvars` to provision a new GKE cluster. This will default to creating a GKE Autopilot cluster; if you want to provision a standard cluster you must also set `autopilot_cluster` to false.
+
+3. This module is configured to optionally use Identity Aware Proxy (IAP) to protect access to the Ray dashboard. It expects the brand & the OAuth consent configured in your org. You can check the details here: [OAuth consent screen](https://console.cloud.google.com/apis/credentials/consent)
+
+4. Preinstall the following on your computer:
+* Terraform
+* Gcloud CLI
+
 ## Installation

-Preinstall the following on your computer:
-* Terraform
-* Gcloud
+### Configure Inputs

-> **_NOTE:_** Terraform keeps state metadata in a local file called `terraform.tfstate`. Deleting the file may cause some resources to not be cleaned up correctly even if you delete the cluster. We suggest using `terraform destroy` before reapplying/reinstalling.
+1. If needed, clone the repo
+```
+git clone https://github.com/GoogleCloudPlatform/ai-on-gke
+cd ai-on-gke/applications/ray
+```

-1. If needed, git clone https://github.com/GoogleCloudPlatform/ai-on-gke
+2. Edit `workloads.tfvars` with your GCP settings.
+
+**Important Note:**
+If using this with the Jupyter module (`applications/jupyter/`), it is recommended to use the same k8s namespace
+for both i.e. set this to the same namespace as `applications/jupyter/workloads.tfvars`.
+
+| Variable | Description | Required |
+|-----------------------------|----------------------------------------------------------------------------------------------------------------|:--------:|
+| project_id | GCP Project Id | Yes |
+| cluster_name | GKE Cluster Name | Yes |
+| cluster_location | GCP Region | Yes |
+| kubernetes_namespace | The namespace that Ray and rest of the other resources will be installed in. | Yes |
+| gcs_bucket | GCS bucket to be used for Ray storage | Yes |
+| create_service_account | Create service accounts used for Workload Identity mapping | Yes |
+
+
+### Install
+
+> **_NOTE:_** Terraform keeps state metadata in a local file called `terraform.tfstate`. Deleting the file may cause some resources to not be cleaned up correctly even if you delete the cluster. We suggest using `terraform destroy` before reapplying/reinstalling.

-2. `cd applications/ray`
+3. Ensure your gcloud application default credentials are in place.
+```
+gcloud auth application-default login
+```

-3. Find the name and location of the GKE cluster you want to use.
-Run `gcloud container clusters list --project=<your GCP project>` to see all the available clusters.
-_Note: If you created the GKE cluster via the infrastructure repo, you can get the cluster info from `platform.tfvars`_
+4. Run `terraform init`

-4. Edit `workloads.tfvars` with your environment specific variables and configurations.
+5. Run `terraform apply --var-file=./workloads.tfvars`.

-5. Run `terraform init && terraform apply --var-file workloads.tfvars`
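
The prerequisites added to the README above ask for `container.googleapis.com` and `iap.googleapis.com` to be enabled on the project, but this commit does not enable them itself. As a hedged sketch only, one way to manage that from Terraform is the Google provider's `google_project_service` resource. The local `required_apis` and the resource name `ray_required_apis` are hypothetical, and the sketch assumes a `project_id` variable is declared in the module, as the `workloads.tfvars` entries suggest.

```hcl
# Hypothetical sketch, not part of this commit: enable the APIs listed
# in the README prerequisites. Names below are illustrative only.
locals {
  required_apis = [
    "container.googleapis.com",
    "iap.googleapis.com", # only needed when using Identity Aware Proxy
  ]
}

resource "google_project_service" "ray_required_apis" {
  for_each = toset(local.required_apis)

  project = var.project_id # assumes the module declares this variable
  service = each.value

  # Keep the API enabled if this resource is later destroyed, so that
  # other workloads in the project are not affected.
  disable_on_destroy = false
}
```

Enabling the APIs once with `gcloud` or the Cloud console works equally well; the Terraform form just keeps the prerequisite alongside the rest of the configuration.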

applications/ray/variables.tf

+2 -2
@@ -39,7 +39,7 @@ variable "ray_version" {
 variable "kubernetes_namespace" {
 type = string
 description = "Kubernetes namespace where resources are deployed"
-default = "myray"
+default = "ml"
 }

 variable "enable_grafana_on_ray_dashboard" {
@@ -105,7 +105,7 @@ variable "private_cluster" {

 variable "autopilot_cluster" {
 type = bool
-default = false
+default = true
 }

 variable "cpu_pools" {

applications/ray/workloads.tfvars

+7 -2
@@ -17,11 +17,16 @@
 ## Need to pull this variables from tf output from previous platform stage
 project_id = "<your project ID>"

-## this is required for terraform to connect to GKE master and deploy workloads
-create_cluster = false # this flag will create a new standard public gke cluster in default network
+## This is required for terraform to connect to GKE cluster and deploy workloads.
 cluster_name = "<cluster name>"
 cluster_location = "us-central1"

+## If terraform should create a new GKE cluster, fill in this section as well.
+## By default, a public autopilot GKE cluster will be created in the default network.
+## Set the autopilot_cluster variable to false to create a standard cluster instead.
+create_cluster = false
+autopilot_cluster = true
+
 #######################################################
 #### APPLICATIONS
 #######################################################
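
To make the new defaults concrete: with this change, `create_cluster = true` on its own would provision an Autopilot cluster, and opting back into a standard cluster means setting both flags explicitly. A minimal sketch of the relevant `workloads.tfvars` entries for that case, using the placeholder values from the file above (the rest of the file is omitted and assumed unchanged):

```hcl
# Illustrative values only; replace the placeholders with your own settings.
project_id       = "<your project ID>"
cluster_name     = "<cluster name>"
cluster_location = "us-central1"

# Ask Terraform to create the cluster, and override the new Autopilot
# default so that a standard GKE cluster is provisioned instead.
create_cluster    = true
autopilot_cluster = false
```

Leaving `create_cluster = false` keeps the previous behavior of deploying into an existing cluster identified by `cluster_name` and `cluster_location`.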
