Add requirements for GKE customer on ray-on-gke README #53

Closed
brandonroyal opened this issue Sep 28, 2023 · 3 comments

@brandonroyal
Contributor

TL;DR: the ray-on-gke README should be updated to list the requirements for the prerequisite GKE cluster.

  • GKE Cluster must have Workload Identity Enabled
  • GKE Cluster must have KubeRay Operator deployed
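Both requirements can be checked before running Terraform. The following is a minimal sketch, not part of the repo: it assumes `gcloud` and `kubectl` are installed and authenticated against the target cluster, that the installed `gcloud` accepts `--location` (older versions use `--region` or `--zone`), and that the KubeRay operator install registers the `rayclusters.ray.io` CRD. The function name and arguments are illustrative.

```python
import subprocess


def run(argv):
    """Run a command and return (exit_code, stripped stdout)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError:
        # The CLI itself is not installed or not on PATH.
        return 127, ""
    return proc.returncode, proc.stdout.strip()


def check_prerequisites(cluster, location, project, runner=run):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Workload Identity: the cluster reports a workload pool when it is enabled.
    code, out = runner([
        "gcloud", "container", "clusters", "describe", cluster,
        "--location", location, "--project", project,
        "--format", "value(workloadIdentityConfig.workloadPool)",
    ])
    if code != 0:
        problems.append(f"could not describe cluster {cluster!r}")
    elif not out:
        problems.append("Workload Identity is not enabled")

    # KubeRay operator: assumed to register the RayCluster CRD when installed.
    code, _ = runner(["kubectl", "get", "crd", "rayclusters.ray.io"])
    if code != 0:
        problems.append("RayCluster CRD not found; KubeRay operator is probably not deployed")

    return problems
```

Calling `check_prerequisites("my-cluster", "us-central1", "my-project")` with your own values (these three are placeholders) returns the list of unmet requirements. The `runner` parameter exists only so the commands can be stubbed out when testing.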

Here are the errors you get when these prerequisites are not met. They were observed on a GKE Standard cluster (1.27.3-gke.100).

module.service_accounts.google_project_iam_binding.monitoring-viewer: Creation complete after 7s [id=ie-raycluster-0f2aa542/roles/monitoring.viewer]
╷
│ Error: unable to build kubernetes objects from release manifest: resource mapping not found for name: "example-cluster-kuberay" namespace: "" from "": no matches for kind "RayCluster" in version "ray.io/v1alpha1"
│ ensure CRDs are installed first
│ 
│   with module.kuberay.helm_release.ray-cluster,
│   on modules/kuberay/kuberay.tf line 15, in resource "helm_release" "ray-cluster":
│   15: resource "helm_release" "ray-cluster" {
│ 
╵

This happens because the KubeRay operator is not installed on the cluster, so the RayCluster kind is not registered when the Helm release is applied.

The iam-role-binding will also fail. This is the result of Workload Identity not being enabled.
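For anyone scripting around this, the Helm error above can be detected from captured Terraform output. This is a rough sketch with hypothetical helper names. It only matches the `no matches for kind` message quoted in this issue, and it does not cover the iam-role-binding failure because that error text is not shown here.

```python
import re

# Matches the Kubernetes "no matches for kind" message quoted in the error above.
MISSING_KIND = re.compile(r'no matches for kind "([^"]+)" in version "([^"]+)"')


def missing_kinds(terraform_output):
    """Return (kind, api_version) pairs for every unregistered kind in the output."""
    return MISSING_KIND.findall(terraform_output)


def needs_kuberay_operator(terraform_output):
    """True when the output reports RayCluster as an unknown kind in a ray.io API version."""
    return any(
        kind == "RayCluster" and version.startswith("ray.io/")
        for kind, version in missing_kinds(terraform_output)
    )
```

Feeding it the combined stdout and stderr of `terraform apply` is enough; a `True` result suggests deploying the KubeRay operator before retrying.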

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions.

@github-actions bot added the stale label (Pending closure after 60 days unless there is a strong objection.) on Mar 11, 2025
@rahuld-turing

The documentation mentions kuberay-operator, but I can't find it in the modules folder. Was it removed?

.
├── LICENSE
├── README.md
├── infrastructure
│ ├── README.md
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── platform.tfvars
│ ├── variables.tf
│ └── versions.tf
├── modules
│ ├── gke-autopilot-private-cluster
│ ├── gke-autopilot-public-cluster
│ ├── gke-standard-private-cluster
│ ├── gke-standard-public-cluster
│ ├── jupyter
│ ├── jupyter_iap
│ ├── jupyter_service_accounts
│ ├── kuberay-cluster
│ ├── kuberay-logging
│ ├── kuberay-monitoring
│ ├── kuberay-operator
│ └── kuberay-serviceaccounts
└── tutorial.md

@github-actions bot removed the stale label (Pending closure after 60 days unless there is a strong objection.) on Apr 1, 2025
@fcabrera23
Collaborator

This code was migrated to a new repo: https://github.com/ai-on-gke/quick-start-guides. If the problem persists, please open an issue in the new repo.
