Skypilot dws kueue #942

Merged


volatilemolotov
Contributor

This PR adds a guide for running SkyPilot on GKE with Dynamic Workload Scheduling and Kueue.


google-cla bot commented Jan 23, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@volatilemolotov
Contributor Author

@xiaotongyang-gke

@xiaotongyang-gke xiaotongyang-gke self-assigned this Jan 23, 2025
Collaborator

@xiaotongyang-gke xiaotongyang-gke left a comment

Please resolve the comments

@xiaotongyang-gke
Collaborator

/gcbrun

@xiaotongyang-gke
Collaborator

/gcbrun

@xiaotongyang-gke
Collaborator

/gcbrun

@vicentefb
Collaborator

/gcbrun

@vicentefb
Collaborator

/gcbrun

@xiaotongyang-gke
Collaborator

/gcbrun

@xiaotongyang-gke
Collaborator

/gcbrun

@vicentefb
Collaborator

/gcbrun

@vicentefb
Collaborator

/gcbrun

@xiaotongyang-gke xiaotongyang-gke merged commit 540caad into GoogleCloudPlatform:main Feb 7, 2025
7 checks passed
ArthurKamalov added a commit to volatilemolotov/ai-on-gke that referenced this pull request Feb 18, 2025
* SkyPilot with DWS and Kueue tutorial

* typo

* fix topics

* fix backend placeholder

* add missing terraform outputs

* update with files needed for finetune and serve

* add mount for text classification files for train task

* change example environment structure

* move serve to l4

* added clarification on experimental section for skypilot task/service definition files

* readme update

* Minor README fixes

* Update README.md

* Update README.md

* minor updates

* minor updates

* newlines added

* added kueue quota warning

* minor whitespace

* minor whitespace

* main.tf fmt issues

* terraform fmt newlines

* terraform fmt newlines

* Update tutorials-and-examples/skypilot/dws-and-kueue/example_environment.tfvars

Co-authored-by: Vasilii Polikarpov <[email protected]>

* Update tutorials-and-examples/skypilot/dws-and-kueue/README.md

Co-authored-by: Vasilii Polikarpov <[email protected]>

* Update tutorials-and-examples/skypilot/dws-and-kueue/README.md

Co-authored-by: Vasilii Polikarpov <[email protected]>

* remove repeated steps

---------

Co-authored-by: ArthurKamalov <[email protected]>
Co-authored-by: Vasilii Polikarpov <[email protected]>
5 participants