-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow any GPU resource types support #2973
Conversation
Thanks for making a pull request to Elyra! To try out this branch on binder, follow this link: |
elyra/kfp/operator.py
Outdated
gpu_vendor = self.pipeline_envs.get("GPU_VENDOR", "nvidia") | ||
self.container.set_gpu_limit(gpu=str(gpu_limit), vendor=gpu_vendor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although gpu_vendor
was not a member variable previously, we should set the member. Also, for consistency, we should reference gpu_limit
via self
.
gpu_vendor = self.pipeline_envs.get("GPU_VENDOR", "nvidia") | |
self.container.set_gpu_limit(gpu=str(gpu_limit), vendor=gpu_vendor) | |
self.gpu_vendor = self.pipeline_envs.get("GPU_VENDOR", "nvidia") | |
self.container.set_gpu_limit(gpu=str(self.gpu_limit), vendor=self.gpu_vendor) |
elyra/kfp/operator.py
Outdated
if not self.gpu_memory_vendor: | ||
raise ValueError("gpu_memory_vendor not seted while gpu_memory is specified > 0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
if not self.gpu_memory_vendor: | |
raise ValueError("gpu_memory_vendor not seted while gpu_memory is specified > 0") | |
if not self.gpu_memory_vendor: | |
raise ValueError("gpu_memory_vendor not set while gpu_memory is specified") |
Question:
This PR is making gpu_memory_vendor
a required value. Is there a reasonably common default (like "nvidia" is for gpu_vendor
) that we can fall back to? Are the same vendor names used for gpu_limit
and gpu_memory
? I.e., can we use env GPU_VENDOR
here as well if not provided by the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"nvidia" I would consider to be the default due to the market share they hold. Not sure about the vendor spec when it comes to gpu limit and memory, I always assumed these would be agnostic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question:
This PR is making gpu_memory_vendor a required value. Is there a reasonably common default (like "nvidia" is for gpu_vendor) that we can fall back to? Are the same vendor names used for gpu_limit and gpu_memory? I.e., can we use env GPU_VENDOR here as well if not provided by the user?
Thanks for reviewing!
Actually, gpu_vendor
, gpu_memory
and gpu_memory_vendor
are ment to be optional. I'm looking for a way to make "nvidia" as default, as it is before, and make gpu_vendor
, gpu_memory
and gpu_memory_vendor
to be optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I've tested we can leave gpu_vendor
, gpu_memory
and gpu_memory_vendor
blank, set gpu
to 1, we'll have a step with default gpu resource setup like nvidia.com/gpu: 1
.
Please make sure that Kubeflow Pipelines actually accepts custom values. Elyra uses |
Thanks for reminding, I'm using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @typhoonzero, thanks for the contribution.
Things looking good, looking for more opinions from other members. For me, im wondering if there is a way to put these new parameters under a "click to show" type of button since these fields are starting to take up a lot of real estate.
"minimum": 0 | ||
}, | ||
"gpu_memory_vendor": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the gpu_memory_vendor
param exclusive to virtualized GPU resources? If so, can we change the name to something like vgpu.....
…source_support_2
Moved to #3029 |
What changes were proposed in this pull request?
The K8s cluster used for GPU training may have other GPU resource types other than

nvidia.com/gpu
, there are other vendors like AMD or opensource vGPU implementations, like https://github.com/4paradigm/k8s-vgpu-scheduler, https://github.com/tkestack/gpu-manager etc. I propose to add support for setup user inputed resource types.How was this pull request tested?
Tests added at
test_pipeline_constructor.py
TODO:
Docs should updated in another PR.