Description
Allow TorchX KubernetesScheduler users to specify a node selector that controls which nodes the pods of their Volcano jobs are scheduled onto.
Motivation/Background
Currently, users can only choose which machines they run on through resource requests or the node.kubernetes.io/instance-type label. A node selector would let them target machines by any label they choose, which enables use cases like testing isolated machines, running consecutive jobs on the same machine for comparison, and segmenting the k8s cluster by label.
Detailed Proposal
Add node_selector as a run-opt to the KubernetesScheduler run_opts, to KubernetesOpts, and to the other entry points. Pass the user-specified node_selector through to the role_to_pod method so that it is applied to the generated pod spec.
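A minimal sketch of how the selector handling could look, written as standalone Python so that it does not depend on TorchX internals. The helper names parse_node_selector and merge_node_selector, the "key=value,key2=value2" string format for the run-opt, and the rule that user-specified labels win on conflict are all assumptions made for illustration, not existing TorchX behavior.

```python
from typing import Dict, Mapping, Optional

# Well-known Kubernetes node label already mentioned in this proposal.
LABEL_INSTANCE_TYPE = "node.kubernetes.io/instance-type"


def parse_node_selector(raw: Optional[str]) -> Dict[str, str]:
    """Parse a hypothetical 'key=value,key2=value2' run-opt string into a dict."""
    selector: Dict[str, str] = {}
    if not raw:
        return selector
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid node_selector entry: {pair!r}")
        selector[key.strip()] = value.strip()
    return selector


def merge_node_selector(
    capabilities: Mapping[str, object],
    user_selector: Mapping[str, str],
) -> Dict[str, str]:
    """Combine the instance-type label (if present) with user-specified labels.

    Assumption: user-specified labels take precedence on conflict.
    """
    merged: Dict[str, str] = {}
    instance_type = capabilities.get(LABEL_INSTANCE_TYPE)
    if instance_type is not None:
        merged[LABEL_INSTANCE_TYPE] = str(instance_type)
    merged.update(user_selector)
    return merged


if __name__ == "__main__":
    user = parse_node_selector("pool=isolated, disk=ssd")
    print(merge_node_selector({LABEL_INSTANCE_TYPE: "p4d.24xlarge"}, user))
```

Under these assumptions, role_to_pod would call something like merge_node_selector and place the resulting dict in the pod spec's node selector field. Whether the run-opt should be a string or a mapping is left open here.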
Alternatives
Extend the resource.capabilities feature to include other labels. This solution is less desirable because hard-coded label names will always be limiting.
Additional context/links
Code linked above.
Documentation: https://docs.pytorch.org/torchx/main/schedulers/kubernetes.html