Add node selector to KubernetesScheduler run opts #1067

Open
@JackWittmayer

Description

Allow TorchX KubernetesScheduler users to specify a node selector so that the pods of their Volcano jobs are scheduled onto matching nodes.

Motivation/Background

Currently, users can only choose which machines to run on based on resources or the node.kubernetes.io/instance-type label. A node selector would let them target machines by any label they choose, which enables use cases like testing isolated machines, running consecutive jobs on the same machine for comparison, and segmenting the k8s cluster by label.

Detailed Proposal

Add node_selector as a run-opt to the KubernetesScheduler run_opts, KubernetesOpts, and other entry points. Pass the user-specified node_selector through to the role_to_pod method so that it is applied to the pods it builds.
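To make the proposal concrete, here is a minimal sketch of the merge step, using plain dicts rather than the real TorchX or Kubernetes client types. The helper names `parse_node_selector` and `apply_node_selector`, the `key=value,key2=value2` string format, and the rule that user-supplied labels override existing ones are all assumptions for illustration, not part of the current TorchX API.

```python
from typing import Any, Dict, Optional

# Label mentioned in the motivation above; the other labels below are made up.
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"


def parse_node_selector(raw: Optional[str]) -> Dict[str, str]:
    """Parse 'key=value,key2=value2' into a label dict (hypothetical format)."""
    selector: Dict[str, str] = {}
    if not raw:
        return selector
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid node selector entry: {pair!r}")
        selector[key.strip()] = value.strip()
    return selector


def apply_node_selector(
    pod_spec: Dict[str, Any], user_selector: Dict[str, str]
) -> Dict[str, Any]:
    """Return a copy of pod_spec with user labels merged into nodeSelector.

    Assumption: user labels win over labels that are already present.
    """
    merged = dict(pod_spec.get("nodeSelector") or {})
    merged.update(user_selector)
    new_spec = dict(pod_spec)
    if merged:
        new_spec["nodeSelector"] = merged
    return new_spec


# Example: a pod spec that already carries an instance-type label.
spec = {"containers": [], "nodeSelector": {INSTANCE_TYPE_LABEL: "p4d.24xlarge"}}
selector = parse_node_selector("pool=gpu-test, zone=us-east-1a")
result = apply_node_selector(spec, selector)
```

In this sketch the existing instance-type label is kept and the user's labels are added alongside it. Whether a user-supplied label should be allowed to override one derived from resource.capabilities is a design choice the real change would need to settle.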

Alternatives

Extend the resource.capabilities feature to include other labels. This solution is less desirable because hard-coded label names will always be limiting.

Additional context/links

Code linked above.
Documentation: https://docs.pytorch.org/torchx/main/schedulers/kubernetes.html
