Support Stateful JobSet #572

@tenzen-y

What would you like to be added:
I would like JobSet to support creating a single PVC and mounting the resulting PV into selected replicatedJobs, like this:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: volume-sample
spec:
  volumePolicy:
    replicatedJobs:
    - name: A
    - name: B
    volumeClaimTemplates:
    - metadata:
        name: pretrained-model
      spec:
        accessModes: [ "ReadWriteMany" ]
        storageClassName: "my-storage-class"
        resources:
          requests:
            storage: 1000Gi
  replicatedJobs:
  - name: workers
[...]

In this example, JobSet creates a PVC named "pretrained-model", and the resulting PV is then mounted into the replicatedJobs listed in .spec.volumePolicy.replicatedJobs.
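
For comparison, here is a rough sketch of the manual setup that this feature would replace, using only existing APIs: a standalone PersistentVolumeClaim plus a JobSet whose Pod template references it by claimName. The replicas count, container name, image, and mount path are placeholders chosen for illustration and are not part of the proposal. Reusing the name "pretrained-model" for the PVC is also an assumption.

```yaml
# Manual equivalent with today's APIs (illustrative sketch, not the proposed API).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pretrained-model
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: "my-storage-class"
  resources:
    requests:
      storage: 1000Gi
---
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: volume-sample
spec:
  replicatedJobs:
  - name: workers
    replicas: 2                                  # placeholder
    template:
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: trainer                      # placeholder
              image: example.com/trainer:latest  # placeholder
              volumeMounts:
              - name: pretrained-model
                mountPath: /models               # placeholder
            volumes:
            - name: pretrained-model
              persistentVolumeClaim:
                claimName: pretrained-model      # must match the PVC above
```

With the proposed volumePolicy, the PVC would be created by JobSet itself, and the volumes entry would not need to be repeated in every replicatedJob.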

This feature is similar to kubernetes/kubernetes#115066

Why is this needed:
In large-scale distributed training, we often store a pre-trained base model and want to share it with all workers, so that each worker does not have to download the model separately.

Labels: kind/feature (categorizes issue or PR as related to a new feature)
