KEP-2401: Kubeflow LLM Trainer V2 #2401

@Electronic-Waste

Description

This is the tracking issue for the Kubeflow LLM Trainer V2, a submodule of Kubeflow Training V2: #2170

We aim to solve:

However, the LLM Trainer V2 design is complex and needs further discussion, so we decided to open a separate issue to track it.

Examples & User Documentation

KEP Updates:

Jobset Improvements:

Initial Design (Google Doc): Kubeflow Training V2 LLM Trainer Design

/area runtime
/cc @kubeflow/wg-training-leads @deepanker13 @saileshd1402 @seanlaii @helenxie-bit @astefanutti @varshaprasad96 @franciscojavierarceo @thesuperzapper @rimolive @juliusvonkohout @jbottum @varodrig @Doris-xm @truc0
