
Commit 7b569da

sayakpaul and a-r-r-o-w authored
simplify docs part ii (#190)
* simplify docs.
* clarify the cog checkpoints supported.
* make the note smaller.
* replace with sub
* Apply suggestions from code review

Co-authored-by: Aryan <[email protected]>

---------

Co-authored-by: Aryan <[email protected]>
1 parent 38413aa · commit 7b569da

File tree: 4 files changed (+21 −8 lines)

README.md

Lines changed: 7 additions & 5 deletions
@@ -135,14 +135,16 @@ For inference, refer [here](./docs/training/ltx_video.md#inference). For docs re
  <div align="center">

- | Model Name | Tasks | Ckpts Tested | Min. GPU<br>VRAM | Comments |
- |:------------:|:---------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------:|:----------------------:|:--------------------------------------------------------------:|
- | [LTX-Video](https://huggingface.co/docs/diffusers/main/api/pipelines/ltx_video) | <ul><li>T2V ✅</li><li> I2V ❌</li></ul> | [Lightricks/LTX-Video](https://huggingface.co/Lightricks/LTX-Video) | 11 GB | Fast to train |
- | [HunyuanVideo](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video) | <ul><li>T2V ✅</li><li> I2V ❌</li></ul> | [tencent/HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) | 42 GB | - |
- | [CogVideoX](https://huggingface.co/docs/diffusers/main/api/pipelines/cogvideox) | <ul><li>T2V ✅</li><li> I2V ❌</li></ul> | <ul><li>[THUDM/CogVideoX1.5-5B](https://huggingface.co/THUDM/CogVideoX1.5-5B)</li><li>[THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b)</li><li>[THUDM/CogVideoX-2b](https://huggingface.co/THUDM/CogVideoX-2b)</li></ul> | - GB | Training with multi-bucket, multi-resolution frames is supported. |
+ | **Model Name** | **Tasks** | **Min. GPU VRAM** |
+ |:---:|:---:|:---:|
+ | [LTX-Video](./docs/training/ltx_video.md) | Text-to-Video | 11 GB |
+ | [HunyuanVideo](./docs/training/hunyuan_video.md) | Text-to-Video | 42 GB |
+ | [CogVideoX](./docs/training/cogvideox.md) | Text-to-Video | 12GB<sup>*</sup> |

  </div>

+ <sub><sup>*</sup>Noted for the 5B variant.</sub>
+
  Note that the memory consumption in the table is reported with most of the options, discussed in [docs/training/optimizations](./docs/training/optimization.md), enabled.

  If you would like to use a custom dataset, refer to the dataset preparation guide [here](./docs/dataset/README.md).

docs/training/cogvideox.md

Lines changed: 10 additions & 1 deletion
@@ -109,6 +109,14 @@ Training configuration: {
  | after validation end | 11.145 | 28.324 |
  | after training end | 11.144 | 11.592 |

+ ## Supported checkpoints
+
+ CogVideoX has multiple checkpoints as one can note [here](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce). The following checkpoints were tested with `finetrainers` and are known to be working:
+
+ * [THUDM/CogVideoX-2b](https://huggingface.co/THUDM/CogVideoX-2b)
+ * [THUDM/CogVideoX-5B](https://huggingface.co/THUDM/CogVideoX-5B)
+ * [THUDM/CogVideoX1.5-5B](https://huggingface.co/THUDM/CogVideoX1.5-5B)
+
  ## Inference

  Assuming your LoRA is saved and pushed to the HF Hub, and named `my-awesome-name/my-awesome-lora`, we can now use the finetuned model for inference:

@@ -128,7 +136,8 @@ video = pipe("<my-awesome-prompt>").frames[0]
  export_to_video(video, "output.mp4")
  ```

- You can refer to the following guides to know more about performing LoRA inference in `diffusers`:
+ You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`:

+ * [CogVideoX in Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox)
  * [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
  * [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)
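For readers skimming this diff, a minimal self-contained sketch of the inference flow the doc describes, using the `diffusers` CogVideoX pipeline. The checkpoint choice, adapter name, and LoRA repo id are illustrative placeholders, not values fixed by this commit:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Base pipeline; CogVideoX-5b is one of the checkpoints listed as tested above.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Attach the finetuned LoRA from the Hub (placeholder repo id from the doc).
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])
pipe.to("cuda")

# Generate and export, mirroring the snippet shown in the hunk above.
video = pipe("<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4")
```

`load_lora_weights` and `set_adapters` are the standard `peft`-backed entry points covered in the guides linked above.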

docs/training/hunyuan_video.md

Lines changed: 2 additions & 1 deletion
@@ -171,7 +171,8 @@ output = pipe(
  export_to_video(output, "output.mp4", fps=15)
  ```

- You can refer to the following guides to know more about performing LoRA inference in `diffusers`:
+ You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`:

+ * [Hunyuan-Video in Diffusers](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video)
  * [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
  * [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)
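As above, a hedged sketch of the corresponding flow for HunyuanVideo. The diffusers-format repo id, adapter name, and LoRA repo id are assumptions for illustration (the README lists tencent/HunyuanVideo as the tested checkpoint):

```python
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Assumed diffusers-format mirror of the tested tencent/HunyuanVideo checkpoint.
model_id = "hunyuanvideo-community/HunyuanVideo"
pipe = HunyuanVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder LoRA repo id and adapter name.
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="hunyuan-lora")
pipe.vae.enable_tiling()  # reduces VRAM pressure during decoding
pipe.to("cuda")

output = pipe(prompt="<my-awesome-prompt>", num_frames=61).frames[0]
export_to_video(output, "output.mp4", fps=15)  # fps matches the doc's snippet
```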

docs/training/ltx_video.md

Lines changed: 2 additions & 1 deletion
@@ -159,7 +159,8 @@ video = pipe("<my-awesome-prompt>").frames[0]
  export_to_video(video, "output.mp4", fps=8)
  ```

- You can refer to the following guides to know more about performing LoRA inference in `diffusers`:
+ You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`:

+ * [LTX-Video in Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video)
  * [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
  * [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)
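And the analogous sketch for LTX-Video, again with a placeholder LoRA repo id and adapter name:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Base checkpoint listed as tested in the README.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)

# Placeholder LoRA repo id and adapter name.
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="ltxv-lora")
pipe.to("cuda")

video = pipe("<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4", fps=8)  # fps matches the doc's snippet
```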
