
Commit 7b569da

sayakpaul and a-r-r-o-w authored
simplify docs part ii (#190)
* simplify docs.
* clarify the cog checkpoints supported.
* make the note smaller.
* replace with sub
* Apply suggestions from code review

Co-authored-by: Aryan <[email protected]>

---------

Co-authored-by: Aryan <[email protected]>
1 parent 38413aa · commit 7b569da

File tree: 4 files changed (+21 −8 lines)

README.md

Lines changed: 7 additions & 5 deletions
@@ -135,14 +135,16 @@ For inference, refer [here](./docs/training/ltx_video.md#inference). For docs re
  <div align="center">

- | Model Name | Tasks | Ckpts Tested | Min. GPU<br>VRAM | Comments |
- |:------------:|:---------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------:|:----------------------:|:--------------------------------------------------------------:|
- | [LTX-Video](https://huggingface.co/docs/diffusers/main/api/pipelines/ltx_video) | <ul><li>T2V ✅</li><li> I2V ❌</li></ul> | [Lightricks/LTX-Video](https://huggingface.co/Lightricks/LTX-Video) | 11 GB | Fast to train |
- | [HunyuanVideo](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video) | <ul><li>T2V ✅</li><li> I2V ❌</li></ul> | [tencent/HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) | 42 GB | - |
- | [CogVideoX](https://huggingface.co/docs/diffusers/main/api/pipelines/cogvideox) | <ul><li>T2V ✅</li><li> I2V ❌</li></ul> | <ul><li>[THUDM/CogVideoX1.5-5B](https://huggingface.co/THUDM/CogVideoX1.5-5B)</li><li>[THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b)</li><li>[THUDM/CogVideoX-2b](https://huggingface.co/THUDM/CogVideoX-2b)</li></ul> | - GB | Training with multi-bucket, multi-resolution frames is supported. |
+ | **Model Name** | **Tasks** | **Min. GPU VRAM** |
+ |:---:|:---:|:---:|
+ | [LTX-Video](./docs/training/ltx_video.md) | Text-to-Video | 11 GB |
+ | [HunyuanVideo](./docs/training/hunyuan_video.md) | Text-to-Video | 42 GB |
+ | [CogVideoX](./docs/training/cogvideox.md) | Text-to-Video | 12GB<sup>*</sup> |

  </div>

+ <sub><sup>*</sup>Noted for the 5B variant.</sub>
+
  Note that the memory consumption in the table is reported with most of the options, discussed in [docs/training/optimizations](./docs/training/optimization.md), enabled.

  If you would like to use a custom dataset, refer to the dataset preparation guide [here](./docs/dataset/README.md).

docs/training/cogvideox.md

Lines changed: 10 additions & 1 deletion
@@ -109,6 +109,14 @@ Training configuration: {
  | after validation end | 11.145 | 28.324 |
  | after training end | 11.144 | 11.592 |

+ ## Supported checkpoints
+
+ CogVideoX has multiple checkpoints as one can note [here](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce). The following checkpoints were tested with `finetrainers` and are known to be working:
+
+ * [THUDM/CogVideoX-2b](https://huggingface.co/THUDM/CogVideoX-2b)
+ * [THUDM/CogVideoX-5B](https://huggingface.co/THUDM/CogVideoX-5B)
+ * [THUDM/CogVideoX1.5-5B](https://huggingface.co/THUDM/CogVideoX1.5-5B)
+
  ## Inference

  Assuming your LoRA is saved and pushed to the HF Hub, and named `my-awesome-name/my-awesome-lora`, we can now use the finetuned model for inference:

@@ -128,7 +136,8 @@ video = pipe("<my-awesome-prompt>").frames[0]
  export_to_video(video, "output.mp4")
  ```

- You can refer to the following guides to know more about performing LoRA inference in `diffusers`:
+ You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`:

+ * [CogVideoX in Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox)
  * [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
  * [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)
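For readers skimming this diff, a minimal self-contained sketch of the inference flow the doc describes, using the `diffusers` CogVideoX pipeline. The checkpoint choice, adapter name, and LoRA repo id are illustrative placeholders, not values fixed by this commit:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Base pipeline; CogVideoX-5b is one of the checkpoints listed as tested above.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Attach the finetuned LoRA from the Hub (placeholder repo id from the doc).
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])
pipe.to("cuda")

# Generate and export, mirroring the snippet shown in the hunk above.
video = pipe("<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4")
```

`load_lora_weights` and `set_adapters` are the standard `peft`-backed entry points covered in the guides linked above.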

docs/training/hunyuan_video.md

Lines changed: 2 additions & 1 deletion
@@ -171,7 +171,8 @@ output = pipe(
  export_to_video(output, "output.mp4", fps=15)
  ```

- You can refer to the following guides to know more about performing LoRA inference in `diffusers`:
+ You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`:

+ * [Hunyuan-Video in Diffusers](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video)
  * [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
  * [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)
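As above, a hedged sketch of the corresponding flow for HunyuanVideo. The diffusers-format repo id, adapter name, and LoRA repo id are assumptions for illustration (the README lists tencent/HunyuanVideo as the tested checkpoint):

```python
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Assumed diffusers-format mirror of the tested tencent/HunyuanVideo checkpoint.
model_id = "hunyuanvideo-community/HunyuanVideo"
pipe = HunyuanVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder LoRA repo id and adapter name.
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="hunyuan-lora")
pipe.vae.enable_tiling()  # reduces VRAM pressure during decoding
pipe.to("cuda")

output = pipe(prompt="<my-awesome-prompt>", num_frames=61).frames[0]
export_to_video(output, "output.mp4", fps=15)  # fps matches the doc's snippet
```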

docs/training/ltx_video.md

Lines changed: 2 additions & 1 deletion
@@ -159,7 +159,8 @@ video = pipe("<my-awesome-prompt>").frames[0]
  export_to_video(video, "output.mp4", fps=8)
  ```

- You can refer to the following guides to know more about performing LoRA inference in `diffusers`:
+ You can refer to the following guides to know more about the model pipeline and performing LoRA inference in `diffusers`:

+ * [LTX-Video in Diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video)
  * [Load LoRAs for inference](https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference)
  * [Merge LoRAs](https://huggingface.co/docs/diffusers/main/en/using-diffusers/merge_loras)
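And the analogous sketch for LTX-Video, again with a placeholder LoRA repo id and adapter name:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Base checkpoint listed as tested in the README.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)

# Placeholder LoRA repo id and adapter name.
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="ltxv-lora")
pipe.to("cuda")

video = pipe("<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4", fps=8)  # fps matches the doc's snippet
```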
