Commit 03d6430 (1 parent: daf65d4)

Cosmos predict2 model example.

6 files changed: +599 −1 lines

README.md (2 additions, 0 deletions)

```diff
@@ -70,6 +70,8 @@ Here are some more advanced examples:
 
 [Nvidia Cosmos](cosmos)
 
+[Nvidia Cosmos Predict2](cosmos_predict2)
+
 [Wan](wan)
 
 [Audio Models](audio)
```

cosmos/README.md (3 additions, 1 deletion)

```diff
@@ -1,4 +1,6 @@
-# Nvidia Cosmos Models
+# Original Nvidia Cosmos Models
+
+For the newer Cosmos models see [Cosmos Predict2](../cosmos_predict2)
 
 [Nvidia Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) is a family of "World Models". ComfyUI currently supports specifically the 7B and 14B text to video diffusion models and the 7B and 14B image to video diffusion models.
 
```

cosmos_predict2/README.md (new file, 46 additions)

# Nvidia Cosmos Predict2

These are a family of text to image and image to video models from Nvidia.

## Files to Download

You will first need:

#### Text encoder and VAE:

[oldt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/tree/main/text_encoders) goes in: ComfyUI/models/text_encoders/

[wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors) goes in: ComfyUI/models/vae/

Note: oldt5_xxl is not the same as the t5xxl used in flux and other models: oldt5_xxl is t5xxl 1.0, while flux and the others use t5xxl 1.1.

All the diffusion models (they go in ComfyUI/models/diffusion_models/) can be found here: [Repackaged safetensors files](https://huggingface.co/Comfy-Org/Cosmos_Predict2_repackaged/tree/main) or [Official Nvidia Model Files](https://huggingface.co/collections/nvidia/cosmos-predict2-68028efc052239369a0f2959)
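The file placement above can be sketched as a small script. This is a minimal sketch assuming ComfyUI's standard `models/` folder layout; the filename-to-folder mapping comes from the instructions above, while the helper names are my own:

```python
from pathlib import Path

# Filename -> ComfyUI models/ subfolder, per the download instructions above.
FILE_DESTINATIONS = {
    "oldt5_xxl_fp8_e4m3fn_scaled.safetensors": "text_encoders",
    "wan_2.1_vae.safetensors": "vae",
    "cosmos_predict2_2B_t2i.safetensors": "diffusion_models",
}

def dest_path(comfyui_root: str, filename: str) -> Path:
    """Return where a downloaded file should be placed under ComfyUI/models/."""
    return Path(comfyui_root) / "models" / FILE_DESTINATIONS[filename] / filename

for name in FILE_DESTINATIONS:
    print(dest_path("ComfyUI", name))
```

An actual download could then fetch each file (for example with `huggingface_hub.hf_hub_download`) and move it to the path returned here.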
## Workflows

### Text to Image

This workflow uses the 2B Cosmos Predict2 text to image model. The file used in the workflow is [cosmos_predict2_2B_t2i.safetensors](https://huggingface.co/Comfy-Org/Cosmos_Predict2_repackaged/blob/main/cosmos_predict2_2B_t2i.safetensors), which goes in: ComfyUI/models/diffusion_models/

![Example](cosmos_predict2_2b_t2i_example.png)

You can load this image in [ComfyUI](https://github.com/comfyanonymous/ComfyUI) to get the full workflow.

I think the 2B model is the most interesting one, but you can also find the bigger 14B model here: [cosmos_predict2_14B_t2i.safetensors](https://huggingface.co/Comfy-Org/Cosmos_Predict2_repackaged/blob/main/cosmos_predict2_14B_t2i.safetensors) and use it in the workflow above.
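Loading the image works because ComfyUI embeds the workflow as JSON in the PNG's metadata. A minimal sketch of reading it back with Pillow; the `workflow` and `prompt` chunk names are an assumption about what ComfyUI writes, not something stated in this README:

```python
import json
from PIL import Image

def read_comfyui_workflow(png):
    """Return the workflow JSON embedded in a ComfyUI-generated PNG, or None.

    Accepts a file path or a file-like object. PNG text chunks (where
    ComfyUI is assumed to store "workflow"/"prompt") surface in Image.info.
    """
    info = Image.open(png).info
    raw = info.get("workflow") or info.get("prompt")
    return json.loads(raw) if raw else None
```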
### Image to Video

These models are pretty picky about the resolution and length of the videos. This workflow is for the 480p models; for the 720p models you will have to set the resolution to 720p, or your results might be bad.

This workflow uses the 2B Cosmos Predict2 image to video model. The file used in the workflow is [cosmos_predict2_2B_video2world_480p_16fps.safetensors](https://huggingface.co/Comfy-Org/Cosmos_Predict2_repackaged/blob/main/cosmos_predict2_2B_video2world_480p_16fps.safetensors), which goes in: ComfyUI/models/diffusion_models/

![Example](cosmos_predict2_2b_i2v_example.webp)

[Workflow in Json format](cosmos_predict2_2b_i2v_example.json)
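Since the models are picky about resolution, a small guard before queueing a workflow can catch a 480p/720p mismatch early. This is a minimal sketch; the concrete sizes (832x480 and 1280x720, common 16:9 shapes) are my assumption and are not taken from this README:

```python
# Nominal 16:9 resolution per model variant. These exact sizes are an
# assumption for illustration, not values stated in the README above.
MODEL_RESOLUTIONS = {
    "480p": (832, 480),
    "720p": (1280, 720),
}

def check_resolution(model_variant: str, width: int, height: int) -> bool:
    """Return True if (width, height) matches the variant's expected size."""
    return MODEL_RESOLUTIONS[model_variant] == (width, height)

print(check_resolution("480p", 832, 480))
```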
