Hunyuan Video LoRA #126

Conversation
@sayakpaul Could you give this a review? Note that I've left some TODOs for myself for future refactors; we should prioritize getting the trainers out there. I will need some more time to complete the longer finetuning run I was trying. I accidentally set …
README.md (outdated)

```
dataloader_cmd="--dataloader_num_workers 0"

# Diffusion arguments
diffusion_cmd="--flow_resolution_shifting"
```
Suggested change:

```diff
-diffusion_cmd="--flow_resolution_shifting"
+diffusion_cmd=""
```
Removing this because I've yet to test which option is better, since we don't know exactly how Hunyuan was trained.
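(For context: resolution-dependent timestep shifting in flow-matching trainers typically follows the SD3 recipe, where the noise schedule is shifted more aggressively for longer token sequences. Below is a rough sketch of that idea; the constants and the linear interpolation of `mu` are assumptions borrowed from SD3, not necessarily what `--flow_resolution_shifting` implements here.)

```python
import math

def shift_sigma(sigma: float, seq_len: int, base_seq_len: int = 256,
                max_seq_len: int = 4096, base_shift: float = 0.5,
                max_shift: float = 1.16) -> float:
    # Interpolate the shift parameter mu linearly in the token count.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    mu = seq_len * m + base_shift - base_seq_len * m
    # Push sigma toward more noise for longer (higher-resolution) sequences.
    return math.exp(mu) / (math.exp(mu) + (1 / sigma - 1))
```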
Thanks for getting this in quickly!
```diff
---video_resolution_buckets 17x512x768 49x512x768 61x512x768 129x512x768 \
+--video_resolution_buckets 49x512x768 \
```
Why is this getting changed?
This was incorrect when I merged LTX: I had copied the bucket values from my multiresolution run, but the validation prompts and other settings from the single-resolution run.
```
@@ -71,7 +71,7 @@ training_cmd="--training_type lora \
  --seed 42 \
  --mixed_precision bf16 \
  --batch_size 1 \
  --train_steps 2000 \
```
So, train shorter with a smaller LR? 👁️
A higher learning rate seems to make the model worse somehow when doing stylistic training :/ I've yet to find the optimal training configuration for LTXV, but ~1000-1500 steps seems to be okay.
```diff
-pipe.set_adapters(["ltxv-lora"], [1.0])
+pipe.set_adapters(["ltxv-lora"], [0.75])
```
Golden number?
Since I haven't found the optimal training settings for the LoRA yet, using it at full strength (1.0) leads to slightly worse quality outputs. 0.75 seems to strike a nice balance, but ideally the scale should be explored by whoever trained the LoRA.
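(For anyone reproducing this: a minimal sketch of loading a trained LoRA into a diffusers pipeline and dialing its strength down. The model id is the public LTX-Video checkpoint; the LoRA path is a placeholder.)

```python
import torch
from diffusers import LTXPipeline

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

# Register the LoRA under a name so its contribution can be scaled later.
pipe.load_lora_weights("path/to/trained/lora", adapter_name="ltxv-lora")

# Scales below 1.0 blend the LoRA with the base weights; 0.75 traded a bit
# of style strength for better output quality in this run.
pipe.set_adapters(["ltxv-lora"], [0.75])
```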
```
--max_grad_norm 1.0"

# Validation arguments
validation_cmd="--validation_prompts \"afkx A baker carefully cuts a green bell pepper cake on a white plate against a bright yellow background, followed by a strawberry cake with a similar slice of cake being cut before the interior of the bell pepper cake is revealed with the surrounding cake-to-object sequence.@@@49x512x768:::afkx A cake shaped like a Nutella container is carefully sliced, revealing a light interior, amidst a Nutella-themed setup, showcasing deliberate cutting and preserved details for an appetizing dessert presentation on a white base with accompanying jello and cutlery, highlighting culinary skills and creative cake designs.@@@49x512x768:::afkx A cake shaped like a Nutella container is carefully sliced, revealing a light interior, amidst a Nutella-themed setup, showcasing deliberate cutting and preserved details for an appetizing dessert presentation on a white base with accompanying jello and cutlery, highlighting culinary skills and creative cake designs.@@@61x512x768:::afkx A vibrant orange cake disguised as a Nike packaging box sits on a dark surface, meticulous in its detail and design, complete with a white swoosh and 'NIKE' logo. A person's hands, holding a knife, hover over the cake, ready to make a precise cut, amidst a simple and clean background.@@@61x512x768:::afkx A vibrant orange cake disguised as a Nike packaging box sits on a dark surface, meticulous in its detail and design, complete with a white swoosh and 'NIKE' logo. A person's hands, holding a knife, hover over the cake, ready to make a precise cut, amidst a simple and clean background.@@@97x512x768:::afkx A vibrant orange cake disguised as a Nike packaging box sits on a dark surface, meticulous in its detail and design, complete with a white swoosh and 'NIKE' logo. A person's hands, holding a knife, hover over the cake, ready to make a precise cut, amidst a simple and clean background.@@@129x512x768:::A person with gloved hands carefully cuts a cake shaped like a Skittles bottle, beginning with a precise incision at the lid, followed by careful sequential cuts around the neck, eventually detaching the lid from the body, revealing the chocolate interior of the cake while showcasing the layered design's detail.@@@61x512x768:::afkx A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage@@@61x512x768\" \
```
🧠 `@@@129x512x768`. I kid you not, I thought it was something else completely.
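(For anyone else squinting at that string: each validation entry appears to be `caption@@@FRAMESxHEIGHTxWIDTH`, with `:::` separating entries. A small illustration of splitting it; my own sketch, not code from this repo.)

```python
raw = (
    "afkx A baker cuts a cake@@@49x512x768:::"
    "afkx A Nike-box cake is sliced@@@129x512x768"
)

for entry in raw.split(":::"):
    caption, resolution = entry.rsplit("@@@", 1)
    frames, height, width = (int(v) for v in resolution.split("x"))
    print(frames, height, width, caption[:30])
```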
```
cmd="accelerate launch --config_file accelerate_configs/uncompiled_8.yaml --gpu_ids $GPU_IDS train.py \
  $model_cmd \
  $dataset_cmd \
  $dataloader_cmd \
  $diffusion_cmd \
  $training_cmd \
  $optimizer_cmd \
  $validation_cmd \
  $miscellaneous_cmd"
```
Wow, very neat way to segregate the commands!
```diff
@@ -206,7 +206,9 @@ def validate_args(args: Args):


 def _add_model_arguments(parser: argparse.ArgumentParser) -> None:
-    parser.add_argument("--model_name", type=str, required=True, choices=["ltx_video"], help="Name of model to train.")
+    parser.add_argument(
+        "--model_name", type=str, required=True, choices=["hunyuan_video", "ltx_video"], help="Name of model to train."
+    )
```
We could determine the `choices` automatically from the config map we have right now. TODO
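(A sketch of that TODO, assuming a hypothetical registry dict; the `SUPPORTED_MODEL_CONFIGS` name is invented for illustration.)

```python
import argparse

# Hypothetical registry; the real config map lives elsewhere in the trainer.
SUPPORTED_MODEL_CONFIGS = {"hunyuan_video": {}, "ltx_video": {}}

def _add_model_arguments(parser: argparse.ArgumentParser) -> None:
    parser.add_argument(
        "--model_name",
        type=str,
        required=True,
        choices=sorted(SUPPORTED_MODEL_CONFIGS),
        help="Name of model to train.",
    )
```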
```python
    return pipe


def prepare_conditions(
```
Should this be decorated with `torch.no_grad()`?
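(To illustrate the suggestion, assuming the text encoders are frozen here; a toy stand-in, not the repo's code.)

```python
import torch

encoder = torch.nn.Linear(4, 4)  # stand-in for a frozen text encoder

@torch.no_grad()  # decorator form: nothing inside tracks gradients
def prepare_conditions(text_encoder, tokens):
    return text_encoder(tokens)

out = prepare_conditions(encoder, torch.randn(1, 4))
print(out.requires_grad)  # False: cheaper and safe for frozen encoders

# Equivalent context-manager form at the call site:
with torch.no_grad():
    out = encoder(torch.randn(1, 4))
```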
```python
    if isinstance(prompt, str):
        prompt = [prompt]

    conditions = {}
    conditions.update(
        _get_llama_prompt_embeds(tokenizer, text_encoder, prompt, prompt_template, device, dtype, max_sequence_length)
    )
    conditions.update(_get_clip_prompt_embeds(tokenizer_2, text_encoder_2, prompt, device, dtype))

    guidance = torch.tensor([guidance], device=device, dtype=dtype) * 1000.0
    conditions["guidance"] = guidance

    return conditions
```
Wonder if it's possible to leverage the `encode_prompt()` from the pipeline itself. TODO
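(The idea here: diffusers pipelines expose an `encode_prompt()` method that returns the text embeddings directly. A hedged sketch of what that might look like; the checkpoint id is the community diffusers-format one, and the exact signature and return values may differ across diffusers versions.)

```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
)
# Returns Llama embeddings, the pooled CLIP embedding, and the attention mask.
prompt_embeds, pooled_prompt_embeds, prompt_attention_mask = pipe.encode_prompt(
    prompt="afkx A baker carefully cuts a cake"
)
```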
I think it's better to keep custom implementations here per model, because it's cleaner to understand and debug without jumping into the diffusers codebase. Also, our pipelines sometimes contain additional checks and extras. Let's revisit this idea later.
Co-authored-by: Sayak Paul <[email protected]>
All yours @sayakpaul for the initial designing and refactors 🪄 I'm still trying to figure out how best to implement precomputation, because the current approach just loads all the models and is not really ideal. I will have a refactor out in a few hours.
Script:
Slurm: