Add SD3 fine-tuning scripts #1966

Open · wants to merge 3 commits into `main`

`examples/stable-diffusion/text_to_image_generation.py` (7 additions, 4 deletions)
@@ -318,8 +318,8 @@ def main():

# Select stable diffuson pipeline based on input
sdxl_models = ["stable-diffusion-xl", "sdxl"]
-sd3_models = ["stable-diffusion-3"]
-flux_models = ["FLUX.1"]
+sd3_models = ["stable-diffusion-3", "sd3"]
+flux_models = ["FLUX.1", "flux"]
sdxl = True if any(model in args.model_name_or_path for model in sdxl_models) else False
sd3 = True if any(model in args.model_name_or_path for model in sd3_models) else False
flux = True if any(model in args.model_name_or_path for model in flux_models) else False
@@ -329,7 +329,7 @@ def main():
# Set the scheduler
kwargs = {"timestep_spacing": args.timestep_spacing, "rescale_betas_zero_snr": args.use_zero_snr}

-if flux or args.scheduler == "flow_match_euler_discrete":
+if flux or sd3 or args.scheduler == "flow_match_euler_discrete":
scheduler = GaudiFlowMatchEulerDiscreteScheduler.from_pretrained(
args.model_name_or_path, subfolder="scheduler", **kwargs
)
@@ -644,7 +644,10 @@ def main():
lines = [line.strip() for line in lines]
args.prompts = lines

-# Generate Images using a Stable Diffusion pipeline
+# Generate images using auto-selected diffuser pipeline
+logger.info(
+    f"Generating images using pipeline {pipeline.__class__.__name__} with scheduler {pipeline.scheduler.__class__.__name__}"
+)
if args.distributed:
with distributed_state.split_between_processes(args.prompts) as prompt:
if args.use_compel:
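
For reference, a short self-contained sketch of the selection rule these changes introduce: the pipeline family is picked by substring matching on `--model_name_or_path`, and SD3 checkpoints (including 3.5, whose repository names still contain `stable-diffusion-3`) now take the same flow-match scheduler branch as FLUX. The model names below are only illustrations.

```python
# Sketch of the detection logic added above (not the full script).
sd3_models = ["stable-diffusion-3", "sd3"]
flux_models = ["FLUX.1", "flux"]

for name in ("stabilityai/stable-diffusion-3.5-medium", "black-forest-labs/FLUX.1-dev"):
    sd3 = any(m in name for m in sd3_models)
    flux = any(m in name for m in flux_models)
    # Both families default to GaudiFlowMatchEulerDiscreteScheduler in the script.
    use_flow_match = flux or sd3
    family = "SD3" if sd3 else ("FLUX" if flux else "other")
    print(f"{name} -> {family}, flow-match scheduler: {use_flow_match}")
```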

`examples/stable-diffusion/training/README.md` (118 additions, 4 deletions)
@@ -365,7 +365,7 @@

We can use the same `dog` dataset for the following examples.
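
If the local `dog` folder prepared earlier in this README is not already in place, it can be fetched from the Hub first. A minimal sketch, assuming the instance images are the standard `diffusers/dog-example` set used by the DreamBooth examples:

```python
# Sketch: download the example instance images into a local "dog" folder.
# The dataset id below is an assumption; skip this if the folder already exists.
from huggingface_hub import snapshot_download

snapshot_download(
    "diffusers/dog-example",
    local_dir="./dog",
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
```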

To launch FLUX.1-dev LoRA training on a single Gaudi card, use:
```bash
python train_dreambooth_lora_flux.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
@@ -392,12 +392,13 @@
--gaudi_config_name="Habana/stable-diffusion"
```

You can run training on multiple HPUs by replacing `python train_dreambooth_lora_flux.py`
with `python ../../gaudi_spawn.py --world_size <num-HPUs> train_dreambooth_lora_flux.py`.

> [!NOTE]
> To use MPI for multi-card training, add `--use_mpi` after `--world_size <num-HPUs>`.
> To use DeepSpeed instead of MPI, replace `--use_mpi` with `--use_deepspeed`.

After training completes, you can use the `text_to_image_generation.py` script for inference as follows:
```bash
python ../text_to_image_generation.py \
@@ -406,10 +407,123 @@
--prompts "A picture of a sks dog in a bucket" \
--num_images_per_prompt 5 \
--batch_size 1 \
--num_inference_steps 30 \
--image_save_dir /tmp/flux_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--sdp_on_bf16 \
--bf16
```
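
The trained adapter can also be loaded outside the sample script. A minimal sketch, assuming `GaudiFluxPipeline` is the FLUX pipeline exposed by `optimum.habana.diffusers` and the standard diffusers `load_lora_weights` API; `dog_lora_flux` is a hypothetical stand-in for whatever `--output_dir` the training run used:

```python
# Sketch: load FLUX.1-dev on Gaudi and attach the DreamBooth LoRA adapter.
# "dog_lora_flux" is a placeholder for the training run's --output_dir.
import torch
from optimum.habana.diffusers import GaudiFluxPipeline

pipeline = GaudiFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
    torch_dtype=torch.bfloat16,
)
pipeline.load_lora_weights("dog_lora_flux")

image = pipeline("A picture of a sks dog in a bucket", num_inference_steps=30).images[0]
image.save("sks_dog_flux.png")
```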

### DreamBooth LoRA Fine-Tuning with Stable Diffusion 3 and 3.5 (SD3)

We can use the same `dog` dataset for the following example.

To launch SD3 LoRA training on a single Gaudi card, use:
```bash
python train_dreambooth_lora_sd3.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
--dataset_name="dog" \
--instance_prompt="a photo of sks dog" \
--validation_prompt="a photo of sks dog in a bucket" \
--output_dir="dog_lora_sd3" \
--mixed_precision="bf16" \
--rank=8 \
--resolution=1024 \
--train_batch_size=1 \
--guidance_scale 7 \
--learning_rate=5e-4 \
--max_grad_norm=0.5 \
--report_to="tensorboard" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=1500 \
--validation_epochs=50 \
--save_validation_images \
--use_hpu_graphs_for_inference \
--use_hpu_graphs_for_training \
--gaudi_config_name="Habana/stable-diffusion" \
--sdp_on_bf16 \
--bf16
```
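
Before running inference you can sanity-check what the run produced. A small sketch, assuming the adapter is saved under the usual diffusers name `pytorch_lora_weights.safetensors` inside `--output_dir` (adjust the path to whatever the script actually wrote):

```python
# Sketch: list a few tensors from the trained SD3 LoRA adapter.
# The file name is the common diffusers convention and is an assumption here.
from safetensors.torch import load_file

state_dict = load_file("dog_lora_sd3/pytorch_lora_weights.safetensors")
print(f"{len(state_dict)} LoRA tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```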

You can run training on multiple HPUs by replacing `python train_dreambooth_lora_sd3.py`
with `python ../../gaudi_spawn.py --world_size <num-HPUs> train_dreambooth_lora_sd3.py`.

> [!NOTE]
> To use MPI for multi-card training, add `--use_mpi` after `--world_size <num-HPUs>`.
> To use DeepSpeed instead of MPI, replace `--use_mpi` with `--use_deepspeed`.

After training completes, you can use the `text_to_image_generation.py` script for inference as follows:
```bash
python ../text_to_image_generation.py \
--model_name_or_path "stabilityai/stable-diffusion-3-medium-diffusers" \
--lora_id dog_lora_sd3 \
--prompts "A picture of a sks dog in a bucket" \
--scheduler flow_match_euler_discrete \
--num_images_per_prompt 5 \
--batch_size 1 \
--num_inference_steps 28 \
--image_save_dir /tmp/sd3_lora_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--sdp_on_bf16 \
--bf16
```
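
The same adapter can be used from your own code instead of the sample script. A minimal sketch, assuming `GaudiStableDiffusion3Pipeline` from `optimum.habana.diffusers` and the standard diffusers `load_lora_weights` API:

```python
# Sketch: load the base SD3 model on Gaudi and attach the trained LoRA adapter.
import torch
from optimum.habana.diffusers import GaudiStableDiffusion3Pipeline

pipeline = GaudiStableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
    torch_dtype=torch.bfloat16,
)
pipeline.load_lora_weights("dog_lora_sd3")  # --output_dir from the training command above

image = pipeline("A picture of a sks dog in a bucket", num_inference_steps=28).images[0]
image.save("sks_dog_sd3_lora.png")
```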

## Full Model Fine-Tuning for Stable Diffusion 3 and 3.5 (SD3)

We can use the `dog` dataset for the following example.

To launch SD3 full-model training on a single Gaudi card, use:
```bash
python train_text_to_image_sd3.py \
--pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
--dataset_name="dog" \
--instance_prompt="a photo of sks dog" \
--validation_prompt="a photo of sks dog in a bucket" \
--output_dir="dog_ft_sd3" \
--mixed_precision="bf16" \
--resolution=1024 \
--train_batch_size=1 \
--guidance_scale 7 \
--learning_rate=5e-4 \
--max_grad_norm=1 \
--report_to="tensorboard" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=1500 \
--validation_epochs=50 \
--save_validation_images \
--use_hpu_graphs_for_inference \
--use_hpu_graphs_for_training \
--gaudi_config_name="Habana/stable-diffusion" \
--sdp_on_bf16 \
--bf16
```

You can run training on multiple HPUs by replacing `python train_text_to_image_sd3.py`
with `python ../../gaudi_spawn.py --world_size <num-HPUs> train_text_to_image_sd3.py`.

> [!NOTE]
> To use MPI for multi-card training, add `--use_mpi` after `--world_size <num-HPUs>`.
> To use DeepSpeed instead of MPI, replace `--use_mpi` with `--use_deepspeed`.
> Fine-tuning the full SD3.5-Large model requires multiple HPU cards.

After training completes, you can use the `text_to_image_generation.py` script for inference as follows:
```bash
python ../text_to_image_generation.py \
--model_name_or_path "dog_ft_sd3" \
--prompts "A picture of a sks dog in a bucket" \
--scheduler flow_match_euler_discrete \
--num_images_per_prompt 5 \
--batch_size 1 \
--num_inference_steps 28 \
--image_save_dir /tmp/sd3_images \
--use_habana \
--use_hpu_graphs \
--gaudi_config Habana/stable-diffusion \
--sdp_on_bf16 \
--bf16
```