Hi, I'd like to know how to use the DeepSpeed ZeRO-3 offload strategy correctly. Currently, my code looks like this:
import lightning as L
from lightning.fabric.strategies import DeepSpeedStrategy

deep_speed = DeepSpeedStrategy(
    stage=3,
    offload_optimizer=True,
    offload_parameters=True,
)
fabric = L.Fabric(accelerator="gpu", devices=num_devices, strategy=deep_speed)
However, it seems the parameters are duplicated on every GPU. I attached a screenshot showing the GPU utilization after calling model, optimizer = fabric.setup(model, optimizer).
As I understand it, the parameters should be sharded across the devices rather than replicated on each one, right?
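To put that expectation into rough numbers, here is a minimal sketch. It is not a measurement. It assumes ZeRO stage 3 splits the parameters evenly across devices, and it ignores gradients, optimizer state, activations, and CPU offloading. The function name, the 1B-parameter model size, and the 4-GPU setup are hypothetical values chosen only for illustration.

```python
def param_bytes_per_device(
    num_params: int, bytes_per_param: int, num_devices: int, sharded: bool
) -> int:
    """Rough parameter memory each device holds, in bytes (illustrative only)."""
    total = num_params * bytes_per_param
    if not sharded:
        # Replicated: every device keeps a full copy of the parameters.
        return total
    # Sharded (assumed even split): each device keeps about 1/num_devices.
    # Ceiling division so a remainder is not silently dropped.
    return -(-total // num_devices)


# Hypothetical example: 1B parameters stored in 16-bit precision on 4 GPUs.
num_params = 1_000_000_000
replicated = param_bytes_per_device(num_params, 2, 4, sharded=False)
sharded = param_bytes_per_device(num_params, 2, 4, sharded=True)

print(f"replicated: {replicated / 1e9:.1f} GB per device")
print(f"sharded:    {sharded / 1e9:.1f} GB per device")
```

If the per-GPU memory after fabric.setup looks closer to the replicated figure than to the sharded one, that is the behavior I am asking about.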