Lower memory requirements on single GPU #321
Merged
Fixes #315 (comment)
Tested only with CogView4. It should have a similar effect on all other models, since they share the same code paths.
On multi-GPU, these changes should cause no difference in behavior.
On single-GPU, using no memory optimization flags (except must-use defaults like `--gradient_checkpointing`): peak memory usage is not reduced in this case. That makes sense, because during validation all components are loaded onto the GPU. If offloading, such as `enable_model_cpu_offload`, is enabled, we can reduce the peak!
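To make the offloading point concrete, here is a minimal, torch-free sketch of the idea behind model CPU offload (the real behavior lives in diffusers' `enable_model_cpu_offload`; the `Component` class, device strings, and `run_with_offload` helper below are purely illustrative):

```python
class Component:
    """Stand-in for one pipeline module (text encoder, transformer, VAE)."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"  # components start offloaded to CPU

    def to(self, device):
        self.device = device
        return self


def run_with_offload(components, step):
    """Move each component to the GPU only while it is needed, then return
    it to CPU, so the GPU holds at most one component at a time."""
    peak_resident = 0
    for comp in components:
        comp.to("cuda")
        on_gpu = sum(c.device == "cuda" for c in components)
        peak_resident = max(peak_resident, on_gpu)
        step(comp)      # this component's share of the work
        comp.to("cpu")  # offload before the next component loads
    return peak_resident


pipeline = [Component("text_encoder"), Component("transformer"), Component("vae")]
peak = run_with_offload(pipeline, step=lambda c: None)
print(peak)  # → 1: only one component resident on the GPU at any time
```

Without offloading, all three components would be GPU-resident during validation, which is exactly why the peak is unchanged in the no-flags case above.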
On single-GPU, using FP8 layerwise casting + model CPU offload:
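For readers unfamiliar with layerwise casting: weights are stored in FP8 and only the active layer is upcast to the compute dtype during its forward pass. The sketch below is a back-of-the-envelope memory model, not the diffusers implementation (which exposes this via `enable_layerwise_casting` with storage/compute dtypes); the class, byte table, and parameter counts are illustrative assumptions:

```python
# Approximate bytes per parameter for each dtype in this sketch.
BYTES = {"float8_e4m3fn": 1, "bfloat16": 2}


class CastedLayer:
    """Weights live in FP8 storage; forward temporarily upcasts to bfloat16."""

    def __init__(self, n_params):
        self.n_params = n_params
        self.storage_dtype = "float8_e4m3fn"
        self.compute_dtype = "bfloat16"

    def storage_bytes(self):
        return self.n_params * BYTES[self.storage_dtype]

    def extra_forward_bytes(self):
        # Temporary compute-dtype copy of this layer only, freed after forward.
        return self.n_params * BYTES[self.compute_dtype]


layers = [CastedLayer(1_000_000) for _ in range(10)]
resident = sum(l.storage_bytes() for l in layers)        # all layers in FP8
peak = resident + max(l.extra_forward_bytes() for l in layers)
baseline = sum(l.n_params * BYTES["bfloat16"] for l in layers)  # all-bf16 model
print(resident, peak, baseline)  # → 10000000 12000000 20000000
```

Even at peak (FP8 storage plus one layer's bf16 copy), this stays well under keeping the whole model in bf16, which is where the single-GPU savings in this PR come from when combined with CPU offload.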
Fixes made:
cc @neph1 Please give it a spin when you get time!