Add defence for DeepCompile w/o optimizer #7225
When the optimizer is not specified, the optimizer will be type `DeepSpeedZeRoOffload` instead of `DeepSpeedZeroOptimizer_Stage3` (e.g. for ZeRO-3 pure inference), while `DeepSpeedZeRoOffload` doesn't have `parameter_offload`.

https://github.com/deepspeedai/DeepSpeed/blob/56005d2b256eb81a88cba0a1984375f9663a3110/deepspeed/runtime/engine.py#L1684-L1707

```log
  File "deepspeed/runtime/engine.py", line 3919, in compile
    backend = init_z3(self, backend, compile_config, compile_kwargs, schedule)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/compile/init_z3.py", line 36, in init_z3
    optimizer.parameter_offload._remove_module_hooks()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'parameter_offload'
```

Signed-off-by: Hollow Man <[email protected]>
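For readers skimming the thread, here is a minimal, self-contained sketch of the kind of guard the description calls for. It is not the code merged in this PR. `FakeZeroOffload`, `FakeStage3Optimizer`, and `remove_zero3_hooks` are hypothetical stand-ins invented for illustration, and the sketch assumes that the stage-3 optimizer's `parameter_offload` is an offload object exposing `_remove_module_hooks()`, so that the same method can be called directly when the engine holds only the offload object.

```python
# Hypothetical stand-ins; none of these names come from DeepSpeed itself.


class FakeZeroOffload:
    """Plays the role of DeepSpeedZeRoOffload: it owns the module hooks."""

    def __init__(self):
        self.hooks_removed = False

    def _remove_module_hooks(self):
        self.hooks_removed = True


class FakeStage3Optimizer:
    """Plays the role of DeepSpeedZeroOptimizer_Stage3: wraps an offload object."""

    def __init__(self):
        self.parameter_offload = FakeZeroOffload()


def remove_zero3_hooks(optimizer):
    """Remove module hooks whether or not a real optimizer was configured.

    With an optimizer, the hooks live on `optimizer.parameter_offload`.
    Without one (assumed here), the object passed in is itself the offload
    object, so fall back to calling the method on it directly.
    """
    offload = getattr(optimizer, "parameter_offload", optimizer)
    offload._remove_module_hooks()
    return offload


# Training-style setup: the optimizer wraps the offload object.
with_optimizer = FakeStage3Optimizer()
remove_zero3_hooks(with_optimizer)

# Inference-style setup: no optimizer, only the offload object.
without_optimizer = FakeZeroOffload()
remove_zero3_hooks(without_optimizer)
```

The `getattr` fallback keeps a single call site for both cases. An explicit `isinstance` check against the offload class would work equally well and may read more clearly in the real code base.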
@HollowMan6 Great catch, we appreciate your contribution!
Signed-off-by: Logan Adams <[email protected]>
Thanks @HollowMan6 - we will prioritize merging this so we can push out a patch release for better DeepCompile support.
Thanks for the quick review! Once this PR is merged together with #7226 and #7224, it should build (compile) fine on AMD machines as well. But I encountered 2 separate issues when I used DeepCompile together with OpenRLHF.
Similar to deepspeedai#7211

When the optimizer is not specified, the optimizer will be type `DeepSpeedZeRoOffload` instead of `DeepSpeedZeroOptimizer_Stage3` (e.g. for ZeRO-3 pure inference), while `DeepSpeedZeRoOffload` doesn't have `parameter_offload`.

https://github.com/deepspeedai/DeepSpeed/blob/56005d2b256eb81a88cba0a1984375f9663a3110/deepspeed/runtime/engine.py#L1684-L1707

```log
  File "deepspeed/runtime/engine.py", line 3919, in compile
    backend = init_z3(self, backend, compile_config, compile_kwargs, schedule)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "deepspeed/compile/init_z3.py", line 36, in init_z3
    optimizer.parameter_offload._remove_module_hooks()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'parameter_offload'
```

---------

Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: yisheng <[email protected]>