Expose ShardedDDP reduce_buffer_size setting to config (#177)

prigoyal · facebook-github-bot · commit 94598ba2388b · 2021-02-09T12:38:41.000-08:00
Summary:
Pull Request resolved: #177

As the title says. In VISSL, we need to set `reduce_buffer_size=0` because some parameters are never actually used, and ShardedDDP does not handle `find_unused_parameters`. Setting the buffer size to 0 all-reduces each gradient immediately instead of bucketing them.

Reviewed By: min-xu-ai

Differential Revision: D26276800

fbshipit-source-id: 4bbe5a6e3a2b36b8a55abb6e120368025356db17
diff --git a/vissl/config/defaults.yaml b/vissl/config/defaults.yaml
@@ -258,6 +258,9 @@ config:
       # how many times the model should be checkpointed. User should tune this parameter
       # and find the number that offers best memory saving and compute tradeoff.
       NUM_ACTIVATION_CHECKPOINTING_SPLITS: 2
+    # setup for Fairscale sharded DDP
+    SHARDED_DDP_SETUP:
+      reduce_buffer_size: -1
     # ----------------------------------------------------------------------------------- #
     # Feature evaluation settings
     # ----------------------------------------------------------------------------------- #
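
A minimal sketch of how the new `SHARDED_DDP_SETUP.reduce_buffer_size` value might be passed on to fairscale. This is not code from this commit: the helper name `sharded_ddp_kwargs` is hypothetical, and the reading of a negative value (such as the `-1` default above) as "leave fairscale's own default in place" is an assumption. Only the `reduce_buffer_size` keyword of `ShardedDataParallel` is taken from fairscale itself.

```python
from typing import Any, Dict


def sharded_ddp_kwargs(reduce_buffer_size: int) -> Dict[str, Any]:
    """Hypothetical helper: turn the config value into ShardedDataParallel kwargs.

    Assumption: a negative value means "do not override fairscale's default".
    A value of 0 disables bucketing, so each gradient is reduced immediately.
    """
    if reduce_buffer_size < 0:
        return {}
    return {"reduce_buffer_size": reduce_buffer_size}


# Usage sketch. It needs fairscale and an initialized process group,
# so it is left commented out here:
# from fairscale.nn.data_parallel import ShardedDataParallel
# model = ShardedDataParallel(module, sharded_optimizer, **sharded_ddp_kwargs(0))
```

Returning an empty dict for negative values keeps the call site free of a hard-coded default, so whichever default the installed fairscale version uses still applies.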