When using tools like Weights & Biases or other loggers, it is often very useful to track the parallelism sizes (TP, DP, PP) used in distributed pre-training.
Currently, users must manually write something like:
```python
wandb.config.update({
    "tp": accelerator.parallelism_config.tp_size,
    # ... one entry per configured parallelism dimension (dp, pp, ...)
})
```
This can become repetitive and error-prone, especially as more parallelism modes are added or used in combination.
I’d like to propose a `parallelism_info` method, which returns a dictionary containing all configured parallelism sizes in a unified way. If this method is accepted in Accelerate, I’d be happy to open a follow-up PR to the Transformers library to enable automatic logging of parallelism information when Accelerate and W&B are used together.
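A minimal sketch of what the method could look like (only `tp_size` appears in the snippet above; `dp_size` and `pp_size` are illustrative placeholder attribute names):

```python
def parallelism_info(self) -> dict:
    """Return all configured parallelism sizes as a flat dict, ready to pass to logger configs."""
    return {
        "tp_size": self.tp_size,
        "dp_size": self.dp_size,  # assumed attribute name, for illustration
        "pp_size": self.pp_size,  # assumed attribute name, for illustration
    }
```

With this, the manual block above would collapse to a single call:

```python
wandb.config.update(accelerator.parallelism_config.parallelism_info())
```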