Fix W&B callback for distributed training #5223
if self._watch_model:
    self.wandb.watch(self.trainer.model)  # type: ignore[union-attr]
    self.wandb.watch(self.trainer.model)  # type: ignore
Is this about `Item "None" of "Optional[Something]" has no attribute "watch"`? I have been fixing that with `assert self.wandb is not None`.
It's because MyPy sees it as undefined, not `Optional`.
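The two cases in this thread call for different fixes. As a minimal sketch, assuming the attribute is declared as `Optional` in `__init__`, the `assert ... is not None` approach narrows the type for the checker. When the attribute is never declared at all, the assert is not expected to help, because the checker reports a missing attribute rather than an `Optional` one. `Logger`, `Callback`, and `attach` are hypothetical names used only for illustration.

```python
from typing import Any, Optional


class Logger:
    """Hypothetical stand-in for an optional logging backend."""

    def watch(self, model: Any) -> str:
        return f"watching {model}"


class Callback:
    def __init__(self) -> None:
        # Declared as Optional because the backend is only attached later.
        self.backend: Optional[Logger] = None

    def attach(self) -> None:
        self.backend = Logger()

    def watch(self, model: Any) -> str:
        # The assert narrows Optional[Logger] to Logger for the type checker
        # and fails loudly at runtime if attach() was never called.
        assert self.backend is not None
        return self.backend.watch(model)


callback = Callback()
callback.attach()
result = callback.watch("model")
```

Declaring the attribute up front is what makes the assert-based narrowing possible; a bare `# type: ignore` silences the checker without that guarantee.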
import wandb

self.wandb = wandb
There is no `wandb` object? `wandb` is always global? What if two systems want to use it at the same time?
Importing `wandb` may have side effects since at some point it spawns its own background worker(s).
That is some unfortunate API design.
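To illustrate the point about the module being global, here is a rough sketch of a deferred import. It uses the standard-library `json` module as a stand-in so that it runs without `wandb` installed, and it says nothing about what `wandb` does internally. `LazyBackend` and `start` are hypothetical names.

```python
import importlib
import sys


class LazyBackend:
    """Hypothetical holder that defers importing a module until start()."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self.module = None  # nothing imported by this holder yet

    def start(self) -> None:
        # Any import-time side effects of the module would happen here,
        # not when the holder is constructed.
        self.module = importlib.import_module(self._module_name)


first = LazyBackend("json")
second = LazyBackend("json")
first.start()
second.start()

# Both holders point at the same module object cached in sys.modules,
# so any state kept at module level is shared between them.
same_object = first.module is second.module is sys.modules["json"]
```

Storing the module on `self` delays the import, but it does not give each holder its own copy; two users in one process still share whatever state the module keeps.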
self.wandb = wandb
self.wandb.init(
self._wandb_kwargs: Dict[str, Any] = dict(
Nothing wrong with this, just `dict(...)` is kind of an unusual way of writing `{...}`.
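For reference, a small sketch of the equivalence being discussed. The keys and values are hypothetical and are not the arguments used in this PR.

```python
from typing import Any, Dict

# Hypothetical keyword arguments; the names are illustrative only.
via_call: Dict[str, Any] = dict(project="demo", tags=["a", "b"])
via_literal: Dict[str, Any] = {"project": "demo", "tags": ["a", "b"]}

# Both spellings build equal dictionaries.
equivalent = via_call == via_literal

# The literal form also accepts keys that are not valid identifiers,
# which the keyword form of dict(...) cannot express.
with_odd_key = {"learning-rate": 0.1}
```

The keyword form can be convenient when converting an existing call such as `init(...)` into stored arguments, since the argument list can be kept unchanged.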
* fix wandb callback for distributed training
* fix
* close out

Co-authored-by: Dirk Groeneveld <[email protected]>
This was totally broken for distributed training 🤦‍♂️
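The diff fragments above replace an immediate `self.wandb.init(` call with storing `self._wandb_kwargs`, which suggests that initialization is deferred. A minimal sketch of that general pattern follows, assuming the run should be started once from a primary worker. It simulates the workers in a single process, and `FakeBackend`, `DeferredInitCallback`, `on_start`, and `is_primary` are hypothetical names, not the code from this PR.

```python
from typing import Any, Dict, List


class FakeBackend:
    """Hypothetical stand-in for a logging library; records init calls."""

    def __init__(self) -> None:
        self.init_calls: List[Dict[str, Any]] = []

    def init(self, **kwargs: Any) -> None:
        self.init_calls.append(kwargs)


class DeferredInitCallback:
    def __init__(self, backend: FakeBackend, **init_kwargs: Any) -> None:
        self._backend = backend
        # Only remember the arguments; every worker constructs the callback.
        self._init_kwargs: Dict[str, Any] = dict(init_kwargs)

    def on_start(self, is_primary: bool) -> None:
        # Start the run once, from the primary worker only.
        if is_primary:
            self._backend.init(**self._init_kwargs)


# Simulate four workers that each build the callback and then start.
backend = FakeBackend()
workers = [DeferredInitCallback(backend, project="demo") for _ in range(4)]
for rank, callback in enumerate(workers):
    callback.on_start(is_primary=(rank == 0))
```

With this arrangement, constructing the callback has no side effects, so it is safe to build it in every worker; only the explicit start step touches the backend.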