This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Log batch metrics #5362

Merged · 9 commits · Aug 19, 2021
Changes from 5 commits
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -25,6 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
model's `__init__()` method by wrapping them with `self.ddp_accelerator.wrap_module()`. See the `allennlp.modules.transformer.t5`
for an example.
- Added Tango components, to be explored in detail in a later post.
- We now log batch metrics to tensorboard and wandb.

### Fixed

@@ -34,15 +35,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `TransformerTextField` can now take tensors of shape `(1, n)` like the tensors produced from a HuggingFace tokenizer.
- `tqdm` lock is now set inside `MultiProcessDataLoading` when new workers are spawned to avoid contention when writing output.
- `ConfigurationError` is now pickleable.
- Multitask models now support `TextFieldTensor` in heads, not just in the backbone
- Multitask models now support `TextFieldTensor` in heads, not just in the backbone.

### Changed

- The type of the `grad_norm` parameter of `GradientDescentTrainer` is now `Union[float, bool]`,
with a default value of `False`. `False` means gradients are not rescaled and the gradient
norm is never even calculated. `True` means the gradients are still not rescaled but the gradient
norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled.
- `TensorCache` now supports more concurrent readers and writers.
- `TensorCache` now supports more concurrent readers and writers.
- We no longer log parameter statistics to tensorboard or wandb by default.


## [v2.6.0](https://github.com/allenai/allennlp/releases/tag/v2.6.0) - 2021-07-19
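The `grad_norm` change described in the Changed section above is easiest to see as code. Below is a minimal sketch of the three cases, assuming a hypothetical `rescale_gradients` helper; it is not the actual `GradientDescentTrainer` implementation.

```python
from typing import Iterable, Optional, Union
import torch

def rescale_gradients(
    parameters: Iterable[torch.nn.Parameter], grad_norm: Union[float, bool] = False
) -> Optional[torch.Tensor]:
    # Sketch of the behavior described in the changelog entry (not AllenNLP's code).
    parameters = [p for p in parameters if p.grad is not None]
    if grad_norm is False or not parameters:
        # Default: gradients are not rescaled and no norm is computed at all.
        return None
    if grad_norm is True:
        # Compute the total gradient norm for callbacks, but do not rescale.
        return torch.norm(torch.stack([torch.norm(p.grad.detach()) for p in parameters]))
    # A float: rescale (clip) gradients so their total norm is at most grad_norm.
    return torch.nn.utils.clip_grad_norm_(parameters, grad_norm)
```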
11 changes: 9 additions & 2 deletions allennlp/training/callbacks/log_writer.py
@@ -227,7 +227,7 @@ def log_batch(

# Now collect per-batch metrics to log.
metrics_to_log: Dict[str, float] = {}
for key in ("batch_loss", "batch_reg_loss"):
for key in ["batch_loss", "batch_reg_loss"]:
if key not in metrics:
continue
value = metrics[key]
@@ -241,6 +241,13 @@ def log_batch(
self._batch_loss_moving_items[key]
)

for key, value in metrics.items():
if key.startswith("batch_"):
continue
Member:

Does this mean we don't log any metrics in `metrics` that start with `batch_`? Is that really what we want?

Member Author:

I wanted to avoid creating "batch_batch_loss" and "batch_batch_reg_loss", and I generalized a little. It doesn't make a ton of sense to define a metric called batch_*, because all metrics that come from the model are also calculated per epoch.

Member:

batch_size is one exception I can think of.

Member:

Since we already handle batch_loss and batch_reg_loss separately, how about we just explicitly ignore those here?

Member Author:

Yeah, I'll do that as soon as I can get GitHub to talk to me again :-/

key = "batch_" + key
if key not in metrics_to_log:
metrics_to_log[key] = value

self.log_scalars(
metrics_to_log,
log_prefix="train",
@@ -253,7 +260,7 @@

if self._batch_size_interval:
# We're assuming here that `log_batch` will get called every batch, and only every
# batch. This is true with our current usage of this code (version 1.0); if that
# batch. This is true with our current usage of this code (version 1.0); if that
# assumption becomes wrong, this code will break.
batch_group_size = sum(get_batch_size(batch) for batch in batch_group) # type: ignore
self._cumulative_batch_group_size += batch_group_size
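To make the review discussion above concrete, here is a small, self-contained sketch of the prefixing logic from the diff: per-batch copies of model metrics get a `batch_` prefix, while keys that already start with `batch_` (such as `batch_loss` and `batch_reg_loss`, which are handled separately) are skipped so that names like `batch_batch_loss` are never produced. The helper name and the simplified handling of the two loss keys are assumptions for illustration; the real `log_batch` also maintains moving averages.

```python
from typing import Dict

def collect_batch_metrics(metrics: Dict[str, float]) -> Dict[str, float]:
    # Hypothetical helper mirroring the diff above (simplified: no moving averages).
    metrics_to_log: Dict[str, float] = {}
    for key in ["batch_loss", "batch_reg_loss"]:
        if key in metrics:
            metrics_to_log[key] = metrics[key]
    for key, value in metrics.items():
        if key.startswith("batch_"):
            # Skip keys that are already per-batch, e.g. "batch_loss" or a
            # user-defined "batch_size", so we never emit "batch_batch_loss".
            continue
        key = "batch_" + key
        if key not in metrics_to_log:
            metrics_to_log[key] = value
    return metrics_to_log

# Example: {"accuracy": 0.9, "batch_loss": 1.2}
#       -> {"batch_loss": 1.2, "batch_accuracy": 0.9}
```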
2 changes: 1 addition & 1 deletion allennlp/training/callbacks/tensorboard.py
@@ -21,7 +21,7 @@ def __init__(
summary_interval: int = 100,
distribution_interval: Optional[int] = None,
batch_size_interval: Optional[int] = None,
should_log_parameter_statistics: bool = True,
should_log_parameter_statistics: bool = False,
should_log_learning_rate: bool = False,
) -> None:
super().__init__(
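Since parameter statistics are no longer logged by default (see the changelog entry and the changed default above), users who want the old behavior have to opt back in explicitly. A minimal sketch of what that might look like, assuming the class in this file is AllenNLP's `TensorBoardCallback` and that it accepts a `serialization_dir` argument like other trainer callbacks; only the keyword arguments shown in the diff are confirmed by this page.

```python
from allennlp.training.callbacks import TensorBoardCallback

# Hypothetical usage: re-enable parameter statistics (and learning-rate) logging
# now that should_log_parameter_statistics defaults to False.
callback = TensorBoardCallback(
    serialization_dir="/tmp/my_run",        # assumed argument; adjust to your setup
    should_log_parameter_statistics=True,
    should_log_learning_rate=True,
)
```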