Should I reset train state metrics after each epoch of training? #3284
-
Hi, the MNIST example at https://flax.readthedocs.io/en/latest/getting_started.html resets the state metrics to empty after each training epoch:

```python
for step, batch in enumerate(train_ds.as_numpy_iterator()):

  # Run optimization steps over training batches and compute batch metrics
  state = train_step(state, batch)  # get updated train state (which contains the updated parameters)
  state = compute_metrics(state=state, batch=batch)  # aggregate batch metrics

  if (step + 1) % num_steps_per_epoch == 0:  # one training epoch has passed
    for metric, value in state.metrics.compute().items():  # compute metrics
      metrics_history[f'train_{metric}'].append(value)  # record metrics
    state = state.replace(metrics=state.metrics.empty())  # reset train_metrics for next training epoch
    ...
```

But another MNIST example at https://github.com/google/flax/blob/main/examples/mnist/train.py#L118 does not use this technique, and most examples I've found online don't either. What is the benefit of resetting the metrics, and when should I use it?
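For context, that second example doesn't keep any metric state on the train state at all. Roughly (this is a paraphrase, not the exact code from train.py; `make_batches`, `apply_model`, and `update_model` are placeholder names), it just collects per-batch values in plain Python lists and averages them at the end of each epoch, so there is nothing to reset:

```python
import numpy as np

def train_epoch(state, train_ds, batch_size, rng):
  """One epoch of training; returns the epoch-mean loss and accuracy."""
  epoch_loss, epoch_accuracy = [], []
  for batch in make_batches(train_ds, batch_size, rng):  # hypothetical batching helper
    grads, loss, accuracy = apply_model(state, batch['image'], batch['label'])
    state = update_model(state, grads)
    epoch_loss.append(loss)          # fresh lists every epoch,
    epoch_accuracy.append(accuracy)  # so no explicit reset is needed
  return state, np.mean(epoch_loss), np.mean(epoch_accuracy)
```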
-
Here's a comparison of training with and without resetting the metrics.
Apparently, resetting the metrics causes some statistical differences in the reported values. So, should we reset the metrics after each training epoch?
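If it helps, here's how I think the difference arises: without the `.empty()` reset, `compute()` at the end of epoch N averages over every batch seen since step 0, not just over epoch N. A tiny numerical illustration with made-up per-batch losses:

```python
# Made-up per-batch losses for two epochs.
epoch1 = [1.0, 0.8, 0.6]
epoch2 = [0.4, 0.3, 0.2]

# With a reset after each epoch, each reported value covers one epoch only.
with_reset = [sum(epoch1) / len(epoch1),   # 0.8
              sum(epoch2) / len(epoch2)]   # 0.3

# Without a reset, the metric keeps accumulating, so the second reported
# value is the running average over all batches seen so far.
without_reset = [sum(epoch1) / len(epoch1),                    # 0.8
                 sum(epoch1 + epoch2) / len(epoch1 + epoch2)]  # 0.55

print(with_reset, without_reset)
```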
In practice you probably just want to use per-step metrics during training; that is, don't use cumulative metrics for training. Using cumulative metrics during training only really makes sense to get a smooth plot in a notebook environment. Tools such as TensorBoard or wandb let you smooth the metrics afterwards, which is much more convenient because you can still see the raw data and won't miss loss spikes.
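A minimal sketch of that per-step approach, assuming a `train_step` modified to also return the batch loss (which differs from the tutorial's signature) and Flax's TensorBoard writer; the log directory and tag name are just examples, and `wandb.log` works the same way:

```python
from flax.metrics import tensorboard

summary_writer = tensorboard.SummaryWriter('/tmp/mnist_logs')  # example log dir

for step, batch in enumerate(train_ds.as_numpy_iterator()):
  # Assumes train_step returns the updated state plus the per-batch loss,
  # instead of folding it into a cumulative metric on the train state.
  state, loss = train_step(state, batch)
  # Log the raw per-step value; smoothing can be done later in the UI.
  summary_writer.scalar('train_loss', float(loss), step)
```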