
set transformer to evaluation mode #5073

Merged

Conversation

ArjunSubramonian (Contributor)

Fixes #4895.

Changes proposed in this pull request:

  • If train_parameters in PretrainedTransformerEmbedder is False, the transformer's dropout and batch normalization layers are now set to evaluation mode.
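
For reference, a simplified sketch of the idea (not the exact PR code; it assumes the underlying Hugging Face model is stored on self.transformer_model, as PretrainedTransformerEmbedder does):

# Inside PretrainedTransformerEmbedder.__init__, simplified sketch:
if not train_parameters:
    # Freeze the weights so the optimizer never updates them.
    for param in self.transformer_model.parameters():
        param.requires_grad = False
    # Put dropout and batch-norm layers into inference behavior.
    self.transformer_model.eval()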

@epwalsh (Member) left a comment

Nice job! I just have a couple comments.

CHANGELOG.md Outdated
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Changed
- If `train_parameters` in PretrainedTransformerEmbedder is `False`, the transformer's dropout and batch normalization layers are now set to evaluation mode.
Member

We already have a ### Changed section on line 13 below, you can just put this bullet point there.

Contributor Author

Done!

Comment on lines 128 to 129
for param in self.transformer_model.parameters():
    param.requires_grad = False
Member

This might not be necessary anymore

Contributor Author

Got it. This Stack Overflow post says that it's still best practice to turn off gradient computation. Thoughts?

Member

Reading that, sounds like we may want to use with torch.no_grad() in the forward pass?

I'm also curious what happens after this class is initialized as part of a submodule for a Model, when Model.train() is called. Will that revert the .eval() call within this submodule?

Member

Might be worth having a test for that

Contributor Author

Nice catch! I fixed this issue, albeit in a hacky way (https://stackoverflow.com/questions/61980943/how-can-i-keep-a-pytorch-submodule-in-eval-mode, https://stackoverflow.com/questions/394770/override-a-method-at-instance-level). Let me know what you think. I also added a test case to ensure that PretrainedTransformerEmbedder's transformer model remains in eval mode even when the module that instantiates it is switched to training mode.
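
A hypothetical sketch of what such a test could look like (the actual test added in the PR may differ; the model name is just an example):

import torch
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

def test_transformer_stays_in_eval_mode_when_frozen():
    embedder = PretrainedTransformerEmbedder("bert-base-uncased", train_parameters=False)
    # Stand-in for a Model that owns the embedder as a submodule.
    wrapper = torch.nn.Sequential(embedder)
    wrapper.train()  # switches submodules to training mode...
    # ...but the frozen transformer should stay in eval mode.
    assert not embedder.transformer_model.training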

Contributor Author

Using with torch.no_grad() temporarily disables gradient computation. Using it conditionally is pretty cumbersome because it would require setting up a ContextManager wrapper (https://stackoverflow.com/questions/22226708/can-a-with-statement-be-used-conditionally). I think this is why the original code explicitly sets requires_grad=False for all parameters. I also don't see any other conditional uses of with torch.no_grad(). That's why I'm against it. @AkshitaB, do you have an opinion on this?
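
(For context, the kind of conditional wrapper being discussed could look roughly like this inside the embedder's forward method, using contextlib.nullcontext from Python 3.7+; this is just a sketch and the variable names are illustrative:)

import contextlib
import torch

# Disable gradient tracking only when the transformer is frozen;
# otherwise use a no-op context so gradients flow normally.
grad_context = torch.no_grad() if not self.train_parameters else contextlib.nullcontext()
with grad_context:
    output = self.transformer_model(input_ids=token_ids, attention_mask=mask)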

Member

One thing you could do is just rename the existing .forward() method to ._forward(), and then implement a new .forward() like:

def forward(self, ...):
    if self.train_parameters:
        return self._forward(...)
    else:
        with torch.no_grad():
            return self._forward(...)

Member

... that said, I just realized there are other places in our code-base that would expect requires_grad to be False when this module is initialized, if we're not training it. So yea, I think we do need to keep these lines.

AkshitaB and others added 4 commits March 30, 2021 16:51
…former model remains in eval mode even when module that instantiates it is switched to training mode
Comment on lines 132 to 138
# Override train in transformer_model to prevent it from changing modes
def _train(self, mode):
    return self

setattr(
    self.transformer_model, "train", types.MethodType(_train, self.transformer_model)
)
Member

This makes me a bit nervous. Another option would be to add something like this to the .forward() method below (after line 182):

if not self.train_parameters and self.training and self.transformer_model.training:
    self.transformer_model.eval()

Contributor Author

I like this approach much better, thanks for the suggestion :)

Contributor Author

One issue with this, though, is that inspecting .training will not always produce the expected result.

Comment on lines 128 to 129
# Calling transformer_model.eval() won't change anything now,
# so we have to explicitly set training = False
Member

Hmm why doesn't .eval() work here?

Contributor Author

Because .eval() just calls .train(False).

Member

Calling .eval() or .train(False) will recursively call the same on all submodules, and also set training = False for all submodules (https://github.com/pytorch/pytorch/blob/d490e0120f32dcbb8b23e11eebd638b96b4b0898/torch/nn/modules/module.py#L1594). Isn't that what we want?
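
(A quick illustration of that recursive behavior:)

import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(p=0.5))
model.eval()  # equivalent to model.train(False)
assert model.training is False
assert all(m.training is False for m in model.modules())  # submodules are flipped too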

Contributor Author

Yes, you're right, that is what we want. I just updated the code to be more robust.

Comment on lines 129 to 137
def _train(self, mode):
    self.training = False
    for module in self.children():
        module.train(False)
    return self

setattr(
    self.transformer_model, "train", types.MethodType(_train, self.transformer_model)
)
Member

Instead of patching the .train() method on self.transformer_model, why not just override the .train() method of PretrainedTransformerEmbedder?

We could implement it like:

@overrides
def train(self, mode: bool = True):
    self.training = mode
    for name, module in self.named_children():
        if not self.train_parameters and name == "transformer_model":
            module.eval()
        else:
            module.train(mode)
    return self

Contributor Author

Ooh, okay, I agree that this is indeed a great solution. Thanks! Just incorporated it.

Arjun Subramonian added 2 commits March 31, 2021 12:17
@epwalsh (Member) left a comment

LGTM! Nice job!

@ArjunSubramonian ArjunSubramonian merged commit 63a3b48 into main Mar 31, 2021
@ArjunSubramonian ArjunSubramonian deleted the arjuns/pretrained_transformer_embedder_eval_mode branch March 31, 2021 20:33
nelson-liu (Contributor) commented Apr 8, 2021

I realize I should have spoken up sooner about this, but would it be possible to change this to add another parameter that controls eval vs. non-eval mode, rather than overriding the default behavior of train_parameters? (Asking before the next release goes out, after which this would become a breaking change.) In particular, this behavior is pretty different from the setting where you use Elmo with requires_grad=False. In that setting, the Elmo weights actually aren't in eval mode.

There are also settings where you want the default behavior (i.e., non-eval mode, but frozen parameters). In particular, there's been recent work on trying to do parameter-efficient fine-tuning by, for instance, only fine-tuning the bias terms of the transformer (https://nlp.biu.ac.il/~yogo/bitfit.pdf). In this case, it's much more ergonomic to set train_parameters to False and then have a regex for [["^_text_field_embedder.token_embedder_tokens.transformer_model.*bias"], {"requires_grad": true}], versus having to write a regex like [["^_text_field_embedder.token_embedder_tokens.transformer_model.*(?<!bias)$"], {"requires_grad": false}].

Lastly, I think that semantically train_parameters doesn't necessarily imply eval mode---they seem like distinct ways of modifying the model (e.g., how the Elmo embedder works right now).
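
(For context, a plain-PyTorch sketch of the bias-only, BitFit-style setup described above; transformer_model here stands for the underlying Hugging Face model, and in AllenNLP this would normally be expressed through parameter-group regexes in the config rather than code:)

# Freeze everything except bias terms (BitFit-style parameter-efficient fine-tuning).
for name, param in transformer_model.named_parameters():
    param.requires_grad = name.endswith("bias")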


Successfully merging this pull request may close these issues.

Option to run PretrainedTransformer in eval mode