This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Add support for selective finetune (freeze parameters by regex from config file) #1427

Merged: 9 commits merged into allenai:master from HarshTrivedi:selective-finetune on Jun 28, 2018

Conversation

@HarshTrivedi (Contributor) commented Jun 26, 2018

  • Add support in fine_tune to selectively tune
  • Add tests for selective fine tuning.
  • Allow for turning off gradients in train command.

This PR adds support to the fine-tune command for freezing parameters/layers by passing regexes in the config file. In the config file passed to the fine-tune command:

"trainer": {
    ...
    "no_grad": ["*conv*", ".*text_embedder*"]
}

The above will freeze the conv layers and text_embedders but not any other parameters.
I believe the best place to expose the no_grad key is within the trainer key, and since trainer is also used in the train command, I allowed turning off gradients via no_grad in the train command as well.

Although I primarily intended this selective turning off of gradients for the fine-tune command, it can be useful in the train command as well: e.g. if a module's parameters are loaded / transferred from some other pretrained model and one wants to freeze them there.

More context: Issue #1298
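For reference, here is a minimal sketch of how the no_grad regexes from the config could be applied; the function name freeze_by_regex is illustrative only (not a helper added in this PR), and the loop mirrors the snippets discussed in the review below:

import re

import torch


def freeze_by_regex(model: torch.nn.Module, no_grad_regexes) -> None:
    # Freeze every parameter whose name matches any of the regexes;
    # all other parameters stay trainable.
    for name, parameter in model.named_parameters():
        if any(re.search(regex, name) for regex in no_grad_regexes):
            parameter.requires_grad_(False)

# For example, with the config above:
# freeze_by_regex(model, [".*conv.*", ".*text_embedder.*"])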

@joelgrus (Contributor) left a comment

looks good to me, modulo the small changes I requested, I'll let @DeNeutoy weigh in too

@@ -166,6 +166,12 @@ def fine_tune_model(model: Model,
    test_data = all_datasets.get('test')

    trainer_params = params.pop("trainer")
    nograd_regex_list = trainer_params.pop("no_grad", ())
    if nograd_regex_list:
        nograd_regex = "(" + ")|(".join(nograd_regex_list) + ")"

Contributor

this feels error-prone to me, I'd rather just

no_grad_regexes = trainer_params.pop("no_grad", ())
for name, parameter in model.named_parameters():
    if any(re.search(regex, name) for regex in no_grad_regexes):
        parameter.requires_grad_(False)

it's cleaner and I can't imagine the time difference being noticeable

Contributor Author

Sure, will change that.

nograd_regex = "(" + ")|(".join(nograd_regex_list) + ")"
for name, parameter in model.named_parameters():
    if re.search(nograd_regex, name):
        parameter.requires_grad_(False)

Contributor

same comment here

Contributor Author

sure.

@@ -10,7 +10,7 @@
import logging
import os
from copy import deepcopy

import re

Contributor

nit: I like to leave blank lines between the grouped imports: (standard library) -> (external libraries) -> (allennlp modules)

Contributor Author

Ohh!! I didn't realise the purpose of blanks...

@@ -37,7 +37,7 @@
import logging
import os
from copy import deepcopy

import re

Contributor

same nit

Contributor Author

sure.


import re
import shutil
import pytest

Contributor

same nit: blank line before and after pytest

Contributor Author

ok

@@ -2,7 +2,8 @@
import argparse
from typing import Iterable
import os

import shutil
import re

Contributor

same nit: blank line before import pytest

Contributor Author

ok

# If regex is matched, parameter name should have same requires_grad
# as the originally loaded model
if regex_list:
    nograd_regex = "(" + ")|(".join(regex_list) + ")"

Contributor

I'm more ok with this being in the test, but I'd still prefer the other way

Contributor Author

No problem, I can change that.

# If regex is matched, parameter name should have requires_grad False
# Or else True
if regex_list:
    nograd_regex = "(" + ")|(".join(regex_list) + ")"

Contributor

same

Contributor Author

Btw, I had done something similar elsewhere in a previous commit (here) and used it here.
Should I change it there as well?
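For context, a hedged sketch of the kind of assertion these test snippets are making, written with the any(re.search(...)) form the reviewer prefers (check_requires_grad is an illustrative name, not the actual test helper in the PR):

import re

import torch


def check_requires_grad(model: torch.nn.Module, no_grad_regexes) -> None:
    # Parameters whose names match a no_grad regex should be frozen;
    # everything else should still require gradients.
    for name, parameter in model.named_parameters():
        if any(re.search(regex, name) for regex in no_grad_regexes):
            assert not parameter.requires_grad, name
        else:
            assert parameter.requires_grad, name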

@joelgrus joelgrus requested a review from DeNeutoy June 26, 2018 20:16
@DeNeutoy (Contributor) left a comment

Looks good, one broad point - could you make sure to add logging which prints out which exact parameters have been frozen and which have not been? This is the sort of thing which you could easily mis-specify, such that weird stuff happens, like accidentally training the biases of the frozen model or something.

@HarshTrivedi (Contributor Author)

True! Sure, I can add that in logs. thanks

@DeNeutoy (Contributor)

Also FYI - thanks for your contributions to allennlp - your PRs and conduct are exemplary and I'm glad that you find the library useful.

@HarshTrivedi (Contributor Author)

@DeNeutoy Thank you! allennlp is being very helpful to me in what I am currently doing and am sure others who try will find the same. I am very glad to contribute in allennlp as well : ) 👍

@HarshTrivedi (Contributor Author) commented Jun 26, 2018

Btw, do we want to log both frozen and non-frozen parameters? Most of the time, most of the parameters will be non-frozen, and with all of them logged the terminal output looks quite messy. How about logging only the frozen ones?

This is from one test: link.

18:17:25 - INFO - allennlp.commands.train - Parameters without gradient (Frozen) : ['text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._char_embedding_weights', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_0.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_0.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_1.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_1.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_2.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_2.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_3.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_3.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_4.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder.char_conv_4.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._highways._layers.0.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._highways._layers.0.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._highways._layers.1.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._highways._layers.1.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._projection.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._token_embedder._projection.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias', 'text_field_embedder.token_embedder_elmo._elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight', 'text_field_embedder.token_embedder_elmo._elmo.scalar_mix_0.gamma', 
'text_field_embedder.token_embedder_elmo._elmo.scalar_mix_0.scalar_parameters.0', 'text_field_embedder.token_embedder_elmo._elmo.scalar_mix_0.scalar_parameters.1', 'text_field_embedder.token_embedder_elmo._elmo.scalar_mix_0.scalar_parameters.2', 'text_field_embedder.token_embedder_tokens.weight', 'encoder._module.weight_ih_l0', 'encoder._module.weight_hh_l0', 'encoder._module.bias_ih_l0', 'encoder._module.bias_hh_l0', 'encoder._module.weight_ih_l0_reverse', 'encoder._module.weight_hh_l0_reverse', 'encoder._module.bias_ih_l0_reverse', 'encoder._module.bias_hh_l0_reverse', 'encoder._module.weight_ih_l1', 'encoder._module.weight_hh_l1', 'encoder._module.bias_ih_l1', 'encoder._module.bias_hh_l1', 'encoder._module.weight_ih_l1_reverse', 'encoder._module.weight_hh_l1_reverse', 'encoder._module.bias_ih_l1_reverse', 'encoder._module.bias_hh_l1_reverse', 'tag_projection_layer._module.weight', 'tag_projection_layer._module.bias', 'crf.transitions', 'crf._constraint_mask', 'crf.start_transitions', 'crf.end_transitions']
18:17:25 - INFO - allennlp.commands.train - Parameters with gradient    (Tunable): []

Edit again: sorry, the frozen and tunable labels are reversed in the log above, but the point stands.

@DeNeutoy (Contributor)

Put the individual parameters in separate logging statements in a for loop.
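A sketch of what per-parameter logging in a for loop might look like (the list names frozen_parameter_names and tunable_parameter_names are placeholders, not the exact variables used in the PR):

import logging

logger = logging.getLogger(__name__)

# Placeholder lists; in practice these would be filled while applying the
# no_grad regexes.
frozen_parameter_names: list = []
tunable_parameter_names: list = []

# One logging statement per parameter instead of one huge list.
for name in frozen_parameter_names:
    logger.info("Parameter is Frozen (no gradient): %s", name)
for name in tunable_parameter_names:
    logger.info("Parameter is Tunable (with gradient): %s", name)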

@HarshTrivedi (Contributor Author)

This should be good now.

@DeNeutoy DeNeutoy merged commit 7664b12 into allenai:master Jun 28, 2018
    parameter.requires_grad_(False)
    nograd_parameter_names.append(name)
else:
    grad_parameter_names.append(name)

@rulai-huajunzeng

There is only one problem here: if the parameter's requires_grad is already False, for example a non-trainable embedding, the log will show that it is tunable. Not a very big problem, but it looks a bit confusing sometimes.

Contributor Author

Thank you for catching that! I agree it would be confusing. Will fix that.

@HarshTrivedi HarshTrivedi deleted the selective-finetune branch June 30, 2018 14:17
DeNeutoy pushed a commit that referenced this pull request Jul 6, 2018
Fix issue in no-grad parameters logging as mentioned by @rulai-huajunzeng in (#1427).

If parameters were already set `requires_grad=False` not through the nograd regex but by other means, they were logged as Tunable instead of Frozen. This PR fixes that.

I have made the headings capitalized. They are more distinguishable this way amidst a long list of parameters.
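An illustrative sketch of the fix described above, assuming the classification is done by the parameter's actual requires_grad value after the regexes are applied (the function and variable names are placeholders, not the merged code):

import re

import torch


def classify_parameters(model: torch.nn.Module, no_grad_regexes):
    # Freeze regex-matched parameters, then report every parameter by its
    # resulting requires_grad value, so parameters frozen by other means
    # (e.g. a non-trainable embedding) are also listed as frozen.
    frozen, tunable = [], []
    for name, parameter in model.named_parameters():
        if any(re.search(regex, name) for regex in no_grad_regexes):
            parameter.requires_grad_(False)
        if parameter.requires_grad:
            tunable.append(name)
        else:
            frozen.append(name)
    return frozen, tunable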
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
Add support for selective finetune (freeze parameters by regex from config file) (allenai#1427)

* Add support in fine_tune to selectively tune (freeze some parameters set through config file)

* Add tests for selective fine tuning.

* Allow for turning off gradients in train command (since in fine-tune as well this is happening with "trainer" configs).

* Add missing imports in fine_tune_test.py

* add tests for using 'no_grad' config with train command

* Code cleanup: 1. for regex matches 2. follow import convention

* Add logging statements for knowing tunable and frozen parameters.
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
Fix issue in no-grad parameters logging as mentioned by @rulai-huajunzeng in (allenai#1427).

If parameters were already set `requires_grad=False` not through the nograd regex but by other means, they were logged as Tunable instead of Frozen. This PR fixes that.

I have made the headings capitalized. They are more distinguishable this way amidst a long list of parameters.