Transfer learning broken for models trained before #182 #249
Comments
Hi @icedoom888, thanks for opening the issue.

Hey @mchantry, yes.
I had some similar problems with a multi-domain setup when layer_kernels were introduced. Layers such as layer_norm1 were renamed to layer_norm_attention_src, so with the original transfer_learning_loading function these layers were not detected and ended up with random weights. I added a patch to fix this by introducing mapping_weights. In the config you specify:

```yaml
mapping_weights:
  new_name: old_name
```

In my case I had to map this:

```yaml
mapping_weights:
  layer_norm_attention_src: layer_norm1
  layer_norm_attention_dest: layer_norm2
  layer_norm_attention: layer_norm1
  layer_norm_mlp: layer_norm2
```

This works fine, but it is not an ideal solution if there are a lot of missing layers or mismatches in names. Would this patch be of interest? I would also love to see if it can be generalized in some way; happy to help if needed. A sketch of the idea follows below.
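For illustration, here is a minimal sketch of how such a key remapping could be applied to a checkpoint before loading. The rename_state_dict helper, the checkpoint layout, and the first-match substring replacement are assumptions made for the sketch, not the actual transfer_learning_loading implementation:

```python
import torch


def rename_state_dict(state_dict, mapping_weights):
    """Rename checkpoint keys so old layer names line up with the new module names.

    mapping_weights maps new_name -> old_name, mirroring the config snippet above.
    Note: layer_norm1 appears twice as an old name; a real implementation would
    scope each rename to the right submodule prefix, whereas this sketch simply
    takes the first match.
    """
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key
        for new_name, old_name in mapping_weights.items():
            if old_name in key:
                new_key = key.replace(old_name, new_name)
                break
        renamed[new_key] = tensor
    return renamed


mapping_weights = {
    "layer_norm_attention_src": "layer_norm1",
    "layer_norm_attention_dest": "layer_norm2",
    "layer_norm_attention": "layer_norm1",
    "layer_norm_mlp": "layer_norm2",
}

# Usage sketch: remap the pretrained keys, then load non-strictly so layers that
# still cannot be matched keep their fresh initialisation instead of crashing.
# checkpoint = torch.load("pretrained.ckpt", map_location="cpu")
# model.load_state_dict(rename_state_dict(checkpoint["state_dict"], mapping_weights), strict=False)
```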
This is something to consider in #248. Possibly some renaming functionality, though that might be out of scope. I would love your input on the roadmap, though, as users of the transfer learning / model freezing capabilities.
Indeed, this is more of a quick fix, but I am quite keen on ideas to improve it. Looking at #248, I think that would be closer to an ideal solution, especially if someone wants to perform model distillation. So this is great 💃
What happened?
With the modification of the attention mechanism and layers introduced in #182, transfer learning from previous checkpoints is broken:
the node_dst_mlp weight and bias shapes are incompatible and will not be loaded from the pretrained checkpoints.
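A quick way to see exactly which layers are affected is to diff the checkpoint against the new model's state dict. A sketch of such a check, where the "state_dict" key is an assumption based on a typical PyTorch Lightning checkpoint layout:

```python
import torch
from torch import nn


def report_transfer_mismatches(model: nn.Module, ckpt_path: str) -> None:
    """Diff a pretrained checkpoint against the current model's state dict."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    ckpt_sd = checkpoint["state_dict"]  # assumed Lightning-style layout
    model_sd = model.state_dict()

    missing = sorted(set(model_sd) - set(ckpt_sd))      # e.g. layer_norm_attention_src.*
    unexpected = sorted(set(ckpt_sd) - set(model_sd))   # e.g. layer_norm1.*
    mismatched = [
        (k, tuple(ckpt_sd[k].shape), tuple(model_sd[k].shape))
        for k in sorted(set(ckpt_sd) & set(model_sd))
        if ckpt_sd[k].shape != model_sd[k].shape        # e.g. node_dst_mlp weights/biases
    ]

    print("missing from checkpoint:", missing)
    print("unexpected in checkpoint:", unexpected)
    print("shape mismatches:", mismatched)
```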
What are the steps to reproduce the bug?
Run transfer learning from any checkpoint trained on a version before the merge of #182.
Version
graphs: v0.5.0
Platform (OS and architecture)
Linux balfrin-ln002 5.14.21-150400.24.81_12.0.87-cray_shasta_c #1 SMP Sun Dec 17 12:59:08 UTC 2023 (e30c7c1) x86_64 x86_64 x86_64 GNU/Linux
Relevant log output
Accompanying data
No response
Organisation
No response