What happened?
While working on transfer learning, we tried to use a registered checkpoint in a Python 3.12 environment, and the checkpoint appeared to be corrupted.
However, the same checkpoint loads in a Python 3.11 environment, so the problem seems to come from changes between 3.11 and 3.12. Indeed, the tarfile package (on which PyTorch relies to load checkpoints) mentions changes in 3.12.
This should not be a problem as long as Anemoi relies on Python 3.11, but I thought it was worth mentioning for future updates.
What are the steps to reproduce the bug?
Download the checkpoint "proper-osprey" from the Anemoi catalog. It will be named 4b23cfdc-f24f-428a-98ce-1c800979e30a.ckpt
Execute the following in the Python 3.12 environment:
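The snippet from the original report is not preserved here. Judging from the traceback (a single interactive statement that ends inside torch.serialization.load), it was presumably a plain torch.load call. A minimal sketch under that assumption follows; the helper name load_checkpoint, the map_location value, and the weights_only setting are illustrative choices, not taken from the report.

```python
import sys

# File name as given in the report (checkpoint "proper-osprey").
CHECKPOINT = "4b23cfdc-f24f-428a-98ce-1c800979e30a.ckpt"


def load_checkpoint(path=CHECKPOINT):
    """Hypothetical helper: load the checkpoint with torch.load and report versions."""
    # Imported lazily so the interpreter version can be checked without torch installed.
    import torch

    print(f"Python {sys.version_info.major}.{sys.version_info.minor}, torch {torch.__version__}")
    # Assumption: the checkpoint holds pickled Python objects, so weights_only=False
    # is used. Only do this for files from a trusted source.
    return torch.load(path, map_location="cpu", weights_only=False)


# Usage in an interactive session:
# ckpt = load_checkpoint()
```

Running the same call in a 3.11 and a 3.12 environment with the same torch version would help separate the effect of the Python version from the effect of the PyTorch version.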
Platform (OS and architecture)
Linux laptop 6.11.0-21-generic ~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC
Relevant log output
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python3.12/site-packages/torch/serialization.py", line 1326, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../python3.12/site-packages/torch/serialization.py", line 671, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted
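The traceback fails in _open_zipfile_reader, which matches the fact that checkpoints written by recent PyTorch versions are zip archives. To rule out a truncated or altered download, the file can be inspected without importing torch and the result compared between the 3.11 and 3.12 environments. The sketch below uses only the standard library; inspect_checkpoint is a hypothetical helper, and Python's zipfile check is not the reader PyTorch uses, so it indicates whether the file looks like a zip archive rather than proving that torch.load will succeed.

```python
import hashlib
import zipfile
from pathlib import Path


def inspect_checkpoint(path):
    """Hypothetical helper: collect basic facts about a checkpoint file."""
    path = Path(path)
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        # A zip archive normally starts with the local file header magic b"PK\x03\x04".
        head = f.read(4)
        sha256.update(head)
        # Hash the rest of the file in 1 MiB chunks to keep memory use low.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    return {
        "size_bytes": path.stat().st_size,
        "starts_with_zip_magic": head == b"PK\x03\x04",
        "is_zipfile": zipfile.is_zipfile(path),
        "sha256": sha256.hexdigest(),
    }


# Usage:
# print(inspect_checkpoint("4b23cfdc-f24f-428a-98ce-1c800979e30a.ckpt"))
```

If the size and SHA-256 digest are identical in both environments, the file itself is the same and the difference lies in how it is read.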
Accompanying data
No response
Organisation
No response
Hi! It worked for me when I followed Gabriel's steps in Slack (which use Python 3.11). For training with the stretched-grid I also had to change the lines regarding the node_loss_weights in the training config to:
Hi mtgarciag, thanks for your reply! Yes, it works with Python 3.11 but not with Python 3.12. As long as we use 3.11 it's OK, but I thought it was useful to mention it in anticipation of a future switch to 3.12.
Version
python 3.12, torch 2.6, anemoi-training 0.3.2.post246