Multiple Datasets I - Base support in anemoi-core #230
Comments
I have done an initial implementation of the training loop and model, to the point where I can run training_step/validation_step (without rollout/diagnostics) with mocked input/output data in a dictionary. Biggest changes / points for discussion:
Training
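For illustration, a mocked dictionary batch along these lines might look as follows; the dataset key, shapes, and variable count are invented for the example, not taken from the actual implementation.

```python
import torch

# Hypothetical mocked batch: one entry per dataset, keyed by dataset name.
# The (batch, time, grid, variables) sizes below are made up for illustration.
mock_input = {"era": torch.randn(2, 2, 100, 10)}
mock_target = {"era": torch.randn(2, 1, 100, 10)}

# Dictionaries like these can stand in for real batches when exercising
# training_step / validation_step before the dataloader changes land.
```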
Nice work getting it to run @havardhhaugen! Re ModelIndex/data_indices, I think it would be good to still have this information as an attribute of AnemoiTrainer, but to collect it from the relevant config entries instead of taking it from the dataloader -- similar to what you did in the current version. (In view of the multiple inputs/outputs, we'll probably want it to be a dictionary that contains similar information to the previous data_indices, but for all input/output data sources.) Potential upsides of having it as an attribute of AnemoiTrainer:
I'll start moving it there. Do let me know if you have any feedback on the above!
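As a rough sketch of the per-dataset data_indices dictionary discussed above (not the actual implementation: the config layout and the helper below are invented placeholders), the information could be collected from the config like this:

```python
from types import SimpleNamespace

def make_index_collection(source_cfg: dict) -> SimpleNamespace:
    # Placeholder for the real per-dataset index object (something akin to the
    # existing data_indices); here it only records variable names and positions.
    variables = list(source_cfg["variables"])
    return SimpleNamespace(
        variables=variables,
        name_to_index={name: i for i, name in enumerate(variables)},
    )

def build_data_indices(config: dict) -> dict:
    """Collect index information for every configured data source.

    Returns a dict keyed by dataset name, mirroring the single-dataset
    data_indices but covering all input/output data sources.
    """
    return {
        name: make_index_collection(source_cfg)
        for name, source_cfg in config["data"]["sources"].items()  # hypothetical config layout
    }

# Usage with a made-up config:
config = {"data": {"sources": {"era": {"variables": ["2t", "10u", "10v"]}}}}
data_indices = build_data_indices(config)  # could be stored as an AnemoiTrainer attribute
```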
Update on the dataloading side
Please check conftest.py to see the structure of the new config file.
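The authoritative structure is the one in the PR's conftest.py; purely as an illustration of a per-dataset layout, a fixture could look roughly like the sketch below (all keys, paths, and variable names are invented).

```python
import pytest
from omegaconf import OmegaConf  # anemoi configs are Hydra/OmegaConf based

@pytest.fixture
def multi_dataset_config():
    """Invented example of a per-dataset config; see the PR's conftest.py for the real structure."""
    return OmegaConf.create(
        {
            "data": {
                "sources": {  # one entry per dataset, keyed by dataset name
                    "era": {
                        "dataset": "/path/to/era.zarr",  # placeholder path
                        "variables": ["2t", "10u", "10v"],
                    },
                },
            },
        }
    )
```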
Implement base support for handling multiple datasets in the anemoi-core data pipeline, specifically for the GraphForecaster model.
Goal
Train a forecasting model (era → era) relying on the DataHandler design proposed in #69 without rollout or diagnostics.
Scope
DataHandler class:
The goal of this task is that the PyTorch Lightning datamodule returns batches of type {"era": torch.Tensor} instead of a plain torch.Tensor.
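A minimal sketch of how dict-valued batches could come out of a PyTorch dataloader, assuming a thin wrapper around per-dataset sample iterables (class and key names are placeholders, not the proposed DataHandler API):

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class DictBatchDataset(IterableDataset):
    """Yields samples keyed by dataset name, so the default collate produces dict batches."""

    def __init__(self, datasets: dict):
        # datasets: {"era": <iterable of per-sample torch.Tensor>}
        self.datasets = datasets

    def __iter__(self):
        # Zip the per-dataset iterables and emit one dict per sample.
        for samples in zip(*self.datasets.values()):
            yield dict(zip(self.datasets.keys(), samples))

# Usage with mocked samples: every batch is {"era": Tensor} instead of a bare Tensor.
mock_samples = {"era": [torch.randn(100, 10) for _ in range(8)]}
loader = DataLoader(DictBatchDataset(mock_samples), batch_size=2)
for batch in loader:
    assert isinstance(batch, dict) and batch["era"].shape == (2, 100, 10)
```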
Model:
Refactor the model interface to accept and return dictionaries in the same format. For this stage, just handle a single input/output dictionary of {"era": torch.Tensor}.
Training loop:
Modify loss computation to operate over {pred_dict} -> {target_dict} by comparing predicted and target tensors per dataset key.
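One possible shape of that per-key comparison, as a sketch (MSE is used here as a stand-in; the real loss, weighting, and scaling are not shown):

```python
import torch
import torch.nn.functional as F

def dict_loss(pred: dict, target: dict) -> torch.Tensor:
    """Compare predicted and target tensors per dataset key and sum the contributions."""
    assert pred.keys() == target.keys(), "prediction and target must cover the same datasets"
    return sum(F.mse_loss(pred[key], target[key]) for key in pred)

# Example with mocked tensors for a single "era" dataset:
pred = {"era": torch.randn(2, 100, 10)}
target = {"era": torch.randn(2, 100, 10)}
loss = dict_loss(pred, target)  # scalar tensor, usable in training_step as usual
```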
Notes