[357] Sub-package for evaluation #359

tjhunter · 2025-06-17T11:56:15Z

Description

Adds separate weathergen-evaluate and weathergen-common packages in the same source tree, while leaving the current model code intact. This has been tested with the usual commands (uv run evaluate/train/...).

Here is the dependency tree. It shows that all the dependencies are in lockstep and that weathergen-common is a dependency of both weathergen and weathergen-evaluate:

weathergen v0.1.0
├── anemoi-datasets v0.5.16
├── astropy-healpix v1.1.2
├── flash-attn v2.7.4.post1
├── matplotlib v3.10.1
├── numpy v2.2.4
├── omegaconf v2.3.0
├── packaging v24.2
├── pandas v2.2.3
├── polars v1.25.2
├── psutil v7.0.0
├── pynvml v12.0.0
├── torch v2.6.0+cu124
├── tqdm v4.67.1
├── weathergen-common v0.1.0
├── wheel v0.45.1
├── zarr v2.17.0
├── pytest v8.3.5 (group: dev)
├── pytest-mock v3.14.1 (group: dev)
└── ruff v0.9.7 (group: dev)
weathergen-evaluate v0.1.0
├── cartopy v0.24.1
├── plotly v6.1.2
├── weathergen-common v0.1.0 (*)
├── pytest v8.3.5 (group: dev) (*)
├── pytest-mock v3.14.1 (group: dev) (*)
└── ruff v0.9.7 (group: dev)

What works:

Everything. There should be no change for people working on the model/data.
Pylance / VS Code has no issue navigating, hinting, or type checking. It needs a refresh of the cache, though.

Points of questions:

Each sub-package needs to define its own dependencies, including the dev ones such as ruff. There is a bit of duplication.
Each sub-package has its own version number. We do not use versions right now, and there seems to be a way to synchronize all the versions if we wanted to.

There is an example of a self-contained script called plot.py, which depends on weathergen-evaluate (and all its dependencies). It can be called directly with ./plot.py, and uv will take care of creating the appropriate venv.
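For readers unfamiliar with this pattern, here is a minimal sketch of what such a self-contained script can look like. It relies on inline script metadata (PEP 723) and the uv shebang form; it is not the actual plot.py from this PR. How weathergen-evaluate gets resolved (index, local path, or git) is an assumption left open here, and the ImportError fallback exists only so the sketch can also run outside uv.

```python
#!/usr/bin/env -S uv run --script
# /// script
# dependencies = [
#     "weathergen-evaluate",
# ]
# ///
"""Illustrative stand-alone script; uv reads the comment block above to build the venv."""


def load_common_function():
    """Return weathergen.common.common_function, or None if the package is missing."""
    try:
        # common_function is the placeholder added to weathergen-common in this PR.
        from weathergen.common import common_function
    except ImportError:
        # Only reached when the script is run without its declared dependencies.
        return None
    return common_function


def main() -> str:
    fn = load_common_function()
    if fn is None:
        return "weathergen-common is not installed; run this script through uv."
    return fn()


if __name__ == "__main__":
    print(main())
```

With the file marked executable (chmod +x), running ./plot.py lets uv create an isolated environment from the metadata block before executing the script.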

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Issue Number

Closes #357

tjhunter · 2025-06-17T11:59:05Z

src/weathergen/__init__.py

This file needs to be removed. It was already causing issues for nv perf tools, and in general, if it exists, it prevents Python from discovering sub-packages such as weathergen.common.

Also, this change would need to happen in any case if we want to run any code on CPU-only machines. The training code depends on flash-attn (GPU-only), so removing any reference to training or model in __init__.py is necessary to run a sub-package in a CPU-only environment.
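The discovery problem can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not code from this PR: it builds throwaway source roots in a temporary directory, and the names demo_ns, demo_reg, model and common are hypothetical stand-ins for weathergen and its sub-packages. When no root has a top-level __init__.py, Python treats the top-level name as a namespace package and merges all roots. When one root does have it, that root owns the name and sub-packages in the other roots are not found.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_source_root(base: Path, root_name: str, top: str, sub: str, top_init: bool) -> Path:
    """Create <base>/<root_name>/<top>/<sub>/__init__.py and return the source root."""
    root = base / root_name
    sub_dir = root / top / sub
    sub_dir.mkdir(parents=True)
    (sub_dir / "__init__.py").write_text(f"NAME = '{sub}'\n")
    if top_init:
        # A top-level __init__.py makes <top> a regular package owned by this root.
        (root / top / "__init__.py").write_text("")
    return root


def importable(module_name: str) -> bool:
    """Return True if the module can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def demo() -> dict:
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        roots = [
            # Case 1: no top-level __init__.py anywhere, so demo_ns is a namespace package.
            make_source_root(base, "ns_core", "demo_ns", "model", top_init=False),
            make_source_root(base, "ns_common", "demo_ns", "common", top_init=False),
            # Case 2: one root ships a top-level __init__.py, so demo_reg is a regular package.
            make_source_root(base, "reg_core", "demo_reg", "model", top_init=True),
            make_source_root(base, "reg_common", "demo_reg", "common", top_init=False),
        ]
        sys.path[:0] = [str(r) for r in roots]
        try:
            for name in ("demo_ns.model", "demo_ns.common", "demo_reg.model", "demo_reg.common"):
                results[name] = importable(name)
        finally:
            for r in roots:
                sys.path.remove(str(r))
    return results


if __name__ == "__main__":
    for name, ok in demo().items():
        print(f"{name}: {'importable' if ok else 'not found'}")
```

In this sketch only demo_reg.common fails to import, which mirrors what happens to weathergen.common while src/weathergen/__init__.py exists.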

clessig

It seems a bit inconsistent to me to have packages/common and packages/evaluation while the "main" code is under src/. But I can live with it. config.py and validation_io.py (which is currently being revised!) should be part of the common package. Anything else?

Issue #357 is also more or less a duplicate of #340.

clessig · 2025-06-17T13:43:36Z

packages/common/src/weathergen/common/__init__.py

@@ -0,0 +1,2 @@
+def common_function():
+    return "This is a common function for weather generation."


This refers to code that is shared by the model and evaluation code? The config is something that is needed on both sides.

Exactly, this is for common code.

I think it is better to put the skeleton in place first and move these pieces later, because it is a large change and it will probably cause many conflicts. I have put a TODO here for the time being and I will create a separate ticket (which can be handled by other people). How does that sound to you?

clessig · 2025-06-17T13:44:15Z

packages/evaluate/src/weathergen/evaluate/plot.py

+    print(common_function())
+
+
+if __name__ == "__main__":


Can we use the plot.py function that we already have?

Yes, we can. I was not sure whether you wanted to make changes there first, since you were saying it was not ready yet. But I am happy to move it here first.

clessig · 2025-06-17T13:45:00Z

src/weathergen/run_train.py

@@ -1,9 +1,169 @@
+# (C) Copyright 2025 WeatherGenerator contributors.


This file should now go to the train/ directory.

Some Slurm scripts refer directly to this file. I would rather change these scripts to use the train / evaluate / ... commands first, before moving this file. Is this good for you?

tjhunter · 2025-06-18T09:49:19Z

@clessig I agree that having most of the model and data code in a separate place is not optimal. At the same time, as long as we are not publishing packages, it should not have a downside. Once we move all the plotting to evaluation and the shared code to common, we will have fewer merge conflicts to deal with.

* working * changes * removing deps from non-core project * changes * fixes * comments

* Implement mock IO (#336) * Adapt score class score class (#339) * Implement mock IO * Adapt score class * Removing unused file (#349) * remove database folder (#355) * Small change - CI - pinning the version of formatting (#361) * changes * changes * Update INSTALL.md * Update INSTALL.md * Fixed Exxx lint issues (#284) * Rebased to the latest changes and linted new changes * addressed review comments * addressed review comments * Linted the latest changes. * corrected the formating * corrected the formating * configured ruff to use LF line endings in pyproject.toml * [357] Sub-package for evaluation (#359) * working * changes * removing deps from non-core project * changes * fixes * comments * Iluise quick fix stac (#374) * remove database folder * fix database * Simplifying workflow for plot_training (#368) * Simplifying workflow for plot_training * Ruffed * Working on implementing exclude_source * Remove unused code * Fixed ruff issue * Fixing bug in lat handling (377) (#378) * Fixing bug in lat handling * Added comment --------- Co-authored-by: Seb Hickman <[email protected]> * recover num_ranks from previous run to calculate epoch_base (#317) * recover num_ranks from previous run to calculate epoch_base * set email settings for commits * addressing Tim's comment * make ruff happy * improve style * changes (#385) Linter rule so np.ndarray is not used as type * changed the script name from evaluate to inference as it simply gener… (#376) * changed the script name from evaluate to inference as it simply generate infer samples * changed evaluate to inference in the main scripts and corresponding calls in the config * update the main function for the inference script * changed evaluate to inference also in docstring, unit test scripts, and integration test scripts --------- Co-authored-by: Patnala,Ankit <[email protected]> * Introduce tuples instead for strings to avoid TypeError (#392) * Exclude channels from src / target (#363) * Exclude channels from src / 
target * Simplified code and added comment that pattern matching is used * Adding new stream config * Fixing bug that led to error when accessing self.ds when dataset is empty * Wokign on exlcude_source * work in progress * Fixing incorrect formating for logger (#388) * Ruffed * Refactored and cleaned up channel selection. Also added check that channels are not empty * Cleaned channel parsing and selection * Adjustments * Removing asserts incompatible with empty dataset --------- Co-authored-by: Christian Lessig <[email protected]> * add embed_dropout_rate to config v1 (#358) * [402] adds checks to the pull request (#403) * chanegs * mistake * mistake * mistake * changes * doc * Introduce masking class and incorporate in TokenizerMasking (#383) * creating masking class and adapting tokenizer_masking to use this class * minor changes to masking.py and tokenizer_masking * removed old tokenizer_masking * include masking_strategy in default_config * change ValueError to assert * linting formatting changes files * further linting of docstrings * create mask_source and mask_target in Masker, and update tokenizer_masking to use these, then style improvements * linted masking, tokenizer_masking * modify masker, rng and perm_sel now part of class, remove extra masking_rate, update comments, remove archived class * remove check if all masked, not masked * remove self.masking_rate from MultiStreamDS class, and masking args from batchify_source * update tokenizer utils with description of idx_ord_lens in comment * remove masking args from batchify_, perm_sel removed now internal to Masker class, remove handling special cases of masking (all masked) * adding masking_strategy: to config * remove unused mentions of masking_combination * removed comment about streams * changed assert to check self perm_sel is not None * ruff masking, tokenizer_masking * Ruffed * Added warning to capture corner case, likely due to incorrect user settings. 
* Fixed incorrect call twice * Fixed missing conditional for logger statement * Required changes for better handling of rngs * Improved handling of rngs * Improved handling of rng --------- Co-authored-by: Christian Lessig <[email protected]> * Implement per-channel logging (#283) * Fix bug with seed being divided by 0 for worker ID=0 * Fix bug causing crash when secrets aren't in private config * Implement logging losses per channel * Fix issue with empty targets * Rework loss logging * ruff * Remove computing max_channels * Change variables names * ruffed * Remove redundant enumerations * Use stages for logging * Add type hints * Apply the review * ruff * fix * Fix type hints * ruff --------- Co-authored-by: Tim Hunter <[email protected]> * [346] Passing options through the slurm script (#400) * changes * fixes * refactor `validation_io.write_validation` to make it more readable * remove legacy code `validation_io.read_validation` * encapsulate artifact path logic in config module * remove redundant attribute `Trainer.path_run` * use config to look up base_path in `write_validation` * remove unused `write_validation` args: `base_path`, `rank` * ensure correct type for pathes * remove streams initialization from `Trainer` * remove path logic from `Trainer.save_model` * simplify conditional * rename mock io module * update uv to include dask * Implement io module to support reading/writing model output * implement new validation_io routine * use new write_validation routine * remove unused code * rename output routine to `write_output` * ruffed and added comments * fixed annotation * use simple __init__ method for `OutputItem` instead of dataclasses magic * address reviewers comments * rename method * add simple docstrings * ruffed * typehint fixes * refactor names * update comments and typehints, dont import pytorch * remove `__post_init__` methods, cache properties * fixes and integration test * final fixes :) * changes * changes * changes * changes * changes * 
more work * changes * changes * changes * ruffed * ruffed * improve logging and comments * Update to score-class according to internal discussions and feedback in PR. * Add license header. * Ruffed code. * Update to score-class according to internal discussions and feedback in PR. * Add license header. * Ruffed code. * Add doc-string to call-method and provide example usage for efficient graph-construction. * Some fixes to score-class. * Some fixes to handling aggregation dimension. * Add missing import of MockIO. * changes * changes * removing the scores * changes * changes * changes * changes * changes * changes * changes * changes * changes * changes * changes * changes * changes * changes --------- Co-authored-by: Kacper Nowak <[email protected]> Co-authored-by: Christian Lessig <[email protected]> Co-authored-by: iluise <[email protected]> Co-authored-by: Sindhu-Vasireddy <[email protected]> Co-authored-by: Seb Hickman <[email protected]> Co-authored-by: Julian Kuehnert <[email protected]> Co-authored-by: ankitpatnala <[email protected]> Co-authored-by: Patnala,Ankit <[email protected]> Co-authored-by: Savvas Melidonis <[email protected]> Co-authored-by: Christian Lessig <[email protected]> Co-authored-by: Till Hauer <[email protected]> Co-authored-by: Simon Grasse <[email protected]> Co-authored-by: Michael <[email protected]>

working

aa3a007

github-project-automation bot added this to WeatherGen-dev Jun 17, 2025

tjhunter commented Jun 17, 2025

changes

828b3cc

clessig reviewed Jun 17, 2025

tjhunter added 6 commits June 17, 2025 13:55

removing deps from non-core project

c2fe2c6

changes

7c5d946

merge with dev

6247a39

fixes

88f0d05

Merge branch 'tjh/dev/357_subpackages' of github.com:ecmwf/WeatherGen…

ef701e3

…erator into tjh/dev/357_subpackages

comments

73cb55f

tjhunter mentioned this pull request Jun 18, 2025

Evaluation package #340

Open

clessig approved these changes Jun 20, 2025

tjhunter merged commit 9478dac into develop Jun 20, 2025
3 checks passed

github-project-automation bot moved this to Done in WeatherGen-dev Jun 20, 2025

tjhunter deleted the tjh/dev/357_subpackages branch June 20, 2025 11:40

grassesi pushed a commit that referenced this pull request Jun 24, 2025

[357] Sub-package for evaluation (#359)

dfd3461

* working * changes * removing deps from non-core project * changes * fixes * comments
