Skip to content

feat(graphs): build nodes from xarray compatible data #330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

matschreiner
Copy link

@matschreiner matschreiner commented May 23, 2025

Description

This PR adds an XArray nodebuilder for building nodes from files compatible with xarray.

Example usage:

    data_nodes = XArrayNodes("/storage/danra/height_levels.zarr", "data", lat_key='latitude', lon_key='longitude')
    graph = HeteroData()
    graph = data_nodes.update_graph(graph)

Builds a set of nodes on the latitudes and longitudes specified in the file. Therefore this builders requires latitudes and longitudes to be defined on the file. The names of these can be specified with the lat/lon_key kwargs, defaulting to lat and lon, such that xarray can access them with its __getitem__ method on the instantiated <xarrray.Dataset>.

Additionally, since this work is related to zarr, some mock objects (that were anemoi-dataset rather than zarr mocks) called for example mock_zarr_dataset, have been renamed to mock_anemoi_dataset, such that the mock_zarr_dataset is made available for actual zarr dataset mocks in the test suite.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Issue Number

Code Compatibility

  • I have performed a self-review of my code

Code Performance and Testing

  • I have added tests that prove my fix is effective or that my feature works
  • I ran the complete Pytest test suite locally, and they pass
  • I have tested the changes on a single GPU
  • I have tested the changes on multiple GPUs / multi-node setups
  • I have run the Benchmark Profiler against the old version of the code
  • If the new feature introduces modifications at the config level, I have made sure to update Pydantic Schemas and default configs accordingly

Dependencies

  • I have ensured that the code is still pip-installable after the changes and runs
  • I have tested that new dependencies themselves are pip-installable.
  • I have not introduced new dependencies in the inference portion of the pipeline

Documentation

a

  • My code follows the style guidelines of this project
  • I have updated the documentation and docstrings to reflect the changes
  • I have added comments to my code, particularly in hard-to-understand areas

Additional Notes


📚 Documentation preview 📚: https://anemoi-training--330.org.readthedocs.build/en/330/


📚 Documentation preview 📚: https://anemoi-graphs--330.org.readthedocs.build/en/330/


📚 Documentation preview 📚: https://anemoi-models--330.org.readthedocs.build/en/330/

@matschreiner matschreiner requested a review from a team as a code owner May 23, 2025 10:34
@FussyDuck
Copy link

FussyDuck commented May 23, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ matschreiner
❌ Jacob Mathias Schreiner


Jacob Mathias Schreiner seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@matschreiner
Copy link
Author

@JPXKQX I cannot tag you as a reviewer :)

@mchantry mchantry requested a review from JPXKQX May 23, 2025 10:54
@@ -55,6 +55,7 @@ dependencies = [
"torch-geometric>=2.3.1",
"trimesh>=4.1",
"typeguard>=4",
"xarray<=2025.4",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be an optional dependency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 why xarray has become a doc-dependency now?

Mat, why can't it be a regular dependency of graphs? As far as I know, xarray is required by earthkit-data, which is required by anemoi-transform, which is required by anemoi-datasets, which is required by anemoi-graphs, so you would end-up with xarray anyway. Perhaps it would be just a matter of the particular version..

Copy link
Author

@matschreiner matschreiner May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I didn't see it was dependencies for docs.
However, if you decide that it should be optional, where should I put it in the pyproject file?
Something like

optional-dependencies.data = "[xarray<=2025.4]"

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MeraX you may be right, that it is a dependency anyway, it just felt like philosophically more of these graph building options should be optional to streamline the dependencies. In #338 we plan to move h3 to being an optional dependency, motivated by #331

Jacob Mathias Schreiner added 2 commits May 23, 2025 13:13
…com:matschreiner/anemoi-core into feat/build-nodes-from-xarray-compliant-data
@matschreiner matschreiner changed the title Feat/build nodes from xarray compliant data Feat/build nodes from xarray compatible data May 23, 2025
@JPXKQX JPXKQX changed the title Feat/build nodes from xarray compatible data feat(graphs): build nodes from xarray compatible data May 26, 2025
Copy link
Member

@JPXKQX JPXKQX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mathias, thanks for your contribution to anemoi-graphs. Great work! I've left some minor comments. The only remaining part is what to do with the xarray dependency. Another option I see is a lazy import inside the XArrayNodes. Something like:

    try:
        import xarray as xr
    except ImportError:
        raise ImportError("
            The xarray package is required to use {self.__class__.__name__}. You can install it with 'pip install xarray'."
        )

What do you think? I think we should do the same for other libraries in future (h3, ...).

Jacob Mathias Schreiner added 2 commits May 28, 2025 11:18
@matschreiner matschreiner force-pushed the feat/build-nodes-from-xarray-compliant-data branch from 7e22fd2 to 71ef05e Compare May 28, 2025 09:20
@matschreiner matschreiner force-pushed the feat/build-nodes-from-xarray-compliant-data branch from a5812aa to 44e0a1f Compare May 28, 2025 09:26
@matschreiner matschreiner force-pushed the feat/build-nodes-from-xarray-compliant-data branch from 8575c66 to aa54902 Compare May 28, 2025 09:28
@matschreiner matschreiner force-pushed the feat/build-nodes-from-xarray-compliant-data branch from 09d212a to aa727cc Compare May 28, 2025 09:46
@matschreiner matschreiner force-pushed the feat/build-nodes-from-xarray-compliant-data branch from 06fd7f8 to a35da52 Compare May 28, 2025 09:48
@matschreiner
Copy link
Author

@JPXKQX
Ready for second round :)

I think it's a good idea with the lazy import for optional dependencies.
However, maybe catching the import error to raise a new import error is a little bloated.

@JPXKQX
Copy link
Member

JPXKQX commented May 29, 2025

@JPXKQX Ready for second round :)

I think it's a good idea with the lazy import for optional dependencies. However, maybe catching the import error to raise a new import error is a little bloated.

This is great, thank you! The only thing I missed in the previous review was the changelog. It shouldn't be modified. When we release a new version, the CI will automatically update the changelog with the correct commits and links.

@mchantry mchantry added the ATS Approval Not Needed No approval needed by ATS label May 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants