Description
What happened?
Inference runs successfully with anemoi-inference==0.4.9, but upgrading to anemoi-inference==0.5.0 results in a torch.OutOfMemoryError. The environment is otherwise identical in both cases, with anemoi-models==0.4.0 installed.
What are the steps to reproduce the bug?
Run an inference step using a GraphTransformer model (n320 -> TriNodes(refinement=7) -> n320) with 1024 channels and the following config:
checkpoint: my_ckpt.ckpt
date: 2023-06-01
runner: default
input: mars
lead_time: 360
output:
  grib: test_n320.grib
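The reproduction can be sketched as below. The `anemoi-inference run` entry point is an assumption based on the documented CLI; adjust to your install if it differs.

```shell
# Write the config shown above to a file (indentation matters: `grib`
# is nested under `output`).
cat > test_n320.yaml <<'EOF'
checkpoint: my_ckpt.ckpt
date: 2023-06-01
runner: default
input: mars
lead_time: 360
output:
  grib: test_n320.grib
EOF

# Then run once per environment (commands commented out here, since they
# need the checkpoint and a GPU):
#   pip install "anemoi-inference==0.4.9" "anemoi-models==0.4.0"   # -> runs fine
#   pip install "anemoi-inference==0.5.0" "anemoi-models==0.4.0"   # -> CUDA OOM
#   anemoi-inference run test_n320.yaml
```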
Version
0.5.0
Platform (OS and architecture)
x86_64 GNU/Linux
Relevant log output
...
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/anemoi/models/models/encoder_processor_decoder.py", line 188, in forward
x_data_latent, x_latent = self._run_mapper(
^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/anemoi/models/models/encoder_processor_decoder.py", line 159, in _run_mapper
return checkpoint(
^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
ret = function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/anemoi/models/layers/mapper.py", line 344, in forward
x_dst = super().forward(x, batch_size, shard_shapes, model_comm_group)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/anemoi/models/layers/mapper.py", line 260, in forward
(x_src, x_dst), edge_attr = self.proc(
^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/anemoi/models/layers/block.py", line 512, in forward
edge_attr_list, edge_index_list = sort_edges_1hop_chunks(
^^^^^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/anemoi/models/distributed/khop_edges.py", line 121, in sort_edges_1hop_chunks
edge_index_chunk, edge_attr_chunk = bipartite_subgraph(
^^^^^^^^^^^^^^^^^^^
File "VENVS_DIR/aifs-inference/lib/python3.11/site-packages/torch_geometric/utils/subgraph.py", line 192, in bipartite_subgraph
edge_attr = edge_attr[edge_mask] if edge_attr is not None else None
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 120.00 MiB. GPU 0 has a total capacity of 39.56 GiB of which 17.12 MiB is free. Process 2750488 has 9.46 GiB memory in use. Including non-PyTorch memory, this process has 16.54 GiB memory in use. Process 2750490 has 9.48 GiB memory in use. Process 2750487 has 4.03 GiB memory in use. Of the allocated memory 15.94 GiB is allocated by PyTorch, and 112.30 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
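The error message itself suggests enabling expandable segments in the CUDA caching allocator to mitigate fragmentation. A minimal sketch of applying that workaround (it does not address the underlying regression between 0.4.9 and 0.5.0, and must take effect before the first CUDA allocation):

```python
import os

# Allocator setting suggested by the OOM message above. Set it before
# importing torch (or at least before any CUDA tensor is created), e.g.
# at the top of the driver script or via the shell environment.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # import and allocate only after the variable is set
```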
Accompanying data
No response
Organisation
ECMWF