Residual Scaling (normalization strategy)

### Is your feature request related to a problem? Please describe.

Following Watt-Meyer et al. (2023, 2024; [ACE](https://arxiv.org/abs/2310.02074) and [ACE2](https://arxiv.org/abs/2411.11268)): _Variables are normalized using a **residual scaling** approach such that predicting outputs equal to input would result in each variable contributing equally to the loss function_.

This should be straightforward, since `anemoi-datasets` already computes the _tendencies statistics_ needed for this normalization strategy. However, the [reference formulas (Appendix H)](https://arxiv.org/abs/2310.02074) use the standard deviation of the tendencies of the _mean-std **normalized**_ fields, whereas `anemoi-datasets` computes these statistics for the _**unnormalized**_ fields.

### Describe the solution you'd like

See [our proposed solution](https://github.com/Predictia/anemoi-core/tree/feature/de393-47-residual-scaling). This is an implementation of the [reference formulas (Appendix H)](https://arxiv.org/abs/2310.02074), using the _tendencies statistics_ computed by `anemoi-datasets`. We reworked the reference formulas so that the statistics of the unnormalized fields are used instead:

Let $a$ be the target field, and $a_{\textup{ff}}$ the mean-std normalized (or _full-field_ normalized, using their terminology) field:

$$a_{\textup{ff}}=\frac{a-\mu(a)}{\sigma(a)}$$

Then, the residual scaling is

$$a_{\textup{res}}=\frac{a_{\textup{ff}}}{\sigma_{\textup{res}}(a)}, \qquad \sigma_{\textup{res}}(a) = \frac{\sigma(a^\prime_{\textup{ff}})}{\eta_{a\in\mathbf{T}}\left(\sigma(a^\prime_{\textup{ff}})\right)}$$

where $\sigma(a^\prime_{\textup{ff}})$ is the standard deviation of the tendency $a^\prime_{\textup{ff}}=a_{\textup{ff}}(t+1)-a_{\textup{ff}}(t)$ (of the mean-std normalized field, **not** the unnormalized field), and $\eta$ is the geometric mean of this quantity over all targeted fields $\mathbf{T}$ (hereafter, we omit the $a\in\mathbf{T}$ for clarity). Notice that these are just the reference equations from the paper (using $\eta$ for the geometric mean instead).

Now, we want to rewrite these equations in terms of the statistics of the unnormalized field $a$ (instead of the normalized field $a_{\textup{ff}}$). Since $\mu(a)$ and $\sigma(a)$ are constants, we have

$$\sigma(a^\prime_{\textup{ff}})=\sigma\left(a_{\textup{ff}}(t+1)-a_{\textup{ff}}(t)\right)=\sigma\left(\frac{a(t+1) - \mu(a)}{\sigma(a)} - \frac{a(t) - \mu(a)}{\sigma(a)}\right)=\frac{\sigma\left(a(t+1)-a(t)\right)}{\sigma(a)}= \frac{\sigma(a^\prime)}{\sigma(a)}$$

which depends only on the unnormalized field $a$. Then, we rewrite the residual scaling as

$$a_{\textup{res}} = \frac{a_{\textup{ff}}}{\sigma_{\textup{res}}(a)} = \frac{(a-\mu(a))/\sigma(a)}{\sigma(a^\prime_{\textup{ff}})/\eta\left(\sigma(a^\prime_{\textup{ff}})\right)} = \eta\left(\sigma(a^\prime_{\textup{ff}})\right) \cdot \frac{a-\mu(a)}{\sigma(a)\cdot\sigma(a^\prime_{\textup{ff}})}$$

Using the relation $\sigma(a^\prime_{\textup{ff}})=\frac{\sigma(a^\prime)}{\sigma(a)}$, we have

$$\eta\left(\sigma(a^\prime_{\textup{ff}})\right)=\eta\left(\frac{\sigma(a^\prime)}{\sigma(a)}\right), \qquad \frac{a-\mu(a)}{\sigma(a)\cdot\sigma(a^\prime_{\textup{ff}})}=\frac{a-\mu(a)}{\sigma(a)\cdot\frac{\sigma(a^\prime)}{\sigma(a)}}=\frac{a-\mu(a)}{\sigma(a^\prime)}$$

Thus,

$$a_{\textup{res}} = \eta\left(\frac{\sigma(a^\prime)}{\sigma(a)}\right)\cdot\frac{a-\mu(a)}{\sigma(a^\prime)}$$

which depends only on the unnormalized field $a$ (which is what `anemoi-datasets` actually computes the _tendencies statistics_ for). The residual scaling is therefore the affine transform $a_{\textup{res}} = \mathbf{mul}\cdot a + \mathbf{add}$, with

$$\mathbf{mul}=\frac{\eta\left(\frac{\sigma(a^\prime)}{\sigma(a)}\right)}{\sigma(a^\prime)}, \qquad \mathbf{add}=-\frac{\mu(a)\cdot\eta\left(\frac{\sigma(a^\prime)}{\sigma(a)}\right)}{\sigma(a^\prime)}$$
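As a sanity check of the derivation, here is a minimal NumPy sketch (not the code in our branch) that compares the two-step reference formulation with the reworked $\mathbf{mul}$/$\mathbf{add}$ form on synthetic data. The function name `residual_scaling_coefficients`, the random-walk fields, and the array layout `(time, variable)` are assumptions made only for this illustration.

```python
import numpy as np


def residual_scaling_coefficients(mean, stdev, tendency_stdev):
    """Return (mul, add, eta) such that a_res = mul * a + add per variable.

    Hypothetical helper for illustration; inputs are per-variable statistics
    of the *unnormalized* fields.
    """
    mean = np.asarray(mean, dtype=float)
    stdev = np.asarray(stdev, dtype=float)
    tendency_stdev = np.asarray(tendency_stdev, dtype=float)
    # Geometric mean of sigma(a') / sigma(a), via the logarithmic definition.
    eta = np.exp(np.mean(np.log(tendency_stdev / stdev)))
    mul = eta / tendency_stdev
    add = -mean * mul
    return mul, add, eta


# Synthetic "fields": three variables with very different offsets and scales.
rng = np.random.default_rng(0)
steps = rng.normal(size=(1000, 3)) * 0.1
fields = np.array([0.0, 273.0, 1e5]) + np.array([1.0, 50.0, 1e4]) * np.cumsum(steps, axis=0)

# Statistics of the unnormalized fields and of their tendencies a(t+1) - a(t).
mean = fields.mean(axis=0)
stdev = fields.std(axis=0)
tendency_stdev = np.diff(fields, axis=0).std(axis=0)

# Reworked form: a single affine transform per variable.
mul, add, eta = residual_scaling_coefficients(mean, stdev, tendency_stdev)
direct = mul * fields + add

# Reference form: full-field normalization, then division by sigma_res.
full_field = (fields - mean) / stdev
sigma_ff_tendency = np.diff(full_field, axis=0).std(axis=0)
sigma_res = sigma_ff_tendency / np.exp(np.mean(np.log(sigma_ff_tendency)))
reference = full_field / sigma_res

print(np.allclose(direct, reference))
# After scaling, every variable's tendency has the same standard deviation (eta).
print(np.diff(direct, axis=0).std(axis=0), eta)
```

Both formulations should agree up to floating-point error, and the scaled tendencies share one common standard deviation, which is the "each variable contributes equally" property quoted above.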

In our [implementation](https://github.com/Predictia/anemoi-core/tree/feature/de393-47-residual-scaling), the geometric mean $\eta\left(\frac{\sigma(a\prime)}{\sigma(a)}\right)$ is accumulated _iteratively_ within the main `for` loop, and then applied as a common factor to both $\mathbf{add}$ and $\mathbf{mul}$ (notice that we use the [logarithmic definition](https://en.wikipedia.org/wiki/Geometric_mean) of the _geometric mean_). If the tendencies' `stdev` doesn't exist in the `statistics` dictionary (because the _tendencies statistics_ weren't computed during the creation of the dataset), the code falls back to using the `stdev` instead, in which case the ratio $\frac{\sigma(a\prime)}{\sigma(a)}$ is 1, the geometric mean is 1, and the formulas above reduce to a mean-std normalization.
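The accumulation and fallback described above can be sketched roughly as follows. This is a simplified stand-in rather than the code in the branch: the function name, the nested-dict layout of `statistics`, and the key `"tendencies_stdev"` are all hypothetical and chosen only to make the example self-contained.

```python
import math


def residual_scaling_from_statistics(statistics, variables):
    """Compute per-variable (mul, add) with a log-accumulated geometric mean.

    `statistics` is assumed (hypothetically) to look like
    {"mean": {name: float}, "stdev": {name: float},
     "tendencies_stdev": {name: float}}  # last key may be absent
    """
    log_sum = 0.0
    mul, add = {}, {}
    for name in variables:
        mean = statistics["mean"][name]
        stdev = statistics["stdev"][name]
        # Fall back to the plain stdev when no tendency statistic is available.
        tendency_stdev = statistics.get("tendencies_stdev", {}).get(name, stdev)
        # Accumulate log(sigma(a') / sigma(a)) for the geometric mean.
        log_sum += math.log(tendency_stdev / stdev)
        mul[name] = 1.0 / tendency_stdev
        add[name] = -mean / tendency_stdev
    # Geometric mean = exp(mean of logs); apply it as a common factor.
    eta = math.exp(log_sum / len(variables))
    for name in variables:
        mul[name] *= eta
        add[name] *= eta
    return mul, add


# Made-up numbers, purely for illustration.
stats = {
    "mean": {"t": 280.0, "u": 0.0},
    "stdev": {"t": 20.0, "u": 10.0},
    "tendencies_stdev": {"t": 2.0, "u": 5.0},
}
mul, add = residual_scaling_from_statistics(stats, ["t", "u"])

# Without tendency statistics, the result is a plain mean-std normalization.
stats_no_tend = {"mean": stats["mean"], "stdev": stats["stdev"]}
mul_fb, add_fb = residual_scaling_from_statistics(stats_no_tend, ["t", "u"])
```

Summing logarithms instead of multiplying the ratios directly avoids overflow or underflow when many variables with very different scales are involved.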

Additionally, we [added](https://github.com/Predictia/anemoi-datasets/tree/feature/de393-47-residual-scaling) the tendencies' `stdev` (and its ratio to the `stdev`) to the output of the `inspect` command in `anemoi-datasets`.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

### Organisation

Predictia Intelligent Data Solutions - DestinationEarth393
