Residual Scaling (normalization strategy) #319

Open
PortillaS-Predictia opened this issue May 13, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

PortillaS-Predictia commented May 13, 2025

Is your feature request related to a problem? Please describe.

Following Watt-Meyer et al. (2023, 2024; ACE and ACE2), variables are normalized using a residual scaling approach, such that predicting outputs equal to the inputs would result in each variable contributing equally to the loss function.

This should be straightforward, since anemoi-datasets already computes the tendency statistics needed for this normalization strategy. However, the reference formulas (Appendix H) use the standard deviation of the tendencies of the mean-std normalized fields, whereas anemoi-datasets computes these statistics for the unnormalized fields.

Describe the solution you'd like

See our proposed solution. This is an implementation of the reference formulas (Appendix H), using the tendencies statistics computed by anemoi-datasets. We reworked the reference formulas so that the statistics of the unnormalized fields are used instead:

Let $a$ be the target field and $a_{\textup{ff}}$ its mean-std normalized version (full-field normalized, in the paper's terminology):

$a_{\textup{ff}}=\frac{a-\mu(a)}{\sigma(a)}$

The residual scaling is then

$a_{\textup{res}}=\frac{a_{\textup{ff}}}{\sigma_{\textup{res}}(a)}$, where $\sigma_{\textup{res}}(a) = \frac{\sigma(a'_{\textup{ff}})}{\eta_{a\in\mathbf{T}}\left(\sigma(a'_{\textup{ff}})\right)}$

Here $\sigma(a'_{\textup{ff}})$ is the standard deviation of the tendency of the mean-std normalized field (not of the unnormalized field), and $\eta$ is the geometric mean of this quantity over all target fields $\mathbf{T}$ (hereafter, we omit the $a\in\mathbf{T}$ for clarity). These are just the reference equations from the paper, written with $\eta$ for the geometric mean.

Now we want to rewrite these equations in terms of the statistics of the unnormalized field $a$ instead of the normalized field $a_{\textup{ff}}$. Since $\mu(a)$ and $\sigma(a)$ are constants, the mean cancels in the difference and $\sigma(a)$ factors out of the standard deviation:

$\sigma(a'_{\textup{ff}})=\sigma(a_{\textup{ff}}(t+1)-a_{\textup{ff}}(t))=\sigma\left(\frac{a(t+1) - \mu(a)}{\sigma(a)} - \frac{a(t) - \mu(a)}{\sigma(a)}\right)=\frac{\sigma(a(t+1)-a(t))}{\sigma(a)}= \frac{\sigma(a')}{\sigma(a)}$

which depends only on the unnormalized field $a$. The residual scaling can then be rewritten as

$a_{\textup{res}} = \frac{a_{\textup{ff}}}{\sigma_{\textup{res}}(a)} = \frac{(a-\mu(a))/\sigma(a)}{\sigma(a'_{\textup{ff}})/\eta\left(\sigma(a'_{\textup{ff}})\right)} = \eta\left(\sigma(a'_{\textup{ff}})\right) \cdot \frac{a-\mu(a)}{\sigma(a)\cdot\sigma(a'_{\textup{ff}})}$

Using the relation $\sigma(a'_{\textup{ff}})=\frac{\sigma(a')}{\sigma(a)}$, the geometric mean becomes $\eta\left(\sigma(a'_{\textup{ff}})\right)=\eta\left(\frac{\sigma(a')}{\sigma(a)}\right)$ and the remaining fraction simplifies to

$\frac{a-\mu(a)}{\sigma(a)\cdot\sigma(a'_{\textup{ff}})}=\frac{a-\mu(a)}{\sigma(a)\cdot\frac{\sigma(a')}{\sigma(a)}}=\frac{a-\mu(a)}{\sigma(a')}$

Thus,

$a_{\textup{res}} = \eta\left(\frac{\sigma(a')}{\sigma(a)}\right)\cdot\frac{a-\mu(a)}{\sigma(a')}$

which depends only on statistics of the unnormalized field $a$ (which is what anemoi-datasets actually computes the tendency statistics for). The residual scaling is therefore the affine map $a_{\textup{res}} = \mathbf{mul}\cdot a + \mathbf{add}$, with

$\mathbf{mul}=\frac{\eta\left(\frac{\sigma(a')}{\sigma(a)}\right)}{\sigma(a')}$ and $\mathbf{add}=-\frac{\mu(a)\cdot\eta\left(\frac{\sigma(a')}{\sigma(a)}\right)}{\sigma(a')}$
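
As a sanity check of the derivation, here is a minimal Python sketch that compares the two routes on toy data: normalizing first and then dividing by $\sigma_{\textup{res}}$ (the reference formulas), versus applying $\mathbf{mul}$ and $\mathbf{add}$ computed from the unnormalized statistics. The field names and numbers are made up for illustration, and none of the function names come from anemoi.

```python
import math
import statistics


def tendencies(series):
    """One-step differences a(t+1) - a(t)."""
    return [b - a for a, b in zip(series, series[1:])]


def geometric_mean(values):
    """Geometric mean computed through the mean of logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical toy time series for two target fields (made-up numbers).
fields = {
    "t2m": [280.0, 281.5, 279.0, 283.0, 282.0, 284.5],
    "msl": [101300.0, 101150.0, 101420.0, 100980.0, 101210.0, 101050.0],
}

# Statistics of the unnormalized fields.
mu = {k: statistics.fmean(v) for k, v in fields.items()}
sigma = {k: statistics.pstdev(v) for k, v in fields.items()}
sigma_tend = {k: statistics.pstdev(tendencies(v)) for k, v in fields.items()}

# Reference route: statistics of the mean-std normalized fields.
a_ff = {k: [(x - mu[k]) / sigma[k] for x in v] for k, v in fields.items()}
sigma_tend_ff = {k: statistics.pstdev(tendencies(v)) for k, v in a_ff.items()}
eta_ff = geometric_mean(list(sigma_tend_ff.values()))

# Reworked route: eta(sigma(a') / sigma(a)) from unnormalized statistics only.
eta = geometric_mean([sigma_tend[k] / sigma[k] for k in fields])
mul = {k: eta / sigma_tend[k] for k in fields}
add = {k: -mu[k] * eta / sigma_tend[k] for k in fields}


def residual_scale_reference(name):
    """a_res = a_ff / sigma_res, with sigma_res = sigma(a'_ff) / eta."""
    sigma_res = sigma_tend_ff[name] / eta_ff
    return [x / sigma_res for x in a_ff[name]]


def residual_scale_reworked(name):
    """a_res = mul * a + add, using unnormalized statistics only."""
    return [mul[name] * x + add[name] for x in fields[name]]
```

Both functions should agree up to floating-point error. After scaling, the tendency of every field has the same standard deviation, namely $\eta$, which is the "equal contribution" property the normalization is meant to provide.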

In our implementation, the geometric mean $\eta\left(\frac{\sigma(a')}{\sigma(a)}\right)$ is accumulated iteratively within the main for loop and multiplied into both $\mathbf{add}$ and $\mathbf{mul}$ afterwards (notice that we use the logarithmic definition of the geometric mean). If the tendencies' stdev doesn't exist in the statistics dictionary (because the tendency statistics weren't computed during the creation of the dataset), the code falls back to using the stdev instead, in which case the formulas above reduce to a mean-std normalization.
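
The loop described above could look roughly like the following sketch. It is not the code from the proposed solution: the layout of the statistics dictionary and the key names `mean`, `stdev` and `tendency_stdev` are assumptions made for illustration and may not match what anemoi-datasets exposes.

```python
import math


def residual_scaling_coefficients(stats):
    """Return {name: (mul, add)} from a hypothetical per-variable statistics dict.

    `stats` maps a variable name to a dict with keys "mean", "stdev" and,
    optionally, "tendency_stdev" (assumed names, for illustration only).
    """
    log_sum = 0.0
    coeffs = {}
    for name, s in stats.items():
        # Fall back to the plain stdev when no tendency stdev is available.
        scale = s.get("tendency_stdev", s["stdev"])
        # Accumulate log(sigma(a') / sigma(a)) for the geometric mean.
        log_sum += math.log(scale / s["stdev"])
        # Coefficients without the geometric-mean factor.
        coeffs[name] = (1.0 / scale, -s["mean"] / scale)

    # Logarithmic definition of the geometric mean over all variables.
    eta = math.exp(log_sum / len(stats))

    # Multiply eta into both mul and add after the loop.
    return {name: (eta * m, eta * a) for name, (m, a) in coeffs.items()}


# Usage with made-up statistics:
example = {
    "a": {"mean": 10.0, "stdev": 2.0, "tendency_stdev": 0.5},
    "b": {"mean": 0.0, "stdev": 4.0, "tendency_stdev": 2.0},
}
coefficients = residual_scaling_coefficients(example)
```

When no variable has a tendency stdev, every ratio is 1, so the geometric mean is 1 and the coefficients are those of a plain mean-std normalization.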

Additionally, we added the tendencies' stdev (and its ratio to the stdev) to the output of the inspect command in anemoi-datasets.

Describe alternatives you've considered

No response

Additional context

No response

Organisation

Predictia Intelligent Data Solutions - DestinationEarth393

PortillaS-Predictia added the enhancement label May 13, 2025
jakob-schloer (Collaborator) commented May 14, 2025

Thanks for the idea and clear description. We have recently included the option to use residual scaling (called tendency scaling) of the loss function (#52).
Instead of scaling the variables $x$ as you described, we scale the loss function, which should have the same effect:

$MSE_{res} \approx |\hat{x}_{res} - x_{res}|^2 = |\hat{x}/\sigma_{res} - x/\sigma_{res}|^2 = \frac{1}{\sigma_{res}^2} |\hat{x} - x|^2$
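
For a single variable, the identity can be checked numerically with a short sketch: dividing both prediction and target by $\sigma_{res}$ scales the squared error by $1/\sigma_{res}^2$. The values below are made up, and `mse` is a plain helper, not the loss class used in the project.

```python
import math


def mse(pred, target):
    """Mean squared error of two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


sigma_res = 0.25  # hypothetical residual scale of one variable
pred = [0.9, -0.2, 1.4, 0.3]
target = [1.0, 0.0, 1.0, 0.5]

# Route 1: scale the variables, then compute the loss.
mse_scaled_variables = mse(
    [p / sigma_res for p in pred],
    [t / sigma_res for t in target],
)

# Route 2: compute the loss on the unscaled variables, then scale the loss.
mse_scaled_loss = mse(pred, target) / sigma_res**2

print(math.isclose(mse_scaled_variables, mse_scaled_loss))
```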

Do you have a use case where scaling the loss function is not enough?

2 participants