Is your feature request related to a problem? Please describe.
Following Watt-Meyer et al. (2023, 2024; ACE and ACE2), variables are normalized using a residual scaling approach, such that predicting outputs equal to the inputs would result in each variable contributing equally to the loss function.
This should be straightforward, since anemoi-datasets already computes the tendencies statistics needed for this normalization strategy. However, the reference formulas (Appendix H) use the standard deviation of the mean-std normalized fields (not the unnormalized fields, which is what anemoi-datasets actually computes these statistics for).
Describe the solution you'd like
See our proposed solution. This is an implementation of the reference formulas (Appendix H), using the tendencies statistics computed by anemoi-datasets. We reworked the reference formulas so that the statistics of the unnormalized fields are used instead:
Let $a$ be the target field, and $a_{\textup{ff}}$ the mean-std normalized field (full-field normalized, in the paper's terminology), $a_{\textup{ff}}=\frac{a-\mu(a)}{\sigma(a)}$. The residual scaling is then $a_{\textup{res}}=\frac{a_{\textup{ff}}}{\sigma_{\textup{res}}(a)}$, where $\sigma_{\textup{res}}(a) = \frac{\sigma(a\prime_\textup{ff})}{\eta_{a\in\mathbf{T}}\left(\sigma(a\prime_\textup{ff})\right)}$ is the standard deviation of the tendency of the mean-std normalized field (not of the unnormalized field), divided by the geometric mean $\eta$ of this quantity over all target fields $\mathbf{T}$ (hereafter, we omit the $a\in\mathbf{T}$ for clarity). These are just the reference equations from the paper, with $\eta$ denoting the geometric mean.

We now want to rewrite these equations in terms of the statistics of the unnormalized field $a$ (instead of the normalized field $a_\textup{ff}$). Since $\mu(a)$ and $\sigma(a)$ are constants, we have $\sigma(a\prime_\textup{ff})=\sigma(a_\textup{ff}(t+1)-a_\textup{ff}(t))=\sigma\left(\frac{a(t+1) - \mu(a)}{\sigma(a)} - \frac{a(t) - \mu(a)}{\sigma(a)}\right)=\frac{\sigma(a(t+1)-a(t))}{\sigma(a)}= \frac{\sigma(a\prime)}{\sigma(a)}$, which depends only on the unnormalized field $a$.

Then, we rewrite the residual scaling as $a_{\textup{res}} = \frac{a_{\textup{ff}}}{\sigma_{\textup{res}}(a)} = \frac{(a-\mu(a))/\sigma(a)}{\sigma(a\prime_{\textup{ff}})/\eta\left(\sigma(a\prime_\textup{ff})\right)} = \eta\left(\sigma(a\prime_\textup{ff})\right) \cdot \frac{a-\mu(a)}{\sigma(a)\cdot\sigma(a\prime_{\textup{ff}})}$.

Now, using the relation $\sigma(a\prime_\textup{ff})=\frac{\sigma(a\prime)}{\sigma(a)}$, we have $\eta\left(\sigma(a\prime_\textup{ff})\right)=\eta\left(\frac{\sigma(a\prime)}{\sigma(a)}\right)$ and $\frac{a-\mu(a)}{\sigma(a)\cdot\sigma(a\prime_{\textup{ff}})}=\frac{a-\mu(a)}{\sigma(a)\cdot\frac{\sigma(a\prime)}{\sigma(a)}}=\frac{a-\mu(a)}{\sigma(a\prime)}$.
Thus, $a_{\textup{res}} = \eta\left(\frac{\sigma(a\prime)}{\sigma(a)}\right)\cdot\frac{a-\mu(a)}{\sigma(a\prime)}$, which depends only on the statistics of the unnormalized field $a$ (which is what anemoi-datasets actually computes the tendencies statistics for). The residual scaling therefore consists of multiplying by $\mathbf{mul}=\frac{\eta\left(\frac{\sigma(a\prime)}{\sigma(a)}\right)}{\sigma(a\prime)}$ and adding $\mathbf{add}=-\frac{\mu(a)\cdot\eta\left(\frac{\sigma(a\prime)}{\sigma(a)}\right)}{\sigma(a\prime)}$, i.e. $a_{\textup{res}} = \mathbf{mul}\cdot a + \mathbf{add}$.
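As a sanity check on the derivation above, the following minimal sketch compares the two routes numerically: the reference route (statistics of the normalized fields) and the reworked route (statistics of the unnormalized fields only). The variable names and the synthetic random-walk data are assumptions made for illustration; nothing here comes from anemoi-datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic time series standing in for three target fields (hypothetical data).
fields = {
    "t2m": 280.0 + np.cumsum(rng.normal(0.0, 0.5, 2000)),
    "msl": 101325.0 + np.cumsum(rng.normal(0.0, 30.0, 2000)),
    "u10": np.cumsum(rng.normal(0.0, 1.0, 2000)),
}


def geometric_mean(values):
    """Geometric mean via its logarithmic definition."""
    return float(np.exp(np.mean(np.log(values))))


# Reference route: normalize first, then take the tendency statistics.
ff = {k: (a - a.mean()) / a.std() for k, a in fields.items()}
sigma_tend_ff = {k: np.diff(v).std() for k, v in ff.items()}
eta_ref = geometric_mean(list(sigma_tend_ff.values()))
res_ref = {k: ff[k] / (sigma_tend_ff[k] / eta_ref) for k in fields}

# Reworked route: only statistics of the unnormalized fields are used.
ratio = {k: np.diff(a).std() / a.std() for k, a in fields.items()}
eta = geometric_mean(list(ratio.values()))
res_new = {}
for k, a in fields.items():
    sigma_tend = np.diff(a).std()          # sigma(a')
    mul = eta / sigma_tend                 # multiplicative term
    add = -a.mean() * eta / sigma_tend     # additive term
    res_new[k] = a * mul + add

for k in fields:
    print(k, np.allclose(res_ref[k], res_new[k]))
```

Both routes should agree up to floating-point error, and the tendency of every residual-scaled variable ends up with the same standard deviation, $\eta$, which is the "equal contribution" property the scaling is meant to provide.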
In our implementation, the geometric mean $\eta\left(\frac{\sigma(a\prime)}{\sigma(a)}\right)$ is accumulated iteratively within the main for loop and applied afterwards as a factor to both $\mathbf{add}$ and $\mathbf{mul}$ (note that we use the logarithmic definition of the geometric mean). If the tendencies' stdev does not exist in the statistics dictionary (because the tendencies statistics were not computed when the dataset was created), the code falls back to using the stdev instead, in which case the formulae above reduce to a mean-std normalization.
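The procedure described above can be sketched roughly as follows. This is not the code from the proposed solution: the function name `residual_scaling` and the dictionary keys `"mean"`, `"stdev"` and `"tendencies_stdev"` are hypothetical and may not match the actual statistics layout.

```python
import numpy as np


def residual_scaling(statistics):
    """Return (add, mul) so that a_res = a * mul + add.

    `statistics` is assumed to hold per-variable arrays under the
    hypothetical keys "mean", "stdev" and, optionally, "tendencies_stdev".
    """
    mean = np.asarray(statistics["mean"], dtype=float)
    stdev = np.asarray(statistics["stdev"], dtype=float)

    # Fall back to the plain stdev when tendency statistics are missing.
    tend = statistics.get("tendencies_stdev")
    tend_stdev = stdev.copy() if tend is None else np.asarray(tend, dtype=float)

    add = np.empty_like(mean)
    mul = np.empty_like(mean)
    log_sum = 0.0
    for i in range(len(mean)):
        mul[i] = 1.0 / tend_stdev[i]
        add[i] = -mean[i] / tend_stdev[i]
        # Accumulate log(sigma(a') / sigma(a)) for the geometric mean.
        log_sum += np.log(tend_stdev[i] / stdev[i])

    # Logarithmic definition of the geometric mean, applied to both terms.
    eta = np.exp(log_sum / len(mean))
    return add * eta, mul * eta


stats = {
    "mean": [280.0, 101325.0],
    "stdev": [10.0, 1000.0],
    "tendencies_stdev": [1.0, 50.0],
}
add, mul = residual_scaling(stats)
print(add, mul)
```

When `"tendencies_stdev"` is absent, every ratio is 1, the geometric mean is 1, and the result is the usual mean-std normalization (`mul = 1/stdev`, `add = -mean/stdev`).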
Additionally, we added the tendencies' stdev (and its ratio to the stdev) in the inspect command in anemoi-datasets.
Describe alternatives you've considered
No response
Additional context
No response
Organisation
Predictia Intelligent Data Solutions - DestinationEarth
Thanks for the idea and clear description. We have recently included the option to use residual scaling (called tendency scaling) of the loss function (#52).
Instead of scaling the variables $x$ as you described, we scale the loss function, which should have the same effect: