various problems with STDDEV() when numbers are equals #2029

alaingdl · 2025-04-23T11:30:02Z

STDDEV(array) should return 0.0 when all input are equal.

This problem is strongly linked with previous issue #2028 since we do used internal MOMENT() in stddev.pro in src/pro
but we should be able to have more simple way to minimize the problem

GDL> stddev(replicate(2.1, 1000))
   1.4312272E-05
GDL> stddev(replicate(2.1, 10000))
   0.00021124039
GDL> stddev(replicate(2.1d, 1000))
   3.8655093638078647E-14
GDL> stddev(replicate(2.1d, 10000))
   1.1991008231352767E-13

All values here should be equal to zero here !
The trick may be to test whether Min() and Max() are equal then shortcut the computation ?

What gives IDL 8.8 ?

IDL> stddev(replicate(2.1, 1000))
       0.0000000
IDL> stddev(replicate(2.1, 10000))
            -NaN
% Program caused arithmetic error: Floating illegal operand
IDL> stddev(replicate(2.1d, 1000))
       0.0000000000000000
IDL> stddev(replicate(2.1d, 10000))
       0.0000000000000000

Look like and interesting bug in IDL ?!

The text was updated successfully, but these errors were encountered:

GillesDuvert · 2025-04-24T07:55:18Z

@alaingdl I mention this for newbies, as you know this pretty well, the stddev results in gdl are 'correct' a they show the numeric accuracy of a series of 10000 (resp. 10000) sums and divide of floating point values.
This acccuracy is better with doubles

GDL> stddev(replicate(2.1d, 1000))
   3.8655093638078647E-14

To reproduce (and this is true for all other GDL commands involving floating-point adds/mults etc) exactly the IDL results, one would have not only to use the same machine-compied, optimised, algorithm (same order of adds, divide, etc at assembly level) but also in many cases the same amount of operations in each one of the parallel threads when OMP_NUM_THREADS is > 1. In other words: impossible.

However, you should be fair with GDL, you have pinpointed a case where IDL and GDL differ, but

IDL> stddev(replicate(2.1, 99766)) 
   5.2423966e-06

and not 0 !

So you are right, to get 0 in some cases IDL must have taken precautions for this special case.
I note that

DL> stddev(replicate(2.1, 9993))
   1.2930187e-06
IDL> stddev(replicate(2.1, 9994))
            -NaN

and

IDL> stddev(replicate(21., 9976688))
     0.071825057
IDL> stddev(replicate(21., 9976689))
            -NaN

So this is not a "all values are equal" test thing. The returned "NaN" (a bug indeed) would be a signature of the (apparently not completely successful!) attempt to a posteriori do something if the returned stddev is smaller than the expected accuracy (depending on number of atomic operations, single or double).

A 'simple' solution would be to do all computations in extended precision (we have no 128 bits double doubles, so this would work only for floats:)
z=stddev(replicate(2.1, 99766)) replace by

 zd=stddev(replicate(2.1d, 99766))
FLOAT_EPS=1.1920929e-07 ; see machar()
if zd le FLOAT_EPS then z=0 else z=zd
; easy but slower of course ^))

GillesDuvert · 2025-04-24T07:55:52Z

@alaingdl BTW thanks for your self-assignments!

alaingdl · 2025-04-24T09:49:52Z

Thanks for the message !

Yes I am quite aware of numerical accuracy ! (it is a pity, Coyote web site seems to be off, we have here a "the sky is falling" on that)

Yesterday I spend some time looking at the code for MOMENT() which is really impressive !

Without consideration of time, we would have a simple way to "solve" the problem, testing if the max of the array is equal to the min ! But it is quite expensive, needing to recourse the array looking for max & min. Maybe we can do that only when
the result is really close to EPS ? Or close to a given level ? In order to redo it if and only if we are in such a special case.

alaingdl · 2025-04-24T10:18:07Z

hum hum hum, IDL just introduced RUNNING_STATS

I also look at https://www.nv5geospatialsoftware.com/docs/accuracy_and_floating_po.html
and yes, for 2.1, we are in a case where 2.1 and 2.1d are different ...
Nevertheless we have still the problems when all inputs are equal, and the values for Skewness and Kurtosis to be at NaN

GillesDuvert · 2025-04-24T13:08:56Z

FYI I've done (as many others I suppose) a local working copy of coyote's site.

GillesDuvert · 2025-04-24T13:35:35Z

Here may not be the good place to discuss that, but I do not think GDL should reproduce IDL's behaviour at all costs:

first, we use, when uncopyrighted code source is available, the more efficient code possible, that is always more recent than the one used by IDL (at least more recent than Numerical Recipes, that specialists claim to be completely obsolete). For example, the triangulation, the fft, the median filtering, the parallel mersenne twister... different code, different resulting precision.
second, we should use only one C++ code, the best one, for all "related" IDL commands. Example: Should RUNNING_STATS be implemented, it should share the same related code as MOMENT, MEAN, SIGMA, etc. This to get the best results, and to simplify the maintainance of our code (OK, we are not yet there). I note that the need for more statistically robust results is already covered by several procedures in IDLAstro (robust_sigma.pro etc). Hopefully people realyy keen about the accuracy would already have used them.
(Besides, there are known means to preserve accuracy when computing basic sums, divide etc: just that it takes forever if you use them.)
third, IDL has undocumented features that we must support to make 'old' code run, this point is opposite to the previous 2.
And, the default value of !version.release returned by GDL, that modifies the behaviour of 'legacy' codes, will always be a problem.

It would be possible to add a global switch that would put GDL in an 'IDL enhanced compatibility' at the expense of a much slower GDL and a complication of the sources and their maintainance.

alaingdl self-assigned this Apr 23, 2025

alaingdl added the numeric inaccuracy label Apr 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

various problems with STDDEV() when numbers are equals #2029

various problems with STDDEV() when numbers are equals #2029

alaingdl commented Apr 23, 2025

GillesDuvert commented Apr 24, 2025

GillesDuvert commented Apr 24, 2025

alaingdl commented Apr 24, 2025

alaingdl commented Apr 24, 2025

GillesDuvert commented Apr 24, 2025

GillesDuvert commented Apr 24, 2025

various problems with STDDEV() when numbers are equals #2029

various problems with STDDEV() when numbers are equals #2029

Comments

alaingdl commented Apr 23, 2025

GillesDuvert commented Apr 24, 2025

GillesDuvert commented Apr 24, 2025

alaingdl commented Apr 24, 2025

alaingdl commented Apr 24, 2025

GillesDuvert commented Apr 24, 2025

GillesDuvert commented Apr 24, 2025