GH-100425: Improve accuracy of builtin sum() for float inputs #100426

Merged: 13 commits, Dec 23, 2022
4 changes: 4 additions & 0 deletions Doc/library/functions.rst
@@ -1733,6 +1733,10 @@ are always available. They are listed here in alphabetical order.
.. versionchanged:: 3.8
The *start* parameter can be specified as a keyword argument.

.. versionchanged:: 3.12 Summation of floats switched to an algorithm
that gives higher accuracy on most builds.


.. class:: super()
super(type, object_or_type=None)

7 changes: 1 addition & 6 deletions Doc/library/math.rst
@@ -108,12 +108,7 @@ Number-theoretic and representation functions
.. function:: fsum(iterable)

Return an accurate floating point sum of values in the iterable. Avoids
loss of precision by tracking multiple intermediate partial sums:

>>> sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1])
0.9999999999999999
>>> fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1])
1.0
loss of precision by tracking multiple intermediate partial sums.

The algorithm's accuracy depends on IEEE-754 arithmetic guarantees and the
typical case where the rounding mode is half-even. On some non-Windows
2 changes: 1 addition & 1 deletion Doc/tutorial/floatingpoint.rst
@@ -192,7 +192,7 @@ added onto a running total. That can make a difference in overall accuracy
so that the errors do not accumulate to the point where they affect the
final total:

>>> sum([0.1] * 10) == 1.0
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 == 1.0
False
>>> math.fsum([0.1] * 10) == 1.0
True
14 changes: 14 additions & 0 deletions Lib/test/test_builtin.py
@@ -31,13 +31,20 @@
from test.support.os_helper import (EnvironmentVarGuard, TESTFN, unlink)
from test.support.script_helper import assert_python_ok
from test.support.warnings_helper import check_warnings
from test.support import requires_IEEE_754
from unittest.mock import MagicMock, patch
try:
    import pty, signal
except ImportError:
    pty = signal = None


# Detect evidence of double-rounding: sum() does not always
# get improved accuracy on machines that suffer from double rounding.
x, y = 1e16, 2.9999 # use temporary values to defeat peephole optimizer
HAVE_DOUBLE_ROUNDING = (x + y == 1e16 + 4)


class Squares:

    def __init__(self, max):
@@ -1641,6 +1648,13 @@ def __getitem__(self, index):
        sum(([x] for x in range(10)), empty)
        self.assertEqual(empty, [])

    @requires_IEEE_754
    @unittest.skipIf(HAVE_DOUBLE_ROUNDING,
                     "sum accuracy not guaranteed on machines with double rounding")
    def test_sum_accuracy(self):
        self.assertEqual(sum([0.1] * 10), 1.0)
        self.assertEqual(sum([1.0, 10E100, 1.0, -10E100]), 2.0)

    def test_type(self):
        self.assertEqual(type(''), type('123'))
        self.assertNotEqual(type(''), type(()))
@@ -0,0 +1 @@
Improve the accuracy of ``sum()`` with compensated summation.
16 changes: 15 additions & 1 deletion Python/bltinmodule.c
@@ -2532,17 +2532,31 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)

    if (PyFloat_CheckExact(result)) {
        double f_result = PyFloat_AS_DOUBLE(result);
        double c = 0.0;
        double x, t;
        Py_SETREF(result, NULL);
        while(result == NULL) {
            item = PyIter_Next(iter);
            if (item == NULL) {
                Py_DECREF(iter);
                if (PyErr_Occurred())
                    return NULL;
                if (c) {
Member:
Why check for c being zero? Seems that unconditionally adding 0.0 (if it is 0) to f_result would be cheap and harmless.

Member:

Why check for c being zero?

It does avoid losing the sign on a negative zero result:

>>> sum([-0.0, -0.0], start=-0.0)
-0.0

Admittedly that's a fairly niche corner case; I could live with the behaviour changing.

Thinking about corner cases, the compensation is also going to interfere with infinities and overflow. On this branch:

>>> sum([float("inf"), float("inf")])
nan
>>> sum([1e308, 1e308])
nan

But on main we get infinities as expected:

>>> sum([float("inf"), float("inf")])
inf
>>> sum([1e308, 1e308])
inf

We should try to keep the existing behaviour here. I think it should just be a matter of not adding the correction if it's an infinity or nan.

Contributor Author:

In commit 88bdb92 in 2019, Serhiy added these tests:

    self.assertEqual(repr(sum([-0.0])), '0.0')
    self.assertEqual(repr(sum([-0.0], -0.0)), '-0.0')
    self.assertEqual(repr(sum([], -0.0)), '-0.0')

It looks like he thinks there should be an invariant that sign of the start value should be preserved when iterable sum is equal to zero or empty. This seems dubious to me, but I rolled with it and added the "if (c)" test to preserve the invariant.

Member @mdickinson (Dec 22, 2022):

I don't think it's dubious: it's consistent with IEEE 754 rules that (under the default ties-to-even rounding mode) a sum of negative zeros is negative zero. It only affects the case where both the sum and the start value are negative zero; if the sum is positive zero it has no effect:

>>> sum([0.0], start=-0.0)  # same as (-0.0) + 0.0
0.0
>>> sum([-0.0], start=-0.0)  # should be the same as (-0.0) + (-0.0)
-0.0
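The sign-of-zero point can be checked directly: under IEEE-754 round-to-nearest, adding a positive zero to a negative zero yields positive zero, which is why an unconditional `f_result += c` would break the invariant when `c` is `0.0`:

```python
# Why the C code guards with `if (c)` before adding the correction:
# adding a positive-zero correction to a negative-zero total flips the sign.
f_result = -0.0 + -0.0   # a sum of negative zeros is -0.0
c = 0.0                  # no rounding error was accumulated
print(repr(f_result))        # '-0.0'
print(repr(f_result + c))    # '0.0': the unconditional add loses the sign
```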

Member:

there should be an invariant that the sign of the start value should be preserved

That's fine - but surely something so excruciatingly obscure deserves a comment 😜 .

Contributor Author:

We should try to keep the existing behaviour here. I think it should just be a matter of not adding the correction if it's an infinity or nan.

Done.

That's fine - but surely something so excruciatingly obscure deserves a comment

Done.

                    f_result += c;
                }
                return PyFloat_FromDouble(f_result);
            }
            if (PyFloat_CheckExact(item)) {
                f_result += PyFloat_AS_DOUBLE(item);
                // Improved Kahan–Babuška algorithm by Arnold Neumaier
                // https://www.mat.univie.ac.at/~neum/scan/01.pdf
                x = PyFloat_AS_DOUBLE(item);
                t = f_result + x;
                if (fabs(f_result) >= fabs(x)) {
                    c += (f_result - t) + x;
                } else {
                    c += (x - t) + f_result;
                }
                f_result = t;
                _Py_DECREF_SPECIALIZED(item, _PyFloat_ExactDealloc);
                continue;
            }
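The accuracy gain asserted by `test_sum_accuracy` can be seen by mirroring the new loop in pure Python and comparing it with a naive running total (`neumaier_sum` and `naive_sum` are hypothetical helper names for illustration; the real implementation is the C loop above):

```python
def neumaier_sum(values):
    # Sketch of the Neumaier compensated loop added in this PR:
    # c accumulates the exact rounding error of each addition.
    f_result, c = 0.0, 0.0
    for x in values:
        t = f_result + x
        if abs(f_result) >= abs(x):
            c += (f_result - t) + x
        else:
            c += (x - t) + f_result
        f_result = t
    return f_result + c if c else f_result

def naive_sum(values):
    # The old behaviour: a bare running total.
    total = 0.0
    for x in values:
        total += x
    return total

print(naive_sum([0.1] * 10))      # 0.9999999999999999
print(neumaier_sum([0.1] * 10))   # 1.0
print(naive_sum([1.0, 1e100, 1.0, -1e100]))     # 0.0 (the 1.0s are lost)
print(neumaier_sum([1.0, 1e100, 1.0, -1e100]))  # 2.0
```

The `if c` on the return line mirrors the `if (c)` guard discussed in the review thread, preserving the sign of a negative-zero total.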