gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance #132109

Open · wants to merge 1 commit into main
Conversation

@cmaloney (Contributor) commented Apr 5, 2025

Speed up conversion from bytes-like objects such as `bytearray` while keeping conversion from `bytes` performance-neutral.

On a `--with-lto --enable-optimizations` build on my 64-bit Linux box:

new:
from_bytes_flags: Mean +- std dev: 28.6 ns +- 0.5 ns
bench_convert[bytes]: Mean +- std dev: 50.4 ns +- 1.4 ns
bench_convert[bytearray]: Mean +- std dev: 51.3 ns +- 0.7 ns

old:
from_bytes_flags: Mean +- std dev: 28.1 ns +- 1.1 ns
bench_convert[bytes]: Mean +- std dev: 50.3 ns +- 4.3 ns
bench_convert[bytearray]: Mean +- std dev: 64.7 ns +- 0.9 ns

Benchmark code:
```python
import pyperf
import time

def from_bytes_flags(loops):
    # Exercise the keyword/flag combinations on small, fixed inputs.
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]

sample_bytearray = [bytearray(v) for v in sample_bytes]

def bench_convert(loops, values):
    # Convert each sample value; run once with bytes and once with bytearray inputs.
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()

runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
```
@picnixz (Member) previously approved these changes Apr 5, 2025 and left a comment:

Can we have benchmarks for very large bytes objects? Maybe you could also say how much we're gaining in the NEWS entry that way.
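For illustration, a large-input case in the same pyperf style could look like the sketch below (the ~1 MiB size and the names are arbitrary choices for this discussion, not part of the PR):

```python
# Hypothetical large-input benchmark; sizes and names are illustrative only.
import pyperf
import time

big_bytes = bytes(range(256)) * 4096        # ~1 MiB payload
big_bytearray = bytearray(big_bytes)

def bench_convert_large(loops, values):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()
runner.bench_time_func('bench_convert_large[bytes]', bench_convert_large, [big_bytes], inner_loops=10)
runner.bench_time_func('bench_convert_large[bytearray]', bench_convert_large, [big_bytearray], inner_loops=10)
```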

@picnixz changed the title from "gh-132108: Add Buffer Protocol support to int.from_bytes" to "gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance" on Apr 5, 2025
@picnixz (Member) commented Apr 5, 2025

Small question, but how do we cope with classes that explicitly define __bytes__() and also support the buffer protocol, like custom bytes objects? (This is an edge case, but it can still be a breaking change.)

Note that PyObject_Bytes first calls __bytes__, then calls PyBytes_FromObject only if there is no __bytes__; buffer-like objects are only considered at that point, not before. So __bytes__ has higher priority than the buffer interface.

Instead, we should restrict ourselves to exact buffer objects, namely exact bytes and bytearray objects.
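To make the edge case concrete, here is a minimal sketch (a hypothetical class written for this discussion, not taken from the PR or the test suite) of an object whose __bytes__() disagrees with its buffer export:

```python
# Hypothetical class whose __bytes__() result differs from its buffer contents.
class WeirdBytes(bytearray):
    def __bytes__(self):
        return b'\x01'          # deliberately disagrees with the raw storage

w = WeirdBytes(b'\xff\xff')

print(bytes(w))                 # b'\x01'     -- PyObject_Bytes prefers __bytes__
print(bytes(memoryview(w)))     # b'\xff\xff' -- a buffer consumer sees the raw storage

# Before this change, int.from_bytes(w) went through __bytes__() and returned 1;
# with a buffer-protocol fast path it would read the raw buffer and return 65535.
```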

@picnixz dismissed their stale review April 5, 2025 10:04

I want to check that the edge cases are not an issue.

@cmaloney (Contributor, Author) commented Apr 5, 2025

Cases where classes implement __bytes__() and return both valid (e.g. bytes) and invalid (e.g. str) values are tested in test_long.test_from_bytes, so I don't think any critical behavior changes there.

As you point out, if code returns a different set of machine bytes when exporting the buffer protocol vs. __bytes__(), this will change behavior: __bytes__() will not be run; just the buffer export will be used. The same issue already comes up between PyObject_Bytes and PyBytes_FromObject, as PyObject_Bytes checks __bytes__() first while PyBytes_FromObject tries the buffer protocol first and never checks __bytes__(). The code here uses PyObject_Bytes(). I don't think CPython strongly treats one or the other as "more correct".

We could match existing behavior by always checking for a __bytes__ member together with !PyBytes_CheckExact() (avoiding the __bytes__() call for exact bytes, since that changes performance and wasn't present before). To me that isn't as good an implementation: it is slower (more branches) and more complex, and I would rather encourage the buffer protocol for best performance.
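As a rough Python-level approximation (hypothetical, for discussion only), that alternative dispatch order would look something like this:

```python
# Hypothetical sketch of the "match existing behavior" alternative:
# prefer __bytes__() unless the object is exactly bytes, then convert.
def from_bytes_compat(obj, byteorder='big', *, signed=False):
    if type(obj) is not bytes and hasattr(type(obj), '__bytes__'):
        obj = type(obj).__bytes__(obj)      # extra branch and call on every conversion
    return int.from_bytes(obj, byteorder, signed=signed)
```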

We could restrict this to known CPython types (bytes, bytearray, array, memoryview), but that lowers the usefulness to me, since systems that implement both the buffer protocol and __bytes__ for efficiency couldn't use the newer and potentially more efficient buffer protocol here. It also requires more condition/type checks than PyObject_CheckBuffer.


Walking through common types passed to int.from_bytes() more explicitly:

  1. Exact bytes: the new code gets the data via a Py_buffer rather than incrementing the refcount of the bytes object (the PyBytes_CheckExact case). The perf test shows performance is stable for this.
  2. "Bytes-like" objects (subclasses of bytes, bytearray, memoryview, array): these used the buffer protocol to copy before and still use it now, with fewer calls/branches/checks on the way to exporting the buffer and without the copy of that buffer into a PyBytes. The perf test shows bytearray is faster, and the other cases likely are as well.
  3. list, tuple, iterables (other than str): PyObject_CheckBuffer fails for these, so the code calls PyObject_Bytes, which calls PyBytes_FromObject to handle them, same as before.
  4. str: doesn't export bytes, so it fails with a TypeError. test_long.test_from_bytes validates that behavior; it is unchanged.
  5. Objects that implement __bytes__() but don't support the buffer protocol: tested in test_long.test_from_bytes (ValidBytes, InvalidBytes, RaisingBytes). These behave as before: PyObject_CheckBuffer fails, so the code calls PyObject_Bytes (which uses __bytes__()), same as before. (A short sketch of cases 3-5 follows this list.)
  6. Objects that implement __bytes__() and support the buffer protocol: __bytes__() is no longer called; the buffer export is used to get the underlying machine bytes instead. If such an object broke the API contract by, for instance, returning a str from __bytes__(), the conversion will now succeed using its buffer instead of raising an exception.
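As mentioned in item 5, here is a short sketch of cases 3-5 (examples written for this discussion, not copied from test_long):

```python
# Illustrative examples for cases 3-5; this behavior is unchanged by the PR.
class OnlyBytes:
    def __bytes__(self):
        return b'\x02\x00'               # no buffer protocol, only __bytes__

print(int.from_bytes([255, 0, 0]))       # case 3: iterable of ints -> 16711680
try:
    int.from_bytes("abc")                # case 4: str is rejected
except TypeError as exc:
    print("TypeError:", exc)
print(int.from_bytes(OnlyBytes()))       # case 5: __bytes__-only object -> 512
```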
