Skip to content

gh-129005: Align FileIO.readall between _pyio and _io #129705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 7, 2025

Conversation

cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Feb 5, 2025

Utilize bytearray.resize() and os.readinto() to reduce copies and match behavior of _io.FileIO.readall().

There is still an extra copy which means twice the memory required compared to FileIO because there isn't a zero-copy path from bytearray -> bytes currently.

On my system reading a 2GB file
./python -m test -M8g -uall test_largefile -m test.test_largefile.PyLargeFileTest.test_large_read -v

Goes from ~2.7 seconds -> ~2.2 seconds. The C _io implementation is ~1.2 seconds, so still some performance gap, but less.

Utilize `bytearray.resize()` and `os.readinto()` to reduce copies
and match behavior of `_io.FileIO.readall()`.

There is still an extra copy which means twice the memory required
compared to FileIO because there isn't a zero-copy  path from
`bytearray` -> `bytes` currently.

On my system reading a 2GB file
`./python -m test -M8g -uall test_largefile -m test.test_largefile.PyLargeFileTest.test_large_read -v`

Goes from ~2.7 seconds -> ~2.2 seconds
@cmaloney cmaloney changed the title gh-12005: Align FileIO.readall between _pyio and _io gh-129005: Align FileIO.readall between _pyio and _io Feb 5, 2025
result += chunk

bytes_read += n
result.resize(bytes_read)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result = memoryview(result)[bytes_read:] would avoid a truncation which can imply a memory copy in the worst case, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the resize "shrink" in bytearray doesn't actually resize unless the buffer's "capacity" is 2x the requested size (https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L201-L214). Just updates its internal "this is how long the bytes is" counter (which for things like full-file readall with known size, this should already be just one byte over the right size).

My plan currently is to make it so bytes(bytearray(10)) and bytearray(b'\0' * 10) both don't copy (Ongoing discussion in https://discuss.python.org/t/add-zero-copy-conversion-of-bytearray-to-bytes-by-providing-bytes/79164). Having a memoryview would mean there's more than one reference to the bytearray, and I couldn't do / use that optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm fine with using result.resize() here.

Co-authored-by: Victor Stinner <[email protected]>
@cmaloney
Copy link
Contributor Author

cmaloney commented Feb 6, 2025

Hypothesis test failure in binascii / pretty sure unrelated

Lib/_pyio.py Outdated
bufsize += max(bufsize, DEFAULT_BUFFER_SIZE)
n = bufsize - len(result)
if bytes_read >= bufsize:
# Parallels _io/fileio.c new_buffersize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the C code, new_buffersize() argument is bytes_read, not bufsize. You may keep new_buffersize() as a private module-level function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the loop to no longer use bufsize at all, this is the only line that used it, and it feels more Pythonic to me to just use len(result).

That enables rewriting to:

        try:
            # Read until EOF (n == 0)
            while n := os.readinto(self._fd, memoryview(result)[bytes_read:]):
                bytes_read += n
                if bytes_read >= len(result):
                    result.resize(_new_buffersize(bytes_read))
        except BlockingIOError:
            if not bytes_read:
                return None
        assert len(result) - bytes_read >= 1, \
            "os.readinto buffer size 0 will result in erroneous EOF / returns 0"
        result.resize(bytes_read)
        return bytes(result)

which feels cleaner, but also starts changing structure relative to _io version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decided to refactor to this. Control flow feels a lot simpler to me and a lot more readable than the branches and breaks.

@cmaloney
Copy link
Contributor Author

cmaloney commented Feb 6, 2025

Tests / Windows / build and test (Win32) failure is a urllib.error.HTTPError : HTTP error 504: Gateway Timeout [D:\a\cpython\cpython\PCbuild\pythoncore.vcxproj], believe unrelated

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner merged commit a3d5aab into python:main Feb 7, 2025
43 checks passed
@vstinner
Copy link
Member

vstinner commented Feb 7, 2025

Merged, thank you.

@cmaloney cmaloney deleted the fileio_readall branch February 7, 2025 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants