-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-129005: Align FileIO.readall between _pyio and _io #129705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
50389df
4520ecc
6c3ac57
0e15dee
b50fb66
4da4e31
09b52ed
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1674,22 +1674,31 @@ def readall(self): | |
except OSError: | ||
pass | ||
|
||
result = bytearray() | ||
result = bytearray(bufsize) | ||
bytes_read = 0 | ||
while True: | ||
if len(result) >= bufsize: | ||
bufsize = len(result) | ||
bufsize += max(bufsize, DEFAULT_BUFFER_SIZE) | ||
n = bufsize - len(result) | ||
if bytes_read >= bufsize: | ||
# Parallels _io/fileio.c new_buffersize | ||
if bufsize > 65536: | ||
addend = bufsize >> 3 | ||
else: | ||
addend = 256 + bufsize | ||
if addend < DEFAULT_BUFFER_SIZE: | ||
addend = DEFAULT_BUFFER_SIZE | ||
bufsize += addend | ||
result.resize(bufsize) | ||
|
||
assert len(result) - bytes_read >= 1, "Must read at least one byte" | ||
try: | ||
chunk = os.read(self._fd, n) | ||
n = os.readinto(self._fd, memoryview(result)[bytes_read:]) | ||
except BlockingIOError: | ||
if result: | ||
if bytes_read: | ||
break | ||
return None | ||
if not chunk: # reached the end of the file | ||
if n == 0: # Reached the end of the file | ||
break | ||
result += chunk | ||
|
||
bytes_read += n | ||
result.resize(bytes_read) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the resize "shrink" in bytearray doesn't actually resize unless the buffer's "capacity" is 2x the requested size (https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L201-L214). Just updates its internal "this is how long the bytes is" counter (which for things like full-file readall with known size, this should already be just one byte over the right size). My plan currently is to make it so There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I'm fine with using |
||
return bytes(result) | ||
|
||
def readinto(self, buffer): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
``_pyio.FileIO.readall()`` now allocates, resizes, and fills a data buffer | ||
using the same algorithm ``_io.FileIO.readall()`` uses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the C code, new_buffersize() argument is bytes_read, not bufsize. You may keep new_buffersize() as a private module-level function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the loop to no longer use
bufsize
at all, this is the only line that used it, and it feels more Pythonic to me to just uselen(result)
.That enables rewriting to:
which feels cleaner, but also starts changing structure relative to
_io
version.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decided to refactor to this. Control flow feels a lot simpler to me and a lot more readable than the branches and breaks.