-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-89727: Fix os.walk RecursionError on deep trees #99803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
99f4691
3552d49
3078ea6
f37bbe8
f26a5b8
955d21b
8fbfb2e
ef46eda
e261b9f
5ec53f7
507b650
1c35610
73138a6
2814cc5
37f3cc7
d9c766c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -343,86 +343,96 @@ def walk(top, topdown=True, onerror=None, followlinks=False): | |
return _walk(fspath(top), topdown, onerror, followlinks) | ||
|
||
def _walk(top, topdown, onerror, followlinks): | ||
dirs = [] | ||
nondirs = [] | ||
walk_dirs = [] | ||
|
||
# We may not have read permission for top, in which case we can't | ||
# get a list of the files the directory contains. os.walk | ||
# always suppressed the exception then, rather than blow up for a | ||
# minor reason when (say) a thousand readable directories are still | ||
# left to visit. That logic is copied here. | ||
try: | ||
# Note that scandir is global in this module due | ||
# to earlier import-*. | ||
scandir_it = scandir(top) | ||
except OSError as error: | ||
if onerror is not None: | ||
onerror(error) | ||
return | ||
stack = [(False, top)] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will it make more sense to use a deque here? They are optimized for appending and popping while having the same interface. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think when append and pop occur only at the end (as in this case), performance of list and deque is basically the same. When append/pop can occur at the start, then you need deque ("double-ended queue") to avoid O(n) copy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In theory I think a deque is a slightly better fit, since we're only popping and appending (at a certain point There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, list has to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, good points. I just did some rough benchmarking as well and found no real difference. Also, I didn't find the exact cause, but adding I'll switch back to list |
||
while stack: | ||
is_result, top = stack.pop() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't like the In case we will not find a better name, I believe we should add a comment describing what this variable signifies. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I wasn't sure what to call it. Maybe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something like that would make a lot more sense, yeah. I used |
||
if is_result: | ||
yield top | ||
continue | ||
|
||
with scandir_it: | ||
while True: | ||
try: | ||
dirs = [] | ||
nondirs = [] | ||
walk_dirs = [] | ||
|
||
# We may not have read permission for top, in which case we can't | ||
# get a list of the files the directory contains. os.walk | ||
# always suppressed the exception then, rather than blow up for a | ||
# minor reason when (say) a thousand readable directories are still | ||
# left to visit. That logic is copied here. | ||
jonburdo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
try: | ||
# Note that scandir is global in this module due | ||
# to earlier import-*. | ||
jonburdo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
scandir_it = scandir(top) | ||
except OSError as error: | ||
if onerror is not None: | ||
onerror(error) | ||
continue | ||
|
||
cont = False | ||
with scandir_it: | ||
while True: | ||
try: | ||
entry = next(scandir_it) | ||
except StopIteration: | ||
try: | ||
entry = next(scandir_it) | ||
except StopIteration: | ||
break | ||
except OSError as error: | ||
if onerror is not None: | ||
onerror(error) | ||
cont = True | ||
break | ||
except OSError as error: | ||
if onerror is not None: | ||
onerror(error) | ||
return | ||
|
||
try: | ||
is_dir = entry.is_dir() | ||
except OSError: | ||
# If is_dir() raises an OSError, consider that the entry is not | ||
# a directory, same behaviour than os.path.isdir(). | ||
is_dir = False | ||
|
||
if is_dir: | ||
dirs.append(entry.name) | ||
else: | ||
nondirs.append(entry.name) | ||
try: | ||
is_dir = entry.is_dir() | ||
except OSError: | ||
# If is_dir() raises an OSError, consider that the entry is not | ||
# a directory, same behaviour than os.path.isdir(). | ||
jonburdo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
is_dir = False | ||
|
||
if not topdown and is_dir: | ||
# Bottom-up: recurse into sub-directory, but exclude symlinks to | ||
# directories if followlinks is False | ||
if followlinks: | ||
walk_into = True | ||
if is_dir: | ||
dirs.append(entry.name) | ||
else: | ||
try: | ||
is_symlink = entry.is_symlink() | ||
except OSError: | ||
# If is_symlink() raises an OSError, consider that the | ||
# entry is not a symbolic link, same behaviour than | ||
# os.path.islink(). | ||
is_symlink = False | ||
walk_into = not is_symlink | ||
|
||
if walk_into: | ||
walk_dirs.append(entry.path) | ||
|
||
# Yield before recursion if going top down | ||
if topdown: | ||
yield top, dirs, nondirs | ||
|
||
# Recurse into sub-directories | ||
islink, join = path.islink, path.join | ||
for dirname in dirs: | ||
new_path = join(top, dirname) | ||
# Issue #23605: os.path.islink() is used instead of caching | ||
# entry.is_symlink() result during the loop on os.scandir() because | ||
# the caller can replace the directory entry during the "yield" | ||
# above. | ||
if followlinks or not islink(new_path): | ||
yield from _walk(new_path, topdown, onerror, followlinks) | ||
else: | ||
# Recurse into sub-directories | ||
for new_path in walk_dirs: | ||
yield from _walk(new_path, topdown, onerror, followlinks) | ||
# Yield after recursion if going bottom up | ||
yield top, dirs, nondirs | ||
nondirs.append(entry.name) | ||
|
||
if not topdown and is_dir: | ||
# Bottom-up: traverse into sub-directory, but exclude symlinks to | ||
# directories if followlinks is False | ||
if followlinks: | ||
walk_into = True | ||
else: | ||
try: | ||
is_symlink = entry.is_symlink() | ||
except OSError: | ||
# If is_symlink() raises an OSError, consider that the | ||
# entry is not a symbolic link, same behaviour than | ||
# os.path.islink(). | ||
is_symlink = False | ||
walk_into = not is_symlink | ||
|
||
if walk_into: | ||
walk_dirs.append(entry.path) | ||
if cont: | ||
continue | ||
|
||
# Yield before sub-directory traversal if going top down | ||
if topdown: | ||
yield top, dirs, nondirs | ||
# Traverse into sub-directories | ||
islink, join = path.islink, path.join | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're caching these functions in locals, might as well do it outside the main while loop to micro-optimize a bit more. |
||
for dirname in reversed(dirs): | ||
new_path = join(top, dirname) | ||
# Issue #23605: os.path.islink() is used instead of caching | ||
jonburdo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# entry.is_symlink() result during the loop on os.scandir() because | ||
# the caller can replace the directory entry during the "yield" | ||
# above. | ||
if followlinks or not islink(new_path): | ||
stack.append((False, new_path)) | ||
else: | ||
# Yield after sub-directory traversal if going bottom up | ||
stack.append((True, (top, dirs, nondirs))) | ||
# Traverse into sub-directories | ||
for new_path in reversed(walk_dirs): | ||
stack.append((False, new_path)) | ||
JelleZijlstra marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
__all__.append("walk") | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
Fix issue with :func:`os.walk` where a :exc:`RecursionError` would occur on | ||
deep directory structures by adjusting the implementation of | ||
:func:`os._walk` to be iterative instead of recursive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we remove the
_walk
function and simply put everything underwalk
because the second one was only necessary because of sys.audit not making sense for recursive calls.