subprocess.run gets in an infinite loop closing every possible file descriptor #127177

Closed
edre opened this issue Nov 22, 2024 · 4 comments
Labels
type-bug An unexpected behavior, bug, or error

edre commented Nov 22, 2024

Bug report

Bug description:

Under some weird circumstances, calling subprocess.run leaves the child process stuck in a loop that tries to close every possible file descriptor. From perusing the code, this is very likely some misfiring of _close_open_fds_safe. Paging @gpshead.

The weird circumstances:

  • In a chroot, so the /proc/ filesystem is inaccessible.
  • Running in docker. Maybe affects the RLIMIT_NOFILE check?
  • Using musl? I reproduced this on an alpine container but not a debian one.

Reproduction setup:

== Dockerfile ==
FROM alpine:3.20
RUN apk add python3 rust strace
# Just make some static binary.
RUN echo 'int main(){}' > main.c
RUN mkdir /sb && gcc -static -o /sb/main main.c 
COPY sandbox.py .
CMD ["strace", "-f", "python3", "sandbox.py"]

== sandbox.py ==
import os, subprocess
os.chdir("/sb")
os.chroot("/sb")
subprocess.run("/main")

$ docker build . --tag sandbox:test && docker run sandbox:test 2>&1 | less
...
[pid    10] open("/proc/self/fd", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid    10] prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
[pid    10] prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
[pid    10] close(3)                    = -1 EBADF (Bad file descriptor)
[pid    10] close(5)                    = -1 EBADF (Bad file descriptor)
[pid    10] close(6)                    = -1 EBADF (Bad file descriptor)
[pid    10] close(7)                    = -1 EBADF (Bad file descriptor)
[pid    10] close(8)                    = -1 EBADF (Bad file descriptor)
[pid    10] close(9)                    = -1 EBADF (Bad file descriptor)
[pid    10] close(10)                   = -1 EBADF (Bad file descriptor)
[pid    10] close(11)                   = -1 EBADF (Bad file descriptor)
[pid    10] close(12)                   = -1 EBADF (Bad file descriptor)
...

CPython versions tested on:

3.12

Operating systems tested on:

Linux

edre added the type-bug label Nov 22, 2024

gpshead (Member) commented Nov 23, 2024

It's basically working as "intended", but this is not a code path you want to be running in. If you're seeing this as a performance issue, that suggests sysconf(_SC_OPEN_MAX) was available and returned a huge number, impacting this line:

end_fd = Py_MIN(safe_get_max_fd(), INT_MAX);

Ideally configure your environment so the faster paths can be taken (such as having procfs). If there are alternate ways to do similar things to safely get a list of open fds or better understand the actual max fd that would work in this combination of "unique" environment bits... can you propose some logic and ways we could regression test it?
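
For readers following along, the fallback described in this comment can be sketched roughly as below. This is an illustrative Python rendering of the idea, not CPython's actual C implementation; the function name close_fds_brute_force and its parameters are hypothetical. The point is that the work is proportional to the upper bound, not to the number of fds actually open. With the rlim_cur of 1073741816 shown in the strace output above, that is on the order of a billion close() calls.

```python
import os


def close_fds_brute_force(start_fd, end_fd, keep=()):
    """Try to close every fd in [start_fd, end_fd), skipping those in keep.

    Hypothetical helper for illustration only. Returns the number of
    close() attempts made, which is what dominates the running time.
    """
    keep = set(keep)
    attempts = 0
    for fd in range(start_fd, end_fd):
        if fd in keep:
            continue
        attempts += 1
        try:
            os.close(fd)
        except OSError:
            # Typically EBADF: the fd was not open, which is the common case
            # and matches the stream of failed close() calls in the strace log.
            pass
    return attempts
```

Without a way to list open fds (such as /proc/self/fd), a loop like this has no choice but to walk the whole range.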

edre (Author) commented Nov 23, 2024

I had this code working ~2 years ago, and some change in Python, Docker, libc, or something else has since caused this.

What defines the result of sysconf(_SC_OPEN_MAX)? Is this set during python's compilation? Is this a flag in the libc?

edre (Author) commented Nov 23, 2024

I found that I can just query this value with os.sysconf. My alpine system reports 1024, which would probably be fine, but the alpine environment in docker reports 1048576, which takes a long time to close through.
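
A minimal sketch of that check, for anyone who wants to compare environments. The helper name describe_fd_limits is hypothetical; os.sysconf and resource.getrlimit are standard library calls on Unix. Reporting the RLIMIT_NOFILE values next to SC_OPEN_MAX is an addition here, on the assumption that the two are related on the libc in use.

```python
import os
import resource


def describe_fd_limits():
    """Return the fd limits visible to this process (Unix only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        open_max = os.sysconf("SC_OPEN_MAX")
    except (ValueError, OSError):
        # The name may be unknown or unsupported on some platforms.
        open_max = None
    return {
        "sysconf_open_max": open_max,
        "rlimit_soft": soft,
        "rlimit_hard": hard,
    }
```

Running print(describe_fd_limits()) on the host and again inside the container should make a difference like the one described here easy to spot.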

edre (Author) commented Nov 23, 2024

Well, I couldn't figure out how to use sysctl in my environment, but it was easy enough to setrlimit(NOFILE) to a less ridiculous number. Thanks for the pointers! Closing this, as I don't think there's anything reasonable that Python could do differently here.
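
A sketch of what that workaround might look like, assuming the limit is lowered in the parent process before calling subprocess.run. The helper name run_with_capped_nofile and the default cap of 1024 are hypothetical choices for illustration, not taken from the original code. Note that this changes the soft limit of the calling process too.

```python
import resource
import subprocess


def run_with_capped_nofile(cmd, cap=1024):
    """Lower the soft RLIMIT_NOFILE to at most `cap`, then run `cmd`.

    Only ever lowers the soft limit; the hard limit is left untouched.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft > cap:
        resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
    # The child inherits the lowered limit, so a fallback that walks every
    # possible fd has a much smaller range to cover.
    return subprocess.run(cmd)
```

In the reproduction above, this would be called as run_with_capped_nofile("/main") after the chroot.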
