Kernel-dependent features like os.pidfd_open() #193

achimnol opened this issue Oct 8, 2023 · 11 comments
Labels
compatibility: Compatibility with CPython and the broader ecosystem
performance: Potential performance improvement

Comments

@achimnol

achimnol commented Oct 8, 2023

Recent Python versions add optional features such as os.pidfd_open() when the underlying system call is available at build time.
As of Python 3.12, asyncio's default child process watcher uses it, with a fallback to the legacy thread-based implementation.

It seems that the current Python 3.11 distribution in this repo does not have os.pidfd_open().
This is not a critical regression, because most libraries that depend on it have a good fallback, but I'd like to be able to adopt such new kernel features.
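As a rough sketch of the kind of fallback mentioned above (not taken from asyncio or any particular library), the code below waits for a child process with a pidfd when os.pidfd_open() exists and works, and otherwise falls back to a plain blocking os.waitpid(). The helper name wait_for_exit is hypothetical.

```python
import os
import select
import sys


def wait_for_exit(pid):
    """Block until child `pid` exits; return (mechanism, raw wait status)."""
    pidfd_open = getattr(os, "pidfd_open", None)  # absent if not compiled in
    if pidfd_open is not None:
        try:
            fd = pidfd_open(pid)
        except OSError:
            fd = None  # compiled in, but the running kernel or sandbox refuses it
        if fd is not None:
            try:
                # A pidfd becomes readable once the process has terminated.
                select.select([fd], [], [])
            finally:
                os.close(fd)
            _, status = os.waitpid(pid, 0)  # reap the child
            return "pidfd", status
    _, status = os.waitpid(pid, 0)  # portable fallback: plain blocking wait
    return "waitpid", status


if __name__ == "__main__":
    # Spawn a trivial child interpreter and wait for it.
    child = os.posix_spawn(sys.executable, [sys.executable, "-c", "pass"], dict(os.environ))
    print(wait_for_exit(child))
```

On a build without os.pidfd_open(), the first branch is skipped entirely, which is the situation described in this issue.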

I'm not sure how the current release process could handle this. Maybe we need multiple builds keyed by Linux kernel version, similar to the x86_64 v2/v3/... CPU generations. I'm afraid this would put too much burden on the build infra.

I'm just reporting this issue for future reference, though.

@indygreg
Collaborator

indygreg commented Oct 8, 2023

The way things currently work is that we build the CPython distributions against very old kernel headers and glibc to ensure maximum binary portability by targeting a very old, ~universally supported syscall + glibc API surface.

CPython currently uses compile time checks for features like pidfd_open() support. Literally a #if defined(__linux__) && defined(__NR_pidfd_open) in C source code. So if the feature isn't present at compile time, you don't get it at run time.

On macOS, CPython uses Mach-O weak symbols and run-time checks, which allow us to reference macOS SDK APIs that aren't available on all machines. If they are present at run-time, CPython can use them. If they aren't, the features depending on them aren't available. This is ideal from an end-user perspective because it doesn't penalize the common user running on modern macOS by depriving them of newer features.

On Linux, ELF also supports weak symbols/linking, conceptually similar to the Mach-O feature. Usually you add a compiler attribute, #pragma, or similar directive to mark a symbol as weakly linked. If the symbol resolves at run-time, the symbol/function address is non-0 and you can call it.

Unfortunately, I don't believe CPython has any support for weak symbols on Linux/ELF. So getting run-time conditional features isn't trivially achievable. (I'd have to page this in my brain but I want to say there are some practical limitations of weak symbols on ELF that may make their use non-viable. Even if there are, there are similar features like IFUNC that could potentially be employed for dynamic run-time dispatch support.)
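To illustrate the idea of a run-time symbol check from the Python side (this is an analogy, not how CPython is or would be implemented), the sketch below asks the dynamic loader whether the C library already loaded into the process exports a given function. The helper name libc_has_symbol is hypothetical; it relies on ctypes.CDLL(None), which on Linux and macOS opens the running program and its loaded dependencies.

```python
import ctypes


def libc_has_symbol(name):
    """Return True if the libraries loaded into this process export `name`."""
    try:
        # CDLL(None) behaves like dlopen(NULL): the main program plus its
        # already-loaded dependencies, which normally includes libc.
        handle = ctypes.CDLL(None)
    except (OSError, TypeError):
        return False  # platform without this lookup mode (e.g. Windows)
    # Attribute access performs a dlsym()-style lookup and raises
    # AttributeError when the symbol is missing, which hasattr() absorbs.
    return hasattr(handle, name)


if __name__ == "__main__":
    for symbol in ("getpid", "memfd_create", "pidfd_open"):
        print(symbol, libc_has_symbol(symbol))
```

A result of True only says the libc wrapper exists; the kernel may still reject the system call, so callers would still need to handle OSError.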

That's a long way of saying that features like pidfd_open() currently require separate build variants to work. That leads to an explosion of build variants targeting various Linux + glibc feature levels. That becomes unwieldy very fast.

It might be worth engaging upstream CPython about supporting weak symbols, IFUNC, or similar run-time dynamic feature detection on Linux/ELF like they do on macOS. This would allow pre-built CPython binaries to have better performance and features versus what is achievable today.

@gpshead is this worth a discussion in a CPython forum? If so, which one?

@achimnol
Author

achimnol commented Oct 8, 2023

Thanks for the detailed explanation!
I don't expect this issue to be fixed in the near future, but I'd appreciate it if we could start a discussion about adding weak symbol support on Linux.

@achimnol
Author

achimnol commented Oct 9, 2023

Just a note for others: The motivation of my question/request is that when distributing Python-based apps to non-controllable "enterprise" environments (e.g., air-gapped customer-owned clusters), we need to be able to enable/disable such kernel-specific features based on the runtime availability without rebuilding everything for each customer site.
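A minimal sketch of what such a run-time probe might look like at application start-up, assuming the application only needs to know which optional os functions its interpreter exposes. The function name feature_report and the KERNEL_DEPENDENT list are illustrative choices, not part of any existing tool.

```python
import os
import platform

# Optional os functions whose presence depends on the kernel headers and
# libc that the interpreter was built against (illustrative selection).
KERNEL_DEPENDENT = ("pidfd_open", "memfd_create", "copy_file_range", "eventfd", "splice")


def feature_report():
    """Summarize the host and which optional os functions this build exposes."""
    lib, version = platform.libc_ver()
    return {
        "kernel": platform.release(),
        "libc": f"{lib} {version}".strip() or "unknown",
        # hasattr() reflects what was compiled in, not what the kernel supports.
        "features": {name: hasattr(os, name) for name in KERNEL_DEPENDENT},
    }


if __name__ == "__main__":
    print(feature_report())
```

With the builds discussed here, a feature can show up as False even on a recent kernel, because the check was decided at compile time.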

@HackAttack

I have the same problem but with os.memfd_create, which requires glibc 2.27. I can work around it (in my Bazel build) by using the host python, but that sort of defeats the purpose.
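For code that can tolerate a file-backed substitute, one possible workaround is sketched below: use os.memfd_create() when the interpreter provides it, and otherwise fall back to an unlinked temporary file. The helper name anonymous_fd is hypothetical, and the fallback is not equivalent (it touches the filesystem and does not support sealing).

```python
import os
import tempfile


def anonymous_fd(name="scratch"):
    """Return a read/write file descriptor with no visible path."""
    create = getattr(os, "memfd_create", None)  # missing on older-glibc builds
    if create is not None:
        try:
            return create(name)
        except OSError:
            pass  # compiled in, but rejected by the running kernel or sandbox
    # Fallback: create a temporary file and unlink it immediately, so only
    # the open descriptor keeps the data alive.
    fd, path = tempfile.mkstemp(prefix=name)
    os.unlink(path)
    return fd


if __name__ == "__main__":
    fd = anonymous_fd()
    os.write(fd, b"hello")
    os.lseek(fd, 0, os.SEEK_SET)
    print(os.read(fd, 5))
    os.close(fd)
```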

@finn-ball

> The way things currently work is that we build the CPython distributions against very old kernel headers and glibc to ensure maximum binary portability by targeting a very old, ~universally supported syscall + glibc API surface.

Would a short-term solution for some users not be to simply swap the Docker base image for one with a newer glibc, as in the diff below, and build it themselves?

diff --git a/cpython-unix/base.Dockerfile b/cpython-unix/base.Dockerfile
index 76811a5..3844827 100644
--- a/cpython-unix/base.Dockerfile
+++ b/cpython-unix/base.Dockerfile
@@ -1,5 +1,5 @@
 # Debian Jessie.
-FROM debian@sha256:32ad5050caffb2c7e969dac873bce2c370015c2256ff984b70c1c08b3a2816a0
+FROM debian:bookworm
 MAINTAINER Gregory Szorc <[email protected]>
 
 RUN groupadd -g 1000 build && \
@@ -19,10 +19,10 @@ WORKDIR '/build'
 
 # Jessie's signing keys expired in late 2022. So need to add [trusted=yes] to force trust.
 # Jessie stopped publishing snapshots in March 2023.
-RUN for s in debian_jessie debian_jessie-updates debian-security_jessie/updates; do \
-      echo "deb [trusted=yes] http://snapshot.debian.org/archive/${s%_*}/20230322T152120Z/ ${s#*_} main"; \
-    done > /etc/apt/sources.list && \
-    ( echo 'quiet "true";'; \
+# RUN for s in debian_jessie debian_jessie-updates debian-security_jessie/updates; do \
+      # echo "deb [trusted=yes] http://snapshot.debian.org/archive/${s%_*}/20230322T152120Z/ ${s#*_} main"; \
+    #  done > /etc/apt/sources.list && \
+RUN ( echo 'quiet "true";'; \
       echo 'APT::Get::Assume-Yes "true";'; \
       echo 'APT::Install-Recommends "false";'; \
       echo 'Acquire::Check-Valid-Until "false";'; \
diff --git a/cpython-unix/build.cross.Dockerfile b/cpython-unix/build.cross.Dockerfile
index aa17d6c..5c28c7a 100644
--- a/cpython-unix/build.cross.Dockerfile
+++ b/cpython-unix/build.cross.Dockerfile
@@ -1,5 +1,5 @@
 # Debian Stretch.
-FROM debian@sha256:cebe6e1c30384958d471467e231f740e8f0fd92cbfd2a435a186e9bada3aee1c
+FROM debian:bookworm
 MAINTAINER Gregory Szorc <[email protected]>
 
 RUN groupadd -g 1000 build && \
@@ -20,10 +20,10 @@ WORKDIR '/build'
 # Stretch stopped publishing snapshots in April 2023. Last snapshot
 # is 20230423T032533Z. But there are package authentication issues
 # with this snapshot.
-RUN for s in debian_stretch debian_stretch-updates debian-security_stretch/updates; do \
-      echo "deb http://snapshot.debian.org/archive/${s%_*}/20221105T150728Z/ ${s#*_} main"; \
-    done > /etc/apt/sources.list && \
-    ( echo 'quiet "true";'; \
+# RUN for s in debian_stretch debian_stretch-updates debian-security_stretch/updates; do \
+#       echo "deb http://snapshot.debian.org/archive/${s%_*}/20221105T150728Z/ ${s#*_} main"; \
+#     done > /etc/apt/sources.list && \
+RUN ( echo 'quiet "true";'; \
       echo 'APT::Get::Assume-Yes "true";'; \
       echo 'APT::Install-Recommends "false";'; \
       echo 'Acquire::Check-Valid-Until "false";'; \
diff --git a/cpython-unix/rust.Dockerfile b/cpython-unix/rust.Dockerfile
index 82a0a92..15f0e97 100644
--- a/cpython-unix/rust.Dockerfile
+++ b/cpython-unix/rust.Dockerfile
@@ -4,5 +4,5 @@ RUN apt-get install \
     curl \
     libc6-dev \
     python2.7 \
-    python \
+    python3 \
     tar \
diff --git a/cpython-unix/xcb.Dockerfile b/cpython-unix/xcb.Dockerfile
index 0480eca..acdf77c 100644
--- a/cpython-unix/xcb.Dockerfile
+++ b/cpython-unix/xcb.Dockerfile
@@ -1,3 +1,3 @@
 {% include 'build.Dockerfile' %}
 RUN apt-get install \
-    python
+    python3
diff --git a/cpython-unix/xcb.cross.Dockerfile b/cpython-unix/xcb.cross.Dockerfile
index cc003ff..154a5ba 100644
--- a/cpython-unix/xcb.cross.Dockerfile
+++ b/cpython-unix/xcb.cross.Dockerfile
@@ -1,3 +1,3 @@
 {% include 'build.cross.Dockerfile' %}
 RUN apt-get install \
-    python
+    python3

@indygreg
Collaborator

Yes, if you want the builds to pick up newer features from newer glibc and Linux, the way to do that is to modernize those versions via the container base image.

It has been a few years and it may be worth making that change universally. It depends on what CPython upstream is doing with ABI compatibility in wheels. (We want our distros to be compatible with the oldest ABI seen in binary wheels in the wild.) I almost filed a standalone issue to track this a few hours ago...

@charliermarsh added the performance and compatibility labels Dec 18, 2024
@FFY00

FFY00 commented Dec 18, 2024

I looked into this and it doesn't seem overly complicated to move the compatibility checks to runtime, though I am not really convinced it's worth the trouble over just publishing a couple of different builds targeting the already defined manylinux and musllinux platform tags.
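For context on how such builds could be told apart, the sketch below derives a PEP 600 style manylinux tag from the glibc version that the running interpreter reports. Both function names are hypothetical, and real installers use more careful detection than platform.libc_ver(); treat this as an approximation only.

```python
import platform


def format_manylinux_tag(glibc_version, machine):
    """Build a PEP 600 tag such as manylinux_2_28_x86_64 from '2.28'."""
    major, minor = glibc_version.split(".")[:2]
    return f"manylinux_{int(major)}_{int(minor)}_{machine}"


def host_manylinux_tag():
    """Return the tag for this host, or None when glibc cannot be identified."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None  # musl, macOS, Windows, or detection failed
    try:
        return format_manylinux_tag(version, platform.machine())
    except ValueError:
        return None  # unexpected version string


if __name__ == "__main__":
    print(host_manylinux_tag())
```

A build variant per tag would let a launcher or installer pick the newest variant whose glibc requirement the host satisfies.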

The main issue with supporting this upstream is the complexity and maintenance burden, though it's definitely worth a discussion.

@indygreg
Collaborator

Upstream already uses weak linking on macOS so they can target an older SDK but leverage features from a newer one.

The Apple SDKs do make this easier than on Linux. (I recall there being caveats with weak linking and runtime availability checking in Linux but I can't recall specifics.) You should engage upstream about doing this on Linux!

@zanieb
Member

zanieb commented Dec 19, 2024

We chatted on Discord and it sounds like upstream is willing to accept patches for this on Linux. I intend to submit them when I have the chance!

We'll also probably add a build variant targeting one (or more) of the latest manylinux libc targets, since the added weak linking would only be available on 3.14+.

@achimnol
Author

Great to hear some progress here! 👍🏼

@heiner

heiner commented Mar 5, 2025

Commenting to help search engines: this is also the cause of the following error:

>>> os.memfd_create
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'os' has no attribute 'memfd_create'

For uv venv, this can be fixed by using system Python instead, e.g. uv venv --python 3.10 --python-preference=only-system.

8 participants