
Warning about the use of site-specific path when the site initialization has been disabled #126793

Closed
FFY00 opened this issue Nov 13, 2024 · 6 comments
Labels
stdlib Python modules in the Lib dir topic-sysconfig type-bug An unexpected behavior, bug, or error

Comments

@FFY00
Member

FFY00 commented Nov 13, 2024

Bug report

Bug description:

Even when site initialization is disabled (by passing -S), sysconfig.get_paths() still returns site-specific paths, as the current API has no way to express that such paths are unavailable. A lot of code does not take this use case into account and assumes that these paths are in use by the active environment.

$ python -S
Python 3.14.0a1+ experimental free-threading build (heads/main:de0d5c6e2e1, Oct 23 2024, 15:37:46) [GCC 14.2.1 20240910] on linux
>>> import sys
>>> sys.path
['', '/usr/local/lib/python314t.zip', '/usr/local/lib/python3.14t', '/usr/local/lib/python3.14t/lib-dynload']
>>> import sysconfig
>>> sysconfig.get_paths()
{'stdlib': '/usr/local/lib/python3.14t', 'platstdlib': '/usr/local/lib/python3.14t', 'purelib': '/usr/local/lib/python3.14t/site-packages', 'platlib': '/usr/local/lib/python3.14t/site-packages', 'include': '/usr/local/include/python3.14td', 'platinclude': '/usr/local/include/python3.14td', 'scripts': '/usr/local/bin', 'data': '/usr/local'}
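
A minimal sketch of how calling code could detect this mismatch today, using only `sys.path`, `sys.flags.no_site`, and `sysconfig.get_path()`. The helper name `inactive_site_paths` is hypothetical; sysconfig itself offers no such API, which is the point of this report.

```python
import os
import sys
import sysconfig


def inactive_site_paths():
    """Return the sysconfig site paths that are not present on sys.path.

    Hypothetical helper for illustration; not part of sysconfig.
    """
    # Normalize entries so that equivalent spellings compare equal.
    on_path = {os.path.normcase(os.path.abspath(p)) for p in sys.path if p}
    missing = {}
    for key in ("purelib", "platlib"):
        path = sysconfig.get_path(key)
        if os.path.normcase(os.path.abspath(path)) not in on_path:
            missing[key] = path
    return missing


if __name__ == "__main__":
    print("site disabled (-S):", bool(sys.flags.no_site))
    print("site paths not on sys.path:", inactive_site_paths())
```

Run with and without -S to compare the two results.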

An example of this is ensurepip, which will still proceed and install pip to the site directories, despite the site module being disabled.

$ python -S -m ensurepip
Looking in links: /tmp/tmpqcwxcy4k
Requirement already satisfied: pip in /home/anubis/.virtualenvs/test-sysconfig-paths/lib/python3.14t/site-packages (24.3.1)

IMO, it should refuse to install when the site module is disabled, as there are no valid paths to install pip to under the current environment.
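
A rough sketch of the kind of guard suggested here. This is not what ensurepip does; `SiteDisabledError` and `require_site` are hypothetical names, and the only real API used is `sys.flags.no_site`.

```python
import sys


class SiteDisabledError(RuntimeError):
    """Hypothetical error: no site directories are active."""


def require_site(no_site=None):
    """Raise if the interpreter was started with site initialization disabled.

    `no_site` defaults to sys.flags.no_site; the parameter exists only so the
    check can be exercised without restarting the interpreter.
    """
    if no_site is None:
        no_site = sys.flags.no_site
    if no_site:
        raise SiteDisabledError(
            "site initialization is disabled (-S); refusing to install"
        )
```

An installer following this idea would call `require_site()` before resolving any target directory.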

To make things worse, sysconfig.get_paths() is currently inconsistent and does not always return the same result (see #126789).

Newer APIs (#103481) should handle this scenario directly, for example by expressing the unavailability of purelib and platlib.

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

@FFY00 FFY00 added the type-bug An unexpected behavior, bug, or error label Nov 13, 2024
@zware
Member

zware commented Nov 13, 2024

I'm not sure I agree with the assertion that purelib and platlib should be considered unavailable when running under -S. All that should mean is that they're not actually added to sys.path automatically, not that they're totally off limits. In fact, there's nothing stopping one from running python3 -i -S -c 'import site; site.main()' as a weird way to start up a regular REPL with no headers :)
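
To illustrate the point above, here is a small sketch that starts a child interpreter with -S, calls `site.main()` by hand, and reports `sys.path` before and after. The helper name `run_with_manual_site` is made up for this example; the exact paths depend on the installation.

```python
import json
import subprocess
import sys

# Code executed in the child interpreter, which is started with -S.
CHILD = """
import json, sys
before = list(sys.path)
import site   # under -S, importing site does not run its initialization
site.main()   # explicit initialization, as in the comment above
print(json.dumps({"no_site": sys.flags.no_site,
                  "before": before, "after": list(sys.path)}))
"""


def run_with_manual_site():
    """Run CHILD under `python -S` and return the parsed report."""
    out = subprocess.run(
        [sys.executable, "-S", "-c", CHILD],
        check=True, capture_output=True, text=True,
    ).stdout
    # Use the last line in case a .pth file or sitecustomize printed output.
    return json.loads(out.strip().splitlines()[-1])


if __name__ == "__main__":
    report = run_with_manual_site()
    print("before:", report["before"])
    print("after: ", report["after"])
```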

@FFY00
Member Author

FFY00 commented Nov 13, 2024

I would agree with that if the -S flag was as simple as you describe it, but it isn't 😅

What makes things messy is that on top of the customizations you mentioned, the site module is also responsible for implementing virtual environments. Disabling the site initialization also disables the virtual environment activation mechanism.
If we were to think conceptually of an environment resulting from -S simply as a "locked down" version of another existing environment, then that existing environment would be the installation-wide environment (e.g. /usr/lib/python3.14t/site-packages), rather than the virtual environment (e.g. ~/.virtualenvs/test-sysconfig-paths/lib/python3.14t/site-packages).

In practice, this means that python -S -m ensurepip, python -S -m pip install ..., or other commands that customize a "fully-fledged" environment would be acting on the installation-wide environment, even when running these commands inside a virtual environment. IMO this is a major footgun and, to me, is an undesirable behavior left over from historic implementation decisions.
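
For reference, the usual way to tell the two environments apart is to compare `sys.prefix` with `sys.base_prefix`. The sketch below wraps that comparison; `active_environment` is a hypothetical name. As I read the comment above, on interpreters where virtual environment activation lives in the site module, running a venv's python with -S would leave both prefixes equal, so this check would report the installation-wide environment.

```python
import sys


def active_environment(prefix=None, base_prefix=None):
    """Classify the running interpreter's environment from its prefixes.

    The parameters default to sys.prefix and sys.base_prefix; they exist so
    the logic can be tried with arbitrary values.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    if prefix != base_prefix:
        return "virtual environment"
    return "installation-wide environment"


if __name__ == "__main__":
    print(active_environment(), "| -S given:", bool(sys.flags.no_site))
```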

I think we should think conceptually of -S more as an environment-less way of running Python, than a limited environment.
I am happy to hear other points of view though.

@picnixz picnixz added the stdlib Python modules in the Lib dir label Nov 14, 2024
@zooba
Member

zooba commented Nov 18, 2024

> I think we should think conceptually of -S more as an environment-less way of running Python, than a limited environment.

This sounds reasonable to me, however...

> Disabling the site initialization also disables the virtual environment activation mechanism.

Not entirely (a lot is handled in getpath, IIRC independent of the -S option), but if the above definition sticks, then it should disable it entirely. Effectively you should only get the stdlib (and any explicit additions), regardless of whether you launch through a venv or not. And so...

> this means that python -S -m ensurepip, python -S -m pip install ..., or other commands that customize a "fully-fledged" environment would be acting on the installation-wide environment, even when running these commands inside a virtual environment

Here's where it gets complicated. There's strictly nothing wrong with ensurepip or pip install always installing into the default "site", regardless of whether it's active or not, but that is going to mean something weird if we don't clearly define what launching a venv with -S means. Hopefully we can define that in a way that makes it consistent and doesn't actually change existing behaviour.

Perhaps this works:

  • launching a venv overrides the default site directory
  • site.main() adds the default site directory (and etc...) when called
  • -S skips calling site.main()
  • [ensure]pip installs to the default site directory regardless of whether it's active or not

I'm sure there are new edge cases introduced by a definition like that (if it works at all, I haven't thought it all the way through yet), but perhaps those are easier to fix/handle?
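
The four rules can be written down as a toy model to make their consequences easier to inspect. This is only a sketch of the proposed definition, not how CPython computes anything; `effective_paths` and its parameters are invented for illustration.

```python
def effective_paths(stdlib, base_site, venv_site=None, no_site=False):
    """Toy model of the four proposed rules.

    Returns (sys_path, install_target) for the given hypothetical inputs.
    """
    # Rule 1: launching a venv overrides the default site directory.
    default_site = venv_site if venv_site is not None else base_site
    sys_path = list(stdlib)
    # Rule 3: -S skips calling site.main() ...
    if not no_site:
        # Rule 2: ... which is what adds the default site directory.
        sys_path.append(default_site)
    # Rule 4: [ensure]pip targets the default site directory either way.
    install_target = default_site
    return sys_path, install_target


if __name__ == "__main__":
    print(effective_paths(["/usr/lib/python3.14"],
                          "/usr/lib/python3.14/site-packages",
                          venv_site="/venv/lib/python3.14/site-packages",
                          no_site=True))
```

Under this model, a venv launched with -S installs into the venv's site directory even though that directory is not on sys.path.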

@FFY00
Member Author

FFY00 commented Nov 18, 2024

> Not entirely (a lot is handled in getpath, IIRC independent of the -S option), but if the above definition sticks, then it should disable it entirely. Effectively you should only get the stdlib (and any explicit additions), regardless of whether you launch through a venv or not. And so...

(I realized I went a bit off track when writing the reply below, but I am keeping it since it's useful)

Right, I am aware that getpath has some code that covers virtual environments, but AFAIK nothing that changes any other aspects of the initialization, at least not in any meaningful way anymore. That may have been the case in the past, but the only code left now is for finding sys._base_executable in niche situations where it is unknown.

Here's my understanding of what happens:

  • getpath looks for a pyvenv.cfg (the indicator of a virtual environment) near the executable or, if unknown, the current directory
  • If base_executable from PyConfig is not set and a home key is specified in pyvenv.cfg, it searches for the interpreter executable there and sets base_executable to its path if found
  • If no process executable was provided, base_executable is used instead when looking for the ._pth file
  • sys sets sys._base_executable to the value in PyConfig
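
The first two steps can be approximated in a few lines of Python. This is a simplified sketch, not the getpath implementation: `find_pyvenv_cfg` and `read_home` are hypothetical names, and searching the executable's directory and its parent is my assumption about what "near the executable" means.

```python
import os
import tempfile


def find_pyvenv_cfg(executable):
    """Look for pyvenv.cfg next to the executable or one directory above."""
    exe_dir = os.path.dirname(os.path.abspath(executable))
    for candidate_dir in (exe_dir, os.path.dirname(exe_dir)):
        cfg = os.path.join(candidate_dir, "pyvenv.cfg")
        if os.path.isfile(cfg):
            return cfg
    return None


def read_home(cfg_path):
    """Return the value of the `home` key in pyvenv.cfg, or None if absent."""
    with open(cfg_path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip().lower() == "home":
                return value.strip()
    return None


if __name__ == "__main__":
    # Build a throwaway venv-like layout to exercise the lookup.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "bin"))
        with open(os.path.join(root, "pyvenv.cfg"), "w", encoding="utf-8") as f:
            f.write("home = /usr/bin\n")
        cfg = find_pyvenv_cfg(os.path.join(root, "bin", "python"))
        print(cfg, read_home(cfg))
```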

So, there may be a possibility for sys._base_executable to be set to an interpreter from a virtual environment, though this looks like a very niche case, and I am not sure if there are any relevant real-world use cases.
Even in that case, it only seems to affect ._pth file results under -E and -S, so I don't think it's relevant here.
Considering this,

If there's anything else that I am missing, let me know.

> Here's where it gets complicated. There's strictly nothing wrong with ensurepip or pip install always installing into the default "site", regardless of whether it's active or not, but that is going to mean something weird if we don't clearly define what launching a venv with -S means. Hopefully we can define that in a way that makes it consistent and doesn't actually change existing behaviour.
>
> Perhaps this works:
>
> * launching a venv overrides the default site directory
> * `site.main()` adds the default site directory (and etc...) when called
> * `-S` skips calling `site.main()`
> * `[ensure]pip` installs to the default site directory regardless of whether it's active or not
>
> I'm sure there are new edge cases introduced by a definition like that (if it works at all, I haven't thought it all the way through yet), but perhaps those are easier to fix/handle?

I think what we are running into here is the fact that virtual environments probably shouldn't be implemented by the site module. IMO, the base mechanism (prefix relocation via pyvenv.cfg) should be provided as a core feature, and site could still keep being responsible for the environment customization.

My proposal to move forward:

  • Start setting prefix and base_prefix to the pyvenv.cfg directory in getpath again
  • Document the pyvenv.cfg detection as part of the interpreter initialization, instead of part of the site import
  • Update all other documentation to reflect this change

@zooba
Member

zooba commented Nov 18, 2024

> Even in that possibility, it only seems to affect ._pth file results in -E and -S, I don't think it's relevant here.

FWIW, the mere presence of a ._pth file implies both -E and -S (and an import site in the ._pth file removes -S again).
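
A simplified sketch of that rule, assuming the ._pth format of one path per line, `#` comments, and an optional `import site` line. `parse_pth` is a hypothetical helper and does not reproduce every detail of the real parser; the result keys borrow the names of the corresponding `sys.flags` fields.

```python
def parse_pth(text):
    """Approximate how a ._pth file maps to search paths and flags."""
    paths = []
    import_site = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line == "import site":
            import_site = True  # re-enables site, i.e. removes -S again
            continue
        paths.append(line)
    return {
        "paths": paths,
        "ignore_environment": True,   # presence of the file implies -E
        "no_site": not import_site,   # ... and -S unless `import site` appears
    }


if __name__ == "__main__":
    print(parse_pth("python314.zip\n.\n# comment\nimport site\n"))
```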

@FFY00
Member Author

FFY00 commented Nov 26, 2024

This is no longer relevant, as GH-126985 made virtual environments no longer dependent on the site module.
