-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
GH-107956: install a static build description file #108483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
After reaching a consensus on the key implementation details, I will prepare this PR for a final review (add documentation, tests, validate/fix cross-builds, etc.). We can then move to adding more information to file, which involves further discussions of what should be included and how. |
Friendly ping. I know the details in question here are fairly unconstrained, so it can be very subjective, making it a maybe difficult thing to review, but the sooner we have a consensus and set the initial details, the sooner we can progress to the more technical and actually useful followup discussions. I have some employer time now to work on this, so I wanted to make sure I take advantage of it and help move this forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good (one comment about Windows aside). I think this should also be added to PCBuild?
I've no strong thoughts on the filename; I agree with your thoughts on TOML vs JSON on the issue, though JSON would be slightly easier to write (no custom writer in the script etc).
A
def toml_dump(fd: io.TextIOBase, data: ConfigData) -> None: | ||
"""**Very** basic TOML writer. It only implements what is necessary for this use-case.""" | ||
def toml_repr(obj: object) -> str: | ||
if isinstance(obj, dict): | ||
return '{ ' + ', '.join(f'{k} = {toml_repr(v)}' for k, v in obj.items()) + ' }' | ||
else: | ||
return repr(obj) | ||
|
||
for section, entries in data.items(): | ||
print(f'[{section}]', file=fd) | ||
for name, value in entries.items(): | ||
print(f'{name} = {toml_repr(value)}', file=fd) | ||
fd.write('\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work on Windows, as repr(value)
escapes backslashes and uses single quotes, which TOML interprets as literal strings.
I think it would also be nice if the parser emitted dicts as tables:
[python.version_parts]
major = 3
minor = 11
micro = 5
releaselevel = 'final'
serial = 0
The following suggestion does both, but the fix for Windows is just changing repr(obj)
to '{obj'}
.
def toml_dump(fd: io.TextIOBase, data: ConfigData) -> None: | |
"""**Very** basic TOML writer. It only implements what is necessary for this use-case.""" | |
def toml_repr(obj: object) -> str: | |
if isinstance(obj, dict): | |
return '{ ' + ', '.join(f'{k} = {toml_repr(v)}' for k, v in obj.items()) + ' }' | |
else: | |
return repr(obj) | |
for section, entries in data.items(): | |
print(f'[{section}]', file=fd) | |
for name, value in entries.items(): | |
print(f'{name} = {toml_repr(value)}', file=fd) | |
fd.write('\n') | |
def toml_dump(fd: io.TextIOBase, data: list[tuple[str, dict[str, ConfigItem]]]) -> None: | |
"""**Very** basic TOML writer. It only implements what is necessary for this use-case.""" | |
for section, entries in data: | |
if fd.tell() != 0: # don't emit a newline before the first section | |
fd.write('\n') | |
fd.write(f'[{section}]\n') | |
subsections = [] | |
for name, value in entries.items(): | |
if isinstance(value, dict): | |
subsections.append((f'{section}.{name}', value)) | |
elif isinstance(value, str): | |
fd.write(f"{name} = '{value}'\n") | |
else: | |
fd.write(f"{name} = {value!r}\n") | |
toml_dump(fd, subsections) |
(This also involves changing Line 57 to toml_dump(f, config.items())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work on Windows, as
repr(value)
escapes backslashes and uses single quotes, which TOML interprets as literal strings.
Thank you for catching this! I actually wasn't aware of the difference between basic and literal strings in TOML. I don't think we should run into any such instances here, at least for now, given the small scope, but we should try to avoid that if possible.
I think it would also be nice if the parser emitted dicts as tables:
[python.version_parts] major = 3 minor = 11 micro = 5 releaselevel = 'final' serial = 0
Yeah, I don't have any very strong thoughts about it, so we can go with whatever makes the implementation simpler.
The following suggestion does both, but the fix for Windows is just changing
repr(obj)
to'{obj'}
.
That causes other issues, like escaping quotes. I think I am just gonna make an escape table and call str.translate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is more that in the previous version, e.g. the executable
string would be 'S:\\Development\\tmp\\venv\\Scripts\\python.exe'
on Windows, which then was printed to the TOML file as executable = 'S:\\Development\\tmp\\venv\\Scripts\\python.exe'
, but then we hit the literal-vs-escaped string behaviour in TOML.
The PR as it stands now (with translation table) prints the following to TOML: executable = "S:\\\\Development\\\\tmp\\\\venv\\\\Scripts\\\\python.exe"
-- the quotation marks are right but we've now got doubled-up backslashes!
I think that the fix is:
- 0x005C: r'\\\\',
+ 0x005C: '\\\\',
A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thread makes me want to vote more strongly for either JSON or simple key=value
with no quoting or escaping.
TOML parsers are not universal, which means a lot of tools are going to end up hacking together just enough of a parser to make things work enough of the time. Better to go with either something that's universal-enough, or something that's trivial to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand.
This thread makes me want to vote more strongly for either JSON or simple
key=value
with no quoting or escaping.
IMO the simple key=value
option wouldn't be a great option for the following reasons:
- Being a custom format, it requires a custom parser anyway, which no matter how simple we intend it to be, it'd probably end up resulting in more user code
- If we support different data types, the user implementation wouldn't be trivial, which I think goes against the purpose of having a "simple" mapping
- If everything is a string, since we will still need to represent non-string data (eg. ints, bools, etc.), I think users would either end up implementing the type parsing code anyway, or it would be much more cumbersome for them to use the data, as they would likely still need to convert it to a useful type so that they can actually use the information
- The lack of sections makes organization worse, makes it less ergonomic to have direct maps to the
sysconfig
API 1, and makes it more cumbersome for users to use the data (eg. theversion_parts
field would need to be split in 5 different fields, on top of that, each likely with a large prefix)
That said, JSON is a reasonable choice, I just dislike how annoying it is to edit it, maybe even painful if you aren't very well familiarized with it. Not that is a big deal, it's just my personal opinion.
TOML parsers are not universal, which means a lot of tools are going to end up hacking together just enough of a parser to make things work enough of the time. Better to go with either something that's universal-enough, or something that's trivial to parse.
That is fair, I guess, though all major languages I am aware of have a TOML parser library, and looking at TOML's adoption rate I think that will only improve, so, realistically, I don't think this is really an issue. Though, to be clear, I do completely understand the concern.
Anyway, maybe I am just being a bit too opinionated here 🤣
Footnotes
-
I would like to have direct mappings, where possible, from this file to the data structures in the new
sysconfig
API. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking this back to the issue #107956
If that was meant for me, I have been waiting for the issue discussion to finish and this PR to come out of draft. |
Oh okay. I don't think there's anything necessarily unresolved in the issue regarding the implementation choices made in this PR -- the feedback I got there seems compatible with this PR. There wasn't really a clear plan set in the issue, so my thought was to get and ACK/LGTM regarding the implementation choices made here while in draft state, to give more opportunity for tweaks/changes before I put in the effort to finalize this PR for review. I'll just move the discussion back to the issue then. I will ask there for feedback there on the specific choices this PR makes. |
I believe the left over buildbot failures are unrelated. I think this PR is ready for review. |
I do plan to get to this, just haven't had time today. Hopefully tomorrow! |
+++++++++++++++++++++ | ||
|
||
:Type: ``str`` | ||
:Description: An absolute path pointing to an interpreter from the installation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these paths (optionally) relative from the directory containing the file? That way it can be static for most scenarios, instead of needing to be dynamically generated (and e.g. in the Windows Store package we can't dynamically generate it and also put it into the expected location, because it's read-only).
Alternatively, include a few variables like {prefix}
that the reader can substitute themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that makes sense for use-cases like yours.
~~~~~~~~ | ||
|
||
On standard CPython installations, when available, the details file will be | ||
installed in the same directory as the standard library, with the name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases the standard library is split. Can we name a specific file it will be alongside? (e.g. the tests below expect it to be next to sysconfig.py
)
Also, on Windows, this implies that everyone should hard-code Lib
into their paths, because the only paths you can resolve without running Python are sys.prefix
and sys.executable
. (I'm not sure how best to go about locating the stdlib on other platforms either, to be honest, but I trust you for those.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases the standard library is split. Can we name a specific file it will be alongside? (e.g. the tests below expect it to be next to
sysconfig.py
)
Right, I will clarify it is the pure part of the standard library, which corresponds to the sysconfig stdlib
path.
Also, on Windows, this implies that everyone should hard-code
Lib
into their paths, because the only paths you can resolve without running Python aresys.prefix
andsys.executable
. (I'm not sure how best to go about locating the stdlib on other platforms either, to be honest, but I trust you for those.)
I don't see a great solution for this, given that sys.prefix
may contain multiple installations. My main two target use-cases are cross-compilation, where the user should be able to locate install-details.json
themselves, and Python launchers, where, knowing that stdlib
is within sys.prefix
, should be able to search for the **/install-details.txt
pattern in order to discover the available installation within a prefix.
The approach for Python launchers is not optimal, but it can be fixed later. Currently, I am just focusing on getting the file format down, and making the file available somewhere. Afterwards, we can look at the discoverability issue and consider for eg. having a global registry where these files could be additionally installed (eg. on Linux this could be the ~/.config/python/installations
and /etc/python/installations
paths), but that's a whole other issue, and I am trying to break things down into smaller chunks so that I can actually hope to make meaningful progress.
What do you think?
|
||
.. availability:: POSIX | ||
|
||
A ``python-config`` script may be available alongside the interpreter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this script grow an option to reveal the location of the install-details.json
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's reasonable, but I'd rather put it in the sysconfig CLI, which is less problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except we need a way to locate the file without launching the runtime? Which is why python-config
is a shell script and not a Python script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right 🤦
I just remembered that now when looking at GH-77620.
.. TODO: Currently, we don't have any documentation covering python-config, but | ||
if/when we add some, we refer to it here, instead. | ||
|
||
.. availability:: POSIX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Windows, probably the best equivalent is to recommend a PEP 514 addition to specify the full path to install-details.json
. Probably an InstallDetailsPath
value alongside the existing ExecutablePath
one (mentioned in this section).
for field in ('major', 'minor', 'micro', 'releaselevel', 'serial') | ||
}, | ||
'executable': executable, | ||
'stdlib': sysconfig.get_path('stdlib'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that we're running in the target interpreter, yeah? So we need another way to generate this file for when we're cross-compiling (which is a normal part of the Windows build process).
Is there an easy way to have the script pick up all the makefile variables during a build without having to actually parse the makefile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but we can add a CLI option to specify the value, like we do for executable
. This way you just give the script the value when you call it in the Makefile, or other build system.
I'd like to make this script as standalone as possible.
I forget where we landed on needing a PEP for this or not, but right now I feel like we probably do. |
Is that really required, considering this is an optional feature? |
We tend to formalise specs for things that we expect others to follow to properly record the discussion. It doesn't make it a requirement to do the thing, just lays out the expectations that if you do want to include this file, here's how it should look. The doc in the PR is pretty close to how I'd expect it to look. PEP 514 is a similar sort of thing, so I wouldn't expect it to be drastically longer or different to that one. |
That makes sense, but I didn't want the spec to be formalized by a PEP, and instead be a living specification which we can extend and iterate over time. In fact, this PR just includes a couple of initial data keys, I was planning to add more afterwards. I can write a PEP with an initial specification, though, but I'll make it clear there that the up-to-date version of the spec is on CPython docs. Hoping that the PEP process won't take that much time 😅 |
This is how it's supposed to work anyway. It's only when we never properly document things that the PEP remains the canonical source (or when it's informational, which PEP 514 is, so don't copy that bit). As long as you specify how tools should handle unexpected keys or missing keys, and include a naming convention for custom/non-standard keys, this ought to be fine. (In PEP 578 I only listed examples of audit hooks, to avoid having to argue about each one before even getting started.)
At least you're the most motivated person to work on it 😄 I took on the distutils one because nobody wanted to, and that turned into way more work than I hoped. |
Alright. I will add a |
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Signed-off-by: Filipe Laíns <[email protected]>
Co-authored-by: Steve Dower <[email protected]> Signed-off-by: Filipe Laíns <[email protected]>
Rebased on top of main without making any changes. |
Signed-off-by: Filipe Laíns <[email protected]>
Closing as superseded by GH-130069. |
This is a prototype, ready for an initial review. My main goal right now is to solidify/settle the key proposal implementation details (format, name, install location, etc).