Skip to content

Combine sys_platform and platform_system in marker algebra #9344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

charliermarsh
Copy link
Member

Summary

This PR adds a representation to the marker algebra to capture "known pairs" of sys_platform and platform_system values.

For example, we want to be able to detect that sys_platform == 'win32' and platform_system == 'Darwin' are disjoint.

There is some risk here, in that if there are platforms for which, e.g., sys_platform == 'win32 but platform_system is not Windows, we'll get things wrong. I'm trying to weigh whether that's a real risk.

In theory, we could maybe also include os.name here?

Closes #7760.
Closes #9275.

@charliermarsh charliermarsh changed the base branch from main to charlie/lower-ii November 22, 2024 02:28
@charliermarsh charliermarsh added enhancement New feature or improvement to existing functionality performance Potential performance improvement labels Nov 22, 2024
@charliermarsh charliermarsh force-pushed the charlie/lower-iii branch 4 times, most recently from cfc5550 to 1c03cb9 Compare November 22, 2024 02:34
@Vinno97
Copy link

Vinno97 commented Nov 22, 2024

In theory, we could maybe also include os.name here?

Yes. I was just about to comment on #9275 about this.

For example, jupyter-server has a dependency on pywinpty for os_name == 'nt'. If want linux-only lock-file, a full dependency string would then have to be: platform_system == 'Linux' and sys_platform == 'linux' and os_name == 'posix'

@charliermarsh
Copy link
Member Author

Yeah. I think it's a bit more challenging because there isn't a 1-to-1 mapping between os.name and sys.platform for Linux and macOS (they both use posix).

@charliermarsh
Copy link
Member Author

(Not impossible but different than what I've encoded here.)

@Vinno97
Copy link

Vinno97 commented Nov 22, 2024

Since os_name == 'posix' could also be true for BSD and other more esoteric POSIX operating systems, perhaps os_name == ' posix' should entail os_name != 'nt', with its subsequent platform_system and sys_platform restrictions?

@charliermarsh charliermarsh force-pushed the charlie/lower-iii branch 3 times, most recently from 4b8f5c7 to d51b0b5 Compare November 23, 2024 00:26
@charliermarsh
Copy link
Member Author

On reflection, what I have here is not correct. It fails this test (i.e., it thinks these are disjoint):

assert!(!is_disjoint(
    "sys_platform == 'cygwin'",
    "platform_system == 'Java'"
));

@charliermarsh
Copy link
Member Author

Somewhat losing confidence here because it turns out Android also returns "Linux" for platform_system? Despite sys_platform returning "android".

## Summary

This PR introduces a set of parallel structs to `MarkerValueString`,
`MarkerValueExtra`, and `MarkerValueVersion` that remove various pieces
of information and representations that shouldn't be available in the
marker algebra. To start, I've _just_ removed the invalid extra
component from `MarkerValueExtra` -- there are no other changes to the
representation. So, throughout the marker algebra, we can't represent
and thus don't have to worry about clauses with invalid extras.

The subsequent changes I plan to make are:

1. Removing `python_version`, since we exclusively use
`python_version_full` in the algebra.
2. Removing the deprecated aliases, such that we re-map to the correct
marker value.
3. Consolidating `sys_platform` and `platform_release` (the original
motivation).
@Vinno97
Copy link

Vinno97 commented Nov 25, 2024

It seems like Poetry has gone though quite the rabbit hole for this themselves: python-poetry/poetry#5192. But even after quite some pull requests, they closed the isue with this comment:

Closing since there haven't been much improvements/fixes anymore in the last time and I can't come up with a good solution for merging sys_platform and platform_system. - python-poetry/poetry#5192 (comment)

It appears this may be an unresolvable issue, though perhaps it's feasible to just cull known impossible environments? I.e. we know that platform_system=='Linux' will never occur together with sys_platform=='win32' or sys_platform=='darwin'.

@charliermarsh
Copy link
Member Author

I think it's absolutely possible to solve in our marker system, I just don't have the right solution here.

@konstin
Copy link
Member

konstin commented Nov 26, 2024

This is a tough one, my intuition is to encode know mismatch rather than known pairs. There's a tendency to subdivide platforms to accommodate new developments (the development of rust target triples is a good reference since it's more upstreamed than the python build vars), so i would encode tier 1 and tier 2 pairs that never match, such as darwin sys_platform and Windows in platform_system.

@Vinno97
Copy link

Vinno97 commented Nov 26, 2024

@konstin that's a good point and probably catches the majority of the standard cases with a smaller chance if breaking future cases

@charliermarsh
Copy link
Member Author

Yes that's also what I want to do. Here's a new idea: we encode all mutually exclusive pairs, and for each pair (A, B), we add and (not A or not B) to each expression. In practice, it's probably easier to have a method like is_impossible alongside is_false that just does a single operation to check this for all pairs at once.

Base automatically changed from charlie/lower-ii to main November 26, 2024 14:39
@charliermarsh
Copy link
Member Author

Closing in favor of #9444.

charliermarsh added a commit that referenced this pull request Dec 7, 2024
## Summary

This is an alternative to #9344. If accepted, I need to audit the
codebase and call sites to apply it everywhere, but the basic idea is:
rather than encoding mutually-incompatible pairs of markers in the
representation itself, we have an additional method on `MarkerTree` that
expands the false-y definition to take into account assumptions about
which markers can be true alongside others. We then check if the the
current marker implies that at least one of them is true.

So, for example, we know that `sys_platform == 'win32'` and
`platform_system == 'Darwin'` are mutually exclusive. When given a
marker expression like `python_version >= '3.7'`, we test if
`python_version >= '3.7'` and `sys_platform != 'win32' or
platform_system != 'Darwin'` are disjoint, i.e., if the following can't
be satisfied:

```
python_version >= '3.7' and (sys_platform != 'win32' or platform_system != 'Darwin')
```

Since, if this can't be satisfied, it implies that the left-hand
expression requires `sys_platform == 'win32'` and `platform_system ==
'Darwin'` to be true at the same time.

I think the main downsides here are:

1. We can't _simplify_ markers based on these implications. So we'd
still write markers like `sys_platform == 'win32' and platform_system !=
'Darwin'`, even though we know the latter expression is redundant.
2. It might be expensive? I'm not sure. I don't think we test for
falseness _that_ often though.

Closes #7760.
Closes #9275.
charliermarsh added a commit that referenced this pull request Dec 18, 2024
## Summary

A revival of an old idea (#9344) that I have slightly more confidence in
now. I abandoned this idea because (1) it couldn't capture that, e.g.,
`platform_system == 'Windows' and sys_platform == 'foo'` (or some other
unknown value) are disjoint, and (2) I thought that Android returned
`"android"` for one of `sys_platform` or `platform_system`, which
would've made this logic incorrect.

However, it looks like Android... doesn't do that? And the values here
are almost always in a small, known set. So in the end, the tradeoffs
here actually seem pretty good.

Vis-a-vis our current solution, this can (e.g.) _simplify out_
expressions like `sys_platform == 'win32' or platform_system ==
'Windows'`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement to existing functionality performance Potential performance improvement
Projects
None yet
3 participants