-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
GH-130415: Narrow str
to ""
based on boolean tests
#130476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Added requested corrections. Thanks, @brandtbucher ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contribution.
Unfortunately, I think there is a critical flaw in this approach as it could result in mis-optimizations in the future.
This would be a useful optimization, so if you're willing to pursue this further, it would be appreciated.
Lib/test/test_capi/test_opt.py
Outdated
dummy = "aaa" | ||
# Hopefully the optimizer can't guess what the value is. | ||
# empty is always "", but we can only prove that it's a string: | ||
empty = dummy[:0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can easily see the optimizer turning "aaa"[:0]
into ""
.
empty
doesn't need to be a constant, we just need it to be mostly "", for profiling.
Use something like empty = "a"[:(n % 1000) == 0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we check the actual path taken as part of the test, we need the value to always be ""
, not just mostly ""
. So maybe:
false = i == TIER2_THRESHOLD
empty = "X"[:false]
The optimizer can't prove false
is False
, so it's good enough for our purposes.
Python/optimizer_bytecodes.c
Outdated
// *can't* narrow res, since that would cause the guard to be | ||
// removed and the narrowed value to be invalid: | ||
if (next_opcode == _GUARD_IS_FALSE_POP) { | ||
sym_set_const(value, Py_GetConstant(Py_CONSTANT_EMPTY_STR)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is strictly incorrect. We don't know that value
is "" until after the _GUARD_IS_FALSE_POP
.
The reason that matters is that when we start attaching type information to side exits, as we probably will in 3.15, then this could lead us to infer that value
is "" on both branches. Which would be wrong.
There are two possible fixes for this.
- Combine
TO_BOOL_STR
and_GUARD_IS_FALSE_POP/_GUARD_IS_TRUE_POP
into a single (super)instruction, then optimize that. - Annotate the bool value resulting from the
TO_BOOL
with its input, then in_GUARD_IS_FALSE_POP
convert the input value toTO_BOOL
.
I prefer the second option, although it may be more work, as it is more flexible and can be extended more easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, @Fidget-Spinner and I suggested something like the latter on the issue (new symbols like JitBoolOf(JitOptSymbol *source, bool inverted)
and JitEqualTo(JitOptSymbol *lhs, JitOptSymbol *rhs, bool inverted)
). That's probably the direction we're headed in longer term.
However, I don't think we should let perfect be the enemy of good here. We have nice, working optimizations in these PRs; just because we might sink info onto side exits in the future probably shouldn't prevent us from making changes like this now for 3.14, which are perfectly correct for the current optimizer (which doesn't sink anything).
I'm inclined to land these changes and other similar ones for ==
/!=
now, and make the symbolic representation of derived boolean values more complex later as an improvement (it will also be able to handle more uncommon cases like x = y == 42; if x: ...
). I'm really worried that if we try to "future-proof" optimizations based on what we could do six months from now, it will prevent actual improvements in the near term.
But I'll defer to you here. If having value
be narrowed one uop too early in the instruction stream is enough to block this PR, I can work with these new contributors on the more complex solution. But as-is, this has no bugs and works as intended. We don't sink value info onto side exits, so it's correct.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
str
to ""
based on boolean tests
Assign value to string when an
if
evaluates to false.@brandtbucher