Implement swarm testing for st.one_of #4322
Conversation
```python
# this really messes up our deduplication tracking, because all 255
# draws are unique. But we more or less have to choose whether something
# is enabled on-demand with a prior probability, rather than choosing what
# is enabled up front, because the latter results in a very large choice
# sequence when there are lots of possibilities.
# (a tradeoff might be selecting up front when there are <= 3 options?)
self.__p_disabled = self.__data.draw_integer(0, 254) / 255
```
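As a rough illustration of the tradeoff described in that comment, here is a hedged sketch of the on-demand approach; the class and method names are hypothetical, not this PR's actual implementation. Each feature draws a single boolean against a shared prior the first time it is requested, so the choice sequence grows only with the features actually used.

```python
# Hypothetical sketch, not Hypothesis's actual internals: on-demand
# per-feature disabling against a shared prior probability.
class SwarmFlags:
    def __init__(self, data):
        self._data = data
        self._decisions = {}
        # Prior probability that any given feature is disabled.
        self._p_disabled = data.draw_integer(0, 254) / 255

    def is_enabled(self, name):
        # Decide lazily: one boolean draw per feature actually requested,
        # rather than one draw per possible feature up front.
        if name not in self._decisions:
            disabled = self._data.draw_boolean(self._p_disabled)
            self._decisions[name] = not disabled
        return self._decisions[name]
```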
See comment; not too sure what to do here. Here's one example of this added redundancy:

```python
import re

from hypothesis import given, strategies as st

@given(st.from_regex(re.compile("[ab]*", re.IGNORECASE), fullmatch=True))
def f(v):
    print(v)

f()
```

which prints "A" once on master but ~8 times on this branch.
If `self.__p_disabled` is only ever used as the `p=` argument to `draw_boolean()`, and is strictly between 0 and 1, do we need to choose it via the choice sequence at all? vs. `data.random()`, or `0.5` if that's not available?
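Concretely, the suggestion might look something like this hedged sketch; the `data.random` attribute is taken from the comment above and is an assumption, not a confirmed API:

```python
def choose_p_disabled(data):
    # Hedged sketch, not Hypothesis's real internals: take the prior off
    # the choice sequence via a PRNG when one is available (`data.random`
    # is an assumed attribute here), falling back to a constant otherwise.
    rng = getattr(data, "random", None)
    return rng.random() if rng is not None else 0.5
```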
Sort of. If we don't draw this from the choice sequence, DataTree thinks the later `is_disabled` draw is flaky because the `p` kwarg passed to `draw_boolean` has changed. It's not actually flaky, because the values in the choice sequence don't depend on `p` (as long as we're careful about `p=0.0` and `p=1.0`), but DataTree doesn't know that.

We could imagine some kind of "ignore this param for flakiness" API, but I'm not sure how that would look.

Alternatively, hardcode that `p` in `draw_boolean` shouldn't be checked for flakiness? It's controlling the distribution, not the domain, so arguably consumers should be able to change it without flakiness...
I think ignoring distribution-only arguments in flakiness checks makes sense; we can still replay perfectly.
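For concreteness, here is a hypothetical sketch of what such an exclusion could look like; the `DISTRIBUTION_ONLY` table and the comparison hook are assumptions, not Hypothesis's actual DataTree API:

```python
# Hypothetical sketch only: which kwargs affect the distribution but not
# the domain of valid choices, per choice type.
DISTRIBUTION_ONLY = {"boolean": {"p"}}

def kwargs_match(choice_type, recorded_kwargs, replayed_kwargs):
    """Compare draw kwargs for flakiness, ignoring distribution-only ones."""
    ignored = DISTRIBUTION_ONLY.get(choice_type, set())
    keys = (recorded_kwargs.keys() | replayed_kwargs.keys()) - ignored
    return all(recorded_kwargs.get(k) == replayed_kwargs.get(k) for k in keys)
```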
```python
self.enabled_branches_strategy = st.shared(
    FeatureStrategy(self.original_strategies),
    key=("one_of swarm testing", self.original_strategies),
)
```
I think that using `st.shared` for the FeatureStrategy means that any separately-defined strategies which happen to have the same key will have the same swarm behavior; i.e., two separate usages of `st.integers() | st.floats()` will use the same swarm decision of what to disable. But I believe they will still draw separate random choices below the swarm step, so the values won't be the same, just the disabling decision.

I think diversity in whether swarm decisions are shared here would be best for bug finding. What we might be able to do is draw a meta-`p` which chooses, for each occurrence of the same-keyed strategy `s`, whether to reuse the swarm behavior or not. But I don't know if this would be easy to implement.
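To illustrate the sharing semantics at issue, here is a minimal sketch using Hypothesis's public `st.shared` rather than this PR's internals; the key and test names are made up:

```python
from hypothesis import given, strategies as st

# Two separately-defined strategies with the same key: st.shared makes
# them resolve to the same drawn value within a single test case, which
# mirrors how same-keyed FeatureStrategy instances would share one
# swarm decision.
a = st.shared(st.integers(), key="swarm-demo")
b = st.shared(st.integers(), key="swarm-demo")

@given(a, b)
def test_same_key_shares_draws(x, y):
    assert x == y  # identical within each test case

test_same_key_shares_draws()
```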
Sounds complicated; let's leave that for a follow-up PR.

This is a core change to our generation logic, which we also just changed in #4356. Since I'm not confident this PR will be regression-free (in either performance or bugfinding ability), I'm tempted to defer this PR for a bit to let any issues with the constants pool shake out first.
Part of #2643. I think swarm testing is great and would love to get it into more places in Hypothesis. There are still some problems with this implementation (see review comments).