Implement swarm testing for st.one_of (#4322)
@@ -0,0 +1,5 @@
RELEASE_TYPE: patch

|st.one_of| now chooses a subset of its strategies to disable each time it generates a value. For example, it was previously unlikely that ``st.lists(st.integers() | st.floats() | st.text())`` would generate a long list containing only string values. This is now more likely, along with other uncommon combinations.

This technique is called `swarm testing <https://users.cs.utah.edu/~regehr/papers/swarm12.pdf>`__, and can considerably improve bug-finding power, for instance because some features actively prevent other interesting behavior from running. See :issue:`2643` for more details.
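An illustration of the change (not part of this diff; ``find`` is just a convenient way to ask for a previously-rare combination):

    from hypothesis import find, strategies as st

    # Ask for a long list drawn from the union that contains only strings.
    # Under swarm testing, the integer and float branches are sometimes
    # disabled wholesale, making this combination far more likely to appear.
    all_text = find(
        st.lists(st.integers() | st.floats() | st.text(), min_size=5),
        lambda xs: all(isinstance(x, str) for x in xs),
    )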
@@ -28,6 +28,7 @@
    overload,
)

from hypothesis import strategies as st
from hypothesis._settings import HealthCheck, Phase, Verbosity, settings
from hypothesis.control import _current_build_context, current_build_context
from hypothesis.errors import (
@@ -689,10 +690,16 @@ class OneOfStrategy(SearchStrategy[Ex]):
    """

    def __init__(self, strategies: Sequence[SearchStrategy[Ex]]):
        from hypothesis.strategies._internal.featureflags import FeatureStrategy

        super().__init__()
        self.original_strategies = tuple(strategies)
        self.__element_strategies: Optional[Sequence[SearchStrategy[Ex]]] = None
        self.__in_branches = False
        self.enabled_branches_strategy = st.shared(
            FeatureStrategy(self.original_strategies),
            key=("one_of swarm testing", self.original_strategies),
        )
Comment on lines +699 to 703

Reviewer: I think that using st.shared for the FeatureStrategy means that any separately-defined strategies which happen to have the same key will have the same swarm behavior, i.e. two separate usages of st.integers() | st.floats() will use the same swarm decision of what to disable. But I believe they will still draw separate random choices below the swarm step, so the values won't be the same, just the disabling decision. I think diversity in whether to share swarm decisions here would be best for bug-finding. What we might be able to do is draw a meta p which chooses, for each occurrence of the same-keyed strategy, whether to share the swarm decision.

Author: Sounds complicated; let's leave that for a follow-up PR.
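A sketch of the sharing behavior described above (illustrative only, not part of this diff; it assumes the two unions end up with equal shared keys, as the comment asserts):

    from hypothesis import given, strategies as st

    a = st.integers() | st.floats()
    b = st.integers() | st.floats()  # built separately, but same swarm key

    @given(a, b)
    def test_shared_swarm_decision(x, y):
        # Per the comment above: within a single test case, if the integer
        # branch is disabled for `a`, it is disabled for `b` too, so x and y
        # would both be floats -- though their values remain independent.
        pass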
    def calc_is_empty(self, recur: RecurT) -> bool:
        return all(recur(e) for e in self.original_strategies)
@@ -739,9 +746,10 @@ def calc_label(self) -> int:
        )

    def do_draw(self, data: ConjectureData) -> Ex:
        feature_flags = data.draw(self.enabled_branches_strategy)
        strategy = data.draw(
            SampledFromStrategy(self.element_strategies).filter(
                lambda s: s.available(data) and feature_flags.is_enabled(s)
            )
        )
        return data.draw(strategy)
Author: See comment; not too sure what to do here. Here's one example of this added redundancy, which prints "A" once on master but ~8 times on this branch:
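The original snippet is not reproduced above; a guess at its shape (hypothetical names throughout), using a side-effecting branch to make re-executions visible:

    from hypothesis import given, settings, strategies as st

    # Hypothetical reconstruction -- not the original snippet. The comment
    # above reports "A" printing once on master but ~8 times on this branch.
    noisy = st.builds(lambda: print("A") or 0)

    @given(noisy | st.integers())
    @settings(max_examples=1, database=None)
    def test_redundancy(x):
        pass

    test_redundancy()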
Reviewer: If self.__p_disabled is only ever used as the p= argument to draw_bool(), and is strictly between 0 and 1, do we need to choose it via the choice sequence at all? vs. data.random(), or 0.5 if that's not available?
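A sketch of the two options in that question (hypothetical code; Data is a toy stand-in, not the actual ConjectureData or featureflags.py internals):

    import random

    class Data:
        """Toy stand-in for ConjectureData, just enough for this sketch."""

        def __init__(self, rng: random.Random):
            self.rng = rng
            self.choices: list = []  # the recorded choice sequence

        def draw_integer(self, lo: int, hi: int) -> int:
            v = self.rng.randint(lo, hi)
            self.choices.append(v)  # recorded: replayed and shrunk like any draw
            return v

    # Option A (status quo): p_disabled goes through the choice sequence.
    def p_disabled_recorded(data: Data) -> float:
        return data.draw_integer(0, 254) / 255  # assumed shape, not verified

    # Option B (the suggestion): take it from a PRNG outside the choice
    # sequence, or 0.5 if none is available; only the distribution of later
    # draws depends on it, not which values are possible.
    def p_disabled_unrecorded(data: Data) -> float:
        return data.rng.random() if data.rng is not None else 0.5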
Author: Sort of. If we don't draw this from the choice sequence, DataTree thinks the later is_disabled draw is flaky, because the kwarg p passed to draw_boolean has changed. It's not actually flaky, because the values in the choice sequence don't depend on p (as long as we're careful about p=0.0 and p=1.0), but DataTree doesn't know that.

We could imagine some kind of "ignore this param for flakiness" API, but I'm not sure how that would look. Alternatively, hardcode that p in draw_boolean shouldn't be checked for flakiness? It's controlling the distribution, not the domain, so arguably consumers should be able to change it without flakiness...
Reviewer: I think ignoring distribution-only arguments in flakiness checks makes sense; we can still replay perfectly.
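A sketch of what ignoring distribution-only arguments might look like (hypothetical; not the real DataTree code):

    # Distribution-only kwargs per choice type: changing them changes how
    # likely each value is, but not which values are possible, so replay of
    # a recorded choice sequence is still exact.
    DISTRIBUTION_ONLY = {"boolean": {"p"}}

    def draw_is_flaky(choice_type: str, recorded_kwargs: dict, new_kwargs: dict) -> bool:
        ignored = DISTRIBUTION_ONLY.get(choice_type, set())
        strip = lambda kw: {k: v for k, v in kw.items() if k not in ignored}
        # Flaky only if the domain-defining arguments changed between runs.
        return strip(recorded_kwargs) != strip(new_kwargs)

    # A draw_boolean whose p changed between runs is not flaky...
    assert not draw_is_flaky("boolean", {"p": 0.3}, {"p": 0.7})
    # ...but an integer draw whose bounds changed is.
    assert draw_is_flaky("integer", {"min_value": 0}, {"min_value": 1})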