Implement swarm testing for st.one_of #4322
Conversation
```python
# this really messes up our deduplication tracking, because all 255
# draws are unique. But we more or less have to choose whether something
# is enabled on-demand with a prior probability, rather than choosing what
# is enabled up front, because the latter results in a very large choice
# sequence when there are lots of possibilities.
# (a tradeoff might be selecting up front when there are <= 3 options?)
self.__p_disabled = self.__data.draw_integer(0, 254) / 255
```
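As a rough illustration of the tradeoff described in that comment, here is a hedged sketch of the on-demand approach; the class and method names are hypothetical, not this PR's actual implementation. Each feature draws a single boolean against a shared prior the first time it is requested, so the choice sequence grows only with the features actually used.

```python
# Hypothetical sketch, not Hypothesis's actual internals: on-demand
# per-feature disabling against a shared prior probability.
class SwarmFlags:
    def __init__(self, data):
        self._data = data
        self._decisions = {}
        # Prior probability that any given feature is disabled.
        self._p_disabled = data.draw_integer(0, 254) / 255

    def is_enabled(self, name):
        # Decide lazily: one boolean draw per feature actually requested,
        # rather than one draw per possible feature up front.
        if name not in self._decisions:
            disabled = self._data.draw_boolean(self._p_disabled)
            self._decisions[name] = not disabled
        return self._decisions[name]
```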
See comment; not too sure what to do here. Here's one example of this added redundancy:

```python
import re

from hypothesis import given, strategies as st

@given(st.from_regex(re.compile("[ab]*", re.IGNORECASE), fullmatch=True))
def f(v):
    print(v)

f()
```

which prints "A" once on master but ~8 times on this branch.
If `self.__p_disabled` is only ever used as the `p=` argument to `draw_boolean()`, and is strictly between 0 and 1, do we need to choose it via the choice sequence at all? vs. `data.random()`, or `0.5` if that's not available?
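Concretely, the suggestion might look something like this hedged sketch; the `data.random` attribute is taken from the comment above and is an assumption, not a confirmed API:

```python
def choose_p_disabled(data):
    # Hedged sketch, not Hypothesis's real internals: take the prior off
    # the choice sequence via a PRNG when one is available (`data.random`
    # is an assumed attribute here), falling back to a constant otherwise.
    rng = getattr(data, "random", None)
    return rng.random() if rng is not None else 0.5
```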
Sort of. If we don't draw this from the choice sequence, DataTree thinks the later `is_disabled` draw is flaky because the `p` kwarg passed to `draw_boolean` has changed. It's not actually flaky, because the values in the choice sequence don't depend on `p` (as long as we're careful about `p=0.0` and `p=1.0`), but DataTree doesn't know that.

We could imagine some kind of "ignore this param for flakiness" API, but I'm not sure how that would look.

Alternatively, hardcode that `p` in `draw_boolean` shouldn't be checked for flakiness? It's controlling the distribution, not the domain, so arguably consumers should be able to change it without flakiness...
I think ignoring distribution-only arguments in flakiness checks makes sense; we can still replay perfectly.
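For concreteness, here is a hypothetical sketch of what such an exclusion could look like; the `DISTRIBUTION_ONLY` table and the comparison hook are assumptions, not Hypothesis's actual DataTree API:

```python
# Hypothetical sketch only: which kwargs affect the distribution but not
# the domain of valid choices, per choice type.
DISTRIBUTION_ONLY = {"boolean": {"p"}}

def kwargs_match(choice_type, recorded_kwargs, replayed_kwargs):
    """Compare draw kwargs for flakiness, ignoring distribution-only ones."""
    ignored = DISTRIBUTION_ONLY.get(choice_type, set())
    keys = (recorded_kwargs.keys() | replayed_kwargs.keys()) - ignored
    return all(recorded_kwargs.get(k) == replayed_kwargs.get(k) for k in keys)
```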
```python
self.enabled_branches_strategy = st.shared(
    FeatureStrategy(self.original_strategies),
    key=("one_of swarm testing", self.original_strategies),
)
```
I think that using `st.shared` for the FeatureStrategy means that any separately-defined strategies which happen to have the same key will have the same swarm behavior; i.e., two separate usages of `st.integers() | st.floats()` will use the same swarm decision of what to disable. But I believe they will still draw separate random choices below the swarm step, so the values won't be the same, just the disabling decision.

I think diversity in whether swarm decisions are shared here would be best for bug finding. What we might be able to do is draw a meta-`p` which chooses, for each occurrence of the same-keyed strategy `s`, whether to reuse the swarm behavior or not. But I don't know if this would be easy to implement.
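To illustrate the sharing semantics at issue, here is a minimal sketch using Hypothesis's public `st.shared` rather than this PR's internals; the key and test names are made up:

```python
from hypothesis import given, strategies as st

# Two separately-defined strategies with the same key: st.shared makes
# them resolve to the same drawn value within a single test case, which
# mirrors how same-keyed FeatureStrategy instances would share one
# swarm decision.
a = st.shared(st.integers(), key="swarm-demo")
b = st.shared(st.integers(), key="swarm-demo")

@given(a, b)
def test_same_key_shares_draws(x, y):
    assert x == y  # identical within each test case

test_same_key_shares_draws()
```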
Sounds complicated; let's leave that for a follow-up PR.

This is a core change to our generation logic, which we also just changed in #4356. Since I'm not confident this PR will be regression-free (in either performance or bugfinding ability), I'm tempted to defer this PR for a bit to let any issues with the constants pool shake out first.
Part of #2643. I think swarm testing is great and would love to get it into more places in Hypothesis. There are still some problems with this implementation (see review comments).