Expand our use of swarm testing #2643

@Zac-HD

To paraphrase Swarm Testing (Groce et al., 2012),

Swarm testing is a way to improve the diversity of generated test cases. Instead of potentially including all features in every test case, a large “swarm” of randomly generated configurations is used, each of which omits some features. ... First, some features actively prevent the system from executing interesting behaviors; e.g., pop calls may prevent an overflow bug from executing. Second, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically.

I first proposed that Hypothesis should use this trick in #1637, and a more advanced and shrinker-friendly variant was implemented in #2238 - but only used in rule-based stateful tests (where it has been very useful). In this issue I propose adding swarm testing logic in three more areas, though still without a public API.

st.one_of()

This is perhaps the most obvious place to add swarm testing - just disable a random subset of the strategies being combined, for each test case. one_of() is also used commonly enough that doing so might have performance implications, but "measure, don't guess" - and improved example quality may justify a slight slowdown anyway.
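A minimal sketch of the idea, using only public Hypothesis APIs - the `swarm_one_of()` helper is hypothetical and for illustration only; the real implementation would live inside OneOfStrategy and cooperate with the shrinker rather than going through st.shared():

```python
from hypothesis import strategies as st

def swarm_one_of(*alternatives):
    # Hypothetical helper, for illustration only.  st.shared() pins one
    # randomly chosen nonempty subset of the alternatives per test case,
    # so every draw within that test case sees the same reduced "swarm".
    swarm = st.shared(
        st.sets(st.sampled_from(range(len(alternatives))), min_size=1)
    )
    return swarm.flatmap(
        lambda on: st.one_of(*[alternatives[i] for i in sorted(on)])
    )

# Lists drawn this way are less "mixed" than with plain st.one_of():
# all-integer or all-text lists show up far more often.
homogeneous_ish = st.lists(swarm_one_of(st.integers(), st.text(), st.booleans()))
```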

In conversation with @Stranger6667 we estimated that this would cover most downstream use-cases, which makes me inclined to keep swarm testing as an implementation detail with no public API, at least for now.

Unicode strings (i.e. st.characters())

AKA #1401. This is a little trickier: we'd be making many swarm decisions (hence a high ratio of metadata overhead to actual generated data), and the "shrink open" trick would need several layers. Performance is more likely to be a problem here. I can imagine memoizing our way out of that with chained lookups and the "make your own luck" trick, but we'll see.
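To illustrate at the coarsest granularity - one per-test-case swarm over Unicode general categories, rather than the finer-grained lazy decisions a real implementation would need - something like this hypothetical sketch:

```python
from hypothesis import strategies as st

# Illustrative subset of Unicode general categories; a real
# implementation would cover them all, and make these decisions
# lazily inside st.characters() to keep the metadata overhead down.
CATEGORIES = ("Lu", "Ll", "Nd", "Po", "Zs", "So")

def swarm_text():
    # Hypothetical helper: share one subset of categories per test case.
    cats = st.shared(st.sets(st.sampled_from(CATEGORIES), min_size=1))
    return cats.flatmap(
        lambda on: st.text(st.characters(whitelist_categories=sorted(on)))
    )
```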

from_lark() and grammar-based strategies

This is the original use-case for swarm testing, in CSmith, and I'd really like it to work for hypothesmith.

The complexity here is that we would want to analyse the grammar to decide the order in which to consider disabling production rules, and also ensure that the logic is aware of dependencies between productions. I'm pretty sure that I've seen John Regehr write about this somewhere, but can't find the paper or post now.
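As a sketch of what that analysis might look like - on a toy dict-based grammar representation rather than Lark's internals, with hypothetical helper names - a rule is only safe to disable if the start rule stays producible afterwards:

```python
def producible_rules(grammar, disabled):
    # Fixed-point computation: which rules can still derive a string of
    # terminals once the `disabled` rules are removed?  A grammar here
    # maps each rule name to a list of alternatives, each alternative
    # being a list of symbols (rule names or terminals).
    ok = set()
    changed = True
    while changed:
        changed = False
        for rule, alts in grammar.items():
            if rule in ok or rule in disabled:
                continue
            for alt in alts:
                if all(sym in ok or sym not in grammar for sym in alt):
                    ok.add(rule)
                    changed = True
                    break
    return ok

def safe_to_disable(grammar, start, candidate, disabled=frozenset()):
    # True if disabling `candidate` on top of the already-disabled rules
    # still leaves the start rule able to produce at least one string.
    return start in producible_rules(grammar, set(disabled) | {candidate})

# e.g. disabling "term" here is unsafe, because "expr" is then left
# with no producible alternative:
grammar = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["NUMBER"], ["(", "expr", ")"]],
}
assert not safe_to_disable(grammar, "expr", "term")
```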
