Expand our use of swarm testing #2643

@Zac-HD

To paraphrase Swarm Testing (Groce et al., 2012),

Swarm testing is a way to improve the diversity of generated test cases. Instead of potentially including all features in every test case, a large “swarm” of randomly generated configurations is used, each of which omits some features. ... First, some features actively prevent the system from executing interesting behaviors; e.g., pop calls may prevent an overflow bug from executing. Second, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically.

I first proposed that Hypothesis should use this trick in #1637, and a more advanced and shrinker-friendly variant was implemented in #2238 - but only used in rule-based stateful tests (where it has been very useful). In this issue I propose adding swarm testing logic in three more areas, though still without a public API.

st.one_of()

This is perhaps the most obvious place to add swarm testing - just disable a random subset of the strategies being combined, for each test case. one_of() is also used commonly enough that doing so might have performance implications, but "measure, don't guess" - and improved example quality may justify a slight slowdown anyway.
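A minimal sketch of the idea, using only public Hypothesis APIs - the `swarm_one_of()` helper is hypothetical and for illustration only; the real implementation would live inside OneOfStrategy and cooperate with the shrinker rather than going through st.shared():

```python
from hypothesis import strategies as st

def swarm_one_of(*alternatives):
    # Hypothetical helper, for illustration only.  st.shared() pins one
    # randomly chosen nonempty subset of the alternatives per test case,
    # so every draw within that test case sees the same reduced "swarm".
    swarm = st.shared(
        st.sets(st.sampled_from(range(len(alternatives))), min_size=1)
    )
    return swarm.flatmap(
        lambda on: st.one_of(*[alternatives[i] for i in sorted(on)])
    )

# Lists drawn this way are less "mixed" than with plain st.one_of():
# all-integer or all-text lists show up far more often.
homogeneous_ish = st.lists(swarm_one_of(st.integers(), st.text(), st.booleans()))
```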

In conversation with @Stranger6667 we estimated that this would cover most downstream use-cases, which makes me inclined to keep swarm testing as an implementation detail with no public API, at least for now.

Unicode strings (i.e. st.characters())

AKA #1401. This is a little trickier: we'd be making many swarm decisions (hence a high ratio of metadata overhead to actual generated data), and the "shrink open" trick would need several layers. Performance is more likely to be a problem here. I can imagine memoizing our way out of that with chained lookups and the "make your own luck" trick, but we'll see.
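To illustrate at the coarsest granularity - one per-test-case swarm over Unicode general categories, rather than the finer-grained lazy decisions a real implementation would need - something like this hypothetical sketch:

```python
from hypothesis import strategies as st

# Illustrative subset of Unicode general categories; a real
# implementation would cover them all, and make these decisions
# lazily inside st.characters() to keep the metadata overhead down.
CATEGORIES = ("Lu", "Ll", "Nd", "Po", "Zs", "So")

def swarm_text():
    # Hypothetical helper: share one subset of categories per test case.
    cats = st.shared(st.sets(st.sampled_from(CATEGORIES), min_size=1))
    return cats.flatmap(
        lambda on: st.text(st.characters(whitelist_categories=sorted(on)))
    )
```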

from_lark() and grammar-based strategies

This is the original use-case for swarm testing, in CSmith, and I'd really like it to work for hypothesmith.

The complexity here is that we would want to analyse the grammar to decide the order in which to consider disabling production rules, and also ensure that the logic is aware of dependencies between productions. I'm pretty sure that I've seen John Regehr write about this somewhere, but can't find the paper or post now.
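As a sketch of what that analysis might look like - on a toy dict-based grammar representation rather than Lark's internals, with hypothetical helper names - a rule is only safe to disable if the start rule stays producible afterwards:

```python
def producible_rules(grammar, disabled):
    # Fixed-point computation: which rules can still derive a string of
    # terminals once the `disabled` rules are removed?  A grammar here
    # maps each rule name to a list of alternatives, each alternative
    # being a list of symbols (rule names or terminals).
    ok = set()
    changed = True
    while changed:
        changed = False
        for rule, alts in grammar.items():
            if rule in ok or rule in disabled:
                continue
            for alt in alts:
                if all(sym in ok or sym not in grammar for sym in alt):
                    ok.add(rule)
                    changed = True
                    break
    return ok

def safe_to_disable(grammar, start, candidate, disabled=frozenset()):
    # True if disabling `candidate` on top of the already-disabled rules
    # still leaves the start rule able to produce at least one string.
    return start in producible_rules(grammar, set(disabled) | {candidate})

# e.g. disabling "term" here is unsafe, because "expr" is then left
# with no producible alternative:
grammar = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["NUMBER"], ["(", "expr", ")"]],
}
assert not safe_to_disable(grammar, "expr", "term")
```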
