Unexpected behavior for `set` generator #25
Comments
Thank you for your feedback! You are absolutely right. Libraries in other languages, such as Elixir's StreamData and Haskell's Hedgehog, clearly try very hard to generate unique values instead of generating an array and dropping duplicates. A straightforward approach to fixing this is to do what Hedgehog does, and to have …
Actually, looking at the implementation in detail again (it has been a while), we already generate unique elements rather than deduplicating non-unique elements; cf. `Generators::make_array_uniq`. This helper function is what ends up being called when the … Maybe we can play with the … Or maybe an even better alternative would be to introduce a …
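The difference between generating unique elements and deduplicating afterwards can be sketched in Python, for illustration only. The helper `unique_array` and its `max_retries` parameter are hypothetical names, not the library's `Generators::make_array_uniq`: it redraws whenever it hits a value it has already produced, so the output never contains duplicates.

```python
import random
from typing import Callable, List, Set, TypeVar

T = TypeVar("T")


def unique_array(draw: Callable[[random.Random], T], length: int,
                 rng: random.Random, max_retries: int = 100) -> List[T]:
    """Hypothetical helper: build a list of distinct values by redrawing on
    collisions instead of generating duplicates and removing them afterwards."""
    seen: Set[T] = set()
    result: List[T] = []
    retries = 0
    while len(result) < length and retries < max_retries:
        value = draw(rng)
        if value in seen:
            retries += 1  # collision: spend one retry and draw again
            continue
        seen.add(value)
        result.append(value)
    return result


rng = random.Random(0)
print(unique_array(lambda r: r.randint(1, 5), 8, rng))  # at most 5 distinct values
```

When the requested length exceeds the number of distinct values (here 5), this sketch stops once its retry budget is spent, so it typically returns the full set. If the real generator behaves similarly, any requested length above 5 would yield the full set, which would be consistent with the skew described in Example 1 below.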
The `set` generator behaves very naively and therefore produces an unexpected distribution of test cases.

Example 1:
This example generates very few small sets, misses many cases (with n = 100), and very often produces a set containing all 5 items.
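To see why full sets dominate, here is a minimal sketch in Python (for illustration; it is not the library's code). It assumes, hypothetically, that the naive generator draws an array whose length is uniform on 0..max_len, fills it with values drawn uniformly from the 5 items, and then converts it to a set. Under those assumptions it computes the exact distribution of set sizes using inclusion-exclusion.

```python
from fractions import Fraction
from math import comb
from typing import Dict


def surjections(k: int, j: int) -> int:
    """Number of length-k sequences over j symbols that use every symbol
    (inclusion-exclusion)."""
    return sum((-1) ** i * comb(j, i) * (j - i) ** k for i in range(j + 1))


def set_size_distribution(n_values: int, max_len: int) -> Dict[int, Fraction]:
    """Exact P(len(set(arr)) == j) when len(arr) is uniform on 0..max_len and
    each element is drawn uniformly and independently from n_values values."""
    dist = {j: Fraction(0) for j in range(n_values + 1)}
    for k in range(max_len + 1):            # every array length is equally likely
        for j in range(n_values + 1):
            # choose which j values appear, then count arrays that use all of them
            hits = comb(n_values, j) * surjections(k, j)
            dist[j] += Fraction(hits, n_values ** k) / (max_len + 1)
    return dist


for size, p in set_size_distribution(5, 20).items():
    print(size, round(float(p), 3))
```

With `set_size_distribution(5, 20)`, size 5 is the most likely outcome and the empty set has probability 1/21, because a long array almost always touches every item. The real generator's length distribution may differ, so treat this as a qualitative explanation only.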
Example 2:
Adding a size limit makes the generator a bit smarter. The distribution is now better, but still skewed towards bigger sets. I think this skew is reasonable for larger input lists, so I'm not sure I'd call it a bug. In some runs, certain sets were not generated at all. I'm not sure whether that is expected when the number of possible sets (2^5 = 32) is smaller than the number of runs (n = 100). However, this seems acceptable, and it evens out when the test is run multiple times.
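Missing a few sets within 100 runs happens even with an ideal generator. A quick check in Python, assuming a hypothetical generator that picks one of the 2^5 = 32 subsets uniformly and independently on every run (the real generator is not uniform):

```python
from math import comb


def expected_unseen(num_outcomes: int, runs: int) -> float:
    """Expected number of outcomes never drawn in `runs` independent,
    uniformly distributed draws."""
    return num_outcomes * (1 - 1 / num_outcomes) ** runs


def prob_all_seen(num_outcomes: int, runs: int) -> float:
    """Probability that every outcome is drawn at least once
    (inclusion-exclusion over the set of missed outcomes)."""
    return sum((-1) ** i * comb(num_outcomes, i) * (1 - i / num_outcomes) ** runs
               for i in range(num_outcomes + 1))


subsets = 2 ** 5  # all subsets of a 5-element set
print(expected_unseen(subsets, 100), prob_all_seen(subsets, 100))
```

For 32 outcomes and 100 runs, `expected_unseen` gives roughly 1.34 and `prob_all_seen` roughly 0.23, so about three out of four such test runs would skip at least one subset even with perfectly uniform sampling.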
Both cases seem to stem from missing deduplication. The `set` generator naively builds on the `array` generator. For arrays order matters; for sets it doesn't. IMO the `set` generator should deduplicate here to avoid repeatedly producing equivalent sets.