Weighting of emoji in text strategies?

I was recently trying to test a bug that *seemed* to manifest only when the input contained emoji (and not necessarily other high Unicode code points). I assumed that `strategies.text()` would naturally include some emoji, but I've found that the following tests fail only about 25% of the time (clearing `.hypothesis` between runs):

```python
from hypothesis import given, settings
from hypothesis import strategies as st

import emoji

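@given(txt=st.text())
def test_contains_no_emoji(txt):
    # Fails only when the generated text contains at least one emoji character.
    assert not any(char in emoji.UNICODE_EMOJI for char in txt)

@given(c=st.characters())
def test_not_emoji(c):
    # Fails only when the single generated character is an emoji.
    assert c not in emoji.UNICODE_EMOJI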
```

There are 2623 keys in `emoji.UNICODE_EMOJI`. It seems to me that emoji are probably a very common source of Unicode characters in real-world text and may merit special handling. One problem I see is that emoji do not form a contiguous block, and some are built by combining multiple code points, so they are not easy to specify with a filter on the code point range.
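To illustrate the point about ranges, here is a minimal sketch that prints the code points behind a few emoji. The `samples` dictionary and the `codepoints` helper are hypothetical names picked for this example, and the four emoji are a hand-picked selection, not taken from the `emoji` package.

```python
# Hand-picked emoji (an assumption for illustration, not the emoji package's data).
samples = {
    "sun": "\u2600",                                      # Miscellaneous Symbols block
    "slightly smiling face": "\U0001F642",                # Emoticons block
    "thumbs up, medium skin tone": "\U0001F44D\U0001F3FD",  # base + skin tone modifier
    "flag: France": "\U0001F1EB\U0001F1F7",               # two regional indicators
}


def codepoints(s):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


for label, s in samples.items():
    # len(s) counts code points, so multi-code-point emoji report more than 1.
    print(label, codepoints(s), len(s))
```

The single-code-point examples sit far apart (U+2600 versus U+1F642), and the last two are sequences of two code points each, so a per-character strategy such as `st.characters()` cannot produce them as a unit.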

Is it worth more heavily weighting emoji in the draw, or possibly adding some sort of special case strategy for retrieving emoji?
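
As a rough idea of what a special-case strategy could look like in user code today, here is a sketch that mixes emoji into ordinary text. `EMOJI_SAMPLE`, `emoji_strings`, and `text_with_emoji` are hypothetical names, not part of Hypothesis, and the hard-coded list is only a stand-in for something like the keys of `emoji.UNICODE_EMOJI`.

```python
from hypothesis import given
from hypothesis import strategies as st

# Hypothetical stand-in for a full emoji list; includes multi-code-point sequences.
EMOJI_SAMPLE = [
    "\U0001F642",            # slightly smiling face
    "\u2600\uFE0F",          # sun with emoji variation selector
    "\U0001F44D\U0001F3FD",  # thumbs up with skin tone modifier
    "\U0001F1EB\U0001F1F7",  # flag built from two regional indicators
]

# Each draw yields one whole emoji, even if it spans several code points.
emoji_strings = st.sampled_from(EMOJI_SAMPLE)


def text_with_emoji(max_chunks=5):
    """Build strings by joining chunks that are either plain text or one emoji."""
    chunk = st.one_of(st.text(max_size=5), emoji_strings)
    return st.lists(chunk, max_size=max_chunks).map("".join)


@given(txt=text_with_emoji())
def test_roundtrip(txt):
    # Placeholder property; swap in the code under test.
    assert txt.encode("utf-8").decode("utf-8") == txt
```

This does not change how `st.text()` weights its draws; it only shows that `st.sampled_from` combined with `st.one_of` can raise the share of emoji in generated strings without any change to the library.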

Weighting of emoji in text strategies? #1401
