Description
I was recently trying to reproduce a bug that seemed to manifest only on emoji (and not necessarily on other high-Unicode codepoints). I figured that `strategies.text()` would naturally include some emoji, but I've found that the following tests only fail about 25% of the time (clearing `.hypothesis` between runs):
```python
from hypothesis import given
from hypothesis import strategies as st
import emoji


@given(txt=st.text())
def test_contains_no_emoji(txt):
    assert not any(char in emoji.UNICODE_EMOJI for char in txt)


@given(c=st.characters())
def test_not_emoji(c):
    assert c not in emoji.UNICODE_EMOJI
```
There are 2623 keys in `emoji.UNICODE_EMOJI`. It seems to me that emoji are probably a very common source of Unicode characters in real-world input, so they may deserve special handling. One problem I see with this is that emoji do not form a contiguous block, and some are created by combining multiple codepoints, so it's not easy to target them with a simple filter on a codepoint range.
Is it worth weighting emoji more heavily in the draw, or possibly adding some sort of special-case strategy for generating emoji?
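In the meantime, one workaround is to build such a strategy explicitly with `st.sampled_from`. This is a sketch, not an existing Hypothesis API: `EMOJI_SAMPLE` below is a hypothetical stand-in for something like `sorted(emoji.UNICODE_EMOJI)`, hard-coded here so the snippet is self-contained.

```python
from hypothesis import given
from hypothesis import strategies as st

# Hypothetical stand-in for sorted(emoji.UNICODE_EMOJI); a real test
# would build this list from the emoji package.
EMOJI_SAMPLE = [
    "\U0001F600",                                   # 😀
    "\U0001F44D",                                   # 👍
    "\U0001F468\u200D\U0001F469\u200D\U0001F467",   # 👨‍👩‍👧 (ZWJ sequence)
]

# Draw individual emoji, or mix runs of emoji into ordinary text
# so both cases are exercised.
emoji_chars = st.sampled_from(EMOJI_SAMPLE)
emoji_text = st.text() | st.lists(emoji_chars).map("".join)

@given(s=emoji_text)
def test_handles_emoji(s):
    # Replace with the real assertion under test.
    assert isinstance(s, str)
```

Because `sampled_from` draws whole strings, multi-codepoint ZWJ sequences survive intact, which a per-codepoint `characters()` filter could never guarantee.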