Weighting of emoji in text strategies? #1401

Closed
@pganssle

I was recently trying to test a bug that seemed to manifest only when the input contained emoji (and not necessarily other high Unicode code points). I assumed that strategies.text() would naturally include some emoji, but I've found that the following tests fail only about 25% of the time (clearing .hypothesis between runs):

from hypothesis import given, settings
from hypothesis import strategies as st

import emoji

@given(txt=st.text())
def test_contains_no_emoji(txt):
    # Fails only if the generated text contains at least one emoji character
    assert not any(char in emoji.UNICODE_EMOJI for char in txt)

@given(c=st.characters())
def test_not_emoji(c):
    # Fails only if the generated character is an emoji
    assert c not in emoji.UNICODE_EMOJI

There are 2623 keys in emoji.UNICODE_EMOJI. It seems to me that emoji are probably a very common source of Unicode characters and may deserve special handling. One problem I see is that emoji do not form a contiguous block, and some are built by combining multiple code points, so it's not easy to select them by filtering on a codepoint range.
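
To illustrate the point about non-contiguous ranges and multi-codepoint emoji, here is a small standard-library sketch. The four sample emoji and the helper name `code_points` are chosen only for illustration and are not taken from the emoji package.

```python
# A few emoji written as explicit code point escapes (illustrative sample only)
samples = {
    "grinning face": "\U0001F600",
    # heavy black heart followed by variation selector-16
    "red heart": "\u2764\uFE0F",
    # two regional indicator symbols (U and S)
    "flag: United States": "\U0001F1FA\U0001F1F8",
    # man, woman, girl joined by zero width joiners (U+200D)
    "family: man, woman, girl": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}


def code_points(s):
    """Return the code points of s formatted as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]


for name, s in samples.items():
    print(f"{name}: {len(s)} code point(s): {' '.join(code_points(s))}")
```

The samples come from different parts of the code space (U+2764 versus the U+1F000 range), and three of the four are more than one code point long. A per-character check like the one in test_contains_no_emoji can therefore never match a multi-codepoint key as a whole.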

Is it worth more heavily weighting emoji in the draw, or possibly adding some sort of special case strategy for retrieving emoji?
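
As a rough idea of what a special-case strategy might look like, here is a minimal sketch built only from existing Hypothesis primitives. The names `SAMPLE_EMOJI`, `emoji_strings`, and `text_with_emoji` are hypothetical, and the hand-picked list stands in for whatever emoji source would actually be used (for example the keys of emoji.UNICODE_EMOJI).

```python
from hypothesis import given
from hypothesis import strategies as st

# Hypothetical hand-picked sample; a real version would use a fuller emoji list
SAMPLE_EMOJI = [
    "\U0001F600",                                   # grinning face
    "\U0001F389",                                   # party popper
    "\u2764\uFE0F",                                 # red heart (2 code points)
    "\U0001F1FA\U0001F1F8",                         # flag (2 code points)
    "\U0001F468\u200D\U0001F469\u200D\U0001F467",   # family (5 code points)
]

# Draws one whole emoji, which may be several code points long
emoji_strings = st.sampled_from(SAMPLE_EMOJI)

# Mix ordinary characters and emoji, then join the pieces into one string.
# lists(...).map("".join) is used rather than text(alphabet=...) because
# some of the emoji are longer than a single character.
text_with_emoji = st.lists(
    st.one_of(st.characters(), emoji_strings)
).map("".join)


@given(txt=text_with_emoji)
def test_handles_text_with_emoji(txt):
    # Placeholder property: replace with the code under test
    assert isinstance(txt, str)
```

Because one_of picks between its branches, roughly half of the drawn elements come from the emoji list, so emoji show up in most non-trivial examples instead of only occasionally.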

    Labels

    docs: documentation could *always* be better
    enhancement: it's not broken, but we want it to be better