Better diagnostics for RUF039 #16713

Open
@InSyncWithFoo

This snippet has 10 RUF039 diagnostics (playground), of which only two can be automatically fixed:

import re

re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U00002702-\U000027B0"
    "\U000024C2-\U0001F251"
    "\u200d"  # zero width joiner
    "\u200c"  # zero width non-joiner
    "]+",
    flags=re.UNICODE,
)
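
As a rough way to see where those numbers come from, the snippet can be tokenized and its string literals counted. This is only a sketch using the standard `tokenize` module, not Ruff's implementation; `SNIPPET`, `string_parts`, `parts`, and `no_backslash` are names made up for illustration, and the "no backslash means fixable" check simply mirrors the behaviour described in this issue.

```python
import io
import tokenize

# The snippet from this issue as source text. A raw triple-quoted string keeps
# the backslashes intact so the tokenizer sees them exactly as written.
SNIPPET = r'''
re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U00002702-\U000027B0"
    "\U000024C2-\U0001F251"
    "\u200d"  # zero width joiner
    "\u200c"  # zero width non-joiner
    "]+",
    flags=re.UNICODE,
)
'''


def string_parts(source: str) -> list[str]:
    """Return the source text of every string literal token in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]


# Every non-raw part of the implicit concatenation gets its own diagnostic.
parts = string_parts(SNIPPET)

# Assumption taken from this issue: only parts without a backslash get a fix.
no_backslash = [part for part in parts if "\\" not in part]

print(len(parts))
print(no_backslash)
```

Under these assumptions, the ten literals correspond to the ten diagnostics, and the two backslash-free parts (the opening and closing brackets) are the only ones with a fix.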

There are a few improvements that can be made:

  • The current logic does not provide a fix for string literals that contain backslashes.

    However, almost all escape sequences are supported by re and mean the same thing there as in normal strings. The exceptions are \b, which means either "word boundary" or "backspace" depending on context, and \N{}, which re did not support until Python 3.8. Therefore, although the fix changes the runtime representation of the pattern (re.compile(r'\u00A0') != re.compile('\u00A0')), the semantics of the regular expression are retained. In such cases, Ruff should offer an unsafe fix.

  • The number of diagnostics is not ideal.

    When a string is implicitly concatenated, only one diagnostic should be emitted, covering all parts, and its fix should likewise fix all of them at once. Ruff should fall back to multiple diagnostics only when some, but not all, of the parts are non-raw.
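
The claim in the first point can be checked with a small script. This is a minimal sketch rather than a test from Ruff; the helper `matches_same` and the sample values are chosen purely for illustration, and it only covers a handful of escapes.

```python
import re


def matches_same(normal: str, raw: str, sample: str) -> bool:
    """Return True if both spellings of a pattern fully match `sample`."""
    return (
        re.fullmatch(normal, sample) is not None
        and re.fullmatch(raw, sample) is not None
    )


# The pattern text differs, but the regex matches the same character.
for normal, raw in [
    ("\u00A0", r"\u00A0"),
    ("\U0001F600", r"\U0001F600"),
    ("\x41", r"\x41"),
    ("\t", r"\t"),
]:
    assert normal != raw
    assert matches_same(normal, raw, normal)

# The compiled objects are not equal, because the pattern strings differ.
normal_cls = re.compile("[\U0001F600-\U0001F64F]+")
raw_cls = re.compile(r"[\U0001F600-\U0001F64F]+")
assert normal_cls != raw_cls
assert normal_cls.fullmatch("\U0001F600\U0001F64F") is not None
assert raw_cls.fullmatch("\U0001F600\U0001F64F") is not None

# The \b exception: outside a character class, "\b" is a literal backspace,
# while r"\b" is a word boundary, so adding the r prefix changes the meaning.
assert re.search("\b", "a b") is None
assert re.search(r"\b", "a b") is not None

# Inside a character class, both spellings mean backspace.
assert matches_same("[\b]", r"[\b]", "\b")
```

The inequality of the compiled patterns is why such a fix would be unsafe rather than safe: code that inspects `.pattern` or compares pattern objects would observe the change even though matching behaviour stays the same.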

(This issue is a follow-up to #16644.)

Labels: help wanted, rule
