Description
This snippet has 10 RUF039 diagnostics (playground), of which only two can be automatically fixed:
re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U00002702-\U000027B0"
    "\U000024C2-\U0001F251"
    "\u200d"  # zero width joiner
    "\u200c"  # zero width non-joiner
    "]+",
    flags=re.UNICODE,
)
There are a few improvements that can be made:
- The current logic does not provide a fix for string literals that contain backslashes.
  However, save for \b, which can mean either "word boundary" or "backspace" depending on context, and \N{}, which is not supported by re until 3.8, all other escape sequences are supported and mean the same thing as they would in normal strings. Therefore, while the fix will change the actual runtime representation (re.compile(r'\u00A0') != re.compile('\u00A0')), the regular expression semantics will be retained. In such cases, Ruff should offer an unsafe fix.
- The number of diagnostics is not ideal.
  When a string is implicitly concatenated, only one diagnostic should be emitted, encompassing all parts; the fix should likewise fix all of them at once. Ruff should only resort to multiple diagnostics in cases where only some of the parts are not raw.
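To illustrate the first point, here is a minimal sketch of how the raw and non-raw forms compare at runtime. The variable names are made up for illustration, and the checks only cover the escapes shown here, not every escape sequence.

```python
import re

# Non-raw: Python resolves the escape, so re receives the single character U+00A0.
cooked = re.compile("\u00A0")
# Raw: re receives the six characters \u00A0 and resolves the escape itself.
raw = re.compile(r"\u00A0")

# The runtime representation differs...
assert cooked.pattern != raw.pattern
assert cooked != raw
# ...but both patterns match the same text.
assert cooked.fullmatch("\u00A0") and raw.fullmatch("\u00A0")

# The same holds for a character-class range like the ones in the snippet.
cooked_range = re.compile("[\U0001F600-\U0001F64F]+")
raw_range = re.compile(r"[\U0001F600-\U0001F64F]+")
sample = "ok \U0001F600\U0001F64F!"
assert cooked_range.findall(sample) == raw_range.findall(sample)

# The \b exception: a backspace character when Python resolves it,
# a word boundary when re resolves it (outside a character class).
backspace_hit = re.search("\b", "a\bb")   # literal backspace is found
backspace_miss = re.search("\b", "ab")    # no backspace in the text
boundary_hit = re.search(r"\b", "ab")     # word boundary before "a"
assert backspace_hit is not None
assert backspace_miss is None
assert boundary_hit is not None
```

The \b case is one where adding an r prefix changes what the pattern matches, which is why a fix for literals containing backslashes is proposed as unsafe rather than safe.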
(This issue is a follow-up to #16644.)