
Python interpreter and codecs module don't recognize the unicode escape \u{xxx} #129392



Closed
mrolle45 opened this issue Jan 28, 2025 · 3 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (A feature request or enhancement)


@mrolle45

mrolle45 commented Jan 28, 2025

Bug report

Bug description:

>>> codecs.decode('\u{0041}',encoding='unicode-escape')
  File "<python-input-2>", line 1
    codecs.decode('\u{0041}',encoding='unicode-escape')
                  ^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
>>> codecs.decode('\u0041',encoding='unicode-escape')
'A'
>>> '\u{0041}'
  File "<python-input-4>", line 1
    '\u{0041}'
    ^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
>>> '\u0041'
'A'

I'm not sure when the \u{xxx} escape was introduced to the world, but it should be recognized by codecs.decode and the Python interpreter. The above sample was run on Python 3.13.1 on Windows.

CPython versions tested on:

3.13

Operating systems tested on:

No response

mrolle45 added the type-bug label (An unexpected behavior, bug, or error) Jan 28, 2025
terryjreedy added the pending label (The issue will be closed if no feedback is provided) and removed the type-bug label Jan 28, 2025
@terryjreedy
Member

The codecs call is irrelevant: the SyntaxError is raised when the string literal '\u{0041}' is compiled, before codecs.decode ever runs. You entered an invalid string literal. You need to either omit the braces, as you later did ('\u0041'), or use '\N{CHARNAME}', where 'CHARNAME' is a name recognized in the Unicode database. See https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences.
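A short sketch of the alternatives mentioned above, using only the standard library. Each literal spells U+0041; the `\x`, `\u` and `\U` forms take exactly 2, 4 and 8 hex digits respectively. The `spellings` dictionary is only illustrative. Passing a raw bytes literal to `codecs.decode` shows that the codec itself accepts `\u0041`; with the braced form, the failure moves from the compiler to the codec, which raises `UnicodeDecodeError` at run time.

```python
import codecs

# Equivalent ways to write U+0041 (LATIN CAPITAL LETTER A) in a str literal.
spellings = {
    r"\x41": "\x41",                 # exactly 2 hex digits
    r"\u0041": "\u0041",             # exactly 4 hex digits
    r"\U00000041": "\U00000041",     # exactly 8 hex digits
    r"\N{LATIN CAPITAL LETTER A}": "\N{LATIN CAPITAL LETTER A}",  # by name
}
for source, value in spellings.items():
    print(f"{source:28} -> {value!r}")

# The backslash only reaches codecs.decode if the literal does not consume
# it, e.g. with a raw bytes literal.
print(codecs.decode(rb"\u0041", "unicode-escape"))  # A

# With the braced form, the codec (not the compiler) now rejects the input.
try:
    codecs.decode(rb"\u{0041}", "unicode-escape")
except UnicodeDecodeError as exc:
    print("codec error:", exc.reason)
```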

@ericvsmith
Member

I agree this isn't a bug.

@mrolle45 : What's your evidence that this escape sequence was "introduced to the world"? And even if it were, we'd need more motivation than that to add it to Python. What problem would it solve?

@picnixz
Member

picnixz commented Jan 31, 2025

I don't see why a hexadecimal ordinal value should be treated the same way as a named entity, that is, why \u{XXXX} would be better than \uXXXX.

I will close this one as "not planned", since the brace-style syntax should be reserved for named entities rather than hexadecimal values. If the need arises, please first open a thread on Discourse (DPO, discuss.python.org) with sufficient evidence that real-world applications would benefit from this feature.
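For anyone who meets the braced form in existing text today (for example, escapes copied from JavaScript or Rust source, both of which use `\u{...}`), a user-side preprocessing step is one possible workaround. The following is a minimal sketch under that assumption: `expand_braced_escapes` is a hypothetical helper, not part of CPython or the `codecs` module, and it assumes 1 to 6 hex digits inside the braces.

```python
import re

# Hypothetical helper, NOT part of CPython: expand brace-style code point
# escapes such as \u{41} or \u{1F600} (1 to 6 hex digits) in decoded text.
_BRACED_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")


def expand_braced_escapes(text: str) -> str:
    def replace(match: re.Match[str]) -> str:
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:  # beyond the Unicode code space
            raise ValueError(f"code point out of range: {match.group(0)}")
        return chr(code_point)

    return _BRACED_ESCAPE.sub(replace, text)


print(expand_braced_escapes(r"\u{41}\u{1F600}"))
```

This sketch does not treat a doubled backslash specially, so in `\\u{41}` the second backslash and `u{41}` are still expanded; a real tool would have to decide how escaped backslashes should behave.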

picnixz closed this as not planned Jan 31, 2025
picnixz added the type-feature and interpreter-core labels and removed the pending label Jan 31, 2025
picnixz marked this as a duplicate of #130475 Feb 22, 2025
Development

No branches or pull requests

4 participants