Python interpreter and codecs module don't recognize unicode escape \u{xxx}. #129392

mrolle45 · 2025-01-28T09:17:58Z

Bug report

Bug description:

>>> codecs.decode('\u{0041}',encoding='unicode-escape')
  File "<python-input-2>", line 1
    codecs.decode('\u{0041}',encoding='unicode-escape')
                  ^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
>>> codecs.decode('\u0041',encoding='unicode-escape')
'A'
>>> '\u{0041}'
  File "<python-input-4>", line 1
    '\u{0041}'
    ^^^^^^^^^^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
>>> '\u0041'
'A'

I'm not sure when the \u{xxx} escape was introduced to the world, but it should be recognized by codecs.decode and the Python interpreter. The above sample was run on Python 3.13.1 on Windows.

CPython versions tested on:

3.13

Operating systems tested on:

No response


terryjreedy · 2025-01-28T11:00:43Z

The codecs call is irrelevant. You entered an invalid string literal. You either need to omit the braces, as you later did, or use '\N{CHARNAME}', where 'CHARNAME' is a recognized name in the Unicode database. See https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences.
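
As a minimal sketch of the point above (the variable names and the `<example>` filename are chosen only for illustration), the following shows the escape forms the lexer accepts for U+0041, and that the braced form fails when the literal is compiled, before `codecs.decode` is ever called. Passing the same sequence as raw bytes does reach the codec, which rejects it as well.

```python
import codecs

# The three escape forms the lexer accepts for U+0041.
assert "\u0041" == "A"                      # exactly four hex digits
assert "\U00000041" == "A"                  # exactly eight hex digits
assert "\N{LATIN CAPITAL LETTER A}" == "A"  # braces hold a Unicode name

literal_error = None
codec_error = None

# The braced form is rejected while the literal is compiled, so the
# codecs.decode call in the original report is never reached.
try:
    compile(r"s = '\u{0041}'", "<example>", "exec")
except SyntaxError as exc:
    literal_error = type(exc).__name__

# A raw bytes literal keeps the backslash sequence intact, so this time
# the codec itself sees it and rejects it.
try:
    codecs.decode(rb"\u{0041}", "unicode_escape")
except UnicodeDecodeError as exc:
    codec_error = type(exc).__name__

# The brace-free form decodes as expected.
decoded = codecs.decode(rb"\u0041", "unicode_escape")
```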

ericvsmith · 2025-01-29T02:28:23Z

I agree this isn't a bug.

@mrolle45 : What's your evidence that this escape sequence was "introduced to the world"? And even if it were, we'd need more motivation than that to add it to Python. What problem would it solve?

picnixz · 2025-01-31T20:22:06Z

I don't know why the hexadecimal ordinal value should be considered the same as a named entity, namely why \U{XXXX} would be better than \UXXXX.

I will close this one as "not planned" as the bracket-style syntax should be reserved for named entities and not hexadecimal values. If needs arise, please open a DPO thread first with sufficient evidence that real-world applications would benefit from this feature.
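
For anyone who receives text containing `\u{xxx}` sequences from another language or tool, one possible workaround is to expand them in user code. The helper below is a hypothetical sketch, not part of the standard library or of any proposal in this thread; the name `expand_braced_escapes` and the choice to accept 1 to 6 hex digits are assumptions made for illustration.

```python
import re

# Hypothetical helper: matches a backslash, "u", and 1 to 6 hex digits
# wrapped in braces, e.g. \u{41} or \u{1F600}.
_BRACED_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")


def expand_braced_escapes(text: str) -> str:
    """Replace each \\u{X...} sequence in text with its character."""

    def _replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            # Outside the Unicode range: leave the sequence untouched.
            return match.group(0)
        return chr(code_point)

    return _BRACED_ESCAPE.sub(_replace, text)


# Raw string, so the backslashes reach the helper unchanged.
print(expand_braced_escapes(r"\u{0041} and \u{1F600}"))
```

This sketch does not treat a doubled backslash as an escaped backslash, so `\\u{41}` would still be expanded; a real implementation would need to decide how to handle that case.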

mrolle45 added the type-bug label Jan 28, 2025

terryjreedy added the pending label and removed the type-bug label Jan 28, 2025

picnixz closed this as not planned Jan 31, 2025

picnixz added the type-feature and interpreter-core labels and removed the pending label Jan 31, 2025

picnixz added this to Codecs and encodings issues Jan 31, 2025

picnixz moved this to Done in Codecs and encodings issues Jan 31, 2025

picnixz marked this as a duplicate of #130475 Feb 22, 2025

picnixz mentioned this issue Feb 22, 2025

codecs module doesn't recognize new C++ 23 universal-character-name \u{xxx}. #130475

Closed
