
Unicode characters ≥ 0x10000 cannot be input / behave unusually in the REPL terminal #136595

@haydenwong7bm


Bug report

Bug description:

With the machine locale set to UTF-8, entering a Unicode character ≥ 0x10000 gives the following:
In CPython 3.13.5:
https://github.com/user-attachments/assets/7777b063-76fe-4929-b854-cae7d61807d2
In CPython 3.14.0b4:

>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
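The last frame fails because the C tokenizer behind `tokenize` has to encode the REPL buffer as UTF-8, and the strict UTF-8 codec refuses surrogate code points. A minimal sketch reproducing the same error outside the REPL, assuming the buffer holds the surrogate pair U+D83D U+DE00 (an arbitrary example, the UTF-16 form of U+1F600):

```python
# Hypothetical buffer: the two UTF-16 code units of U+1F600, stored as
# two separate code points in a Python str.
buffer = "\ud83d\ude00"

error = None
try:
    buffer.encode("utf-8")  # strict UTF-8 rejects surrogate code points
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; `exc` is unbound after the except block

print(error)
# 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
```

The printed message is the same as the last line of each traceback.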

The character was entered as two surrogates (a UTF-16 surrogate pair), so two UnicodeEncodeErrors were raised.
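For background: UTF-16 stores a code point ≥ 0x10000 as a high surrogate followed by a low surrogate, and a Python str built from those two code units keeps them as two separate characters. The sketch below shows that split and one possible way to join such pairs back into a single code point before the text reaches the tokenizer. Both helper names, `to_surrogate_pair` and `join_surrogates`, are hypothetical, and this is only an illustration, not necessarily how CPython addresses the issue.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point >= 0x10000 into UTF-16 surrogates (hypothetical helper)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary planes")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def join_surrogates(text: str) -> str:
    """Combine surrogate pairs in *text* into single code points (hypothetical helper).

    Round-tripping through UTF-16 with the 'surrogatepass' error handler lets
    valid pairs decode to one character and leaves unpaired surrogates as-is.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")


high, low = to_surrogate_pair(0x1F600)
buffer = chr(high) + chr(low)          # two code units, as a UTF-16 source delivers them
fixed = join_surrogates(buffer)
print(hex(high), hex(low))             # 0xd83d 0xde00
print(len(buffer), len(fixed))         # 2 1
print(fixed.encode("utf-8"))           # b'\xf0\x9f\x98\x80'
```

Joining the pair first means the later UTF-8 encode sees one valid code point instead of two surrogates.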

CPython versions tested on:

3.13, 3.14

Operating systems tested on:

Windows

Labels: OS-windows, stdlib (Python modules in the Lib dir), topic-repl (Related to the interactive shell), topic-unicode, type-bug (An unexpected behavior, bug, or error)