Bug report
Bug description:
With the machine locale set to UTF-8, inputting a Unicode character ≥ 0x10000 at the REPL prompt gives the following:
In CPython 3.13.5:
https://github.com/user-attachments/assets/7777b063-76fe-4929-b854-cae7d61807d2
In CPython 3.14.0b4:
>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
Two surrogates were "inputted", hence the two UnicodeEncodeErrors.
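For reference, here is a minimal sketch of how a character ≥ 0x10000 maps to a UTF-16 surrogate pair, and why a `str` holding those surrogates as separate code points cannot be encoded as UTF-8. This is not the `_pyrepl` code path. The helper names `split_into_surrogates` and `encode_error` are made up for illustration, and U+1F600 is just an example character.

```python
from typing import Optional, Tuple


def split_into_surrogates(ch: str) -> Tuple[str, str]:
    """Return the UTF-16 (high, low) surrogate pair for a non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is inside the BMP; no surrogates needed")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return chr(high), chr(low)


def encode_error(text: str) -> Optional[str]:
    """Return the UnicodeEncodeError reason for UTF-8 encoding, or None on success."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


high, low = split_into_surrogates("\U0001F600")

# A lone surrogate, or two surrogates stored as separate code points,
# is rejected by the strict UTF-8 codec.
print(encode_error(high))
print(encode_error(high + low))

# The real code point encodes without error.
print(encode_error("\U0001F600"))

# Round-tripping through UTF-16 with "surrogatepass" merges the pair
# back into a single code point.
joined = (high + low).encode("utf-16-le", "surrogatepass").decode("utf-16-le")
print(joined == "\U0001F600")
```

If the console delivers the two UTF-16 code units as separate key events (an assumption, not confirmed here), the input buffer would hold surrogate code points like `high` and `low` above, which matches the "surrogates not allowed" message in the traceback.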
CPython versions tested on:
3.13, 3.14
Operating systems tested on:
Windows