Odd behavior from pasting large text in the new REPL #119517

treyhunner · 2024-05-24T18:27:40Z

Bug report

Bug description:

This is a description of a few bugs related to paste mode in the new REPL.

Blank lines

I tried copy-pasting the first 604 lines of this copy of Frankenstein (about 32,000 characters).

After pasting, when I scroll up in my terminal I see 1 additional line above the currently visible lines (the 61 last lines) and then I see 543 blank lines. It seems that the first 543/604 lines show up as blank and the last 61 lines do not.

>>> f = """


[539 more blank lines]


... appreciate the extraordinary merits of this wonderful man.  Sometimes I
... have endeavoured to discover what quality it is which he possesses that
[57 more lines of text]
... embraced the gallant vessel on its course and wrecked it--thus!
... """
>>>

I assume this behavior may have been for the purpose of enhancing performance.
Pasting this text only took about 1 second, which is much faster than a couple weeks ago) but still slower than the old REPL.
I have not yet attempted to reproduce the blank line issue before #119341 was merged to see if it was the cause.

Paste performance and Ctrl+C

When pasting the full 7,652 line (441,033 character) text of Frankenstein takes about 21 seconds.
That's again, considerably faster than it would have been a couple weeks ago, though it does still takes a bit of time.

The most concerning part about the slow pasting is that the terminal is entirely locked during pasting.
Hitting Ctrl+C does not stop pasting and the arrow keys don't work.

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Linked PRs

pablogsal · 2024-06-04T21:05:40Z

~~@treyhunner can you check if the first problem reproduces with the current main?~~

Ah, yeah it reproduces. Unfortunately to fix this we would need some considerable changes to how characters are processed so I am unsure if this will get fixed before 3.13 is out.

The problem is that to speed up pasting we are only refreshing at the end, but that doesn't input the characters in the terminal so terminal scrolling doesn't show them becase they were never written :(

CC @ambv @lysnikolaou

pablogsal · 2024-06-04T21:08:50Z

To fix this we need to get rid of the calc_complete_screen function and replace it for something more efficient

godlygeek · 2024-06-06T01:43:05Z

When pasting the full 7,652 line (441,033 character) text of Frankenstein takes about 21 seconds.

How long does it take if you export PYTHON_BASIC_REPL=1?

* Restore signal handlers for SIGINT and SIGSTOP (Ctrl-C and Ctrl-Z) * Ensure that signals are processed as soon as possible by making reads more efficient. * Protect against invalid state in internal REPL functions when interrumpted. * Do not show extraneous newlines above the scroll buffer when pasting text in the REPL Signed-off-by: Pablo Galindo <[email protected]>

* Remove pyrepl's optimization for self-insert This will be replaced by a less specialized optimization. * Use line-buffering when pyrepl echoes pastes Previously echoing was totally suppressed until the entire command had been pasted and the terminal ended paste mode, but this gives the user no feedback to indicate that an operation is in progress. Drawing something to the screen once per line strikes a balance between perceived responsiveness and performance. * Remove dead code from pyrepl `msg_at_bottom` is always true. * Speed up pyrepl's screen rendering computation The Reader in pyrepl doesn't hold a complete representation of the screen area being drawn as persistent state. Instead, it recomputes it, on each keypress. This is fast enough for a few hundred bytes, but incredibly slow as the input buffer grows into the kilobytes (likely because of pasting). Rather than making some expensive and expansive changes to the repl's internal representation of the screen, add some caching: remember some data from one refresh to the next about what was drawn to the screen and, if we don't find anything that has invalidated the results that were computed last time around, reuse them. To keep this caching as simple as possible, all we'll do is look for lines in the buffer that were above the cursor the last time we were asked to update the screen, and that are still above the cursor now. We assume that nothing can affect a line that comes before both the old and new cursor location without us being informed. Based on this assumption, we can reuse old lines, which drastically speeds up the overwhelmingly common case where the user is typing near the end of the buffer. * Speed up pyrepl prompt drawing Cache the `can_colorize()` call rather than repeatedly recomputing it. This call looks up an environment variable, and is called once per character typed at the REPL. The environment variable lookup shows up as a hot spot when profiling, and we don't expect this to change while the REPL is running. * Speed up pasting multiple lines into the REPL Previously, we were checking whether the command should be accepted each time a line break was encountered, but that's not the expected behavior. In bracketed paste mode, we expect everything pasted to be part of a single block of code, and encountering a newline shouldn't behave like a user pressing <Enter> to execute a command. The user should always have a chance to review the pasted command before running it. * Use a read buffer for input in pyrepl Previously we were reading one byte at a time, which causes much slower IO than necessary. Instead, read in chunks, processing previously read data before asking for more. * Optimize finding width of a single character `wlen` finds the width of a multi-character string by adding up the width of each character, and then subtracting the width of any escape sequences. It's often called for single character strings, however, which can't possibly contain escape sequences. Optimize for that case. * Optimize disp_str for ASCII characters Since every ASCII character is known to display as single width, we can avoid not only the Unicode data lookup in `disp_str` but also the one hidden in `str_width` for them. * Speed up cursor movements in long pyrepl commands When the current pyrepl command buffer contains many lines, scrolling up becomes slow. We have optimizations in place to reuse lines above the cursor position from one refresh to the next, but don't currently try to reuse lines below the cursor position in the same way, so we wind up with quadratic behavior where all lines of the buffer below the cursor are recomputed each time the cursor moves up another line. Optimize this by only computing one screen's worth of lines beyond the cursor position. Any lines beyond that can't possibly be shown by the console, and bounding this makes scrolling up have linear time complexity instead. --------- Signed-off-by: Matt Wozniski <[email protected]> Co-authored-by: Pablo Galindo <[email protected]>

* Remove pyrepl's optimization for self-insert This will be replaced by a less specialized optimization. * Use line-buffering when pyrepl echoes pastes Previously echoing was totally suppressed until the entire command had been pasted and the terminal ended paste mode, but this gives the user no feedback to indicate that an operation is in progress. Drawing something to the screen once per line strikes a balance between perceived responsiveness and performance. * Remove dead code from pyrepl `msg_at_bottom` is always true. * Speed up pyrepl's screen rendering computation The Reader in pyrepl doesn't hold a complete representation of the screen area being drawn as persistent state. Instead, it recomputes it, on each keypress. This is fast enough for a few hundred bytes, but incredibly slow as the input buffer grows into the kilobytes (likely because of pasting). Rather than making some expensive and expansive changes to the repl's internal representation of the screen, add some caching: remember some data from one refresh to the next about what was drawn to the screen and, if we don't find anything that has invalidated the results that were computed last time around, reuse them. To keep this caching as simple as possible, all we'll do is look for lines in the buffer that were above the cursor the last time we were asked to update the screen, and that are still above the cursor now. We assume that nothing can affect a line that comes before both the old and new cursor location without us being informed. Based on this assumption, we can reuse old lines, which drastically speeds up the overwhelmingly common case where the user is typing near the end of the buffer. * Speed up pyrepl prompt drawing Cache the `can_colorize()` call rather than repeatedly recomputing it. This call looks up an environment variable, and is called once per character typed at the REPL. The environment variable lookup shows up as a hot spot when profiling, and we don't expect this to change while the REPL is running. * Speed up pasting multiple lines into the REPL Previously, we were checking whether the command should be accepted each time a line break was encountered, but that's not the expected behavior. In bracketed paste mode, we expect everything pasted to be part of a single block of code, and encountering a newline shouldn't behave like a user pressing <Enter> to execute a command. The user should always have a chance to review the pasted command before running it. * Use a read buffer for input in pyrepl Previously we were reading one byte at a time, which causes much slower IO than necessary. Instead, read in chunks, processing previously read data before asking for more. * Optimize finding width of a single character `wlen` finds the width of a multi-character string by adding up the width of each character, and then subtracting the width of any escape sequences. It's often called for single character strings, however, which can't possibly contain escape sequences. Optimize for that case. * Optimize disp_str for ASCII characters Since every ASCII character is known to display as single width, we can avoid not only the Unicode data lookup in `disp_str` but also the one hidden in `str_width` for them. * Speed up cursor movements in long pyrepl commands When the current pyrepl command buffer contains many lines, scrolling up becomes slow. We have optimizations in place to reuse lines above the cursor position from one refresh to the next, but don't currently try to reuse lines below the cursor position in the same way, so we wind up with quadratic behavior where all lines of the buffer below the cursor are recomputed each time the cursor moves up another line. Optimize this by only computing one screen's worth of lines beyond the cursor position. Any lines beyond that can't possibly be shown by the console, and bounding this makes scrolling up have linear time complexity instead. --------- (cherry picked from commit 32a0fab) Co-authored-by: Matt Wozniski <[email protected]> Signed-off-by: Matt Wozniski <[email protected]> Co-authored-by: Pablo Galindo <[email protected]>

gh-119517: Fixes for pasting in pyrepl (GH-120253) * Remove pyrepl's optimization for self-insert This will be replaced by a less specialized optimization. * Use line-buffering when pyrepl echoes pastes Previously echoing was totally suppressed until the entire command had been pasted and the terminal ended paste mode, but this gives the user no feedback to indicate that an operation is in progress. Drawing something to the screen once per line strikes a balance between perceived responsiveness and performance. * Remove dead code from pyrepl `msg_at_bottom` is always true. * Speed up pyrepl's screen rendering computation The Reader in pyrepl doesn't hold a complete representation of the screen area being drawn as persistent state. Instead, it recomputes it, on each keypress. This is fast enough for a few hundred bytes, but incredibly slow as the input buffer grows into the kilobytes (likely because of pasting). Rather than making some expensive and expansive changes to the repl's internal representation of the screen, add some caching: remember some data from one refresh to the next about what was drawn to the screen and, if we don't find anything that has invalidated the results that were computed last time around, reuse them. To keep this caching as simple as possible, all we'll do is look for lines in the buffer that were above the cursor the last time we were asked to update the screen, and that are still above the cursor now. We assume that nothing can affect a line that comes before both the old and new cursor location without us being informed. Based on this assumption, we can reuse old lines, which drastically speeds up the overwhelmingly common case where the user is typing near the end of the buffer. * Speed up pyrepl prompt drawing Cache the `can_colorize()` call rather than repeatedly recomputing it. This call looks up an environment variable, and is called once per character typed at the REPL. The environment variable lookup shows up as a hot spot when profiling, and we don't expect this to change while the REPL is running. * Speed up pasting multiple lines into the REPL Previously, we were checking whether the command should be accepted each time a line break was encountered, but that's not the expected behavior. In bracketed paste mode, we expect everything pasted to be part of a single block of code, and encountering a newline shouldn't behave like a user pressing <Enter> to execute a command. The user should always have a chance to review the pasted command before running it. * Use a read buffer for input in pyrepl Previously we were reading one byte at a time, which causes much slower IO than necessary. Instead, read in chunks, processing previously read data before asking for more. * Optimize finding width of a single character `wlen` finds the width of a multi-character string by adding up the width of each character, and then subtracting the width of any escape sequences. It's often called for single character strings, however, which can't possibly contain escape sequences. Optimize for that case. * Optimize disp_str for ASCII characters Since every ASCII character is known to display as single width, we can avoid not only the Unicode data lookup in `disp_str` but also the one hidden in `str_width` for them. * Speed up cursor movements in long pyrepl commands When the current pyrepl command buffer contains many lines, scrolling up becomes slow. We have optimizations in place to reuse lines above the cursor position from one refresh to the next, but don't currently try to reuse lines below the cursor position in the same way, so we wind up with quadratic behavior where all lines of the buffer below the cursor are recomputed each time the cursor moves up another line. Optimize this by only computing one screen's worth of lines beyond the cursor position. Any lines beyond that can't possibly be shown by the console, and bounding this makes scrolling up have linear time complexity instead. --------- (cherry picked from commit 32a0fab) Signed-off-by: Matt Wozniski <[email protected]> Co-authored-by: Matt Wozniski <[email protected]> Co-authored-by: Pablo Galindo <[email protected]>

* Remove pyrepl's optimization for self-insert This will be replaced by a less specialized optimization. * Use line-buffering when pyrepl echoes pastes Previously echoing was totally suppressed until the entire command had been pasted and the terminal ended paste mode, but this gives the user no feedback to indicate that an operation is in progress. Drawing something to the screen once per line strikes a balance between perceived responsiveness and performance. * Remove dead code from pyrepl `msg_at_bottom` is always true. * Speed up pyrepl's screen rendering computation The Reader in pyrepl doesn't hold a complete representation of the screen area being drawn as persistent state. Instead, it recomputes it, on each keypress. This is fast enough for a few hundred bytes, but incredibly slow as the input buffer grows into the kilobytes (likely because of pasting). Rather than making some expensive and expansive changes to the repl's internal representation of the screen, add some caching: remember some data from one refresh to the next about what was drawn to the screen and, if we don't find anything that has invalidated the results that were computed last time around, reuse them. To keep this caching as simple as possible, all we'll do is look for lines in the buffer that were above the cursor the last time we were asked to update the screen, and that are still above the cursor now. We assume that nothing can affect a line that comes before both the old and new cursor location without us being informed. Based on this assumption, we can reuse old lines, which drastically speeds up the overwhelmingly common case where the user is typing near the end of the buffer. * Speed up pyrepl prompt drawing Cache the `can_colorize()` call rather than repeatedly recomputing it. This call looks up an environment variable, and is called once per character typed at the REPL. The environment variable lookup shows up as a hot spot when profiling, and we don't expect this to change while the REPL is running. * Speed up pasting multiple lines into the REPL Previously, we were checking whether the command should be accepted each time a line break was encountered, but that's not the expected behavior. In bracketed paste mode, we expect everything pasted to be part of a single block of code, and encountering a newline shouldn't behave like a user pressing <Enter> to execute a command. The user should always have a chance to review the pasted command before running it. * Use a read buffer for input in pyrepl Previously we were reading one byte at a time, which causes much slower IO than necessary. Instead, read in chunks, processing previously read data before asking for more. * Optimize finding width of a single character `wlen` finds the width of a multi-character string by adding up the width of each character, and then subtracting the width of any escape sequences. It's often called for single character strings, however, which can't possibly contain escape sequences. Optimize for that case. * Optimize disp_str for ASCII characters Since every ASCII character is known to display as single width, we can avoid not only the Unicode data lookup in `disp_str` but also the one hidden in `str_width` for them. * Speed up cursor movements in long pyrepl commands When the current pyrepl command buffer contains many lines, scrolling up becomes slow. We have optimizations in place to reuse lines above the cursor position from one refresh to the next, but don't currently try to reuse lines below the cursor position in the same way, so we wind up with quadratic behavior where all lines of the buffer below the cursor are recomputed each time the cursor moves up another line. Optimize this by only computing one screen's worth of lines beyond the cursor position. Any lines beyond that can't possibly be shown by the console, and bounding this makes scrolling up have linear time complexity instead. --------- Signed-off-by: Matt Wozniski <[email protected]> Co-authored-by: Pablo Galindo <[email protected]>

treyhunner added the type-bug An unexpected behavior, bug, or error label May 24, 2024

AlexWaygood added the topic-repl Related to the interactive shell label May 24, 2024

godlygeek mentioned this issue Jun 6, 2024

Deadlock when pasting >5 KiB into an application that doesn't read it fast enough microsoft/terminal#17384

Open

This was referenced Jun 7, 2024

gh-119517: Fix several issues when pasting lot of text in the REPL #120234

Closed

gh-119517: Fixes for pasting in pyrepl #120251

Closed

gh-119517: Fixes for pasting in pyrepl #120253

Merged

bedevere-app bot mentioned this issue Jun 11, 2024

[3.13] gh-119517: Fixes for pasting in pyrepl (GH-120253) #120353

Merged

pablogsal closed this as completed Jun 11, 2024

natedogith1 mentioned this issue Feb 20, 2025

Pasting in new REPL is slow #130328

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Odd behavior from pasting large text in the new REPL #119517

Odd behavior from pasting large text in the new REPL #119517

treyhunner commented May 24, 2024 •

edited by bedevere-app bot

Loading

pablogsal commented Jun 4, 2024 •

edited

Loading

pablogsal commented Jun 4, 2024

godlygeek commented Jun 6, 2024

Odd behavior from pasting large text in the new REPL #119517

Odd behavior from pasting large text in the new REPL #119517

Comments

treyhunner commented May 24, 2024 • edited by bedevere-app bot Loading

Bug report

Bug description:

Blank lines

Paste performance and Ctrl+C

CPython versions tested on:

Operating systems tested on:

Linked PRs

pablogsal commented Jun 4, 2024 • edited Loading

pablogsal commented Jun 4, 2024

godlygeek commented Jun 6, 2024

treyhunner commented May 24, 2024 •

edited by bedevere-app bot

Loading

pablogsal commented Jun 4, 2024 •

edited

Loading