[api-minor] Reduce `postMessage` overhead, in `PartialEvaluator.getTextContent`, by sending text chunks in batches (issue 13962) #13977
Conversation
Force-pushed from c4602e6 to e47b6a9 (compare)
From: Bot.io (Linux m4), Received: Command cmd_preview from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.241.84.105:8877/3499b2436b2e25f/output.txt

From: Bot.io (Linux m4), Success: Full output at http://54.241.84.105:8877/3499b2436b2e25f/output.txt. Total script time: 4.44 mins. Published
[api-minor] Reduce `postMessage` overhead, in `PartialEvaluator.getTextContent`, by sending text chunks in batches (issue 13962)

Following the STR in the issue, this patch reduces the number of `PartialEvaluator.getTextContent`-related `postMessage`-calls by approximately 78 percent.[1]
Note that by enforcing a relatively low value when batching text chunks, we should thus improve worst-case scenarios while not negatively affecting `textLayer` building overall.

While working on these changes I noticed, thanks to our unit tests, that the implementation of the `appendEOL` function unfortunately means that the number and content of the textItems could actually be affected by the particular chunking used. That seems *extremely* unfortunate, since in practice this means that the particular chunking used is observable through the API. Obviously that should be a completely internal implementation detail, which is why this patch also modifies `appendEOL` to mitigate that.[2]

Given that this patch adds a *minimum* batch size in `enqueueChunk`, there's obviously nothing preventing a batch from becoming a lot larger than the limit (depending e.g. on the PDF structure and the CPU load/speed). While sending more text chunks at once isn't an issue in itself, it could become problematic on the main thread during `textLayer` building. Note how both the `PartialEvaluator` and `CanvasGraphics` implementations utilize `Date.now()`-checks, to prevent long-running parsing/rendering from "hanging" the respective thread. In the `textLayer` building we don't utilize such a construction[3], and streaming of textContent is thus essentially acting as a *simple* stand-in for that functionality. Hence we want to avoid choosing a too large minimum batch size, since that could indirectly affect main-thread performance negatively.

---
[1] While it'd be possible to go even lower, that'd likely require more invasive re-factoring/changes to the `PartialEvaluator.getTextContent` code to ensure that the batches don't become too large.

[2] This should also, as far as I can tell, explain some of the regressions observed in the "enhance" text-selection tests back in PR 13257. Looking closer at the `appendEOL` function, it should potentially be changed even more; however, that should probably not be done here.

[3] I'd really like to avoid implementing something like that for the `textLayer` building as well, given that it'd require adding a fair bit of complexity.
…xtContent"-handler

The `MessageHandler` implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here.

Also, in a couple of `MessageHandler` methods, utilize an already existing local variable more.
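For context, the pattern this commit relies on looks roughly like the sketch below; the class and callback names (`StreamSink`, `onPull`, `onCancel`) are illustrative only and are not the exact `MessageHandler` internals:

```js
// Hypothetical sketch: a handler that guards optional sink callbacks.
// Because every call site checks for undefined, callers don't need to
// register explicit no-op functions.
class StreamSink {
  constructor({ onPull, onCancel } = {}) {
    this.onPull = onPull; // may be undefined
    this.onCancel = onCancel; // may be undefined
  }

  pull() {
    if (this.onPull) {
      this.onPull();
    }
    // If no callback was provided, simply do nothing.
  }

  cancel(reason) {
    if (this.onCancel) {
      this.onCancel(reason);
    }
  }
}

// Both are equally valid; the second avoids allocating useless no-ops.
const withNoops = new StreamSink({ onPull: () => {}, onCancel: () => {} });
const withoutNoops = new StreamSink({});
```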
Force-pushed from e47b6a9 to 45ddb12 (compare)
/botio test

From: Bot.io (Linux m4), Received: Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.241.84.105:8877/30ce54cd4242922/output.txt

From: Bot.io (Windows), Received: Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.193.163.58:8877/9b06e67d4ef85b6/output.txt

From: Bot.io (Linux m4), Failed: Full output at http://54.241.84.105:8877/30ce54cd4242922/output.txt. Total script time: 22.01 mins. Image differences available at: http://54.241.84.105:8877/30ce54cd4242922/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows), Failed: Full output at http://54.193.163.58:8877/9b06e67d4ef85b6/output.txt. Total script time: 39.49 mins. Image differences available at: http://54.193.163.58:8877/9b06e67d4ef85b6/reftest-analyzer.html#web=eq.log

/botio-linux preview

From: Bot.io (Linux m4), Received: Command cmd_preview from @timvandermeij received. Current queue size: 0. Live output at: http://54.241.84.105:8877/a38a03a3af1a951/output.txt

From: Bot.io (Linux m4), Success: Full output at http://54.241.84.105:8877/a38a03a3af1a951/output.txt. Total script time: 4.67 mins. Published
Nice work!
@timvandermeij As always, thanks a lot for the reviews :-) Off-topic: How do you feel about a new release soon, since there's been a fair number of …
Following the STR in the issue, this patch reduces the number of `PartialEvaluator.getTextContent`-related `postMessage`-calls by approximately 78 percent.[1]
Note that by enforcing a relatively low value when batching text chunks, we should thus improve worst-case scenarios while not negatively affecting `textLayer` building overall.

While working on these changes I noticed, thanks to our unit tests, that the implementation of the `appendEOL` function unfortunately means that the number and contents of the textItems could actually be affected by the particular chunking used.
That seems extremely unfortunate, since in practice this means that the particular chunking used is observable through the API. Obviously that should be a completely internal implementation detail, which is why this patch also modifies `appendEOL` to mitigate that.[2]

Given that this patch adds a minimum batch size in `enqueueChunk`, there's obviously nothing preventing a batch from becoming a lot larger than the limit (depending e.g. on the PDF structure and the CPU load/speed).
While sending more text chunks at once isn't an issue in itself, it could become problematic on the main thread during `textLayer` building. Note how both the `PartialEvaluator` and `CanvasGraphics` implementations utilize `Date.now()`-checks, to prevent long-running parsing/rendering from "hanging" the respective thread. In the `textLayer` building we don't utilize such a construction[3], and streaming of textContent is thus essentially acting as a simple stand-in for that functionality.
Hence we want to avoid choosing a too large minimum batch size, since that could indirectly affect main-thread performance negatively.
Improves and/or fixes #13962
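To make the batching idea concrete, here is a minimal, hypothetical sketch of enqueuing text chunks with a minimum batch size before handing them to a stream sink. The names and the threshold value (`TEXT_CHUNK_BATCH_SIZE`, `sink.enqueue`) are illustrative assumptions, not necessarily what the actual patch uses:

```js
// Illustrative only: batch text items before each postMessage-backed
// sink.enqueue() call, instead of sending every chunk individually.
const TEXT_CHUNK_BATCH_SIZE = 10; // deliberately small; see the discussion above

const textContent = { items: [], styles: Object.create(null) };

function enqueueChunk(sink, force = false) {
  const length = textContent.items.length;
  if (length === 0) {
    return;
  }
  // Only flush once enough items have accumulated, unless forced
  // (e.g. at the end of parsing). A batch may still end up larger than
  // the minimum, depending on how often this function gets called.
  if (length < TEXT_CHUNK_BATCH_SIZE && !force) {
    return;
  }
  sink.enqueue(textContent, length);
  textContent.items = [];
  textContent.styles = Object.create(null);
}

// Example usage with a trivial sink that just counts messages:
const messageCount = { value: 0 };
const sink = { enqueue() { messageCount.value++; } };

for (let i = 0; i < 25; i++) {
  textContent.items.push({ str: `item ${i}` });
  enqueueChunk(sink);
}
enqueueChunk(sink, /* force = */ true); // flush the remainder
// messageCount.value is now 3, instead of 25 separate messages.
```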
[1] While it'd be possible to go even lower, that'd likely require more invasive re-factoring/changes to the `PartialEvaluator.getTextContent` code to ensure that the batches don't become too large.

[2] This should also, as far as I can tell, explain some of the regressions observed in the "enhance" text-selection tests back in PR #13257.
Looking closer at the `appendEOL` function, it should potentially be changed even more; however, that should probably not be done here.

[3] I'd really like to avoid implementing something like that for the `textLayer` building as well, given that it'd require adding a fair bit of complexity.
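As background for footnote [3], the kind of construction being referred to is a time-sliced loop that periodically yields control back to the event loop. The following is a generic, hypothetical sketch of that pattern, not the actual `PartialEvaluator`/`CanvasGraphics` code; the 15 ms budget and helper names are assumptions for illustration:

```js
// Hypothetical time-slicing loop: process items until a time budget is
// exhausted, then yield so the thread isn't blocked for too long.
const EXECUTION_TIME_BUDGET = 15; // milliseconds; illustrative value

async function processAll(items, handleItem) {
  let deadline = Date.now() + EXECUTION_TIME_BUDGET;

  for (const item of items) {
    handleItem(item);

    if (Date.now() >= deadline) {
      // Yield to the event loop, letting other tasks (input handling,
      // painting, message handling) run before the next item.
      await new Promise(resolve => setTimeout(resolve, 0));
      deadline = Date.now() + EXECUTION_TIME_BUDGET;
    }
  }
}
```

In the patch's approach, keeping the minimum batch size small lets the streamed chunks themselves act as natural yield points during `textLayer` building, which is why a construction like the one above can be avoided there.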