-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up) #14359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up) #14359
Conversation
…in the viewer (PR 11263 follow-up) This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents. For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents having all pages loaded/initialized early provides overall better performance/UX in the viewer, however there's cases where it can instead *hurt* performance. For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls and leaving very little time for other parsing (such as e.g. of operatorLists). To address this situation, this patch thus proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a change to handle other requests.[2] Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3] --- [1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep. [2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that'd slowed things down for all documents which didn't seem appropriate. [3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls. Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
0c9a6c2
to
90472e5
Compare
/botio integrationtest |
From: Bot.io (Windows)ReceivedCommand cmd_integrationtest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/3e2c77f0b7067e6/output.txt |
From: Bot.io (Linux m4)ReceivedCommand cmd_integrationtest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/1e31515c509e783/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/1e31515c509e783/output.txt Total script time: 2.49 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/3e2c77f0b7067e6/output.txt Total script time: 5.42 mins
|
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://54.241.84.105:8877/d23d305efb4aef5/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/d23d305efb4aef5/output.txt Total script time: 4.24 mins Published |
Let's do this; thanks! |
This patch is essentially another continuation of PR #11263, which tried to improve loading/initialization performance of very large/long documents.
For most documents, unless they're very long, we'll eagerly initialize all of the pages in the viewer. For shorter documents having all pages loaded/initialized early provides overall better performance/UX in the viewer, however there's cases where it can instead hurt performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the second page of the document can be delayed (quite a bit). The reason for this is that we trigger
PDFDocumentProxy.getPage
for all pages early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands ofgetPage
-calls and leaving very little time for other parsing (such as e.g. of operatorLists).To address this situation, this patch thus proposes temporarily "pausing" the eager
PDFDocumentProxy.getPage
-calls once a threshold has been reached, to give the worker-thread a change to handle other requests.[2]Obviously this may slightly delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only one level deep.
[2] Please note that I initially considered simply chaining the
PDFDocumentProxy.getPage
-calls, however that'd slowed things down for all documents which didn't seem appropriate.[3] This patch will hopefully also make it possible to re-visit PR #11312, since it seems that changing
Catalog.getPageDict
to anasync
method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded bygetPage
-calls.Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having
Catalog.getPageDict
beasync
would simplify things a great deal.