Commit 0c9a6c2

Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up)
This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.

For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents, having all pages loaded/initialized early provides overall better performance/UX in the viewer; however, there are cases where it can instead *hurt* performance.

For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls, leaving very little time for other parsing (e.g. of operatorLists).

To address this situation, this patch proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a chance to handle other requests.[2]

Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing, that'll hopefully not be seen as a blocker for these changes.[3]

---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.

[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that'd slow things down for all documents, which didn't seem appropriate.

[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather, it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls. Having recently worked with that method, there are a couple of (very old) issues that I'd also like to address, and having `Catalog.getPageDict` be `async` would simplify things a great deal.
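The "pausing" idea described above can be sketched outside of the viewer roughly as follows. This is only an illustration under stated assumptions: `fakeGetPage` and `initPagesEagerly` are hypothetical stand-ins rather than pdf.js APIs, and the final `Promise.all` exists only so the sketch can report its results.

```javascript
// Hypothetical stand-in for `PDFDocumentProxy.getPage`; it only resolves
// asynchronously with a dummy page object.
function fakeGetPage(pageNum) {
  return new Promise(resolve => {
    setTimeout(() => resolve({ pageNumber: pageNum }), 0);
  });
}

// Request pages 2..pagesCount eagerly, but wait for the most recent request
// after every `pauseEvery` pages so that other queued work gets a turn.
async function initPagesEagerly(pagesCount, pauseEvery, getPage = fakeGetPage) {
  const loaded = [];
  const pending = [];
  let pauses = 0;

  for (let pageNum = 2; pageNum <= pagesCount; ++pageNum) {
    const promise = getPage(pageNum).then(page => {
      loaded.push(page.pageNumber);
    });
    pending.push(promise);

    if (pageNum % pauseEvery === 0) {
      await promise; // Stop issuing requests until this one has been handled.
      pauses++;
    }
  }

  // Only needed for this sketch, to wait for the remaining requests.
  await Promise.all(pending);
  return { loaded, pauses };
}
```

Unlike fully chaining the calls (see [2]), requests between two pause points are still issued back-to-back, so shorter documents keep their eager behaviour.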
1 parent 97dc048 commit 0c9a6c2

1 file changed: +7 -2 lines changed

web/base_viewer.js

@@ -57,6 +57,7 @@ const DEFAULT_CACHE_SIZE = 10;
 const PagesCountLimit = {
   FORCE_SCROLL_MODE_PAGE: 15000,
   FORCE_LAZY_PAGE_INIT: 7500,
+  PAUSE_EAGER_PAGE_INIT: 500,
 };

 /**
@@ -625,7 +626,7 @@ class BaseViewer {
         // Fetch all the pages since the viewport is needed before printing
         // starts to create the correct size canvas. Wait until one page is
         // rendered so we don't tie up too many resources early on.
-        this._onePageRenderedOrForceFetch().then(() => {
+        this._onePageRenderedOrForceFetch().then(async () => {
           if (this.findController) {
             this.findController.setDocument(pdfDocument); // Enable searching.
           }
@@ -650,7 +651,7 @@ class BaseViewer {
             return;
           }
           for (let pageNum = 2; pageNum <= pagesCount; ++pageNum) {
-            pdfDocument.getPage(pageNum).then(
+            const promise = pdfDocument.getPage(pageNum).then(
               pdfPage => {
                 const pageView = this._pages[pageNum - 1];
                 if (!pageView.pdfPage) {
@@ -671,6 +672,10 @@ class BaseViewer {
                 }
               }
             );
+
+            if (pageNum % PagesCountLimit.PAUSE_EAGER_PAGE_INIT === 0) {
+              await promise;
+            }
           }
         });

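To see where the new condition would actually pause, it can be evaluated for the 2236-page document from footnote [1]. The helper name `pausePoints` is hypothetical; only the modulo check, the threshold value and the loop starting at page 2 mirror the patch.

```javascript
const PAUSE_EAGER_PAGE_INIT = 500; // Value taken from the patch.

// Hypothetical helper: list the page numbers at which the loop would `await`.
function pausePoints(pagesCount, limit = PAUSE_EAGER_PAGE_INIT) {
  const points = [];
  for (let pageNum = 2; pageNum <= pagesCount; ++pageNum) {
    if (pageNum % limit === 0) {
      points.push(pageNum);
    }
  }
  return points;
}

console.log(pausePoints(2236)); // [ 500, 1000, 1500, 2000 ]
console.log(pausePoints(499)); // []
```

Documents with fewer than 500 pages never hit the condition, so their eager initialization is unaffected by this change.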