Skip to content

Enforce PAGE-scrolling for *very* large/long documents (bug 1588435, PR 11263 follow-up) #14324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 2, 2021

Conversation

Snuffleupagus
Copy link
Collaborator

This patch is essentially a continuation of PR #11263, which tried to improve loading/initialization performance of very large/long documents.

Note that browsers, in general, don't handle a huge amount of DOM-elements very well, with really poor (e.g. sluggish scrolling) performance once the number gets "large". Furthermore, at least in Firefox, it seems that DOM-elements towards the bottom of a HTML-page can effectively be ignored; for the PDF.js viewer that means that pages at the end of the document can become impossible to access.

Hence, in order to improve things for these very large/long documents, this patch will now enforce usage of the (recently added) PAGE-scrolling mode for these documents. As implemented, this will only happen once the number of pages exceed 15000 (which is hopefully rare in practice).
While this might feel a bit jarring to users being forced to use PAGE-scrolling, it seems all things considered like a better idea to ensure that the entire document actually remains accessible and with (hopefully) more acceptable performance.

Fixes bug 1588435, to the extent that doing so is possible since the document contains 25560 pages (and is 197 MB large).

…olbar.js` (PR 9877 follow-up)

This was added on the assumption that the viewer would (eventually) start using the `PDFSinglePageViewer` for e.g. PAGE-scrolling mode and PresentationMode. However, having both a `PDFViewer` and a `PDFSinglePageViewer` side-by-side in the viewer would've been tricky to implement well, which is why PR 14112 implemented PAGE-scrolling for the general `BaseViewer` instead.

Given that the default viewer is no longer (potentially) going to use `PDFSinglePageViewer`, there's code in the `SecondaryToolbar` (and related CSS rules) which is now unnecessary.
…PR 11263 follow-up)

This patch is essentially a continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.

Note that browsers, in general, don't handle a huge amount of DOM-elements very well, with really poor (e.g. sluggish scrolling) performance once the number gets "large". Furthermore, at least in Firefox, it seems that DOM-elements towards the bottom of a HTML-page can effectively be ignored; for the PDF.js viewer that means that pages at the end of the document can become impossible to access.

Hence, in order to improve things for these *very* large/long documents, this patch will now enforce usage of the (recently added) PAGE-scrolling mode for these documents. As implemented, this will only happen once the number of pages *exceed* 15000 (which is hopefully rare in practice).
While this might feel a bit jarring to users being *forced* to use PAGE-scrolling, it seems all things considered like a better idea to ensure that the entire document actually remains accessible and with (hopefully) more acceptable performance.

Fixes [bug 1588435](https://bugzilla.mozilla.org/show_bug.cgi?id=1588435), to the extent that doing so is possible since the document contains 25560 pages (and is 197 MB large).
@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/96544d351ca1920/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/96544d351ca1920/output.txt

Total script time: 4.48 mins

Published

@brendandahl
Copy link
Contributor

Thanks for looking into this. Another option I was considering is using what many of the infinite scroll libraries do, where you have a fixed number of page divs and as they go out of view when you scroll you recycle them and add them after the visible elements. The scroll bar can become jumpy with this approach, but I think it's also possible to create really big container that is the estimated size of all the pages.

@brendandahl
Copy link
Contributor

Of course the above would be a much bigger undertaking.

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Nov 29, 2021

Of course the above would be a much bigger undertaking.

It would, and there'd likely be a bunch of edge-cases that would make all of that much more difficult than it seems at first.

Considering that these very large documents hopefully should not be all that common, I just figured that simply utilizing the now existing PAGE-scrolling mode seemed like a quick and easy win (especially compared to the current state of things).

@timvandermeij timvandermeij merged commit e9e4b91 into mozilla:master Dec 2, 2021
@timvandermeij
Copy link
Contributor

For such large documents, this seems like a very reasonable approach to me; in any case it's better than the document being inaccessible. If necessary there can always be a follow-up to implement a smarter method, but this should already improve the current situation. Thank you for doing this!

@MoYuXianR

This comment was marked as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants