PDF rendering is quite slow with some PDFs, compared to other viewers #14652

Closed
soadzoor opened this issue Mar 9, 2022 · 11 comments · Fixed by #14777

Comments

@soadzoor

soadzoor commented Mar 9, 2022

Attach (recommended) or Link to PDF file here:
Floor2.pdf

Configuration:

  • Web browser and its version: Chrome 99.0.4844.51
  • Operating system and its version: macOS Monterey 12.2.1 (21D62)
  • PDF.js version: newest
  • Is a browser extension: no

Steps to reproduce the problem:

  1. Open the attached PDF with the official viewer: https://mozilla.github.io/pdf.js/web/viewer.html
  2. Try zooming in/out, move the PDF around, etc.

What is the expected behavior? (add screenshot)
It should be quicker. I tried opening the same PDF with Preview on macOS and with Chrome, and both render it much faster, without any noticeable loading or hanging.

What went wrong? (add screenshot)
It takes quite a while to load up/render the PDF, and then when I start zooming in and out, it often just turns white, and it hangs for 5-10 seconds before I can see something again.

By the way, I noticed this issue in a complex project where we use the PDF.js JavaScript API to render certain parts of the PDF (with a defined viewbox), but it's much easier to reproduce with the official PDF viewer. I have had performance issues before, but in those cases other viewers weren't really any better for those PDFs. With this particular PDF, other viewers/renderers outperform PDF.js by a lot.

So maybe someone here can find the reason behind it, and even fix it.

@Snuffleupagus
Collaborator

That PDF document contains a huge amount of path rendering operators, which likely explains why this is a bit slow since that's something that's not entirely easy to optimize (and it thus looks a tiny bit similar to e.g. issue #10565).

@soadzoor
Author

soadzoor commented Mar 9, 2022

@Snuffleupagus how do you check the number of paths in a PDF? Also, any idea how to reduce that number, without visually affecting anything? I mean offline solutions, of course, not realtime ones. Although those would be the best, but I believe those are just too complicated, as you mentioned.

@Snuffleupagus
Collaborator

Snuffleupagus commented Mar 9, 2022

> how do you check the number of paths in a PDF?

I didn't bother counting, but simply eyeballed what's in the /Contents-streams and noticed a very large number of path operators. (Hence I figured that it may be relevant here, since we've seen before that it can lead to poor performance.)

> Also, any idea how to reduce that number, without visually affecting anything? I mean offline solutions, of course, not realtime ones.

Unfortunately we cannot help with PDF creation here, since that's not really something that the PDF.js library does; hence the only information we can provide is https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#optimize

> Although those would be the best, but I believe those are just too complicated, as you mentioned.

Please note that my comment was only regarding the PDF.js library itself, since I cannot see any simple solution here.

@soadzoor
Author

soadzoor commented Mar 9, 2022

@Snuffleupagus where is /Contents-streams? Is that somehow accessible from the JavaScript API? I can't find it :(

@timvandermeij
Contributor

@soadzoor This refers to the built-in structure of the PDF file. You can open the file with a PDF browser, for example https://brendandahl.github.io/pdf.js.utils/browser/, to see the inner structure of the PDF file. In the tree there is also a /Contents field.

@soadzoor
Author

soadzoor commented Mar 10, 2022

@timvandermeij Hm, but on the linked page, I can only see entries like

Filter = /FlateDecode
Length = 3618135

How do I check how many path rendering operators are there? In #10565, someone even left a comment with the exact number of paths (7.8 million). If I could retrieve that information, I could warn the users at least, that they're trying to render a complex PDF, so they should expect performance problems.

The question is: how can I retrieve that information?

@Snuffleupagus
Collaborator

> How do I check how many path rendering operators are there?

You'd essentially have to manually instrument the code, if you want a fully accurate number (w.r.t. the PDF document itself), since no such functionality is provided (there's no general use-case that warrants adding such a feature).

Although, I suppose that it's also possible to use the getOperatorList-method and then loop through the operatorList, checking for OPS.constructPath and using its args length to calculate the total number of path operators. Please see

pdf.js/src/display/api.js

Lines 1537 to 1546 in ee39499

  /**
   * @param {GetOperatorListParameters} params - Page getOperatorList
   *   parameters.
   * @returns {Promise<PDFOperatorList>} A promise resolved with an
   *   {@link PDFOperatorList} object that represents the page's operator list.
   */
  getOperatorList({
    intent = "display",
    annotationMode = AnnotationMode.ENABLE,
  } = {}) {

and

pdf.js/src/display/api.js

Lines 1212 to 1219 in ee39499

/**
 * PDF page operator list.
 *
 * @typedef {Object} PDFOperatorList
 * @property {Array<number>} fnArray - Array containing the operator functions.
 * @property {Array<any>} argsArray - Array containing the arguments of the
 *   functions.
 */
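
A minimal sketch of the counting approach described above. It assumes the layout PDF.js used around the time of this thread, where the args of each OPS.constructPath entry hold an array of sub-operator codes at index 0 and a flat coordinate array at index 1. That layout is an internal detail and may differ in other versions. `countPathOperators` and `countForPage` are hypothetical helper names, not part of the PDF.js API.

```javascript
// Hypothetical helper: counts constructPath entries and the path
// sub-operators folded into them. `constructPathOp` is the numeric code
// to look for (with PDF.js this would be pdfjsLib.OPS.constructPath).
function countPathOperators(operatorList, constructPathOp) {
  const { fnArray, argsArray } = operatorList;
  let constructPathEntries = 0;
  let subOperators = 0;
  for (let i = 0; i < fnArray.length; i++) {
    if (fnArray[i] !== constructPathOp) {
      continue;
    }
    constructPathEntries++;
    // Assumption: args[0] is the array of sub-operator codes
    // (moveTo, lineTo, curveTo, ...). Skip entries that do not match.
    const subOps = argsArray[i]?.[0];
    if (subOps && typeof subOps.length === "number") {
      subOperators += subOps.length;
    }
  }
  return { constructPathEntries, subOperators };
}

// Hypothetical wrapper showing how the helper could be fed from the
// display API: `pdfDocument` is the resolved result of a getDocument()
// loading task, `pageNumber` is 1-based.
async function countForPage(pdfjsLib, pdfDocument, pageNumber) {
  const page = await pdfDocument.getPage(pageNumber);
  const operatorList = await page.getOperatorList();
  return countPathOperators(operatorList, pdfjsLib.OPS.constructPath);
}
```

Note that getOperatorList has to parse the whole page, so collecting these numbers is not free.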

@soadzoor
Author

Thanks! I played around with this, and it seems the PDF file above has 54 127 path rendering operators (I simply filter the operatorlist.fnArray for OPS.constructPath to get this number).

That doesn't seem like a lot. For example the following PDF has more than 3 times as many path rendering operators (167 160), yet it's still faster to render:
167160.pdf

Or this one has more than 1.2 million path rendering operators, yet it still seems to be faster to render than the original one in the first post:
murietta.pdf

So I don't think there's a linear correlation between the number of path rendering operators and performance. I think there's more to this 🤔

@Snuffleupagus
Collaborator

> I played around with this, and it seems the PDF file above has 54 127 path rendering operators (I simply filter the operatorlist.fnArray for OPS.constructPath to get this number).

Note that you've underestimated the actual number by at least one order of magnitude, since only counting OPS.constructPath isn't accurate given that each one can contain lots of actual path operators.
As mentioned in #14652 (comment) you need to count the args (i.e. the contents of the argsArray) instead; although I don't think putting time/effort into that really helps fix this issue :-)
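
To illustrate why filtering fnArray alone undercounts, here is a toy model (not PDF.js code) in which consecutive path sub-operators are folded into a single constructPath entry. The operator codes and the `addPathOp` helper are stand-ins invented for this example.

```javascript
// Stand-in operator codes for the toy model; real code would read
// these from pdfjsLib.OPS instead.
const OPS = { moveTo: 1, lineTo: 2, constructPath: 3, stroke: 4 };

// Toy model: append one path sub-operator, merging it into the
// preceding constructPath entry when there is one.
function addPathOp(list, subOp, coords) {
  const last = list.fnArray.length - 1;
  if (last >= 0 && list.fnArray[last] === OPS.constructPath) {
    list.argsArray[last][0].push(subOp);
    list.argsArray[last][1].push(...coords);
  } else {
    list.fnArray.push(OPS.constructPath);
    list.argsArray.push([[subOp], [...coords]]);
  }
}

const list = { fnArray: [], argsArray: [] };

// One polyline: a moveTo followed by 999 lineTo segments, then a stroke.
addPathOp(list, OPS.moveTo, [0, 0]);
for (let i = 1; i < 1000; i++) {
  addPathOp(list, OPS.lineTo, [i, i]);
}
list.fnArray.push(OPS.stroke);
list.argsArray.push(null);

// Counting entries in fnArray sees a single constructPath...
const entries = list.fnArray.filter(fn => fn === OPS.constructPath).length;

// ...while summing the sub-operator arrays sees every path operator.
const subOps = list.argsArray.reduce(
  (sum, args, i) =>
    list.fnArray[i] === OPS.constructPath ? sum + args[0].length : sum,
  0
);

console.log({ entries, subOps });
```

In this model a single entry hides 1000 sub-operators, so the two counts can differ by orders of magnitude.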

@luopeihai

You can try PDF slice loading, which is very fast: Introduces the address

@THausherr
Contributor

It renders in about 3 seconds on Windows, and one can see that something is happening. My PC is from 2017 but with lots of memory. I don't see this as a performance problem at all.

> If I could retrieve that information, I could warn the users at least, that they're trying to render a complex PDF, so they should expect performance problems.

This would take longer than rendering the PDF itself.
