Skip to content

Improve PDFDocument.checkLastPage/Catalog.getAllPageDicts for documents with corrupt XRef tables (PR 14311, 14335 follow-up) #14358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Snuffleupagus
Copy link
Collaborator

  • Improve PDFDocument.checkLastPage for documents with corrupt XRef tables (PR 14311, 14335 follow-up)

    Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects before trying to invoke the Catalog.getAllPageDicts method.

  • Update Catalog.getAllPageDicts to always propagate the actual Errors (PR 14335 follow-up)

    Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
    Note that we purposely handle XRefEntryException specially, to make it possible to fallback to indexing all XRef objects.

…ables (PR 14311, 14335 follow-up)

Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects *before* trying to invoke the `Catalog.getAllPageDicts` method.
…s (PR 14335 follow-up)

Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/0f320a5cd8355f1/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/46198cf50b74426/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/0f320a5cd8355f1/output.txt

Total script time: 21.24 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 6
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/0f320a5cd8355f1/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/46198cf50b74426/output.txt

Total script time: 41.80 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 8

Image differences available at: http://54.193.163.58:8877/46198cf50b74426/reftest-analyzer.html#web=eq.log

@timvandermeij timvandermeij merged commit a6dd39b into mozilla:master Dec 11, 2021
@timvandermeij
Copy link
Contributor

Thank you for doing this!

@Snuffleupagus Snuffleupagus deleted the checkLastPage-improvements branch December 11, 2021 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants