Convert `Catalog.getAllPageDicts` to an async method #14411
The patch in PR #14335 *essentially* re-introduced the old code from before PR #3848, however looking at this code a bit closer it should be possible to simplify it by making the method asynchronous.

While this method is currently only used as a *fallback* in corrupt documents, the way that `MissingDataException`s are handled is less than ideal. Note that if a `MissingDataException` is thrown, we're forced to re-parse the *entire* /Pages tree[1].

With this method now being asynchronous, we're able to handle fetching of References in a *much* easier/nicer way than before without having to throw `MissingDataException`s and re-parse anything.

These changes also let us simplify the call-site slightly, by calling the method *directly* instead of using the `PDFManager`-instance (since again it will no longer throw `MissingDataException`s).

Furthermore, this patch contains the following other changes:
 - Reduce unnecessary duplication in the various `catch` handlers throughout the method, by simply moving the `XRefEntryException` handling into the `addPageError` helper function instead.
 - Move the "circular references"-check to occur slightly earlier, since there's obviously no point in asynchronously fetching data just to then throw an Error *immediately* afterwards.

---

[1] Imagine e.g. a thousand page document, where there's a `MissingDataException` thrown when fetching/parsing page 900.
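To make the idea above concrete, here is a minimal, self-contained sketch of an asynchronous page-tree walk. It is not the actual pdf.js code: the `objects` map, `fetchAsync`, and the shape of the returned `{ pages, errors }` object are all hypothetical stand-ins, and only the names `getAllPageDicts` and `addPageError` are borrowed from the description. It illustrates three points: awaiting each reference instead of throwing and restarting, checking for circular references before fetching, and funnelling all error bookkeeping through one helper.

```javascript
// Toy "document" (hypothetical): object id -> node. `kids` hold ids, i.e. references.
const objects = new Map([
  ["root", { type: "Pages", kids: ["p1", "sub", "root"] }], // "root" again: a cycle
  ["p1", { type: "Page" }],
  ["sub", { type: "Pages", kids: ["p2", "missing"] }], // "missing" has no entry
  ["p2", { type: "Page" }],
]);

// Hypothetical async fetch: resolves once the data is available, and rejects
// for broken references. The caller is suspended, not restarted.
async function fetchAsync(ref) {
  await null; // yield, as a real chunk load would
  if (!objects.has(ref)) {
    throw new Error(`Missing object: ${ref}`);
  }
  return objects.get(ref);
}

async function getAllPageDicts(rootRef) {
  const pages = [];
  const errors = [];
  const visited = new Set();
  const queue = [rootRef];

  // One place for error bookkeeping, so the catch handlers stay trivial.
  function addPageError(error) {
    errors.push(error.message);
  }

  while (queue.length > 0) {
    const ref = queue.shift();

    // Check for cycles *before* fetching: no point awaiting data
    // only to report an error right afterwards.
    if (visited.has(ref)) {
      addPageError(new Error(`Circular reference: ${ref}`));
      continue;
    }
    visited.add(ref);

    let node;
    try {
      node = await fetchAsync(ref);
    } catch (ex) {
      addPageError(ex);
      continue;
    }

    if (node.type === "Page") {
      pages.push(ref);
    } else {
      // Depth-first: put the kids at the front to preserve document order.
      queue.unshift(...node.kids);
    }
  }
  return { pages, errors };
}
```

Calling `await getAllPageDicts("root")` on this toy data collects `p1` and `p2` as pages and records two errors (the missing object and the cycle), all in a single pass over the tree.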
256207e to b0e774d
From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/7119004af3fdacc/output.txt

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/03ce5319099314d/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.241.84.105:8877/7119004af3fdacc/output.txt
Total script time: 22.35 mins
Image differences available at: http://54.241.84.105:8877/7119004af3fdacc/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/03ce5319099314d/output.txt
Total script time: 30.14 mins
Image differences available at: http://54.193.163.58:8877/03ce5319099314d/reftest-analyzer.html#web=eq.log

/botio-windows test

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/11532ae0b0fcb47/output.txt

From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/11532ae0b0fcb47/output.txt
Total script time: 42.47 mins
Image differences available at: http://54.193.163.58:8877/11532ae0b0fcb47/reftest-analyzer.html#web=eq.log

Looks good!