When parsing corrupt documents without any trailer-dictionary, fall back to the "top"-dictionary (issue 14269) #14270
/botio-linux preview

From: Bot.io (Linux m4)
Received
Command cmd_preview from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/fa4fc7a58b48422/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.241.84.105:8877/fa4fc7a58b48422/output.txt
Total script time: 4.55 mins
Published

/botio test

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/c1113461574aa69/output.txt

From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/7cfee903734d859/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.241.84.105:8877/7cfee903734d859/output.txt
Total script time: 23.97 mins
Image differences available at: http://54.241.84.105:8877/7cfee903734d859/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/c1113461574aa69/output.txt
Total script time: 43.06 mins
Image differences available at: http://54.193.163.58:8877/c1113461574aa69/reftest-analyzer.html#web=eq.log
It could avoid an exception, so it likely doesn't hurt to have this patch. Thank you for doing this.
The worst thing that can happen as a result of the patch is that an Error is thrown later, e.g. during parsing/rendering, but we'll at least have a chance of opening a few more corrupt PDF documents this way; thanks for the review!

/botio makeref
From: Bot.io (Linux m4)
Received
Command cmd_makeref from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/078b83103b46365/output.txt

From: Bot.io (Windows)
Received
Command cmd_makeref from @Snuffleupagus received. Current queue size: 1
Live output at: http://54.193.163.58:8877/abf6c788f8349f1/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.241.84.105:8877/078b83103b46365/output.txt
Total script time: 21.70 mins

From: Bot.io (Windows)
Success
Full output at http://54.193.163.58:8877/abf6c788f8349f1/output.txt
Total script time: 39.19 mins
There's obviously no guarantee that this will work in general if the document is sufficiently corrupt, but it should hopefully be better than just throwing `InvalidPDFException` as currently happens.

Please note that, as is often the case with corrupt documents, it's somewhat difficult to know if we're rendering the document "correctly" with this patch[1]. In this case even Adobe Reader cannot open the document, which is always a good sign that it's *really* corrupt; however, we're at least able to render *something* with this patch.

---
[1] Whatever "correct" even means when dealing with corrupt PDF documents, where different PDF viewers often won't agree completely.
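The fallback order described in this PR (use a trailer-dictionary if one exists, otherwise try the "top"-dictionary, and only then throw `InvalidPDFException`) can be sketched roughly as below. This is a minimal, hypothetical JavaScript sketch, not the actual pdf.js patch: dictionaries are modelled as plain `Map`s, a candidate is assumed to be usable only if it has a `Root` entry, `topDict` simply stands in for the "top"-dictionary however it was obtained, and the names `pickTrailer` and `isUsableTrailer` are made up for illustration.

```javascript
// Hypothetical sketch of the trailer fallback; NOT the actual pdf.js code.
// PDF dictionaries are modelled as plain Maps for illustration only.

class InvalidPDFException extends Error {
  constructor(message) {
    super(message);
    this.name = "InvalidPDFException";
  }
}

// Assumption: a candidate is usable only if it points at the catalog (/Root).
function isUsableTrailer(dict) {
  return dict instanceof Map && dict.has("Root");
}

/**
 * Pick the dictionary to use as the document trailer.
 * @param {Map[]} trailerDicts - candidates found after `trailer` keywords.
 * @param {Map|null} topDict - hypothetical stand-in for the "top"-dictionary.
 * @returns {Map}
 */
function pickTrailer(trailerDicts, topDict) {
  // Prefer a real trailer-dictionary whenever one is usable.
  for (const dict of trailerDicts) {
    if (isUsableTrailer(dict)) {
      return dict;
    }
  }
  // No usable trailer-dictionary: fall back to the "top"-dictionary
  // instead of giving up immediately.
  if (isUsableTrailer(topDict)) {
    return topDict;
  }
  // Nothing helps; only now report the document as invalid.
  throw new InvalidPDFException("No usable trailer dictionary found.");
}

// Usage: a corrupt file with no trailer at all, but with a "top"-dictionary.
const top = new Map([["Root", "1 0 R"], ["Size", 12]]);
console.log(pickTrailer([], top) === top); // true
```

As noted in the review discussion, the fallback only defers failure: if the chosen dictionary turns out to be unusable, an error may still be thrown later during parsing or rendering.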