-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Subtract start offset for xrefs in recovery mode #6194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subtract start offset for xrefs in recovery mode #6194
Conversation
Thanks for the fix and detailed explanation Rob, I'm learning more about pdf.js and maybe next time I'll bee able to submit PRs like this myself! |
@timvandermeij I'm assigning this to you for review since you touched the xref offset logic in 026c45e. |
Xref offsets are relative to the start of the PDF data, not to the start of the PDF file. This is clear if you look at the other code: - In the XRef's readXRefTable and processXRefTable methods of XRef, the offset of a xref entry is set to the bytes as given by a PDF file. These values are always relative to the start of the PDF file (%PDF-). - The XRef's readXRef method adds the start offset of the stream to Xref entry's offset: "stream.pos = startXRef + stream.start". Clearly, this line assumes that the entry offset excludes the start offset. However, when the PDF is parsed in recovery mode, the xref table is filled with entries whose offset is relative to the start of the stream rather than the PDF file. This is incorrect, and the fix is to subtract the start offset of the stream from the entry's byte offset. The manually created PDF file serves as a regression test. It is a valid PDF, except: - The integer to point to the start of the xref table and the %%EOF trailer are missing. This will activate recovery mode in PDF.js - Some junk was added before the start of the PDF file. This exposes the bad offset bug.
ba5b7e7
to
fd29bb0
Compare
/botio-linux preview |
From: Bot.io (Linux)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/f7e7336f7f94a64/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/f7e7336f7f94a64/output.txt Total script time: 0.64 mins Published |
/botio test |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/c1f371e59de3dee/output.txt |
From: Bot.io (Linux)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/1db173a869fc588/output.txt |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/c1f371e59de3dee/output.txt Total script time: 18.47 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/1db173a869fc588/output.txt Total script time: 18.88 mins
|
/botio makeref |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/c7681665c21bcc4/output.txt |
From: Bot.io (Linux)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/ba1e4a2b633a464/output.txt |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/c7681665c21bcc4/output.txt Total script time: 18.63 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/ba1e4a2b633a464/output.txt Total script time: 18.96 mins
|
Subtract start offset for xrefs in recovery mode
Thank you, Rob! |
Xref offsets are relative to the start of the PDF data, not to the start of the PDF file. This is clear if you look at the other code:
However, when the PDF is parsed in recovery mode, the xref table is filled with entries whose offset is relative to the start of the stream rather than the PDF file. This is incorrect, and the fix is to subtract the start offset of the stream from the entry's byte offset. Otherwise you'll get a "Bad XRef entry" error.
Fixes #6069.