Subtract start offset for xrefs in recovery mode #6194

Rob--W · 2015-07-10T18:42:45Z

Xref offsets are relative to the start of the PDF data (the %PDF- header), not to the first byte of the file, which can differ when junk precedes the header. This is clear from the existing code:

- The XRef's readXRefTable and processXRefTable methods set the offset of an xref entry to the byte value given in the PDF file. These values are always relative to the start of the PDF data (%PDF-).
- The XRef's readXRef method adds the start offset of the stream to the xref entry's offset: "stream.pos = startXRef + stream.start". Clearly, this line assumes that the entry offset excludes the start offset.

However, when the PDF is parsed in recovery mode, the xref table is filled with entries whose offset is relative to the start of the stream rather than to the %PDF- header. This is incorrect, and the fix is to subtract the start offset of the stream from the entry's byte offset. Otherwise you'll get a "Bad XRef entry" error.

Fixes #6069.
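To make the offset bookkeeping concrete, here is a minimal sketch of the idea. It is not the PDF.js source: the function names, the `streamStart` value, and the byte positions are hypothetical and chosen only for illustration.

```javascript
// Hypothetical sketch; names and numbers are illustrative, not PDF.js code.

// Number of junk bytes before the "%PDF-" header (the role of stream.start).
const streamStart = 7;

// Normal mode: the xref table stores offsets as written in the PDF,
// which are relative to the "%PDF-" header.
function entryFromXRefTable(offsetInPdf) {
  return { offset: offsetInPdf };
}

// Recovery mode: objects are found by scanning the whole stream, so the
// scan position includes the junk prefix. Subtracting the start offset
// brings the entry in line with the normal-mode convention.
function entryFromRecoveryScan(positionInStream, start) {
  return { offset: positionInStream - start };
}

// Consumers add the start offset back when seeking, in the spirit of
// "stream.pos = startXRef + stream.start".
function streamPositionOf(entry, start) {
  return entry.offset + start;
}

// An object that the xref table would place at offset 15 is found by the
// recovery scan at stream position 22 when 7 junk bytes precede the header.
const declared = entryFromXRefTable(15);
const scanned = entryFromRecoveryScan(22, streamStart);
```

Without the subtraction, the start offset would be added twice when seeking, so the reader would land 7 bytes past the object in this example.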

beveradb · 2015-07-10T18:48:27Z

Thanks for the fix and detailed explanation, Rob. I'm learning more about pdf.js and maybe next time I'll be able to submit PRs like this myself!

Rob--W · 2015-07-10T18:55:00Z

@timvandermeij I'm assigning this to you for review since you touched the xref offset logic in 026c45e.

Xref offsets are relative to the start of the PDF data, not to the start of the PDF file. This is clear if you look at the other code:

- In the XRef's readXRefTable and processXRefTable methods of XRef, the offset of a xref entry is set to the bytes as given by a PDF file. These values are always relative to the start of the PDF file (%PDF-).
- The XRef's readXRef method adds the start offset of the stream to Xref entry's offset: "stream.pos = startXRef + stream.start". Clearly, this line assumes that the entry offset excludes the start offset.

However, when the PDF is parsed in recovery mode, the xref table is filled with entries whose offset is relative to the start of the stream rather than the PDF file. This is incorrect, and the fix is to subtract the start offset of the stream from the entry's byte offset.

The manually created PDF file serves as a regression test. It is a valid PDF, except:

- The integer to point to the start of the xref table and the %%EOF trailer are missing. This will activate recovery mode in PDF.js
- Some junk was added before the start of the PDF file. This exposes the bad offset bug.
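As a rough illustration of why leading junk exposes the bug, the sketch below builds a tiny junk-prefixed byte buffer and compares an object's position in the file with its header-relative offset. The helpers `findPdfHeader` and `asciiBytes` and the sample bytes are assumptions made for this example; they are not part of PDF.js or of the regression test file itself.

```javascript
// Hypothetical helpers for illustration only; not PDF.js APIs.

// Returns the index of the first "%PDF-" marker in the bytes, or -1.
function findPdfHeader(bytes) {
  const marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  for (let i = 0; i + marker.length <= bytes.length; i++) {
    if (marker.every((b, j) => bytes[i + j] === b)) {
      return i;
    }
  }
  return -1;
}

// Converts an ASCII string to a byte array.
function asciiBytes(text) {
  return Uint8Array.from(text, (ch) => ch.charCodeAt(0));
}

// A junk prefix followed by the beginning of a PDF body.
const junk = "junk!\n";
const body = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n";
const file = asciiBytes(junk + body);

// Start offset of the PDF data, i.e. the length of the junk prefix.
const start = findPdfHeader(file);

// Where a scan over the whole file finds the object, and the
// header-relative offset an xref entry is expected to hold.
const positionInFile = (junk + body).indexOf("1 0 obj");
const xrefOffset = positionInFile - start;
```

With an empty junk prefix the two values coincide, which is why a recovery-mode file without leading junk would not reveal the problem.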

timvandermeij · 2015-07-10T21:43:14Z

/botio-linux preview

pdfjsbot · 2015-07-10T21:43:14Z

From: Bot.io (Linux)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/f7e7336f7f94a64/output.txt

pdfjsbot · 2015-07-10T21:43:53Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/f7e7336f7f94a64/output.txt

Total script time: 0.64 mins

Published

timvandermeij · 2015-07-10T21:47:04Z

/botio test

pdfjsbot · 2015-07-10T21:47:05Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c1f371e59de3dee/output.txt

pdfjsbot · 2015-07-10T21:47:05Z

From: Bot.io (Linux)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1db173a869fc588/output.txt

pdfjsbot · 2015-07-10T22:05:33Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/c1f371e59de3dee/output.txt

Total script time: 18.47 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

pdfjsbot · 2015-07-10T22:05:57Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/1db173a869fc588/output.txt

Total script time: 18.88 mins

Font tests: Passed
Unit tests: Passed
Regression tests: Passed

timvandermeij · 2015-07-11T15:02:55Z

/botio makeref

pdfjsbot · 2015-07-11T15:02:55Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c7681665c21bcc4/output.txt

pdfjsbot · 2015-07-11T15:02:55Z

From: Bot.io (Linux)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/ba1e4a2b633a464/output.txt

pdfjsbot · 2015-07-11T15:21:33Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/c7681665c21bcc4/output.txt

Total script time: 18.63 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2015-07-11T15:21:53Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/ba1e4a2b633a464/output.txt

Total script time: 18.96 mins

Lint: Passed
Make references: Passed
Check references: Passed

Subtract start offset for xrefs in recovery mode

timvandermeij · 2015-07-11T15:22:15Z

Thank you, Rob!

Rob--W added the core label Jul 10, 2015

Rob--W assigned timvandermeij Jul 10, 2015

Rob--W force-pushed the recover-mode-start-offset branch from ba5b7e7 to fd29bb0 July 10, 2015 21:33

timvandermeij added a commit that referenced this pull request Jul 11, 2015

Merge pull request #6194 from Rob--W/recover-mode-start-offset

7d4303b

Subtract start offset for xrefs in recovery mode

timvandermeij merged commit 7d4303b into mozilla:master Jul 11, 2015

divergentdave mentioned this pull request Sep 29, 2020

Make xref offsets relative to file header pdf-rs/pdf#60

Merged
