Skip to content

Subtract start offset for xrefs in recovery mode #6194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 11, 2015

Conversation

Rob--W
Copy link
Member

@Rob--W Rob--W commented Jul 10, 2015

Xref offsets are relative to the start of the PDF data, not to the start of the PDF file. This is clear if you look at the other code:

  • In the XRef's readXRefTable and processXRefTable methods of XRef, the offset of a xref entry is set to the bytes as given by a PDF file. These values are always relative to the start of the PDF file (%PDF-).
  • The XRef's readXRef method adds the start offset of the stream to Xref entry's offset. Clearly, this line assumes that the entry offset excludes the start offset.

However, when the PDF is parsed in recovery mode, the xref table is filled with entries whose offset is relative to the start of the stream rather than the PDF file. This is incorrect, and the fix is to subtract the start offset of the stream from the entry's byte offset. Otherwise you'll get a "Bad XRef entry" error.

Fixes #6069.

@Rob--W Rob--W added the core label Jul 10, 2015
@beveradb
Copy link

Thanks for the fix and detailed explanation Rob, I'm learning more about pdf.js and maybe next time I'll bee able to submit PRs like this myself!

@Rob--W
Copy link
Member Author

Rob--W commented Jul 10, 2015

@timvandermeij I'm assigning this to you for review since you touched the xref offset logic in 026c45e.

Xref offsets are relative to the start of the PDF data, not to the start
of the PDF file. This is clear if you look at the other code:

- In the XRef's readXRefTable and processXRefTable methods of XRef, the
  offset of a xref entry is set to the bytes as given by a PDF file.
  These values are always relative to the start of the PDF file (%PDF-).

- The XRef's readXRef method adds the start offset of the stream to
  Xref entry's offset: "stream.pos = startXRef + stream.start".
  Clearly, this line assumes that the entry offset excludes the start
  offset.

However, when the PDF is parsed in recovery mode, the xref table is
filled with entries whose offset is relative to the start of the stream
rather than the PDF file. This is incorrect, and the fix is to subtract
the start offset of the stream from the entry's byte offset.

The manually created PDF file serves as a regression test. It is a valid
PDF, except:
- The integer to point to the start of the xref table and the %%EOF
  trailer are missing. This will activate recovery mode in PDF.js
- Some junk was added before the start of the PDF file. This exposes the
  bad offset bug.
@Rob--W Rob--W force-pushed the recover-mode-start-offset branch from ba5b7e7 to fd29bb0 Compare July 10, 2015 21:33
@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/f7e7336f7f94a64/output.txt

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c1f371e59de3dee/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/1db173a869fc588/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/c1f371e59de3dee/output.txt

Total script time: 18.47 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/1db173a869fc588/output.txt

Total script time: 18.88 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c7681665c21bcc4/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/ba1e4a2b633a464/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/c7681665c21bcc4/output.txt

Total script time: 18.63 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/ba1e4a2b633a464/output.txt

Total script time: 18.96 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

timvandermeij added a commit that referenced this pull request Jul 11, 2015
Subtract start offset for xrefs in recovery mode
@timvandermeij timvandermeij merged commit 7d4303b into mozilla:master Jul 11, 2015
@timvandermeij
Copy link
Contributor

Thank you, Rob!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PDF doesn't render at all (bad XRef entry)
4 participants