Skip to content

Ensure that abbreviated dictionary keys always take precedence in inline images (issue 14256) #14257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

@brendandahl
Copy link
Contributor

Alternatively, should we reverse the get's? e.g. dict.get("BitsPerComponent", "BPC") to dict.get("BPC", "BitsPerComponent")?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Nov 9, 2021

Alternatively, should we reverse the get's? e.g. dict.get("BitsPerComponent", "BPC") to dict.get("BPC", "BitsPerComponent")?

Initially I thought about doing that instead, but it'd have required edits all over the code-base and seems much more error-prone with any future changes since you'd need to ensure that the correct ordering is maintained everywhere.
At least to me, it seems much simpler to just avoid adding the "bad" data to the Dict in the first place, especially since all of this is limited to just inline images.

Edit: Although looking at this again, I suppose that the patch as-is could miss some cases...

@Snuffleupagus Snuffleupagus marked this pull request as draft November 9, 2021 17:53
@Snuffleupagus Snuffleupagus marked this pull request as ready for review November 9, 2021 19:11
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Nov 9, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/9c4c6a83d1a9001/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Nov 9, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/448ae22931c3b71/output.txt

@brendandahl
Copy link
Contributor

Based of this comment it makes me think we should always do short name first even for other dictionaries. We could enforce this when using .get() with multiple arguments.

@pdfjsbot
Copy link

pdfjsbot commented Nov 9, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/448ae22931c3b71/output.txt

Total script time: 23.46 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 7

Image differences available at: http://54.241.84.105:8877/448ae22931c3b71/reftest-analyzer.html#web=eq.log

@brendandahl
Copy link
Contributor

There's also Table 94 – Additional Abbreviations in an Inline Image Object in the 2008 spec.

@pdfjsbot
Copy link

pdfjsbot commented Nov 9, 2021

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/9c4c6a83d1a9001/output.txt

Total script time: 42.28 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 11

Image differences available at: http://54.193.163.58:8877/9c4c6a83d1a9001/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator Author

Based of this comment it makes me think we should always do short name first even for other dictionaries. We could enforce this when using .get() with multiple arguments.

@brendandahl So I'm guessing that you'd prefer something along these lines instead: bfae1a3?w=1

Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Nov 10, 2021
…onary lookup (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Nov 10, 2021
…onary lookup (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Nov 10, 2021
…nary lookups (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Nov 10, 2021
…nary lookups (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Nov 10, 2021
…nary lookups (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
@Snuffleupagus Snuffleupagus deleted the issue-14256 branch November 11, 2021 22:47
bh213 pushed a commit to bh213/pdf.js that referenced this pull request Jun 3, 2022
…nary lookups (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
rousek pushed a commit to signosoft/pdf.js that referenced this pull request Aug 10, 2022
…nary lookups (issue 14256)

Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in mozilla#14257 (comment), it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inline images with both abbreviated and full keys
4 participants