Commit de889ec

Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316)
In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries; however, it turned out that this didn't actually help in this case (the mapping was still wrong).

To fix this I'm thus proposing that we fall back to the /ToUnicode map when no other usable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).

---
[1] As can be seen below, some of the entries are completely normal while others are non-standard:
```
Differences (array)
  0 = 65
  1 = /g5167
  2 = /space
  3 = /g11927
  4 = /g17737
  5 = /g11540
  6 = /g2180
  7 = /K
  8 = /P
  9 = /two
  10 = /zero
  11 = /one
  12 = /five
  13 = /four
  14 = /g6932
  15 = /g7246
  16 = /g1691
  17 = /g2343
  18 = /g14792
  19 = /g3325
  20 = /g4280
  21 = /g20383
  22 = /g18166
  23 = /g16988
  24 = /g17943
  25 = /g19223
  26 = /g10830
  27 = 97
  28 = /g982
  29 = /g1226
  30 = /g5059
  31 = /g2677
  32 = /g1042
  33 = /g11568
  34 = /L
  35 = /three
  36 = /seven
  37 = /g2364
  38 = /g12063
  39 = /g5356
  40 = /g2173
  41 = /g17877
  42 = /g7273
  43 = /g7647
  44 = /g7224
  45 = /g19327
  46 = /g5054
  47 = /g2342
  48 = /g10136
  49 = /g6856
  50 = /g13381
  51 = /g7257
  52 = /g12093
  53 = /g2359
```
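To make the footnote concrete, here is a minimal sketch of how a `/Differences` array like the one above splits into resolvable and unresolvable entries. It is an illustration only, not pdf.js code: `KNOWN_GLYPH_NAMES`, `parseDifferences` and `splitByResolvability` are hypothetical names, and the tiny name table stands in for the much larger glyph list a real implementation would consult.

```javascript
// Hypothetical, heavily truncated glyph-name table (name -> Unicode value).
const KNOWN_GLYPH_NAMES = new Map([
  ["space", 0x0020],
  ["zero", 0x0030],
  ["one", 0x0031],
  ["two", 0x0032],
  ["K", 0x004b],
  ["L", 0x004c],
  ["P", 0x0050],
]);

// In a /Differences array a number sets the current charCode, and every
// name that follows is assigned to consecutive charCodes from there.
function parseDifferences(differences) {
  const namesByCharCode = new Map();
  let charCode = 0;
  for (const entry of differences) {
    if (typeof entry === "number") {
      charCode = entry;
    } else {
      namesByCharCode.set(charCode++, entry);
    }
  }
  return namesByCharCode;
}

// Separate charCodes whose glyph name is known from those whose name
// (e.g. "g5167") cannot be resolved through the table.
function splitByResolvability(differences) {
  const resolved = new Map();
  const unresolved = [];
  for (const [charCode, name] of parseDifferences(differences)) {
    const unicode = KNOWN_GLYPH_NAMES.get(name);
    if (unicode === undefined) {
      unresolved.push(charCode);
    } else {
      resolved.set(charCode, unicode);
    }
  }
  return { resolved, unresolved };
}

// The first nine entries of the dump above, without the leading slashes.
const sampleDifferences = [
  65, "g5167", "space", "g11927", "g17737", "g11540", "g2180", "K", "P",
];
const sampleSplit = splitByResolvability(sampleDifferences);
console.log("unresolved charCodes:", sampleSplit.unresolved);
```

The charCodes left in `unresolved` are the ones for which the glyph map would otherwise stay empty, which is the gap the /ToUnicode fallback is meant to fill.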
1 parent 804abb3 commit de889ec

4 files changed: +26 -0 lines changed

src/core/fonts.js

+19
@@ -2651,6 +2651,25 @@ class Font {
             unicodeOrCharCode = MacRomanEncoding.indexOf(standardGlyphName);
           }
 
+          if (unicodeOrCharCode === undefined) {
+            // Not a valid glyph name, fallback to using the /ToUnicode map
+            // when no post-table exists (fixes issue13316_reduced.pdf).
+            if (
+              !properties.glyphNames &&
+              properties.hasIncludedToUnicodeMap &&
+              !(this.toUnicode instanceof IdentityToUnicodeMap)
+            ) {
+              const unicode = this.toUnicode.get(charCode);
+              if (unicode) {
+                unicodeOrCharCode = unicode.charCodeAt(0);
+              }
+            }
+
+            if (unicodeOrCharCode === undefined) {
+              continue; // No valid glyph mapping found.
+            }
+          }
+
           for (let i = 0; i < cmapMappingsLength; ++i) {
             if (cmapMappings[i].charCode !== unicodeOrCharCode) {
               continue;
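The fallback order in this hunk can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the actual `Font` implementation: `resolveGlyphCode` and the shape of the `font` object are hypothetical, plain `Map` instances stand in for the encoding tables and the /ToUnicode map, and the three-part condition from the diff is reduced to "no post-table and a usable /ToUnicode map".

```javascript
// Resolve a charCode to the code used for the cmap lookup.
// Hypothetical `font` shape (all fields are stand-ins for illustration):
//   glyphNames:   Map<charCode, glyphName> taken from /Differences
//   nameToCode:   Map<glyphName, code> for names that are valid
//   hasPostTable: whether the font carries glyph names in a post-table
//   toUnicode:    Map<charCode, string> or null when no usable map exists
function resolveGlyphCode(charCode, font) {
  // Step 1: the regular path, via the glyph name.
  const glyphName = font.glyphNames.get(charCode);
  let code = font.nameToCode.get(glyphName);

  // Step 2: the name was not valid, so try the /ToUnicode map, but only
  // when there is no post-table to rely on.
  if (code === undefined && !font.hasPostTable && font.toUnicode) {
    const unicode = font.toUnicode.get(charCode);
    if (unicode) {
      // Like the diff, this takes the first UTF-16 code unit only.
      code = unicode.charCodeAt(0);
    }
  }

  // Step 3: undefined tells the caller to skip this charCode.
  return code;
}

const sketchFont = {
  glyphNames: new Map([[65, "g5167"], [66, "space"], [67, "g11927"]]),
  nameToCode: new Map([["space", 0x20]]),
  hasPostTable: false,
  toUnicode: new Map([[65, "\u4e2d"]]),
};

for (const charCode of [65, 66, 67]) {
  console.log(charCode, "->", resolveGlyphCode(charCode, sketchFont));
}
```

In this sketch charCode 66 resolves through its glyph name, 65 resolves only through the /ToUnicode stand-in, and 67 stays unmapped, mirroring the `continue` branch in the diff.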

test/pdfs/.gitignore

+1
@@ -240,6 +240,7 @@
 !issue4304.pdf
 !issue4379.pdf
 !issue4550.pdf
+!issue13316_reduced.pdf
 !issue4575.pdf
 !bug1011159.pdf
 !issue5734.pdf

test/pdfs/issue13316_reduced.pdf

53.5 KB
Binary file not shown.

test/test_manifest.json

+6
@@ -4312,6 +4312,12 @@
        "lastPage": 4,
        "type": "load"
     },
+    { "id": "issue13316",
+       "file": "pdfs/issue13316_reduced.pdf",
+       "md5": "f5821891cee29d8de8b65e1efd6f4ceb",
+       "rounds": 1,
+       "type": "eq"
+    },
     { "id": "issue10519",
        "file": "pdfs/issue10519_reduced.pdf",
        "md5": "8a2dae43c0ef47b0734bedaaa24f8c09",
