-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Implement TrueType character map "format 2" (fixes #14117) #14118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@Snuffleupagus This lints cleanly and works on a small sample of my test files that I have exercised it with. Tomorrow I will endeavor to create an automated tests and will manually test with the remaining files from my sample of known broken files. Years of coding to the PDF spec turns out to be good preparation for implementing format 2 from the Apple spec you referenced. In addition to just loading up PDF files, I ran this in the debugger and verified that the character map as loaded matched what this little Python script produces. This script uses freetype. This is just FYI. #!/usr/bin/env python3
import freetype
import sys
import os
import PIL.Image as Image
whoami = os.path.basename(sys.argv[0])
try:
filename = sys.argv[1]
except IndexError:
exit(f'Usage: {whoami} font-filename')
face = freetype.Face(filename)
for i, c in enumerate(face.charmaps):
print(f'charmap {i}:', c.cmap_format)
print('asc/desc:', face.ascender, face.descender)
print('bbox:', face.bbox.xMin, face.bbox.yMin, face.bbox.xMax, face.bbox.yMax)
print('num_glyphs:', face.num_glyphs)
for i, c in enumerate(face.charmaps):
print(f'charmap {i}: {c.encoding_name}')
face.set_char_size(4096)
for i in face.get_chars():
char, glyph = i
print(chr(char), hex(glyph).replace('0x', ''))
face.load_glyph(glyph)
bitmap = face.glyph.bitmap
width = bitmap.width
height = bitmap.rows
try:
img = Image.frombytes(
"L", (width, height), bytes(bytearray(bitmap.buffer)))
img.save(f'/tmp/z/glyph-{glyph}.png')
except Exception as e:
print(char, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lints cleanly and works on a small sample of my test files that I have exercised it with.
Did you manage to run the PDF.js test-suite successfully with these changes?
Please see https://github.com/mozilla/pdf.js/wiki/Contributing#4-run-lint-and-testing
Overall, this looks really good and seems to follow the specification nicely!
I've added a couple of comments, and please remember to squash the commits when addressing them.
I did. There were some failures, but there were also some failures when I just straight up ran the test suite after running makeref. I will study closely to make sure there are no regressions. Of course, this is only adding code to a place that would have otherwise caused a failure, so regressions are highly unlikely. Famous last words though. I'll check. |
Comments addressed. I will retest, check the automated test results, squash commits, and remove draft status. |
Everything checks out for me locally. I'm just looking at automated tests now. |
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/3ecef91c370183e/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/3ecef91c370183e/output.txt Total script time: 4.33 mins Published |
Please note that we only run a subset of all tests with GitHub Actions (mostly the unit-test), and that the full test-suite needs to be triggered manually (by approved users). /botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/6b115664a3836d2/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/7394175ad690d94/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/6b115664a3836d2/output.txt Total script time: 3.93 mins
Image differences available at: http://54.241.84.105:8877/6b115664a3836d2/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/7394175ad690d94/output.txt Total script time: 5.45 mins
Image differences available at: http://54.193.163.58:8877/7394175ad690d94/reftest-analyzer.html#web=eq.log |
If a PDF included an embedded TrueType font whose preferred character map (cmap) was in "format 2", the code would select that character map and then refuse to read it because of an unsupported format, thus causing the characters not to be rendered. This commit implements support for format 2 as described at the link below. https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
@Snuffleupagus This is good from my end. The image tests that fail are eq test that contain DCTDecode (JPEG) images. There shouldn't be eq tests with JPEG images. They will not render consistently. I used to try to use heuristics like downsampling and converting to bitonal with thresholding, but these are not reliable. My recommendation is that JPEG images in the test PDFs be replaced with images compressed with a non-lossy compression scheme whose behavior isn't dependent on floating point arithmetic. You can see some image test failures if you just check out master, run In any case, my local test files are all good and unit tests are passing again, so unless you have anything else you need me to do, I'm ready to have this merged when you are. What does your release process look like? If I need this fix for my local dev, do I need to patch it, or will a fix be released soon? Thanks. |
@Snuffleupagus BTW, I greatly appreciate the quick review and feedback. Also Mr. Snuffleupagus was always one of my favorite Sesame Street characters, and it was a great day indeed when Big Bird was vindicated about his existence. Sadly I was already an adult by then and missed the actual event. |
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/2f02e54c869c4d3/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/87e134431f7ed77/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/2f02e54c869c4d3/output.txt Total script time: 22.11 mins
Image differences available at: http://54.241.84.105:8877/2f02e54c869c4d3/reftest-analyzer.html#web=eq.log |
/botio-linux browsertest |
From: Bot.io (Linux m4)ReceivedCommand cmd_browsertest from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/59677b28d22d398/output.txt |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/87e134431f7ed77/output.txt Total script time: 40.97 mins
Image differences available at: http://54.193.163.58:8877/87e134431f7ed77/reftest-analyzer.html#web=eq.log |
/botio-linux browsertest |
From: Bot.io (Linux m4)ReceivedCommand cmd_browsertest from @Snuffleupagus received. Current queue size: 1 Live output at: http://54.241.84.105:8877/8e10a2fdc572b81/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/59677b28d22d398/output.txt Total script time: 17.88 mins
Image differences available at: http://54.241.84.105:8877/59677b28d22d398/reftest-analyzer.html#web=eq.log |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/8e10a2fdc572b81/output.txt Total script time: 19.52 mins
Image differences available at: http://54.241.84.105:8877/8e10a2fdc572b81/reftest-analyzer.html#web=eq.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me, looks great; thank you for the patch!
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/1906d3c397232bb/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 1 Live output at: http://54.193.163.58:8877/1eb54024c8791ec/output.txt |
While there's unfortunately a number of intermittent failures, I don't think they have anything to do with JPEG images in particular. Note that all image formats are decoded/handled internally by the PDF.js library, see e.g. https://github.com/mozilla/pdf.js/blob/master/src/core/jpg.js, hence there's nothing inherently special about JPEG images here.
Sorry, but I don't understand how this would be appropriate/correct. First of all that'd render every test-case for the PDF.js built-in JPEG decoder useless, and secondly it wouldn't really work for any of the linked test-case.
We don't have a set release schedule, unfortunately. |
Regarding intermittent image failures, you can just disregard my comments. I spoke prematurely and didn't think it through all the way. I was remember my own experiences with jpeg tests across platforms, but even that wouldn't apply to running it twice on the same machine back to back. So never mind about the images. Thanks for the reply on the release schedule. I'll find a way to patch locally and will try to figure out a way to get automatically notified when the next release is out. |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/1906d3c397232bb/output.txt Total script time: 20.15 mins
|
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/1eb54024c8791ec/output.txt Total script time: 31.55 mins
|
/botio-windows makeref |
If a PDF included an embedded TrueType font whose preferred character
map (cmap) was in "format 2", the code would select that character map
and then refuse to read it because of an unsupported format, thus
causing the characters not to be rendered. This commit implements
support for format 2 as described at the link below.
https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html