/Encoding prevents characters in a specific font from rendering but they do in ghostscript, chrome, and acrobat #14117
Comments
I should say that my "plan A" is to come up with (or have you come up with) a fix to pdf.js, and my "plan B" is to figure out exactly what properties of the file make it render properly so I can programmatically detect broken files and run them through ghostscript's pdfwrite device. Passing the files |
As far as I'm concerned this is a red herring, since replacing the /Encoding-entry with bogus data results in another code-path being taken in the
That warning should thus explain the font rendering errors, since we currently don't support that particular All-in-all, it seems that the correct solution here would be to implement support for |
Before I read your message, I noticed that it is the format 2 issue, and I found that the following workaround causes the file to render properly because the 1,0 cmap works:
diff --git a/src/core/fonts.js b/src/core/fonts.js
index aeaf00e82..758c4cf31 100644
--- a/src/core/fonts.js
+++ b/src/core/fonts.js
@@ -1452,6 +1452,20 @@ class Font {
}
}
+ if (useTable) {
+ const oldPos = file.pos;
+ file.pos = start + offset;
+ const format = file.getUint16();
+ file.pos = oldPos;
+ if (!(format === 0 || format === 4 || format === 6)) {
+ // This cmap has an unsupported format, so we won't be
+ // able to use it. The list of supported formats is
+ // duplicated below.
+ useTable = false;
+ canBreak = false;
+ }
+ }
+
if (useTable) {
potentialTable = {
platformId,
I may apply this locally for now. It's pretty harmless, since a cmap in an unsupported format would not work even if it were picked, but it's a little ugly. I haven't opened a pull request because I doubt you would consider this a proper fix. I will look into format 2 and see what I can do.
This file has only English text encoded in ASCII, though the font has a few characters that fall outside ASCII. Reading up on format 2, it seems like a very odd choice. However, I will continue to study it.
I will be able to give you a pull request implementing support for format 2.
My changes are working on my test files. I need to create a test, which I will do tomorrow. I'll push up a draft pull request without tests for early review.
Implement TrueType character map "format 2" (fixes #14117)
If a PDF included an embedded TrueType font whose preferred character map (cmap) was in "format 2", the code would select that character map and then refuse to read it because of an unsupported format, thus causing the characters not to be rendered. This commit implements support for format 2 as described at the link below. https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
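For readers unfamiliar with format 2, here is a minimal sketch of a single-code lookup, based on the subtable layout in the Apple reference linked above. It is not the code from the pull request: the function name lookupCmapFormat2 and its parameters are hypothetical, and it assumes the caller already has a DataView over the font data and the byte offset of the format 2 subtable.

```javascript
// Sketch only: look up one character code in a cmap "format 2" subtable.
// `view` is a DataView over the font bytes and `start` is the byte offset
// of the subtable; both names are assumptions made for this example.
function lookupCmapFormat2(view, start, code) {
  // Header: format (2 bytes), length (2 bytes), language (2 bytes).
  const keysPos = start + 6;
  // subHeaderKeys holds 256 uint16 values, each equal to subHeaderIndex * 8.
  const subHeadersPos = keysPos + 512;

  let key, low;
  if (code < 0x100) {
    // Single-byte code: only valid if this byte is not a lead byte.
    key = view.getUint16(keysPos + 2 * code);
    if (key !== 0) {
      return 0;
    }
    low = code;
  } else {
    // Two-byte code: the high byte selects the subheader.
    const high = (code >> 8) & 0xff;
    key = view.getUint16(keysPos + 2 * high);
    if (key === 0) {
      return 0; // The high byte is not a lead byte.
    }
    low = code & 0xff;
  }

  const sh = subHeadersPos + key; // key is already a byte offset
  const firstCode = view.getUint16(sh);
  const entryCount = view.getUint16(sh + 2);
  const idDelta = view.getInt16(sh + 4);
  const idRangeOffset = view.getUint16(sh + 6);

  if (low < firstCode || low >= firstCode + entryCount) {
    return 0; // Outside the range covered by this subheader.
  }
  // idRangeOffset is measured from the position of the idRangeOffset field.
  const pos = sh + 6 + idRangeOffset + 2 * (low - firstCode);
  const glyph = view.getUint16(pos);
  // A stored value of 0 means "missing glyph"; otherwise apply idDelta mod 65536.
  return glyph === 0 ? 0 : (glyph + idDelta) & 0xffff;
}
```

A return value of 0 means no glyph is mapped. DataView reads big-endian by default, which matches the byte order used in TrueType files.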
Note: I am the author of qpdf and very knowledgeable about PDF, but I have only just started digging into TrueType fonts while investigating this issue. I have passable JavaScript skills, but it is not my area of expertise. However, I am happy to assist in producing a fix. I am going to dig into the code, but I thought I'd post the issue right away in case it's something that would be easily fixable by someone with more knowledge. Details below.
Attach (recommended) or Link to PDF file here:
ttf-font-encoding.pdf
Configuration:
Steps to reproduce the problem:
gulp server, then load the file
What is the expected behavior? (add screenshot)
This is a screenshot of the file as rendered by chrome:
This is as rendered by ghostscript:
Poppler also can't render this. Here is the file as rendered by evince:
What went wrong? (add screenshot)
In the attached PDF, which is in "QDF" format and can be easily edited in a text editor that can handle binary files (like emacs), you can find two fonts defined: /F31 (object 6, base font /FSVHNM+Arial) and /F35 (object 7, base font /QIJLAK+Calibri). All the characters displayed with /F31 do not render. All the characters displayed with /F35 render properly. I have removed almost all extraneous information from the PDF file but have left in all the text from the original file (after removing sensitive information) rendered in either of those two fonts. Neither font has a /ToUnicode map, so it is easy to read the text in the content stream.
If you edit object 6 to comment out the encoding (replace the space at offset 6752 with %), then the file renders properly with pdf.js as well as poppler. The fonts have /Flags 32, indicating a non-symbolic font. Removing /Flags has no bearing on the rendering.
If you extract the fontfile from the broken font from object 12 into a file, you can observe that the font file has two charmaps, one of which has format 0 and encoding "Apple Roman", and the other has format 2 and encoding "Unicode". When loading in pdf.js with the JavaScript console displayed, you can observe these warnings:
However, I'm not sure this is actually important since the file renders properly using presumably the other charmap with /Encoding removed.
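The charmap inspection described above could be sketched roughly as follows, assuming the font program has already been extracted from the PDF into a Uint8Array. The function name listCmapSubtables is hypothetical and this is not pdf.js code; it only walks the sfnt table directory and the cmap encoding records and reports each subtable's platform ID, encoding ID, and format.

```javascript
// Sketch only: report (platformId, encodingId, format) for every cmap
// subtable in a TrueType font. `bytes` is a Uint8Array holding the font
// file; the function name is an assumption made for this example.
function listCmapSubtables(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Offset table: sfnt version (4 bytes), numTables (2), then 6 more bytes.
  const numTables = view.getUint16(4);
  let cmapStart = -1;
  for (let i = 0; i < numTables; i++) {
    // Each table record is 16 bytes: tag, checksum, offset, length.
    const rec = 12 + 16 * i;
    const tag = String.fromCharCode(
      view.getUint8(rec),
      view.getUint8(rec + 1),
      view.getUint8(rec + 2),
      view.getUint8(rec + 3)
    );
    if (tag === "cmap") {
      cmapStart = view.getUint32(rec + 8);
      break;
    }
  }
  if (cmapStart < 0) {
    return []; // No cmap table in this font.
  }

  // cmap header: version (2 bytes), number of encoding records (2 bytes).
  const count = view.getUint16(cmapStart + 2);
  const result = [];
  for (let i = 0; i < count; i++) {
    // Each encoding record is 8 bytes: platformID, encodingID, offset.
    const rec = cmapStart + 4 + 8 * i;
    const platformId = view.getUint16(rec);
    const encodingId = view.getUint16(rec + 2);
    const offset = view.getUint32(rec + 4); // relative to the cmap table
    const format = view.getUint16(cmapStart + offset);
    result.push({ platformId, encodingId, format });
  }
  return result;
}
```

A check like this is also one possible basis for detecting affected files programmatically: a font would be flagged when any listed subtable reports a format the viewer does not support.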
Looking at the debugging output from gs -dNODISPLAY -dBATCH -dTTFDEBUG /tmp/ttf-font-encoding.pdf, it appears that ghostscript is deciding to use the builtin encoding from the cmap and is disregarding /Encoding, but I'm not sure, and I have intentionally not dug into the ghostscript code because it is GPL-2 and I don't want it to contaminate my thinking if I help with a fix.
Anything else I would say would be well into speculative territory at this point. Hopefully someone will be able to shed some light on this and help find a solution. My hunch is that we are dealing with an incorrect PDF or an incorrect TTF file that some other viewers are able to handle because of heuristics they have to work around broken files. I know from qpdf that a lot of the work of PDF readers is dealing with all the broken files in the wild, since for most of the world, "It works in Acrobat" seems to mean the PDF is good. :-)