The text is cut off on Japanese documents. #13343

Closed
ghost opened this issue May 6, 2021 · 7 comments · Fixed by #13347

Comments

@ghost

ghost commented May 6, 2021

Attach (recommended) or Link to PDF file here:
jpa2021506744.pdf

Configuration:

  • Web browser and its version: Google Chrome v89.0.4389.114 / IE ver 20H2 (OS build 19042.928) / Microsoft Edge 89.0.774.77 (64 bit)
  • Operating system and its version: Windows 10 Pro 20H2 19042.928
  • PDF.js version: 2.7.570
  • Is a browser extension: No

Steps to reproduce the problem:
As you can see on page 15 of the attached jpa2021506744.pdf, the text is cut off when this Japanese document is rendered with PDF.js.
I hope this problem can be solved.

I tried to fix it after reading these issues:

#12110
#13067

Adobe Reader does not have this problem.

What is the expected behavior? (add screenshot)
[Screenshot: jpa_2021_506744_OK]

What went wrong? (add screenshot)
[Screenshot: jpa_2021_506744_NG]

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

@Snuffleupagus
Collaborator

Your PDF document uses non-standard fonts, but doesn't embed any of them, which means that the PDF document itself is violating the PDF specification and it simply cannot be guaranteed that the document will render as intended in all viewers.

> Adobe Reader does not have this problem.

That's because you (apparently) have the necessary fonts installed locally, but unless you do, it's even more broken in Adobe Reader (with nothing rendering) than in the PDF.js library.


Please note that the whole point of the "Portable Document Format" is that files should be portable, and by not embedding fonts you're basically guaranteeing that problems can/will occur.
This is a bug in your PDF document, rather than in the PDF.js library, and as such this unfortunately doesn't look actionable/valid here.
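
To make the embedding requirement concrete: in the PDF format, a font counts as embedded when its font descriptor references a font program through `FontFile`, `FontFile2`, or `FontFile3`. The following is a minimal Python sketch of that check. It assumes the font dictionaries have already been parsed into plain Python mappings by some PDF library; `is_font_embedded` and the sample dictionaries are hypothetical stand-ins, not PDF.js code and not data extracted from the attached file.

```python
from typing import Any, Mapping

# Font-descriptor keys that reference an embedded font program.
FONT_FILE_KEYS = ("FontFile", "FontFile2", "FontFile3")


def is_font_embedded(font: Mapping[str, Any]) -> bool:
    """Return True if an already-parsed font dictionary carries its own glyph data."""
    # Type 3 fonts define their glyphs as content streams in the font dictionary.
    if font.get("Subtype") == "Type3":
        return True
    # Composite (Type0) fonts keep the descriptor on their descendant CIDFont.
    if font.get("Subtype") == "Type0":
        descendants = font.get("DescendantFonts") or []
        return any(is_font_embedded(d) for d in descendants)
    descriptor = font.get("FontDescriptor") or {}
    return any(key in descriptor for key in FONT_FILE_KEYS)


# Hypothetical sample data: a composite font without a font program ...
not_embedded = {
    "Subtype": "Type0",
    "BaseFont": "ExampleFont",
    "DescendantFonts": [
        {"Subtype": "CIDFontType0", "FontDescriptor": {"FontName": "ExampleFont"}}
    ],
}
# ... and a TrueType font whose descriptor references one.
embedded = {
    "Subtype": "TrueType",
    "FontDescriptor": {"FontName": "ExampleFont", "FontFile2": b"..."},
}

print(is_font_embedded(not_embedded))  # False
print(is_font_embedded(embedded))      # True
```

A viewer that meets a font like the first one has no glyph data to draw from and must fall back to whatever font is available locally.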

@ghost
Author

ghost commented May 7, 2021

Thank you for your reply.

Let me ask you an additional question. Is it possible to specify the font in PDF.js?

After receiving your reply, I looked at the fonts listed in the PDF properties in Adobe Reader and found that the following fonts are used (not embedded):

  1. GothicBBB-Medium
  2. Ryumin-Light

These are paid fonts from Morisawa:
https://www.morisawa.co.jp/products/fonts/basic-7-pack/

But when I checked the fonts installed on my PC with Windows PowerShell, these fonts weren't there.

The "actual font" shown by Adobe Reader is "KozGoPr6N-Medium" (Type 1 (CID)).
I understand that this is the font I was actually seeing.

In other words, Adobe Reader displays the document normally using a font that was not originally specified in this PDF.
I think that if I could specify the font in PDF.js, I could view the document in the same way.

(The PDFs in question are all publicly available Japanese patent publications, so their creator cannot be expected to regenerate them.)
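
The behavior described above, where a viewer shows a font other than the one the PDF names, can be sketched as a lookup in a substitution table. This is only an illustration in Python under stated assumptions: `SUBSTITUTES` and `pick_font` are hypothetical names, the pairing of GothicBBB-Medium with KozGoPr6N-Medium is a guess (the thread reports only the substitute's name), and nothing here reflects how Adobe Reader or PDF.js is actually implemented.

```python
from typing import Optional, Set

# Hypothetical substitution table: font named in the PDF -> candidate replacements.
# The pairing below is an assumption made for illustration only.
SUBSTITUTES = {
    "GothicBBB-Medium": ["KozGoPr6N-Medium"],
}


def pick_font(requested: str, installed: Set[str]) -> Optional[str]:
    """Return the font to render with, or None if nothing suitable is installed."""
    # Prefer the exact font the document asks for.
    if requested in installed:
        return requested
    # Otherwise take the first listed substitute that is installed.
    for candidate in SUBSTITUTES.get(requested, []):
        if candidate in installed:
            return candidate
    return None


installed_fonts = {"KozGoPr6N-Medium"}
print(pick_font("GothicBBB-Medium", installed_fonts))  # KozGoPr6N-Medium
print(pick_font("Ryumin-Light", installed_fonts))      # None
```

A substitute rarely has exactly the glyph widths the document was laid out with, which is one plausible way text ends up clipped or shifted when fonts are not embedded.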

@ghost
Author

ghost commented May 7, 2021

After this post I observed a phenomenon that I could not understand. Because it is related to my first question, please let me report it here.

When "jpa2021506744.pdf" is displayed in PDF.js and I turn pages repeatedly a few minutes later (1 to 10 minutes after it is first displayed), the characters on a page that was initially displayed without any problem become shifted.
[Screenshot: page5_before_after]

Conversely, the width of the characters I originally asked about has improved at this point.
[Screenshot: page15_before_after]

If the font is the reason, why does the text shift after it has already been displayed?

@Snuffleupagus
Collaborator

> Is it possible to specify the font in PDF.js?

No, the PDF.js library only renders what's present in the PDF document itself.

> It is a phenomenon that the characters on the page that was displayed without any problem at first are shifted.

There's now a patch for this part of the issue, which should hopefully improve things overall.
(However, as already mentioned, we cannot guarantee that non-embedded fonts render correctly.)

@ghost
Author

ghost commented May 10, 2021

I haven't confirmed it yet, but thank you for your prompt response.
How can I apply the patch? (I'm sorry I'm not used to it.)

@Snuffleupagus
Collaborator

> I haven't confirmed it yet

It can be tested with the demo viewer, which always contains the latest changes (that haven't yet reached an official release): https://github.com/mozilla/pdf.js#online-demo

> How can I apply the patch?

It's unfortunately quite unlikely that the patch will apply cleanly to PDF.js version 2.7.570 (which is the version you mentioned above), so the easiest solution is probably to just wait until PR #13347 appears in an official release.

@ghost
Author

ghost commented May 17, 2021

I confirmed the fix! I really appreciate it. Wonderful! Thank you! Thank you! Thank you!
Can you tell me when PR #13347 will be included in an official release?
