jpg input files result in much bigger pdf #1961
Comments
This looks like a regression. @zdenop, maybe we can try to fix it for 4.0.0, perhaps by writing the image in JPEG 2000 format (as at least one commercial OCR program does), if Leptonica supports that. That could reduce the size of the PDF a lot.
Can you please provide your image for testing?
Thanks for the report. It should be fixed now, please check.
Yep, it works.
* 'master' of https://github.com/tesseract-ocr/tesseract:
  Remove code for _MSC_VER < 1900
  keep API compatibility with #1265
  Update googletest submodule to release v1.8.1
  Update test submodule
  Always use isascii() with isspace()
  Avoid crash with --psm 0 and LSTM traineddata
  SVPaint: Remove empty block
  Classify: Don't hide debug parameter
  UNICHARMAP: Remove comparison which is always false
  svpaint: Change a variable from global to local
  pgedit: remove unused declaration of display_bln_lines
  Plumbing: Remove comparison which is always false
  Release candidate 2
  use pdf L_FLATE_ENCODE only for png input; fixes #1961
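For anyone reading along, the idea behind the last commit subject ("use pdf L_FLATE_ENCODE only for png input") can be sketched roughly as below. This is only an illustration of the selection logic, not the actual code in src/api/pdfrenderer.cpp. The enums InputFormat and PdfEncoding and the function ChoosePdfEncoding are hypothetical stand-ins invented for this sketch; they are not Leptonica or Tesseract types. The assumption is that non-PNG input such as JPEG should keep its own compression instead of being re-encoded losslessly, which is presumably what inflated the PDF.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT the real
// Leptonica or Tesseract types.
enum class InputFormat { Jpeg, Png, Tiff, Other };
enum class PdfEncoding { Flate, KeepSource };

// Use lossless Flate only when the input is PNG. For everything else
// (assumption: JPEG in particular), keep the source compression so the
// embedded image stays close to the size of the input file.
PdfEncoding ChoosePdfEncoding(InputFormat format) {
  return format == InputFormat::Png ? PdfEncoding::Flate
                                    : PdfEncoding::KeepSource;
}

// Human-readable name, handy for logging which path was taken.
std::string ToString(PdfEncoding encoding) {
  return encoding == PdfEncoding::Flate ? "flate" : "keep-source";
}
```

With this rule, a JPEG input maps to "keep-source" and a PNG input maps to "flate", which matches the behavior the reporter expected (PDF size close to the input size for JPEG files).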
Environment
leptonica-1.77.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.7.0beta84 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 1.0.0 : libopenjp2 2.3.0
Found AVX
Found SSE
Current Behavior:
Running OCR on JPEG files produces a much bigger output PDF file.
Input file size = 642K
pdf = 2.5M
Expected Behavior:
The PDF size should not be much bigger than the input.
pdf = 645K
Suggested Fix:
Not a fix, but commit 5fe1390 introduced the problem, specifically at src/api/pdfrenderer.cpp line 719.