Add a new config file 'pdftxt' to create PDF and TEXT output at the same time #59
Comments
Not generally useful, IMO. I don't see there being a whole lot of demand for this.
Zdenko has the same opinion, so you can close the issue. I'll add this to the FAQ if that is OK.
Sure, the FAQ seems a good place for this information.
zdenop pushed a commit that referenced this issue on Feb 5, 2016:
Revert fd429c3, 43834da, 05de195. See #49, #59.
The code in this commit solves the issue in a more elegant way, IMHO.
Now you can use:
* `tesseract eurotext.tif eurotext txt pdf`
* `tesseract eurotext.tif eurotext txt hocr`
* `tesseract eurotext.tif eurotext txt hocr pdf`
NOTE: With `tesseract eurotext.tif eurotext` or `tesseract eurotext.tif eurotext txt` the psm will be set to '3', but...
With `tesseract eurotext.tif eurotext txt pdf` or `tesseract eurotext.tif eurotext txt hocr` the psm will be set to '1'.
zvezdochiot pushed a commit to ImageProcessing-ElectronicPublications/tesseract that referenced this issue on Mar 28, 2021:
Revert fd429c3, 43834da, 05de195. See tesseract-ocr#49, tesseract-ocr#59.
The code in this commit solves the issue in a more elegant way, IMHO.
Now you can use:
* `tesseract eurotext.tif eurotext txt pdf`
* `tesseract eurotext.tif eurotext txt hocr`
* `tesseract eurotext.tif eurotext txt hocr pdf`
NOTE: With `tesseract eurotext.tif eurotext` or `tesseract eurotext.tif eurotext txt` the psm will be set to '3', but...
With `tesseract eurotext.tif eurotext txt pdf` or `tesseract eurotext.tif eurotext txt hocr` the psm will be set to '1'.
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-dev/XllxjvK5HtU/C4mebS6lcJoJ
Jeff suggested that users create a myconfig file. I think it would be useful to actually ship this configuration as a config file named 'pdftxt', with the following contents:
tessedit_create_txt 1
tessedit_create_pdf 1
Then make sure that you invoke the command line with an output base name, so that
Tesseract writes to files instead of stdout.
Given the input myimage.tif, the config file pdftxt, and the output base myoutput, this will produce myoutput.pdf and myoutput.txt.
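The original example command did not survive on this page, so here is a minimal sketch of what the steps above might look like. It assumes the config file is written to the current directory under the name pdftxt and passed by relative path, and that the positional order is input image, output base, then config name. The file names myimage.tif and myoutput are taken from the description above. The guard around the final call is only there so the snippet is harmless on a machine without Tesseract or the image.

```shell
#!/bin/sh
# Write the two-line config from the issue into a local file named pdftxt.
cat > pdftxt <<'EOF'
tessedit_create_txt 1
tessedit_create_pdf 1
EOF

# Assumed invocation: input image, output base name, then the config file.
# Skipped when tesseract or the sample image is not available.
if command -v tesseract >/dev/null 2>&1 && [ -f myimage.tif ]; then
  # Expected to write myoutput.txt and myoutput.pdf, per the description above.
  tesseract myimage.tif myoutput ./pdftxt
fi
```

If a config with the same name is already installed under tessdata/configs, Tesseract may pick that one up instead of the local file, so a distinct name can be safer while experimenting.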