Add a new config file 'pdftxt' to create PDF and TEXT output at the same time #59

Closed
Shreeshrii opened this issue Jul 22, 2015 · 3 comments

Comments

@Shreeshrii
Collaborator

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-dev/XllxjvK5HtU/C4mebS6lcJoJ

Jeff suggested that users create their own 'myconfig' file. I think it would be more useful to ship this configuration with Tesseract as a config file named 'pdftxt':

tessedit_create_txt 1
tessedit_create_pdf 1

Then make sure that you invoke the command line such that
Tesseract writes to files instead of stdout, e.g.

tesseract myimage.tif myoutput pdftxt

This reads myimage.tif and the pdftxt config file, and produces myoutput.pdf and myoutput.txt.
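For anyone who wants to try this before a 'pdftxt' config ships, here is a minimal sketch of the steps above. It assumes that Tesseract accepts a config file given as a plain path (here `./pdftxt`) when no config of that name is installed under tessdata; the guard around the call and the file names `myimage.tif` and `myoutput` are only illustrative.

```shell
# Write the two-line config file described above into the current directory.
cat > pdftxt <<'EOF'
tessedit_create_txt 1
tessedit_create_pdf 1
EOF

# Run OCR only when tesseract and the example input image are both present,
# so the sketch stays harmless on machines without either.
if command -v tesseract >/dev/null 2>&1 && [ -f myimage.tif ]; then
  # Assumption: the config is resolved from the given path.
  tesseract myimage.tif myoutput ./pdftxt
fi
```

If the call succeeds, myoutput.txt and myoutput.pdf should both appear next to the image.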

@jimregan
Contributor

Not generally useful, IMO - I don't see there being a whole lot of demand for this.

@Shreeshrii
Collaborator Author

Zdenko has the same opinion, so you can close the issue.

I'll add this to the FAQ if that is OK.


@jimregan
Contributor

Sure, the FAQ seems a good place for this information.

zdenop pushed a commit that referenced this issue Feb 5, 2016
Revert fd429c3, 43834da, 05de195.

See #49, #59.

The code in this commit solves the issue in a more elegant way, IMHO.

Now you can use:
  * `tesseract eurotext.tif eurotext txt pdf`
  * `tesseract eurotext.tif eurotext txt hocr`
  * `tesseract eurotext.tif eurotext txt hocr pdf`

NOTE:
  With `tesseract eurotext.tif eurotext`
  or `tesseract eurotext.tif eurotext txt`
  the psm will be set to '3', but...
  With `tesseract eurotext.tif eurotext txt pdf`
  or `tesseract eurotext.tif eurotext txt hocr`
  the psm will be set to '1'.
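
The multi-output invocations listed in the commit message can be wrapped in a small helper that also reports which output files turned up. This is only a sketch: `ocr_multi` is a hypothetical name, and it assumes each output config (`txt`, `hocr`, `pdf`) writes a file whose extension matches the config name, which may not hold for every Tesseract version.

```shell
# ocr_multi IMAGE BASE CONFIG...
# Hypothetical helper: runs tesseract with several output configs at once,
# then reports which of the expected output files exist.
ocr_multi() {
  image=$1
  base=$2
  shift 2
  # Skip the OCR step when tesseract or the input image is unavailable.
  if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    tesseract "$image" "$base" "$@"
  fi
  # Assumption: config "txt" yields BASE.txt, "pdf" yields BASE.pdf, and so on.
  for cfg in "$@"; do
    if [ -f "$base.$cfg" ]; then
      echo "found $base.$cfg"
    else
      echo "missing $base.$cfg"
    fi
  done
}

ocr_multi eurotext.tif eurotext txt pdf
```

A "missing" line after a successful run would suggest that the extension assumption does not match the installed version.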
zvezdochiot pushed a commit to ImageProcessing-ElectronicPublications/tesseract that referenced this issue Mar 28, 2021