Commit 383dcf7

Document some more config options for tesseract
Clarify also the name(s) of the generated OCR result file(s):
Tesseract does not create a file named outbase.txt by default.

Fix also a sentence in the language section.

Signed-off-by: Stefan Weil <[email protected]>
1 parent e03ee93 commit 383dcf7

1 file changed: +17 -4 lines changed

doc/tesseract.1.asc

@@ -34,7 +34,9 @@ IN/OUT ARGUMENTS
 
 'outputbase'::
 The basename of the output file (to which the appropriate extension
-will be appended). By default the output will be named 'outbase.txt'.
+will be appended). By default the output will be a text file
+with `.txt` added to the basename unless there are one or more
+'configfile' options which explicitly specify the desired output.
 
 'stdout'::
 Instruction to sent output data to standard output
@@ -88,8 +90,19 @@ OPTIONS
 contains a list of variables and their values, one per line, with a
 space separating variable from value. Interesting config files
 include: +
-* hocr - Output in hOCR format instead of as a text file.
-* pdf - Output in pdf instead of a text file.
+* `hocr` - Output in hOCR format (file extension `.hocr`).
+* `pdf` - Output PDF (file extension `.pdf`).
+* `tsv` - Output TSV (file extension `.tsv`).
+* `txt` - Output plain text (file extension `.txt`).
+* `get.images` - Write images.
+* `logfile` - Write debug file `tesseract.log`.
+* `lstm.train` - Used for LSTM training.
+* `makebox` - Output box file.
+* `quiet` - Write debug file to /dev/null.
+
+It is possible to select several config files, for example
+`tesseract image.png demo hocr pdf txt` will create three output files
+`demo.hocr`, `demo.pdf` and `demo.txt` with the OCR results.
 
 *Nota Bene:* The options `-l lang` and `--psm N` must occur
 before any 'configfile'.
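
The output naming rule that this hunk documents can be sketched in a few lines of Python. This is only an illustration of the wording in the man page, not Tesseract code: the helper `expected_outputs` and the table `OUTPUT_EXTENSIONS` are hypothetical names, and the sketch models only the four config files for which the text states a file extension. The assumption that the other config files (`quiet`, `logfile` and so on) leave the default `.txt` output in place is taken from the 'outputbase' wording and has not been checked against Tesseract itself.

```python
# Hypothetical helper, not part of Tesseract: it predicts output file names
# from the rules stated in the man page text in this diff.

# Config files for which the man page states an explicit file extension.
OUTPUT_EXTENSIONS = {
    "hocr": ".hocr",
    "pdf": ".pdf",
    "tsv": ".tsv",
    "txt": ".txt",
}


def expected_outputs(outputbase, configfiles=()):
    """Return the output file names implied by the documented rules."""
    extensions = [
        OUTPUT_EXTENSIONS[name]
        for name in configfiles
        if name in OUTPUT_EXTENSIONS
    ]
    if not extensions:
        # Assumption: no config file selects an output format explicitly,
        # so the default plain text file is written.
        extensions = [".txt"]
    return [outputbase + ext for ext in extensions]


if __name__ == "__main__":
    # Mirrors the example `tesseract image.png demo hocr pdf txt`.
    print(expected_outputs("demo", ["hocr", "pdf", "txt"]))
    # No config file given: only the default text output is expected.
    print(expected_outputs("demo"))
```

Such a helper could be useful in a wrapper script that needs to know which files to collect after a run, but the authoritative list of outputs is whatever Tesseract actually writes.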
@@ -122,7 +135,7 @@ LANGUAGES
 
 The currently available traineddata files for tesseract 4.0
 for the following languages are in
-(in https://github.com/tesseract-ocr/tessdata_fast):
+https://github.com/tesseract-ocr/tessdata_fast:
 
 *afr* (Afrikaans),
 *amh* (Amharic),
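
The Nota Bene in the OPTIONS hunk says that `-l lang` and `--psm N` must occur before any 'configfile'. A minimal sketch of a command builder that respects this ordering is shown below. The function `build_command` is a hypothetical helper introduced for illustration only; it assumes the usual `tesseract imagename outputbase [options...] [configfile...]` argument order and does not run Tesseract.

```python
# Hypothetical helper, not part of Tesseract: it assembles an argument list
# in which the options always precede the config file names.

def build_command(image, outputbase, lang=None, psm=None, configfiles=()):
    """Build a tesseract argument list with options before config files."""
    cmd = ["tesseract", image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    # Config files go last, as the Nota Bene requires.
    cmd += list(configfiles)
    return cmd


if __name__ == "__main__":
    print(build_command("image.png", "demo", lang="eng", psm=6,
                        configfiles=["hocr", "pdf", "txt"]))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where Tesseract is installed.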
