Commit b231aee

tidy tesseract(1) adding missing options

Together with:
- fix "C\++"
- align executable --print-parameters message
1 parent 6c3d8fa commit b231aee

5 files changed, +157 -22 lines changed

api/tesseractmain.cpp

+1 -1
@@ -169,7 +169,7 @@ void PrintHelpMessage(const char* program) {
 " --help-oem Show OCR Engine modes.\n"
 " -v, --version Show version information.\n"
 " --list-langs List available languages for tesseract engine.\n"
-" --print-parameters Print tesseract parameters to stdout.\n";
+" --print-parameters Print tesseract parameters.\n";
 
 printf("\n%s", single_options);
 }

doc/tesseract.1

+43 -8
@@ -2,12 +2,12 @@
 .\" Title: tesseract
 .\" Author: [see the "AUTHOR" section]
 .\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/>
-.\" Date: 06/28/2015
+.\" Date: 03/23/2017
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "TESSERACT" "1" "06/28/2015" "\ \&" "\ \&"
+.TH "TESSERACT" "1" "03/23/2017" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Define some portability stuff
 .\" -----------------------------------------------------------------
@@ -84,7 +84,7 @@ Set value for control parameter\&. Multiple \-c arguments are allowed\&.
 The language to use\&. If none is specified, English is assumed\&. Multiple languages may be specified, separated by plus characters\&. Tesseract uses 3\-character ISO 639\-2 language codes\&. (See LANGUAGES)
 .RE
 .PP
-\fI\--psm N\fR
+\fI\-\-psm N\fR
 .RS 4
 Set Tesseract to only run a subset of layout analysis and assume a certain form of image\&. The options for
 \fBN\fR
@@ -111,6 +111,26 @@ are:
 .\}
 .RE
 .PP
+\fI\-\-oem N\fR
+.RS 4
+Specify OCR Engine mode\&. The options for
+\fBN\fR
+are:
+.sp
+.if n \{\
+.RS 4
+.\}
+.nf
+0 = Original Tesseract only\&.
+1 = Neural nets LSTM only\&.
+2 = Tesseract + LSTM\&.
+3 = Default, based on what is available\&.
+.fi
+.if n \{\
+.RE
+.\}
+.RE
+.PP
 \fIconfigfile\fR
 .RS 4
 The name of a config to use\&. A config is a plaintext file which contains a list of variables and their values, one per line, with a space separating variable from value\&. Interesting config files include:
@@ -139,22 +159,37 @@ pdf \- Output in pdf instead of a text file\&.
 .RE
 .RE
 .sp
-\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\--psm N\fR must occur before any \fIconfigfile\fR\&.
+\fBNota Bene:\fR The options \fI\-l lang\fR and \fI\-\-psm N\fR must occur before any \fIconfigfile\fR\&.
 .SH "SINGLE OPTIONS"
 .PP
-\fI\-v\fR
+\fI\-h, \-\-help\fR
+.RS 4
+Show help message\&.
+.RE
+.PP
+\fI\-\-help\-psm\fR
+.RS 4
+Show page segmentation modes\&.
+.RE
+.PP
+\fI\-\-help\-oem\fR
+.RS 4
+Show OCR Engine modes\&.
+.RE
+.PP
+\fI\-v, \-\-version\fR
 .RS 4
 Returns the current version of the tesseract(1) executable\&.
 .RE
 .PP
 \fI\-\-list\-langs\fR
 .RS 4
-list available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
+List available languages for tesseract engine\&. Can be used with \-\-tessdata\-dir\&.
 .RE
 .PP
 \fI\-\-print\-parameters\fR
 .RS 4
-print tesseract parameters to the stdout\&.
+Print tesseract parameters\&.
 .RE
 .SH "LANGUAGES"
 .sp
@@ -220,7 +255,7 @@ user_patterns_suffix user\-patterns
 Now, if you pass the word \fIbazaar\fR as a trailing command line parameter to Tesseract, Tesseract will not bother loading the system dictionary nor the dictionary of frequent words and will load and use the eng\&.user\-words and eng\&.user\-patterns files you provided\&. The former is a simple word list, one per line\&. The format of the latter is documented in dict/trie\&.h on read_pattern_list()\&.
 .SH "HISTORY"
 .sp
-The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C\e++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
+The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998\&. A lot of the code was written in C, and then some more was written in C++\&. The C++ code makes heavy use of a list system using macros\&. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug\&.
 .sp
 Version 2\&.00 brought Unicode (UTF\-8) support, six languages, and the ability to train Tesseract\&.
 .sp

doc/tesseract.1.asc

+21 -4
@@ -70,6 +70,14 @@ OPTIONS
 9 = Treat the image as a single word in a circle.
 10 = Treat the image as a single character.
 
+'--oem N'::
+Specify OCR Engine mode. The options for *N* are:
+
+0 = Original Tesseract only.
+1 = Neural nets LSTM only.
+2 = Tesseract + LSTM.
+3 = Default, based on what is available.
+
 'configfile'::
 The name of a config to use. A config is a plaintext file which
 contains a list of variables and their values, one per line, with a
@@ -84,14 +92,23 @@ before any 'configfile'.
 
 SINGLE OPTIONS
 --------------
-'-v'::
+'-h, --help'::
+Show help message.
+
+'--help-psm'::
+Show page segmentation modes.
+
+'--help-oem'::
+Show OCR Engine modes.
+
+'-v, --version'::
 Returns the current version of the tesseract(1) executable.
 
 '--list-langs'::
-list available languages for tesseract engine. Can be used with --tessdata-dir.
+List available languages for tesseract engine. Can be used with --tessdata-dir.
 
 '--print-parameters'::
-print tesseract parameters to the stdout.
+Print tesseract parameters.
 
 
 
@@ -268,7 +285,7 @@ The engine was developed at Hewlett Packard Laboratories Bristol and at
 Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
 changes made in 1996 to port to Windows, and some C\+\+izing in 1998. A
 lot of the code was written in C, and then some more was written in C\+\+.
-The C\+\+ code makes heavy use of a list system using macros. This predates
+The C++ code makes heavy use of a list system using macros. This predates
 stl, was portable before stl, and is more efficient than stl lists, but has
 the big negative that if you do get a segmentation violation, it is hard to
 debug.

doc/tesseract.1.html

+44 -5
@@ -870,6 +870,21 @@ <h2 id="_options">OPTIONS</h2>
 </div></div>
 </dd>
 <dt class="hdlist1">
+<em>--oem N</em>
+</dt>
+<dd>
+<p>
+Specify OCR Engine mode. The options for <strong>N</strong> are:
+</p>
+<div class="literalblock">
+<div class="content">
+<pre><code>0 = Original Tesseract only.
+1 = Neural nets LSTM only.
+2 = Tesseract + LSTM.
+3 = Default, based on what is available.</code></pre>
+</div></div>
+</dd>
+<dt class="hdlist1">
 <em>configfile</em>
 </dt>
 <dd>
@@ -902,7 +917,31 @@ <h2 id="_single_options">SINGLE OPTIONS</h2>
 <div class="sectionbody">
 <div class="dlist"><dl>
 <dt class="hdlist1">
-<em>-v</em>
+<em>-h, --help</em>
+</dt>
+<dd>
+<p>
+Show help message.
+</p>
+</dd>
+<dt class="hdlist1">
+<em>--help-psm</em>
+</dt>
+<dd>
+<p>
+Show page segmentation modes.
+</p>
+</dd>
+<dt class="hdlist1">
+<em>--help-oem</em>
+</dt>
+<dd>
+<p>
+Show OCR Engine modes.
+</p>
+</dd>
+<dt class="hdlist1">
+<em>-v, --version</em>
 </dt>
 <dd>
 <p>
@@ -914,15 +953,15 @@ <h2 id="_single_options">SINGLE OPTIONS</h2>
 </dt>
 <dd>
 <p>
-list available languages for tesseract engine. Can be used with --tessdata-dir.
+List available languages for tesseract engine. Can be used with --tessdata-dir.
 </p>
 </dd>
 <dt class="hdlist1">
 <em>--print-parameters</em>
 </dt>
 <dd>
 <p>
-print tesseract parameters to the stdout.
+Print tesseract parameters.
 </p>
 </dd>
 </dl></div>
@@ -1099,7 +1138,7 @@ <h2 id="_history">HISTORY</h2>
 Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
 changes made in 1996 to port to Windows, and some C++izing in 1998. A
 lot of the code was written in C, and then some more was written in C++.
-The C\++ code makes heavy use of a list system using macros. This predates
+The C++ code makes heavy use of a list system using macros. This predates
 stl, was portable before stl, and is more efficient than stl lists, but has
 the big negative that if you do get a segmentation violation, it is hard to
 debug.</p></div>
@@ -1156,7 +1195,7 @@ <h2 id="_copying">COPYING</h2>
 <div id="footnotes"><hr /></div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2015-06-28 22:23:47 CEST
+Last updated 2017-03-23 19:56:19 GMT
 </div>
 </div>
 </body>

doc/tesseract.1.xml

+48 -4
@@ -152,6 +152,20 @@ at Google since then.</simpara>
 </varlistentry>
 <varlistentry>
 <term>
+<emphasis>--oem N</emphasis>
+</term>
+<listitem>
+<simpara>
+Specify OCR Engine mode. The options for <emphasis role="strong">N</emphasis> are:
+</simpara>
+<literallayout class="monospaced">0 = Original Tesseract only.
+1 = Neural nets LSTM only.
+2 = Tesseract + LSTM.
+3 = Default, based on what is available.</literallayout>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
 <emphasis>configfile</emphasis>
 </term>
 <listitem>
@@ -184,7 +198,37 @@ before any <emphasis>configfile</emphasis>.</simpara>
 <variablelist>
 <varlistentry>
 <term>
-<emphasis>-v</emphasis>
+<emphasis>-h, --help</emphasis>
+</term>
+<listitem>
+<simpara>
+Show help message.
+</simpara>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+<emphasis>--help-psm</emphasis>
+</term>
+<listitem>
+<simpara>
+Show page segmentation modes.
+</simpara>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+<emphasis>--help-oem</emphasis>
+</term>
+<listitem>
+<simpara>
+Show OCR Engine modes.
+</simpara>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+<emphasis>-v, --version</emphasis>
 </term>
 <listitem>
 <simpara>
@@ -198,7 +242,7 @@ before any <emphasis>configfile</emphasis>.</simpara>
 </term>
 <listitem>
 <simpara>
-list available languages for tesseract engine. Can be used with --tessdata-dir.
+List available languages for tesseract engine. Can be used with --tessdata-dir.
 </simpara>
 </listitem>
 </varlistentry>
@@ -208,7 +252,7 @@ before any <emphasis>configfile</emphasis>.</simpara>
 </term>
 <listitem>
 <simpara>
-print tesseract parameters to the stdout.
+Print tesseract parameters.
 </simpara>
 </listitem>
 </varlistentry>
@@ -377,7 +421,7 @@ on read_pattern_list().</simpara>
 Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more
 changes made in 1996 to port to Windows, and some C++izing in 1998. A
 lot of the code was written in C, and then some more was written in C++.
-The C\++ code makes heavy use of a list system using macros. This predates
+The C++ code makes heavy use of a list system using macros. This predates
 stl, was portable before stl, and is more efficient than stl lists, but has
 the big negative that if you do get a segmentation violation, it is hard to
 debug.</simpara>
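
Usage sketch (not part of the commit): the man page text changed above documents the `--oem N` values and notes that `-l lang` and `--psm N` must occur before any configfile. A minimal Python illustration of assembling a command line in that order might look like the following. The helper name `build_tesseract_args`, the `OEM_MODES` table, and the file names are hypothetical, and placing `--oem` before the configfile is an assumption, since the Nota Bene names only `-l` and `--psm`.

```python
# Descriptions copied from the --oem list added to the man page in this commit.
OEM_MODES = {
    0: "Original Tesseract only.",
    1: "Neural nets LSTM only.",
    2: "Tesseract + LSTM.",
    3: "Default, based on what is available.",
}


def build_tesseract_args(image, outputbase, lang=None, psm=None, oem=None,
                         configfiles=()):
    """Return an argv list for tesseract(1).

    Hypothetical helper for illustration only; it is not part of Tesseract.
    Options are emitted before any configfile, following the Nota Bene in
    the man page (assumed here to apply to --oem as well).
    """
    args = ["tesseract", image, outputbase]
    if lang is not None:
        args += ["-l", lang]
    if psm is not None:
        args += ["--psm", str(psm)]
    if oem is not None:
        if oem not in OEM_MODES:
            raise ValueError(f"unknown OCR Engine mode: {oem}")
        args += ["--oem", str(oem)]
    # Config files (for example "pdf") always come last.
    args += list(configfiles)
    return args
```

Calling `build_tesseract_args("page.png", "out", lang="eng", psm=10, oem=1, configfiles=["pdf"])` returns a list that could be handed to `subprocess.run`, assuming a tesseract executable is installed and on the path.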
