|
1 | 1 | # Tesseract OCR
|
2 | 2 |
|
3 |
| -For the latest online version of the README.md see: |
4 |
| - |
5 |
| -https://github.com/tesseract-ocr/tesseract/blob/master/README.md |
6 |
| - |
7 |
| -### Build |
8 |
| -[](https://travis-ci.org/tesseract-ocr/tesseract) |
9 |
| -[](https://ci.appveyor.com/project/zdenop/tesseract/) |
| 3 | +**Travis** |
| 4 | +[](https://travis-ci.org/tesseract-ocr/tesseract) |
| 5 | +**Appveyor** |
| 6 | +[](https://ci.appveyor.com/project/zdenop/tesseract/) |
10 | 7 |
|
11 |
| -### Other |
| 8 | +**Other** |
12 | 9 | [](https://scan.coverity.com/projects/tesseract-ocr)
|
13 |
| -[](https://insight.io/github.com/tesseract-ocr/tesseract) |
| 10 | +[](https://insight.io/github.com/tesseract-ocr/tesseract) |
14 | 11 |
|
15 |
| -# About |
| 12 | +## About |
16 | 13 |
|
17 |
| -This package contains an OCR engine - `libtesseract` and a command line program - `tesseract`. |
| 14 | +This package contains an **OCR engine** - `libtesseract` and a **command line program** - `tesseract`. |
18 | 15 |
|
19 | 16 | The lead developer is Ray Smith. The maintainer is Zdenko Podobny.
|
20 | 17 | For a list of contributors see [AUTHORS](https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS)
|
21 | 18 | and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/graphs/contributors).
|
22 | 19 |
|
23 |
| -Tesseract has unicode (UTF-8) support, and can recognize more than 100 |
24 |
| -languages "out of the box". It can be trained to recognize other languages. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information. |
| 20 | +Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box". |
25 | 21 |
|
26 |
| -Tesseract supports various output formats: plain-text, hocr(html), pdf. |
| 22 | +Tesseract supports **various output formats**: plain-text, hocr(html), pdf, tsv, invisible-text-only pdf. |
27 | 23 |
|
28 |
| -This project does not include a GUI application. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page. |
| 24 | +You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract. |
29 | 25 |
|
30 |
| -You should note that in many cases, in order to get better OCR results, you'll need to [improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image you are giving Tesseract. |
| 26 | +This project **does not include a GUI application**. If you need one, please see the [3rdParty](https://github.com/tesseract-ocr/tesseract/wiki/User-Projects-%E2%80%93-3rdParty) wiki page. |
31 | 27 |
|
32 |
| -The latest stable version is 3.05.00, released in February 2017. |
| 28 | +Tesseract **can be trained to recognize other languages**. See [Tesseract Training](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) for more information. |
33 | 29 |
|
34 |
| -# Brief history |
| 30 | +## Brief history |
35 | 31 |
|
36 | 32 | Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and
|
37 | 33 | at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some
|
38 | 34 | more changes made in 1996 to port to Windows, and some C++izing in 1998.
|
39 |
| - |
40 | 35 | In 2005 Tesseract was open sourced by HP. Since 2006 it has been developed by Google.
|
41 | 36 |
|
42 |
| -[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes) |
| 37 | +The latest stable version is **[3.05.00](https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.00)**, released in February 2017. Source code is available from the [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05). The 3.05.01 bug-fix release is expected in May/June 2017. |
43 | 38 |
|
44 |
| -# For developers |
| 39 | +Source code for the new **[LSTM based 4.00.00alpha version](https://github.com/tesseract-ocr/tesseract)** is available from the master branch on github. Please note this branch is under active development. |
45 | 40 |
|
46 |
| -Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/master/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page. |
| 41 | +See **[Release Notes](https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes)** and **[Change Log](https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog)** for more details of the releases. |
47 | 42 |
|
48 |
| -Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](http://tesseract-ocr.github.io/). |
49 |
| - |
50 |
| -# License |
51 |
| - |
52 |
| - The code in this repository is licensed under the Apache License, Version 2.0 (the "License"); |
53 |
| - you may not use this file except in compliance with the License. |
54 |
| - You may obtain a copy of the License at |
55 |
| - |
56 |
| - http://www.apache.org/licenses/LICENSE-2.0 |
57 |
| - |
58 |
| - Unless required by applicable law or agreed to in writing, software |
59 |
| - distributed under the License is distributed on an "AS IS" BASIS, |
60 |
| - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
61 |
| - See the License for the specific language governing permissions and |
62 |
| - limitations under the License. |
63 |
| - |
64 |
| -**NOTE**: This software depends on other packages that may be licensed under different open source licenses. |
65 |
| - |
66 |
| -# Installing Tesseract |
| 43 | +## Installing Tesseract |
67 | 44 |
|
68 | 45 | You can either [Install Tesseract via pre-built binary package](https://github.com/tesseract-ocr/tesseract/wiki) or [build it from source](https://github.com/tesseract-ocr/tesseract/wiki/Compiling).
|
69 | 46 |
|
70 |
| -## Supported Compilers |
| 47 | +Supported Compilers are: |
71 | 48 |
|
72 | 49 | * GCC 4.8 and above
|
73 | 50 | * Clang 3.4 and above
|
74 | 51 | * MSVC 2015, 2017
|
75 | 52 |
|
76 | 53 | Other compilers might work, but are not officially supported.
|
77 | 54 |
|
78 |
| -# Running Tesseract |
| 55 | +## Running Tesseract |
79 | 56 |
|
80 |
| -Basic command line usage: |
| 57 | +Basic **[command line usage](https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage)**: |
81 | 58 |
|
82 |
| - tesseract imagename outputbase [-l lang] [--psm pagesegmode] [configfiles...] |
| 59 | + tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...] |
83 | 60 |
|
84 | 61 | For more information about the various command line options use `tesseract --help` or `man tesseract`.
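
As a rough illustration of the usage line above, the following Python sketch assembles the same argument order (`imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]`). The helper name `build_tesseract_command`, the file names, and the sample values (`eng`, `3`, `pdf`) are assumptions made for this example and are not part of the README; check `tesseract --help` for the values your installation accepts.

```python
import shlex


def build_tesseract_command(image, output_base, lang=None, oem=None, psm=None, configs=()):
    """Build an argument list in the order shown in the usage line.

    Hypothetical helper for illustration only; it does not run Tesseract.
    """
    cmd = ["tesseract", str(image), str(output_base)]
    if lang is not None:
        cmd += ["-l", lang]          # language option
    if oem is not None:
        cmd += ["--oem", str(oem)]   # OCR engine mode
    if psm is not None:
        cmd += ["--psm", str(psm)]   # page segmentation mode
    cmd += list(configs)             # trailing config file names
    return cmd


if __name__ == "__main__":
    # Sample values are assumptions chosen for the example.
    cmd = build_tesseract_command("page.png", "page", lang="eng", psm=3, configs=["pdf"])
    print(shlex.join(cmd))  # shlex.join needs Python 3.8 or newer
```

If `tesseract` is installed and on the `PATH`, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Building the list separately keeps quoting issues out of the way and makes the command easy to inspect before running it.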
|
85 | 62 |
|
86 |
| -# Support |
| 63 | +## For developers |
| 64 | + |
| 65 | +Developers can use `libtesseract` [C](https://github.com/tesseract-ocr/tesseract/blob/master/api/capi.h) or [C++](https://github.com/tesseract-ocr/tesseract/blob/master/api/baseapi.h) API to build their own application. If you need bindings to `libtesseract` for other programming languages, please see the [wrapper](https://github.com/tesseract-ocr/tesseract/wiki/AddOns#tesseract-wrappers) section on AddOns wiki page. |
| 66 | + |
| 67 | +Documentation of Tesseract generated from source code by doxygen can be found on [tesseract-ocr.github.io](http://tesseract-ocr.github.io/). |
| 68 | + |
| 69 | +## Support |
| 70 | + |
| 71 | +First read the [Wiki](https://github.com/tesseract-ocr/tesseract/wiki), particularly the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) to see if your problem is addressed there. If not, search the [Tesseract user forum](https://groups.google.com/d/forum/tesseract-ocr), the [Tesseract developer forum](https://groups.google.com/d/forum/tesseract-dev) and [past issues](https://github.com/tesseract-ocr/tesseract/issues), and if you still can't find what you need, ask for support in the mailing-lists. |
87 | 72 |
|
88 | 73 | Mailing-lists:
|
89 | 74 | * [tesseract-ocr](https://groups.google.com/d/forum/tesseract-ocr) - For tesseract users.
|
90 | 75 | * [tesseract-dev](https://groups.google.com/d/forum/tesseract-dev) - For tesseract developers.
|
91 | 76 |
|
92 |
| -Please read the [FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) before asking any question in the mailing-list or reporting an issue. |
| 77 | +Please report an issue only for a **bug**, not for asking questions. |
| 78 | + |
| 79 | +## License |
| 80 | + |
| 81 | + The code in this repository is licensed under the Apache License, Version 2.0 (the "License"); |
| 82 | + you may not use this file except in compliance with the License. |
| 83 | + You may obtain a copy of the License at |
| 84 | + |
| 85 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 86 | + |
| 87 | + Unless required by applicable law or agreed to in writing, software |
| 88 | + distributed under the License is distributed on an "AS IS" BASIS, |
| 89 | + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 90 | + See the License for the specific language governing permissions and |
| 91 | + limitations under the License. |
| 92 | + |
| 93 | +**NOTE**: This software depends on other packages that may be licensed under different open source licenses. |
| 94 | + |
| 95 | +## Latest Version of README |
| 96 | + |
| 97 | +For the latest online version of the README.md see: |
| 98 | + |
| 99 | +https://github.com/tesseract-ocr/tesseract/blob/master/README.md |
| 100 | + |