| 1 | += Tesseract release notes July 11 2015 - V3.04.01 = |
| 2 | + * Added OSD renderer for psm 0. Works for single page and multi-page images. |
| 3 | + * Improved tesstrain.sh script.
| 4 | + * Simplify build and run of ScrollView. |
| 5 | + * Improved PDF output for OS X Preview utility. |
| 6 | + * INCOMPATIBLE fix to hOCR line height information - commit 134ebc3. |
| 7 | + * Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD). |
| 8 | + * Enabled OpenMP support.
| 9 | + * Many bug fixes. |
| 10 | + |
| 11 | += Tesseract release notes July 11 2015 - V3.04.00 = |
| 12 | + * Tesseract development is now done with Git and hosted at github.com (previously, Subversion was used as the VCS and code.google.com for hosting).
| 13 | + * Tesseract now requires leptonica 1.71 or a higher version. |
| 14 | + * Removed official support for VS 2008. |
| 15 | + * Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid.
| 16 | + * Major updates to training system as a result of extensive testing on 100 languages. |
| 17 | + * New training data for over 100 languages.
| 18 | + * Improved performance with PIC compilation option. |
| 19 | + * Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript. |
| 20 | + * Improved font identification. |
| 21 | + * Major change to improve layout analysis for diacritic-heavy languages: Thai, Vietnamese, Kannada, Telugu, etc.
| 22 | + * Fixed problems with shifted baselines so recognition can recover from layout analysis errors. |
| 23 | + * Major refactor to improve speed on difficult images, especially when running a heap checker. |
| 24 | + * Moved params from global in page layout to tesseractclass. |
| 25 | + * Improved single column layout analysis. |
| 26 | + * Allowed OCR output to multiple formats from the tesseract command-line executable.
| 27 | + * Fixed issues with mixed eng+ara scripts. |
| 28 | + * Improved script consistency in numbers. |
| 29 | + * Major refactor of control.cpp to enable line recognition. |
| 30 | + * Added tesstrain.sh - a master training script. |
| 31 | + * Added ability for the text2image training tool to just list available fonts.
| 32 | + * Added ability for text2image to underline words.
| 33 | + * Improved efficiency of image processing for PDF output. |
| 34 | + * Added parameter description for each parameter listed with 'print-parameters' command line option. |
| 35 | + * Added font info to hOCR output. |
| 36 | + * Enabled streaming input and output of multi-page documents. |
| 37 | + * Many bug fixes. |
| 38 | + |
| 39 | += Tesseract release notes Feb 4 2014 - V3.03(rc1) = |
| 40 | + * Added OpenCL support (experimental). |
| 41 | + * Added new training tool text2image to generate box/tif file pairs from text and TrueType fonts.
| 42 | + * Added support for PDF output with searchable text. |
| 43 | + * Removed entire IMAGE class and all code in image directory. |
| 44 | + * Tesseract executable: support for output to stdout; limited support for one-page images from stdin (especially on Windows).
| 45 | + * Added Renderer to API to allow document-level processing and output of document formats, like hOCR, PDF. |
| 46 | + * Major refactor of word-level recognition, beam search, eliminating dead code. |
| 47 | + * Refactored classifier to make it easier to add new ones. |
| 48 | + * Generalized feature extractor to allow feature extraction from greyscale. |
| 49 | + * Improved sub/superscript treatment. |
| 50 | + * Improved baseline fit. |
| 51 | + * Added set_unicharset_properties to training tools. |
| 52 | + * Many bug fixes. |
| 53 | + * More training source data included. |
| 54 | + |
1 | 55 | = Tesseract release notes Feb 01 2012 - V3.02 =
2 | 56 | * Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
3 | 57 | * Added paragraph detection in layout analysis/post OCR.