Unusable OCR results with OpenCL #837

stweil · 2017-04-23T20:15:27Z

Both 3.05 and latest git master produce unusable results with OpenCL (tested on Linux and macOS). It looks like that problem only occurs with large images.

Disabling OpenCL in ccmain/thresholder.cpp fixes the problem.

amitdo · 2017-04-24T09:39:04Z

#635
Maybe it's the right time to remove the OpenCL code entirely.

I have a feeling that you (Stefan) will disagree... :)

stweil · 2017-04-24T10:02:34Z

Indeed. :-)

Other projects using LSTM strive to use the computing power of graphic cards.

Should Tesseract be an exception because people are satisfied with CPU-based computation, should we try to fix the known problems of the current OpenCL-based implementation and improve the code, or is there another way for Tesseract to use advanced computing power?

amitdo · 2017-04-24T10:44:36Z

The problems I see with the OpenCL code:
It mostly duplicates parts of Leptonica, which is an external dependency.
It was contributed by someone from AMD. He does not keep maintaining it.
Google does not seem to want to invest time in maintaining it.
The only one that cares and tries to fix bugs is you.

I respect your work, but I still think the benefit here is quite small and we should drop it.

A compromise could be that you will maintain that code as a separate project and we will have an optional dependency on it.

amitdo · 2017-04-24T10:58:22Z

I tried to sell it to Dan but he didn't buy it 😆
#635 (comment)
Please read his answer.

stweil · 2017-04-26T15:01:01Z

I removed a duplicated part of Leptonica in PR #843. Are there more of them? At least I no longer find a comment naming Leptonica in the OpenCL code.

amitdo · 2017-04-26T16:45:35Z

At least some with 'pix' in their names. pixSubtract for example.

jbreiden · 2017-04-26T17:44:43Z

There's a parallel set of functions for reading TIFF files:

https://github.com/tesseract-ocr/tesseract/blob/master/opencl/openclwrapper.cpp
OpenclDevice::pixReadTiffCl() and friends

https://github.com/DanBloomberg/leptonica/blob/master/src/tiffio.c
pixReadTiff() and friends

That whole section is a little weird for a few reasons.

1) TIFF decode is not very expensive, so why bother with hardware assist?
2) The OpenCL TIFF code has fallen behind Leptonica proper, enough to be currently disabled. (By the way, I'm probably capable of helping it catch up, but not very motivated.)
3) Most interestingly to me, all the actual computation in TIFF decode isn't even done by Leptonica; it is done by libtiff. So the ideal scenario for hardware assist would be something like a "libtiff-turbo" library (somewhat similar to how libjpeg-turbo has replaced libjpeg in many Linux systems). Putting it in Tesseract is really weird, and as a side effect causes a bunch of code duplication.

For all these reasons, if we did decide to prune out some OpenCL code, I think the TIFF portion is the best place to start. I'd roughly guess that includes about a dozen methods.

stweil · 2017-04-26T19:54:23Z

> I'd roughly guess that includes about a dozen methods.

Good guess. It's a little bit more. I'll send a pull request which reduces the code in opencl to less than 5300 lines.

amitdo · 2017-04-26T21:10:09Z

Ray's DAS 2014 tutorial, slides set 8 has some statistics about the OpenCL code.

jbreiden · 2017-04-26T23:16:07Z

This is the slide in question.

I think this confirms TIFF decode is relatively inexpensive, and is an even smaller overall portion for Tesseract 4.x.

# This is a different TIFF, just playing around. 5.7 MB,  1678x2590, LZW TIFF, 3.2Ghz Intel processor
$ time tifftopnm lzw.tif > /dev/null
tifftopnm: writing PPM file

real	0m0.213s
user	0m0.172s
sys	0m0.008s

stweil · 2017-04-27T06:39:47Z

PR #849 now removes most of the TIFF related code. Maybe I can remove more in the future.

amitdo · 2017-04-27T21:20:51Z

> ... reduces the code in opencl to less than 5300 lines.

> Maybe I can remove more in the future.

Only ~5300 lines left (to remove) ... :-)

jbreiden · 2017-04-27T22:15:38Z

Great job on this. I suspect next step is to remove the #include tiff.h lines and the build dependency on libtiff.

stweil · 2017-04-28T04:59:47Z

It's still used by one of the benchmarks. As soon as I'm sure that this benchmark is not needed, we'll make a large step in Amit's preferred direction (currently 5074 lines left).

amitdo · 2017-04-28T11:48:25Z

http://www.anandtech.com/show/10613/discrete-desktop-gpu-market-trends-q2-2016-amd-grabs-market-share-but-nvidia-remains-on-top

Benchmarks should be done on Intel's integrated GPUs and NVIDIA's GPUs.

I read somewhere a claim that the performance of OpenCL with NVIDIA cards is significantly worse than with AMD cards.

Another claim is that OpenCL performance on Macs degraded significantly with the latest macOS versions. Like NVIDIA, Apple now has an API which is competing with OpenCL.

amitdo · 2017-09-12T11:39:52Z

> Both 3.05 and latest git master produce unusable results with OpenCL (tested on Linux and macOS). It looks like that problem only occurs with large images.

Does the issue still exist?

stweil · 2017-09-12T11:52:51Z

Yes. At least I am not aware that anybody fixed it.

amitdo · 2017-09-12T12:12:29Z

Since you opened the issue, you dropped a lot of opencl code. Maybe the issue was related to that code?

stweil · 2017-09-12T12:21:18Z

No, it is caused by one of the remaining (unchanged) OpenCL code blocks.

amitdo · 2017-09-12T12:27:08Z

An idea: Maybe you want to add an environment variable to disable opencl at runtime, similar to the openmp trick?

stweil · 2017-09-12T12:36:12Z

That environment variable is already there: TESSERACT_OPENCL_DEVICE. Set it to the number of the "native device" (or to an illegal value: less than 1 or greater than the number of OpenCL devices) to disable OpenCL.
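A minimal Python sketch of using that variable when launching Tesseract from a script. It assumes the `tesseract` binary is on the PATH and that "0" counts as an illegal device number (it is less than 1, per the description above); the helper names and file names are hypothetical.

```python
import os


def env_without_opencl(base_env=None):
    """Return a copy of the environment with OpenCL disabled for Tesseract.

    Assumption: "0" is below the smallest valid device number (1),
    so Tesseract treats it as illegal and skips OpenCL.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSERACT_OPENCL_DEVICE"] = "0"
    return env


def tesseract_command(image_path, output_base):
    # Basic CLI form: tesseract IMAGE OUTPUTBASE (file names are placeholders).
    return ["tesseract", image_path, output_base]
```

It could then be run with `subprocess.run(tesseract_command("page.png", "page"), env=env_without_opencl(), check=True)`.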

amitdo · 2017-09-12T12:59:30Z

Ok :-)

Shreeshrii · 2018-04-29T15:54:20Z

There is a thread in the tesseract-ocr forum regarding an offer to rewrite the OpenCL code.

On Sat, Apr 28, 2018 at 1:19 PM, Janpieter Sollie wrote:
Would it be a problem for you if I rewrite the opencl engine completely, and you people provide me help to link the tesseract kernel -> opencl engine parts?
in attachment, I already have a list of features I'd like to port to openCL. As this uses the GPU in a heavy way, I will implement multi-card support on the host.
Is it a problem for you guys to think of tesseract 5.0 as a milestone?

See https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/uNRDFTavDfc/-2MrTZZKBgAJ for details and https://groups.google.com/group/tesseract-ocr/attach/64a964d2b63fb/openclkernels.cl?part=0.1&authuser=0&view=1 for the attachment.

jpsollie · 2018-04-29T17:51:50Z

This is only the opencl part (not implemented yet, but the way I see it) ...
it will still need a wrapper class (for which I'd like to know whether the license permits adapting the current code) and an interface to fit into the neural network you currently have.
I will need help to write a C++ class to fit in, as I only have knowledge about C and java, almost no C++.

amitdo · 2018-05-01T20:46:09Z

I can't help with this effort, sorry.

jpsollie · 2018-05-02T14:11:23Z

Currently, I have made an example of what I could do using OpenCL:

- search for whitelines in the image
- search for whitespace characters in the image
- search for characters
- perform initial training

The current requirements are:

- an OpenCL 1.1 card with the cl_khr_3d_image_writes extension. I tested it on amdgpu cards, and that seems fine. NVIDIA (Pascal) supports OpenCL 1.2, where this extension is integrated; it does not appear in the extension list, though. Intel CPU drivers are also fine, but I have no idea about the GPU drivers.
- the number of characters (whitespace excluded) is limited to 1024
- the image that can be passed to the CPU only supports black and white, no grey colors
- a character image that is generated during a training model only supports up to 64 reference pixels. For large fonts, this may not be enough.

OpenCL 1.1 page about the extension: https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf#page=293
OpenCL 1.2 page: https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#page=307

amitdo · 2018-05-02T14:38:12Z

The LSTM net does not get individual character boxes as input during training.

jpsollie · 2018-05-02T15:59:50Z

but how do you generate the drawimage script then?

jpsollie · 2018-05-02T16:07:16Z

a workaround could also be:
I perform all the steps of the character recognition process on the image, and the reference pixels mapped in the process are adapted as the character reference pixels. On the host, we build a buffer. The GPU compares the different buffers and merges them into an average.

amitdo · 2018-05-02T16:54:43Z

It seems that you want to implement an OCR engine based on traditional pattern recognition methods.
The LSTM based OCR engine works in a different way.

jpsollie · 2018-05-03T05:09:49Z

that's not true: I am adapting the openCL part, if you want this to work as an LSTM object you have to help me adapt the wrapper class. But as said in the forum, I have no experience on LSTM. Therefore:

each method is fully adaptable, eg: if you can provide a custom reference to the reference point builder, it will be much faster & more accurate than using a generic recognition pattern, but if the character is not the one you are looking for, it will probably map a very low amount of pixels.
each step of the engine requires a massive amount of offsets in the image which can be referenced to a character. doing this with 5 objects is useless. you can follow the steps in the engine as described, but that would possibly let you fallback to the old method

stweil · 2018-05-03T07:36:59Z

@jpsollie, maybe the best way to start would be fixing the existing OpenCL code.

amitdo · 2018-05-03T07:37:29Z

I'm not an expert in ML, and I have no knowledge of OpenCL programming.

I think that this effort has little chance to succeed without close cooperation with @theraysmith.
CC: @jbreiden

Anyway, I appreciate your desire to contribute to tesseract to make it faster.

jpsollie · 2018-05-03T08:00:31Z

@stweil: you're quite late with that; the engine as I designed it (only the OpenCL code, not the host code) is almost ready. I will write some additional kernels to perform closer interaction, but I will probably post the device code today. Posting the host code (C++ object) will probably take more time, as I have to learn Tesseract's LSTM engine and a way to link those two together.

jpsollie · 2018-05-03T12:18:08Z

as said a few hours ago: hereby the GPU code. If you have any questions about how to use it, feel free to ask.
openclkernels.zip
*edit: there seem to be some syntax errors in the kernels. I will correct them. Anyway, without a well-implemented wrapper class, these kernels are useless.
possible points of improvement:

implement a method to train l vs 1 (which currently is quite unsure)
provide neuron recognition based on height and width of the possible character, not only the position and the neuron offset. it currently is quite impossible to map ff, as it connects the 1st with the 2nd character. moreover, in a square block, you need as much place for height compared to width, which is not the case with most characters.

jpsollie · 2018-05-04T12:00:49Z

question: how does the human brain see that ff is 2 times f and w is not 2 times v? it seems quite impossible :(

jpsollie · 2018-05-10T16:09:01Z

I think I found a solution to this: include connected characters into the character list.
Anyway: as promised, this one is syntactically correct (according to libclc, with opencl features omitted) and makes sure the transfers of CPU->GPU are as little as possible.
so this one:
-does not need the 3D kernel extension
-is more optimized to use local memory
-fully supports 1024 characters
-fully supports images up to 32k*32k (yes, signed short character limit)
-has a method to re-integrate out of scope dots into the character dot system.
-has a better method to evaluate characters compared to their images
openclkernels.zip
todo:
C++ programming. Too bad I still seem to be unable to help :(

stweil · 2018-08-01T20:56:39Z

There was a mismatch between Tesseract's C++ code, which allowed 1 or 4 channels for the OpenCL call in ImageThresholder::OtsuThresholdRectToPix, and the OpenCL kernel, which only supports 4 channels. Pull request #1819 now only uses OpenCL for 4 channels, which seems to fix this issue.
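The fix described here amounts to a dispatch guard: take the OpenCL path only when the image has the 4 channels the kernel supports, and fall back to the CPU path otherwise. A minimal Python sketch of that logic follows. It is not the actual C++ change in #1819; the function names and the `opencl_enabled` flag are hypothetical.

```python
def can_use_opencl_otsu(num_channels, opencl_enabled):
    """Return True only when the OpenCL Otsu kernel can handle the image.

    Per the comment above, the kernel supports 4-channel input only,
    while the host code previously let 1-channel images through as well.
    """
    return opencl_enabled and num_channels == 4


def pick_otsu_backend(num_channels, opencl_enabled):
    # Hypothetical dispatcher: anything the kernel cannot handle goes to the CPU.
    return "opencl" if can_use_opencl_otsu(num_channels, opencl_enabled) else "cpu"
```

With this guard, a 1-channel (greyscale) image never reaches the 4-channel kernel, even when OpenCL is enabled.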

amitdo · 2018-08-02T13:43:03Z

I wonder why Tesseract does not use leptonica's own otsu function, or other binarization methods in leptonica.

zdenop · 2018-08-02T15:04:32Z

IMO because Leptonica was not used by Tesseract in the past. In the past I ran a test, and Tesseract's Otsu produced different output than Leptonica's Otsu, so I did not replace it... maybe more tests could be done.

Anyway, it is a question whether to allow/support more binarization methods in Tesseract if we suggest that users preprocess their images before running Tesseract.

satishsojitra · 2019-12-18T06:13:59Z

@jpsollie Is there any way to utilize GPU in tesseract?

stweil · 2019-12-18T06:35:09Z

> Is there any way to utilize GPU in tesseract?

Currently no. Using GPU to get faster OCR needs more development work.

jpsollie · 2019-12-21T20:26:58Z

Well, you could try to develop a small neural network to decide the probability of the LSTM output, and this may make the output more accurate, but no idea if it will ever be possible to do it on its own.

gregg-ADP · 2022-05-04T18:05:03Z

Hello @stweil. Is there an update on this? We are looking to improve our Tesseract version 4 performance. (Version 5 is not in the repositories yet.)

stweil · 2022-05-04T18:16:40Z

No, and I also don't expect any update this or next year. There are several ways to get Tesseract 5, for example https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel.

amitdo mentioned this issue Dec 20, 2017

quiet is not quiet (using opencl) #1240

Closed

stweil mentioned this issue Aug 1, 2018

Fix ImageThresholder::OtsuThresholdRectToPix for OpenCL #1819

Merged

stweil closed this as completed Aug 1, 2018

amitdo added the OpenCL label May 14, 2020

Balearica mentioned this issue Feb 9, 2024

GPU acceleration? naptha/tesseract.js#885

Closed
