Unusable OCR results with OpenCL #837

Closed
stweil opened this issue Apr 23, 2017 · 45 comments

@stweil
Member

stweil commented Apr 23, 2017

Both 3.05 and the latest git master produce unusable results with OpenCL (tested on Linux and macOS). It looks like the problem only occurs with large images.

Disabling OpenCL in ccmain/thresholder.cpp fixes the problem.

@amitdo
Collaborator

amitdo commented Apr 24, 2017

#635
Maybe it's the right time to remove the OpenCL code entirely.

I have a feeling that you (Stefan) will disagree... :)

@stweil
Member Author

stweil commented Apr 24, 2017

Indeed. :-)

Other projects using LSTM strive to use the computing power of graphics cards.

Should Tesseract be an exception because people are satisfied with CPU-based computation? Should we try to fix the known problems of the current OpenCL-based implementation and improve the code? Or is there another way for Tesseract to use advanced computing power?

@amitdo
Collaborator

amitdo commented Apr 24, 2017

The problems I see with the OpenCL code:
  • It mostly duplicates parts of Leptonica, which is an external dependency.
  • It was contributed by someone from AMD, who no longer maintains it.
  • Google does not seem to want to invest time in maintaining it.
  • The only one who cares and tries to fix bugs is you.

I respect your work, but I still think the benefit here is quite small and we should drop it.

A compromise could be that you will maintain that code as a separate project and we will have an optional dependency on it.

@amitdo
Collaborator

amitdo commented Apr 24, 2017

I tried to sell it to Dan but he didn't buy it 😆
#635 (comment)
Please read his answer.

@stweil
Member Author

stweil commented Apr 26, 2017

I removed a duplicated part of Leptonica in PR #843. Are there more of them? At least I no longer find a comment naming Leptonica in the OpenCL code.

@amitdo
Collaborator

amitdo commented Apr 26, 2017

At least some with 'pix' in their names. pixSubtract for example.

@jbreiden
Contributor

jbreiden commented Apr 26, 2017 via email

@stweil
Member Author

stweil commented Apr 26, 2017

I'd roughly guess that includes about a dozen methods.

Good guess. It's a little bit more. I'll send a pull request which reduces the code in opencl to less than 5300 lines.

@amitdo
Collaborator

amitdo commented Apr 26, 2017

Ray's DAS 2014 tutorial, slides set 8 has some statistics about the OpenCL code.

@jbreiden
Contributor

jbreiden commented Apr 26, 2017

This is the slide in question.

[Image: slide]

I think this confirms TIFF decode is relatively inexpensive, and is an even smaller overall portion for Tesseract 4.x.

# This is a different TIFF, just playing around. 5.7 MB,  1678x2590, LZW TIFF, 3.2Ghz Intel processor
$ time tifftopnm lzw.tif > /dev/null
tifftopnm: writing PPM file

real	0m0.213s
user	0m0.172s
sys	0m0.008s

@stweil
Member Author

stweil commented Apr 27, 2017

PR #849 now removes most of the TIFF related code. Maybe I can remove more in the future.

@amitdo
Collaborator

amitdo commented Apr 27, 2017

... reduces the code in opencl to less than 5300 lines.

Maybe I can remove more in the future.

Only ~5300 lines left (to remove) ... :-)

@jbreiden
Contributor

Great job on this. I suspect the next step is to remove the #include tiff.h lines and the build dependency on libtiff.

@stweil
Member Author

stweil commented Apr 28, 2017

It's still used by one of the benchmarks. As soon as I'm sure that this benchmark is not needed, we'll make a large step in Amit's preferred direction (currently 5074 lines left).

@amitdo
Collaborator

amitdo commented Apr 28, 2017

http://www.anandtech.com/show/10613/discrete-desktop-gpu-market-trends-q2-2016-amd-grabs-market-share-but-nvidia-remains-on-top

Benchmarks should be done on Intel's integrated GPU and NVIDIA's GPU.

I read somewhere a claim that the performance of OpenCL with NVIDIA cards is significantly worse than with AMD cards.

Another claim is that OpenCL performance on Macs degraded significantly with the latest macOS versions. Like NVIDIA, Apple now has an API that competes with OpenCL.

@amitdo
Collaborator

amitdo commented Sep 12, 2017

Both 3.05 and latest git master produce unusable results with OpenCL (tested on Linux and macOS). It looks like that problem only occurs with large images.

Does the issue still exist?

@stweil
Member Author

stweil commented Sep 12, 2017

Yes. At least I am not aware that anybody fixed it.

@amitdo
Collaborator

amitdo commented Sep 12, 2017

Since you opened the issue, you have dropped a lot of OpenCL code. Maybe the issue was related to that code?

@stweil
Member Author

stweil commented Sep 12, 2017

No, it is caused by one of the remaining (unchanged) OpenCL code blocks.

@amitdo
Collaborator

amitdo commented Sep 12, 2017

An idea: maybe you want to add an environment variable to disable OpenCL at runtime, similar to the OpenMP trick?

@stweil
Member Author

stweil commented Sep 12, 2017

That environment variable is already there: TESSERACT_OPENCL_DEVICE. Set it to the number of the "native device" (or to an illegal value: less than 1 or greater than the number of OpenCL devices) to disable OpenCL.
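
As a rough illustration of the comment above, the following Python sketch launches Tesseract with TESSERACT_OPENCL_DEVICE set to 0, which is "less than 1" and so should count as an illegal value that disables OpenCL. The helper names, the image path, and the output base are hypothetical; only the variable name and the "illegal value" rule come from the thread, and the variable presumably only matters for builds with OpenCL support.

```python
import os
import shutil
import subprocess


def env_with_opencl_disabled(base_env=None):
    """Return a copy of the environment with OpenCL disabled for Tesseract.

    Hypothetical helper: "0" is chosen because the comment above says a
    value less than 1 is illegal and therefore disables OpenCL.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSERACT_OPENCL_DEVICE"] = "0"
    return env


def run_tesseract_without_opencl(image_path, output_base):
    """Run the tesseract CLI (if installed) with OpenCL disabled."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # tesseract is not on PATH; nothing to run
    return subprocess.run(
        [exe, image_path, output_base],
        env=env_with_opencl_disabled(),
        check=False,
    )
```

A call such as run_tesseract_without_opencl("page.tif", "page") would then write its result to page.txt, assuming the default text output.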

@amitdo
Collaborator

amitdo commented Sep 12, 2017

Ok :-)

@Shreeshrii
Collaborator

There is a thread in the tesseract-ocr forum regarding an offer to rewrite the OpenCL code.

On Sat, Apr 28, 2018 at 1:19 PM, Janpieter Sollie wrote:
Would it be a problem for you if I rewrite the opencl engine completely, and you people provide me help to link the tesseract kernel -> opencl engine parts?
in attachment, I already have a list of features I'd like to port to openCL. As this uses the GPU in a heavy way, I will implement multi-card support on the host.
Is it a problem for you guys to think of tesseract 5.0 as a milestone?

See https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/uNRDFTavDfc/-2MrTZZKBgAJ for details and https://groups.google.com/group/tesseract-ocr/attach/64a964d2b63fb/openclkernels.cl?part=0.1&authuser=0&view=1 for the attachment.

@jpsollie

jpsollie commented Apr 29, 2018

This is only the OpenCL part (not implemented yet, but the way I see it) ...
It will still need a wrapper class (for which I'd like to know whether the license permits adapting the current code) and an interface to fit into the neural network you currently have.
I will need help to write a C++ class that fits in, as I only have knowledge of C and Java, and almost none of C++.

@amitdo
Collaborator

amitdo commented May 1, 2018

I can't help with this effort, sorry.

@jpsollie

jpsollie commented May 2, 2018

Currently, I have made an example of what I could do using OpenCL:

  • search for white lines in the image.
  • search for whitespace characters in the image.
  • search for characters.
  • perform initial training.

The current requirements are:

  • an OpenCL 1.1 card with the cl_khr_3d_image_writes extension. I tested it on amdgpu cards, and that seems fine. NVIDIA (Pascal) supports OpenCL 1.2, where this extension is integrated; it does not appear in the extension list, though. Intel CPU drivers are also fine, but I have no idea about the GPU drivers.
  • the number of characters (whitespace excluded) is limited to 1024.
  • the image that can be passed to the CPU only supports black and white, no grey levels.
  • a character image that is generated during a training model only supports up to 64 reference pixels. For large fonts, this may not be enough.

OpenCL 1.1 page about the extension: https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf#page=293
OpenCL 1.2 page: https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#page=307

@amitdo
Collaborator

amitdo commented May 2, 2018

The LSTM net does not take individual character boxes as input during training.

@jpsollie

jpsollie commented May 2, 2018

but how do you generate the drawimage script then?

@jpsollie

jpsollie commented May 2, 2018

a workaround could also be:
I perform all the steps of the character recognition process on the image, and the reference pixels mapped in the process are adapted as the character reference pixels. On the host, we build a buffer. The GPU compares the different buffers and merges them into an average.

@amitdo
Collaborator

amitdo commented May 2, 2018

It seems that you want to implement an OCR engine based on traditional pattern recognition methods.
The LSTM based OCR engine works in a different way.

@jpsollie

jpsollie commented May 3, 2018

That's not true: I am adapting the OpenCL part. If you want this to work as an LSTM object, you have to help me adapt the wrapper class. But as said in the forum, I have no experience with LSTM. Therefore:

  • each method is fully adaptable, e.g. if you can provide a custom reference to the reference point builder, it will be much faster and more accurate than using a generic recognition pattern, but if the character is not the one you are looking for, it will probably map a very low number of pixels.
  • each step of the engine requires a massive number of offsets in the image which can be referenced to a character. Doing this with 5 objects is useless. You can follow the steps in the engine as described, but that would possibly make you fall back to the old method.

@stweil
Member Author

stweil commented May 3, 2018

@jpsollie, maybe the best way to start would be fixing the existing OpenCL code.

@amitdo
Collaborator

amitdo commented May 3, 2018

I'm not an expert in ML, and I have no knowledge of OpenCL programming.

I think that this effort has little chance to succeed without close cooperation with @theraysmith.
CC: @jbreiden

Anyway, I appreciate your desire to contribute to tesseract to make it faster.

@jpsollie

jpsollie commented May 3, 2018

@stweil: you're quite late with that. The engine as I designed it (only the OpenCL code, not the host code) is almost ready. I will write some additional kernels to perform closer interaction, but I will probably post the device code today. Posting the host code (C++ object) will probably take more time, as I have to learn Tesseract's LSTM engine and find a way to link the two together.

@jpsollie

jpsollie commented May 3, 2018

As said a few hours ago, here is the GPU code. If you have any questions about how to use it, feel free to ask.
openclkernels.zip
Edit: there seem to be some syntax errors in the kernels. I will correct them. Anyway, without a well-implemented wrapper class, these kernels are useless.
Possible points of improvement:

  • implement a method to train l vs 1 (which is currently quite uncertain)
  • provide neuron recognition based on the height and width of the possible character, not only the position and the neuron offset. It is currently nearly impossible to map ff, as it connects the 1st with the 2nd character. Moreover, in a square block, you need as much space for height as for width, which is not the case with most characters.

@jpsollie

jpsollie commented May 4, 2018

question: how does the human brain see that ff is 2 times f and w is not 2 times v? it seems quite impossible :(

@jpsollie

I think I found a solution to this: include connected characters in the character list.
Anyway, as promised, this one is syntactically correct (according to libclc, with OpenCL features omitted) and makes sure the CPU->GPU transfers are as few as possible.
So this one:

  • does not need the 3D kernel extension
  • is more optimized to use local memory
  • fully supports 1024 characters
  • fully supports images up to 32k*32k (yes, the signed short limit)
  • has a method to re-integrate out-of-scope dots into the character dot system
  • has a better method to evaluate characters compared to their images

openclkernels.zip

Todo: C++ programming. Too bad I still seem to be unable to help :(

@stweil
Member Author

stweil commented Aug 1, 2018

There was a mismatch between Tesseract's C++ code, which allowed 1 or 4 channels for the OpenCL call in ImageThresholder::OtsuThresholdRectToPix, and the OpenCL kernel, which only supports 4 channels. Pull request #1819 now only uses OpenCL for 4 channels, which seems to fix this issue.
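
The shape of that fix can be modelled in a few lines. This is only a Python sketch of the dispatch rule described above, not the C++ change from pull request #1819; the function and constant names are hypothetical.

```python
# Per the comment above, the OpenCL kernel only supports 4 channels.
OPENCL_KERNEL_CHANNELS = 4


def choose_threshold_backend(num_channels, opencl_available):
    """Return "opencl" only when the kernel can handle the image, else "cpu".

    Hypothetical model of the guard: a 1-channel image used to be sent to
    the OpenCL kernel as well, which the kernel could not process correctly.
    """
    if opencl_available and num_channels == OPENCL_KERNEL_CHANNELS:
        return "opencl"
    return "cpu"
```

With this rule, a 1-channel image always takes the CPU path, even on a machine where OpenCL is available.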

stweil closed this as completed Aug 1, 2018
@amitdo
Collaborator

amitdo commented Aug 2, 2018

I wonder why Tesseract does not use leptonica's own otsu function, or other binarization methods in leptonica.

@zdenop
Contributor

zdenop commented Aug 2, 2018

IMO because Leptonica was not used by Tesseract in the past. I once ran a test, and Tesseract's Otsu produced different output from Leptonica's Otsu, so I did not replace it... maybe more tests could be done.

Anyway, it is questionable whether we should allow/support more binarization methods in Tesseract if we advise users to preprocess images before running Tesseract.
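
For readers unfamiliar with the Otsu binarization discussed here, the following is a minimal Python sketch of the textbook method on a grey-level histogram. It is neither Tesseract's nor Leptonica's implementation; the function name is hypothetical. Implementations can differ in details such as tie-breaking, or whether one threshold is computed per image, per channel, or per tile, and the thread does not say which of these caused the differing outputs.

```python
def otsu_threshold(histogram):
    """Return the grey level t that maximizes the between-class variance.

    Pixels with value <= t form one class, the rest form the other.
    """
    total = sum(histogram)
    if total == 0:
        raise ValueError("empty histogram")
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of grey levels of those pixels
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # all pixels are in the lower class from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:  # ties keep the lowest threshold
            best_t, best_var = t, between_var
    return best_t
```

For a histogram with two well-separated peaks, the returned threshold lies between them; binarization then maps pixels at or below the threshold to one colour and the rest to the other.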

@satishsojitra

@jpsollie Is there any way to utilize GPU in tesseract?

@stweil
Member Author

stweil commented Dec 18, 2019

Is there any way to utilize GPU in tesseract?

Currently no. Using GPU to get faster OCR needs more development work.

@jpsollie

jpsollie commented Dec 21, 2019 via email

amitdo added the OpenCL label May 14, 2020
@gregg-ADP

Hello @stweil. Is there an update on this? We are looking to improve our Tesseract version 4 performance. (Version 5 is not in the repositories yet.)

@stweil
Member Author

stweil commented May 4, 2022

No, and I also don't expect any update this year or next. There are several ways to get Tesseract 5, for example https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel.
