Unusable OCR results with OpenCL #837
Comments
#635 I have a feeling that you (Stefan) will disagree... :) |
Indeed. :-) Other projects using LSTM strive to use the computing power of graphics cards. Should Tesseract be an exception because people are satisfied with CPU-based computation? Should we try to fix the known problems of the current OpenCL-based implementation and improve the code? Or is there another way for Tesseract to use advanced computing power? |
The problems I see with the OpenCL code: I respect your work, but I still think the benefit here is quite small and we should drop it. A compromise could be that you will maintain that code as a separate project and we will have an optional dependency on it. |
I tried to sell it to Dan but he didn't buy it 😆 |
I removed a duplicated part of Leptonica in PR #843. Are there more of them? At least I no longer find a comment naming Leptonica in the OpenCL code. |
At least some with 'pix' in their names. pixSubtract for example. |
There's a parallel set of functions for reading TIFF files.
https://github.com/tesseract-ocr/tesseract/blob/master/opencl/openclwrapper.cpp
https://github.com/DanBloomberg/leptonica/blob/master/src/tiffio.c
OpenclDevice::pixReadTiffCl() and friends
pixReadTiff() and friends
That whole section is a little weird for a few reasons.
1) TIFF decode is not very expensive, why bother with hardware assist?
2) The OpenCL TIFF code has fallen behind Leptonica proper, enough to be currently disabled. (By the way, I'm probably capable of helping it catch up, but not very motivated.)
and most interestingly to me
3) All the actual computation in TIFF decode isn't even done by Leptonica, it is done by libtiff. So the ideal scenario for hardware assist would be something like a "libtiff-turbo" library (somewhat similar to how libjpeg-turbo has replaced libjpeg in many Linux systems). Putting it in Tesseract is really weird, and as a side effect causes a bunch of code duplication.
For all these reasons, if we did decide to prune out some OpenCL code, I think the TIFF portion is the best place to start. I'd roughly guess that includes about a dozen methods.
|
Good guess. It's a little bit more. I'll send a pull request which reduces the code in |
Ray's DAS 2014 tutorial, slides set 8 has some statistics about the OpenCL code. |
This is the slide in question. I think this confirms TIFF decode is relatively inexpensive, and is an even smaller overall portion for Tesseract 4.x.
|
PR #849 now removes most of the TIFF related code. Maybe I can remove more in the future. |
Only ~5300 lines left (to remove) ... :-) |
Great job on this. I suspect next step is to remove the #include tiff.h lines and the build dependency on libtiff. |
It's still used by one of the benchmarks. As soon as I'm sure that this benchmark is not needed, we'll make a large step in Amit's preferred direction (currently 5074 lines left). |
Benchmarks should be done on Intel's integrated GPU and NVIDIA's GPU. I read somewhere a claim that the performance of OpenCL with NVIDIA cards is significantly worse than with AMD cards. Another claim is that OpenCL performance on Macs degraded significantly with the latest macOS versions. Like NVIDIA, Apple now has an API which competes with OpenCL. |
Does the issue still exist? |
Yes. At least I am not aware that anybody fixed it. |
Since you opened the issue, you have dropped a lot of OpenCL code. Maybe the issue was related to that code? |
No, it is caused by one of the remaining (unchanged) OpenCL code blocks. |
An idea: maybe you want to add an environment variable to disable OpenCL at runtime, similar to the OpenMP trick? |
That environment variable is already there: |
Ok :-) |
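For readers who want to see what such a runtime switch can look like, here is a minimal sketch. It is not Tesseract's code: the variable name `EXAMPLE_DISABLE_OPENCL` and both function names are hypothetical and chosen only for illustration. The idea is simply to consult the environment once before choosing the accelerated path.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical variable name, used only for this illustration.
// It is NOT the variable that Tesseract actually reads.
constexpr const char* kDisableOpenCLEnv = "EXAMPLE_DISABLE_OPENCL";

// Returns true when the variable is set to anything other than "" or "0".
bool OpenCLDisabledByEnv() {
  const char* value = std::getenv(kDisableOpenCLEnv);
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Picks the backend name: the accelerated path is used only when the device
// is available and the user has not opted out through the environment.
const char* SelectThresholdBackend(bool opencl_available) {
  if (opencl_available && !OpenCLDisabledByEnv()) {
    return "opencl";
  }
  return "cpu";
}
```

With a switch like this, a user hitting bad results can fall back to the CPU path without rebuilding, for example by setting the variable to `1` in the shell before running the program.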
There is a thread in the tesseract-ocr forum regarding an offer to rewrite the OpenCL code.
See https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/uNRDFTavDfc/-2MrTZZKBgAJ for details and https://groups.google.com/group/tesseract-ocr/attach/64a964d2b63fb/openclkernels.cl?part=0.1&authuser=0&view=1 for the attachment. |
This is only the opencl part (not implemented yet, but the way I see it) ... |
I can't help with this effort, sorry. |
Currently, I have made an example of what I could do using OpenCL:
|
The LSTM net does not get individual character boxes as input during training. |
but how do you generate the drawimage script then? |
a workaround could also be: |
It seems that you want to implement an OCR engine based on traditional pattern recognition methods. |
That's not true: I am adapting the OpenCL part. If you want this to work as an LSTM object, you have to help me adapt the wrapper class. But as I said in the forum, I have no experience with LSTM. Therefore:
|
@jpsollie, maybe the best way to start would be fixing the existing OpenCL code. |
I'm not an expert in ML, and I have no knowledge of OpenCL programming. I think that this effort has little chance to succeed without close cooperation with @theraysmith. Anyway, I appreciate your desire to contribute to Tesseract to make it faster. |
@stweil: you're quite late with that; the engine as I designed it (only the OpenCL code, not the host code) is almost ready. I will write some additional kernels for closer interaction, but I will probably post the device code today. Posting the host code (C++ object) will probably take more time, as I have to learn Tesseract's LSTM engine and find a way to link the two together. |
As promised a few hours ago, here is the GPU code. If you have any questions about how to use it, feel free to ask.
|
Question: how does the human brain see that ff is two f's and that w is not two v's? It seems quite impossible :( |
I think I found a solution to this: include connected characters into the character list. |
There was a mismatch between Tesseract's C++ code which allowed 1 or 4 channels for the OpenCL call in |
I wonder why Tesseract does not use Leptonica's own Otsu function, or other binarization methods in Leptonica. |
IMO because Leptonica was not used by Tesseract in the past. Some time ago I ran a test, and Tesseract's Otsu produced different output from Leptonica's Otsu, so I did not replace it... maybe more testing could be done. Anyway, it is questionable whether Tesseract should allow/support more binarization methods if we suggest that users preprocess images before running Tesseract. |
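For context on what is being compared here, the following is a minimal textbook sketch of Otsu's threshold selection on a 256-bin grayscale histogram. It is neither Tesseract's nor Leptonica's implementation, and the function name `OtsuThreshold` is made up for this example. Real implementations differ in details such as tiling, tie-breaking, and how the histogram is built, which may be one reason two "Otsu" functions can disagree.

```cpp
#include <array>
#include <cstdint>

// Returns the threshold t in [0, 255] that maximizes the between-class
// variance when pixels are split into {value <= t} and {value > t}.
// Ties are resolved in favor of the smallest t. An empty histogram yields 0.
int OtsuThreshold(const std::array<uint32_t, 256>& hist) {
  uint64_t total = 0;
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) {
    total += hist[i];
    sum_all += static_cast<double>(i) * hist[i];
  }

  uint64_t w0 = 0;       // number of pixels in the lower class
  double sum0 = 0.0;     // sum of gray values in the lower class
  double best_var = -1.0;
  int best_t = 0;

  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];
    sum0 += static_cast<double>(t) * hist[t];
    if (w0 == 0) continue;          // lower class still empty
    const uint64_t w1 = total - w0;
    if (w1 == 0) break;             // upper class empty from here on

    const double mean0 = sum0 / static_cast<double>(w0);
    const double mean1 = (sum_all - sum0) / static_cast<double>(w1);
    const double diff = mean0 - mean1;
    // Between-class variance up to a constant factor of 1 / total^2.
    const double var = static_cast<double>(w0) * static_cast<double>(w1) * diff * diff;
    if (var > best_var) {
      best_var = var;
      best_t = t;
    }
  }
  return best_t;
}
```

For a cleanly bimodal histogram, any threshold between the two modes gives the same score, so the tie-breaking rule alone can change the returned value between implementations.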
@jpsollie Is there any way to utilize GPU in tesseract? |
Currently no. Using GPU to get faster OCR needs more development work. |
Well, you could try to develop a small neural network to decide the probability of the LSTM output, and this may make the output more accurate, but no idea if it will ever be possible to do it on its own.
|
Hello @stweil. Is there an update on this? We are looking to improve our Tesseract version 4 performance. (Version 5 is not in the repositories yet.) |
No, and I also don't expect any update this year or next. There are several ways to get Tesseract 5, for example https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-devel. |
Both 3.05 and the latest git master produce unusable results with OpenCL (tested on Linux and macOS). It looks like the problem only occurs with large images.
Disabling OpenCL in ccmain/thresholder.cpp fixes the problem.