Commit c241d38

Merge pull request tesseract-ocr#389 from vidiecan/issue_388
fixes tesseract-ocr#388 by using raw bytes utf8 encoding
2 parents fd26a22 + 7289a3f commit c241d38

1 file changed, 1 insertion(+), 1 deletion(-)

training/stringrenderer.cpp

@@ -52,7 +52,7 @@ static const int kDefaultOutputResolution = 300;
 // Word joiner (U+2060) inserted after letters in ngram mode, as per
 // recommendation in http://unicode.org/reports/tr14/ to avoid line-breaks at
 // hyphens and other non-alpha characters.
-static const char* kWordJoinerUTF8 = "\u2060";
+static const char* kWordJoinerUTF8 = "\xE2\x81\xA0"; //u8"\u2060";
 static const char32 kWordJoiner = 0x2060;
 
 static bool IsCombiner(int ch) {
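
Note on the change: in a plain narrow string literal, "\u2060" is encoded in the compiler's execution character set, which is implementation-defined and is not guaranteed to be UTF-8. Spelling the three bytes out removes that dependency. The sketch below is one way to check that E2 81 A0 is the UTF-8 form of U+2060. `EncodeThreeByteUTF8` is a hypothetical helper written for this illustration, not a Tesseract function, and it assumes the code point lies in U+0800..U+FFFF.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Tesseract): encode a code point in the
// range U+0800..U+FFFF as the three-byte UTF-8 form 1110xxxx 10xxxxxx 10xxxxxx.
// Surrogate code points (U+D800..U+DFFF) are not rejected in this sketch.
std::string EncodeThreeByteUTF8(unsigned int cp) {
  assert(cp >= 0x0800 && cp <= 0xFFFF);
  std::string out;
  out += static_cast<char>(0xE0 | (cp >> 12));          // top 4 bits
  out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  // middle 6 bits
  out += static_cast<char>(0x80 | (cp & 0x3F));         // low 6 bits
  return out;
}

// The literal introduced by this commit.
static const char* kWordJoinerUTF8 = "\xE2\x81\xA0";
```

Calling `EncodeThreeByteUTF8(0x2060)` and comparing the result with `kWordJoinerUTF8` should show the same three bytes, independent of the compiler's source or execution character set settings.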
