ImTextCharFromUtf8 excludes a range of unicode characters #832
Comments
0x91 isn't a quote character, according to I know for a fact that I used the copyright symbol (U+00A9, UTF-8 0xC2 0xA9). I am not very familiar with UTF-8, but I don't think a single byte between 0x80 and 0xBF translates to a valid code point. Could you dump the hex data for the string and confirm that you are indeed passing UTF-8 to it and not extended ASCII? And/or provide a "portable" repro, portable in the sense maybe using Also see |
And Possibly you aren't passing valid UTF-8, because that is a confusing thing to do with compilers pre-dating C++11. Newer compilers allow a literal of the form u8"this is a utf8 literal". |
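A minimal sketch of the hex dump asked for above, assuming the text is held in an ordinary null-terminated `const char*`. `HexDump` is a hypothetical helper name, not an ImGui function. Writing the test bytes as explicit escapes sidesteps any dependence on the source file encoding or on `u8` literal support.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of ImGui): returns the bytes of a
// null-terminated string as space-separated uppercase hex pairs.
std::string HexDump(const char* s)
{
    assert(s != nullptr);
    std::string out;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(s); *p != 0; ++p)
    {
        char buf[3]; // two hex digits plus the terminator
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(*p));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

With this, `HexDump("\xC2\xA9")` yields `C2 A9` (the copyright sign encoded as UTF-8), whereas a string holding only the byte 0x91 yields `91`, which is a lone continuation byte and not a complete UTF-8 sequence.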
You're entirely correct, I'm passing through some improperly converted UTF-16 from C# and that's where 0x91 corresponds to a smart quote (http://www.fileformat.info/info/charset/UTF-16/list.htm). I could have sworn I checked http://www.utf8-chartable.de/ before posting the report, but apparently not. That'll teach me to file bug reports at 2am. Thanks for your help, and thanks for working on such a useful library! |
It's sort of unfortunate, and a recurrent cause of first-time issues for many users. |
Just to elaborate further on what got me into this tangle, in case anyone gets here via Google: it turns out that 0x0091 isn't a printable character in UTF-16 either. U+0091 is a C1 control code named "Private Use One", so Windows opts to treat it as a smart quote to match the earlier Windows-1252 code page. Converting to UTF-8 with C#'s text encoding functions correctly translates this to 0xE2 0x80 0x98, the three-byte UTF-8 encoding of the left single smart quote (U+2018). ProggyClean.ttf doesn't have a glyph for that code point, so with the string correctly converted I get a ? in place of the quote. However, ProggyClean.ttf DOES have a glyph at 0x91, following Windows-1252. So if I fail to convert the string and amend ImTextCharFromUtf8 to let the malformed character through, I get the correct glyph. Someone should develop a Unicode ProggyClean, I guess. |
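To make the byte values above concrete, here is a small sketch of the conversion path, assuming the input really is Windows-1252. Both function names are hypothetical, and the mapping table is deliberately reduced to the two single smart quotes; a real converter would need the full 0x80-0x9F table.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point in the range U+0000..U+FFFF as UTF-8.
// Surrogates and code points above U+FFFF are not handled in this sketch.
std::string EncodeUtf8(unsigned int cp)
{
    assert(cp <= 0xFFFF);
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);                         // 1 byte: 0xxxxxxx
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 2 bytes: 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         //          10xxxxxx
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));          // 3 bytes: 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  //          10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         //          10xxxxxx
    }
    return out;
}

// Hypothetical helper: map a Windows-1252 byte to a Unicode code point.
// Simplification: only 0x91 and 0x92 are remapped, so the result is wrong
// for the other bytes in 0x80..0x9F.
unsigned int Cp1252ToCodepoint(unsigned char b)
{
    switch (b)
    {
    case 0x91: return 0x2018; // LEFT SINGLE QUOTATION MARK
    case 0x92: return 0x2019; // RIGHT SINGLE QUOTATION MARK
    default:   return b;
    }
}
```

`EncodeUtf8(Cp1252ToCodepoint(0x91))` produces the bytes E2 80 98, matching the three-byte sequence described above. Whether a glyph is then drawn still depends on the font covering U+2018.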
Maybe someone can adapt this: https://gist.github.com/paulsapps/cbd037b3d1b063927b719e489197aa27 |
That does the same thing:
|
The difference in size between those two blurbs of code is also maybe a gentle reminder of how stupidly wrong and inefficient the C++ stream/string libraries are. Not only is the code 10 times bigger, it is also probably 100 times slower, involving heap allocations, etc. Stay away from this madness :) |
Using a sample paragraph which includes the text " ‘possible worlds’ " in TextWrapped, ImGui would not print any characters from the first quote onwards. (Unicode codepoint 0x91)
As far as I can tell, in the case of UTF-8 characters which are greater than 0x80 but less than 0xE0, ImTextCharFromUtf8 fails to recognise the character and returns a null, truncating the string at that point. I'm not knowledgeable enough about UTF-8 to say exactly what ImTextCharFromUtf8 is doing, or for what purpose it excludes these values, but replacing
*out_char = 0; return 0;
with
*out_char = *str; return 1;
on line 950 of imgui.cpp has resolved my issue, though it obviously might cause others.
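For anyone landing here with the same symptom, the sketch below shows why a lone 0x91 byte gets rejected by a UTF-8 decoder. It is a simplified illustration and not ImGui's actual ImTextCharFromUtf8: the name `DecodeUtf8` is hypothetical, and four-byte sequences are left out. In UTF-8, bytes 0x80-0xBF are continuation bytes that can only follow a lead byte, and 0xC0-0xDF lead bytes must be followed by one continuation byte.

```cpp
#include <cassert>

// Simplified UTF-8 decoder sketch (NOT ImGui's implementation).
// Decodes one code point from a null-terminated string.
// Returns the number of bytes consumed, or 0 (with *out_char = 0) at the
// end of the string or on an invalid or unsupported sequence.
int DecodeUtf8(const char* str, unsigned int* out_char)
{
    assert(str != nullptr && out_char != nullptr);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(str);
    *out_char = 0;
    if (s[0] == 0)
        return 0;                               // end of string
    if (s[0] < 0x80) { *out_char = s[0]; return 1; } // plain ASCII
    if (s[0] < 0xC0)
        return 0;                               // 0x80..0xBF: continuation byte with no lead byte
    if (s[0] < 0xE0)                            // 0xC0..0xDF: lead byte of a two-byte sequence
    {
        if ((s[1] & 0xC0) != 0x80)
            return 0;                           // missing continuation byte
        unsigned int c = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        if (c < 0x80)
            return 0;                           // overlong encoding
        *out_char = c;
        return 2;
    }
    if (s[0] < 0xF0)                            // 0xE0..0xEF: lead byte of a three-byte sequence
    {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        unsigned int c = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
        if (c < 0x800)
            return 0;                           // overlong encoding
        *out_char = c;
        return 3;
    }
    return 0;                                   // four-byte sequences omitted in this sketch
}
```

Under these rules a string containing the single byte 0x91 decodes to nothing, while 0xC2 0xA9 decodes to U+00A9 and 0xE2 0x80 0x98 decodes to U+2018. Passing the raw byte through instead, as in the workaround above, amounts to treating the input as a single-byte code page rather than UTF-8.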