utf-8 text broken? #2233

Closed
rokups opened this issue Dec 9, 2018 · 5 comments

@rokups
Contributor

rokups commented Dec 9, 2018

For some reason I am unable to print Lithuanian characters (ąčęėįšųū) with ImGui. At first I thought it was something in my code, but even when I run the official demo I see this:

[image: screenshot of the demo]

No matter what I try, I always get question marks in place of the mentioned characters.

Tested official imgui samples: win32 dx12, linux sdl2 opengl2, linux glfw_opengl2.
Tested 1.66b release and latest snapshot of docking branch.

In my own application I tested printing text with ImGui::TextColored by passing it a UTF-8-encoded string. I even tried with a u8"ąčęėį" literal.

I tested the default font as well as Google's Noto fonts. I inspected my used fonts in font preview only to see that the characters I am looking for are indeed supported. I do not specify any font ranges for the Noto fonts, so as to include all characters.

I see no white squares as mentioned in the font readme, so the texture must fit the fonts properly. At this point I am still not sure if this is my error or a bug. Is the demo supposed to show proper text instead of ????? there?

@ocornut
Owner

ocornut commented Dec 9, 2018

Default font won't have those characters. You need to provide the glyph ranges when loading and provide a font which has them.

I inspected my used fonts in font preview

Did you inspect in the imgui font previewer, or with some OS tool?
You'll need to load the specific glyphs you need.
For example, ą appears to be U+0105.

I think we ought to provide a small tool where you can paste text and it tells you which characters are in the text, so people can start understanding what is going on.

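    // Windows-only: MultiByteToWideChar() needs <windows.h>; IM_ARRAYSIZE comes from imgui.h.
    const char* text = u8"ąčęėį";
    wchar_t converted[256];
    // Convert the UTF-8 bytes to UTF-16 so each element is one codepoint (for BMP characters).
    MultiByteToWideChar(CP_UTF8, 0, text, -1, converted, IM_ARRAYSIZE(converted));
    for (int n = 0; converted[n] != 0; n++)
        printf("Character %02d is U+%04x\n", n, converted[n]);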

Outputs:

Character 00 is U+0105
Character 01 is U+010d
Character 02 is U+0119
Character 03 is U+0117
Character 04 is U+012f

And

    ImFontAtlas::GlyphRangesBuilder grb;
    grb.AddText(u8"ąčęėį");
    for (int n = 0; n < 0xFFFF; n++)
        if (grb.GetBit(n))
            printf("Contains: U+%04X\n", n);

Outputs

Contains: U+0105
Contains: U+010D
Contains: U+0117
Contains: U+0119
Contains: U+012F
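
The first snippet above relies on the Win32 `MultiByteToWideChar` call, so it will not build on the Linux samples mentioned in the report. As a rough portable alternative, the sketch below decodes UTF-8 by hand and prints the same kind of listing. The names `DecodeUtf8` and `PrintCodepoints` are made up for this illustration and are not part of ImGui, and the decoder assumes well-formed input.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into Unicode codepoints.
// Sketch only: assumes well-formed input. Overlong or truncated sequences are
// not rejected; an invalid lead byte is emitted as U+FFFD.
std::vector<uint32_t> DecodeUtf8(const std::string& s)
{
    std::vector<uint32_t> out;
    size_t i = 0;
    while (i < s.size())
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        uint32_t cp;
        size_t extra; // number of continuation bytes that follow the lead byte
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         { out.push_back(0xFFFD); i++; continue; }
        i++;
        // Each continuation byte contributes its low 6 bits.
        for (size_t k = 0; k < extra && i < s.size(); k++, i++)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Print one line per codepoint, in the same style as the listing above.
void PrintCodepoints(const std::string& utf8)
{
    std::vector<uint32_t> cps = DecodeUtf8(utf8);
    for (size_t n = 0; n < cps.size(); n++)
        printf("Character %02d is U+%04X\n", (int)n, (unsigned)cps[n]);
}
```

Calling `PrintCodepoints` on the UTF-8 bytes of the string you pass to ImGui shows whether the text is encoded as expected, and which codepoints the loaded glyph ranges need to cover.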

@rokups
Contributor Author

rokups commented Dec 9, 2018

Ah, OK. The example not rendering the text threw me off. As per this, it seems it really is my error. Good to know. Not setting up font ranges uses the default Latin range, which is pretty minimal and does not include the characters I tested. Somehow I expected that a lack of explicit ranges would include all characters provided by the font. Is it possible to somehow include all font characters in some easy way?

Edit: Closing as indeed explicitly specifying correct range makes it all work. Future reader: unspecified font range defaults to minimal latin range, it does not include all characters provided by font.
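
For future readers, here is a minimal sketch of what an explicit range covering the Lithuanian letters might look like. A glyph range list is an array of inclusive [first, last] pairs terminated by 0. The sketch uses a stand-in `GlyphChar` type so it compiles without imgui.h (in real code the array would be declared as `ImWchar`), and `kLithuanianRanges`, `RangesContain` and the font path are hypothetical names chosen for illustration.

```cpp
// Stand-in for ImWchar so this compiles without imgui.h (assumption: ImWchar
// is a 16-bit unsigned type, as in ImGui 1.66).
typedef unsigned short GlyphChar;

// Inclusive [first, last] pairs, terminated by a single 0.
static const GlyphChar kLithuanianRanges[] = {
    0x0020, 0x00FF, // Basic Latin + Latin-1 Supplement (what the default range covers)
    0x0100, 0x017F, // Latin Extended-A: contains ą č ę ė į š ų ū ž
    0,
};

// Returns true if codepoint cp falls inside any pair of a zero-terminated range list.
bool RangesContain(const GlyphChar* ranges, unsigned int cp)
{
    for (; ranges[0] != 0; ranges += 2)
        if (cp >= ranges[0] && cp <= ranges[1])
            return true;
    return false;
}

// With ImGui available, the array would be passed as the last argument of
// AddFontFromFileTTF (the font path here is hypothetical):
//   ImGuiIO& io = ImGui::GetIO();
//   io.Fonts->AddFontFromFileTTF("NotoSans-Regular.ttf", 18.0f, NULL, kLithuanianRanges);
```

The array is `static` on purpose: the range data has to stay alive until the font atlas is built, so a temporary on the stack is not safe. `RangesContain(kLithuanianRanges, 0x0105)` is a quick way to confirm that ą is covered before blaming the font.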

@rokups rokups closed this as completed Dec 9, 2018
@ocornut
Owner

ocornut commented Dec 9, 2018

Is it possible to somehow include all font characters in some easy way?
Edit: Closing as indeed explicitly specifying correct range makes it all work. Future reader: unspecified font range defaults to minimal latin range, it does not include all characters provided by font.

Before 1.61 it was not possible to just "load everything in the font", but now with a { 0x20, 0xFFFF, 0 } range it should work.
I will add comments both about the default range and the possibility of using 0x20..0xFFFF. I am not yet sure this is viable as a default behavior. Need to measure the additional scanning cost on init and see how it would affect old users.
I also made a note to add a section to the font demo to: 1) print the U+ codepoint of a given string, to make it easier for people to verify correct encoding, and 2) demonstrate the use of the GlyphRangesBuilder.
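
To make the trade-off between the two approaches concrete, the sketch below compares the `{ 0x20, 0xFFFF, 0 }` range with a range list built from only the codepoints a text uses. It is a conceptual illustration of what a range builder produces, not ImGui's implementation; `GlyphChar`, `BuildRangesFromCodepoints` and `CountCodepoints` are hypothetical names, and `GlyphChar` stands in for a 16-bit `ImWchar`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef unsigned short GlyphChar; // stand-in for ImWchar (assumed 16-bit)

// The "load everything in the font" range discussed above.
static const GlyphChar kFullRange[] = { 0x0020, 0xFFFF, 0 };

// Collapse a set of codepoints into inclusive [first, last] pairs, zero-terminated.
// Codepoint 0 (the terminator) and anything above U+FFFF are skipped.
std::vector<GlyphChar> BuildRangesFromCodepoints(std::vector<uint32_t> cps)
{
    std::sort(cps.begin(), cps.end());
    cps.erase(std::unique(cps.begin(), cps.end()), cps.end());

    std::vector<GlyphChar> ranges;
    size_t i = 0;
    while (i < cps.size())
    {
        if (cps[i] == 0 || cps[i] > 0xFFFF) { i++; continue; }
        size_t j = i;
        // Extend the run while the next codepoint is adjacent and still 16-bit.
        while (j + 1 < cps.size() && cps[j + 1] == cps[j] + 1 && cps[j + 1] <= 0xFFFF)
            j++;
        ranges.push_back(static_cast<GlyphChar>(cps[i]));
        ranges.push_back(static_cast<GlyphChar>(cps[j]));
        i = j + 1;
    }
    ranges.push_back(0);
    return ranges;
}

// Number of codepoints a zero-terminated range list asks the atlas builder to look up.
size_t CountCodepoints(const GlyphChar* ranges)
{
    size_t count = 0;
    for (; ranges[0] != 0; ranges += 2)
        count += static_cast<size_t>(ranges[1]) - ranges[0] + 1;
    return count;
}
```

For the five test characters, the built list requests 5 codepoints, while `kFullRange` requests 65504, most of which a typical font will not contain. That difference is the extra scanning cost on init mentioned above.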

@rokups
Contributor Author

rokups commented Dec 9, 2018

Is it not harmful to use the full range though? Most of that space will be empty. This and this suggests that imgui was written with explicit ranges in mind, and it gives me the impression that things might go bad.

@ocornut
Owner

ocornut commented Dec 9, 2018 via email

ocornut added a commit that referenced this issue Jan 10, 2019
- Atlas width is now properly based on total surface rather than glyph count (unless overridden with TexDesiredWidth).
- Fixed atlas builder so missing glyphs won't influence the atlas texture width. (#2233)
- Fixed atlas builder so duplicate glyphs (when merging fonts) won't be included in the rasterized atlas.
ocornut added a commit that referenced this issue Jan 10, 2019
- Fixed abnormally high atlas height. (#618)
- Fixed support for any values of TexGlyphPadding (not just only 1). (#618)
- Atlas width is now properly based on total surface rather than glyph count (unless overridden with TexDesiredWidth). (#618)
- Fixed atlas builder so missing glyphs won't influence the atlas texture width. (#2233, #618)
- Fixed atlas builder so duplicate glyphs (when merging fonts) won't be included in the rasterized atlas. (#618)
ocornut added a commit that referenced this issue May 3, 2022
…ng issues and font loading issues. Simplified code + extracted DebugNodeFontGlyph().

Helper to diagnose issues such as #4866, #3558, #3436, #2233, #1880, #1780, #905, #832, #762, #726, #609, #565, #307)