Make internal encoding of locations aware of unicode #6073
When some Unicode characters are present on a line, the existing encoding of positions, based on the number of bytes since the start of the line, is incorrect. This can be seen in, e.g., error messages picked up in the editor (or on the command line). This PR takes Unicode into account. Even though the OCaml locations are byte-based, one can trick the system by encoding:
pos_cnum = (number of bytes from file start to line start) + (number of UTF-16 code units since line start)
Since the compiler's printer computes the column as the subtraction pos_cnum - pos_bol, and pos_bol is still the byte offset of the line start, the UTF-16 character position is what gets shown. Notice that editors, VS Code in particular, show you something under "Col", but their internal commands expect the correct UTF-16 character position, which can be different.
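As a rough illustration of the encoding trick, here is a small Python model. It is an assumption for exposition only: the actual change lives in the compiler's OCaml code, which is not reproduced here. The record mirrors the pos_bol and pos_cnum fields of an OCaml position, and make_position, utf16_units, and column are hypothetical helper names. The point it shows is that a printer which only subtracts pos_bol from pos_cnum ends up reporting a UTF-16 column, while pos_bol stays a byte offset.

```python
# Simplified model of the position encoding described above (assumption:
# illustrative Python, not the compiler's actual OCaml implementation).
from dataclasses import dataclass


@dataclass
class Position:
    pos_lnum: int  # line number (1-based)
    pos_bol: int   # byte offset of the line start (still a byte offset)
    pos_cnum: int  # pos_bol + UTF-16 code units since the line start


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    return len(text.encode("utf-16-le")) // 2


def make_position(source: bytes, line_start: int, offset: int, lnum: int) -> Position:
    """Hypothetical helper: encode byte `offset` (on the line that starts at
    byte `line_start`) as line_start + UTF-16 code units up to `offset`."""
    prefix = source[line_start:offset].decode("utf-8")
    return Position(pos_lnum=lnum, pos_bol=line_start,
                    pos_cnum=line_start + utf16_units(prefix))


def column(pos: Position) -> int:
    """What a byte-oriented printer computes: pos_cnum - pos_bol (0-based)."""
    return pos.pos_cnum - pos.pos_bol


src = 'x = 1\nlet s = "💩" ++ 3\n'.encode("utf-8")
line_start = src.index(b"\n") + 1       # byte offset where line 2 starts
offset = src.index(b"3", line_start)     # byte offset of the `3` token
pos = make_position(src, line_start, offset, lnum=2)

print(column(pos))           # 16: UTF-16 column, 0-based
print(offset - line_start)   # 18: plain byte column, for comparison
```

On this made-up input, the `3` on line 2 sits at byte column 18 but UTF-16 column 16 (both 0-based), because the emoji takes 4 bytes in UTF-8 but only 2 UTF-16 code units.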
@@ -0,0 +1,12 @@

  We've found a bug for you!
  /.../fixtures/unicode_location.res:1:43
Review comment: character 43
  1 │ let q = "💩💩💩💩💩💩💩💩💩💩" ++ ("a" ++ 3 ++ "b")
  2 │ // ^ character position 33 + 10
Review comment: 33 + 10 == 43
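The arithmetic in the review comments can be checked directly. The sketch below rebuilds the fixture line with ten emoji (the count implied by "33 + 10") and computes the 1-based column of the offending 3 operand under three counting conventions; utf16_len is a hypothetical helper name.

```python
# Recompute the columns from the review comments ("33 + 10 == 43").
# Assumption: the fixture line contains exactly ten emoji, as implied by "+ 10".


def utf16_len(s: str) -> int:
    """Length of `s` in UTF-16 code units (surrogate pairs count as 2)."""
    return len(s.encode("utf-16-le")) // 2


line = 'let q = "' + "💩" * 10 + '" ++ ("a" ++ 3 ++ "b")'
prefix = line[: line.index("3")]  # everything before the offending `3`

codepoint_col = len(prefix) + 1             # 1-based, each emoji counts once
utf16_col = utf16_len(prefix) + 1           # 1-based, each emoji counts twice
byte_col = len(prefix.encode("utf-8")) + 1  # 1-based, each emoji counts four times

print(codepoint_col, utf16_col, byte_col)  # 33 43 63
```

Counting code points gives 33, UTF-16 code units give 43 (each emoji is a surrogate pair, hence the + 10), and plain UTF-8 bytes would give 63.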
I think this might fix a long-standing issue with semantic highlighting and my native tongue (Swedish), where using letters like åäö offsets the coloring so that it ends up wrong after those letters.
Definitely. If you have an example at hand at some point, it would be handy to make sure it does.
Out traveling, but try, for example: let str = `hej på dig ${123->Int.toString}`. That should color the interpolated part wrong.
Found one independently: let foo = x => if ( x == "Den här positionen är avstängd, väldigt avstängd") {<div />} else { <div />}
@cristianoc love the Swedish!
This does fix the issue, once the relevant parser files in the editor extension are updated.
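The Swedish examples from this thread can be used to estimate how far a byte-based column drifts from a UTF-16 one. The sketch assumes the editor reads the reported column as a UTF-16 offset, so each "å", "ä", or "ö" before a token (2 bytes in UTF-8, 1 UTF-16 code unit) pushes the highlight one position too far to the right; drift is a hypothetical helper name.

```python
# Sketch: how many positions a byte-based column is ahead of the UTF-16
# column an editor expects, for the Swedish examples in the thread.


def drift(line: str, token: str) -> int:
    """Byte column minus UTF-16 column of the first occurrence of `token`."""
    prefix = line[: line.index(token)]
    byte_col = len(prefix.encode("utf-8"))
    utf16_col = len(prefix.encode("utf-16-le")) // 2
    return byte_col - utf16_col


examples = {
    # The interpolation start `$` comes after one "å".
    "template": ("let str = `hej på dig ${123->Int.toString}`", "$"),
    # The first `{` comes after five "ä".
    "jsx": (
        'let foo = x => if ( x == "Den här positionen är avstängd, '
        'väldigt avstängd") {<div />} else { <div />}',
        "{",
    ),
}

for name, (line, token) in examples.items():
    print(name, drift(line, token))
```

For the template string the interpolation is off by one position (one "å"); for the if expression the first `{` is off by five (five occurrences of "ä"), which matches the reports that coloring goes wrong after those letters.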
This fixes highlighting, among other things, in the presence of Unicode strings.