[STAL-2474] Fix edge case with v8::String creation with wide characters #448
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
TL;DR: Effectively
+1 -1
PR handling edge cases in user-provided JavaScript (the other lines are documentation + tests).What problem are you trying to solve?
The ddsa runtime has a shorthand function called v8_string that creates a v8::String.
It does this by calling new_from_one_byte, which tells v8 to expect a Latin-1 encoded string.
v8_string
is used to create kNormal string. kNormal strings are used for "long" strings, or strings that we do not expect to be repeated frequently. Thus, we usev8_string
to create a v8::Script.In general, this should not be a problem. However, there are some perfectly normal cases where a user uses text that we will incorrectly mangle. For example, a Violation error message using non-Latin-1 encoding, e.g. reporting an SQL injection vulnerability:
In this case, the rule logic will function as normal, however, the user's string message will become malformed: (
"SQL注å\u{85}¥æ¼\u{8f}æ´\u{9e}"
).There is additionally a particularly degenerate case where a user writes a script that uses JavaScript variable names with wide characters. For example, the following is valid JavaScript:
In this case, the rule will fail to compile (even though it should work without issue).
What is your solution?
Use new_from_utf8 instead.
Alternatives considered
What the reviewer should know
v8_string
. The rest of ddsa properly handles UTF-8 input (for example,node.text
on a node with wide characters works as-expected).v8_interned
usingv8::String::new_from_one_byte
because 1) we know beforehand all strings that will be used, and so we can guarantee they are ASCII and 2) the performance is faster. Adebug_assert
is added for additional coverage.[inline(always)]
should've been added initially -- might as well do it now.