Skip to content

[STAL-2474] Fix edge case with v8::String creation with wide characters #448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 16, 2024

Conversation

jasonforal
Copy link
Collaborator

@jasonforal jasonforal commented Jul 16, 2024

TL;DR: Effectively +1 -1 PR handling edge cases in user-provided JavaScript (the other lines are documentation + tests).

What problem are you trying to solve?

The ddsa runtime has a shorthand function called v8_string that creates a v8::String.

It does this by calling new_from_one_byte, which tells v8 to expect a Latin-1 encoded string.

v8_string is used to create kNormal string. kNormal strings are used for "long" strings, or strings that we do not expect to be repeated frequently. Thus, we use v8_string to create a v8::Script.

In general, this should not be a problem. However, there are some perfectly normal cases where a user uses text that we will incorrectly mangle. For example, a Violation error message using non-Latin-1 encoding, e.g. reporting an SQL injection vulnerability:

const error = buildError(node.start.line, node.start.col, node.end.line, node.end.col, "SQL注入漏洞");

In this case, the rule logic will function as normal, however, the user's string message will become malformed: ("SQL注å\u{85}¥æ¼\u{8f}æ´\u{9e}").

There is additionally a particularly degenerate case where a user writes a script that uses JavaScript variable names with wide characters. For example, the following is valid JavaScript:

// ("𓀗".length === 2)
const 𓀗 = buildError(node.start.line, node.start.col, node.end.line, node.end.col, "SQL injection vulnerability");
addError(𓀗);

In this case, the rule will fail to compile (even though it should work without issue).

What is your solution?

Use new_from_utf8 instead.

Alternatives considered

What the reviewer should know

  • This only affects usages of v8_string. The rest of ddsa properly handles UTF-8 input (for example, node.text on a node with wide characters works as-expected).
  • We keep v8_interned using v8::String::new_from_one_byte because 1) we know beforehand all strings that will be used, and so we can guarantee they are ASCII and 2) the performance is faster. A debug_assert is added for additional coverage.
  • [inline(always)] should've been added initially -- might as well do it now.

@jasonforal jasonforal requested a review from juli1 July 16, 2024 12:49
@jasonforal jasonforal merged commit 0daa60b into main Jul 16, 2024
70 checks passed
@jasonforal jasonforal deleted the jf/STAL-2474 branch July 16, 2024 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants