-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[testharness.js] Require valid UTF-8 in test title #16253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[testharness.js] Require valid UTF-8 in test title #16253
Conversation
When tests are executed in automation, the results may be transported over channels that strictly enforce the UTF-8 restriction on unpaired surrogates. Avoid encoding errors in those contexts by renaming the tests to use human-readable representations of the code points.
Here's evidence from the latest set of data on wpt.fyi: It looks like Edge has trouble with the sub-tests, though it doesn't crash outright. |
@jugglinmike UTF-8 isn't relevant here; I think at some point we concluded that #9415 was caused by lone surrogates (and maybe U+0000 too?), but myself and @jgraham rather viewed JSON as allowing both and therefore as somebody else's problem. That said, I think mine and @jgraham's opinions have changed, and escaping lone surrogates seems sane (IIRC, @jgraham said at least one browser already did in some layer?). We should probably escape all noncharacters (U+0000, U+D800-U+DFFFF, U+xFFFE and U+xFFFF for x in [0, 0x10]), not just lone surrogates, though, given U+0000 also causes problems in places. |
I know there are precise terms that we could use to discuss this, but I'm afraid of misusing them and confusing the discussion. I'll speak in terms of JavaScript strings since I'm on firmer ground there. I agree that the JavaScript string
So even though Chromedriver reports a crash in that case, I decided not to file a bug. Am I misinterpreting things?
Yeah, some part of the equivalent stack in Firefox silently transforms the string. That's also apparent on the wpt.fyi page referenced above (Firefox is passing a test named This patch mimics that transformation in testharness.js, expanding each offending character with a sequence of 6 ASCII characters describing it (e.g.
You may also be thinking of gh-14245, which is a patch to replace the null byte due to a bug in EdgeDriver. My viewpoint on silent transformation hasn't changed, though. That kind of behavior may be convenient for some authors, but it might also confuse others. This admittedly involves assumptions for subjective questions like, "How convenient?", "How confusing?" and "How often?" |
I agree with the part of this patch that replaces unpaired surrogates with the escape sequence in test titles; that's something that would also fail in Firefox if we ran through GeckoDriver and hoists behaviour that's currently in the wptreport formatter to the test source. The same should be done for the message string. I disagree with the part that makes the tests error. Where we have these sequences in tests it's usually because someone is testing against a range of codepoints and auto-generating the titles and message from input data. Making this an error forces test authors to recreate exactly the same code for escaping on the test level. We should simply document that lone surrogates (and null) are escaped in the harness. |
While writing this patch, I found |
Chromedriver rejects "Execute Script" payloads containing unpaired surrogates.
I was getting ready to file a bug report against that project, but I think it
may be a reasonable behavior (since WebDriver is UTF-8, and UTF-8 says those
are off-limits). That's why I'm proposing that testharness.js reports a harness
error.
That's not technically necessary, though. We could silently sanitize the titles
and report an "OK" status. I don't really like changing what the authors wrote,
though.