Skip to content

Allow more characters when creating various nodes #1079

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 9, 2025
Merged

Conversation

domenic
Copy link
Member

@domenic domenic commented May 5, 2022

Stop using XML grammar productions for validating element, attribute, and doctype names. These were overly-restrictive, as it is possible to create nodes with names that don't match those productions using the HTML parser. After this change, each construct has its own custom name validation algorithm.

The only remaining dependency on XML naming rules is for processing instructions, which are uncommon and cannot be created with the HTML parser anyway.

Closes #769. Closes #849. Closes #1373.

(See WHATWG Working Mode: Changes for more details.)


Original points for discussion, discussed and concluded on in following comments. These do not reflect the currently proposed change.
  • I did not disallow = inside attribute local names. Both the parser and DOM APIs currently disallow them, except the parser allows it for the first character. I'm happy to change this if people prefer; I started with the simpler version.
  • This does not disallow lone surrogates, the Unicode replacement character U+FFFD, single quotes, or < in any position, because the HTML parser allows introducing those already and it seems nicer to align.
  • I did not change validation for createProcessingInstruction() or createDocumentType(). We could try to simplify those too, perhaps after investigating parser behavior. But they didn't seem to be causing any real web developer pain, unlike elements and local names, so I thought it'd be better to just leave them as-is.

Preview | Diff

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to call out the = difference in a note? At least a comment would be good I think.

I'm not a big fan of adding "DOM API" to the naming. That makes more sense if this was defined outside of the DOM Standard itself. I think dropping it would still make everything work.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me, modulo a small oversight. Anyone else that should review this? @mfreed7 perhaps?

@domenic domenic requested a review from mfreed7 May 18, 2022 16:47
Copy link
Contributor

@mfreed7 mfreed7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@domenic
Copy link
Member Author

domenic commented Jun 2, 2022

Discussed equals sign at HTML triage meeting. Conclusion: disallow it in attributes everywhere. (Even though the parser allows it in the first-character position.)

@domenic
Copy link
Member Author

domenic commented Jun 6, 2022

I think this is ready for re-review.

Potential issue: XML's definition of Char seems nonsensical (it excludes various Unicode characters below U+0020). And, its definition of the [^#x00#x09#x0A#0x0Cx0D#x20/>] syntax depends on that definition. Hmm.

@domenic
Copy link
Member Author

domenic commented Jun 7, 2022

Refined to no longer use EBNF.

domenic added a commit to whatwg/html that referenced this pull request Jun 7, 2022
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem equivalent to the sorta-EBNF from before. In particular if the first code point is from BeyondHTMLParserName the second code point was more limited.

@domenic
Copy link
Member Author

domenic commented Jun 7, 2022

In particular if the first code point is from BeyondHTMLParserName the second code point was more limited.

I'm not sure exactly what you mean. Recall that it's a union of both. The second+ code point is from HTMLParserCompatibleName, which had [^#x00#x09#x0A#0x0Cx0D#x20/>]* for that position. Which is exactly what the current draft says, right?

@annevk
Copy link
Member

annevk commented Jun 7, 2022

I don't think the EBNF allows for the second code point to be U+0001 when the first is :, for instance. At least the intent was to prevent that. Does EBNF work completely differently from ABNF in that | doesn't signify OR but instead "union"?

(I didn't see "An equivalent EBNF is the following" initially and I don't think what it states is correct.)

@domenic
Copy link
Member Author

domenic commented Jun 7, 2022

I see, I did not capture that this was a branching scenario depending on the behavior of the first code point. And you addressed what harms names like that might hypothetically cause in #849 (comment) .

I'll revise.

@domenic
Copy link
Member Author

domenic commented Jun 7, 2022

I think that is done. The other way I could write this is by looping over the characters individually, which is what a performant implementation would do (instead of using lots of O(n) "contains" operations). But I think this is relatively clear.

(Edit: well, a performant implementation would be looping over code units, since that's JS's native string format... which feels ickier to spec.)

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks accurate to me.

@domenic
Copy link
Member Author

domenic commented Jun 8, 2022

OK, this (and whatwg/html#7991) is just waiting on someone to write web platform tests. Then we can close a ~5 year old recurring pain point on the web platform!

For fun, these are all the references to this I can find:

I suspect there are more GitHub issues from earlier, because why would I have posted #449 if not because of some other issue someone filed? But I couldn't find them.

@zcorpan zcorpan added the needs tests Moving the issue forward requires someone to write tests label Jun 8, 2022
@annevk
Copy link
Member

annevk commented Feb 14, 2023

@josepharhar would you be interested in finishing this?

@josepharhar
Copy link
Contributor

Yes, I have started a WPT here: web-platform-tests/wpt#38503

@annevk
Copy link
Member

annevk commented Feb 15, 2023

\o/ I suspect that once you implement this and do a try run you'll find a lot of existing WPT tests that can be adjusted. There's probably no need for a new file, but maybe.

@cdumez
Copy link

cdumez commented May 5, 2023

Any progress on this?

@josepharhar
Copy link
Contributor

Not recently, I have unfortunately been focused on other stuff.

aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 2, 2025
This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 2, 2025
This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 2, 2025
This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}
@josepharhar
Copy link
Contributor

Here is a new WPT: web-platform-tests/wpt#38503

And here is another patch in progress to update existing WPTs, which doesn't have a WPT PR yet because we stopped generating those until the code review gets approved: https://chromium-review.googlesource.com/c/chromium/src/+/6570951

@domenic domenic removed the needs tests Moving the issue forward requires someone to write tests label Jun 3, 2025
@domenic domenic merged commit e67d5fe into main Jun 9, 2025
2 checks passed
@domenic domenic deleted the createelement-chars branch June 9, 2025 04:47
domenic added a commit to whatwg/html that referenced this pull request Jun 9, 2025
aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 18, 2025
Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 18, 2025
Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jun 18, 2025
Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}
lando-prod-mozilla bot pushed a commit to mozilla-firefox/firefox that referenced this pull request Jun 19, 2025
…te names and prefixes, a=testonly

Automatic update from web-platform-tests
Allow more characters in element/attribute names and prefixes

This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}

--

wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941
wpt-pr: 38503
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jun 19, 2025
…te names and prefixes, a=testonly

Automatic update from web-platform-tests
Allow more characters in element/attribute names and prefixes

This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}

--

wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941
wpt-pr: 38503
i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Jun 21, 2025
…te names and prefixes, a=testonly

Automatic update from web-platform-tests
Allow more characters in element/attribute names and prefixes

This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <[email protected]>
Reviewed-by: Mason Freed <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1468337}

--

wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941
wpt-pr: 38503
lando-prod-mozilla bot pushed a commit to mozilla-firefox/firefox that referenced this pull request Jun 23, 2025
…ute parsing, a=testonly

Automatic update from web-platform-tests
Update tests for relaxed DOM name/attribute parsing

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}

--

wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d
wpt-pr: 53251
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jun 23, 2025
…ute parsing, a=testonly

Automatic update from web-platform-tests
Update tests for relaxed DOM name/attribute parsing

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}

--

wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d
wpt-pr: 53251
i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Jun 24, 2025
…ute parsing, a=testonly

Automatic update from web-platform-tests
Update tests for relaxed DOM name/attribute parsing

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <[email protected]>
Commit-Queue: Joey Arhar <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1475652}

--

wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d
wpt-pr: 53251
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Jun 24, 2025
…te names and prefixes, a=testonly

Automatic update from web-platform-tests
Allow more characters in element/attribute names and prefixes

This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <jarharchromium.org>
Reviewed-by: Mason Freed <masonfchromium.org>
Cr-Commit-Position: refs/heads/main{#1468337}

--

wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941
wpt-pr: 38503

UltraBlame original commit: d481425a1a74ac2f7daeada04b500b398ca4d8ca
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Jun 24, 2025
…ute parsing, a=testonly

Automatic update from web-platform-tests
Update tests for relaxed DOM name/attribute parsing

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <dbaronchromium.org>
Commit-Queue: Joey Arhar <jarharchromium.org>
Cr-Commit-Position: refs/heads/main{#1475652}

--

wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d
wpt-pr: 53251

UltraBlame original commit: 405174975ec2014a5185834e62a95ea5959be20f
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Jun 24, 2025
…te names and prefixes, a=testonly

Automatic update from web-platform-tests
Allow more characters in element/attribute names and prefixes

This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <jarharchromium.org>
Reviewed-by: Mason Freed <masonfchromium.org>
Cr-Commit-Position: refs/heads/main{#1468337}

--

wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941
wpt-pr: 38503

UltraBlame original commit: d481425a1a74ac2f7daeada04b500b398ca4d8ca
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Jun 24, 2025
…ute parsing, a=testonly

Automatic update from web-platform-tests
Update tests for relaxed DOM name/attribute parsing

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <dbaronchromium.org>
Commit-Queue: Joey Arhar <jarharchromium.org>
Cr-Commit-Position: refs/heads/main{#1475652}

--

wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d
wpt-pr: 53251

UltraBlame original commit: 405174975ec2014a5185834e62a95ea5959be20f
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Jun 24, 2025
…te names and prefixes, a=testonly

Automatic update from web-platform-tests
Allow more characters in element/attribute names and prefixes

This patch significantly changes the parsing of element names, attribute
names, and namespace prefixes for DOM APIs to allow more flexibility and
better parity with the HTML parser.

I am planning on making an intent to ship for this behavior before
enabling it by default.

I am planning on making another WPT patch to change the existing tests
to match the new parsing behavior once the spec is merged and the I2S is
complete, and maybe also after the new behavior reaches stable with no
issues.

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: Ifbb5ac47a08a8f14489c694649ab5be1f59647ac
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4251683
Commit-Queue: Joey Arhar <jarharchromium.org>
Reviewed-by: Mason Freed <masonfchromium.org>
Cr-Commit-Position: refs/heads/main{#1468337}

--

wpt-commits: 6b9a6fb929e5adeaccf3a4151447784df5d5f941
wpt-pr: 38503

UltraBlame original commit: d481425a1a74ac2f7daeada04b500b398ca4d8ca
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Jun 24, 2025
…ute parsing, a=testonly

Automatic update from web-platform-tests
Update tests for relaxed DOM name/attribute parsing

Spec PR: whatwg/dom#1079

Bug: 40122442, 40228234
Change-Id: I42c40c3f3acdfcfc4647c6d87ffcbfadc6de13be
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6570951
Reviewed-by: David Baron <dbaronchromium.org>
Commit-Queue: Joey Arhar <jarharchromium.org>
Cr-Commit-Position: refs/heads/main{#1475652}

--

wpt-commits: 856c33e87e61f513ddc81e082f78b6d24027e83d
wpt-pr: 53251

UltraBlame original commit: 405174975ec2014a5185834e62a95ea5959be20f
github-merge-queue bot pushed a commit to servo/servo that referenced this pull request Jul 6, 2025
…37747)

A recent update in the spec (whatwg/dom#1079)
introduced new rules for name validation of attribute, element, and
doctype. This PR implements the new name validation rules in
`components/script/dom/bindings/domname.rs`. The old XML name validation
rules are not fully removed because there remains a few usage of it in
`ProcessingInstructions` and `xpath`.

Testing: Covered by WPT tests
Fixes: #37746

---------

Signed-off-by: minghuaw <[email protected]>
Signed-off-by: Minghua Wu <[email protected]>
Co-authored-by: Xiaocheng Hu <[email protected]>
github-merge-queue bot pushed a commit to servo/servo that referenced this pull request Jul 11, 2025
…37747)

A recent update in the spec (whatwg/dom#1079)
introduced new rules for name validation of attribute, element, and
doctype. This PR implements the new name validation rules in
`components/script/dom/bindings/domname.rs`. The old XML name validation
rules are not fully removed because there remains a few usage of it in
`ProcessingInstructions` and `xpath`.

Testing: Covered by WPT tests
Fixes: #37746

---------

Signed-off-by: minghuaw <[email protected]>
Signed-off-by: Minghua Wu <[email protected]>
Co-authored-by: Xiaocheng Hu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants