Attribute DOM representation and parsing is inconsistent

(Filed as a result of https://github.com/mozilla/readability/issues/392. I'm not 100% sure whether this should be considered a DOM issue or an HTML parser issue; feel free to move as appropriate.)

STR:

1. open https://opinion.udn.com/opinion/story/10124/3561413 in recent versions of Chrome or Firefox
2. in their respective devtools console, run something like this:

```js
console.log(Array.from(document.querySelectorAll("table")).map(t => t.outerHTML))
```
At the time of writing, the DOM includes 2 or 3 tables with an attribute whose name is `0`, as the console output shows.

The original markup of the page as inspected via "View Source", at time of writing, looked something like:

```html
<table width="90% border="0>
```
Note the opening quote before `90%` and the 'closing' quote after `border=`; there is no quote after `90%` or after `0`.

Obviously the markup's intent is to have a table with 2 attributes, `width="90%"` and `border="0"`. But both browsers parse the tail of this tag as an attribute with the name `0` and the empty string as its value. I assume this parsing is prescribed by the spec, but I haven't tried to look for the specifics there.
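To illustrate how a parser can end up with an attribute named `0`, here is a rough sketch of the attribute-related tokenizer states. It is a heavily simplified approximation written for this report, not the spec's algorithm: `parseAttributes` is a made-up helper, it takes only the text between the tag name and the closing `>`, and it ignores character references, self-closing `/`, and end-of-file handling.

```js
// Simplified sketch of HTML attribute tokenization (hypothetical helper).
// Input: the text between the tag name and the closing ">".
function parseAttributes(text) {
  const attrs = [];
  const isSpace = (c) =>
    c === " " || c === "\t" || c === "\n" || c === "\f" || c === "\r";
  // Only ASCII upper-case letters are lower-cased in attribute names.
  const lower = (c) => (c >= "A" && c <= "Z" ? c.toLowerCase() : c);

  let i = 0;
  while (i < text.length) {
    // Before the attribute name: skip whitespace.
    while (i < text.length && isSpace(text[i])) i++;
    if (i >= text.length) break;

    // Attribute name: runs up to whitespace or "=". Digits and quotes are
    // accepted as name characters; nothing is validated here.
    let name = lower(text[i++]);
    while (i < text.length && !isSpace(text[i]) && text[i] !== "=") {
      name += lower(text[i++]);
    }

    // After the attribute name: skip whitespace, then look for "=".
    while (i < text.length && isSpace(text[i])) i++;
    let value = "";
    if (text[i] === "=") {
      i++;
      while (i < text.length && isSpace(text[i])) i++;
      const quote = text[i];
      if (quote === '"' || quote === "'") {
        // Quoted value: everything up to the next matching quote.
        i++;
        while (i < text.length && text[i] !== quote) value += text[i++];
        i++; // consume the closing quote
      } else {
        // Unquoted value: everything up to the next whitespace.
        while (i < text.length && !isSpace(text[i])) value += text[i++];
      }
    }

    // A repeated attribute name is dropped; the first occurrence wins.
    if (!attrs.some((a) => a.name === name)) attrs.push({ name, value });
  }
  return attrs;
}

console.log(parseAttributes(' width="90% border="0'));
// → [ { name: "width", value: "90% border=" }, { name: "0", value: "" } ]
```

The point of the sketch is that the second quote closes the `width` value, so the remaining `0` is read as a new attribute name with no value.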

The problem arises when rote DOM manipulation reads through `element.attributes` and tries to set the same attributes on a new element. `Element.setAttribute` throws an `InvalidCharacterError` because, as noted in https://dom.spec.whatwg.org/#dom-element-setattribute , `0` "does not match the Name production in XML" (see https://www.w3.org/TR/xml/#NT-Name ).
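The kind of rote copying I mean looks roughly like the sketch below. `copyAttributesNaively` is a hypothetical helper name; it only assumes that `source` exposes `attributes` and that `target` exposes `setAttribute`. Instead of letting the exception propagate, it collects the names that could not be copied so the failure is easy to see.

```js
// Copy every attribute from `source` to `target` via setAttribute and
// report which names were rejected (hypothetical helper for illustration).
function copyAttributesNaively(source, target) {
  const failed = [];
  for (const attr of Array.from(source.attributes)) {
    try {
      target.setAttribute(attr.name, attr.value);
    } catch (err) {
      // For a parser-created attribute such as "0", this is where the
      // InvalidCharacterError described above surfaces.
      failed.push({ name: attr.name, error: err.name });
    }
  }
  return failed;
}
```

In a browser console on the page above, something like `copyAttributesNaively(document.querySelector("table"), document.createElement("table"))` should return a non-empty list if the first table is one of the affected ones.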

Scripts can currently work around this issue (in reasonably complete DOM implementations) by using `element.attributes.setNamedItem(otherElement.attributes[i].cloneNode())`, though this isn't very elegant.
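Spelled out, the workaround might look like the following sketch. `copyAttributesWithFallback` is a hypothetical helper name, and it assumes the implementation supports `Attr.cloneNode()` and `NamedNodeMap.setNamedItem()`. It tries `setAttribute` first and only falls back to cloning the attribute node when the name is rejected.

```js
// Copy attributes from `source` to `target`, falling back to cloned Attr
// nodes for names that setAttribute rejects (hypothetical helper).
function copyAttributesWithFallback(source, target) {
  for (const attr of Array.from(source.attributes)) {
    try {
      target.setAttribute(attr.name, attr.value);
    } catch (err) {
      if (err.name !== "InvalidCharacterError") throw err;
      // Clone first: the original Attr still belongs to `source`, and an
      // attribute that is in use cannot be attached to another element.
      target.attributes.setNamedItem(attr.cloneNode());
    }
  }
}
```

The fallback keeps the common path on the well-known API and touches `setNamedItem` only for the names that the parser accepted but `setAttribute` does not.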

I think the inconsistency here is unfortunate. I would argue for one of the following improvements:

- parsing an HTML document should validate attribute names the same way the DOM spec says to validate them (cf. https://dom.spec.whatwg.org/#validate and https://dom.spec.whatwg.org/#dom-element-setattribute ); if that is too problematic for backwards compatibility reasons (i.e. where document authors apparently intend for the element to have an attribute named e.g. "1" or "."), the parser should only do so where it is recovering from questionable markup such as the above.
- `setAttribute` DOM API validation should be relaxed to the same standard that the HTML parser uses; if that is not possible for backwards compatibility reasons, it should be relaxed for documents with `text/html` content types and/or HTML (rather than XHTML/XML-based) parsing models.

Attribute DOM representation and parsing is inconsistent #4275
