Description
(Filed as a result of mozilla/readability#392; I'm not 100% sure whether this should be considered a DOM issue or an HTML parser issue, so feel free to move it as appropriate.)
STR:
- open https://opinion.udn.com/opinion/story/10124/3561413 in recent versions of Chrome or Firefox
- in the browser's devtools console, run something like this:
console.log(Array.from(document.querySelectorAll("table")).map(t => t.outerHTML))
At the moment, the DOM includes 2 or 3 tables with an attribute whose name is "0", as the console output shows.
The original markup of the page as inspected via "View Source", at time of writing, looked something like:
<table width="90% border="0>
Note the opening quote before 90% and the 'closing' quote after border=.
Obviously the markup's intent is to have a table with 2 attributes, width="90%" and border="0". But both browsers parse this as a width attribute whose value is "90% border=", plus an attribute whose name is "0" and whose value is the empty string. I assume this parsing is prescribed by the spec, but I haven't looked for the specifics there.
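To illustrate how the parser ends up with an attribute named "0", here is a minimal, simplified sketch of the HTML tokenizer's attribute states, written from the state descriptions in the HTML spec's tokenization section. The function name tokenizeAttributes is hypothetical, and the sketch ignores character references, NUL handling, parse-error reporting and the self-closing flag; it is not how any browser actually implements its tokenizer. The key step is the "after attribute value (quoted)" state: a character other than whitespace, "/" or ">" directly after a closing quote is a missing-whitespace-between-attributes parse error, and the tokenizer reconsumes it as the start of a new attribute name.

```javascript
// Simplified sketch of the HTML tokenizer's attribute states
// (https://html.spec.whatwg.org/multipage/parsing.html#tokenization).
// Input: the text of a start tag after its tag name, e.g. ' width="90%">'.
// Hypothetical helper; omits character references, NUL handling,
// parse-error reporting and the self-closing flag ("/" is just skipped).
function tokenizeAttributes(input) {
  const attrs = [];
  let current = null;
  let state = "beforeName";
  const isSpace = (c) => c === " " || c === "\t" || c === "\n" || c === "\f";
  const startAttr = (name) => {
    current = { name, value: "" };
    attrs.push(current);
  };
  // Duplicate attribute names are dropped, keeping the first occurrence.
  const finish = () =>
    attrs.filter((a, i) => attrs.findIndex((b) => b.name === a.name) === i);

  for (let i = 0; i < input.length; i++) {
    const c = input[i];
    switch (state) {
      case "beforeName":
        if (isSpace(c) || c === "/") break;
        if (c === ">") return finish();
        if (c === "=") {
          startAttr("="); // parse error, but "=" still becomes the name
        } else {
          startAttr("");
          i--; // reconsume in the attribute name state
        }
        state = "name";
        break;
      case "name":
        if (isSpace(c) || c === "/" || c === ">") {
          state = "afterName";
          i--;
        } else if (c === "=") {
          state = "beforeValue";
        } else {
          // Any other character, digits included, extends the name;
          // only ASCII upper-case letters are lowercased.
          current.name += /[A-Z]/.test(c) ? c.toLowerCase() : c;
        }
        break;
      case "afterName":
        if (isSpace(c) || c === "/") break;
        if (c === "=") state = "beforeValue";
        else if (c === ">") return finish();
        else {
          startAttr("");
          i--;
          state = "name";
        }
        break;
      case "beforeValue":
        if (isSpace(c)) break;
        if (c === '"') state = "doubleQuoted";
        else if (c === "'") state = "singleQuoted";
        else if (c === ">") return finish(); // missing value: stays ""
        else {
          state = "unquoted";
          i--;
        }
        break;
      case "doubleQuoted":
        if (c === '"') state = "afterQuoted";
        else current.value += c;
        break;
      case "singleQuoted":
        if (c === "'") state = "afterQuoted";
        else current.value += c;
        break;
      case "unquoted":
        if (isSpace(c)) state = "beforeName";
        else if (c === ">") return finish();
        else current.value += c;
        break;
      case "afterQuoted":
        if (isSpace(c) || c === "/") state = "beforeName";
        else if (c === ">") return finish();
        else {
          // missing-whitespace-between-attributes: the character is
          // reconsumed as the start of a new attribute name.
          state = "beforeName";
          i--;
        }
        break;
    }
  }
  return finish(); // simplification: input without a closing ">"
}

console.log(JSON.stringify(tokenizeAttributes(' width="90% border="0>')));
// [{"name":"width","value":"90% border="},{"name":"0","value":""}]
```

Fed the text that follows `<table` in the markup above, the sketch yields width="90% border=" and an empty attribute named "0", whereas the intended ` width="90%" border="0">` yields the two expected attributes.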
The problem arises when rote DOM manipulation code reads through element.attributes and tries to set the same attributes on a new element: Element.setAttribute throws an InvalidCharacterError because, as noted in https://dom.spec.whatwg.org/#dom-element-setattribute, "0" "does not match the Name production in XML" (see https://www.w3.org/TR/xml/#NT-Name).
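For reference, the XML 1.0 Name production that the cited setAttribute text relies on can be sketched as a single regular expression. The character ranges below are copied from the NameStartChar and NameChar productions at https://www.w3.org/TR/xml/#NT-Name; isXmlName is a hypothetical helper, not a DOM API. A name like "0" fails because digits are only allowed in NameChar, not in NameStartChar, so they cannot start a name.

```javascript
// Character classes from the XML 1.0 (Fifth Edition) productions:
//   Name ::= NameStartChar (NameChar)*
const NAME_START_CHAR =
  ":A-Z_a-z" +
  "\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}" +
  "\\u{370}-\\u{37D}\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}" +
  "\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}\\u{3001}-\\u{D7FF}" +
  "\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";
// NameChar adds "-", ".", digits, U+00B7 and two combining ranges.
const NAME_CHAR =
  NAME_START_CHAR + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

// The "u" flag is needed for the \u{...} escapes and astral ranges.
const XML_NAME = new RegExp(`^[${NAME_START_CHAR}][${NAME_CHAR}]*$`, "u");

// Hypothetical helper: true if `name` matches the XML Name production.
function isXmlName(name) {
  return XML_NAME.test(name);
}

for (const name of ["width", "border", "data-x", "0", "1", "."]) {
  console.log(JSON.stringify(name), isXmlName(name));
}
```

Here "width", "border" and "data-x" match, while "0", "1" and "." do not, which is exactly the set of names the HTML parser accepts but setAttribute rejects in this report.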
Scripts can currently work around this issue (in reasonably complete DOM implementations) by using element.attributes.setNamedItem(otherElement.attributes[i].cloneNode()), though this isn't very elegant.
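A hedged sketch of how that workaround might be folded into a generic attribute-copying helper: it uses setAttribute for ordinary names and falls back to cloning the Attr node and inserting it with attributes.setNamedItem() only when setAttribute throws an InvalidCharacterError. copyAttributes is a hypothetical name, and the sketch assumes both elements belong to the same document.

```javascript
// Hypothetical helper: copy the attributes of `source` onto `target`
// (both DOM Elements, assumed to belong to the same document).
function copyAttributes(source, target) {
  // Snapshot the live NamedNodeMap before iterating.
  for (const attr of Array.from(source.attributes)) {
    try {
      // Common case: any name that passes setAttribute's name check.
      // (Namespaced attributes such as xlink:href lose their namespace here.)
      target.setAttribute(attr.name, attr.value);
    } catch (e) {
      if (e.name !== "InvalidCharacterError") throw e;
      // Names such as "0" come out of the HTML parser but are rejected by
      // setAttribute; an unattached clone of the Attr node can still be
      // inserted through the NamedNodeMap.
      target.attributes.setNamedItem(attr.cloneNode());
    }
  }
  return target;
}

// Example, in the devtools console of the page above:
// copyAttributes(document.querySelector("table"), document.createElement("table"));
```

Trying setAttribute first keeps the common path on the ordinary API and only reaches for the less familiar NamedNodeMap route for names the parser produced but setAttribute refuses.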
I think the inconsistency here is unfortunate. I would argue for one of the following improvements:
- parsing an HTML document should validate attributes the same way the DOM spec says to validate them (cf. https://dom.spec.whatwg.org/#validate and https://dom.spec.whatwg.org/#dom-element-setattribute), or, if that is too problematic for backwards compatibility reasons (i.e. where document authors apparently intend for the element to have an attribute with a name such as "1" or "."), it should only do so when parsing questionable markup such as the above.
- setAttribute DOM API validation should be relaxed to the same standard that HTML parsing uses; if that is not possible for backwards compatibility reasons, it should be relaxed for documents with text/html content types and/or HTML (rather than XHTML/XML-based) parsing models.