-
Notifications
You must be signed in to change notification settings - Fork 2
Fix parsing for minified css with extra semicolons #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This fixes the scenario where extra semicolons are immediately followed (without whitespace) by a boundary token such as '}'. For the ruleset: .foo{bar:baz;;}div .qux{vim:fet;} Shady treats the first `}' as an extra token and adds div .qux rule under the .foo selector.
@@ -131,7 +131,7 @@ class Parser { | |||
} | |||
|
|||
while (tokenizer.currentToken && | |||
tokenizer.currentToken.is(TokenType.boundary)) { | |||
tokenizer.currentToken.is(TokenType.semicolon)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you have || TokenType.boundary ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
semicolon is a boundary token. the problem is that close brace is also a boundary, so parseUnknown is too eager and swallows the closing brace while we're still in the open brace context, and subsequent statements get swallowed into the nested context.
I think this fix isn't ideal, but it's ok. it looks to me like it will bring Shady closer to actual browser css error recovery, but there are still a lot of error conditions that Shady will not handle correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[};] are property boundaries. This is significantly limiting the list. Are you sure it's going to work for all cases? Please add tests and run all tests in nv-transform-css.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consuming just semicolon instead of other boundaries is harmless, since we're in error recovery. the unconsumed tokens will just trigger parseUnknown again. So instead of a single Skipped token with all the boundary tokens, we get a series of smaller Skipped tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so when it's triggering for not semicolon, what is going to happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
browser error recovery is basically "skip until boundary token", not "skip over boundary tokens". (it's not identical to "skip until boundary token", but "skip until" is closer to what browsers do than "skip over")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why then you bundle ";;;;" into one discard? The whole while loop can go away then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think the actual correct version is a while loop with a token != boundary. but I don't know if that will break expectations elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same idea, that a while loop with token!= boundary would be the apt solution (since that is what is suggested to be graceful recovery). I don't think that solution is logically correct. Consider, for example, the following CSS:
host {
margin: 0;;
padding: 0;
}
With the token!= boundary loop:
The first semicolon is consumed as part of the margin rule,
The second semicolon triggers parseUnknown,
which consumes the semicolon and all the tokens up to the next boundary token, which is ":". So the 'padding' token gets consumed too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also had the idea of removing the while loop altogether, since it is not doing much. But this breaks a lot of tests since they seem to be accounting for not only the valid rules but the discarded ones too.
For example, the test called disregards extra semi-colons
checks for nodeFactory.discarded(';;'),
which would fail if the while loop is discarded and the unknown tokens were parsed one by one.
There are a number of other tests that fail on removing the loop and my intention was to fix the issue with low touch.
@@ -68,6 +68,8 @@ div { | |||
|
|||
export const minifiedRuleset = '.foo{bar:baz}div .qux{vim:fet;}'; | |||
|
|||
export const minifiedRulesetWithExtraSemicolons = '.foo{bar:baz;;}div .qux{vim:fet;}'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why extraSemicolons test is not enough L52? Is there anything special about minified semicolons? New line is not a property boundary or anything. All is missing - spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseUnknown current consumes a sequence of adjacent boundary tokens.
When shady sees ";;\n}",
the first semicolon is consumed as part of the rule,
the second semicolon is unrecognized and triggers parseUnknown,
which consumes the semicolon but stops at the newline since newline is not a boundary token.
When shady sees ";;}\n",
parseUnknown will consume the closing brace too, which is the problem
@@ -131,7 +131,7 @@ class Parser { | |||
} | |||
|
|||
while (tokenizer.currentToken && | |||
tokenizer.currentToken.is(TokenType.boundary)) { | |||
tokenizer.currentToken.is(TokenType.semicolon)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[};] are property boundaries. This is significantly limiting the list. Are you sure it's going to work for all cases? Please add tests and run all tests in nv-transform-css.
This fixes the scenario where extra semicolons are immediately
followed (without whitespace) by a boundary token such as '}'.
For the ruleset: .foo{bar:baz;;}div .qux{vim:fet;}
Shady treats the first `}' as an extra token and adds div .qux rule
under the .foo selector.