RegExp Processing of CharacterClassEscape for \W should be the same as !\w with unicode and ignoreCase flags. #516
Fixed the issue that @mathiasbynens found.
@@ -28544,7 +28544,8 @@
</tbody>
</table>
</figure>
<p>The production <emu-grammar>CharacterClassEscape :: `W`</emu-grammar> evaluates by returning the set of all characters not included in the set returned by <emu-grammar>CharacterClassEscape :: `w`</emu-grammar> .</p>
<p>and the Boolean *false*.</p>
this line still seems like it should be reverted?
The markup around this area looks really weird (closing </tbody> without a matching start tag etc.) but this line, at least, is intentional. It’s the continuation of the sentence on line https://github.com/tc39/ecma262/pull/516/files/1944b451b767e30575150204d0d28a8ae270a3a5#diff-3540caefa502006d8a33cb1385720803L28315.
I'm open to suggestions for a different place to add the "and the Boolean false." than after the table.
I have no idea why the change to line 28547 is only showing the one added line. This is what is shown in my local repository for the whole change - exactly what I'm proposing and pushed:
This still does not address the issue with Class Range [0]. Part of evaluating a Class Range is Class Escape, in which, for the production ClassEscape :: CharacterClassEscape, we "return the CharSet that is the result of evaluating CharacterClassEscape". This CharSet may be used to create a union with another CharSet. The Boolean return value of the new CharacterClassEscape is now lost. So with this change, /[\W\d]/ would be equivalent to /[\w\D]/, since the Boolean return value is unused. If we are to use the Boolean value, we need a way to create a union for tuples of (CharSet, Boolean). [0] https://tc39.github.io/ecma262/#sec-classranges
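To make the concern above concrete, here is a rough sketch of what happens when a class union sees only the CharSet half of a (CharSet, Boolean) pair. The object shape and the helper names (`escapes`, `unionDroppingBoolean`, `matchesKeepingBoolean`) are made up for illustration; they are not spec text and not any engine's implementation.

```javascript
// Hypothetical model: each escape evaluates to a (CharSet, Boolean) pair.
const digits = new Set("0123456789");
const word = new Set(
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
);
const escapes = {
  d: { chars: digits, invert: false },
  D: { chars: digits, invert: true },
  w: { chars: word, invert: false },
  W: { chars: word, invert: true },
};

// ClassEscape as quoted above: only the CharSet is returned, so the
// union of class members never sees the Boolean.
function unionDroppingBoolean(names) {
  const result = new Set();
  for (const name of names) {
    for (const ch of escapes[name].chars) result.add(ch);
  }
  return result;
}

// What a class would need instead: each member keeps its own Boolean.
function matchesKeepingBoolean(names, ch) {
  return names.some((name) => {
    const { chars, invert } = escapes[name];
    return chars.has(ch) !== invert;
  });
}

function sameSet(a, b) {
  return a.size === b.size && [...a].every((ch) => b.has(ch));
}

const lostA = unionDroppingBoolean(["W", "d"]); // models /[\W\d]/
const lostB = unionDroppingBoolean(["w", "D"]); // models /[\w\D]/
console.log(sameSet(lostA, lostB)); // true: both collapse to the \w set
console.log(matchesKeepingBoolean(["W", "d"], "a")); // false
console.log(matchesKeepingBoolean(["w", "D"], "a")); // true
```

In this model the two classes only stay distinguishable when every member carries its Boolean through the union, which is the tuple union asked for above.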
I do not have a good answer for the issue raised by @hashseed of CharacterClassEscapes being included in a CharacterClass. Ignoring the processing of the 'k' and 's' characters that this proposal was trying to address, the currently specified behavior of \w and \W for unicode RegExps is not generally useful for other reasons. For example, they do not properly recognize other Unicode word characters such as accented characters. Therefore I withdraw this request.
Alright; we are all looking forward to your Unicode properties proposal, @goyaKim!
I created a new pull request (525) that addresses this issue in another way.
The current specification of CharacterClassEscapes in regular expressions introduces surprising behavior of \W when both the unicode and ignoreCase flags are provided.
There are 6 CharacterClassEscapes defined:
\d is digit character and \D is not digit character
\s is space character and \S is not space character
\w is word character and \W is not word character
Furthermore, !\d matches the same as \D, !\s matches the same as \S, and !\w matches the same as \W, with the exception of /\W/ui and the characters 'k', 'K', 's' and 'S'.
These exceptions should be removed so that /\W/ui.test("k") is false, given that /\w/ui.test("k") is true. The same change should be made for the other three affected characters. The proposed changes involve how the CharSet and invert flag are generated for CharacterSetMatcher for CharacterClassEscapes.
The proposed changes are also made for the \D and \S CharacterClassEscapes for consistency, even though the changes do not impact what they match.
This change has been discussed in issue 512.
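For readers who want to see where the four exceptional characters come from, the following is a minimal sketch of the matching rules as described above, not the spec algorithm itself. It assumes a tiny stand-in for Unicode simple case folding (only ASCII letters, U+212A KELVIN SIGN and U+017F LATIN SMALL LETTER LONG S) and a small enumerable universe of characters; `canonicalize`, `matches`, `matchesCurrentW` and `matchesNotW` are hypothetical names.

```javascript
// Assumption: a tiny table stands in for Unicode simple case folding.
function canonicalize(ch) {
  if (ch === "\u212A") return "k"; // KELVIN SIGN folds to "k"
  if (ch === "\u017F") return "s"; // LATIN SMALL LETTER LONG S folds to "s"
  return /^[A-Z]$/.test(ch) ? ch.toLowerCase() : ch;
}

// The 63 word characters that make up \w.
const wordChars = new Set(
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
);

// A small universe so that the complement of \w can be enumerated.
const universe = [..."kKsSaA1_ -", "\u212A", "\u017F"];

// \W as currently specified: every character not in \w.
const nonWordChars = new Set(universe.filter((ch) => !wordChars.has(ch)));

// Case-insensitive set match: ch matches if some member of the set
// canonicalizes to the same character as ch.
function matches(set, ch) {
  const target = canonicalize(ch);
  return [...set].some((member) => canonicalize(member) === target);
}

// Current behavior: match against the complement set.
function matchesCurrentW(ch) {
  return matches(nonWordChars, ch);
}

// Proposed behavior: match against \w, then invert the result.
function matchesNotW(ch) {
  return !matches(wordChars, ch);
}

for (const ch of ["k", "K", "s", "S", "a", "-"]) {
  console.log(JSON.stringify(ch), matchesCurrentW(ch), matchesNotW(ch));
}
```

In this model 'k', 'K', 's' and 'S' match the complement set because U+212A and U+017F are outside \w but fold to "k" and "s", while inverting the \w match gives false for all four.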