This repository was archived by the owner on Feb 18, 2025. It is now read-only.

Commit f01c310

committed
[spec] hex-escape punctuators
Fixes #65
1 parent 994b9ce commit f01c310

1 file changed

+43 -28 lines changed

spec.emu

Lines changed: 43 additions & 28 deletions
@@ -15,26 +15,6 @@ contributors: Jordan Harband
 <emu-clause id="sec-regexp-regular-expression-objects" number="2">
 <h1>RegExp (Regular Expression) Objects</h1>
 
-<emu-clause id="sec-patterns" number="1">
-<h1>Patterns</h1>
-
-<h2>Syntax</h2>
-<p>Each `\\u` |HexTrailSurrogate| for which the choice of associated `u` |HexLeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |HexLeadSurrogate| that would otherwise have no corresponding `\\u` |HexTrailSurrogate|.</p>
-<emu-grammar type="definition">
-HexNonSurrogate ::
-Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive interval from 0xD800 to 0xDFFF]
-
-IdentityEscape[UnicodeMode] ::
-[+UnicodeMode] SyntaxCharacter
-[+UnicodeMode] `/` <ins>`,` `-` `=` `<` `>` `#` `&` `!` `%` `:` `;` `@` `~` `'` `"` `\``</ins>
-<ins>[+UnicodeMode] WhiteSpace</ins>
-[~UnicodeMode] SourceCharacter but not UnicodeIDContinue
-
-DecimalEscape ::
-NonZeroDigit DecimalDigits[~Sep]? [lookahead &notin; DecimalDigit]
-</emu-grammar>
-</emu-clause>
-
 <emu-clause id="sec-properties-of-the-regexp-constructor" number="5">
 <h1>Properties of the RegExp Constructor</h1>
 
@@ -47,24 +27,59 @@ contributors: Jordan Harband
 <emu-alg>
 1. Let _str_ be ? ToString(_S_).
 1. Let _cpList_ be StringToCodePoints(_str_).
-1. Let _punctuators_ be the following String, which consists of every ASCII punctuator except U+005F (LOW LINE): *"(){}[]|,.?\*+-^$=<>\/#&!%:;@~'"`"*.
-1. Let _toEscape_ be StringToCodePoints(_punctuators_).
 1. Let _escapedList_ be a new empty List.
 1. For each code point _c_ in _cpList_, do
 1. If _escapedList_ is empty and _c_ is matched by |DecimalDigit|, then
-1. Append code unit U+005C (REVERSE SOLIDUS) to _escapedList_.
-1. Append code unit U+0078 (LATIN SMALL LETTER X) to _escapedList_.
-1. Append code unit U+0033 (DIGIT THREE) to _escapedList_.
-1. Else if _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace|, then
-1. Append code unit U+005C (REVERSE SOLIDUS) to _escapedList_.
-1. Append _c_ to _escapedList_.
+1. Append the code point U+005C (REVERSE SOLIDUS) to _escapedList_.
+1. Append the code point U+0078 (LATIN SMALL LETTER X) to _escapedList_.
+1. Append the code point U+0033 (DIGIT THREE) to _escapedList_.
+1. Append _c_ to _escapedList_.
+1. Else,
+1. Append the code points in EncodeForRegExpEscape(_c_) to _escapedList_.
 1. Return CodePointsToString(_escapedList_).
 </emu-alg>
 
 <emu-note>
 <p>`escape` takes a string and escapes it so it can be literally represented as a pattern. In contrast EscapeRegExpPattern (as the name implies) takes a pattern and escapes it so that it can be represented as a string. While the two are related, they do not share the same character escape set or perform similar actions.</p>
 </emu-note>
 </emu-clause>
+
+<emu-clause id="sec-encode" type="abstract operation">
+<h1>
+EncodeForRegExpEscape (
+_c_: a code point,
+): a List of code points
+</h1>
+<dl class="header">
+<dt>description</dt>
+<dd>If _c_ represents a RegExp punctuator that needs escaping, or ASCII whitespace, it produces the code points for *"\x"* followed by the relevant escape code. If _c_ represents non-ASCII white space, it produces the code points for *"\u"* followed by the relevant escape code. Otherwise, it returns a List containing _c_.</dd>
+</dl>
+
+<emu-alg>
+1. Let _codePoints_ be a new empty List.
+1. Let _punctuators_ be the following String, which consists of every ASCII punctuator except U+005F (LOW LINE): *"(){}[]|,.?\*+-^$=<>\/#&!%:;@~'"`"*.
+1. Let _toEscape_ be StringToCodePoints(_punctuators_).
+1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace|, then
+1. Append the code point U+005C (REVERSE SOLIDUS) to _codePoints_.
+1. Let _hex_ be Number::toString(𝔽(_c_), 16).
+1. If the length of _hex_ is 1 or 2, then
+1. Set _hex_ to StringPad(_hex_, 2, *"0"*, ~start~).
+1. Append the code point U+0078 (LATIN SMALL LETTER X) to _codePoints_.
+1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+1. Else if the length of _hex_ is 3 or 4, then
+1. Set _hex_ to StringPad(_hex_, 4, *"0"*, ~start~).
+1. Append the code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
+1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+1. Else,
+1. Append the code point U+0075 (LATIN SMALL LETTER U) to _codePoints_.
+1. Append the code point U+007B (LEFT CURLY BRACKET) to _codePoints_.
+1. Append the code points in StringToCodePoints(_hex_) to _codePoints_.
+1. Append the code point U+007D (RIGHT CURLY BRACKET) to _codePoints_.
+1. Else,
+1. Append _c_ to _codePoints_.
+1. Return _codePoints_.
+</emu-alg>
+</emu-clause>
 </ins>
 </emu-clause>
 </emu-clause>
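
For readers who want to see the effect of this change, the added steps can be approximated in TypeScript. This is only a sketch, not the proposal's reference implementation. The names `PUNCTUATORS`, `isWhiteSpace`, `encodeForRegExpEscape`, and `regExpEscape` are hypothetical. The `isWhiteSpace` list is my own transcription of ECMAScript's |WhiteSpace| production (tab, VT, FF, U+FEFF, and the Unicode space separators) and should be checked against the current specification.

```typescript
// Every ASCII punctuator except U+005F (LOW LINE), as listed in the spec text.
const PUNCTUATORS = "(){}[]|,.?*+-^$=<>\\/#&!%:;@~'\"`";

// Assumption: approximates the |WhiteSpace| production with an explicit list.
function isWhiteSpace(c: number): boolean {
  return (
    c === 0x09 || c === 0x0b || c === 0x0c || c === 0xfeff ||
    c === 0x20 || c === 0xa0 || c === 0x1680 ||
    (c >= 0x2000 && c <= 0x200a) ||
    c === 0x202f || c === 0x205f || c === 0x3000
  );
}

// Sketch of EncodeForRegExpEscape: returns a string rather than a List of code points.
function encodeForRegExpEscape(c: number): string {
  const ch = String.fromCodePoint(c);
  if (PUNCTUATORS.indexOf(ch) === -1 && !isWhiteSpace(c)) {
    return ch; // not a punctuator or white space: passed through unchanged
  }
  const hex = c.toString(16); // lowercase digits, like Number::toString(c, 16)
  if (hex.length <= 2) {
    return "\\x" + hex.padStart(2, "0");
  }
  if (hex.length <= 4) {
    return "\\u" + hex.padStart(4, "0");
  }
  return "\\u{" + hex + "}";
}

// Sketch of the updated `escape` steps: a leading decimal digit becomes \x3<digit>,
// every other code point goes through encodeForRegExpEscape.
function regExpEscape(s: string): string {
  let out = "";
  Array.from(s).forEach((ch, i) => {
    const c = ch.codePointAt(0)!;
    if (i === 0 && c >= 0x30 && c <= 0x39) {
      out += "\\x3" + ch;
    } else {
      out += encodeForRegExpEscape(c);
    }
  });
  return out;
}

console.log(regExpEscape("1+1 = 2")); // \x31\x2b1\x20\x3d\x202
```

Under these assumptions, punctuators no longer come out as identity escapes such as `\.` but as hex escapes such as `\x2e`. One detail follows from the length-based branching: U+00A0 (NO-BREAK SPACE) has a two-digit hex value, so the sketch emits `\xa0` for it, whereas the description's wording suggests `\u` for all non-ASCII white space.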
