Commit c3d06cf

aphillips and eemeli authored
Add bidi support and address UAX31/UTS55 requirements (#884)
* Add bidi support and address UAX31/UTS55 requirements

  Adds the bidi strong marks ALM, RLM, and LRM plus the bidi isolate controls LRI, RLI, FSI, and PDI to the syntax.

  Formally defines optional vs. non-optional whitespace. Non-optional whitespace must include at least one whitespace character. Optional whitespace may contain only bidi marks (which are invisible)

* Update syntax.md including text from previous PR
* Repair the guidance on strongly directional marks

  Include ALM and better specify how to use the marks.

* Fix formatting of the "important"
* Add bidi characters to description of whitespace.
* Permit bidi in a few more places

  Add optional whitespace at the start of `variant`
  Add optional whitespace around `quoted-pattern`

  These changes result in allowing bidi around keys and quoted patterns as intended.

* Update syntax.md ABNF
* Update formatting.md

  - Add a note about the difference between formatting and message syntax.
  - Clarify the sentence about message directionality.

* Address comment about name/identifier
* Address comments related to bidi in `name`
* Fix variable's location
* Address comment about the list of LRI/PDI targets
* One character typo :-P
* Update spec/syntax.md

  Co-authored-by: Eemeli Aro <[email protected]>

* Address comments about rule R3a-1
* Update spec/syntax.md

  Co-authored-by: Eemeli Aro <[email protected]>

* Address comment about U+061C
* Change [o]wsp => `o` or `s`
* Match syntax spec to abnf
* Remove *
* Update syntax.md
* Update spec/syntax.md

  Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/message.abnf

  Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/message.abnf

  Co-authored-by: Eemeli Aro <[email protected]>

* Update syntax.md
* Update spec/message.abnf

  Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/syntax.md

  Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/syntax.md

  Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>
1 parent 80bec52 commit c3d06cf

3 files changed: +175 -57 lines changed

spec/formatting.md

Lines changed: 10 additions & 1 deletion
@@ -768,7 +768,16 @@ That is, the text can can consist of a mixture of left-to-right and right-to-lef
 The display of bidirectional text is defined by the
 [Unicode Bidirectional Algorithm](http://www.unicode.org/reports/tr9/) [UAX9].
 
-The directionality of the message as a whole is provided by the _formatting context_.
+The directionality of the formatted _message_ as a whole is provided by the _formatting context_.
+
+> [!NOTE]
+> Keep in mind the difference between the formatted output of a _message_,
+> which is the topic of this section,
+> and the syntax of _message_ prior to formatting.
+> The processing of a _message_ depends on the logical sequence of Unicode code points,
+> not on the presentation of the _message_.
+> Affordances to allow users appropriate control over the appearance of the
+> _message_'s syntax have been provided.
 
 When a _message_ is formatted, _placeholders_ are replaced
 with their formatted representation.
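
The note added above separates the logical code point sequence of a _message_ from how that text is displayed. As a rough illustration, the Python sketch below lists the code points of a short message source. The `MESSAGE` string and the `describe` helper are hypothetical and not part of the spec; the only point is that the bidi isolates are real code points in the sequence even though they are invisible when rendered.

```python
import unicodedata

# Hypothetical message source: a variable expression whose variable is
# wrapped in LRI (U+2066) ... PDI (U+2069). Both controls are invisible
# when displayed, but they are part of the logical sequence.
MESSAGE = "{\u2066$name\u2069}"


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point, in logical order."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]
```

Calling `describe(MESSAGE)` yields nine entries, including `U+2066 LEFT-TO-RIGHT ISOLATE` and `U+2069 POP DIRECTIONAL ISOLATE`, although a renderer would typically show only `{$name}`.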

spec/message.abnf

Lines changed: 31 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,41 +1,41 @@
 message = simple-message / complex-message
 
-simple-message = [s] [simple-start pattern]
+simple-message = o [simple-start pattern]
 simple-start = simple-start-char / escaped-char / placeholder
 pattern = *(text-char / escaped-char / placeholder)
 placeholder = expression / markup
 
-complex-message = [s] *(declaration [s]) complex-body [s]
+complex-message = o *(declaration o) complex-body o
 declaration = input-declaration / local-declaration
 complex-body = quoted-pattern / matcher
 
-input-declaration = input [s] variable-expression
-local-declaration = local s variable [s] "=" [s] expression
+input-declaration = input o variable-expression
+local-declaration = local s variable o "=" o expression
 
-quoted-pattern = "{{" pattern "}}"
+quoted-pattern = o "{{" pattern "}}"
 
-matcher = match-statement s variant *([s] variant)
+matcher = match-statement s variant *(o variant)
 match-statement = match 1*(s selector)
 selector = variable
-variant = key *(s key) [s] quoted-pattern
+variant = key *(s key) quoted-pattern
 key = literal / "*"
 
 ; Expressions
 expression = literal-expression
 / variable-expression
 / function-expression
-literal-expression = "{" [s] literal [s function] *(s attribute) [s] "}"
-variable-expression = "{" [s] variable [s function] *(s attribute) [s] "}"
-function-expression = "{" [s] function *(s attribute) [s] "}"
+literal-expression = "{" o literal [s function] *(s attribute) o "}"
+variable-expression = "{" o variable [s function] *(s attribute) o "}"
+function-expression = "{" o function *(s attribute) o "}"
 
-markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone
-/ "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close
+markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}" ; open and standalone
+/ "{" o "/" identifier *(s option) *(s attribute) o "}" ; close
 
 ; Expression and literal parts
 function = ":" identifier *(s option)
-option = identifier [s] "=" [s] (literal / variable)
+option = identifier o "=" o (literal / variable)
 
-attribute = "@" identifier [[s] "=" [s] (literal / variable)]
+attribute = "@" identifier [o "=" o (literal / variable)]
 
 variable = "$" name
 
@@ -52,22 +52,22 @@ match = %s".match"
 
 ; Names and identifiers
 ; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
-; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD
+; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName but excludes U+FFFD and U+061C
 identifier = [namespace ":"] name
 namespace = name
-name = name-start *name-char
+name = [bidi] name-start *name-char [bidi]
 name-start = ALPHA / "_"
 / %xC0-D6 / %xD8-F6 / %xF8-2FF
-/ %x370-37D / %x37F-1FFF / %x200C-200D
+/ %x370-37D / %x37F-61B / %x61D-1FFF / %x200C-200D
 / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
 / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
 name-char = name-start / DIGIT / "-" / "."
 / %xB7 / %x300-36F / %x203F-2040
 
 ; Restrictions on characters in various contexts
 simple-start-char = content-char / "@" / "|"
-text-char = content-char / s / "." / "@" / "|"
-quoted-char = content-char / s / "." / "@" / "{" / "}"
+text-char = content-char / ws / "." / "@" / "|"
+quoted-char = content-char / ws / "." / "@" / "{" / "}"
 content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
 / %x0B-0C ; omit CR (%x0D)
 / %x0E-1F ; omit SP (%x20)
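
The revised `name` rule in the hunk above allows one optional bidi control on each side of a name and removes U+061C from `name-start`, so ALM can only appear at the edges. The Python sketch below is one possible transcription of the post-change rules into a regular expression. The constant names and the `is_name` helper are illustrative assumptions, not part of the spec or of any reference implementation.

```python
import re

# bidi = %x061C / %x200E / %x200F / %x2066-2069
BIDI = "\u061C\u200E\u200F\u2066-\u2069"

# name-start, transcribed range by range from the "+" side of the diff.
# Note the split %x37F-61B / %x61D-1FFF, which leaves out U+061C (ALM).
NAME_START = (
    "A-Za-z_"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u061B\u061D-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFC\U00010000-\U000EFFFF"
)

# name-char = name-start / DIGIT / "-" / "." / %xB7 / %x300-36F / %x203F-2040
NAME_CHAR = NAME_START + "0-9.\u00B7\u0300-\u036F\u203F-\u2040\\-"

# name = [bidi] name-start *name-char [bidi]
NAME = re.compile(f"[{BIDI}]?[{NAME_START}][{NAME_CHAR}]*[{BIDI}]?")


def is_name(text: str) -> bool:
    """True if the whole string matches the sketched `name` rule."""
    return NAME.fullmatch(text) is not None
```

Under this sketch, `is_name("\u2066name\u2069")` is true because the isolates sit at the edges, while `is_name("a\u061Cb")` is false because ALM is no longer a `name-char`.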
@@ -83,5 +83,15 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
 escaped-char = backslash ( backslash / "{" / "|" / "}" )
 backslash = %x5C ; U+005C REVERSE SOLIDUS "\"
 
-; Whitespace
-s = 1*( SP / HTAB / CR / LF / %x3000 )
+; Required whitespace
+s = *bidi ws o
+
+; Optional whitespace
+o = *(ws / bidi)
+
+; Bidirectional marks and isolates
+; ALM / LRM / RLM / LRI, RLI, FSI & PDI
+bidi = %x061C / %x200E / %x200F / %x2066-2069
+
+; Whitespace characters
+ws = SP / HTAB / CR / LF / %x3000
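
The new `s` and `o` rules differ in one respect: `s` needs at least one `ws` character, while `o` may be empty or consist only of bidi controls. The Python sketch below transcribes the four rules into regular expressions to make that difference concrete. The names `OPTIONAL_WS`, `REQUIRED_WS`, `is_optional_ws`, and `is_required_ws` are assumptions made for illustration and do not come from the spec.

```python
import re

# ws = SP / HTAB / CR / LF / %x3000
WS = " \t\r\n\u3000"

# bidi = %x061C / %x200E / %x200F / %x2066-2069
BIDI = "\u061C\u200E\u200F\u2066-\u2069"

# o = *(ws / bidi): any mix of whitespace and bidi controls, possibly empty.
OPTIONAL_WS = re.compile(f"[{WS}{BIDI}]*")

# s = *bidi ws o: optional leading bidi controls, one mandatory ws
# character, then anything that `o` allows.
REQUIRED_WS = re.compile(f"[{BIDI}]*[{WS}][{WS}{BIDI}]*")


def is_optional_ws(text: str) -> bool:
    """True if the whole string matches the sketched `o` rule."""
    return OPTIONAL_WS.fullmatch(text) is not None


def is_required_ws(text: str) -> bool:
    """True if the whole string matches the sketched `s` rule."""
    return REQUIRED_WS.fullmatch(text) is not None
```

For example, a lone LRM (`"\u200E"`) satisfies `is_optional_ws` but not `is_required_ws`, whereas `"\u200E \u200F"` satisfies both because it contains a space.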
