
Commit 9830051

auto merge of #18441 : mdinger/rust/literals, r=steveklabnik
Closes #18415

This links [`std::str`](http://doc.rust-lang.org/std/str/index.html) documentation to [literals](http://doc.rust-lang.org/reference.html#literals) in the reference guide and collects examples of literals into one group at the beginning of the section.

~~The new tables are not exhaustive (some escapes were skipped) and so I try to link back to the respective sections where more detail is located.~~ The tables are mostly exhaustive. I misunderstood some of the whitespace codes.

I don't think the tables actually look that nice, if that's important, and I'm not sure how they could be improved. I think they do a good job of collecting the available options together. Listing the escapes together is particularly helpful because they vary with type and are otherwise embedded in paragraphs.

[EDIT] The [ascii table](http://man-ascii.com/) is here and may be useful.
2 parents: 770378a + 16bb4e6


2 files changed (+63, -3 lines)


src/doc/reference.md

+60
````diff
@@ -225,6 +225,52 @@ reserved for future extension, that is, the above gives the lexical
 grammar, but a Rust parser will reject everything but the 12 special
 cases mentioned in [Number literals](#number-literals) below.
 
+#### Examples
+
+##### Characters and strings
+
+| | Example | Number of `#` pairs allowed | Available characters | Escapes | Equivalent to |
+|---|---------|-----------------------------|----------------------|---------|---------------|
+| [Character](#character-literals) | `'H'` | `N/A` | All unicode | `\'` & [Byte escapes](#byte-escapes) & [Unicode escapes](#unicode-escapes) | `N/A` |
+| [String](#string-literals) | `"hello"` | `N/A` | All unicode | `\"` & [Byte escapes](#byte-escapes) & [Unicode escapes](#unicode-escapes) | `N/A` |
+| [Raw](#raw-string-literals) | `r##"hello"##` | `0...` | All unicode | `N/A` | `N/A` |
+| [Byte](#byte-literals) | `b'H'` | `N/A` | All ASCII | `\'` & [Byte escapes](#byte-escapes) | `u8` |
+| [Byte string](#byte-string-literals) | `b"hello"` | `N/A` | All ASCII | `\"` & [Byte escapes](#byte-escapes) | `&'static [u8]` |
+| [Raw byte string](#raw-byte-string-literals) | `br##"hello"##` | `0...` | All ASCII | `N/A` | `&'static [u8]` (unsure...not stated) |
+
+##### Byte escapes
+
+| | Name |
+|---|------|
+| `\x7F` | 8-bit character code (exactly 2 digits) |
+| `\n` | Newline |
+| `\r` | Carriage return |
+| `\t` | Tab |
+| `\\` | Backslash |
+
+##### Unicode escapes
+| | Name |
+|---|------|
+| `\u7FFF` | 16-bit character code (exactly 4 digits) |
+| `\U7EEEFFFF` | 32-bit character code (exactly 8 digits) |
+
+##### Numbers
+
+| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
+|----------------------------------------|---------|----------------|----------|
+| Decimal integer | `98_222i` | `N/A` | Integer suffixes |
+| Hex integer | `0xffi` | `N/A` | Integer suffixes |
+| Octal integer | `0o77i` | `N/A` | Integer suffixes |
+| Binary integer | `0b1111_0000i` | `N/A` | Integer suffixes |
+| Floating-point | `123.0E+77f64` | `Optional` | Floating-point suffixes |
+
+`*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
+
+##### Suffixes
+| Integer | Floating-point |
+|---------|----------------|
+| `i` (`int`), `u` (`uint`), `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64` | `f32`, `f64` |
+
 #### Character and string literals
 
 ```{.ebnf .gram}
````
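The number and suffix tables in this hunk use pre-1.0 syntax: the bare `i`/`u` suffixes were later removed, and `int`/`uint` became `isize`/`usize`. As a hedged sketch, assuming a current (1.0 or later) Rust compiler, the same literals can be written with explicit suffixes. The helpers `integer_examples` and `float_example` are hypothetical names used only for illustration.

```rust
// Sketch assuming current Rust: the pre-1.0 `i`/`u` suffixes from the table
// no longer exist, so explicit suffixes such as `i64`, `u8` and `usize` are used.

/// Hypothetical helper: the integer examples from the table, one per radix.
fn integer_examples() -> [i64; 4] {
    let decimal = 98_222_i64;     // `_` is only a visual separator
    let hex = 0xff_i64;           // 255
    let octal = 0o77_i64;         // 63
    let binary = 0b1111_0000_i64; // 240
    [decimal, hex, octal, binary]
}

/// Hypothetical helper: a floating-point literal with separator, exponent and suffix.
fn float_example() -> f64 {
    1_234.0E+18f64
}

fn main() {
    // Without a suffix (and without other constraints), an integer literal
    // defaults to `i32` and a floating-point literal to `f64`.
    let default_int = 7;
    let default_float = 7.0;
    let small: u8 = 255;  // same value and type as `255u8`
    let index = 42usize;  // `usize` replaced the pre-1.0 `uint`
    println!("{:?} {}", integer_examples(), float_example());
    println!("{} {} {} {}", default_int, default_float, small, index);
}
```

Explicit suffixes are used here only to mirror the table; in ordinary code the type is usually inferred from context.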
````diff
@@ -253,15 +299,21 @@ nonzero_dec: '1' | '2' | '3' | '4'
 | '5' | '6' | '7' | '8' | '9' ;
 ```
 
+##### Character literals
+
 A _character literal_ is a single Unicode character enclosed within two
 `U+0027` (single-quote) characters, with the exception of `U+0027` itself,
 which must be _escaped_ by a preceding U+005C character (`\`).
 
+##### String literals
+
 A _string literal_ is a sequence of any Unicode characters enclosed within two
 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
 which must be _escaped_ by a preceding `U+005C` character (`\`), or a _raw
 string literal_.
 
+##### Character escapes
+
 Some additional _escapes_ are available in either character or non-raw string
 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
 following forms:
````
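The character-literal, string-literal and escape rules in this hunk can be exercised with a short sketch. It assumes current Rust, where a Unicode escape is written `\u{...}` (replacing the pre-1.0 `\u7FFF` / `\U7EEEFFFF` forms listed in the escape tables) and where `\x` escapes inside `char` and `str` literals are limited to `\x00` through `\x7F`. `char_examples` and `string_example` are hypothetical helpers.

```rust
// Sketch assuming current Rust: Unicode escapes use `\u{...}`, and `\x`
// escapes in `char`/`str` literals may not exceed 0x7F.

/// Hypothetical helper: one character literal per escape form.
fn char_examples() -> [char; 6] {
    [
        'H',        // any Unicode character can appear directly
        '\'',       // the single quote itself must be escaped
        '\x7F',     // hex escape, at most 0x7F in a char literal
        '\n',       // newline
        '\\',       // backslash
        '\u{7FFF}', // Unicode escape for U+7FFF
    ]
}

/// Hypothetical helper: a string literal mixing several escapes.
fn string_example() -> &'static str {
    "tab:\tquote:\" backslash:\\"
}

fn main() {
    for &c in char_examples().iter() {
        // Print each char's debug form next to its code point.
        println!("{:?} = U+{:04X}", c, c as u32);
    }
    println!("{}", string_example());
}
```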
````diff
@@ -281,6 +333,8 @@ following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
 escaped in order to denote *itself*.
 
+##### Raw string literals
+
 Raw string literals do not process any escapes. They start with the character
 `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
 `U+0022` (double-quote) character. The _raw string body_ is not defined in the
````
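A minimal sketch of the raw string rules (an `r`, zero or more `#`, a double quote, and a matching closing sequence), assuming current Rust; `raw_examples` is a hypothetical helper.

```rust
// Sketch assuming current Rust: raw strings keep every character literally.

/// Hypothetical helper: raw string literals with zero, one and two `#` pairs.
fn raw_examples() -> [&'static str; 3] {
    [
        r"C:\temp\new",     // no escapes: `\n` here is a backslash and an `n`
        r#"she said "hi""#, // one `#` pair allows a bare `"` in the body
        r##"a "# inside"##, // two pairs allow the sequence `"#` in the body
    ]
}

fn main() {
    for s in raw_examples().iter() {
        println!("{}", s);
    }
}
```

Each extra `#` pair only needs to be added when the body itself contains a `"` followed by that many `#` characters.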
````diff
@@ -322,12 +376,16 @@ raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
 
 ```
 
+##### Byte literals
+
 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
 range) enclosed within two `U+0027` (single-quote) characters, with the
 exception of `U+0027` itself, which must be _escaped_ by a preceding U+005C
 character (`\`), or a single _escape_. It is equivalent to a `u8` unsigned
 8-bit integer _number literal_.
 
+##### Byte string literals
+
 A _byte string literal_ is a sequence of ASCII characters and _escapes_
 enclosed within two `U+0022` (double-quote) characters, with the exception of
 `U+0022` itself, which must be _escaped_ by a preceding `U+005C` character
````
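The byte-literal and byte-string paragraphs in this hunk can be illustrated as follows. This assumes current Rust, in which a byte string literal such as `b"hello"` has type `&'static [u8; 5]` (a fixed-size array reference that coerces to `&'static [u8]`), slightly more specific than the `&'static [u8]` listed in the table. `byte_examples` and `byte_string_example` are hypothetical names.

```rust
// Sketch assuming current Rust: unlike `char` literals, byte literals may use
// `\x` escapes over the full 0x00..=0xFF range.

/// Hypothetical helper: byte literals are plain `u8` values.
fn byte_examples() -> [u8; 3] {
    [
        b'H',    // 72, the ASCII code of 'H'
        b'\'',   // 39, the escaped single quote
        b'\xFF', // 255
    ]
}

/// Hypothetical helper: `b"hello"` is a `&'static [u8; 5]`, which coerces
/// to the slice type in the signature.
fn byte_string_example() -> &'static [u8] {
    b"hello"
}

fn main() {
    println!("{:?}", byte_examples());
    println!("{:?}", byte_string_example());
}
```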
````diff
@@ -347,6 +405,8 @@ following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
 escaped in order to denote its ASCII encoding `0x5C`.
 
+##### Raw byte string literals
+
 Raw byte string literals do not process any escapes. They start with the
 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
````
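The first hunk's table marks the type of a raw byte string as unsure. In current Rust it is `&'static [u8; N]`, the same as an ordinary byte string. A short sketch under that assumption, with `raw_byte_example` as a hypothetical helper:

```rust
// Sketch assuming current Rust: `br` literals skip escape processing,
// the body must be ASCII, and the literal's type is `&'static [u8; N]`.

/// Hypothetical helper: a raw byte string containing `\` and `"`.
fn raw_byte_example() -> &'static [u8] {
    br#"\d+ "quoted""#
}

fn main() {
    let bytes = raw_byte_example();
    println!("{} bytes: {:?}", bytes.len(), bytes);
}
```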

src/libcollections/str.rs

+3 -3
````diff
@@ -42,9 +42,9 @@
 //! # Representation
 //!
 //! Rust's string type, `str`, is a sequence of Unicode scalar values encoded as a
-//! stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8
-//! sequences. Additionally, strings are not null-terminated and can thus contain
-//! null bytes.
+//! stream of UTF-8 bytes. All [strings](../../reference.html#literals) are
+//! guaranteed to be validly encoded UTF-8 sequences. Additionally, strings are
+//! not null-terminated and can thus contain null bytes.
 //!
 //! The actual representation of strings have direct mappings to slices: `&str`
 //! is the same as `&[u8]`.
````
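The `str.rs` module docs above state that a `str` is UTF-8, is not null-terminated, and maps directly onto a byte slice. A hedged sketch in current Rust, using only the standard `len`, `chars` and `as_bytes` methods; `byte_and_char_counts` and `contains_nul` are hypothetical helpers.

```rust
// Sketch assuming current Rust, illustrating the representation described above.

/// Hypothetical helper: returns (length in UTF-8 bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Hypothetical helper: a `str` is not null-terminated, so a NUL (U+0000)
/// is just another character and may appear anywhere in it.
fn contains_nul(s: &str) -> bool {
    s.as_bytes().contains(&0)
}

fn main() {
    let s = "h\u{e9}llo\0"; // 'é' needs two UTF-8 bytes; '\0' needs one
    let bytes: &[u8] = s.as_bytes(); // the same data viewed as a byte slice
    println!("{:?} {:?}", byte_and_char_counts(s), bytes);
    println!("contains NUL: {}", contains_nul(s));
}
```

`s.len()` counts bytes rather than characters, which is why the two counts differ as soon as a non-ASCII character appears.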
