Commit 45bb5ba

encukou and AA-Turner authored
gh-127833: Add links to token types to the lexical analysis intro (#131468)
Co-authored-by: Adam Turner <[email protected]>
1 parent 4c914e7 commit 45bb5ba

File tree

1 file changed (+39, -23)

Doc/reference/lexical_analysis.rst (+39, -23)
@@ -35,11 +35,11 @@ Logical lines
 
 .. index:: logical line, physical line, line joining, NEWLINE token
 
-The end of a logical line is represented by the token NEWLINE. Statements
-cannot cross logical line boundaries except where NEWLINE is allowed by the
-syntax (e.g., between statements in compound statements). A logical line is
-constructed from one or more *physical lines* by following the explicit or
-implicit *line joining* rules.
+The end of a logical line is represented by the token :data:`~token.NEWLINE`.
+Statements cannot cross logical line boundaries except where :data:`!NEWLINE`
+is allowed by the syntax (e.g., between statements in compound statements).
+A logical line is constructed from one or more *physical lines* by following
+the explicit or implicit *line joining* rules.
 
 
 .. _physical-lines:
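The paragraph rewritten in this hunk says each logical line ends with a NEWLINE token. As an aside (my illustration, not part of the commit), this is easy to observe with the standard-library :mod:`tokenize` module:

```python
import io
import tokenize

# Each of the two logical lines ends with a NEWLINE token, and the
# non-interactive input as a whole ends with ENDMARKER.
src = "x = 1\ny = 2\n"
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(names)
# → ['NAME', 'OP', 'NUMBER', 'NEWLINE', 'NAME', 'OP', 'NUMBER', 'NEWLINE', 'ENDMARKER']
```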
@@ -160,11 +160,12 @@ Blank lines
 .. index:: single: blank line
 
 A logical line that contains only spaces, tabs, formfeeds and possibly a
-comment, is ignored (i.e., no NEWLINE token is generated). During interactive
-input of statements, handling of a blank line may differ depending on the
-implementation of the read-eval-print loop. In the standard interactive
-interpreter, an entirely blank logical line (i.e. one containing not even
-whitespace or a comment) terminates a multi-line statement.
+comment, is ignored (i.e., no :data:`~token.NEWLINE` token is generated).
+During interactive input of statements, handling of a blank line may differ
+depending on the implementation of the read-eval-print loop.
+In the standard interactive interpreter, an entirely blank logical line (that
+is, one containing not even whitespace or a comment) terminates a multi-line
+statement.
 
 
 .. _indentation:
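This hunk's point — a blank line generates no NEWLINE token — can also be checked with :mod:`tokenize`, which reports blank lines as the non-logical ``NL`` token instead (again my illustration, not part of the commit):

```python
import io
import tokenize

# The blank middle line produces no NEWLINE token; tokenize surfaces it
# as an NL token, so the logical-line count is unchanged.
src = "x = 1\n\ny = 2\n"
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(names.count('NEWLINE'))  # → 2
print('NL' in names)           # → True
```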
@@ -202,19 +203,20 @@ the space count to zero).
 
 .. index:: INDENT token, DEDENT token
 
-The indentation levels of consecutive lines are used to generate INDENT and
-DEDENT tokens, using a stack, as follows.
+The indentation levels of consecutive lines are used to generate
+:data:`~token.INDENT` and :data:`~token.DEDENT` tokens, using a stack,
+as follows.
 
 Before the first line of the file is read, a single zero is pushed on the stack;
 this will never be popped off again. The numbers pushed on the stack will
 always be strictly increasing from bottom to top. At the beginning of each
 logical line, the line's indentation level is compared to the top of the stack.
 If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
-one INDENT token is generated. If it is smaller, it *must* be one of the
+one :data:`!INDENT` token is generated. If it is smaller, it *must* be one of the
 numbers occurring on the stack; all numbers on the stack that are larger are
-popped off, and for each number popped off a DEDENT token is generated. At the
-end of the file, a DEDENT token is generated for each number remaining on the
-stack that is larger than zero.
+popped off, and for each number popped off a :data:`!DEDENT` token is generated.
+At the end of the file, a :data:`!DEDENT` token is generated for each number
+remaining on the stack that is larger than zero.
 
 Here is an example of a correctly (though confusingly) indented piece of Python
 code::
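The stack algorithm described in this hunk can be sketched directly. This is a toy model of my own (assuming spaces-only indentation and no blank lines, comments, or line joining — the real tokenizer handles all of those), not CPython's actual tokenizer:

```python
def indent_tokens(lines):
    """Toy model of the INDENT/DEDENT stack algorithm (spaces only)."""
    stack = [0]           # a single zero is pushed before the first line
    tokens = []
    for line in lines:
        level = len(line) - len(line.lstrip(' '))
        if level > stack[-1]:         # deeper: push and emit one INDENT
            stack.append(level)
            tokens.append('INDENT')
        else:
            while level < stack[-1]:  # shallower: pop, one DEDENT per pop
                stack.pop()
                tokens.append('DEDENT')
            if level != stack[-1]:    # must match a level already on the stack
                raise IndentationError('unindent does not match any level')
    while stack[-1] > 0:              # end of file: close remaining levels
        stack.pop()
        tokens.append('DEDENT')
    return tokens

print(indent_tokens(['if x:', '    a', '        b', '    c', 'd']))
# → ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```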
@@ -254,20 +256,34 @@ Whitespace between tokens
 Except at the beginning of a logical line or in string literals, the whitespace
 characters space, tab and formfeed can be used interchangeably to separate
 tokens. Whitespace is needed between two tokens only if their concatenation
-could otherwise be interpreted as a different token (e.g., ab is one token, but
-a b is two tokens).
+could otherwise be interpreted as a different token. For example, ``ab`` is one
+token, but ``a b`` is two tokens. However, ``+a`` and ``+ a`` both produce
+two tokens, ``+`` and ``a``, as ``+a`` is not a valid token.
+
+
+.. _endmarker-token:
+
+End marker
+----------
+
+At the end of non-interactive input, the lexical analyzer generates an
+:data:`~token.ENDMARKER` token.
 
 
 .. _other-tokens:
 
 Other tokens
 ============
 
-Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
-*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace
-characters (other than line terminators, discussed earlier) are not tokens, but
-serve to delimit tokens. Where ambiguity exists, a token comprises the longest
-possible string that forms a legal token, when read from left to right.
+Besides :data:`~token.NEWLINE`, :data:`~token.INDENT` and :data:`~token.DEDENT`,
+the following categories of tokens exist:
+*identifiers* and *keywords* (:data:`~token.NAME`), *literals* (such as
+:data:`~token.NUMBER` and :data:`~token.STRING`), and other symbols
+(*operators* and *delimiters*, :data:`~token.OP`).
+Whitespace characters (other than logical line terminators, discussed earlier)
+are not tokens, but serve to delimit tokens.
+Where ambiguity exists, a token comprises the longest possible string that
+forms a legal token, when read from left to right.
 
 
 .. _identifiers:
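The examples this hunk adds (``ab`` vs ``a b``, ``+a`` vs ``+ a``) and the longest-match rule can be confirmed with :mod:`tokenize` (my illustration, not part of the commit):

```python
import io
import tokenize

def token_strings(src):
    # Drop the bookkeeping NEWLINE/ENDMARKER tokens for readability.
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(src).readline)
            if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)]

print(token_strings('ab\n'))    # → ['ab']           one NAME token
print(token_strings('a b\n'))   # → ['a', 'b']       two NAME tokens
print(token_strings('+a\n'))    # → ['+', 'a']       '+a' is not a single token
print(token_strings('+ a\n'))   # → ['+', 'a']       same two tokens
print(token_strings('a+=1\n'))  # → ['a', '+=', '1'] longest match: '+=' wins over '+'
```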
