Commit 04673d2

pythongh-119786: cleanup internal docs and fix internal links (python#127485)
1 parent 1bc4f07 commit 04673d2

11 files changed: +152 -148 lines changed

InternalDocs/README.md

-1
@@ -1,4 +1,3 @@
1-
21
# CPython Internals Documentation
32

43
The documentation in this folder is intended for CPython maintainers.

InternalDocs/adaptive.md

+6-2
@@ -96,6 +96,7 @@ quality of specialization and keeping the overhead of specialization low.
9696
Specialized instructions must be fast. In order to be fast,
9797
specialized instructions should be tailored for a particular
9898
set of values that allows them to:
99+
99100
1. Verify that the incoming value is part of that set with low overhead.
100101
2. Perform the operation quickly.
101102

@@ -107,9 +108,11 @@ For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()`
107108
dictionaries that have keys with the expected version.
108109

109110
This can be tested quickly:
111+
110112
* `globals->keys->dk_version == expected_version`
111113

112114
and the operation can be performed quickly:
115+
113116
* `value = entries[cache->index].me_value;`.
114117

115118
Because it is impossible to measure the performance of an instruction without
@@ -122,10 +125,11 @@ base instruction.
122125
### Implementation of specialized instructions
123126

124127
In general, specialized instructions should be implemented in two parts:
128+
125129
1. A sequence of guards, each of the form
126-
`DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
130+
`DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
127131
2. The operation, which should ideally have no branches and
128-
a minimum number of dependent memory accesses.
132+
a minimum number of dependent memory accesses.
129133

130134
In practice, the parts may overlap, as data required for guards
131135
can be re-used in the operation.
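
The guard-then-operation structure described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the interpreter's C implementation: `Cache`, `Deopt`, and `load_global_module` are hypothetical names, and raising `Deopt` stands in for what `DEOPT_IF` does when a guard fails (falling back to the base instruction).

```python
from dataclasses import dataclass


@dataclass
class Cache:
    """Hypothetical inline cache: the keys version seen at specialization
    time and the index of the cached entry."""
    expected_version: int
    index: int


class Deopt(Exception):
    """Hypothetical stand-in for DEOPT_IF: fall back to the base instruction."""


def load_global_module(keys_version, entries, cache):
    # Part 1: the guard. It is a single cheap comparison.
    if keys_version != cache.expected_version:
        raise Deopt("LOAD_GLOBAL")
    # Part 2: the operation. No branches, one indexed load.
    return entries[cache.index]


entries = ["value_of_x", "value_of_y"]
cache = Cache(expected_version=7, index=1)
print(load_global_module(7, entries, cache))  # prints "value_of_y"
```

If the keys version no longer matches the cached one, the guard fails and the sketch raises `Deopt` rather than returning a stale value.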

InternalDocs/changing_grammar.md

+2-2
@@ -32,7 +32,7 @@ Below is a checklist of things that may need to change.
3232
[`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and
3333
[`Python/Python-ast.c`](../Python/Python-ast.c).
3434

35-
* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code.
35+
* [`Parser/lexer/`](../Parser/lexer) contains the tokenization code.
3636
This is where you would add a new type of comment or string literal, for example.
3737

3838
* [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects
@@ -60,4 +60,4 @@ Below is a checklist of things that may need to change.
6060
to the tokenizer.
6161

6262
* Documentation must be written! Specifically, one or more of the pages in
63-
[`Doc/reference/`](../Doc/reference/) will need to be updated.
63+
[`Doc/reference/`](../Doc/reference) will need to be updated.

InternalDocs/compiler.md

+55-57
@@ -1,4 +1,3 @@
1-
21
Compiler design
32
===============
43

@@ -7,8 +6,8 @@ Abstract
76

87
In CPython, the compilation from source code to bytecode involves several steps:
98

10-
1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
11-
and [Parser/tokenizer/](../Parser/tokenizer/).
9+
1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
10+
and [Parser/tokenizer/](../Parser/tokenizer).
1211
2. Parse the stream of tokens into an Abstract Syntax Tree
1312
[Parser/parser.c](../Parser/parser.c).
1413
3. Transform AST into an instruction sequence
@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization. The
134133
`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
135134
initializes the *name*, *args*, *body*, and *attributes* fields.
136135

137-
See also
138-
[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
139-
by Thomas Kluyver.
136+
See also [Green Tree Snakes - The missing Python AST docs](
137+
https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.
140138

141139
Memory management
142140
=================
@@ -260,33 +258,33 @@ manually -- `generic`, `identifier` and `int`. These types are found in
260258
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
261259
Functions and macros for creating `asdl_xx_seq *` types are as follows:
262260

263-
`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
264-
Allocate memory for an `asdl_generic_seq` of the specified length
265-
`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
266-
Allocate memory for an `asdl_identifier_seq` of the specified length
267-
`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
268-
Allocate memory for an `asdl_int_seq` of the specified length
261+
* `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
262+
Allocate memory for an `asdl_generic_seq` of the specified length
263+
* `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
264+
Allocate memory for an `asdl_identifier_seq` of the specified length
265+
* `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
266+
Allocate memory for an `asdl_int_seq` of the specified length
269267

270268
In addition to the three types mentioned above, some ASDL sequence types are
271269
automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
272270
[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
273271
Macros for using both manually defined and automatically generated ASDL
274272
sequence types are as follows:
275273

276-
`asdl_seq_GET(asdl_xx_seq *, int)`
277-
Get item held at a specific position in an `asdl_xx_seq`
278-
`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
279-
Set a specific index in an `asdl_xx_seq` to the specified value
274+
* `asdl_seq_GET(asdl_xx_seq *, int)`:
275+
Get item held at a specific position in an `asdl_xx_seq`
276+
* `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
277+
Set a specific index in an `asdl_xx_seq` to the specified value
280278

281-
Untyped counterparts exist for some of the typed macros. These are useful
279+
Untyped counterparts exist for some of the typed macros. These are useful
282280
when a function needs to manipulate a generic ASDL sequence:
283281

284-
`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
285-
Get item held at a specific position in an `asdl_seq`
286-
`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
287-
Set a specific index in an `asdl_seq` to the specified value
288-
`asdl_seq_LEN(asdl_seq *)`
289-
Return the length of an `asdl_seq` or `asdl_xx_seq`
282+
* `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
283+
Get item held at a specific position in an `asdl_seq`
284+
* `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
285+
Set a specific index in an `asdl_seq` to the specified value
286+
* `asdl_seq_LEN(asdl_seq *)`:
287+
Return the length of an `asdl_seq` or `asdl_xx_seq`
290288

291289
Note that typed macros and functions are recommended over their untyped
292290
counterparts. Typed macros carry out checks in debug mode and aid
@@ -379,33 +377,33 @@ arguments to a node that used the '*' modifier).
379377

380378
Emission of bytecode is handled by the following macros:
381379

382-
* `ADDOP(struct compiler *, location, int)`
383-
add a specified opcode
384-
* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
385-
like `ADDOP`, but also exits current scope; used for adding return value
386-
opcodes in lambdas and closures
387-
* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
388-
add an opcode that takes an integer argument
389-
* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
390-
add an opcode with the proper argument based on the position of the
391-
specified PyObject in PyObject sequence object, but with no handling of
392-
mangled names; used for when you
393-
need to do named lookups of objects such as globals, consts, or
394-
parameters where name mangling is not possible and the scope of the
395-
name is known; *TYPE* is the name of PyObject sequence
396-
(`names` or `varnames`)
397-
* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
398-
just like `ADDOP_O`, but steals a reference to PyObject
399-
* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
400-
just like `ADDOP_O`, but name mangling is also handled; used for
401-
attribute loading or importing based on name
402-
* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
403-
add the `LOAD_CONST` opcode with the proper argument based on the
404-
position of the specified PyObject in the consts table.
405-
* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
406-
just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
407-
* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
408-
create a jump to a basic block
380+
* `ADDOP(struct compiler *, location, int)`:
381+
add a specified opcode
382+
* `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
383+
like `ADDOP`, but also exits current scope; used for adding return value
384+
opcodes in lambdas and closures
385+
* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
386+
add an opcode that takes an integer argument
387+
* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
388+
add an opcode with the proper argument based on the position of the
389+
specified PyObject in PyObject sequence object, but with no handling of
390+
mangled names; used for when you
391+
need to do named lookups of objects such as globals, consts, or
392+
parameters where name mangling is not possible and the scope of the
393+
name is known; *TYPE* is the name of PyObject sequence
394+
(`names` or `varnames`)
395+
* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
396+
just like `ADDOP_O`, but steals a reference to PyObject
397+
* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
398+
just like `ADDOP_O`, but name mangling is also handled; used for
399+
attribute loading or importing based on name
400+
* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
401+
add the `LOAD_CONST` opcode with the proper argument based on the
402+
position of the specified PyObject in the consts table.
403+
* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
404+
just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
405+
* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
406+
create a jump to a basic block
409407

410408
The `location` argument is a struct with the source location to be
411409
associated with this instruction. It is typically extracted from an
@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual
433431
bytecode. This includes transforming pseudo instructions into actual instructions,
434432
converting jump targets from logical labels to relative offsets, and
435433
construction of the [exception table](exception_handling.md) and
436-
[locations table](locations.md).
434+
[locations table](code_objects.md#source-code-locations).
437435
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
438436
metadata, including the `consts` and `names` arrays, information about function
439437
reference to the source code (filename, etc). All of this is implemented by
@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).
453451
Important files
454452
===============
455453

456-
* [Parser/](../Parser/)
454+
* [Parser/](../Parser)
457455

458456
* [Parser/Python.asdl](../Parser/Python.asdl):
459457
ASDL syntax file.
@@ -534,7 +532,7 @@ Important files
534532
* [Python/instruction_sequence.c](../Python/instruction_sequence.c):
535533
A data structure representing a sequence of bytecode-like pseudo-instructions.
536534

537-
* [Include/](../Include/)
535+
* [Include/](../Include)
538536

539537
* [Include/cpython/code.h](../Include/cpython/code.h)
540538
: Header file for [Objects/codeobject.c](../Objects/codeobject.c);
@@ -556,7 +554,7 @@ Important files
556554
: Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).
557555

558556
* [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
559-
: Header for [Python/symtable.c](../Python/symtable.c).
557+
: Header for [Python/symtable.c](../Python/symtable.c).
560558
`struct symtable` and `PySTEntryObject` are defined here.
561559

562560
* [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
@@ -570,7 +568,7 @@ Important files
570568
by
571569
[Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).
572570

573-
* [Objects/](../Objects/)
571+
* [Objects/](../Objects)
574572

575573
* [Objects/codeobject.c](../Objects/codeobject.c)
576574
: Contains PyCodeObject-related code.
@@ -579,7 +577,7 @@ Important files
579577
: Contains the `frame_setlineno()` function which should determine whether it is allowed
580578
to make a jump between two points in a bytecode.
581579

582-
* [Lib/](../Lib/)
580+
* [Lib/](../Lib)
583581

584582
* [Lib/opcode.py](../Lib/opcode.py)
585583
: opcode utilities exposed to Python.
@@ -591,7 +589,7 @@ Important files
591589
Objects
592590
=======
593591

594-
* [Locations](locations.md): Describes the location table
592+
* [Locations](code_objects.md#source-code-locations): Describes the location table
595593
* [Frames](frames.md): Describes frames and the frame stack
596594
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
597595
* [Exception Handling](exception_handling.md): Describes the exception table

InternalDocs/exception_handling.md

+34-31
@@ -87,10 +87,10 @@ offset of the raising instruction should be pushed to the stack.
8787
Handling an exception, once an exception table entry is found, consists
8888
of the following steps:
8989

90-
1. pop values from the stack until it matches the stack depth for the handler.
91-
2. if `lasti` is true, then push the offset that the exception was raised at.
92-
3. push the exception to the stack.
93-
4. jump to the target offset and resume execution.
90+
1. pop values from the stack until it matches the stack depth for the handler.
91+
2. if `lasti` is true, then push the offset that the exception was raised at.
92+
3. push the exception to the stack.
93+
4. jump to the target offset and resume execution.
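
The four steps above can be sketched against a list-based model of the value stack. This is a simplified illustration under stated assumptions, not the code in the interpreter: `handle_exception` and its parameter names are hypothetical, and the "jump" is modeled by returning the offset at which execution would resume.

```python
def handle_exception(stack, exc, raised_at, target, stack_depth, push_lasti):
    """Apply the four handling steps to a list that models the value stack."""
    # 1. Pop values until the stack matches the handler's stack depth.
    del stack[stack_depth:]
    # 2. If push_lasti is set, push the offset of the raising instruction.
    if push_lasti:
        stack.append(raised_at)
    # 3. Push the exception.
    stack.append(exc)
    # 4. "Jump": return the offset where execution resumes.
    return target


stack = ["a", "b", "c", "d"]
err = ValueError("boom")
resume_at = handle_exception(stack, err, raised_at=24, target=100,
                             stack_depth=2, push_lasti=True)
# stack is now ["a", "b", 24, err] and resume_at is 100
```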
9494

9595

9696
Reraising Exceptions and `lasti`
@@ -107,13 +107,12 @@ Format of the exception table
107107
-----------------------------
108108

109109
Conceptually, the exception table consists of a sequence of 5-tuples:
110-
```
111-
1. `start-offset` (inclusive)
112-
2. `end-offset` (exclusive)
113-
3. `target`
114-
4. `stack-depth`
115-
5. `push-lasti` (boolean)
116-
```
110+
111+
1. `start-offset` (inclusive)
112+
2. `end-offset` (exclusive)
113+
3. `target`
114+
4. `stack-depth`
115+
5. `push-lasti` (boolean)
117116

118117
All offsets and lengths are in code units, not bytes.
119118

@@ -123,18 +122,19 @@ For it to be searchable quickly, we need to support binary search giving us log(
123122
Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry.
124123

125124
It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as:
126-
`start, size, target, depth, push-lasti`.
125+
`start, size, target, depth, push-lasti`.
127126

128127
Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes.
129128
It also happens that depth is generally quite small.
130129

131130
So, we need to encode:
131+
132132
```
133-
`start` (up to 30 bits)
134-
`size` (up to 30 bits)
135-
`target` (up to 30 bits)
136-
`depth` (up to ~8 bits)
137-
`lasti` (1 bit)
133+
start (up to 30 bits)
134+
size (up to 30 bits)
135+
target (up to 30 bits)
136+
depth (up to ~8 bits)
137+
lasti (1 bit)
138138
```
139139

140140
We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set.
@@ -145,29 +145,32 @@ The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the
145145
In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.
146146

147147
For example, the exception entry:
148+
148149
```
149-
`start`: 20
150-
`end`: 28
151-
`target`: 100
152-
`depth`: 3
153-
`lasti`: False
150+
start: 20
151+
end: 28
152+
target: 100
153+
depth: 3
154+
lasti: False
154155
```
155156

156157
is encoded by first converting to the more compact four value form:
158+
157159
```
158-
`start`: 20
159-
`size`: 8
160-
`target`: 100
161-
`depth<<1+lasti`: 6
160+
start: 20
161+
size: 8
162+
target: 100
163+
depth<<1+lasti: 6
162164
```
163165

164166
which is then encoded as:
167+
165168
```
166-
148 (MSB + 20 for start)
167-
8 (size)
168-
65 (Extend bit + 1)
169-
36 (Remainder of target, 100 == (1<<6)+36)
170-
6
169+
148 (MSB + 20 for start)
170+
8 (size)
171+
65 (Extend bit + 1)
172+
36 (Remainder of target, 100 == (1<<6)+36)
173+
6
171174
```
172175

173176
for a total of five bytes.
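
The worked example can be reproduced with a short Python sketch of the encoding as described here: six data bits per byte, bit 6 as the extend bit, and bit 7 as the start-of-entry marker on the first byte. The helper names `encode_varint` and `encode_entry` are hypothetical and are not taken from the CPython sources.

```python
def encode_varint(value, start=False):
    """Encode value as 6-bit groups, most significant group first."""
    groups = [value & 0x3F]
    value >>= 6
    while value:
        groups.append(value & 0x3F)
        value >>= 6
    groups.reverse()
    # Every byte except the last carries the extend bit (0x40).
    out = [g | 0x40 for g in groups[:-1]] + [groups[-1]]
    if start:
        out[0] |= 0x80  # start bit marks the first byte of an entry
    return out


def encode_entry(start, end, target, depth, lasti):
    """Encode one entry in the compact form: start, size, target, depth/lasti."""
    size = end - start
    depth_lasti = (depth << 1) + int(lasti)
    return bytes(
        encode_varint(start, start=True)
        + encode_varint(size)
        + encode_varint(target)
        + encode_varint(depth_lasti)
    )


print(list(encode_entry(20, 28, 100, 3, False)))  # prints [148, 8, 65, 36, 6]
```

Because only the first byte of an entry has its most significant bit set, a reader that lands in the middle of the table can scan to the nearest entry start, which is what makes binary search over variable-length entries possible.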

InternalDocs/frames.md

+1
@@ -27,6 +27,7 @@ objects, so are not allocated in the per-thread stack. See `PyGenObject` in
2727
## Layout
2828

2929
Each activation record is laid out as:
30+
3031
* Specials
3132
* Locals
3233
* Stack

InternalDocs/garbage_collector.md

+1-2
@@ -1,4 +1,3 @@
1-
21
Garbage collector design
32
========================
43

@@ -117,7 +116,7 @@ general, the collection of all objects tracked by GC is partitioned into disjoin
117116
doubly linked list. Between collections, objects are partitioned into "generations", reflecting how
118117
often they've survived collection attempts. During collections, the generation(s) being collected
119118
are further partitioned into, for example, sets of reachable and unreachable objects. Doubly linked lists
120-
support moving an object from one partition to another, adding a new object, removing an object
119+
support moving an object from one partition to another, adding a new object, removing an object
121120
entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
122121
isn't running at all!), and merging partitions, all with a small constant number of pointer updates.
123122
With care, they also support iterating over a partition while objects are being added to - and

InternalDocs/generators.md

-1
@@ -1,4 +1,3 @@
1-
21
Generators
32
==========
43

InternalDocs/interpreter.md

-1
@@ -1,4 +1,3 @@
1-
21
The bytecode interpreter
32
========================
43
