-
Compiler design
===============

In CPython, the compilation from source code to bytecode involves several steps:

- 1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
-    and [Parser/tokenizer/](../Parser/tokenizer/).
+ 1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
+    and [Parser/tokenizer/](../Parser/tokenizer).
2. Parse the stream of tokens into an Abstract Syntax Tree
   [Parser/parser.c](../Parser/parser.c).
3. Transform AST into an instruction sequence
@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization. The
`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
initializes the *name*, *args*, *body*, and *attributes* fields.

- See also
- [Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
- by Thomas Kluyver.
+ See also [Green Tree Snakes - The missing Python AST docs](
+ https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.

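The fields that `FunctionDef()` initializes can also be seen from Python, since the standard `ast` module exposes node classes generated from the same ASDL description. The following is only a minimal sketch for orientation, not part of the C implementation; the sample function `add` and the variable names are made up for illustration.

```python
import ast

# A tiny, made-up function definition to parse.
source = "def add(a, b):\n    return a + b\n"

tree = ast.parse(source)
func = tree.body[0]  # the first statement of the module

# The node's class plays the role that 'kind' plays in the C struct.
node_kind = type(func).__name__

# Fields: name, args and body.
arg_names = [a.arg for a in func.args.args]
body_kinds = [type(stmt).__name__ for stmt in func.body]

# Attributes: source location information attached to the node.
location = (func.lineno, func.col_offset)

print(node_kind, func.name, arg_names, body_kinds, location)
```

Calling `ast.dump(tree, indent=2)` on the same tree prints the whole structure, which can help when comparing against the node definitions in `Parser/Python.asdl`.
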

Memory management
=================
@@ -260,33 +258,33 @@ manually -- `generic`, `identifier` and `int`. These types are found in
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
Functions and macros for creating `asdl_xx_seq *` types are as follows:

- `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
-     Allocate memory for an `asdl_generic_seq` of the specified length
- `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
-     Allocate memory for an `asdl_identifier_seq` of the specified length
- `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
-     Allocate memory for an `asdl_int_seq` of the specified length
+ * `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
+   Allocate memory for an `asdl_generic_seq` of the specified length
+ * `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
+   Allocate memory for an `asdl_identifier_seq` of the specified length
+ * `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
+   Allocate memory for an `asdl_int_seq` of the specified length

In addition to the three types mentioned above, some ASDL sequence types are
automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
Macros for using both manually defined and automatically generated ASDL
sequence types are as follows:

- `asdl_seq_GET(asdl_xx_seq *, int)`
-     Get item held at a specific position in an `asdl_xx_seq`
- `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
-     Set a specific index in an `asdl_xx_seq` to the specified value
+ * `asdl_seq_GET(asdl_xx_seq *, int)`:
+   Get item held at a specific position in an `asdl_xx_seq`
+ * `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
+   Set a specific index in an `asdl_xx_seq` to the specified value

- Untyped counterparts exist for some of the typed macros. These are useful
+ Untyped counterparts exist for some of the typed macros. These are useful
when a function needs to manipulate a generic ASDL sequence:

- `asdl_seq_GET_UNTYPED(asdl_seq *, int)`
-     Get item held at a specific position in an `asdl_seq`
- `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
-     Set a specific index in an `asdl_seq` to the specified value
- `asdl_seq_LEN(asdl_seq *)`
-     Return the length of an `asdl_seq` or `asdl_xx_seq`
+ * `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
+   Get item held at a specific position in an `asdl_seq`
+ * `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
+   Set a specific index in an `asdl_seq` to the specified value
+ * `asdl_seq_LEN(asdl_seq *)`:
+   Return the length of an `asdl_seq` or `asdl_xx_seq`

Note that typed macros and functions are recommended over their untyped
counterparts. Typed macros carry out checks in debug mode and aid
@@ -379,33 +377,33 @@ arguments to a node that used the '*' modifier).

Emission of bytecode is handled by the following macros:

- * `ADDOP(struct compiler *, location, int)`
-     add a specified opcode
- * `ADDOP_IN_SCOPE(struct compiler *, location, int)`
-     like `ADDOP`, but also exits current scope; used for adding return value
-     opcodes in lambdas and closures
- * `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
-     add an opcode that takes an integer argument
- * `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
-     add an opcode with the proper argument based on the position of the
-     specified PyObject in PyObject sequence object, but with no handling of
-     mangled names; used for when you
-     need to do named lookups of objects such as globals, consts, or
-     parameters where name mangling is not possible and the scope of the
-     name is known; *TYPE* is the name of PyObject sequence
-     (`names` or `varnames`)
- * `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
-     just like `ADDOP_O`, but steals a reference to PyObject
- * `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
-     just like `ADDOP_O`, but name mangling is also handled; used for
-     attribute loading or importing based on name
- * `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
-     add the `LOAD_CONST` opcode with the proper argument based on the
-     position of the specified PyObject in the consts table.
- * `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
-     just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
- * `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
-     create a jump to a basic block
+ * `ADDOP(struct compiler *, location, int)`:
+   add a specified opcode
+ * `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
+   like `ADDOP`, but also exits current scope; used for adding return value
+   opcodes in lambdas and closures
+ * `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
+   add an opcode that takes an integer argument
+ * `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
+   add an opcode with the proper argument based on the position of the
+   specified PyObject in PyObject sequence object, but with no handling of
+   mangled names; used for when you
+   need to do named lookups of objects such as globals, consts, or
+   parameters where name mangling is not possible and the scope of the
+   name is known; *TYPE* is the name of PyObject sequence
+   (`names` or `varnames`)
+ * `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
+   just like `ADDOP_O`, but steals a reference to PyObject
+ * `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
+   just like `ADDOP_O`, but name mangling is also handled; used for
+   attribute loading or importing based on name
+ * `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
+   add the `LOAD_CONST` opcode with the proper argument based on the
+   position of the specified PyObject in the consts table.
+ * `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
+   just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
+ * `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
+   create a jump to a basic block

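
The idea that an opcode's argument is a position in a table (the consts table for `LOAD_CONST`, a name sequence for `ADDOP_O`-style lookups) can be observed from Python with the standard `dis` module. This is only a sketch of the resulting bytecode, not of how the macros are invoked in C; the source string and variable names are made up, and the exact instruction stream varies between CPython versions.

```python
import dis

# A made-up one-line module: one string constant stored into one name.
source = 'greeting = "hello"\n'
code = compile(source, "<example>", "exec")

const_loads = []  # (argument, constant found at that index in co_consts)
name_stores = []  # (argument, name found at that index in co_names)

for instr in dis.get_instructions(code):
    if instr.opname == "LOAD_CONST":
        const_loads.append((instr.arg, code.co_consts[instr.arg]))
    elif instr.opname == "STORE_NAME":
        name_stores.append((instr.arg, code.co_names[instr.arg]))

print(const_loads)
print(name_stores)
```

Running `dis.dis(code)` on the same object shows the full listing, including the resolved constant and name next to each argument.
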

The `location` argument is a struct with the source location to be
associated with this instruction. It is typically extracted from an
@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual
bytecode. This includes transforming pseudo instructions into actual instructions,
converting jump targets from logical labels to relative offsets, and
construction of the [exception table](exception_handling.md) and
- [locations table](locations.md).
+ [locations table](code_objects.md#source-code-locations).
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
metadata, including the `consts` and `names` arrays, information about function
reference to the source code (filename, etc). All of this is implemented by
@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).
Important files
===============

- * [Parser/](../Parser/)
+ * [Parser/](../Parser)

  * [Parser/Python.asdl](../Parser/Python.asdl):
    ASDL syntax file.
@@ -534,7 +532,7 @@ Important files
  * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
    A data structure representing a sequence of bytecode-like pseudo-instructions.

- * [Include/](../Include/)
+ * [Include/](../Include)

  * [Include/cpython/code.h](../Include/cpython/code.h)
    : Header file for [Objects/codeobject.c](../Objects/codeobject.c);
@@ -556,7 +554,7 @@ Important files
    : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).

  * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
-     : Header for [Python/symtable.c](../Python/symtable.c).
+     : Header for [Python/symtable.c](../Python/symtable.c).
      `struct symtable` and `PySTEntryObject` are defined here.

  * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
@@ -570,7 +568,7 @@ Important files
      by
      [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).

- * [Objects/](../Objects/)
+ * [Objects/](../Objects)

  * [Objects/codeobject.c](../Objects/codeobject.c)
    : Contains PyCodeObject-related code.
@@ -579,7 +577,7 @@ Important files
    : Contains the `frame_setlineno()` function which should determine whether it is allowed
      to make a jump between two points in a bytecode.

- * [Lib/](../Lib/)
+ * [Lib/](../Lib)

  * [Lib/opcode.py](../Lib/opcode.py)
    : opcode utilities exposed to Python.
@@ -591,7 +589,7 @@ Important files
Objects
=======

- * [Locations](locations.md): Describes the location table
+ * [Locations](code_objects.md#source-code-locations): Describes the location table
* [Frames](frames.md): Describes frames and the frame stack
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table