cpython/Tools/cases_generator
mpage 09c240f20c
gh-115999: Specialize LOAD_GLOBAL in free-threaded builds (#126607)
Enable specialization of LOAD_GLOBAL in free-threaded builds.

Thread-safety of specialization in free-threaded builds is provided by the following:

  • A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
  • Generation of new keys versions is made atomic in free-threaded builds.
  • Existing helpers are used to atomically modify the opcode.

Thread-safety of specialized instructions in free-threaded builds is provided by the following:

  • Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races, because version guards read the dict keys versions without holding the dictionary's per-object lock.
  • Dict keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
  • The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
2024-11-21 11:22:21 -08:00
_typing_backports.py gh-104504: cases generator: Add --warn-unreachable to the mypy config (#108112) 2023-08-21 00:40:41 +01:00
analyzer.py gh-115999: Specialize LOAD_GLOBAL in free-threaded builds (#126607) 2024-11-21 11:22:21 -08:00
cwriter.py GH-119866: Spill the stack around escaping calls. (GH-124392) 2024-10-07 14:56:39 +01:00
generators_common.py gh-118423: Add INSTRUCTION_SIZE macro to code generator (GH-125467) 2024-10-29 17:25:05 +00:00
interpreter_definition.md gh-118423: Add INSTRUCTION_SIZE macro to code generator (GH-125467) 2024-10-29 17:25:05 +00:00
lexer.py GH-119866: Spill the stack around escaping calls. (GH-124392) 2024-10-07 14:56:39 +01:00
mypy.ini GH-111485: Separate out parsing, analysis and code-gen phases of tier 1 code generator (GH-112299) 2023-12-07 12:49:40 +00:00
opcode_id_generator.py GH-122390: Replace _Py_GetbaseOpcode with _Py_GetBaseCodeUnit (GH-122942) 2024-08-13 14:22:57 +01:00
opcode_metadata_generator.py gh-124285: Fix bug where bool() is called multiple times for the same part of a boolean expression (#124394) 2024-09-25 15:51:25 +01:00
optimizer_generator.py gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846) 2024-11-09 11:35:33 +08:00
parser.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
parsing.py gh-124285: Fix bug where bool() is called multiple times for the same part of a boolean expression (#124394) 2024-09-25 15:51:25 +01:00
plexer.py gh-106812: Refactor cases_generator to allow uops with array stack effects (#107564) 2023-08-04 09:35:56 -07:00
py_metadata_generator.py GH-120024: Tidy up case generator code a bit. (GH-122780) 2024-08-08 10:57:59 +01:00
README.md Rename tier 2 redundancy eliminator to optimizer (#115888) 2024-02-26 08:42:53 -08:00
stack.py GH-119866: Spill the stack around escaping calls. (GH-124392) 2024-10-07 14:56:39 +01:00
target_generator.py GH-120024: Tidy up case generator code a bit. (GH-122780) 2024-08-08 10:57:59 +01:00
tier1_generator.py gh-118423: Add INSTRUCTION_SIZE macro to code generator (GH-125467) 2024-10-29 17:25:05 +00:00
tier2_generator.py gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846) 2024-11-09 11:35:33 +08:00
uop_id_generator.py gh-120417: Remove unused imports in cases_generator (#120622) 2024-06-17 21:58:56 +02:00
uop_metadata_generator.py GH-126222: Fix _PyUop_num_popped (GH-126507) 2024-11-07 10:48:27 +00:00

Tooling to generate interpreters

Documentation for the instruction definitions in Python/bytecodes.c ("the DSL") is in interpreter_definition.md.

What's currently here:

  • analyzer.py: code that converts the AST generated by Parser into a higher-level structure that is easier to work with
  • lexer.py: lexer for C, originally written by Mark Shannon
  • plexer.py: OO interface on top of lexer.py; main class: PLexer
  • parsing.py: Parser for instruction definition DSL; main class: Parser
  • parser.py: helper for interactions with parsing.py
  • tierN_generator.py: a couple of driver scripts to read Python/bytecodes.c and write Python/generated_cases.c.h (and several other files)
  • optimizer_generator.py: reads Python/bytecodes.c and Python/optimizer_bytecodes.c and writes Python/optimizer_cases.c.h
  • stack.py: code to handle generalized stack effects
  • cwriter.py: code which understands tokens and how to format C code; main class: CWriter
  • generators_common.py: helpers for generators
  • opcode_id_generator.py: generates a list of opcodes and writes them to Include/opcode_ids.h
  • opcode_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_opcode_metadata.h
  • py_metadata_generator.py: reads the instruction definitions and writes the metadata to Lib/_opcode_metadata.py
  • target_generator.py: generates targets for computed goto dispatch and writes them to Python/opcode_targets.h
  • uop_id_generator.py: generates a list of uop IDs and writes them to Include/internal/pycore_uop_ids.h
  • uop_metadata_generator.py: reads the instruction definitions and writes the metadata to Include/internal/pycore_uop_metadata.h
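
To give a feel for what an "ID generator" in the list above does, here is a toy sketch that turns a list of names into a C header of #define lines. It is not the code in opcode_id_generator.py: the function name render_opcode_ids, the include guard, the numbering scheme, and the output layout are all assumptions made for illustration, and the real generators derive their input from Python/bytecodes.c rather than from a hard-coded list.

```python
def render_opcode_ids(names: list[str], guard: str = "TOY_OPCODE_IDS_H") -> str:
    """Render a toy C header that assigns consecutive integers to names.

    Hypothetical helper: the real header layout and numbering rules
    are not reproduced here.
    """
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    # Pad names to a common width so the numbers line up in a column.
    width = max((len(name) for name in names), default=0)
    for number, name in enumerate(names):
        lines.append(f"#define {name:<{width}} {number}")
    lines += ["", f"#endif /* {guard} */", ""]
    return "\n".join(lines)


# Illustrative input only; these names and numbers are not taken from CPython.
header_text = render_opcode_ids(["TOY_NOP", "TOY_LOAD", "TOY_STORE"])
```

The point of the sketch is the shape of the pipeline (structured input in, generated C text out); each real script adds analysis of the instruction definitions before the writing step.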

Note that there is some dummy C code at the top and bottom of Python/bytecodes.c to fool text editors like VS Code into believing this is valid C code.

A bit about the parser

The parser class uses a pretty standard recursive descent scheme, but with unlimited backtracking. The PLexer class tokenizes the entire input before parsing starts. We do not run the C preprocessor. Each parsing method returns either an AST node (a Node instance) or None, or raises SyntaxError (showing the error in the C source).

Most parsing methods are decorated with @contextual, which automatically resets the tokenizer input position when None is returned. Parsing methods may also raise SyntaxError, which is irrecoverable. When a parsing method returns None, it is possible that after backtracking a different parsing method returns a valid AST.
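
The backtracking scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in parsing.py or plexer.py: ToyParser, its toy grammar (a call such as f(7) or an assignment such as x = 42), and this particular contextual implementation are assumptions made for the example. It shows the three outcomes a parsing method can have: return a node, return None (position is reset so another method can try), or raise SyntaxError.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from functools import wraps


def tokenize(src: str) -> list[str]:
    """Tokenize the entire input up front (numbers, identifiers, punctuation)."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", src)


def contextual(func):
    """Reset the token position whenever the wrapped method returns None."""
    @wraps(func)
    def wrapper(self):
        start = self.pos
        node = func(self)
        if node is None:
            self.pos = start  # backtrack so another method can retry from here
        return node
    return wrapper


@dataclass
class Node:
    kind: str
    text: str


class ToyParser:
    """Hypothetical parser for two statement forms: NAME(NUMBER) and NAME = NUMBER."""

    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.pos = 0

    def next(self) -> str | None:
        """Consume and return the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    @contextual
    def call(self) -> Node | None:
        name = self.next()
        if name is None or not name.isidentifier():
            return None
        if self.next() != "(":
            return None
        arg = self.next()
        if arg is None or not arg.isdigit():
            return None
        if self.next() != ")":
            return None
        return Node("call", f"{name}({arg})")

    @contextual
    def assignment(self) -> Node | None:
        name = self.next()
        if name is None or not name.isidentifier():
            return None
        if self.next() != "=":
            return None
        value = self.next()
        if value is None or not value.isdigit():
            # Past "NAME =" no other rule can match, so this is not recoverable.
            raise SyntaxError(f"expected a number after '{name} ='")
        return Node("assign", f"{name}={value}")

    def statement(self) -> Node | None:
        # Try each alternative in turn; a None result has already rewound the input.
        return self.call() or self.assignment()
```

For the input x = 42, call() consumes x and = before giving up; the decorator rewinds to the first token, and assignment() then succeeds from the same starting point. For x = with nothing after it, assignment() raises SyntaxError instead of returning None, so no further alternatives are tried.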

Neither the lexer nor the parsers are complete or fully correct. Most known issues are tersely indicated by # TODO: comments. We plan to fix issues as they become relevant.