This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`.
Overall, this improves performance about 2% in the free threading
build.
This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` because
the macro requires that the tag bits of the arguments match, which is
only true in certain special cases.
Enable specialization of LOAD_GLOBAL in free-threaded builds.
Thread-safety of specialization in free-threaded builds is provided by the following:
A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
Generation of new keys versions is made atomic in free-threaded builds.
Existing helpers are used to atomically modify the opcode.
Thread-safety of specialized instructions in free-threaded builds is provided by the following:
Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
* Mark almost all reachable objects before doing collection phase
* Add stats for objects marked
* Visit new frames before each increment
* Remove lazy dict tracking
* Update docs
* Clearer calculation of work to do.
This unifies the code for nodejs and the code for the browser. After this
commit, the browser example doesn't work; this will be fixed in a
subsequent update.
Fixes a bug where pygettext would attempt
to extract a message from a code like this:
def _(x): pass
This is because pygettext only looks at one
token at a time and '_(x)' looks like a
function call.
However, since 'x' is not a string literal,
it would erroneously issue a warning.
Add emscripten.py script to automate emscripten build.
This is modeled heavily on `Tools/wasm/wasi.py`. This will form the basis of an Emscripten build bot.
Eventually wasm32-wasi will represent WASI 1.0, and so it's currently deprecated so it can be used for that eventual purpose. wasm32-wasip1 is also more specific to what version of WASI is currently supported.
---------
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Move creation of a tuple for var-positional parameter out of
_PyArg_UnpackKeywordsWithVararg().
Merge _PyArg_UnpackKeywordsWithVararg() with _PyArg_UnpackKeywords().
Add a new parameter in _PyArg_UnpackKeywords().
The "parameters" and "converters" attributes of ParseArgsCodeGen no
longer contain the var-positional parameter. It is now available as the
"varpos" attribute. Optimize code generation for var-positional
parameter and reuse the same generating code for functions with and without
keyword parameters.
Add special converters for var-positional parameter. "tuple" represents it as
a Python tuple and "array" represents it as a continuous array of PyObject*.
"object" is a temporary alias of "tuple".
The primary objective here is to allow some later changes to be cleaner. Mostly this involves renaming things and moving a few things around.
* CrossInterpreterData -> XIData
* crossinterpdatafunc -> xidatafunc
* split out pycore_crossinterp_data_registry.h
* add _PyXIData_lookup_t
Fix the gdb pretty printer in the face of --enable-shared by delaying the attempt to load the _PyInterpreterFrame definition until after .so files are loaded.
Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.
Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
* Remove references to `Modules/_blake2`.
* Remove `Modules/_blake2` entry from CODEOWNERS
The folder does not exist anymore.
* Remove `Modules/_blake2` entry from `Tools/c-analyzer/TODO`
The cases generator inserts code to save and restore the stack pointer around
statements that contain escaping calls. To find the beginning of such statements,
we would walk backwards from the escaping call until we encountered a token that
was treated as a statement terminator. This set of terminators should include
preprocessor directives.
Avoid temporary tuple creation when all arguments either positional-only
or vararg.
Objects/setobject.c and Modules/gcmodule.c adapted. This fixes slight
performance regression for set methods, introduced by gh-115112.
These consist of a number of short snippets that help identify scaling
bottlenecks in the free threaded interpreter.
The current bottlenecks are in calling functions in benchmarks that call
functions (due to `LOAD_ATTR` not yet using deferred reference counting)
and when accessing thread-local data.
Users want to know when the current context switches to a different
context object. Right now this happens when and only when a context
is entered or exited, so the enter and exit events are synonymous with
"switched". However, if the changes proposed for gh-99633 are
implemented, the current context will also switch for reasons other
than context enter or exit. Since users actually care about context
switches and not enter or exit, replace the enter and exit events with
a single switched event.
The former exit event was emitted just before exiting the context.
The new switched event is emitted after the context is exited to match
the semantics users expect of an event with a past-tense name. If
users need the ability to clean up before the switch takes effect,
another event type can be added in the future. It is not added here
because YAGNI.
I skipped 0 in the enum as a matter of practice. Skipping 0 makes it
easier to troubleshoot when code forgets to set zeroed memory, and it
aligns with best practices for other tools (e.g.,
https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum).
Co-authored-by: Richard Hansen <rhansen@rhansen.org>
Co-authored-by: Victor Stinner <vstinner@python.org>