This finishes the work begun in gh-107760. When, while projecting a superblock, we encounter a call to a short, simple function, the superblock will now enter the function using `_PUSH_FRAME`, continue through it, and leave it using `_POP_FRAME`, and then continue through the original code. Multiple frame pushes and pops are even possible. It is also possible to stop appending to the superblock in the middle of a called function, when running out of space or encountering an unsupported bytecode.
* Split `CALL_PY_EXACT_ARGS` into uops
This is only the first step for doing `CALL` in Tier 2.
The next step involves tracing into the called code object and back.
After that we'll have to do the remaining `CALL` specialization.
Finally we'll have to deal with `KW_NAMES`.
Note: this moves setting `frame->return_offset` directly in front of
`DISPATCH_INLINED()`, to make it easier to move it into `_PUSH_FRAME`.
Introducing a new file, stacking.py, that takes over several responsibilities related to symbolic evaluation of push/pop operations, with more generality.
The linked list of objects was a global variable, which broke isolation between interpreters, causing crashes. To solve this, we've moved the linked list to each interpreter.
There's no need to use a dummy uop to skip unused cache entries. The macro syntax lets you write `unused/1` instead.
Similarly, move `unused/5` from op `_LOAD_ATTR_INSTANCE_VALUE` to macro `LOAD_ATTR_INSTANCE_VALUE`.
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed?
We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter.
We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.
The _xxsubinterpreters module should not rely on internal API. Some of the functions it uses were recently moved there however. Here we move them back (and expose them properly).
We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict.
Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.
Fix warnings C4100 in Py_UNUSED() when Python is built with "cl /W4".
Example with this function included by Python.h:
static inline unsigned int
PyUnicode_IS_READY(PyObject* Py_UNUSED(op))
{ return 1; }
Without this change, building a C program with "cl /W4" which just
includes Python.h emits the warning:
Include\cpython/unicodeobject.h(199):
warning C4100: '_unused_op': unreferenced formal parameter
This change fix this warning.
No longer export these 5 internal C API variables:
* _PyBufferWrapper_Type
* _PyImport_FrozenBootstrap
* _PyImport_FrozenStdlib
* _PyImport_FrozenTest
* _Py_SwappedOp
Fix the definition of these internal functions, replace PyAPI_DATA()
with PyAPI_FUNC():
* _PyImport_ClearExtension()
* _PyObject_IsFreed()
* _PyThreadState_GetCurrent()
No longer export these 2 internal C API functions:
* _PyEval_SignalAsyncExc()
* _PyEval_SignalReceived()
Add also comments explaining why some internal functions have to be
exported, and update existing comments.