mirror of
https://github.com/python/cpython.git
synced 2024-11-23 01:45:25 +08:00
gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization (GH-105805)
Instead of surprise crashes and memory corruption, we now hang threads that attempt to re-enter the Python interpreter after Python runtime finalization has started. These are typically daemon threads (our long standing mis-feature) but could also be threads spawned by extension modules that then try to call into Python.

This marks the `PyThread_exit_thread` public C API as deprecated as there is no plausible safe way to accomplish that on any supported platform in the face of things like C++ code with finalizers anywhere on a thread's stack. Doing this was the least bad option.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Parent commit: 113b2d7583
Commit: 8cc5aa47ee
@ -430,7 +430,11 @@ Initializing and finalizing the interpreter
|
||||
Some memory allocated by extension modules may not be freed. Some extensions may not
|
||||
work properly if their initialization routine is called more than once; this can
|
||||
happen if an application calls :c:func:`Py_Initialize` and :c:func:`Py_FinalizeEx`
|
||||
more than once.
|
||||
more than once. :c:func:`Py_FinalizeEx` must not be called recursively from
|
||||
within itself. Therefore, it must not be called by any code that may be run
|
||||
as part of the interpreter shutdown process, such as :py:mod:`atexit`
|
||||
handlers, object finalizers, or any code that may be run while flushing the
|
||||
stdout and stderr files.
|
||||
|
||||
.. audit-event:: cpython._PySys_ClearAuditHooks "" c.Py_FinalizeEx
|
||||
|
||||
@ -960,6 +964,37 @@ thread, where the CPython global runtime was originally initialized.
|
||||
The only exception is if :c:func:`exec` will be called immediately
|
||||
after.
|
||||
|
||||
.. _cautions-regarding-runtime-finalization:

Cautions regarding runtime finalization
---------------------------------------

In the late stage of :term:`interpreter shutdown`, after attempting to wait for
non-daemon threads to exit (though this can be interrupted by
:class:`KeyboardInterrupt`) and running the :mod:`atexit` functions, the runtime
is marked as *finalizing*: :c:func:`Py_IsFinalizing` and
:func:`sys.is_finalizing` return true. At this point, only the *finalization
thread* that initiated finalization (typically the main thread) is allowed to
acquire the :term:`GIL`.

If any thread, other than the finalization thread, attempts to acquire the GIL
during finalization, either explicitly via :c:func:`PyGILState_Ensure`,
:c:macro:`Py_END_ALLOW_THREADS`, :c:func:`PyEval_AcquireThread`, or
:c:func:`PyEval_AcquireLock`, or implicitly when the interpreter attempts to
reacquire it after having yielded it, the thread enters **a permanently blocked
state** where it remains until the program exits. In most cases this is
harmless, but this can result in deadlock if a later stage of finalization
attempts to acquire a lock owned by the blocked thread, or otherwise waits on
the blocked thread.

Gross? Yes. This prevents random crashes and/or unexpectedly skipped C++
finalizations further up the call stack when such threads were forcibly exited
here in CPython 3.13 and earlier. The CPython runtime GIL acquiring C APIs
have never had any error reporting or handling expectations at GIL acquisition
time that would've allowed for graceful exit from this situation. Changing that
would require new stable C APIs and rewriting the majority of C code in the
CPython ecosystem to use those with error handling.
|
||||
|
||||
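The deadlock hazard described in this section can be illustrated outside of CPython. The following is a minimal sketch using plain pthreads, not CPython code: `finalizing`, `parked`, `resource`, `hang_forever`, `mock_acquire_gil`, and `finalizer_sees_lock_busy` are hypothetical names invented for the illustration. A worker takes an unrelated lock, then tries to enter a mock interpreter after finalization has begun and is parked forever, so the lock is never released.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical stand-ins for runtime state; none of these are CPython APIs. */
static atomic_int finalizing;   /* non-zero once "finalization" has started */
static atomic_int parked;       /* number of threads that have been hung */
static pthread_mutex_t resource = PTHREAD_MUTEX_INITIALIZER;

/* Park the calling thread forever instead of exiting it. */
static void hang_forever(void)
{
    atomic_fetch_add(&parked, 1);
    for (;;) {
        pause();                /* returns only after a signal handler ran */
    }
}

/* Mock GIL acquisition: refuse entry once finalization has begun. */
static void mock_acquire_gil(void)
{
    if (atomic_load(&finalizing)) {
        hang_forever();
    }
    /* A real runtime would take its interpreter lock here. */
}

/* Worker: holds an unrelated lock while trying to re-enter the runtime. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&resource);
    mock_acquire_gil();               /* never returns while finalizing */
    pthread_mutex_unlock(&resource);  /* not reached in this sketch */
    return NULL;
}

/* Returns what the finalizing thread sees when it probes the lock. */
static int finalizer_sees_lock_busy(void)
{
    pthread_t t;
    atomic_store(&finalizing, 1);
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    pthread_detach(t);
    while (atomic_load(&parked) == 0) {
        sched_yield();                /* wait until the worker is parked */
    }
    /* A blocking pthread_mutex_lock() here would wait forever. */
    return pthread_mutex_trylock(&resource);
}
```

Calling `finalizer_sees_lock_busy()` from the main thread should return `EBUSY`: the parked worker still owns `resource`, which is why finalization code that blocks on such a lock can deadlock. The parked thread goes away only when the process exits.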
|
||||
High-level API
|
||||
--------------
|
||||
@ -1033,11 +1068,14 @@ code, or when embedding the Python interpreter:
|
||||
ensues.
|
||||
|
||||
.. note::
|
||||
Calling this function from a thread when the runtime is finalizing
|
||||
will terminate the thread, even if the thread was not created by Python.
|
||||
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
|
||||
check if the interpreter is in process of being finalized before calling
|
||||
this function to avoid unwanted termination.
|
||||
Calling this function from a thread when the runtime is finalizing will
|
||||
hang the thread until the program exits, even if the thread was not
|
||||
created by Python. Refer to
|
||||
:ref:`cautions-regarding-runtime-finalization` for more details.
|
||||
|
||||
.. versionchanged:: next
|
||||
Hangs the current thread, rather than terminating it, if called while the
|
||||
interpreter is finalizing.
|
||||
|
||||
.. c:function:: PyThreadState* PyThreadState_Get()
|
||||
|
||||
@ -1092,11 +1130,14 @@ with sub-interpreters:
|
||||
to call arbitrary Python code. Failure is a fatal error.
|
||||
|
||||
.. note::
|
||||
Calling this function from a thread when the runtime is finalizing
|
||||
will terminate the thread, even if the thread was not created by Python.
|
||||
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
|
||||
check if the interpreter is in process of being finalized before calling
|
||||
this function to avoid unwanted termination.
|
||||
Calling this function from a thread when the runtime is finalizing will
|
||||
hang the thread until the program exits, even if the thread was not
|
||||
created by Python. Refer to
|
||||
:ref:`cautions-regarding-runtime-finalization` for more details.
|
||||
|
||||
.. versionchanged:: next
|
||||
Hangs the current thread, rather than terminating it, if called while the
|
||||
interpreter is finalizing.
|
||||
|
||||
.. c:function:: void PyGILState_Release(PyGILState_STATE)
|
||||
|
||||
@ -1374,17 +1415,20 @@ All of the following functions must be called after :c:func:`Py_Initialize`.
|
||||
If this thread already has the lock, deadlock ensues.
|
||||
|
||||
.. note::
|
||||
Calling this function from a thread when the runtime is finalizing
|
||||
will terminate the thread, even if the thread was not created by Python.
|
||||
You can use :c:func:`Py_IsFinalizing` or :func:`sys.is_finalizing` to
|
||||
check if the interpreter is in process of being finalized before calling
|
||||
this function to avoid unwanted termination.
|
||||
Calling this function from a thread when the runtime is finalizing will
|
||||
hang the thread until the program exits, even if the thread was not
|
||||
created by Python. Refer to
|
||||
:ref:`cautions-regarding-runtime-finalization` for more details.
|
||||
|
||||
.. versionchanged:: 3.8
|
||||
Updated to be consistent with :c:func:`PyEval_RestoreThread`,
|
||||
:c:func:`Py_END_ALLOW_THREADS`, and :c:func:`PyGILState_Ensure`,
|
||||
and terminate the current thread if called while the interpreter is finalizing.
|
||||
|
||||
.. versionchanged:: next
|
||||
Hangs the current thread, rather than terminating it, if called while the
|
||||
interpreter is finalizing.
|
||||
|
||||
:c:func:`PyEval_RestoreThread` is a higher-level function which is always
|
||||
available (even when threads have not been initialized).
|
||||
|
||||
|
@ -152,6 +152,19 @@ PyAPI_FUNC(int) PyThread_join_thread(PyThread_handle_t);
|
||||
* a non-zero value on failure.
|
||||
*/
|
||||
PyAPI_FUNC(int) PyThread_detach_thread(PyThread_handle_t);
|
||||
/*
 * Hangs the thread indefinitely without exiting it.
 *
 * gh-87135: There is no safe way to exit a thread other than returning
 * normally from its start function. This is used during finalization in lieu
 * of actually exiting the thread. Since the program is expected to terminate
 * soon anyway, it does not matter if the thread stack stays around until then.
 *
 * This is unfortunate for embedders who may not be terminating their process
 * when they're done with the interpreter, but our C API design does not allow
 * for safely exiting threads attempting to re-enter Python post finalization.
 */
void _Py_NO_RETURN PyThread_hang_thread(void);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
|
@ -17,7 +17,26 @@ typedef enum PyLockStatus {
|
||||
|
||||
PyAPI_FUNC(void) PyThread_init_thread(void);
|
||||
PyAPI_FUNC(unsigned long) PyThread_start_new_thread(void (*)(void *), void *);
|
||||
PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
|
||||
/* Terminates the current thread. Considered unsafe.
 *
 * WARNING: This function is only safe to call if all functions in the full call
 * stack are written to safely allow it. Additionally, the behavior is
 * platform-dependent. This function should be avoided, and is no longer called
 * by Python itself. It is retained only for compatibility with existing C
 * extension code.
 *
 * With pthreads, calls `pthread_exit`, which causes some libcs (glibc?) to
 * attempt to unwind the stack and call C++ destructors; if a `noexcept`
 * function is reached, they may terminate the process. Others (macOS) do
 * unwinding.
 *
 * On Windows, calls `_endthreadex` which kills the thread without calling C++
 * destructors.
 *
 * In either case there is a risk of invalid references remaining to data on the
 * thread stack.
 */
Py_DEPRECATED(3.14) PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
|
||||
|
||||
PyAPI_FUNC(unsigned long) PyThread_get_thread_ident(void);
|
||||
|
||||
#if (defined(__APPLE__) || defined(__linux__) || defined(_WIN32) \
|
||||
|
@ -1171,6 +1171,76 @@ class ThreadTests(BaseTestCase):
|
||||
self.assertEqual(out.strip(), b"OK")
|
||||
self.assertIn(b"can't create new thread at interpreter shutdown", err)
|
||||
|
||||
    @cpython_only
    def test_finalize_daemon_thread_hang(self):
        if support.check_sanitizer(thread=True, memory=True):
            # the thread running `time.sleep(100)` below will still be alive
            # at process exit
            self.skipTest(
                "https://github.com/python/cpython/issues/124878 - Known"
                " race condition that TSAN identifies.")
        # gh-87135: tests that daemon threads hang during finalization
        script = textwrap.dedent('''
            import os
            import sys
            import threading
            import time
            import _testcapi

            lock = threading.Lock()
            lock.acquire()
            thread_started_event = threading.Event()
            def thread_func():
                try:
                    thread_started_event.set()
                    _testcapi.finalize_thread_hang(lock.acquire)
                finally:
                    # Control must not reach here.
                    os._exit(2)

            t = threading.Thread(target=thread_func)
            t.daemon = True
            t.start()
            thread_started_event.wait()
            # Sleep to ensure daemon thread is blocked on `lock.acquire`
            #
            # Note: This test is designed so that in the unlikely case that
            # `0.1` seconds is not sufficient time for the thread to become
            # blocked on `lock.acquire`, the test will still pass, it just
            # won't be properly testing the thread behavior during
            # finalization.
            time.sleep(0.1)

            def run_during_finalization():
                # Wake up daemon thread
                lock.release()
                # Sleep to give the daemon thread time to crash if it is going
                # to.
                #
                # Note: If due to an exceptionally slow execution this delay is
                # insufficient, the test will still pass but will simply be
                # ineffective as a test.
                time.sleep(0.1)
                # If control reaches here, the test succeeded.
                os._exit(0)

            # Replace sys.stderr.flush as a way to run code during finalization
            orig_flush = sys.stderr.flush
            def do_flush(*args, **kwargs):
                orig_flush(*args, **kwargs)
                # sys.is_finalizing is a function and must be called.
                if not sys.is_finalizing():
                    return
                sys.stderr.flush = orig_flush
                run_during_finalization()

            sys.stderr.flush = do_flush

            # If the following exit code is retained, `run_during_finalization`
            # did not run.
            sys.exit(1)
        ''')
        assert_python_ok("-c", script)
|
||||
|
||||
class ThreadJoinOnShutdown(BaseTestCase):
|
||||
|
||||
def _run_and_join(self, script):
|
||||
|
@ -0,0 +1,15 @@
|
||||
Attempting to acquire the GIL after runtime finalization has begun in a
different thread now causes the thread to hang rather than terminate, which
avoids potential crashes or memory corruption caused by attempting to
terminate a thread that is running code not specifically designed to support
termination. In most cases this hanging is harmless since the process will
soon exit anyway.

The ``PyThread_exit_thread`` function is now deprecated. Its behavior is
inconsistent across platforms, and it can only be used safely in the
unlikely case that every function in the entire call stack has been designed
to support the platform-dependent termination mechanism. It is recommended
that users of this function change their design to not require thread
termination. In the unlikely case that thread termination is needed and can
be done safely, users may migrate to calling platform-specific APIs such as
``pthread_exit`` (POSIX) or ``_endthreadex`` (Windows) directly.
|
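One way to follow the recommendation to avoid thread termination is to have the thread's start function return normally once it is asked to stop. The sketch below assumes a POSIX threads environment and is not taken from CPython; `worker_state`, `worker_main`, and `run_and_stop_worker` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical worker state; the field names are illustrative only. */
struct worker_state {
    atomic_int stop_requested;  /* set by the owner to request shutdown */
    int cleaned_up;             /* set when the start function unwinds */
    long iterations;            /* stand-in for work performed */
};

/* Thread start function: loops until asked to stop, then returns. */
static void *worker_main(void *arg)
{
    struct worker_state *st = arg;
    while (!atomic_load(&st->stop_requested)) {
        st->iterations++;       /* one unit of pretend work */
        sched_yield();
    }
    st->cleaned_up = 1;         /* cleanup runs because we return normally */
    return NULL;
}

/* Start a worker, request a stop, and join it. Returns 0 on success. */
static int run_and_stop_worker(struct worker_state *st)
{
    pthread_t t;
    assert(st != NULL);
    atomic_init(&st->stop_requested, 0);
    st->cleaned_up = 0;
    st->iterations = 0;
    if (pthread_create(&t, NULL, worker_main, st) != 0) {
        return -1;
    }
    atomic_store(&st->stop_requested, 1);
    return pthread_join(t, NULL);   /* join makes the worker's writes visible */
}
```

Because the worker leaves through an ordinary `return`, every frame on its stack unwinds in the usual way and no platform-specific exit call such as `pthread_exit` or `_endthreadex` is needed.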
@ -3310,6 +3310,35 @@ test_critical_sections(PyObject *module, PyObject *Py_UNUSED(args))
|
||||
Py_RETURN_NONE;
|
||||
}
|
||||
|
||||
// Used by `finalize_thread_hang`.
#ifdef _POSIX_THREADS
static void finalize_thread_hang_cleanup_callback(void *Py_UNUSED(arg)) {
    // Should not reach here.
    Py_FatalError("pthread thread termination was triggered unexpectedly");
}
#endif

// Tests that finalization does not trigger pthread cleanup.
//
// Must be called with a single nullary callable function that should block
// (with GIL released) until finalization is in progress.
static PyObject *
finalize_thread_hang(PyObject *self, PyObject *callback)
{
    // WASI builds some pthread stuff but doesn't have these APIs today?
#if defined(_POSIX_THREADS) && !defined(__wasi__)
    pthread_cleanup_push(finalize_thread_hang_cleanup_callback, NULL);
#endif
    PyObject_CallNoArgs(callback);
    // Should not reach here.
    Py_FatalError("thread unexpectedly did not hang");
#if defined(_POSIX_THREADS) && !defined(__wasi__)
    pthread_cleanup_pop(0);
#endif
    Py_RETURN_NONE;
}
|
||||
|
||||
|
||||
static PyMethodDef TestMethods[] = {
|
||||
{"set_errno", set_errno, METH_VARARGS},
|
||||
{"test_config", test_config, METH_NOARGS},
|
||||
@ -3449,6 +3478,7 @@ static PyMethodDef TestMethods[] = {
|
||||
{"test_weakref_capi", test_weakref_capi, METH_NOARGS},
|
||||
{"function_set_warning", function_set_warning, METH_NOARGS},
|
||||
{"test_critical_sections", test_critical_sections, METH_NOARGS},
|
||||
{"finalize_thread_hang", finalize_thread_hang, METH_O, NULL},
|
||||
{NULL, NULL} /* sentinel */
|
||||
};
|
||||
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include "pycore_pylifecycle.h" // _PyErr_Print()
|
||||
#include "pycore_pymem.h" // _PyMem_IsPtrFreed()
|
||||
#include "pycore_pystats.h" // _Py_PrintSpecializationStats()
|
||||
#include "pycore_pythread.h" // PyThread_hang_thread()
|
||||
|
||||
/*
|
||||
Notes about the implementation:
|
||||
@ -277,10 +278,9 @@ drop_gil(PyInterpreterState *interp, PyThreadState *tstate, int final_release)
|
||||
/* Take the GIL.
|
||||
|
||||
The function saves errno at entry and restores its value at exit.
|
||||
It may hang rather than return if the interpreter has been finalized.
|
||||
|
||||
tstate must be non-NULL.
|
||||
|
||||
Returns 1 if the GIL was acquired, or 0 if not. */
|
||||
tstate must be non-NULL. */
|
||||
static void
|
||||
take_gil(PyThreadState *tstate)
|
||||
{
|
||||
@ -293,12 +293,18 @@ take_gil(PyThreadState *tstate)
|
||||
|
||||
if (_PyThreadState_MustExit(tstate)) {
|
||||
/* bpo-39877: If Py_Finalize() has been called and tstate is not the
|
||||
thread which called Py_Finalize(), exit immediately the thread.
|
||||
thread which called Py_Finalize(), this thread cannot continue.
|
||||
|
||||
This code path can be reached by a daemon thread after Py_Finalize()
|
||||
completes. In this case, tstate is a dangling pointer: points to
|
||||
PyThreadState freed memory. */
|
||||
PyThread_exit_thread();
|
||||
PyThreadState freed memory.
|
||||
|
||||
This used to call a *thread_exit API, but that was not safe as it
|
||||
lacks stack unwinding and local variable destruction important to
|
||||
C++. gh-87135: The best that can be done is to hang the thread as
|
||||
the public APIs calling this have no error reporting mechanism (!).
|
||||
*/
|
||||
PyThread_hang_thread();
|
||||
}
|
||||
|
||||
assert(_PyThreadState_CheckConsistency(tstate));
|
||||
@ -342,7 +348,9 @@ take_gil(PyThreadState *tstate)
|
||||
if (drop_requested) {
|
||||
_Py_unset_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
|
||||
}
|
||||
PyThread_exit_thread();
|
||||
// gh-87135: hang the thread as *thread_exit() is not a safe
|
||||
// API. It lacks stack unwind and local variable destruction.
|
||||
PyThread_hang_thread();
|
||||
}
|
||||
assert(_PyThreadState_CheckConsistency(tstate));
|
||||
|
||||
@ -383,7 +391,7 @@ take_gil(PyThreadState *tstate)
|
||||
|
||||
if (_PyThreadState_MustExit(tstate)) {
|
||||
/* bpo-36475: If Py_Finalize() has been called and tstate is not
|
||||
the thread which called Py_Finalize(), exit immediately the
|
||||
the thread which called Py_Finalize(), gh-87135: hang the
|
||||
thread.
|
||||
|
||||
This code path can be reached by a daemon thread which was waiting
|
||||
@ -393,7 +401,7 @@ take_gil(PyThreadState *tstate)
|
||||
/* tstate could be a dangling pointer, so don't pass it to
|
||||
drop_gil(). */
|
||||
drop_gil(interp, NULL, 1);
|
||||
PyThread_exit_thread();
|
||||
PyThread_hang_thread();
|
||||
}
|
||||
assert(_PyThreadState_CheckConsistency(tstate));
|
||||
|
||||
|
@ -2020,7 +2020,7 @@ _Py_Finalize(_PyRuntimeState *runtime)
|
||||
/* Ensure that remaining threads are detached */
|
||||
_PyEval_StopTheWorldAll(runtime);
|
||||
|
||||
/* Remaining daemon threads will automatically exit
|
||||
/* Remaining daemon threads will be trapped in PyThread_hang_thread
|
||||
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
|
||||
_PyInterpreterState_SetFinalizing(tstate->interp, tstate);
|
||||
_PyRuntimeState_SetFinalizing(runtime, tstate);
|
||||
|
@ -291,6 +291,14 @@ PyThread_exit_thread(void)
|
||||
_endthreadex(0);
|
||||
}
|
||||
|
||||
void _Py_NO_RETURN
PyThread_hang_thread(void)
{
    while (1) {
        SleepEx(INFINITE, TRUE);
    }
}
|
||||
|
||||
/*
|
||||
* Lock support. It has to be implemented as semaphores.
|
||||
* I [Dag] tried to implement it with mutex but I could find a way to
|
||||
|
@ -16,6 +16,7 @@
|
||||
#undef destructor
|
||||
#endif
|
||||
#include <signal.h>
|
||||
#include <unistd.h> /* pause(), also getthrid() on OpenBSD */
|
||||
|
||||
#if defined(__linux__)
|
||||
# include <sys/syscall.h> /* syscall(SYS_gettid) */
|
||||
@ -23,8 +24,6 @@
|
||||
# include <pthread_np.h> /* pthread_getthreadid_np() */
|
||||
#elif defined(__FreeBSD_kernel__)
|
||||
# include <sys/syscall.h> /* syscall(SYS_thr_self) */
|
||||
#elif defined(__OpenBSD__)
|
||||
# include <unistd.h> /* getthrid() */
|
||||
#elif defined(_AIX)
|
||||
# include <sys/thread.h> /* thread_self() */
|
||||
#elif defined(__NetBSD__)
|
||||
@ -419,6 +418,18 @@ PyThread_exit_thread(void)
|
||||
#endif
|
||||
}
|
||||
|
||||
void _Py_NO_RETURN
PyThread_hang_thread(void)
{
    while (1) {
#if defined(__wasi__)
        sleep(9999999);  // WASI doesn't have pause() ?!
#else
        pause();
#endif
    }
}
|
||||
|
||||
#ifdef USE_SEMAPHORES
|
||||
|
||||
/*
|
||||
|