cpython/Tools/c-analyzer
Eric Snow 81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI is using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
c_analyzer Fix typos in the Tools directory (GH-28769) 2021-10-06 10:55:16 -07:00
c_common Fix typos in the Tools directory (GH-28769) 2021-10-06 10:55:16 -07:00
c_parser bpo-45952: Get the C analyzer tool working again. (gh-29882) 2021-12-01 11:20:20 -07:00
cpython bpo-45952: Get the C analyzer tool working again. (gh-31220) 2022-02-08 12:37:53 -07:00
c-analyzer.py bpo-36876: Fix the C analyzer tool. (GH-22841) 2020-10-22 18:42:51 -06:00
check-c-globals.py bpo-36876: [c-analyzer tool] Add a "capi" subcommand to the c-analyzer tool. (gh-23918) 2020-12-24 11:04:19 -07:00
must-resolve.sh bpo-36876: [c-analyzer tool] Additional CLI updates for "capi" command. (gh-23929) 2020-12-25 15:57:30 -07:00
README bpo-36876: Fix the C analyzer tool. (GH-22841) 2020-10-22 18:42:51 -06:00
TODO bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928) 2022-02-08 13:39:07 -07:00

#######################################
# C Globals and CPython Runtime State.

CPython's C code makes extensive use of global variables.  Each global
falls into one of several categories:

* (effectively) constants (incl. static types)
* globals used exclusively in main or in the REPL
* freelists, caches, and counters
* process-global state
* module state
* Python runtime state

The ignored-globals.txt file is organized similarly.  Of the different
categories, the last two are problematic and generally should not exist
in the codebase.

Globals that hold module state (i.e. in Modules/*.c) cause problems
when multiple interpreters are in use.  For more info, see PEP 3121,
which addresses the situation for extension modules in general.
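
As a rough illustration of the problem, the sketch below contrasts a
file-level global with state that is passed in explicitly.  It is plain
C, not real extension-module code: every name in it is hypothetical, and
the two static structs merely stand in for the separate state that each
interpreter would own.  In an actual module the per-module state would
be obtained with PyModule_GetState(), as described in PEP 3121.

```c
#include <assert.h>
#include <stddef.h>

/* Problematic: a single counter shared by every interpreter in the
 * process.  (Hypothetical name.) */
static int shared_call_count = 0;

/* Hypothetical stand-in for per-module state. */
struct example_module_state {
    int call_count;
};

/* Two independent copies, as if the same module had been loaded in two
 * interpreters. */
static struct example_module_state interp_a_state = { 0 };
static struct example_module_state interp_b_state = { 0 };

/* Uses the global: every caller sees and modifies the same value. */
static int
count_call_global(void)
{
    return ++shared_call_count;
}

/* Uses only the state it is handed: callers cannot affect each other. */
static int
count_call_per_module(struct example_module_state *state)
{
    assert(state != NULL);
    return ++state->call_count;
}
```

With the two state instances above, calls made on behalf of one
interpreter leave the other's count untouched, whereas
count_call_global() advances one counter for everyone.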

Globals in the last category (Python runtime state) should be avoided
as well.  The problem isn't with the Python runtime having state.
Rather, the problem is with that state being spread throughout the
codebase in dozens of individual globals.  Unlike the other globals,
the runtime state is a set of values that are constantly shifting in a
complex way.  Spreading them out makes it harder to get a clear picture
of what the runtime involves and complicates efforts to change the
runtime.

Consequently, the globals for Python's runtime state have been
consolidated under a single top-level _PyRuntime global.  No new globals
should be added for runtime state.  Instead, new runtime state should be
added to _PyRuntimeState or one of its sub-structs.  The check-c-globals
script should be run to ensure that no new globals have been added:

  python3 Tools/c-analyzer/check-c-globals.py

You can also use the more generic tool:

  python3 Tools/c-analyzer/c-analyzer.py

If it reports any globals then they should be resolved.  If the globals
are runtime state then they should be folded into _PyRuntimeState.
Otherwise they should be added to ignored-globals.txt.
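
The idea of folding runtime state into a single struct can be sketched
as follows.  This is only an illustration of the pattern: the struct,
field, and function names are hypothetical and do not reflect the actual
layout of _PyRuntimeState.

```c
#include <assert.h>
#include <stddef.h>

/* Before: runtime state scattered across individual file-level globals
 * (hypothetical names, shown only for contrast):
 *
 *     static int gc_enabled;
 *     static long next_interp_id;
 */

/* After: the same values grouped into sub-structs of one runtime
 * struct. */
struct example_gc_state {
    int enabled;
};

struct example_interp_state {
    long next_id;
};

struct example_runtime_state {
    struct example_gc_state gc;
    struct example_interp_state interpreters;
};

/* The single top-level global, statically initialized. */
static struct example_runtime_state example_runtime = {
    .gc = { .enabled = 1 },
    .interpreters = { .next_id = 0 },
};

/* Code reaches the state through the one runtime struct instead of
 * through a dedicated global of its own. */
static long
example_new_interp_id(struct example_runtime_state *runtime)
{
    assert(runtime != NULL);
    return runtime->interpreters.next_id++;
}
```

Grouping the values this way keeps all of the runtime's state visible
in one declaration and leaves only one global to account for.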