Commit Graph

393 Commits

Author SHA1 Message Date
Kumar Aditya
bd1cf6ecee
bpo-47012: speed up iteration of bytes and bytearray (GH-31867) 2022-03-23 04:30:05 -04:00
Inada Naoki
2d8b764210
bpo-46864: Deprecate PyBytesObject.ob_shash. (GH-31598) 2022-03-06 11:39:10 +09:00
Victor Stinner
b6b711a1aa
bpo-46848: Move _PyBytes_Find() to internal C API (GH-31642)
Move _PyBytes_Find() and _PyBytes_ReverseFind() functions to the
internal C API.

bytesobject.c now includes pycore_bytesobject.h.
2022-03-02 14:15:26 +01:00
Dennis Sweeney
6ddb09f35b
bpo-46848: Use stringlib/fastsearch in mmap (GH-31625)
Speed up mmap.find(). Add _PyBytes_Find() and _PyBytes_ReverseFind().
2022-03-01 23:46:30 -05:00
Eric Snow
81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Victor Stinner
097f74a5a3
bpo-46670: Define all macros for stringlib (GH-31176)
bytesobject.c, bytearrayobject.c and unicodeobject.c now define all
macros used by stringlib, to avoid using undefined macros.
Fix "gcc -Wundef" warnings.
2022-02-07 01:26:58 +01:00
Victor Stinner
a1bf329bca
bpo-46417: Add missing types of _PyTypes_InitTypes() (GH-30749)
Add types removed by mistake by the commit adding
_PyTypes_FiniTypes().

Move also PyBool_Type at the end, since it depends on PyLong_Type.

PyBytes_Type and PyUnicode_Type no longer depend explicitly on
PyBaseObject_Type: it's the default of PyType_Ready().
2022-01-21 17:53:13 +01:00
Eric Snow
cf496d657a
bpo-45953: Statically allocate and initialize global bytes objects. (gh-30096)
The empty bytes object (b'') and the 256 one-character bytes objects were allocated at runtime init.  Now we statically allocate and initialize them.

https://bugs.python.org/issue45953
2022-01-11 09:37:24 -07:00
Eric Snow
c8749b5783
bpo-46008: Make runtime-global object/type lifecycle functions and state consistent. (gh-29998)
This change is strictly renames and moving code around.  It helps in the following ways:

* ensures type-related init functions focus strictly on one of the three aspects (state, objects, types)
* passes in PyInterpreterState * to all those functions, simplifying work on moving types/objects/state to the interpreter
* consistent naming conventions help make what's going on more clear
* keeping API related to a type in the corresponding header file makes it more obvious where to look for it

https://bugs.python.org/issue46008
2021-12-09 12:59:26 -07:00
Victor Stinner
5f09bb021a
bpo-35134: Add Include/cpython/longobject.h (GH-29044)
Move Include/longobject.h non-limited API to a new
Include/cpython/longobject.h header file.

Move the following definitions to the internal C API:

* _PyLong_DigitValue
* _PyLong_FormatAdvancedWriter()
* _PyLong_FormatWriter()
2021-10-19 02:04:52 +02:00
Victor Stinner
bbe7497c5a
bpo-45434: Remove pystrhex.h header file (GH-28923)
Move Include/pystrhex.h to Include/internal/pycore_strhex.h.
The header file only contains private functions.

The following C extensions are now built with Py_BUILD_CORE_MODULE
macro defined to get access to the internal C API:

* _blake2
* _hashopenssl
* _md5
* _sha1
* _sha3
* _ssl
* binascii
2021-10-13 15:22:35 +02:00
Victor Stinner
d943d19172
bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895)
* Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
* _ssl, _sqlite and _testcapi extensions now call the public
  PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
* _lsprof extension is now built with Py_BUILD_CORE_MODULE macro
  defined to get access to internal _PyObject_CallNoArgs().
2021-10-12 08:38:19 +02:00
Victor Stinner
ce3489cfdb
bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891)
Fix typo in the private _PyObject_CallNoArg() function name: rename
it to _PyObject_CallNoArgs() to be consistent with the public
function PyObject_CallNoArgs().
2021-10-12 00:42:23 +02:00
Mark Dickinson
ae5259171b
Fix bytes.__bytes__ to not truncate at a zero byte (GH-27902) 2021-08-23 15:24:12 +01:00
Dong-hee Na
24b63c695a
bpo-24234: Implement bytes.__bytes__ (GH-27901) 2021-08-23 19:01:51 +09:00
Brandt Bucher
145bf269df
bpo-42128: Structural Pattern Matching (PEP 634) (GH-22917)
Co-authored-by: Guido van Rossum <guido@python.org>
Co-authored-by: Talin <viridia@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
2021-02-26 14:51:55 -08:00
Victor Stinner
bcb094b41f
bpo-43268: Pass interp rather than tstate to internal functions (GH-24580)
Pass the current interpreter (interp) rather than the current Python
thread state (tstate) to internal functions which only use the
interpreter.

Modified functions:

* _PyXXX_Fini() and _PyXXX_ClearFreeList() functions
* _PyEval_SignalAsyncExc(), make_pending_calls()
* _PySys_GetObject(), sys_set_object(), sys_set_object_id(), sys_set_object_str()
* should_audit(), set_flags_from_config(), make_flags()
* _PyAtExit_Call()
* init_stdio_encoding()
* etc.
2021-02-19 15:10:45 +01:00
Serhiy Storchaka
2ad93821a6
bpo-42431: Fix outdated bytes comments (GH-23458)
Also move definitions of internal macros F_LJUST etc to private header.
2020-12-03 12:46:16 +02:00
Victor Stinner
32bd68c839
bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)
No longer use deprecated aliases to functions:

* Replace PyObject_MALLOC() with PyObject_Malloc()
* Replace PyObject_REALLOC() with PyObject_Realloc()
* Replace PyObject_FREE() with PyObject_Free()
* Replace PyObject_Del() with PyObject_Free()
* Replace PyObject_DEL() with PyObject_Free()
2020-12-01 10:37:39 +01:00
Serhiy Storchaka
313467efdc
bpo-42435: Speed up comparison of bytes and bytearray object (GH--23461)
* Speed up comparison of bytes objects with non-bytes objects when
  option -b is specified.
* Speed up comparison of bytarray objects with non-buffer object.
2020-11-22 22:00:53 +02:00
Serhiy Storchaka
e2ec0b27c0
bpo-41974: Remove complex.__float__, complex.__floordiv__, etc (GH-22593)
Remove complex special methods __int__, __float__, __floordiv__,
__mod__, __divmod__, __rfloordiv__, __rmod__ and __rdivmod__
which always raised a TypeError.
2020-10-09 14:14:37 +03:00
Guido van Rossum
488512bf49
A (very) slight speed improvement for iterating over bytes (#21705)
My mentee @xvxvxvxvxv noticed that iterating over array.array is
slightly faster than iterating over bytes.  Looking at the source I
observed that arrayiter_next() calls `getitem(ao, it->index++)` wheras
striter_next() uses the idiom (paraphrased)

    item = PyLong_FromLong(seq->ob_sval[it->it_index]);
    if (item != NULL)
        ++it->it_next;
    return item;

I'm not 100% sure but I think that the second version has fewer
opportunity for the CPU to overlap the `index++` operation with the
rest of the code (which in both cases involves a call).  So here I am
optimistically incrementing the index -- if the PyLong_FromLong() call
fails, this will leave the iterator pointing at the next byte, but
honestly I doubt that anyone would seriously consider resuming use of
the iterator after that kind of failure (it would have to be a
MemoryError).  And the author of arrayiter_next() made the same
consideration (or never ever gave it a thought :-).

With this, a loop like

    for _ in b: pass

is now slightly *faster* than the same thing over an equivalent array,
rather than slightly *slower* (in both cases a few percent).
2020-08-03 09:04:13 -07:00
Serhiy Storchaka
12f433411b
bpo-41334: Convert constructors of str, bytes and bytearray to Argument Clinic (GH-21535) 2020-07-20 15:53:55 +03:00
Serhiy Storchaka
e67f7db3c3
bpo-37999: Simplify the conversion code for %c, %d, %x, etc. (GH-20437)
Since PyLong_AsLong() no longer use __int__, explicit call
of PyNumber_Index() before it is no longer needed.
2020-06-29 22:36:41 +03:00
Victor Stinner
91698d8caa
bpo-40521: Optimize PyBytes_FromStringAndSize(str, 0) (GH-21142)
Always create the empty bytes string singleton.

Optimize PyBytes_FromStringAndSize(str, 0): it no longer has to check
if the empty string singleton was created or not, it is always
available.

Add functions:

* _PyBytes_Init()
* bytes_get_empty(), bytes_new_empty()
* bytes_create_empty_string_singleton()
* unicode_create_empty_string_singleton()

_Py_unicode_state: rename empty structure member to empty_string.
2020-06-25 14:07:40 +02:00
Victor Stinner
c41eed1a87
bpo-40521: Make bytes singletons per interpreter (GH-21074)
Each interpreter now has its own empty bytes string and single byte
character singletons.

Replace STRINGLIB_EMPTY macro with STRINGLIB_GET_EMPTY() macro.
2020-06-23 15:54:35 +02:00
Victor Stinner
04fc4f2a46
bpo-40989: PyObject_INIT() becomes an alias to PyObject_Init() (GH-20901)
The PyObject_INIT() and PyObject_INIT_VAR() macros become aliases to,
respectively, PyObject_Init() and PyObject_InitVar() functions.

Rename _PyObject_INIT() and _PyObject_INIT_VAR() static inline
functions to, respectively, _PyObject_Init() and _PyObject_InitVar(),
and move them to pycore_object.h. Remove their return value:
their return type becomes void.

The _datetime module is now built with the Py_BUILD_CORE_MODULE macro
defined.

Remove an outdated comment on _Py_tracemalloc_config.
2020-06-16 01:28:07 +02:00
Victor Stinner
d36cf5f1d2
bpo-40943: Replace PY_FORMAT_SIZE_T with "z" (GH-20781)
The PEP 353, written in 2005, introduced PY_FORMAT_SIZE_T. Python no
longer supports macOS 10.4 and Visual Studio 2010, but requires more
recent macOS and Visual Studio versions. In 2020 with Python 3.10, it
is now safe to use directly "%zu" to format size_t and "%zi" to
format Py_ssize_t.
2020-06-10 18:38:05 +02:00
Serhiy Storchaka
5f4b229df7
bpo-40792: Make the result of PyNumber_Index() always having exact type int. (GH-20443)
Previously, the result could have been an instance of a subclass of int.

Also revert bpo-26202 and make attributes start, stop and step of the range
object having exact type int.

Add private function _PyNumber_Index() which preserves the old behavior
of PyNumber_Index() for performance to use it in the conversion functions
like PyLong_AsLong().
2020-05-28 10:33:45 +03:00
sweeneyde
a81849b031
bpo-39939: Add str.removeprefix and str.removesuffix (GH-18939)
Added str.removeprefix and str.removesuffix methods and corresponding
bytes, bytearray, and collections.UserString methods to remove affixes
from a string if present. See PEP 616 for a full description.
2020-04-22 23:05:48 +02:00
Victor Stinner
d9ea5cae1d
bpo-40268: Remove unused pycore_pymem.h includes (GH-19531) 2020-04-15 02:57:50 +02:00
Victor Stinner
da7933ecc3
bpo-40268: Add _PyInterpreterState_GetConfig() (GH-19492)
Don't access PyInterpreterState.config member directly anymore, but
use new functions:

* _PyInterpreterState_GetConfig()
* _PyInterpreterState_SetConfig()
* _Py_GetConfig()
2020-04-13 03:04:28 +02:00
Serhiy Storchaka
8f87eefe7f
bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. (GH-19472) 2020-04-12 14:58:27 +03:00
Serhiy Storchaka
cd8295ff75
bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345) 2020-04-11 10:48:40 +03:00
Victor Stinner
a15e260b70
bpo-40170: Add _PyIndex_Check() internal function (GH-19426)
Add _PyIndex_Check() function to the internal C API: fast inlined
verson of PyIndex_Check().

Add Include/internal/pycore_abstract.h header file.

Replace PyIndex_Check() with _PyIndex_Check() in C files of Objects
and Python subdirectories.
2020-04-08 02:01:56 +02:00
Victor Stinner
45876a90e2
bpo-35081: Move bytes_methods.h to the internal C API (GH-18492)
Move the bytes_methods.h header file to the internal C API as
pycore_bytes_methods.h: it only contains private symbols (prefixed by
"_Py"), except of the PyDoc_STRVAR_shared() macro.
2020-02-12 22:32:34 +01:00
Petr Viktorin
ffd9753a94
bpo-39245: Switch to public API for Vectorcall (GH-18460)
The bulk of this patch was generated automatically with:

    for name in \
        PyObject_Vectorcall \
        Py_TPFLAGS_HAVE_VECTORCALL \
        PyObject_VectorcallMethod \
        PyVectorcall_Function \
        PyObject_CallOneArg \
        PyObject_CallMethodNoArgs \
        PyObject_CallMethodOneArg \
    ;
    do
        echo $name
        git grep -lwz _$name | xargs -0 sed -i "s/\b_$name\b/$name/g"
    done

    old=_PyObject_FastCallDict
    new=PyObject_VectorcallDict
    git grep -lwz $old | xargs -0 sed -i "s/\b$old\b/$new/g"

and then cleaned up:

- Revert changes to in docs & news
- Revert changes to backcompat defines in headers
- Nudge misaligned comments
2020-02-11 17:46:57 +01:00
Victor Stinner
60ac6ed557
bpo-39573: Use Py_SET_SIZE() function (GH-18402)
Replace direct acccess to PyVarObject.ob_size with usage of
the Py_SET_SIZE() function.
2020-02-07 23:18:08 +01:00
Victor Stinner
58ac700fb0
bpo-39573: Use Py_TYPE() macro in Objects directory (GH-18392)
Replace direct access to PyObject.ob_type with Py_TYPE().
2020-02-07 03:04:21 +01:00
Victor Stinner
49932fec62
bpo-39542: Simplify _Py_NewReference() (GH-18332)
* Remove _Py_INC_REFTOTAL and _Py_DEC_REFTOTAL macros: modify
  directly _Py_RefTotal.
* _Py_ForgetReference() is no longer defined if the Py_TRACE_REFS
  macro is not defined.
* Remove _Py_NewReference() implementation from object.c:
  unify the two implementations in object.h inline function.
* Fix Py_TRACE_REFS build: _Py_INC_TPALLOCS() macro has been removed.
2020-02-03 17:55:04 +01:00
Victor Stinner
c6e5c1123b
bpo-39489: Remove COUNT_ALLOCS special build (GH-18259)
Remove:

* COUNT_ALLOCS macro
* sys.getcounts() function
* SHOW_ALLOC_COUNT code in listobject.c
* SHOW_TRACK_COUNT code in tupleobject.c
* PyConfig.show_alloc_count field
* -X showalloccount command line option
* @test.support.requires_type_collecting decorator
2020-02-03 15:17:15 +01:00
Hai Shi
46874c26ee
bpo-39487: Merge duplicated _Py_IDENTIFIER identifiers in C code (GH-18254)
Moving repetitive `_Py_IDENTIFIER` instances to a global location helps identify them more easily in regards to sub-interpreter support.
2020-01-30 15:20:25 -08:00
Victor Stinner
60ec6efd96
bpo-36389: Fix _PyBytesWriter in release mode (GH-16624)
Fix _PyBytesWriter API when Python is built in release mode with
assertions.
2019-10-07 22:31:42 +02:00
Victor Stinner
6876257eaa
bpo-36389: _PyObject_CheckConsistency() available in release mode (GH-16612)
bpo-36389, bpo-38376: The _PyObject_CheckConsistency() function is
now also available in release mode. For example, it can be used to
debug a crash in the visit_decref() function of the GC.

Modify the following functions to also work in release mode:

* _PyDict_CheckConsistency()
* _PyObject_CheckConsistency()
* _PyType_CheckConsistency()
* _PyUnicode_CheckConsistency()

Other changes:

* _PyMem_IsPtrFreed(ptr) now also returns 1 if ptr is NULL
  (equals to 0).
* _PyBytesWriter_CheckConsistency() now returns 1 and is only used
  with assert().
* Reorder _PyObject_Dump() to write safe fields first, and only
  attempt to render repr() at the end.
2019-10-07 18:42:01 +02:00
Serhiy Storchaka
279f44678c
bpo-37206: Unrepresentable default values no longer represented as None. (GH-13933)
In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values
(like in the optional third parameter of getattr). "None" should be used if None is accepted
as argument and passing None has the same effect as not passing the argument at all.
2019-09-14 12:24:05 +03:00
Greg Price
3a4f66707e Cut disused recode_encoding logic in _PyBytes_DecodeEscape. (GH-16013)
All call sites pass NULL for `recode_encoding`, so this path is
completely untested.  That's been true since before Python 3.0.
It adds significant complexity to this logic, so it's best to
take it out.

All call sites now have a literal NULL, and that's been true since
commit 768921cf3 eliminated a conditional (`foo ? bar : NULL`) at
the call site in Python/ast.c where we're parsing a bytes literal.
But even before then, that condition `foo` had been a constant
since unadorned string literals started meaning Unicode, in commit
572dbf8f1 aka v3.0a1~1035 .

The `unicode` parameter is already unused, so mark it as unused too.
The code that acted on it was also taken out before Python 3.0, in
commit 8d30cc014 aka v3.0a1~1031 .

The function (PyBytes_DecodeEscape) is exposed in the API, but it's
never been documented.
2019-09-12 19:12:22 +01:00
Sergey Fedoseev
afdeb189e9 Remove unneeded assignment in PyBytes_Concat() (GH-15274)
The `wb.len = -1` assignment is unneeded since its introduction in 161d695fb0 as `PyObject_GetBuffer` always fills it in.
2019-09-10 17:11:10 +01:00
Greg Price
0711642eec Cut tricky goto that isn't needed, in _PyBytes_DecodeEscape. (GH-15825)
This is the sort of `goto` that requires the reader to stare hard at
the code to unpick what it's doing.

On doing so, the answer is... not very much!

* It jumps from the bottom of the loop to almost the top; the effect
  is to bypass the loop condition `s < end` and also the
  `if`-condition `*s != '\\'`, acting as if both are true.

* We've just decremented `s`, after incrementing it in the `switch`
  condition.  So it has the same value as when `s == end` failed.
  Before that was another increment... and before that we had
  `s < end`.  So `s < end` true, then increment, then `s == end`
  false... that means `s < end` is still true.

* Also this means `s` points to the same character as it did for the
  `switch` condition.  And there was a `case '\\'`, which we didn't
  hit -- so `*s != '\\'` is also true.

* That means this has no effect on the behavior!  The most it might do
  is an optimization -- we get to skip those two checks, because (as
  just proven above) we know they're true.

* But gosh, this is the *invalid escape sequence* path.  This does not
  seem like the kind of code path that calls for extreme optimization
  tricks.

So, take the `goto` and the label out.

Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code.  Or perhaps it won't!  That'll be OK.

But then, crucially, if some future edit to this loop causes the
reasoning above to *stop* holding true... the compiler will adjust
this jump accordingly.  One of us fallible humans might not.
2019-09-10 09:51:04 +01:00
Victor Stinner
bed4817d52
Make PyXXX_Fini() functions private (GH-15531)
For example, rename PyTuple_Fini() to _PyTuple_Fini().

These functions are only declared in the internal C API.
2019-08-27 00:12:32 +02:00
Jeroen Demeyer
196a530e00 bpo-37483: add _PyObject_CallOneArg() function (#14558) 2019-07-04 19:31:34 +09:00