cpython/Objects/stringlib
Jessica Clarke dec0757549
bpo-43179: Generalise alignment for optimised string routines (GH-24624)
* Remove m68k-specific hack from ascii_decode

On m68k, alignments of primitives is more relaxed, with 4-byte and
8-byte types only requiring 2-byte alignment, thus using sizeof(size_t)
does not work. Instead, use the portable alternative.

Note that this is a minimal fix that only relaxes the assertion and the
condition for when to use the optimised version remains overly strict.
Such issues will be fixed tree-wide in the next commit.

NB: In C11 we could use _Alignof(size_t) instead, but for compatibility
we use autoconf.

* Optimise string routines for architectures with non-natural alignment

C only requires that sizeof(x) is a multiple of alignof(x), not that the
two are equal. Thus anywhere where we optimise based on alignment we
should be using alignof(x) not sizeof(x).

This is more annoying than it would be in C11 where we could just use
_Alignof(x) (and alignof(x) in C++11), but since we still require only
C99 we must plumb the information all the way from autoconf through the
various typedefs and defines.
2021-03-31 12:12:39 +02:00
..
clinic bpo-40792: Make the result of PyNumber_Index() always having exact type int. (GH-20443) 2020-05-28 10:33:45 +03:00
asciilib.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
codecs.h bpo-43179: Generalise alignment for optimised string routines (GH-24624) 2021-03-31 12:12:39 +02:00
count.h Implement PEP 393. 2011-09-28 07:41:54 +02:00
ctype.h bpo-35081: Move bytes_methods.h to the internal C API (GH-18492) 2020-02-12 22:32:34 +01:00
eq.h bpo-38249: Expand Py_UNREACHABLE() to __builtin_unreachable() in the release mode. (GH-16329) 2020-03-09 20:49:52 +02:00
fastsearch.h bpo-41972: Use the two-way algorithm for string searching (GH-22904) 2021-02-28 12:20:50 -06:00
find_max_char.h bpo-43179: Generalise alignment for optimised string routines (GH-24624) 2021-03-31 12:12:39 +02:00
find.h Issue #26765: Moved common code and docstrings for bytes and bytearray methods 2016-05-04 22:23:26 +03:00
join.h bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586) 2020-12-01 09:56:42 +01:00
localeutil.h bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) 2018-11-26 13:40:01 +01:00
partition.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
README.txt bpo-40521: Make bytes singletons per interpreter (GH-21074) 2020-06-23 15:54:35 +02:00
replace.h Issue #16061: Speed up str.replace() for replacing 1-character strings. 2013-04-13 22:45:04 +03:00
split.h bpo-39573: Use Py_SET_SIZE() function (GH-18402) 2020-02-07 23:18:08 +01:00
stringdefs.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
stringlib_find_two_way_notes.txt bpo-41972: Use the two-way algorithm for string searching (GH-22904) 2021-02-28 12:20:50 -06:00
transmogrify.h bpo-28029: Make "".replace("", s, n) returning s for any n != 0. (GH-16981) 2019-10-30 12:03:53 +02:00
ucs1lib.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
ucs2lib.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
ucs4lib.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00
undef.h bpo-39372: Clean header files of declared interfaces with no implementations (GH-18037) 2020-01-18 03:14:59 +00:00
unicode_format.h bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587) 2020-12-01 10:37:39 +01:00
unicodedefs.h bpo-40521: Make empty Unicode string per interpreter (GH-21096) 2020-06-24 00:10:40 +02:00

bits shared by the bytesobject and unicodeobject implementations (and
possibly other modules, in a not too distant future).

the stuff in here is included into relevant places; see the individual
source files for details.

--------------------------------------------------------------------
the following defines used by the different modules:

STRINGLIB_CHAR

    the type used to hold a character (char or Py_UNICODE)

STRINGLIB_GET_EMPTY()

    returns a PyObject representing the empty string, only to be used if
    STRINGLIB_MUTABLE is 0. It must not be NULL.

Py_ssize_t STRINGLIB_LEN(PyObject*)

    returns the length of the given string object (which must be of the
    right type)

PyObject* STRINGLIB_NEW(STRINGLIB_CHAR*, Py_ssize_t)

    creates a new string object

STRINGLIB_CHAR* STRINGLIB_STR(PyObject*)

    returns the pointer to the character data for the given string
    object (which must be of the right type)

int STRINGLIB_CHECK_EXACT(PyObject *)

    returns true if the object is an instance of our type, not a subclass

STRINGLIB_MUTABLE

    must be 0 or 1 to tell the cpp macros in stringlib code if the object
    being operated on is mutable or not