41596 Commits

Author SHA1 Message Date
Joseph Myers
99671e72bb Add multithreaded test of sem_getvalue
Test coverage of sem_getvalue is fairly limited.  Add a test that runs
it on threads on each CPU.  For this purpose I adapted
tst-skeleton-thread-affinity.c; it didn't seem very suitable to use
as-is or include directly in a different test doing things per-CPU,
but did seem a suitable starting point (thus sharing
tst-skeleton-affinity.c) for such testing.

Tested for x86_64.
2024-11-22 16:58:51 +00:00
Adhemerval Zanella
bccb0648ea math: Use tanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic tanf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
d846f4c12d math: Use lgammaf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic lgammaf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, to use math_narrow_eval
on overflow usage, and to adapt to make it reentrant.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       86.5609       70.3278       18.75%
x86_64v2                     78.3030       69.9709       10.64%
x86_64v3                     74.7470       59.8457       19.94%
i686                         387.355       229.761       40.68%
aarch64                      40.8341       33.7563       17.33%
power10                      26.5520       16.1672       39.11%
powerpc                      28.3145       17.0625       39.74%

reciprocal-throughput         master       patched  improvement
x86_64                       68.0461       48.3098       29.00%
x86_64v2                     55.3256       47.2476       14.60%
x86_64v3                     52.3015       38.9028       25.62%
i686                         340.848       195.707       42.58%
aarch64                      36.8000       30.5234       17.06%
power10                      20.4043       12.6268       38.12%
powerpc                      22.6588       13.8866       38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
baa495f231 math: Use erfcf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic erfcf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       98.8796       66.2142       33.04%
x86_64v2                     98.9617       67.4221       31.87%
x86_64v3                     87.4161       53.1754       39.17%
aarch64                      33.8336       22.0781       34.75%
power10                      21.1750       13.5864       35.84%
powerpc                      21.4694       13.8149       35.65%

reciprocal-throughput         master       patched  improvement
x86_64                       48.5620       27.6731       43.01%
x86_64v2                     47.9497       28.3804       40.81%
x86_64v3                     42.0255       18.1355       56.85%
aarch64                      24.3938       13.4041       45.05%
power10                      10.4919        6.1881       41.02%
powerpc                       11.763       6.76468       42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
994fec2397 math: Use erff from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic erff.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       85.7363       45.1372       47.35%
x86_64v2                     86.6337       38.5816       55.47%
x86_64v3                     71.3810       34.0843       52.25%
i686                         190.143       97.5014       48.72%
aarch64                      34.9091       14.9320       57.23%
power10                      38.6160        8.5188       77.94%
powerpc                      39.7446       8.45781       78.72%

reciprocal-throughput         master       patched  improvement
x86_64                       35.1739       14.7603       58.04%
x86_64v2                     34.5976       11.2283       67.55%
x86_64v3                     27.3260        9.8550       63.94%
i686                         91.0282       30.8840       66.07%
aarch64                      22.5831        6.9615       69.17%
power10                      18.0386        3.0918       82.86%
powerpc                      20.7277       3.63396       82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
c4c64ba5d1 math: Split s_erfF into erff and erfcf
So we can eventually replace each implementation.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:26 -03:00
Adhemerval Zanella
c5d241f06b math: Use cbrtf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and performs better than the generic cbrtf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master        patched       improvement
x86_64                       68.6348        36.8908            46.25%
x86_64v2                     67.3418        36.6968            45.51%
x86_64v3                     63.4981        32.7859            48.37%
aarch64                      29.3172        12.1496            58.56%
power10                      18.0845         8.8893            50.85%
powerpc                      18.0859        8.79527            51.37%

reciprocal-throughput         master        patched       improvement
x86_64                       36.4369        13.3565            63.34%
x86_64v2                     37.3611        13.1149            64.90%
x86_64v3                     31.6024        11.2102            64.53%
aarch64                      18.6866        7.3474             60.68%
power10                       9.4758        3.6329             61.66%
powerpc                      9.58896        3.90439            59.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella
2234b08763 benchtests: Add tanf benchmark
Random inputs in [-pi, pi].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella
ce4122ff97 benchtests: Add lgammaf benchmark
Random inputs in the range [-20.0,20.0].

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella
d7612d04e4 benchtests: Add erfcf benchmark
It is based on binary64 erfc-inputs, with random inputs in
[0,b=0x1.41bbf6p+3] where b is the smallest number such that
erfcf(b) rounds to 0 (to nearest).

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella
50657965da benchtests: Add erff benchmark
It is based on binary64 erf-inputs, with random inputs in [0,b=0x1.f5a888p+1]
where b is the smallest number such that erff(b) rounds to 1 (to nearest).

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella
53c80be8da benchtests: Add cbrtf benchmark
Based on binary64 benchtests, with random inputs in [1,8].
2024-11-22 10:01:03 -03:00
H.J. Lu
e7b5532721 elf: Handle static PIE with non-zero load address [BZ #31799]
For a static PIE with non-zero load address, its PT_DYNAMIC segment
entries contain the relocated values for the load address in static PIE.
Since static PIE usually doesn't have PT_PHDR segment, use p_vaddr of
the PT_LOAD segment with offset == 0 as the load address in static PIE
and adjust the entries of PT_DYNAMIC segment in static PIE by properly
setting the l_addr field for static PIE.  This fixes BZ #31799.
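
The segment selection described above can be sketched as follows.  This is
a minimal illustration under assumptions, not the glibc code: the helper
name find_static_pie_load_address is hypothetical, and it only shows how
the PT_LOAD segment with file offset 0 is picked from a program header
table, using Elf64_Phdr from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc function: return the p_vaddr of the
   first PT_LOAD segment whose file offset is 0, or 0 if there is none.  */
static uint64_t
find_static_pie_load_address (const Elf64_Phdr *phdr, size_t phnum)
{
  for (size_t i = 0; i < phnum; i++)
    if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset == 0)
      return phdr[i].p_vaddr;
  return 0;
}
```

For a static PIE linked at address zero the result is 0, so an adjustment
based on it would be a no-op.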

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-11-22 06:22:13 +08:00
Siddhesh Poyarekar
713d6d7e78 x86/string: Use movsl instead of movsd in strncat [BZ #32344]
The previous patch missed strncat, so fix that too.

Resolves: BZ #32344

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-11-21 17:11:01 -05:00
Florian Weimer
7a61e7f557 stdlib: Make getenv thread-safe in more cases
Async-signal-safety is preserved, too.  In fact, getenv is fully
reentrant and can be called from the malloc call in setenv
(if a replacement malloc uses getenv during its initialization).

This is relatively easy to implement because even before this change,
setenv, unsetenv, clearenv, putenv do not deallocate the environment
strings themselves as they are removed from the environment.

The main changes are:

* Use release stores for environment array updates, following
  the usual pattern for safely publishing immutable data
  (in this case, the environment strings).

* Do not deallocate the environment array.  Instead, keep older
  versions around and adopt an exponential resizing policy.  This
  results in an amortized constant space leak per active environment
  variable, but there already is such a leak for the variable itself
  (and that is even length-dependent, and includes no-longer used
  values).

* Add a seqlock-like mechanism to retry getenv if a concurrent
  unsetenv is observed.  Without that, it is possible that
  getenv returns NULL for a variable that is never unset.  This
  is visible on some AArch64 implementations with the newly
  added stdlib/tst-getenv-unsetenv test case.  The mechanism
  is not a pure seqlock because it tolerates one write from
  unsetenv.  This avoids the need for a second copy of the
  environ array that getenv can read from a signal handler
  that happens to interrupt an unsetenv call.
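
The publication pattern named in the first item can be sketched with C11
atomics.  This is a minimal sketch, not the getenv implementation: the
single slot and the names publish and lookup are hypothetical, and the
acquire load on the reader side is an assumption of this example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single-slot "environment", for illustration only.  */
static _Atomic (const char *) slot;

/* Writer: the string must be fully initialized before this call and
   never modified afterwards.  The release store orders that
   initialization before the pointer becomes visible to readers.  */
static void
publish (const char *immutable_string)
{
  atomic_store_explicit (&slot, immutable_string, memory_order_release);
}

/* Reader: the acquire load pairs with the release store, so a non-null
   result points to a completely initialized string.  */
static const char *
lookup (void)
{
  return atomic_load_explicit (&slot, memory_order_acquire);
}
```

Because published strings are never modified or freed in this sketch, a
reader that obtained a pointer can keep using it without further
synchronization.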

No manual updates are included with this patch because environ
usage with execve, posix_spawn, system is still not thread-safe
relative to unsetenv.  The new process may end up with an environment
that misses entries that were never unset.  This is the same issue
described above for getenv.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-21 21:10:52 +01:00
Andrew Pinski
e6590f0c86 aarch64: Remove non-temporal load/stores from oryon-1's memset
The hardware architects have a new recommendation not to use
non-temporal load/stores for memset. This patch removes this path.
I found there was no difference in the memset speed with/without
non-temporal load/stores either.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-21 11:32:23 -03:00
Andrew Pinski
eb5eeb4740 aarch64: Remove non-temporal load/stores from oryon-1's memcpy
The hardware architects have a new recommendation not to use
non-temporal load/stores for memcpy. This patch removes this path.
I found there was no difference in the memcpy speed with/without
non-temporal load/stores either.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-21 11:32:17 -03:00
Sachin Monga
3051f3495c powerpc64le: _init/_fini file changes for ROP
The ROP instructions were added in ISA 3.1 (ie, Power10), however they
were defined so that if executed on older cpus, they would behave as
nops.  This allows us to emit them on older cpus and they'd just be
ignored, but if run on a Power10, then the binary would be ROP protected.

Hash instructions use negative offsets so the default position
of ROP pointer is FRAME_ROP_SAVE from caller's SP.

Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve
additional 16 bytes for ROP save slot and padding.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-11-20 16:50:34 -05:00
Samuel Thibault
c0365d3791 mman.h: Fix MAP_HASSEMPHORE typo
BSD's MAP_HASSEMAPHORE is spelled with an A.  The misspelled
MAP_HASSEMPHORE is not used in any Debian software, for instance.
2024-11-20 19:52:44 +01:00
Andreas Schwab
6e7778ecde misc: remove extra va_end in error_tail (bug 32233)
This is an addendum to commit b7b52b9dec ("error, error_at_line: Add
missing va_end calls"), which added the va_end calls in the callers where
they belong.
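
The ownership rule behind this fix (the function that calls va_start is
the one that calls va_end) can be sketched as follows.  The names
format_tail and format_message are hypothetical stand-ins, not the
error_tail code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper in the role of error_tail: it consumes the
   va_list but does not call va_end on it.  */
static int
format_tail (char *buf, size_t size, const char *fmt, va_list ap)
{
  return vsnprintf (buf, size, fmt, ap);
}

/* The caller that invoked va_start is also the one that calls va_end,
   exactly once.  */
static int
format_message (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = format_tail (buf, size, fmt, ap);
  va_end (ap);
  return n;
}
```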
2024-11-20 14:05:52 +01:00
Andreas Schwab
ab545460b0 intl: avoid alloca for arbitrary sizes (bug 32380)
Use malloc for the copy of the domain name and the category value, which
can both be of arbitrary size.
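
A minimal sketch of the replacement pattern, assuming a hypothetical
helper copy_name (the actual intl code is not reproduced here): the copy
lives on the heap, so an arbitrarily long input cannot overflow the
stack the way an alloca-based copy could.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string of arbitrary length to the heap.
   Returns NULL on allocation failure; the caller frees the result.  */
static char *
copy_name (const char *name)
{
  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, name, len);
  return copy;
}
```

Unlike alloca, malloc can report failure, so the caller has to handle a
NULL result and free the copy when done.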
2024-11-20 14:05:52 +01:00
Yury Khrustalev
47311cca31 manual: Add description of AArch64-specific pkey flags
Describe AArch64 specific flags PKEY_DISABLE_READ and PKEY_DISABLE_EXECUTE that
are available on AArch64 systems with enabled Stage 1 permission overlays
feature introduced in Armv8.9 / 9.4 (FEAT_S1POE).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-20 11:30:58 +00:00
Yury Khrustalev
f4d00dd60d AArch64: Add support for memory protection keys
This patch adds support for memory protection keys on AArch64 systems with
enabled Stage 1 permission overlays feature introduced in Armv8.9 / 9.4
(FEAT_S1POE) [1].

 1. Internal functions "pkey_read" and "pkey_write" to access data
    associated with memory protection keys.
 2. Implementation of API functions "pkey_get" and "pkey_set" for
    the AArch64 target.
 3. AArch64-specific PKEY flags for READ and EXECUTE (see below).
 4. New target-specific test that checks behaviour of pkeys on
    AArch64 targets.
 5. This patch also extends existing generic test for pkeys.
 6. HWCAP constant for Permission Overlay Extension feature.

To support more accurate mapping of underlying permissions to the
PKEY flags, we introduce additional AArch64-specific flags. The full
list of flags is:

 - PKEY_UNRESTRICTED: 0x0 (for completeness)
 - PKEY_DISABLE_ACCESS: 0x1 (existing flag)
 - PKEY_DISABLE_WRITE: 0x2 (existing flag)
 - PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific)
 - PKEY_DISABLE_READ: 0x8 (new flag, AArch64 specific)

The problem here is that PKEY_DISABLE_ACCESS has unusual semantics as
it overlaps with existing PKEY_DISABLE_WRITE and new PKEY_DISABLE_READ.
For this reason mapping between permission bits RWX and "restrictions"
bits awxr (a for disable access, etc) becomes complicated:

 - PKEY_DISABLE_ACCESS disables both R and W
 - PKEY_DISABLE_{WRITE,READ} disables W and R respectively
 - PKEY_DISABLE_EXECUTE disables X

Combinations like the one below are accepted although they are redundant:

 - PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE

Reverse mapping tries to retain backward compatibility and ORs
PKEY_DISABLE_ACCESS whenever both flags PKEY_DISABLE_READ and
PKEY_DISABLE_WRITE would be present.

This will break code that compares pkey_get output with == instead
of using bitwise operations. The latter is more correct since PKEY_*
constants are essentially bit flags.

It should be noted that PKEY_DISABLE_ACCESS does not prevent execution.
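
The recommended bitwise checks can be sketched as follows.  The flag
values are the ones listed above, repeated under local SKETCH_ names so
that the example builds on any target; the helper functions are
hypothetical.  On a real system the restrictions value would come from
pkey_get.

```c
#include <stdbool.h>

/* Values as listed in the commit message, under local names.  */
enum
{
  SKETCH_PKEY_DISABLE_ACCESS = 0x1,
  SKETCH_PKEY_DISABLE_WRITE = 0x2,
  SKETCH_PKEY_DISABLE_EXECUTE = 0x4,
  SKETCH_PKEY_DISABLE_READ = 0x8
};

/* Writes are blocked if either ACCESS or WRITE is set.  */
static bool
writes_disabled (int restrictions)
{
  return (restrictions
          & (SKETCH_PKEY_DISABLE_ACCESS | SKETCH_PKEY_DISABLE_WRITE)) != 0;
}

/* Reads are blocked if either ACCESS or READ is set.  */
static bool
reads_disabled (int restrictions)
{
  return (restrictions
          & (SKETCH_PKEY_DISABLE_ACCESS | SKETCH_PKEY_DISABLE_READ)) != 0;
}

/* Execution is blocked only by EXECUTE; ACCESS does not affect it.  */
static bool
execute_disabled (int restrictions)
{
  return (restrictions & SKETCH_PKEY_DISABLE_EXECUTE) != 0;
}
```

A comparison such as restrictions == SKETCH_PKEY_DISABLE_WRITE would
miss the redundant combinations described above, whereas the masked
tests give the same answer for all of them.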

[1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4

Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-20 11:30:58 +00:00
Andrew Pinski
e162ab2bf1 AArch64: Remove thunderx{,2} memcpy
ThunderX1 and ThunderX2 have been retired for a few years now.
So let's remove the thunderx{,2} specific versions of memcpy.
The performance gain from them was for medium and large sizes, which
the generic (aarch64) memcpy handles only slightly worse.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2024-11-20 11:23:53 +00:00
Joseph Myers
d899b48a30 Fix femode_t conditionals for arc and or1k
Two of the architecture bits/fenv.h headers define femode_t if
__GLIBC_USE (IEC_60559_BFP_EXT), instead of the correct condition
__GLIBC_USE (IEC_60559_BFP_EXT_C23) (both were added after commit
0175c9e9be, but were probably first
developed before it and then not updated to take account of its
changes).  This results in failures of the installed headers check for
fenv.h when building with GCC 15 (defaults to -std=gnu23 - we don't
yet have an installed-headers test specifically for C23 mode and don't
yet require a compiler with such a mode for building glibc) together
with a combination of options leaving C23 features enabled, since the
declarations of functions using femode_t use the correct conditions;
see
<https://sourceware.org/pipermail/libc-testresults/2024q4/013163.html>.
Fix the conditionals to get <fenv.h> to work correctly in C23 mode
again.

Tested with build-many-glibcs.py (arc-linux-gnu, arc-linux-gnuhf,
or1k-linux-gnu-hard, or1k-linux-gnu-soft).
2024-11-19 22:25:39 +00:00
Mahesh Bodapati
3ef7e42861 powerpc64le: Optimized strcat for POWER10
This patch adds an optimized strcat which makes use of the default
strcat function which calls the Power10 strcpy and strlen routines.
2024-11-19 15:59:15 -05:00
Peter Bergner
229265cc2c powerpc: Improve the inline asm for syscall wrappers
Update the inline asm syscall wrappers to match the newer register constraint
usage in INTERNAL_VSYSCALL_CALL_TYPE.  Use the faster mfocrf instruction when
available, rather than the slower mfcr microcoded instruction.
2024-11-19 12:43:57 -05:00
gfleury
7f045c0b48 htl: move pthread_attr_init into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
1a1cedd635 htl: move pthread_attr_setguardsize into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
f26b272a75 htl: move pthread_attr_setschedparam into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
32aa498ceb htl: move pthread_attr_setscope into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
4a8b7d7e62 htl: move pthread_attr_setstackaddr into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
d69a010e7b htl: move pthread_attr_setstacksize into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
330c1fad5b htl: move pthread_attr_getstack into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
1428ae39e8 htl: move pthread_attr_getstackaddr into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:37:35 +01:00
gfleury
993440a260 htl: move pthread_attr_getstacksize into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:34:34 +01:00
gfleury
4bcda927fe htl: move pthread_attr_getscope into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:19:00 +01:00
gfleury
6caf24c972 htl: move pthread_attr_getguardsize into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:18:59 +01:00
gfleury
f55cf584ff htl: move __pthread_default_attr into libc
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:08:27 +01:00
gfleury
736befab6c htl: move pthread_attr_destroy into libc.
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19 01:08:14 +01:00
Maciej W. Rozycki
ce13ab5033 stdio-common: Fix C23-ism in formatted output specifier tests [BZ #32360]
Nameless function parameters have only been added to ISO C with the C23
revision of the language standard.  Give names to the unused parameters
of the stub 'dladdr' implementation then so as to make compilation happy
with the earlier language definitions, fixing errors such as:

tst-printf-format-skeleton.c:374:9: error: parameter name omitted
  374 | dladdr (const void *, Dl_info *)

reported by older compilers.
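
A minimal sketch of the fix pattern.  The type sketch_info stands in for
Dl_info so that the example does not need <dlfcn.h>, and the parameter
names are assumptions, not necessarily the ones chosen in the patch.

```c
#include <stddef.h>

/* Stand-in for Dl_info, for illustration only.  */
typedef struct
{
  const char *fname;
} sketch_info;

/* Named but unused parameters are valid in every ISO C revision; only
   C23 permits omitting the names in a definition.  The casts to void
   silence unused-parameter warnings.  */
static int
stub_dladdr (const void *address, sketch_info *info)
{
  (void) address;
  (void) info;
  return 0;
}
```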

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-15 22:43:54 +00:00
Aurelien Jarno
6c915c73d0 elf: handle addition overflow in _dl_find_object_update_1 [BZ #32245]
The remaining_to_add variable can be 0 if (current_used + count) wraps.
This is caught by GCC 14+ on hppa, which determines from there that
target_seg could be NULL when remaining_to_add is zero, which in
turn causes a -Wstringop-overflow warning:

 In file included from ../include/atomic.h:49,
                  from dl-find_object.c:20:
 In function '_dlfo_update_init_seg',
     inlined from '_dl_find_object_update_1' at dl-find_object.c:689:30,
     inlined from '_dl_find_object_update' at dl-find_object.c:805:13:
 ../sysdeps/unix/sysv/linux/hppa/atomic-machine.h:44:4: error: '__atomic_store_4' writing 4 bytes into a region of size 0 overflows the destination [-Werror=stringop-overflow=]
    44 |    __atomic_store_n ((mem), (val), __ATOMIC_RELAXED);                        \
       |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 dl-find_object.c:644:3: note: in expansion of macro 'atomic_store_relaxed'
   644 |   atomic_store_relaxed (&seg->size, new_seg_size);
       |   ^~~~~~~~~~~~~~~~~~~~
 In function '_dl_find_object_update':
 cc1: note: destination object is likely at address zero

In practice, this is not possible as it represents counts of link maps.
Link maps have sizes larger than 1 byte, so the sum of any two link map
counts will always fit within a size_t without wrapping around.

This patch therefore adds a check on remaining_to_add == 0 and tells GCC
that this cannot happen using __builtin_unreachable.
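
The shape of such a check can be sketched as follows.  The function
total_to_add is hypothetical and much simpler than the real
_dl_find_object_update_1; __builtin_unreachable is a GCC and Clang
builtin, so the sketch assumes one of those compilers.

```c
#include <stddef.h>

/* Hypothetical function, not the glibc code.  The caller guarantees
   that the sum neither wraps nor is zero.  */
static size_t
total_to_add (size_t current_used, size_t count)
{
  size_t remaining_to_add = current_used + count;
  /* Tell the compiler that the zero case cannot occur, so it does not
     analyze (and warn about) code paths that depend on it.  Actually
     reaching this with a zero value is undefined behavior.  */
  if (remaining_to_add == 0)
    __builtin_unreachable ();
  return remaining_to_add;
}
```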

Thanks to Andreas Schwab for the investigation.

Closes: BZ #32245
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: John David Anglin <dave.anglin@bell.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-13 23:06:43 +01:00
Noah Goldstein
c510681a69 x86/string: Use movsl instead of movsd in strncpy/strncat [BZ #32344]
`ld`, starting at 2.40, emits a warning when using `movsd`. There is
no change to the actual code produced.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-13 10:09:30 -06:00
Jonathan Wakely
8d3fb43797 manual: Fix overeager s/int/size_t/ in memory.texi
The change in e3960d1c57 should only have
affected 'int' not 'internally'.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
2024-11-13 14:43:58 +00:00
John David Anglin
b919fe1f6d hppa: Update libm-test-ulps
Update imaginary part of csin.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2024-11-12 21:32:54 -05:00
Samuel Thibault
e5c2738f17 Revert "hurd: Stop depending on the default_pager stubs provided by gnumach"
This reverts commit f7f7dd8009.

default_pager is actually also used in e.g. xosview.
2024-11-13 01:34:09 +01:00
Adhemerval Zanella
461cab1de7 linux: Add support for getrandom vDSO
Linux 6.11 has getrandom() in vDSO. It operates on a thread-local opaque
state allocated with mmap using flags specified by the vDSO.

Multiple states are allocated at once, as many as fit into a page, and
these are held in an array of available states to be doled out to each
thread upon first use, and recycled when a thread terminates. As these
states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the
LSB of the opaque state address, falling back to the syscall if there's
reentrancy contention.
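
The guard idea can be sketched as follows.  This is an illustration
under assumptions, not the glibc implementation: the per-thread slot and
the names state_install, state_try_acquire and state_release are
hypothetical, and the state is assumed to be at least 2-byte aligned so
that bit 0 of its address is free to act as the "in use" flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread slot holding the state address; bit 0 is the
   guard.  Zero means no state has been assigned yet.  */
static _Thread_local atomic_uintptr_t state_slot;

static void
state_install (void *state)
{
  assert (((uintptr_t) state & 1) == 0);
  atomic_store_explicit (&state_slot, (uintptr_t) state,
                         memory_order_relaxed);
}

/* Return the state, or NULL if there is none or if it is already in use
   (for example by the code that a signal handler interrupted).  A NULL
   result is where a caller would fall back to the system call.  */
static void *
state_try_acquire (void)
{
  uintptr_t v = atomic_load_explicit (&state_slot, memory_order_relaxed);
  if (v == 0 || (v & 1) != 0)
    return NULL;
  atomic_store_explicit (&state_slot, v | 1, memory_order_relaxed);
  atomic_signal_fence (memory_order_seq_cst);
  return (void *) v;
}

/* Clear the guard bit again once the state is no longer being used.  */
static void
state_release (void *state)
{
  atomic_signal_fence (memory_order_seq_cst);
  atomic_store_explicit (&state_slot, (uintptr_t) state,
                         memory_order_relaxed);
}
```

The slot is only touched by its own thread and by signal handlers that
run on that thread, which is why signal fences are used here instead of
stronger inter-thread ordering.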

Also, _Fork() is handled by blocking signals on opaque state allocation
(so _Fork() always sees a consistent state even if it interrupts a
getrandom() call) and by iterating over the thread stack cache on
reclaim_stack. Each opaque state will be in the free states list
(grnd_alloc.states) or allocated to a running thread.

The cancellation is handled by always using GRND_NONBLOCK flags while
calling the vDSO, and falling back to the cancellable syscall if the
kernel returns EAGAIN (would block). Since getrandom is not defined by
POSIX and cancellation is supported as an extension, the cancellation is
handled as 'may occur' instead of 'shall occur' [1], meaning that if
vDSO does not block (the expected behavior) getrandom will not act as a
cancellation entrypoint. It avoids a pthread_testcancel call on the fast
path (different than 'shall occur' functions, like sem_wait()).

It is currently enabled for x86_64, which is available in Linux 6.11,
and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are
available in Linux 6.12.

Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1]
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64
Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64
Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64
Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
2024-11-12 14:42:12 -03:00
Siddhesh Poyarekar
b583b1080b io: Add setuid tests for faccessat
Add a new test tst-faccessat-setuid that iterates through real and
effective UID/GID combinations and tests the faccessat() interface with
the default and AT_EACCESS flags.
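
The two modes being tested can be sketched as follows.  The wrapper
names are hypothetical and this is not the test itself: with flags set
to 0 faccessat checks against the real IDs, as access does, and with
AT_EACCESS it checks against the effective IDs.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Check access using the real user and group IDs (default flags).  */
static bool
accessible_real (const char *path, int mode)
{
  return faccessat (AT_FDCWD, path, mode, 0) == 0;
}

/* Check access using the effective user and group IDs.  */
static bool
accessible_effective (const char *path, int mode)
{
  return faccessat (AT_FDCWD, path, mode, AT_EACCESS) == 0;
}
```

In a process that is not setuid or setgid both functions agree; they can
differ only when the real and effective IDs differ.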

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-12 10:19:58 -05:00
Siddhesh Poyarekar
ea75860813 tst-faccessat.c: Port to libsupport
Use libsupport convenience functions and macros instead of the old
test-skeleton.  Also add a new xdup() convenience wrapper function.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-12 10:19:58 -05:00
Siddhesh Poyarekar
04b1eb161f support: Add xdup
Add xdup as the error-checking version of dup for test cases.
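
A minimal sketch of what an error-checking dup wrapper looks like.  The
name checked_dup is a hypothetical stand-in, and the failure reporting
of the real libsupport function is not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for an error-checking dup: terminate the
   process with a diagnostic instead of returning -1.  */
static int
checked_dup (int fd)
{
  int copy = dup (fd);
  if (copy < 0)
    {
      perror ("dup");
      exit (EXIT_FAILURE);
    }
  return copy;
}
```

Test code can then use the returned descriptor directly, without
repeating the error check at every call site.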

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-12 10:19:58 -05:00