glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-23 01:33:36 +08:00

Author	SHA1	Message	Date
Joseph Myers	99671e72bb	Add multithreaded test of sem_getvalue Test coverage of sem_getvalue is fairly limited. Add a test that runs it on threads on each CPU. For this purpose I adapted tst-skeleton-thread-affinity.c; it didn't seem very suitable to use as-is or include directly in a different test doing things per-CPU, but did seem a suitable starting point (thus sharing tst-skeleton-affinity.c) for such testing. Tested for x86_64.	2024-11-22 16:58:51 +00:00
Adhemerval Zanella	bccb0648ea	math: Use tanf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic tanf. The code was adapted to glibc style, to use the definition of math_config.h, to remove errno handling, and to use a generic 128 bit routine for ABIs that do not support it natively. Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1): latency master patched improvement x86_64 82.3961 54.8052 33.49% x86_64v2 82.3415 54.8052 33.44% x86_64v3 69.3661 50.4864 27.22% i686 219.271 45.5396 79.23% aarch64 29.2127 19.1951 34.29% power10 19.5060 16.2760 16.56% reciprocal-throughput master patched improvement x86_64 28.3976 19.7334 30.51% x86_64v2 28.4568 19.7334 30.65% x86_64v3 21.1815 16.1811 23.61% i686 105.016 15.1426 85.58% aarch64 18.1573 10.7681 40.70% power10 8.7207 8.7097 0.13% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-22 10:52:27 -03:00
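The "improvement" column in these benchtest tables appears to be the relative reduction from the master figure to the patched figure. The following is a minimal sketch of that arithmetic, assuming lower numbers are better for both latency and reciprocal throughput; the function name `improvement_percent` is hypothetical and not part of glibc's benchtests.

```c
#include <assert.h>

/* Relative improvement of `patched` over `master`, in percent.
   Assumes both values are costs (lower is better), as in the
   latency and reciprocal-throughput tables of the commit message.  */
static double
improvement_percent (double master, double patched)
{
  assert (master > 0.0);
  return (master - patched) / master * 100.0;
}
```

For the x86_64 latency row of the tanf benchmark, `improvement_percent (82.3961, 54.8052)` rounds to the 33.49% shown in the table.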
Adhemerval Zanella	d846f4c12d	math: Use lgammaf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic lgammaf. The code was adapted to glibc style, to use the definition of math_config.h, to remove errno handling, to use math_narrow_eval on overflow usage, and to adapt to make it reentrant. Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1): latency master patched improvement x86_64 86.5609 70.3278 18.75% x86_64v2 78.3030 69.9709 10.64% x86_64v3 74.7470 59.8457 19.94% i686 387.355 229.761 40.68% aarch64 40.8341 33.7563 17.33% power10 26.5520 16.1672 39.11% powerpc 28.3145 17.0625 39.74% reciprocal-throughput master patched improvement x86_64 68.0461 48.3098 29.00% x86_64v2 55.3256 47.2476 14.60% x86_64v3 52.3015 38.9028 25.62% i686 340.848 195.707 42.58% aarch64 36.8000 30.5234 17.06% power10 20.4043 12.6268 38.12% powerpc 22.6588 13.8866 38.71% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-22 10:52:27 -03:00
Adhemerval Zanella	baa495f231	math: Use erfcf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic erfcf. The code was adapted to glibc style and to use the definition of math_config.h. Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1): latency master patched improvement x86_64 98.8796 66.2142 33.04% x86_64v2 98.9617 67.4221 31.87% x86_64v3 87.4161 53.1754 39.17% aarch64 33.8336 22.0781 34.75% power10 21.1750 13.5864 35.84% powerpc 21.4694 13.8149 35.65% reciprocal-throughput master patched improvement x86_64 48.5620 27.6731 43.01% x86_64v2 47.9497 28.3804 40.81% x86_64v3 42.0255 18.1355 56.85% aarch64 24.3938 13.4041 45.05% power10 10.4919 6.1881 41.02% powerpc 11.763 6.76468 42.49% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-22 10:52:27 -03:00
Adhemerval Zanella	994fec2397	math: Use erff from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic erff. The code was adapted to glibc style and to use the definition of math_config.h. Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1): latency master patched improvement x86_64 85.7363 45.1372 47.35% x86_64v2 86.6337 38.5816 55.47% x86_64v3 71.3810 34.0843 52.25% i686 190.143 97.5014 48.72% aarch64 34.9091 14.9320 57.23% power10 38.6160 8.5188 77.94% powerpc 39.7446 8.45781 78.72% reciprocal-throughput master patched improvement x86_64 35.1739 14.7603 58.04% x86_64v2 34.5976 11.2283 67.55% x86_64v3 27.3260 9.8550 63.94% i686 91.0282 30.8840 66.07% aarch64 22.5831 6.9615 69.17% power10 18.0386 3.0918 82.86% powerpc 20.7277 3.63396 82.47% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-22 10:52:27 -03:00
Adhemerval Zanella	c4c64ba5d1	math: Split s_erfF in erff and erfc So we can eventually replace each implementation. Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-22 10:52:26 -03:00
Adhemerval Zanella	c5d241f06b	math: Use cbrtf from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic cbrtf. The code was adapted to glibc style and to use the definition of math_config.h. Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1): latency master patched improvement x86_64 68.6348 36.8908 46.25% x86_64v2 67.3418 36.6968 45.51% x86_64v3 63.4981 32.7859 48.37% aarch64 29.3172 12.1496 58.56% power10 18.0845 8.8893 50.85% powerpc 18.0859 8.79527 51.37% reciprocal-throughput master patched improvement x86_64 36.4369 13.3565 63.34% x86_64v2 37.3611 13.1149 64.90% x86_64v3 31.6024 11.2102 64.53% aarch64 18.6866 7.3474 60.68% power10 9.4758 3.6329 61.66% powerpc 9.58896 3.90439 59.28% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-22 10:01:03 -03:00
Siddhesh Poyarekar	713d6d7e78	x86/string: Use `movsl` instead of `movsd` in strncat [BZ #32344 ] The previous patch missed strncat, so fixed that. Resolves: BZ #32344 Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2024-11-21 17:11:01 -05:00
Andrew Pinski	e6590f0c86	aarch64: Remove non-temporal load/stores from oryon-1's memset The hardware architects have a new recommendation not to use non-temporal load/stores for memset. This patch removes this path. I found there was no difference in the memset speed with/without non-temporal load/stores either. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-21 11:32:23 -03:00
Andrew Pinski	eb5eeb4740	aarch64: Remove non-temporal load/stores from oryon-1's memcpy The hardware architects have a new recommendation not to use non-temporal load/stores for memcpy. This patch removes this path. I found there was no difference in the memcpy speed with/without non-temporal load/stores either. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-21 11:32:17 -03:00
Sachin Monga	3051f3495c	powerpc64le: _init/_fini file changes for ROP The ROP instructions were added in ISA 3.1 (ie, Power10), however they were defined so that if executed on older cpus, they would behave as nops. This allows us to emit them on older cpus and they'd just be ignored, but if run on a Power10, then the binary would be ROP protected. Hash instructions use negative offsets so the default position of ROP pointer is FRAME_ROP_SAVE from caller's SP. Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve additional 16 bytes for ROP save slot and padding. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-11-20 16:50:34 -05:00
Yury Khrustalev	f4d00dd60d	AArch64: Add support for memory protection keys This patch adds support for memory protection keys on AArch64 systems with enabled Stage 1 permission overlays feature introduced in Armv8.9 / 9.4 (FEAT_S1POE) [1]. 1. Internal functions "pkey_read" and "pkey_write" to access data associated with memory protection keys. 2. Implementation of API functions "pkey_get" and "pkey_set" for the AArch64 target. 3. AArch64-specific PKEY flags for READ and EXECUTE (see below). 4. New target-specific test that checks behaviour of pkeys on AArch64 targets. 5. This patch also extends existing generic test for pkeys. 6. HWCAP constant for Permission Overlay Extension feature. To support more accurate mapping of underlying permissions to the PKEY flags, we introduce additional AArch64-specific flags. The full list of flags is: - PKEY_UNRESTRICTED: 0x0 (for completeness) - PKEY_DISABLE_ACCESS: 0x1 (existing flag) - PKEY_DISABLE_WRITE: 0x2 (existing flag) - PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific) - PKEY_DISABLE_READ: 0x8 (new flag, AArch64 specific) The problem here is that PKEY_DISABLE_ACCESS has unusual semantics as it overlaps with existing PKEY_DISABLE_WRITE and new PKEY_DISABLE_READ. For this reason mapping between permission bits RWX and "restrictions" bits awxr (a for disable access, etc) becomes complicated: - PKEY_DISABLE_ACCESS disables both R and W - PKEY_DISABLE_{WRITE,READ} disables W and R respectively - PKEY_DISABLE_EXECUTE disables X Combinations like the one below are accepted although they are redundant: - PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE Reverse mapping tries to retain backward compatibility and ORs PKEY_DISABLE_ACCESS whenever both flags PKEY_DISABLE_READ and PKEY_DISABLE_WRITE would be present. This will break code that compares pkey_get output with == instead of using bitwise operations. The latter is more correct since PKEY_* constants are essentially bit flags. It should be noted that PKEY_DISABLE_ACCESS does not prevent execution. [1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4 Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-20 11:30:58 +00:00
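The AArch64 pkeys message explains why callers should test `pkey_get` results with bitwise operations rather than `==`. The sketch below illustrates that point under stated assumptions: the flag values are copied from the commit message, but the `EX_`-prefixed names and both helper functions are hypothetical stand-ins written for this example, not glibc's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as listed in the commit message.  The EX_ prefix marks
   them as local to this sketch rather than the real <sys/mman.h> names.  */
enum
{
  EX_PKEY_UNRESTRICTED    = 0x0,
  EX_PKEY_DISABLE_ACCESS  = 0x1,
  EX_PKEY_DISABLE_WRITE   = 0x2,
  EX_PKEY_DISABLE_EXECUTE = 0x4,
  EX_PKEY_DISABLE_READ    = 0x8
};

/* Hypothetical reverse mapping from RWX permissions to restriction
   flags, following the rule described in the message: DISABLE_ACCESS
   is ORed in whenever both READ and WRITE are disabled.  */
static unsigned int
ex_restrictions_from_perms (bool r, bool w, bool x)
{
  unsigned int flags = EX_PKEY_UNRESTRICTED;
  if (!r)
    flags |= EX_PKEY_DISABLE_READ;
  if (!w)
    flags |= EX_PKEY_DISABLE_WRITE;
  if (!x)
    flags |= EX_PKEY_DISABLE_EXECUTE;
  if (!r && !w)
    flags |= EX_PKEY_DISABLE_ACCESS;
  return flags;
}

/* Bitwise test: writes are blocked if either ACCESS or WRITE is set.  */
static bool
ex_write_disabled (unsigned int flags)
{
  assert ((flags & ~0xFu) == 0);   /* Only the four known bits.  */
  return (flags & (EX_PKEY_DISABLE_ACCESS | EX_PKEY_DISABLE_WRITE)) != 0;
}
```

With read and write both disabled, the sketch returns several bits at once, so a comparison such as `flags == EX_PKEY_DISABLE_ACCESS` is false even though access is disabled, while `ex_write_disabled (flags)` still reports the restriction.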
Andrew Pinski	e162ab2bf1	AArch64: Remove thunderx{,2} memcpy ThunderX1 and ThunderX2 have been retired for a few years now. So let's remove the thunderx{,2} specific versions of memcpy. The performance gain of them was for medium and large sizes, which the generic (aarch64) memcpy will handle just slightly worse. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-11-20 11:23:53 +00:00
Joseph Myers	d899b48a30	Fix femode_t conditionals for arc and or1k Two of the architecture bits/fenv.h headers define femode_t if __GLIBC_USE (IEC_60559_BFP_EXT), instead of the correct condition __GLIBC_USE (IEC_60559_BFP_EXT_C23) (both were added after commit `0175c9e9be`, but were probably first developed before it and then not updated to take account of its changes). This results in failures of the installed headers check for fenv.h when building with GCC 15 (defaults to -std=gnu23 - we don't yet have an installed-headers test specifically for C23 mode and don't yet require a compiler with such a mode for building glibc) together with a combination of options leaving C23 features enabled, since the declarations of functions using femode_t use the correct conditions; see <https://sourceware.org/pipermail/libc-testresults/2024q4/013163.html>. Fix the conditionals to get <fenv.h> to work correctly in C23 mode again. Tested with build-many-glibcs.py (arc-linux-gnu, arc-linux-gnuhf, or1k-linux-gnu-hard, or1k-linux-gnu-soft).	2024-11-19 22:25:39 +00:00
Mahesh Bodapati	3ef7e42861	powerpc64le: Optimized strcat for POWER10 This patch adds an optimized strcat which makes use of the default strcat function which calls the Power10 strcpy and strlen routines.	2024-11-19 15:59:15 -05:00
Peter Bergner	229265cc2c	powerpc: Improve the inline asm for syscall wrappers Update the inline asm syscall wrappers to match the newer register constraint usage in INTERNAL_VSYSCALL_CALL_TYPE. Use the faster mfocrf instruction when available, rather than the slower mfcr microcoded instruction.	2024-11-19 12:43:57 -05:00
gfleury	7f045c0b48	htl: move pthread_attr_init into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	1a1cedd635	htl: move pthread_attr_setguardsize into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	f26b272a75	htl: move pthread_attr_setschedparam into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	32aa498ceb	htl: move pthread_attr_setscope into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	4a8b7d7e62	htl: move pthread_attr_setstackaddr into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	d69a010e7b	htl: move pthread_attr_setstacksize into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	330c1fad5b	htl: move pthread_attr_getstack into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	1428ae39e8	htl: move pthread_attr_getstackaddr into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:37:35 +01:00
gfleury	993440a260	htl: move pthread_attr_getstacksize into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:34:34 +01:00
gfleury	4bcda927fe	htl: move pthread_attr_getscope into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:19:00 +01:00
gfleury	6caf24c972	htl: move pthread_attr_getguardsize into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:18:59 +01:00
gfleury	f55cf584ff	htl: move __pthread_default_attr into libc Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:08:27 +01:00
gfleury	736befab6c	htl: move pthread_attr_destroy into libc. Signed-off-by: gfleury <gfleury@disroot.org>	2024-11-19 01:08:14 +01:00
Noah Goldstein	c510681a69	x86/string: Use `movsl` instead of `movsd` in strncpy/strncat [BZ #32344 ] `ld`, starting at 2.40, emits a warning when using `movsd`. There is no change to the actual code produced. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-11-13 10:09:30 -06:00
John David Anglin	b919fe1f6d	hppa: Update libm-test-ulps Update imaginary part of csin. Signed-off-by: John David Anglin <dave.anglin@bell.net>	2024-11-12 21:32:54 -05:00
Samuel Thibault	e5c2738f17	Revert "hurd: Stop depending on the default_pager stubs provided by gnumach" This reverts commit `f7f7dd8009`. default_pager is actually also used in e.g. xosview.	2024-11-13 01:34:09 +01:00
Adhemerval Zanella	461cab1de7	linux: Add support for getrandom vDSO Linux 6.11 has getrandom() in vDSO. It operates on a thread-local opaque state allocated with mmap using flags specified by the vDSO. Multiple states are allocated at once, as many as fit into a page, and these are held in an array of available states to be doled out to each thread upon first use, and recycled when a thread terminates. As these states run low, more are allocated. To make this procedure async-signal-safe, a simple guard is used in the LSB of the opaque state address, falling back to the syscall if there's reentrancy contention. Also, _Fork() is handled by blocking signals on opaque state allocation (so _Fork() always sees a consistent state even if it interrupts a getrandom() call) and by iterating over the thread stack cache on reclaim_stack. Each opaque state will be in the free states list (grnd_alloc.states) or allocated to a running thread. The cancellation is handled by always using GRND_NONBLOCK flags while calling the vDSO, and falling back to the cancellable syscall if the kernel returns EAGAIN (would block). Since getrandom is not defined by POSIX and cancellation is supported as an extension, the cancellation is handled as 'may occur' instead of 'shall occur' [1], meaning that if vDSO does not block (the expected behavior) getrandom will not act as a cancellation entrypoint. It avoids a pthread_testcancel call on the fast path (different than 'shall occur' functions, like sem_wait()). It is currently enabled for x86_64, which is available in Linux 6.11, and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are available in Linux 6.12. Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1] Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64 Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64 Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64 Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x	2024-11-12 14:42:12 -03:00
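The getrandom vDSO message describes a two-step pattern: try a non-blocking call first, then fall back to the blocking path only when the kernel reports EAGAIN. The sketch below is an application-level analogue of that pattern using the public `getrandom` interface from `<sys/random.h>`; it is not glibc's internal vDSO code, and the helper name `ex_fill_random` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill BUF with LEN random bytes.  First try without blocking; if the
   kernel says the call would block (EAGAIN), retry with the blocking
   variant.  Returns the number of bytes written, or -1 with errno set.  */
static ssize_t
ex_fill_random (void *buf, size_t len)
{
  assert (buf != NULL);
  ssize_t n = getrandom (buf, len, GRND_NONBLOCK);
  if (n < 0 && errno == EAGAIN)
    n = getrandom (buf, len, 0);   /* Slow path: may block.  */
  return n;
}
```

A caller would typically check that the return value equals the requested length, since short reads are possible for large requests.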
caiyinyu	ab4388f91c	LoongArch: Update ulps Needed for test-float-cacosh, test-float-csin, test-float32-cacosh and test-float32-csin. Signed-off-by: caiyinyu <caiyinyu@loongson.cn> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-11-12 09:19:23 +08:00
Samuel Thibault	d2e65aa7d6	mach: Fix __xpg_strerror_r on in-range but undefined errors [BZ #32350 ] For instance, 1073741906 leads to system 16, subsystem 0 and code 82, which is in range (max_code is 122), but not defined. Return EINVAL in that case, like	2024-11-09 20:00:40 +01:00
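The decomposition of 1073741906 into system 16, subsystem 0 and code 82 can be reproduced with shifts and masks. This is a minimal sketch, assuming the usual Mach error layout of a 6-bit system field, a 12-bit subsystem field and a 14-bit code field; the `ex_` helper names are hypothetical and stand in for the corresponding Mach macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (high to low bits): 6-bit system, 12-bit subsystem,
   14-bit code.  Helper names are local to this sketch.  */
static unsigned int
ex_err_system (uint32_t err)
{
  return (err >> 26) & 0x3f;
}

static unsigned int
ex_err_sub (uint32_t err)
{
  return (err >> 14) & 0xfff;
}

static unsigned int
ex_err_code (uint32_t err)
{
  return err & 0x3fff;
}
```

Since 1073741906 is 0x40000052, the top field is 16, the middle field is 0, and the low field is 0x52, which is 82.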
Noah Goldstein	6754b5becf	x86/string: Use `movsl` instead of `movsd` [BZ #32344 ] `ld`, starting at 2.40, emits a warning when using `movsd`. There is no change to the actual code produced. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-11-08 17:23:05 -06:00
Joseph Myers	c7dcf594f4	Rename new tst-sem17 test to tst-sem18 As noted by Adhemerval, we already have a tst-sem17 in nptl. Tested for x86_64.	2024-11-08 17:08:09 +00:00
Joseph Myers	f745d78e26	Avoid uninitialized result in sem_open when file does not exist A static analyzer apparently reported an uninitialized use of the variable result in sem_open in the case where the file is required to exist but does not exist. The report appears to be correct; set result to SEM_FAILED in that case, and add a test for it. Note: the test passes for me even without the sem_open fix, I guess because result happens to get value SEM_FAILED (i.e. 0) when uninitialized. Tested for x86_64.	2024-11-08 01:53:48 +00:00
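The case fixed here is `sem_open` on a semaphore that is required to exist (no `O_CREAT`) but does not. A minimal sketch of how a caller might probe that case follows; the helper name `ex_open_missing_fails` is hypothetical, and the sketch only checks for `SEM_FAILED` rather than a specific errno value.

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to open NAME without O_CREAT.  Returns true if sem_open reports
   failure via SEM_FAILED, which is the expected result when no such
   semaphore exists.  */
static bool
ex_open_missing_fails (const char *name)
{
  assert (name != NULL && name[0] == '/');
  sem_t *s = sem_open (name, 0);
  if (s != SEM_FAILED)
    {
      sem_close (s);   /* It existed after all; release it.  */
      return false;
    }
  return true;
}
```

On older glibc versions this may need to be linked with `-pthread`.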
Michael Jeanson	97f60abd25	nptl: initialize rseq area prior to registration Per the rseq syscall documentation, 3 fields are required to be initialized by userspace prior to registration, they are 'cpu_id', 'rseq_cs' and 'flags'. Since we have no guarantee that 'struct pthread' is cleared on all architectures, explicitly set those 3 fields prior to registration. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-11-07 22:23:49 +01:00
Mark Wielaard	c18de3b76a	s390x: Update ulps Needed for test-float-cacosh, test-float-csin, test-float32-cacosh and test-float32-csin. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-11-07 20:58:05 +01:00
Adhemerval Zanella	12b8dd7718	math: Fix log10f on some ABIs The commit `9247f53219` triggered some regressions on loongarch and riscv: math/test-float-log10 math/test-float32-log10 And it is due to a wrong sync with CORE-MATH for special 0.0/-0.0 inputs. Checked on aarch64-linux-gnu and loongarch64-linux-gnu-lp64d.	2024-11-07 07:59:43 -03:00
caiyinyu	1b70a0a024	nptl: fix __builtin_thread_pointer detection on LoongArch Signed-off-by: caiyinyu <caiyinyu@loongson.cn>	2024-11-07 14:08:30 +08:00
Florian Weimer	ba60be8735	math: Fix incorrect results of exp10m1f with some GCC versions On GCC 11 (x86-64), the previous code produced test failures like this one: Failure: Test: exp10m1_towardzero (-0x1.1p+4) Result: is: -1.00000000e+00 -0x1.000000p+0 should be: -9.99999940e-01 -0x1.fffffep-1 difference: 5.96046447e-08 0x1.000000p-24 ulp : 1.0000 max.ulp : 0.0000 Apply a similar fix to exp2m1f. Co-authored-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-06 16:09:05 +01:00
Yury Khrustalev	ff254cabd6	misc: Align argument name for pkey_*() functions with the manual Change name of the access_rights argument to access_restrictions of the following functions: - pkey_alloc() - pkey_set() as this argument refers to access restrictions rather than access rights and previous name might have been misleading. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-06 13:11:33 +00:00
Florian Weimer	f2326c2ec0	elf: Introduce _dl_relocate_object_no_relro And make _dl_protect_relro apply RELRO conditionally. Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-06 10:33:44 +01:00
Aurelien Jarno	273694cd78	Add Arm HWCAP2_* constants from Linux 3.15 and 6.2 to <bits/hwcap.h> Linux 3.15 and 6.2 added HWCAP2_* values for Arm. These bits have already been added to dl-procinfo.{c,h} in commits `9aea0cb842` and `8ebe9c0b38`. Also add them to <bits/hwcap.h> so that they can be used in user code. For example, for checking bits in the value returned by getauxval(AT_HWCAP2). Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>	2024-11-05 21:03:37 +01:00
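The message mentions checking bits in the value returned by `getauxval(AT_HWCAP2)`. A minimal sketch of such a check is shown below; the helper name `ex_has_hwcap2` is hypothetical, and the mask is left as a parameter because the available `HWCAP2_*` constants depend on the target architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>

/* Return true if every bit in MASK is set in the AT_HWCAP2 auxiliary
   vector entry of the running process.  */
static bool
ex_has_hwcap2 (unsigned long mask)
{
  assert (mask != 0);
  return (getauxval (AT_HWCAP2) & mask) == mask;
}
```

On Arm, a caller would pass one of the `HWCAP2_*` constants from `<bits/hwcap.h>` (pulled in through `<sys/auxv.h>`) as the mask.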
Joe Ramsay	2d82d781a5	AArch64: Remove SVE erf and erfc tables By using a combination of mask-and-add instead of the shift-based index calculation the routines can share the same table as other variants with no performance degradation. The tables change name because of other changes in downstream AOR. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-11-01 16:10:41 +00:00
Adhemerval Zanella	6d477b8de8	x86_64: Add exp2m1f with FMA The CORE-MATH exp2m1f implementation showed slightly worse latency when using x86_64 baseline ABI. This patch adds an ifunc variant with similar performance for x86_64-v3. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-01 11:27:40 -03:00
Adhemerval Zanella	c28f8d7f19	x86_64: Add exp10m1f with FMA The CORE-MATH exp10m1f implementation showed slightly worse latency when using x86_64 baseline ABI. This patch adds an ifunc variant with similar performance for x86_64-v3. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-01 11:27:40 -03:00
Adhemerval Zanella	f338c7c5f5	math: Use log10p1f from CORE-MATH The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows slightly better performance than the generic log10p1f. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1): Latency master patched improvement x86_64 68.5251 32.2627 52.92% x86_64v2 68.8912 32.7887 52.41% x86_64v3 59.3427 27.0521 54.41% i686 162.026 103.383 36.19% aarch64 26.8513 14.5695 45.74% power10 12.7426 8.4929 33.35% powerpc 16.6768 9.29135 44.29% reciprocal-throughput master patched improvement x86_64 26.0969 12.4023 52.48% x86_64v2 25.0045 11.0748 55.71% x86_64v3 20.5610 10.2995 49.91% i686 89.8842 78.5211 12.64% aarch64 17.1200 9.4832 44.61% power10 6.7814 6.4258 5.24% powerpc 15.769 7.6825 51.28% Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2024-11-01 11:27:40 -03:00

16456 Commits