Non-gcc compilers (clang and possibly other compilers that do not
masquerade as gcc 5.0 or later) are unable to use
__warn_memset_zero_len since the symbol is no longer available on
glibc built with gcc 5.0 or later. While it was likely an oversight
that caused this omission, the fact that it wasn't noticed until
recently (when clang closed the gap on _FORTIFY_SUPPORT) that the
symbol was missing.
Given that both gcc and clang are capable of doing this check in the
compiler, drop all remaining signs of __warn_memset_zero_len from
glibc so that no more objects are built with this symbol in future.
(cherry-picked from dc274b1416)
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)
In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.
Fixes bug 26798.
(cherry picked from commit 558251bd87)
Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX(Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.
Backport of commit 59803e81f9 upstream,
with the fix from cb3a749a22 ("x86:
Restore processing of cache size tunables in init_cacheinfo") applied.
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Co-Authored-by: Florian Weimer <fweimer@redhat.com>
The __x86_shared_non_temporal_threshold determines when memcpy on x86
uses non_temporal stores to avoid pushing other data out of the last
level cache.
This patch proposes to revert the calculation change made by H.J. Lu's
patch of June 2, 2017.
H.J. Lu's patch selected a threshold suitable for a single thread
getting maximum performance. It was tuned using the single threaded
large memcpy micro benchmark on an 8 core processor. The last change
changes the threshold from using 3/4 of one thread's share of the
cache to using 3/4 of the entire cache of a multi-threaded system
before switching to non-temporal stores. Multi-threaded systems with
more than a few threads are server-class and typically have many
active threads. If one thread consumes 3/4 of the available cache for
all threads, it will cause other active threads to have data removed
from the cache. Two examples show the range of the effect. John
McCalpin's widely parallel Stream benchmark, which runs in parallel
and fetches data sequentially, saw a 20% slowdown with this patch on
an internal system test of 128 threads. This regression was discovered
when comparing OL8 performance to OL7. An example that compares
normal stores to non-temporal stores may be found at
https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test
shows performance loss of 400 to 500% due to a failure to use
nontemporal stores. These performance losses are most likely to occur
when the system load is heaviest and good performance is critical.
The tunable x86_non_temporal_threshold can be used to override the
default for the knowledgable user who really wants maximum cache
allocation to a single thread in a multi-threaded system.
The manual entry for the tunable has been expanded to provide
more information about its purpose.
modified: sysdeps/x86/cacheinfo.c
modified: manual/tunables.texi
(cherry picked from commit d3c5702747)
Both commands are Linux extensions where the third argument is either
a 'struct shminfo' (IPC_INFO) or a 'struct shm_info' (SHM_INFO) instead
of 'struct shmid_ds'. And their information does not contain any time
related fields, so there is no need to extra conversion for __IPC_TIME64.
The regression testcase checks for Linux specifix SysV ipc message
control extension. For SHM_INFO it tries to match the values against the
tunable /proc values and for MSG_STAT/MSG_STAT_ANY it check if the create\
shared memory is within the global list returned by the kernel.
Checked on x86_64-linux-gnu and on i686-linux-gnu (Linux v5.4 and on
Linux v4.15).
(cherry picked from commit a49d7fd4f7)
Both commands are Linux extensions where the third argument is a
'struct msginfo' instead of 'struct msqid_ds' and its information
does not contain any time related fields (so there is no need to
extra conversion for __IPC_TIME64.
The regression testcase checks for Linux specifix SysV ipc message
control extension. For IPC_INFO/MSG_INFO it tries to match the values
against the tunable /proc values and for MSG_STAT/MSG_STAT_ANY it
check if the create message queue is within the global list returned
by the kernel.
Checked on x86_64-linux-gnu and on i686-linux-gnu (Linux v5.4 and on
Linux v4.15).
(cherry picked from commit 20a00dbefc)
Handle SEM_STAT_ANY the same way as SEM_STAT so that the buffer argument
of SEM_STAT_ANY is properly passed to the kernel and back.
The regression testcase checks for Linux specifix SysV ipc message
control extension. For IPC_INFO/SEM_INFO it tries to match the values
against the tunable /proc values and for SEM_STAT/SEM_STAT_ANY it
check if the create message queue is within the global list returned
by the kernel.
Checked on x86_64-linux-gnu and on i686-linux-gnu (Linux v5.4 and on
Linux v4.15).
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 574500a108)
Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as
the memcpy/memmove ifunc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit e11ed9d2b4)
On some microarchitectures performance of the backwards memmove improves if
the stores use STR with decreasing addresses. So change the memmove loop
in memcpy_advsimd.S to use 2x STR rather than STP.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit bd394d131c)
The RELEASE macro was accidentaly set to "release" instead of
the expected "stable" by the release manager. This is a mistake
that leads to the build using "-g -O1" instead of "-g -O2" if
configure was executed with "CFLAGS=" (CFLAGS set but empty).
It returns the string of the error constant, not its description (as
strerrordesc_np). To handle the Hurd error mapping, the ERR_MAP was
removed from errlist.h to errlist.c.
Also, the testcase test-strerr (added on 325081b9eb) was not added
on the check build neither it builds correctly. This patch also
changed it to decouple from errlist.h, the expected return values
are added explicitly for both both strerrorname_np and strerrordesc_np
directly.
Checked on x86_64-linux-gnu and i686-linux-gnu. I also run a make
check for i686-gnu.
(cherry picked from commit cef95fdc2e)
Commit 91927b7c76 (Rewrite iconv option parsing [BZ #19519]) did not
handle cases where the output codeset for translations (via the `gettext'
family of functions) might have a caller specified encoding suffix such as
TRANSLIT or IGNORE. This led to a regression where translations did not
work when the codeset had a suffix.
This commit fixes the above issue by parsing any suffixes passed to
__dcigettext and adds two new test-cases to intl/tst-codeset.c to
verify correct behaviour. The iconv-internal function __gconv_create_spec
and the static iconv-internal function gconv_destroy_spec are now visible
internally within glibc and used in intl/dcigettext.c.
(cherry picked from commit 7d4ec75e11)
A typo in commit 107e6a3c22 causes the
FMA4 code path to be taken on systems that support FMA, even if they do
not support FMA4. Fix this to detect FMA4.
(cherry picked from commit 23af890b3f)
After some discussions it seems the original news was not clear
and that it is valid to manually pass the branch protection flags
iff GCC target libs are built with them too. The main difference
between manually passing the flags and using the configure
option is that the latter also makes branch protection the
default in GCC which may not be desirable in some cases.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
__GLRO loaded the word after the requested variable on big-endian
PowerPC, where LOWORD is 4. This can cause the memset implement
go wrong because the masking with the cache line size produces
wrong results, particularly if the loaded value happens to be 1.
The __GLRO macro is not used in any place where loading the lower
32-bit word of a 64-bit value is desired, so the +4 offset is always
wrong.
Fixes commit 18363b4f01
("powerpc: Move cache line size to rtld_global_ro") and bug 26332.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It was fixed in commit d937694059
("Fix array overflow in backtrace on PowerPC (bug 25423)"), which
went into glibc 2.31.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Storing user databases in DNS, without client-side DNSSEC validation,
is problematic from a security point of view.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
nptl has
/* Opcodes and data types for communication with the signal handler to
change user/group IDs. */
struct xid_command
{
int syscall_no;
long int id[3];
volatile int cntr;
volatile int error;
};
/* This must be last, otherwise the current thread might not have
permissions to send SIGSETXID syscall to the other threads. */
result = INTERNAL_SYSCALL_NCS (cmdp->syscall_no, 3,
cmdp->id[0], cmdp->id[1], cmdp->id[2]);
But the second argument of setgroups syscal is a pointer:
int setgroups (size_t size, const gid_t *list);
But on x32, pointers passed to syscall must have pointer type so that
they will be zero-extended. The kernel XID arguments are unsigned and
do not require sign extension. Change xid_command to
struct xid_command
{
int syscall_no;
unsigned long int id[3];
volatile int cntr;
volatile int error;
};
so that all arguments are zero-extended. A testcase is added for x32 and
setgroups returned with EFAULT when running as root without the fix.
Make glibc MTE-safe on systems where MTE is available. This allows
using heap tagging with an LD_PRELOADed malloc implementation that
enables MTE. We don't document this as guaranteed contract yet, so
glibc may not be MTE safe when HWCAP2_MTE is set (older glibcs
certainly aren't). This is mainly for testing and debugging.
The HWCAP flag is not exposed in public headers until Linux adds it
to its uapi. The HWCAP value reservation will be in Linux 5.9.
Use PROT_READ and PROT_WRITE according to the load segment p_flags
when adding PROT_BTI.
This is before processing relocations which may drop PROT_BTI in
case of textrels. Executable stacks are not protected via PROT_BTI
either. PROT_BTI is hardening in case memory corruption happened,
it's value is reduced if there is writable and executable memory
available so missing it on such memory is fine, but we should
respect the p_flags and should not drop PROT_WRITE.
The SELinux API deprecated several symbols in its 3.1 release, including
security_context_t, matchpathcon, avc_init, and sidput, which are used in
makedb and nscd. While the usage of these should eventually be replaced by
newer interfaces, this commit disables GCC warnings due to the use of the
above symbols.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Add a line that was missing from a previous commit.
Without increasing str, the null-byte is not validated, and
_dl_string_platform returns -1.
Fixes: d2ba3677da ("powerpc: Add support for POWER10")
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Upstream GCC 11 development is now building the ibm128 runtime
support (in libgcc) without a .gnu.attributes section on ppc64le.
Ensure we have one to replace by building one ibm128 file in
libc and libm with attributes.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
__nss_readline supersedes it. This reverts part of commit
3f5e3f5d06 ("libio: Implement
internal function __libc_readline_unlocked"). The internal
aliases __fseeko64 and __ftello64 are preserved because
they are needed by __nss_readline as well.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
And helper functions __nss_readline, __nss_readline_seek,
__nss_parse_line_result.
This consolidates common code for handling overlong lines and
parse files. Use the new functionality in internal_getent
in nss/nss_files/files-XXX.c.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
As a result, all parse_line functions have the same prototype, except
for that producing struct hostent. This change is ABI-compatible, so
it does not alter the internal GLIBC_PRIVATE ABI (otherwise we should
probably have renamed the exported functions).
A future change will use this to implement a generict fget*ent_r
function.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
These functions should eventually have the same type, so it makes
sense to declare them together.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This avoids crashes in case the files are truncated for some reason.
For typically file sizes, it is also going to be slightly faster.
Using __nss_files_fopen instead mirrors what nss_files does.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>