Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
<sse2p4_1>_pinsr<ssemodesuffix> insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.
Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.
PR target/117116
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr117116.c: New test.
(cherry picked from commit 80d7032067)
Hi,
this patch fixes wrong code in case store-merging introduces load of function
parameter that was previously write-only (which happens for bitfields).
Without this, the whole store-merged area is consdered to be killed.
PR ipa/111613
gcc/ChangeLog:
* ipa-modref.cc (analyze_parms): Do not preserve EAF_NO_DIRECT_READ and
EAF_NO_INDIRECT_READ from past flags.
gcc/testsuite/ChangeLog:
* gcc.c-torture/pr111613.c: New test.
(cherry picked from commit 1407477335)
As shown in somewhat convoluted testcase, ipa-modref is mistreating
ECF_NOVOPS as "having no side effects". This come from time when
modref cared only about memory accesses and thus it was possible to
shortcut on it.
This patch removes (hopefully) all those bad shortcuts.
Bootstrapped/regtested x86_64-linux, comitted.
gcc/ChangeLog:
PR ipa/109985
* ipa-modref.cc (modref_summary::useful_p): Fix handling of ECF_NOVOPS.
(modref_access_analysis::process_fnspec): Likevise.
(modref_access_analysis::analyze_call): Likevise.
(propagate_unknown_call): Likevise.
(modref_propagate_in_scc): Likevise.
(modref_propagate_flags_in_scc): Likewise.
(ipa_merge_modref_summary_after_inlining): Likewise.
(cherry picked from commit efcbe7b985)
TARGET_MEM_REF can be used to offset constant base into a memory object (to
produce lea instruction). This confuses points_to_local_or_readonly_memory_p
which treats the constant address as a base of the access.
Bootstrapped/regtsted x86_64-linux, comitted.
Honza
gcc/ChangeLog:
PR ipa/113787
* ipa-fnsummary.cc (points_to_local_or_readonly_memory_p): Do not
look into TARGET_MEM_REFS with constant opreand 0.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr113787.c: New test.
(cherry picked from commit 96d53252ae)
modref_eaf_analysis::analyze_ssa_name misinterprets EAF flags. If dereferenced
parameter is passed (to map_iterator in the testcase) it can be returned
indirectly which in turn makes it to escape into the next function call.
PR ipa/115033
gcc/ChangeLog:
* ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Fix checking of
EAF flags when analysing values dereferenced as function parameters.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr115033.c: New test.
(cherry picked from commit cf8ffc58aa)
unadjusted_ptr_and_unit_offset accidentally throws away the offset computed by
get_addr_base_and_unit_offset. Instead of passing extra_offset it passes offset.
PR ipa/114207
gcc/ChangeLog:
* ipa-prop.cc (unadjusted_ptr_and_unit_offset): Fix accounting of offsets in ADDR_EXPR.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr114207.c: New test.
(cherry picked from commit 391f46f10b)
This patch removes a buggy special case in irange::invert which seems
to have been broken for a while, and probably never triggered because
the legacy code was handled elsewhere, and the non-legacy code was
using an int_range_max of int_range<255> which made it extremely
likely for num_ranges == 255. However, with auto-resizing ranges,
int_range_max will start off at 3 and can hit this bogus code in the
unswitching code.
PR tree-optimization/109934
gcc/ChangeLog:
* value-range.cc (irange::invert): Remove buggy special case.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr109934.c: New test.
(cherry picked from commit 8d5f050dab)
In some cases, a legal type conversion in a generic package is correctly
accepted but the corresponding type conversion in an instance of the generic
is incorrectly rejected.
gcc/ada/
PR ada/114593
* sem_res.adb (Valid_Conversion): Test In_Instance instead of
In_Instance_Body.
For H8/300 with -msx -mn -mint32 the type of (_M_len - __pos) is int,
because int is wider than size_t so the operands are promoted.
libstdc++-v3/ChangeLog:
* include/std/string_view (basic_string_view::copy) Use explicit
template argument for call to std::min<size_t>.
(basic_string_view::substr): Likewise.
This fixes a warning from one of the test allocators:
warning: base class 'class std::allocator<__gnu_test::copy_tracker>' should be explicitly initialized in the copy constructor [-Wextra]
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_allocator.h (tracker_allocator):
Initialize base class in copy constructor.
(cherry picked from commit e2fb245b07)
Although POSIX requires ELOOP, FreeBSD documents that openat with
O_NOFOLLOW returns EMLINK if the last component of a filename is a
symbolic link. Check for EMLINK as well as ELOOP, so that the TOCTTOU
mitigation in remove_all works correctly.
See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214633 or the
FreeBSD man page for reference.
According to its man page, DragonFlyBSD also uses EMLINK for this error,
and NetBSD uses its own EFTYPE. OpenBSD follows POSIX and uses EMLINK.
This fixes these failures on FreeBSD:
FAIL: 27_io/filesystem/operations/remove_all.cc -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++17 execution test
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (remove_all) [__FreeBSD__ || __DragonFly__]:
Check for EMLINK as well as ELOOP.
[__NetBSD__]: Check for EFTYPE as well as ELOOP.
The shift operations for dynamic_bitset fail to zero out words where the
non-zero bits were shifted to a completely different word.
For a right shift we don't need to sanitize the unused bits in the high
word, because we know they were already clear and a right shift doesn't
change that.
libstdc++-v3/ChangeLog:
PR libstdc++/115399
* include/tr2/dynamic_bitset (operator>>=): Remove redundant
call to _M_do_sanitize.
* include/tr2/dynamic_bitset.tcc (_M_do_left_shift): Zero out
low bits in words that should no longer be populated.
(_M_do_right_shift): Likewise for high bits.
* testsuite/tr2/dynamic_bitset/pr115399.cc: New test.
(cherry picked from commit bd3a312728)
There is no file ext/type_traits, point it to ext/type_traits.h instead.
libstdc++-v3/ChangeLog:
* include/bits/cpp_type_traits.h: Improve doxygen file docs.
(cherry picked from commit f6ed7a61a7)
A few of these files self-identified as ext/random.tcc, update to use
the actual basename.
libstdc++-v3/ChangeLog:
* config/cpu/aarch64/opt/ext/opt_random.h: Improve doxygen file
docs.
* config/cpu/i486/opt/ext/opt_random.h: Likewise.
(cherry picked from commit c2ad7b2d52)
We should not use [[unlikely]] before C++20, so use [[__unlikely__]]
instead.
libstdc++-v3/ChangeLog:
* include/std/variant (_Variant_storage::_M_reset): Use
__unlikely__ form of attribute instead of unlikely.
(cherry picked from commit 9f1cd51766)
I misused the AC_CHECK_DECL macro, assuming that it behaved like
AC_CHECK_DECLS and always defined a HAVE_xxx macro if the decl was
found. Instead, the [action-if-found] shell commands are needed to
defined HAVE_O_NONBLOCK explicitly.
libstdc++-v3/ChangeLog:
* configure.ac: Fix check for O_NONBLOCK.
* config.h.in: Regenerate.
* configure: Regenerate.
(cherry picked from commit b68561dd79)
The changes to implement LWG 2579 (r10-327-gdb33efde17932f) made
std::string::assign use the propagate_on_container_copy_assignment
(POCCA) trait, for consistency with operator=(const basic_string&).
However, this also unintentionally affected operator=(basic_string&&)
which calls assign(str) to make a deep copy when performing a move is
not possible. The fix is for the move assignment operator to call
_M_assign(str) instead of assign(str), as this just does the deep copy
and doesn't check the POCCA trait first.
The bug only affects the unlikely/useless combination of POCCA==true and
POCMA==false, but we should fix it for correctness anyway. it should
also make move assignment slightly cheaper to compile and execute,
because we skip the extra code in assign(const basic_string&).
libstdc++-v3/ChangeLog:
PR libstdc++/116641
* include/bits/basic_string.h (operator=(basic_string&&)): Call
_M_assign instead of assign.
* testsuite/21_strings/basic_string/allocator/116641.cc: New
test.
(cherry picked from commit c07cf418fd)
When the library is configured with --disable-libstdcxx-verbose the
assertions just abort instead of calling __glibcxx_assert_fail, and so I
didn't export that function for the non-verbose build. However, that
option is documented to not change the library ABI, so we still need to
export the symbol from the library. It could be needed by programs
compiled against the headers from a verbose build.
The non-verbose definition can just call abort so that it doesn't pull
in I/O symbols, which are unwanted in a non-verbose build.
libstdc++-v3/ChangeLog:
PR libstdc++/115585
* src/c++11/assert_fail.cc (__glibcxx_assert_fail): Add
definition for non-verbose builds.
(cherry picked from commit 52370c839e)
Since naked functions should not enable stack protector, define
TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
for naked functions.
gcc/
PR target/116962
* config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
function.
(TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.
gcc/testsuite/
PR target/116962
* gcc.target/i386/pr116962.c: New file.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 7d2845da11)
split_constant_offset when looking through SSA defs can end up
picking SSA leafs that are subject to abnormal coalescing. This
can lead to downstream consumers to insert code based on the
result (like from dataref analysis) in places that violate constraints
for abnormal coalescing. It's best to not expand defs whose operands
are subject to abnormal coalescing - and not either do something when
a subexpression has operands like that already.
PR tree-optimization/116585
* tree-data-ref.cc (split_constant_offset_1): When either
operand is subject to abnormal coalescing do no further
processing.
* gcc.dg/torture/pr116585.c: New testcase.
(cherry picked from commit 1d0cb3b5fc)
testing matrix multiplication benchmarks shows that FMA on a critical chain
is a perofrmance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2) the problem is that all values needs to
be ready before computation starts.
While on znver4 AVX512 code fared well with FMA, it was because of the split
registers. Znver5 benefits from avoding FMA on all widths. This may be different
with the mobile version though.
On naive matrix multiplication benchmark the difference is 8% with -O3
only since with -Ofast loop interchange solves the problem differently.
It is 30% win, for example, on S323 from TSVC:
real_t s323(struct args_t * func_args)
{
// recurrences
// coupled recurrence
initialise_arrays(__func__);
gettimeofday(&func_args->t1, NULL);
for (int nl = 0; nl < iterations/2; nl++) {
for (int i = 1; i < LEN_1D; i++) {
a[i] = b[i-1] + c[i] * d[i];
b[i] = a[i] + c[i] * e[i];
}
dummy(a, b, c, d, e, aa, bb, cc, 0.);
}
gettimeofday(&func_args->t2, NULL);
return calc_checksum(__func__);
}
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
znver5.
(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.
(cherry picked from commit d6360b4083)
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply copied from the bogus znver4 costs. The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.
* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
load and store cost from the aligned costs.
(cherry picked from commit 896393791e)
Address override only applies to the (reg32) part in the thread address
fs:(reg32). Don't rewrite thread address like
(set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 98 [ __gmpfr_emax.0_1 ])
(mem/c:SI (plus:SI (plus:SI (unspec:SI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:SI 107))
(const:SI (unspec:SI [
(symbol_ref:SI ("previous_emax") [flags 0x1a] <var_decl 0x7fffe9a11cf0 previous_emax>)
] UNSPEC_DTPOFF))) [1 previous_emax+0 S4 A32])))
if address override is used to avoid the invalid memory operand like
cmpl %fs:previous_emax@dtpoff(%eax), %r12d
gcc/
PR target/116839
* config/i386/i386.cc (ix86_rewrite_tls_address_1): Make it
static. Return if TLS address is thread register plus an integer
register.
gcc/testsuite/
PR target/116839
* gcc.target/i386/pr116839.c: New file.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit c79cc30862)
Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
survive register allocation. This in turn leads to wrong register
renaming. Keeping the current approach would mean we need two insns for
*tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along the
lines
(define_insn "*tf_to_fprx2_0"
[(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
(unspec:DF [(match_operand:TF 1 "general_operand" "v")]
UNSPEC_TF_TO_FPRX2_0))]
"TARGET_VXE"
"#")
(define_insn "*tf_to_fprx2_0"
[(set (match_operand:DF 0 "nonimmediate_operand" "=f")
(unspec:DF [(match_operand:TF 1 "general_operand" "v")]
UNSPEC_TF_TO_FPRX2_0))]
"TARGET_VXE"
"vpdi\t%v0,%v1,%v0,1
[(set_attr "op_type" "VRR")])
and similar for *tf_to_fprx2_1. Note, pre register allocation operand 0
has mode FPRX2 and afterwards DF once subregs have been eliminated.
Since we always copy a whole vector register into a floating-point
register pair, another way to fix this is to merge *tf_to_fprx2_0 and
*tf_to_fprx2_1 into a single insn which means we don't have to use
subregs at all. The downside of this is that the assembler template
contains two instructions, now. The upside is that we don't have to
come up with some artificial insn before RA which might be more
readable/maintainable. That is implemented by this patch.
In commit r11-4872-ge627cda5686592, the output operand specifier %V was
introduced which is used in tf_to_fprx2 only, now. Instead of coming up
with its counterpart %F for floating-point registers, which would also
only be used in tf_to_fprx2, I print the operands directly. This
renders %V unused which is why it is removed by this patch.
gcc/ChangeLog:
PR target/115860
* config/s390/s390.cc (print_operand): Remove operand specifier
%V.
* config/s390/s390.md (UNSPEC_TF_TO_FPRX2): New.
* config/s390/vector.md (*tf_to_fprx2_0): Remove.
(*tf_to_fprx2_1): Remove.
(tf_to_fprx2): New.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/long-double-asm-abi.c: Adapt
scan-assembler directive.
* gcc.target/s390/vector/long-double-to-i64.c: Adapt
scan-assembler directive.
* gcc.target/s390/pr115860-1.c: New test.
(cherry picked from commit 46c2538435)
Ensure for AQ and AR constraints that the resulting displacement after
adding any positive offset less than the size of the object being
referenced is still valid.
gcc/ChangeLog:
* config/s390/s390.cc (s390_mem_constraint): Check displacement
for AQ and AR constraints.
(cherry picked from commit 1a71ff3b89)