noce_try_sign_mask as documented will optimize
if (c < 0)
x = t;
else
x = 0;
into x = (c >> bitsm1) & t;
The optimization is done if either t is unconditional
(e.g. for
x = t;
if (c >= 0)
x = 0;
) or if it is cheap. We already check that t doesn't have side-effects,
but if t is conditional, we need to punt also if it may trap or fault,
as we make it unconditional.
I've briefly skimmed other noce_try* optimizations and didn't find one that
would suffer from the same problem.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/106032
* ifcvt.cc (noce_try_sign_mask): Punt if !t_unconditional, and
t may_trap_or_fault_p, even if it is cheap.
* gcc.c-torture/execute/pr106032.c: New test.
(cherry picked from commit a0c30fe3b8)
If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.
The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).
Later on the function uses paradoxical subregs:
if (GET_MODE (op1) != mode)
op1 = gen_lowpart (mode, op1);
if (GET_MODE (op2) != mode)
op2 = gen_lowpart (mode, op2);
to change the operand modes.
The following patch fixes it by passing NULL_RTX as target if it has
promoted mode.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106030
* expr.cc (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
temp to expand_operands if mode has been promoted.
* gcc.c-torture/compile/pr106030.c: New test.
(cherry picked from commit 2df1df945f)
The i variable is used inside of the parallel in:
#pragma omp simd safelen(32) private (v)
for (i = 0; i < 64; i++)
{
v = 3 * i;
ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
}
where i is predetermined linear (so while inside of the body
it is safe, private per SIMD lane var) the final value is written to
the shared variable, and in:
for (i = 0; i < 64; i++)
if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
#pragma omp atomic write
err = 1;
which is a normal loop and so it isn't in any way privatized there.
So we have a data race, fixed by adding private (i) clause to the
parallel.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
Paul Iannetta <piannetta@kalrayinc.com>
PR libgomp/106045
* testsuite/libgomp.c/target-31.c: Add private (i) clause.
(cherry picked from commit 85d613da34)
The epilogue may clobber LARCH_PROLOGUE_TEMP ($r13/$t1), so it cannot be
used for sibcalls.
gcc/ChangeLog:
PR target/106096
* config/loongarch/loongarch.h (REG_CLASS_CONTENTS): Exclude
$r13 from SIBCALL_REGS.
* config/loongarch/loongarch.cc (loongarch_regno_to_class):
Change $r13 to JIRL_REGS.
gcc/testsuite/ChangeLog:
PR target/106096
* g++.target/loongarch/loongarch.exp: New test support file.
* g++.target/loongarch/pr106096.C: New test.
(cherry picked from commit 020b7d9858)
Since around GCC 10, the condition `j < (INTMAX_MAX / 10)' will get
optimized into `j != 922337203685477580', which will result in an
infinite loop for certain inputs of `j'.
Copy the condition already used by the -DTILEPRO generator code, which
doesn't fall into this trap.
gcc/ChangeLog:
* config/tilepro/gen-mul-tables.cc (tilegx_emit): Adjust loop
condition to avoid overflow.
(cherry picked from commit c0ad48527c)
Changing the type of N from int to unsigned in decltype82.C (from
r13-986-g0ecb6b906f215e) reveals another spot where we perform constexpr
evaluation in an unevaluated context for sake of warnings, this time
from the call to shorten_compare in cp_build_binary_op, which calls
fold_for_warn.
We could (and probably should) suppress the shorten_compare warnings
when in an unevaluated context, but there's probably other callers of
fold_for_warn that are similarly affected. So this patch takes the
approach of directly suppressing fold_for_warn when in an unevaluated
context.
PR c++/105931
gcc/cp/ChangeLog:
* expr.cc (fold_for_warn): Don't fold when in an unevaluated
context.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype82a.C: New test.
(cherry picked from commit b00b95198e)
This testcase was failing because CONSTRUCTOR_IS_DESIGNATED_INIT wasn't
getting set on the introduced CONSTRUCTOR for the anonymous union, and
build_aggr_conv uses that flag to decide whether to pay attention to the
indexes of the CONSTRUCTOR. So set the flag when we see a designator rather
than relying on copying it from another CONSTRUCTOR.
PR c++/105925
gcc/cp/ChangeLog:
* decl.cc (reshape_init_array_1): Set
CONSTRUCTOR_IS_DESIGNATED_INIT here.
(reshape_init_class): And here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/desig26.C: New test.
We already suppress various warnings for code that would be tautological if
written directly, but not when it's the result of template substitution. It
seems we need to do this for -Waddress as well.
PR c++/105885
gcc/cp/ChangeLog:
* pt.cc (tsubst_copy_and_build): Also suppress -Waddress for
comparison of dependent operands.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/constexpr-if37.C: New test.
Similarly to cgraph_nodes, it may happen that body_removed is set
during merging of symbols.
PR ipa/105600
gcc/ChangeLog:
* ipa-icf.cc (sem_item_optimizer::filter_removed_items):
Skip variables with body_removed.
(cherry picked from commit 31ce821a79)
The addr_expr computation does not check for error_mark_node before
returning the size expression. This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.
Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.
gcc/ChangeLog:
PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.
gcc/testsuite/ChangeLog:
PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4): New struct.
(val3): New variable.
(test_pr105736): New test.
(main): Call it.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
(cherry picked from commit 70454c50b4)
helper<token>::c isn't dependent just because we haven't deduced its return
type yet. type_dependent_expression_p already knows how to deal with that
for bare FUNCTION_DECL, but needs to learn to look through a BASELINK.
PR c++/105964
gcc/cp/ChangeLog:
* pt.cc (type_dependent_expression_p): Look through BASELINK.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/nontype-auto21.C: New test.
In r12-1273 for PR91706, I removed the code in get_class_binding that
stripped BASELINK. This testcase demonstrates that we still need to strip
it in outer_binding before putting the overload set in IDENTIFIER_BINDING,
for compatibility with bindings added directly for declarations.
PR c++/105908
gcc/cp/ChangeLog:
* name-lookup.cc (outer_binding): Strip BASELINK.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/trailing16.C: New test.
In f2ebf2d98e I'd forced the
chosen unroll factor to be a factor of the VF, in order to
work around an exact_div ICE in PR105254. This was completely
bogus -- clearly I didn't look in enough detail at why we ended
up with an unrolled VF that wasn't a multiple of the UF.
Kewen has since fixed the bug properly for PR105940, so this
patch reverts my earlier attempt. Sorry for the stupidity.
gcc/
PR tree-optimization/105254
PR tree-optimization/105940
Revert:
* config/aarch64/aarch64.cc
(aarch64_vector_costs::determine_suggested_unroll_factor): Take a
loop_vec_info as argument. Restrict the unroll factor to values
that divide the VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_14.c: New test.
(cherry picked from commit 2636660b6f)
As PR105940 shown, when rs6000 port tries to assign
m_suggested_unroll_factor by 4 or so, there will be ICE on:
exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
loop_vinfo->suggested_unroll_factor);
In function vect_analyze_loop_2, the current place of
suggested_unroll_factor applying can't guarantee it's
applied for all cases. As the case shows, vectorizer
could retry with SLP forced off, the vf is reset by
saved_vectorization_factor which isn't applied with
suggested_unroll_factor before. It means it can end
up with one vf which neglects suggested_unroll_factor.
I think it's off design, we should move the applying
of suggested_unroll_factor after start_over.
PR tree-optimization/105940
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
applying suggested_unroll_factor after start_over.
(cherry picked from commit f907cf4c07)
Disallow siball when calling ifunc functions with PIC register so that
PIC register can be restored.
gcc/
PR target/105960
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
false if PIC register is used when calling ifunc functions.
gcc/testsuite/
PR target/105960
* gcc.target/i386/pr105960.c: New test.
(cherry picked from commit fe9765c0b9)
This patch introduces alpha-specific version of store_data_bypass_p that
ignores TRAP_IF that would result in assertion failure (and internal
compiler error) in the generic store_data_bypass_p function.
While at it, also remove ev4_ist_c reservation, store_data_bypass_p
can handle the patterns with multiple sets since some time ago.
2022-06-17 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/105209
* config/alpha/alpha-protos.h (alpha_store_data_bypass_p): New.
* config/alpha/alpha.cc (alpha_store_data_bypass_p): New function.
(alpha_store_data_bypass_p_1): Ditto.
* config/alpha/ev4.md: Use alpha_store_data_bypass_p instead
of generic store_data_bypass_p.
(ev4_ist_c): Remove insn reservation.
gcc/testsuite/ChangeLog:
PR target/105209
* gcc.target/alpha/pr105209.c: New test.
(cherry picked from commit cc378e6557)
The mode of pointer argument should equal ptr_mode, not Pmode.
2022-06-17 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/105970
* config/i386/i386.cc (ix86_function_arg): Assert that
the mode of pointer argumet is equal to ptr_mode, not Pmode.
gcc/testsuite/ChangeLog:
PR target/105970
* gcc.target/i386/pr105970.c: New test.
(cherry picked from commit 1f8278bfcf)
The following testcase ICEs because there is NON_LVALUE_EXPR (location
wrapper) around a VAR_DECL and has TYPE_MODE V2SImode and
SCALAR_INT_TYPE_MODE on that ICEs. Or for -m32 -march=i386 TYPE_MODE
is DImode, but SCALAR_INT_TYPE_MODE still uses the raw V2SImode and ICEs
too.
2022-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/105998
* varasm.cc (narrowing_initializer_constant_valid_p): Check
SCALAR_INT_MODE_P instead of INTEGRAL_MODE_P, also break on
! INTEGRAL_TYPE_P and do the same check also on op{0,1}'s type.
* c-c++-common/pr105998.c: New test.
(cherry picked from commit ef66212017)
In this case the STATIC_CAST_EXPR expressions in the call aren't
type nor value dependent, but maybe_constant_value still ICEs on those
when processing_template_decl. Calling fold_non_dependent_expr on it
instead fixes the ICE and folds them to INTEGER_CSTs.
2022-06-17 Jakub Jelinek <jakub@redhat.com>
PR c++/106001
* typeck.cc (build_x_shufflevector): Use fold_non_dependent_expr
instead of maybe_constant_value.
* g++.dg/ext/builtin-shufflevector-4.C: New test.
(cherry picked from commit a284fadcce)
Both IFN_ATOMIC_BIT_TEST_AND_* and IFN_ATOMIC_*_FETCH_CMP_0 ifns
are matched if their corresponding optab is implemented for the particular
mode. The fact that those optabs are implemented doesn't guarantee
they will succeed though, they can just FAIL in their expansion.
The expansion in that case uses expand_atomic_fetch_op as fallback, but
as has been reported and and can be reproduced on the testcases,
even those can fail and we didn't have any fallback after that.
For IFN_ATOMIC_BIT_TEST_AND_* we actually have such calls. One is
done whenever we lost lhs of the ifn at some point in between matching
it in tree-ssa-ccp.cc and expansion. The following patch for that case
just falls through and expands as if there was a lhs, creates a temporary
for it. For the other expand_atomic_fetch_op call in the same expander
and for the only expand_atomic_fetch_op call in the other, this falls
back the hard way, by constructing a CALL_EXPR to the call from which
the ifn has been matched and expanding that. Either it is lucky and manages
to expand inline, or it emits a libatomic API call.
So that we don't have to rediscover which builtin function to call in the
fallback, we record at tree-ssa-ccp.cc time gimple_call_fn (call) in
an extra argument to the ifn.
2022-06-16 Jakub Jelinek <jakub@redhat.com>
PR middle-end/105951
* tree-ssa-ccp.cc (optimize_atomic_bit_test_and,
optimize_atomic_op_fetch_cmp_0): Remember gimple_call_fn (call)
as last argument to the internal functions.
* builtins.cc (expand_ifn_atomic_bit_test_and): Adjust for the
extra call argument to ifns. If expand_atomic_fetch_op fails for the
lhs == NULL_TREE case, fall through into the optab code with
gen_reg_rtx (mode) as target. If second expand_atomic_fetch_op
fails, construct a CALL_EXPR and expand that.
(expand_ifn_atomic_op_fetch_cmp_0): Adjust for the extra call argument
to ifns. If expand_atomic_fetch_op fails, construct a CALL_EXPR and
expand that.
* gcc.target/i386/pr105951-1.c: New test.
* gcc.target/i386/pr105951-2.c: New test.
(cherry picked from commit 6a27c43046)
Check for volatile flag to ipa_load_from_parm_agg.
gcc/ChangeLog:
2022-06-10 Jan Hubicka <hubicka@ucw.cz>
PR ipa/105739
* ipa-prop.cc (ipa_load_from_parm_agg): Punt on volatile loads.
gcc/testsuite/ChangeLog:
2022-06-10 Jan Hubicka <hubicka@ucw.cz>
* gcc.dg/ipa/pr105739.c: New test.
(cherry picked from commit 8f6c317b3a)
As the following testcase shows, BIT_FIELD_REF result doesn't have to have
just integral type, it can also have vector type. And in that case
cxx_eval_bit_field_ref just ICEs on it because it is unprepared for that
case, creates the initial value with build_int_cst (sure, that one could be
easily replaced with build_zero_cst) and then expects it can through shifts,
ands and ors come up with the final value, but that doesn't work for
vectors.
We already call fold_ternary if whole is a VECTOR_CST, this patch does the
same if the result doesn't have integral type. And, there is no guarantee
fold_ternary will succeed and the callers certainly don't expect NULL
being returned, so it also diagnoses those as non-constant and returns
original t in that case.
2022-06-09 Jakub Jelinek <jakub@redhat.com>
PR c++/105871
* constexpr.cc (cxx_eval_bit_field_ref): For BIT_FIELD_REF with
non-integral result type use fold_ternary too like for BIT_FIELD_REFs
from VECTOR_CST. If fold_ternary returns NULL, diagnose non-constant
expression, set *non_constant_p and return t, instead of returning
NULL.
* g++.dg/pr105871.C: New test.
(cherry picked from commit 4c334e0e4f)
The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
since r230663. The problem is that we use gen_lowpart, etc. to split
the 64-bit quantity, but fail to account for the fact that these
routines are really dealing with 64-bit /values/ and in big-endian the
ordering of the sub-registers changes.
To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
as first_reg and second_reg, and then used different logic for
big-endian targets to initialize these values. This makes the logic
clearer than trying to think about high bits and low bits.
gcc/ChangeLog:
PR target/105981
* config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
to first_reg and second_reg respectively. Initialize them correctly
when generating big-endian code.
(cherry picked from commit 8aaa948059)
In common with system tools, GCC uses a version obtained from the kernel as
the prevailing macOS target, when that is not overridden by command line or
environment versions (i.e. mmacosx-version-min=, MACOSX_DEPLOYMENT_TARGET).
Presently, GCC assumes that if the OS version is >= 20, the value used should
include both major and minium version identifiers. However the system tools
(for those versions) truncate the value to the major version - this leads to
link errors when combining objects built with clang and GCC for example:
ld: warning: object file (null.o) was built for newer macOS version (12.2)
than being linked (12.0)
The change here truncates the values GCC uses to the major version.
gcc/ChangeLog:
PR target/104871
* config/darwin-driver.cc (darwin_find_version_from_kernel): If the OS
version is darwin20 (macOS 11) or greater, truncate the version to the
major number.
(cherry picked from commit add1adaa17)
f18cbc1ee1 (2021-12-18) updated various parts of gcc to not impose a
Darwin or macOS version maximum of the current known release. Different
parts of gcc accept, variously, Darwin version numbers matching
darwin2*, and macOS major version numbers up to 99. The current released
version is Darwin 21 and macOS 12, with Darwin 22 and macOS 13 expected
for public release later this year. With one major OS release per year,
this strategy is expected to provide another 8 years of headroom.
However, f18cbc1ee1 missed config/darwin-c.c (now .cc), which
continued to impose a maximum of macOS 12 on the -mmacosx-version-min
compiler driver argument. This was last updated from 11 to 12 in
11b9675774 (2021-10-27), but kicking the can down the road one year at
a time is not a viable strategy, and is not in line with the more recent
technique from f18cbc1ee1.
Prior to 556ab51259 (2020-11-06), config/darwin-c.c did not impose a
maximum that needed annual maintenance, as at that point, all macOS
releases had used a major version of 10. The stricter approach imposed
since then was valuable for a time until the particulars of the new
versioning scheme were established and understood, but now that they
are, it's prudent to restore a more permissive approach.
gcc/ChangeLog:
* config/darwin-c.cc: Make -mmacosx-version-min more future-proof.
Signed-off-by: Mark Mentovai <mark@mentovai.com>
(cherry picked from commit 6725f186cb)
An empty g++ command line should produce a diagnostic that there are no
inputs. The PR is that currently Darwin produces a dignostic about missing
link items instead - this is because (errnoeously), for this driver, we are
creating a link job for empty command lines.
The problem occurs in four stages:
The g++ driver appends -shared-libgcc to the command line.
The Darwin driver_init code in the backend does not see this (it sees an
empty command line).
When the back end driver code driver sees an empty command line, it does not
add any supplementary flags (e.g. asm-macosx-version-min) - precisely to
avoid anything being claimed as an input_file and therefore triggering a link
line.
Since we do not have a value for asm-macosx-version-min when processing the
driver specs, we unconditionally inject 'multiply_defined suppress' which is
used with shared libgcc (but only intended on very old Darwin). This then
causes the generation of a link job.
The solution, for the present, is to move version-specific link params to the
LINK_SPEC so that they are only processed when a link job has already been
decided.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/105599
gcc/ChangeLog:
* config/darwin.h: Move versions-specific handling of multiply_defined
from SUBTARGET_DRIVER_SELF_SPECS to LINK_SPEC.
(cherry picked from commit 794737976b)
Prevents them from triggering warnings when compiling with `-Wpadded'.
gcc/d/ChangeLog:
* typeinfo.cc (make_internal_typeinfo): Set TYPE_ARTIFICIAL.
gcc/testsuite/ChangeLog:
* gdc.dg/Wpadded.d: New test.
(cherry picked from commit 57b2adae53)
This is LWG 3220 which is about to become Tentatively Ready.
libstdc++-v3/ChangeLog:
* include/std/atomic (__atomic_val_t): Use __type_identity_t
instead of atomic<T>::value_type, as per LWG 3220.
* testsuite/29_atomics/atomic/lwg3220.cc: New test.
(cherry picked from commit 30cc1b65e4)
The macOS 13 SDK (and equivalent-version iOS and other Apple OS SDKs)
contain this definition in <sys/cdefs.h>:
863 #define __null_terminated
This collides with the use of __null_terminated in libstdc++'s
experimental fs_path.h.
As libstdc++'s use of this token is entirely internal to fs_path.h, the
simplest workaround, renaming it, is most appropriate. Here, it's
renamed to __nul_terminated, referencing the NUL ('\0') value that is
used to terminate the strings in the context in which this tag structure
is used.
libstdc++-v3/ChangeLog:
* include/experimental/bits/fs_path.h (__detail::__null_terminated):
Rename to __nul_terminated to avoid colliding with a macro in
Apple's SDK.
Signed-off-by: Mark Mentovai <mark@mentovai.com>
(cherry picked from commit 254e88b3d7)
Since F16C and VAES are only usable with AVX, require AVX for F16C and
VAES.
libgcc/105920
* common/config/i386/cpuinfo.h (get_available_features): Require
AVX for F16C and VAES.
(cherry picked from commit 751f306688)
The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
bits above 31 only (to side-step any issues with the negative SImode
value 0x80000000/(-1ull << 31)/(1 << 31)). This moves the special
handling of this SImode value (i.e. the check for (-1ull << 31) to
riscv.cc and relaxes the SINGLE_BIT_MASK_OPERAND() test.
With this, the code-generation for loading (1ULL << 31) from:
li a0,1
slli a0,a0,31
to:
bseti a0,zero,31
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer_1): Rewrite value as
(-1 << 31) for the single-bit case, when operating on (1 << 31)
in SImode.
* config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for
any single-bit value, moving the special case for (1 << 31) to
riscv_build_integer_1 (in riscv.c).
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
(cherry picked from commit 4e72ccad80)
The existing TypeInfo errors can be cryptic. This alters the diagnostic
to include which expression is requiring `object.TypeInfo'.
gcc/d/ChangeLog:
* d-tree.h (check_typeinfo_type): Add Expression* parameter.
(build_typeinfo): Likewise. Declare new override.
* expr.cc (ExprVisitor): Call build_typeinfo with Expression*.
* typeinfo.cc (check_typeinfo_type): Include expression in the
diagnostic message.
(build_typeinfo): New override.
gcc/testsuite/ChangeLog:
* gdc.dg/rtti1.d: New test.
(cherry picked from commit e55eda2385)