default_zero_call_used_regs currently requires all potentially zeroed
registers to offer a move opcode that accepts zero as an operand.
This is not the case e.g. for ARM's r12/ip in Thumb mode, and it was
not the case of FP registers on AArch64 as of GCC 10.
This patch introduces a fallback strategy to zero out registers,
copying from registers that have already been zeroed. Adjacent
sources to make up wider modes are also supported.
This does not guarantee that there will be some zeroed-out register to
use as the source, but it expands the cases in which the default
implementation works out of the box.
for gcc/ChangeLog
* targhooks.c (default_zero_call_used_regs): Retry using
successfully-zeroed registers as sources.
gcc/ChangeLog:
* omp-low.c (finish_taskreg_scan): Use the proper detach decl.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/task-detach-12.c: New test.
* testsuite/libgomp.fortran/task-detach-12.f90: New test.
The previous changes to irange::constant_p return TRUE for
VARYING, since VARYING has numerical end points like any other
constant range. The problem is that some users of constant_p
depended on constant_p excluding the full domain. The
range handler for __builtin_clz, that is shared between ranger
and vr_values, is one such user.
This patch excludes varying_p(), to match the original behavior
for clz.
gcc/ChangeLog:
PR c/100521
* gimple-range.cc (range_of_builtin_call): Skip out on
processing __builtin_clz when varying.
This warning is of questionable value when it's emitted when
instantiating a template, as in the following testcase. It could be
silenced by writing hb(i) << 1 instead of 2 * hb(i) but that's
unnecessary obfuscation.
gcc/cp/ChangeLog:
* pt.c (tsubst_copy_and_build): Add warn_int_in_bool_context
sentinel.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wint-in-bool-context-2.C: New test.
Add nvptx option -mptx that sets the ptx ISA version. This is currently
hardcoded to 3.1.
Tested libgomp on x86_64-linux with nvptx accelerator, both with default set to
3.1 and 6.3.
gcc/ChangeLog:
2021-05-12 Tom de Vries <tdevries@suse.de>
PR target/96005
* config/nvptx/nvptx-opts.h (enum ptx_version): New enum.
* config/nvptx/nvptx.c (nvptx_file_start): Print .version according
to ptx_version_option.
* config/nvptx/nvptx.h (TARGET_PTX_6_3): Define.
* config/nvptx/nvptx.md (define_insn "nvptx_shuffle<mode>")
(define_insn "nvptx_vote_ballot"): Use sync variant for
TARGET_PTX_6_3.
* config/nvptx/nvptx.opt (ptx_version): Add enum.
(mptx): Add option.
* doc/invoke.texi (Nvidia PTX Options): Add mptx item.
This amends the fix for PR100053 where I failed to amend all edge
tests in dominated_by_p_w_unex.
2021-05-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/100566
* tree-ssa-sccvn.c (dominated_by_p_w_unex): Properly handle
allow_back for all edge queries.
* gcc.dg/torture/pr100566.c: New testcase.
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/pstl/alg_nonmodifying/find_end.cc:
Increase dg-timeout-factor to 4. Fix -Wunused-parameter
warnings. Replace bitwise AND with logical AND in loop
condition.
* testsuite/25_algorithms/pstl/alg_nonmodifying/search_n.cc:
Replace bitwise AND with logical AND in loop condition.
* testsuite/util/pstl/test_utils.h: Remove unused parameter
names.
If a header doesn't end with a new-line, with -fdirectives-only we right now
preprocess it as
int i = 1;# 2 "pr100392.c" 2
i.e. the line directive isn't on the next line, which means we fail to parse
it when compiling.
GCC 10 and earlier libcpp/directives-only.c had for this:
if (!pfile->state.skipping && cur != base)
{
/* If the file was not newline terminated, add rlimit, which is
guaranteed to point to a newline, to the end of our range. */
if (cur[-1] != '\n')
{
cur++;
CPP_INCREMENT_LINE (pfile, 0);
lines++;
}
cb->print_lines (lines, base, cur - base);
}
and we have the assertion
/* Files always end in a newline or carriage return. We rely on this for
character peeking safety. */
gcc_assert (buffer->rlimit[0] == '\n' || buffer->rlimit[0] == '\r');
So, this patch just does readd the more less same thing, so that we emit
a newline after the inline even when it wasn't there before.
2021-05-12 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/100392
* lex.c (cpp_directive_only_process): If buffer doesn't end with '\n',
add buffer->rlimit[0] character to the printed range and
CPP_INCREMENT_LINE and increment line_count.
* gcc.dg/cpp/pr100392.c: New test.
* gcc.dg/cpp/pr100392.h: New file.
Silents the following warning:
lto-wrapper: warning: using serial compilation of 2 LTRANS jobs
gcc/testsuite/ChangeLog:
* lib/lto.exp: When running tests without jobserver, one can see
the following warning for tests that use 1to1 partitioning.
This splits can_associate_p into checks for SSA defs and checks
for the type so it can be called from is_reassociable_op to
catch cases not catched by the earlier fix.
2021-05-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/100519
* tree-ssa-reassoc.c (can_associate_p): Split into...
(can_associate_op_p): ... this
(can_associate_type_p): ... and this.
(is_reassociable_op): Call can_associate_op_p.
(break_up_subtract_bb): Call the appropriate predicates.
(reassociate_bb): Likewise.
* gcc.dg/torture/pr100519.c: New testcase.
Size_In_Slots uses the Nkind to look up the size in a table indexed
by Nkind. This patch fixes a couple of places where the Nkind is
wrong (uninitialized or zeroed out) so Size_In_Slots cannot be used.
gcc/ada/
PR ada/100564
* atree.adb (Change_Node): Do not call Zero_Slots on a Node_Id
when the Nkind has not yet been set; call the other Zero_Slots
that takes a range of slot offsets. Call the new Mutate_Kind
that takes an Old_Size, for the same reason -- the size cannot
be computed without the Nkind.
(Mutate_Nkind): New function that allows specifying the Old_Size.
(Size_In_Slots): Assert that the Nkind has proper (nonzero) value.
* atree.ads: Minor reformatting.
gcc/ChangeLog:
* lto-wrapper.c (print_lto_docs_link): New function.
(run_gcc): Print warning about missing job server detection
after we know NR of partitions. Do the same for -flto{,=1}.
* opts.c (get_option_html_page): Support -flto option.
In this testcase the compile unit consists of a single
text section with a single embedded DECL_IGNORED_P function.
So we have a kind of multi-range text section here.
To avoid an ICE in output_rnglists we need to make sure
that have_multiple_function_sections is set to true.
This is a regression from
e69ac02037 ("Add line debug info for virtual thunks")
2021-05-12 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR debug/100515
* dwarf2out.c (dwarf2out_finish): Set
have_multiple_function_sections with multi-range text_section.
* gcc.dg/debug/dwarf2/pr100515.c: New testcase.
This makes the rtvec_alloc argument size_t catching overflow and
truncated arguments (from "invalid" testcases), verifying the
argument against INT_MAX which is the limit set by the int
typed rtvec_def.num_elem member.
2021-05-12 Richard Biener <rguenther@suse.de>
PR middle-end/100547
* rtl.h (rtvec_alloc): Make argument size_t.
* rtl.c (rtvec_alloc): Verify the count is less than INT_MAX.
The inliner doesn't remap DEBUG_EXPR_DECLs, so the same decls can appear
in multiple functions.
Furthermore, expansion reuses corresponding DEBUG_EXPRs too, so they again
can be reused in multiple functions.
Neither of that is a major problem, DEBUG_EXPRs are just magic value holders
and what value they stand for is independent in each function and driven by
what debug stmts or DEBUG_INSNs they are bound to.
Except for DEBUG_EXPR*s with vector types, TYPE_MODE can be either BLKmode
or some vector mode depending on whether current function's enabled ISAs
support that vector mode or not. On the following testcase, we expand it
first in foo function without AVX2 enabled and so the DEBUG_EXPR is
BLKmode, but later the same DEBUG_EXPR_DECL is used in a simd clone with
AVX2 enabled and expansion ICEs because of a mode mismatch.
The following patch fixes that by forcing recreation of a DEBUG_EXPR if
there is a mode mismatch for vector typed DEBUG_EXPR_DECL, DEBUG_EXPRs
will be still reused in between functions otherwise and within the same
function the mode should be always the same.
2021-05-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100508
* cfgexpand.c (expand_debug_expr): For DEBUG_EXPR_DECL with vector
type, don't reuse DECL_RTL if it has different mode, instead force
creation of a new DEBUG_EXPR.
* gcc.dg/gomp/pr100508.c: New test.
> Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
> that could be worth doing already in GIMPLE.
Apparently it is
/* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
constant folding if x/y is a constant. */
if ((code == EQ || code == NE)
&& (op0code == AND || op0code == IOR)
&& !side_effects_p (op1)
&& op1 != CONST0_RTX (cmp_mode))
{
/* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
(eq/ne (and (not y) x) 0). */
...
/* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
(eq/ne (and (not x) y) 0). */
Yes, doing that on GIMPLE for the case where the not argument is constant
would simplify the phiopt follow-up (it would be single imm use then).
On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:
> We can probably do it in 2 steps, first something like
>
> (for cmp (eq ne)
> (simplify
> (cmp (bit_and:c @0 @1) @0)
> (cmp (@0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))
>
> to get rid of the double use, and then simplify X&C==0 to X<=~C if C is a
> mask 111...000 (I thought we already had a function to detect such masks, or
> the 000...111, but I can't find them anymore).
Ok, here is the first step then.
2021-05-12 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>
PR tree-optimization/94589
* match.pd ((X & Y) == X -> (X & ~Y) == 0,
(X | Y) == Y -> (X & ~Y) == 0): New GIMPLE simplifications.
* gcc.dg/tree-ssa/pr94589-1.c: New test.
C2X adds #elifdef and #elifndef preprocessor directives; these have
also been proposed for C++. Implement these directives in libcpp
accordingly.
In this implementation, #elifdef and #elifndef are treated as
non-directives for any language version other than c2x and gnu2x (if
the feature is accepted for C++, it can trivially be enabled for
relevant C++ versions). In strict conformance modes for prior
language versions, this is required, as illustrated by the
c11-elifdef-1.c test added.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
* include/cpplib.h (struct cpp_options): Add elifdef.
* init.c (struct lang_flags): Add elifdef.
(lang_defaults): Update to include elifdef initializers.
(cpp_set_lang): Set elifdef for pfile based on language.
* directives.c (STDC2X, ELIFDEF): New macros.
(EXTENSION): Increase value to 3.
(DIRECTIVE_TABLE): Add #elifdef and #elifndef.
(_cpp_handle_directive): Do not treat ELIFDEF directives as
directives for language versions without the #elifdef feature.
(do_elif): Handle #elifdef and #elifndef.
(do_elifdef, do_elifndef): New functions.
gcc/testsuite/
* gcc.dg/cpp/c11-elifdef-1.c, gcc.dg/cpp/c2x-elifdef-1.c,
gcc.dg/cpp/c2x-elifdef-2.c: New tests.
Resolves:
PR middle-end/21433 - The COMPONENT_REF case of expand_expr_real_1 is probably wrong
gcc/ChangeLog:
PR middle-end/21433
* expr.c (expand_expr_real_1): Replace unreachable code with an assert.
The libcpp function cpp_avoid_paste is used to insert whitespace in
preprocessed output where needed to avoid two consecutive
preprocessing tokens, that logically (e.g. when stringized) do not
have whitespace between them, from being incorrectly lexed as one when
the preprocessed input is reread by a compiler.
This fails to allow for digit separators, so meaning that invalid
code, that has a CPP_NUMBER (from a macro expansion) followed by a
character literal, can result in preprocessed output with a valid use
of digit separators, so that required syntax errors do not occur when
compiling with -save-temps. Fix this by handling that case in
cpp_avoid_paste (as with other cases in cpp_avoid_paste, this doesn't
try to check whether the language version in use supports digit
separators; it's always OK to have unnecessary whitespace in
preprocessed output).
Note: there are other cases, with various kinds of wide character or
string literal following a CPP_NUMBER, where spurious pasting of
preprocessing tokens can occur but the sequence of tokens remains
invalid both before and after that pasting. Maybe cpp_avoid_paste
should also handle those cases (and similar cases after a CPP_NAME),
to ensure the sequence of preprocessing tokens in preprocessed output
is exactly right, whether or not it affects whether syntax errors
occur. This patch only addresses the case with digit separators where
invalid code can fail to be diagnosed without the space inserted.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
* lex.c (cpp_avoid_paste): Do not allow pasting CPP_NUMBER with
CPP_CHAR.
gcc/testsuite/
* g++.dg/cpp1y/digit-sep-paste.C, gcc.dg/c2x-digit-separators-3.c:
New tests.
The type of the output operands *p and *q of the extended asm statement
of function foo is unsigned long whereas the type of the corresponding
input operands is int. This results, e.g. on IBM Z, in the case that
the immediates 2 and 3 are written into registers in SI mode and read in
DI mode resulting in wrong values. Fixed by lifting the input operands
to type long.
gcc/testsuite/ChangeLog:
* gcc.dg/guality/pr43077-1.c: Align types of output and input
operands by lifting immediates to type long.
floating_to_chars.cc includes the Ryu sources into an anonymous
namespace as a convenient way to give all its symbols internal linkage.
But an entity declared extern "C" always has external linkage even
from within an anonymous namespace, so this trick doesn't work in the
presence of extern "C", and it causes the Ryu function generic_to_chars
to be visible from libstdc++.a.
This patch removes the only use of extern "C" from our local copy of
Ryu along with some declarations for never-defined functions that GCC
now warns about.
libstdc++-v3/ChangeLog:
* src/c++17/ryu/LOCAL_PATCHES: Update.
* src/c++17/ryu/ryu_generic_128.h: Remove extern "C".
Remove declarations for never-defined functions.
* testsuite/20_util/to_chars/4.cc: New test.
The header synopsis test fails to define NOTHROW for C++98.
The shared_ptr test should be skipped for C++98.
The debug mode one should work for C++98 too, it just needs to avoid
C++11 syntax that isn't valid in C++98.
libstdc++-v3/ChangeLog:
* testsuite/20_util/headers/memory/synopsis.cc: Define C++98
alternative for macro.
* testsuite/20_util/shared_ptr/creation/99006.cc: Add effective
target keyword.
* testsuite/25_algorithms/copy/debug/99402.cc: Avoid C++11
syntax.
The changes in 75c6a925da were slightly
incorrect, because the converting constructor should be noexcept, and
the POCMA and is_always_equal traits should still be present in C++20.
This fixes it, and slightly refactors the preprocessor conditions and
order of members. Also add comments explaining things.
The non-standard construct and destroy members added for PR 78052 can be
private if allocator_traits<allocator<void>> is made a friend.
libstdc++-v3/ChangeLog:
* include/bits/allocator.h (allocator<void>) [C++20]: Add
missing noexcept to constructor. Restore missing POCMA and
is_always_equal_traits.
[C++17]: Make construct and destroy members private and
declare allocator_traits as a friend.
* include/bits/memoryfwd.h (allocator_traits): Declare.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Add nodiscard attribute. Add static assertion for LWG 3307.
* include/ext/new_allocator.h (new_allocator::allocate): Add
static assertion for LWG 3307.
* testsuite/20_util/allocator/void.cc: Check that converting
constructor is noexcept. Check for propagation traits and
size_type and difference_type. Check that pointer and
const_pointer are gone in C++20.
C2X adds digit separators, as in C++. Enable them accordingly in
libcpp and c-lex.c. Some basic tests are added that digit separators
behave as expected for C2X and are properly disabled for C11; further
test coverage is included in the existing g++.dg/cpp1y/digit-sep*.C
tests.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c-family/
* c-lex.c (interpret_float): Handle digit separators for C2X.
libcpp/
* init.c (lang_defaults): Enable digit separators for GNUC2X and
STDC2X.
gcc/testsuite/
* gcc.dg/c11-digit-separators-1.c,
gcc.dg/c2x-digit-separators-1.c, gcc.dg/c2x-digit-separators-2.c:
New tests.
My recent change to reject calling rvalue() with an argument of class type
crashes on this testcase, where we use rvalue() on what we expect to be an
argument of integer or vector type. Fixed by checking first.
gcc/cp/ChangeLog:
PR c++/100517
* typeck.c (build_reinterpret_cast_1): Check intype on
cast to vector.
gcc/testsuite/ChangeLog:
PR c++/100517
* g++.dg/ext/vector41.C: New test.
This removes stale users of maybe_fold_reference where IL constraints
make it never do anything.
2021-05-11 Richard Biener <rguenther@suse.de>
* gimple-fold.c (gimple_fold_call): Do not call
maybe_fold_reference on call arguments or the static chain.
(fold_stmt_1): Do not call maybe_fold_reference on GIMPLE_ASM
inputs.
This fixes unintended clobbering of SSA_NAME_DEF_STMT of the
cloned/inlined from SSA name during IPA parameter manipulation
of call stmt LHSs. gimple_call_set_lhs adjusts SSA_NAME_DEF_STMT
of the lhs to the stmt being modified but when
ipa_param_body_adjustments::modify_call_stmt is called the
cloning/inlining process has not yet remapped the stmts operands
to the copy variants but they are still original.
2021-05-11 Richard Biener <rguenther@suse.de>
PR ipa/100513
* ipa-param-manipulation.c
(ipa_param_body_adjustments::modify_call_stmt): Avoid
altering SSA_NAME_DEF_STMT by adjusting the calls LHS
via gimple_call_lhs_ptr.
The PR shows us attaching REG_CFA_ADJUST_CFA notes to stack pointer
adjustments emitted in cmse_nonsecure_call_inline_register_clear (when
-march=armv8.1-m.main). However, the stack pointer is not guaranteed to
be the CFA reg. If we're at -O0 or we have -fno-omit-frame-pointer, then
the frame pointer will be used as the CFA reg, and these notes on the sp
adjustments will lead to ICEs in dwarf2out_frame_debug_adjust_cfa.
This patch avoids emitting these notes if the current function has a
frame pointer.
gcc/ChangeLog:
PR target/99725
* config/arm/arm.c (cmse_nonsecure_call_inline_register_clear):
Avoid emitting CFA adjusts on the sp if we have the fp.
gcc/testsuite/ChangeLog:
PR target/99725
* gcc.target/arm/cmse/pr99725.c: New test.
This patch removes the duplication between the mul_laneq<mode>3
and the older mul-lane patterns. The older patterns were previously
divided into two based on whether the indexed operand had the same mode
as the other operands or whether it had the opposite length from the
other operands (64-bit vs. 128-bit). However, it seemed easier to
divide them instead based on whether the indexed operand was 64-bit or
128-bit, since that maps directly to the arm_neon.h “q” conventions.
Also, it looks like the older patterns were missing cases for
V8HF<->V4HF combinations, which meant that vmul_laneq_f16 and
vmulq_lane_f16 didn't produce single instructions.
There was a typo in the V2SF entry for VCONQ, but in practice
no patterns were using that entry until now.
The test passes for both endiannesses, but endianness does change
the mapping between regexps and functions.
gcc/
* config/aarch64/iterators.md (VMUL_CHANGE_NLANES): Delete.
(VMULD): New iterator.
(VCOND): Handle V4HF and V8HF.
(VCONQ): Fix entry for V2SF.
* config/aarch64/aarch64-simd.md (mul_lane<mode>3): Use VMULD
instead of VMUL. Use a 64-bit vector mode for the indexed operand.
(*aarch64_mul3_elt_<vswap_width_name><mode>): Merge with...
(mul_laneq<mode>3): ...this define_insn. Use VMUL instead of VDQSF.
Use a 128-bit vector mode for the indexed operand. Use stype for
the scheduling type.
gcc/testsuite/
* gcc.target/aarch64/fmul_lane_1.c: New test.