Commit Graph

Jonathan Wakely
07b70dfc4e libstdc++: Add testsuite proc for testing deprecated features
This change adds options to tests that explicitly use deprecated
features, so that the rest of the testsuite can be run with
-D_GLIBCXX_USE_DEPRECATED=0. Tests that intentionally use deprecated
features will still be able to use them, while deprecated features can
be disabled for the majority of tests.
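
One of the testsuite changes replaces a deprecated binder with a lambda.
A hedged illustration of that kind of replacement (not the actual code in
forward_list/operations/3.cc):

  #include <algorithm>
  #include <functional>   // only needed for the deprecated binder
  #include <vector>

  void remove_small (std::vector<int>& v)
  {
    // Deprecated binder, which may be unavailable with -D_GLIBCXX_USE_DEPRECATED=0:
    //   std::bind2nd (std::less<int> (), 5)
    // Equivalent lambda, no deprecated machinery required:
    v.erase (std::remove_if (v.begin (), v.end (),
                             [] (int x) { return x < 5; }),
             v.end ());
  }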

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/23_containers/forward_list/operations/3.cc:
	Use lambda instead of std::bind2nd.
	* testsuite/20_util/function_objects/binders/3113.cc: Add
	options for testing deprecated features.
	* testsuite/20_util/pair/cons/99957.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/assign/auto_ptr_rvalue.cc:
	Likewise.
	* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/cons/auto_ptr.cc: Likewise.
	* testsuite/20_util/shared_ptr/cons/auto_ptr_neg.cc: Likewise.
	* testsuite/20_util/shared_ptr/creation/dr925.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/auto_ptr.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/auto_ptr_neg.cc: Likewise.
	* testsuite/ext/pb_ds/example/priority_queue_erase_if.cc:
	Likewise.
	* testsuite/ext/pb_ds/example/priority_queue_split_join.cc:
	Likewise.
	* testsuite/lib/dg-options.exp (dg_add_options_using-deprecated):
	New proc.
2021-08-03 15:30:17 +01:00
Jonathan Wakely
e9f64fff64 libstdc++: Reduce header dependencies in <regex>
This reduces the size of <regex> a little. This is one of the largest
and slowest headers in the library.

By using <bits/stl_algobase.h> and <bits/stl_algo.h> instead of
<algorithm> we don't need to parse all the parallel algorithms and
std::ranges:: algorithms that are not needed by <regex>. Similarly, by
using <bits/stl_tree.h> and <bits/stl_map.h> instead of <map> we don't
need to parse the definition of std::multimap.

The _State_info type is not movable or copyable, so it doesn't need to use
std::unique_ptr<bool[]> to manage a bitset; we can just delete the array in
the destructor. It would use a lot less space if we used a bitset instead,
but that would be an ABI break. We could do it for the versioned
namespace, but this patch doesn't do so. For future reference, using
vector<bool> would work, but would increase sizeof(_State_info) by two
pointers, because it's three times as large as unique_ptr<bool[]>. We
can't use std::bitset because the length isn't constant. We want a
bitset with a non-constant but fixed length.
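
A hedged sketch of the shape this implies (illustrative names, not the
actual regex_executor.h code):

  #include <cstddef>

  struct state_info_sketch
  {
    explicit state_info_sketch (std::size_t n)
    : visited_states (new bool[n] ()) { }   // one flag per NFA state
    ~state_info_sketch () { delete[] visited_states; }

    // Neither copyable nor movable, so no smart pointer is needed.
    state_info_sketch (const state_info_sketch&) = delete;
    state_info_sketch& operator= (const state_info_sketch&) = delete;

    bool* visited_states;
  };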

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/regex_executor.h (_State_info): Replace
	unique_ptr<bool[]> with array of bool.
	* include/bits/regex_executor.tcc: Likewise.
	* include/bits/regex_scanner.tcc: Replace std::strchr with
	__builtin_strchr.
	* include/std/regex: Replace standard headers with smaller
	internal ones.
	* testsuite/28_regex/traits/char/lookup_classname.cc: Include
	<string.h> for strlen.
	* testsuite/28_regex/traits/char/lookup_collatename.cc:
	Likewise.
2021-08-03 15:24:52 +01:00
H.J. Lu
98d7f305d5 x86: Use XMM31 for scratch SSE register
In 64-bit mode, use XMM31 as the scratch SSE register to avoid vzeroupper
if possible.

gcc/

	* config/i386/i386.c (ix86_gen_scratch_sse_rtx): In 64-bit mode,
	try XMM31 to avoid vzeroupper.

gcc/testsuite/

	* gcc.target/i386/avx-vzeroupper-14.c: Pass -mno-avx512f to
	disable XMM31.
	* gcc.target/i386/avx-vzeroupper-15.c: Likewise.
	* gcc.target/i386/pr82941-1.c: Updated.  Check for vzeroupper.
	* gcc.target/i386/pr82942-1.c: Likewise.
	* gcc.target/i386/pr82990-1.c: Likewise.
	* gcc.target/i386/pr82990-3.c: Likewise.
	* gcc.target/i386/pr82990-5.c: Likewise.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-6b.c: Likewise.
	* gcc.target/i386/pr100865-7b.c: Likewise.
	* gcc.target/i386/pr100865-10b.c: Likewise.
	* gcc.target/i386/pr100865-8b.c: Updated.
	* gcc.target/i386/pr100865-9b.c: Likewise.
	* gcc.target/i386/pr100865-11b.c: Likewise.
	* gcc.target/i386/pr100865-12b.c: Likewise.
2021-08-03 07:11:58 -07:00
Jonathan Wakely
a1a2654cdc libstdc++: Avoid using std::unique_ptr in <locale>
The std::wstring_convert and std::wbuffer_convert types are not copyable or
movable, and store a plain pointer without a deleter. That means a much
simpler type that just uses delete in its destructor can be used instead
of std::unique_ptr.

That avoids including and parsing all of <bits/unique_ptr.h> in every
header that includes <locale>. It also avoids instantiating
unique_ptr<C> and std::tuple<C*, default_delete<C>> when the conversion
utilities are used.
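
A hedged sketch of such a minimal RAII wrapper (illustrative, not the
exact __detail::_Scoped_ptr definition):

  template<typename T>
  struct scoped_ptr_sketch
  {
    explicit scoped_ptr_sketch (T* p) : ptr (p) { }
    ~scoped_ptr_sketch () { delete ptr; }

    // The owning classes are not copyable or movable, so neither is this.
    scoped_ptr_sketch (const scoped_ptr_sketch&) = delete;
    scoped_ptr_sketch& operator= (const scoped_ptr_sketch&) = delete;

    T& operator* () const { return *ptr; }
    T* operator-> () const { return ptr; }

    T* ptr;
  };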

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/locale_conv.h (__detail::_Scoped_ptr): Define new
	RAII class template.
	(wstring_convert, wbuffer_convert): Use __detail::_Scoped_ptr
	instead of unique_ptr.
2021-08-03 15:06:56 +01:00
Richard Sandiford
048039c49b aarch64: Add -mtune=neoverse-512tvb
This patch adds an option to tune for Neoverse cores that have
a total vector bandwidth of 512 bits (4x128 for Advanced SIMD
and a vector-length-dependent equivalent for SVE).  This is intended
to be a compromise between tuning aggressively for a single core like
Neoverse V1 (which can be too narrow) and tuning for AArch64 cores
in general (which can be too wide).

-mcpu=neoverse-512tvb is equivalent to -mcpu=neoverse-v1
-mtune=neoverse-512tvb.

gcc/
	* doc/invoke.texi: Document -mtune=neoverse-512tvb and
	-mcpu=neoverse-512tvb.
	* config/aarch64/aarch64-cores.def (neoverse-512tvb): New entry.
	* config/aarch64/aarch64-tune.md: Regenerate.
	* config/aarch64/aarch64.c (neoverse512tvb_sve_vector_cost)
	(neoverse512tvb_sve_issue_info, neoverse512tvb_vec_issue_info)
	(neoverse512tvb_vector_cost, neoverse512tvb_tunings): New structures.
	(aarch64_adjust_body_cost_sve): Handle -mtune=neoverse-512tvb.
	(aarch64_adjust_body_cost): Likewise.
2021-08-03 13:00:49 +01:00
Richard Sandiford
9690309baf aarch64: Restrict issue heuristics to inner vector loop
The AArch64 vector costs try to take issue rates into account.
However, when vectorising an outer loop, we lumped the inner
and outer operations together, which is somewhat meaningless.
This patch restricts the heuristic to the inner loop.

gcc/
	* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Only
	record issue information for operations that occur in the
	innermost loop.
2021-08-03 13:00:48 +01:00
Richard Sandiford
028059b46e aarch64: Tweak MLA vector costs
The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction.  This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two values and adding
an invariant will end up being a move and an MLA.

The only target to use the issue-based vector costs is Neoverse V1,
which would generally prefer SVE in this case anyway.  I therefore
don't have a self-contained testcase.  However, the distinction
becomes more important with a later patch.

gcc/
	* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
	parameter.  Detect cases in which an Advanced SIMD MLA would almost
	certainly require a MOV.
	(aarch64_count_ops): Update accordingly.
2021-08-03 13:00:47 +01:00
Richard Sandiford
537afb0857 aarch64: Tweak the cost of elementwise stores
When the vectoriser scalarises a strided store, it counts one
scalar_store for each element plus one vec_to_scalar extraction
for each element.  However, extracting element 0 is free on AArch64,
so it should have zero cost.

I don't have a testcase that requires this for existing -mtune
options, but it becomes more important with a later patch.

gcc/
	* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
	function, split out from...
	(aarch64_detect_vector_stmt_subtype): ...here.
	(aarch64_add_stmt_cost): Treat extracting element 0 as free.
2021-08-03 13:00:46 +01:00
Richard Sandiford
78770e0e5d aarch64: Add gather_load_xNN_cost tuning fields
This patch adds tuning fields for the total cost of a gather load
instruction.  Until now, we've costed them as one scalar load
per element instead.  Those scalar_load-based values are also
what the patch uses to fill in the new fields for existing
cost structures.

gcc/
	* config/aarch64/aarch64-protos.h (sve_vec_cost):
	Add gather_load_x32_cost and gather_load_x64_cost.
	* config/aarch64/aarch64.c (generic_sve_vector_cost)
	(a64fx_sve_vector_cost, neoversev1_sve_vector_cost): Update
	accordingly, using the values given by the scalar_load * number
	of elements calculation that we used previously.
	(aarch64_detect_vector_stmt_subtype): Use the new fields.
2021-08-03 13:00:45 +01:00
Richard Sandiford
b585f0112f aarch64: Split out aarch64_adjust_body_cost_sve
This patch splits the SVE-specific part of aarch64_adjust_body_cost
out into its own subroutine, so that a future patch can call it
more than once.  I wondered about using a lambda to avoid having
to pass all the arguments, but in the end this way seemed clearer.

gcc/
	* config/aarch64/aarch64.c (aarch64_adjust_body_cost_sve): New
	function, split out from...
	(aarch64_adjust_body_cost): ...here.
2021-08-03 13:00:45 +01:00
Richard Sandiford
83d796d3e5 aarch64: Add a simple fixed-point class for costing
This patch adds a simple fixed-point class for holding fractional
cost values.  It can exactly represent the reciprocal of any
single-vector SVE element count (including the non-power-of-2 ones).
This means that it can also hold 1/N for all N in [1, 16], which should
be enough for the various *_per_cycle fields.

For now the assumption is that the number of possible reciprocals
is fixed at compile time and so the class should always be able
to hold an exact value.

The class uses a uint64_t to hold the fixed-point value, which means
that it can hold any scaled uint32_t cost.  Normally we don't worry
about overflow when manipulating raw uint32_t costs, but just to be
on the safe side, the class uses saturating arithmetic for all
operations.
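
A hedged sketch of the idea, using lcm(1..16) = 720720 as an illustrative
scale (the real fractional-cost.h representation and interface may differ):

  #include <cstdint>

  struct fraction_sketch
  {
    // With this scale, 1/N is exact for every N in [1, 16], and a scaled
    // uint32_t cost still fits comfortably in 64 bits.
    static constexpr std::uint64_t scale = 720720;
    std::uint64_t value;   // cost * scale

    static fraction_sketch from_cost (std::uint32_t c)
    { return { std::uint64_t (c) * scale }; }

    static fraction_sketch reciprocal (unsigned n)   // 1/n
    { return { scale / n }; }

    // Saturating addition, matching the "just to be on the safe side" policy.
    fraction_sketch operator+ (fraction_sketch o) const
    {
      std::uint64_t sum = value + o.value;
      return { sum < value ? ~std::uint64_t (0) : sum };
    }
  };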

As far as the changes to the cost routines themselves go:

- The changes to aarch64_add_stmt_cost and its subroutines are
  just laying groundwork for future patches; no functional change
  intended.

- The changes to aarch64_adjust_body_cost mean that we now
  take fractional differences into account.

gcc/
	* config/aarch64/fractional-cost.h: New file.
	* config/aarch64/aarch64.c: Include <algorithm> (indirectly)
	and cost_fraction.h.
	(vec_cost_fraction): New typedef.
	(aarch64_detect_scalar_stmt_subtype): Use it for statement costs.
	(aarch64_detect_vector_stmt_subtype): Likewise.
	(aarch64_sve_adjust_stmt_cost, aarch64_adjust_stmt_cost): Likewise.
	(aarch64_estimate_min_cycles_per_iter): Use vec_cost_fraction
	for cycle counts.
	(aarch64_adjust_body_cost): Likewise.
	(aarch64_test_cost_fraction): New function.
	(aarch64_run_selftests): Call it.
2021-08-03 13:00:44 +01:00
Richard Sandiford
fa3ca6151c aarch64: Turn sve_width tuning field into a bitmask
The tuning structures have an sve_width field that specifies the
number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not
applicable).  This patch turns the field into a bitmask so that
it can specify multiple widths at the same time.  For now we
always treat the minimum width as the likely width.

An alternative would have been to add extra fields, which would
have coped correctly with non-power-of-2 widths.  However,
we're very far from supporting constant non-power-of-2 vectors
in GCC, so I think the non-power-of-2 case will in reality always
have to be hidden behind VLA.
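
As the ChangeLog below notes, the minimum (and likely) width is taken from
the least significant set bit of the mask and the maximum from the most
significant set bit.  A hedged sketch of those bit tricks (illustrative
only, not the aarch64.c code):

  #include <cstdint>

  // Least significant set bit: the minimum/likely SVE width.
  inline std::uint32_t min_width (std::uint32_t mask) { return mask & -mask; }

  // Most significant set bit: the maximum SVE width.
  inline std::uint32_t max_width (std::uint32_t mask)
  {
    while (mask & (mask - 1))
      mask &= mask - 1;   // clear the lowest set bit until one remains
    return mask;
  }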

gcc/
	* config/aarch64/aarch64-protos.h (tune_params::sve_width): Turn
	into a bitmask.
	* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Update
	accordingly.
	(aarch64_estimated_poly_value): Likewise.  Use the least significant
	set bit for the minimum and likely values.  Use the most significant
	set bit for the maximum value.
2021-08-03 13:00:43 +01:00
liuhongt
d0b952edd3 Add cond_add/sub/mul for vector integer modes.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<insn><mode>): New expander.
	(cond_mul<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_addsubmul_d-1.c: New test.
	* gcc.target/i386/cond_op_addsubmul_d-2.c: New test.
	* gcc.target/i386/cond_op_addsubmul_q-1.c: New test.
	* gcc.target/i386/cond_op_addsubmul_q-2.c: New test.
	* gcc.target/i386/cond_op_addsubmul_w-1.c: New test.
	* gcc.target/i386/cond_op_addsubmul_w-2.c: New test.
2021-08-03 19:27:52 +08:00
Mosè Giordano
759f3854f0 Fix bashism in `libsanitizer/configure.tgt'
Appending to a string variable with `+=' is a bashism and does not work in
strict POSIX shells like dash.  This results in the extra compilation flags not
being set correctly.  This patch replaces the `+=' syntax with a simple string
interpolation to append to the `EXTRA_CXXFLAGS' variable.

libsanitizer/ChangeLog

	PR sanitizer/101111
	* configure.tgt: Fix bashism in setting of `EXTRA_CXXFLAGS'.
2021-08-03 13:24:47 +02:00
Jakub Jelinek
1a830c0636 analyzer: Fix ICE on MD builtin [PR101721]
The following testcase ICEs because DECL_FUNCTION_CODE asserts the builtin
is BUILT_IN_NORMAL, but it sees a backend (MD) builtin instead.
The FE, normal and MD builtin numbers overlap, so one should always
check what kind of builtin it is before looking at specific codes.

On the other hand, region-model.cc has:
      if (fndecl_built_in_p (callee_fndecl, BUILT_IN_NORMAL)
          && gimple_builtin_call_types_compatible_p (call, callee_fndecl))
        switch (DECL_UNCHECKED_FUNCTION_CODE (callee_fndecl))
which IMO should use DECL_FUNCTION_CODE instead, since it has already
checked that it is a normal builtin...

2021-08-03  Jakub Jelinek  <jakub@redhat.com>

	PR analyzer/101721
	* sm-malloc.cc (known_allocator_p): Only check DECL_FUNCTION_CODE on
	BUILT_IN_NORMAL builtins.

	* gcc.dg/analyzer/pr101721.c: New test.
2021-08-03 12:44:17 +02:00
Martin Liska
872c1a56e3 ChangeLog: add problematic commit 2e96b5f14e.
gcc/ChangeLog:

	* ChangeLog: Add manually.

libgomp/ChangeLog:

	* ChangeLog: Add manually.

gcc/testsuite/ChangeLog:

	* ChangeLog: Add manually.
2021-08-03 09:57:21 +02:00
GCC Administrator
4d17ca1bc7 Daily bump. 2021-08-03 07:49:16 +00:00
Martin Liska
e460471571 gcc-changelog: ignore one more commit
contrib/ChangeLog:

	* gcc-changelog/git_update_version.py: Ignore problematic
	  commit.
2021-08-03 09:22:30 +02:00
H.J. Lu
585394d30d x86: Add testcases for PR target/80566
PR target/80566
	* g++.target/i386/pr80566-1.C: New test.
	* g++.target/i386/pr80566-2.C: Likewise.
2021-08-02 20:34:13 -07:00
Kewen Lin
daaed9e365 tree-cfg: Fix typos on dloop in move_sese_region_to_fn
As mentioned in [1], there is one pre-existing issue that predates
the refactoring of FOR_EACH_LOOP_FN.  The macro always sets the
given LOOP to NULL at the end of the iteration unless there is an
early break inside; since there is no early break here, dloop ends
up NULL once the loop finishes iterating.  It stays NULL after the
refactoring.

I tried to debug the test case gcc.dg/graphite/pr83359.c
with commit 555758de90 (also reproduced the ICE with
555758de90074~), and noticed that the compilation of the test
case only covers this hunk:

  else
    {
      moved_orig_loop_num[dloop->orig_loop_num] = -1;
      dloop->orig_loop_num = 0;
    }

it doesn't reach the if-condition hunk that increases
"moved_orig_loop_num[dloop->orig_loop_num]".  So the
following hunk, guarded by

  if (moved_orig_loop_num[orig_loop_num] == 2)

which dereferences dloop, doesn't get executed.  That
explains why the problem wasn't exposed before.

Looking at the code that uses dloop, I think it's a copy-and-paste
typo: the modified assertion code uses the same wording as the
condition check above.  In that context, the expected original
number has already been assigned to the variable orig_loop_num by
extracting it from arg0 of the IFN_LOOP_DIST_ALIAS call.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576367.html

gcc/ChangeLog:

	* tree-cfg.c (move_sese_region_to_fn): Fix typos on dloop.
2021-08-02 22:12:00 -05:00
liuhongt
724adffe65 Support cond_add/sub/mul/div for vector float/double.
gcc/ChangeLog:

	* config/i386/sse.md (cond_<insn><mode>): New expander.
	(cond_mul<mode>): Ditto.
	(cond_div<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/cond_op_addsubmuldiv_double-1.c: New test.
	* gcc.target/i386/cond_op_addsubmuldiv_double-2.c: New test.
	* gcc.target/i386/cond_op_addsubmuldiv_float-1.c: New test.
	* gcc.target/i386/cond_op_addsubmuldiv_float-2.c: New test.
2021-08-03 09:10:27 +08:00
Ian Lance Taylor
7459bfa8a3 compiler, runtime: allow slice to array pointer conversion
Panic if the slice is too short.

For golang/go#395

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/338630
2021-08-02 15:27:08 -07:00
Ian Lance Taylor
06d0437d4a compiler, runtime: support unsafe.Add and unsafe.Slice
For golang/go#19367
For golang/go#40481

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/338949
2021-08-02 13:56:28 -07:00
Patrick Palka
14d8a5ae47 libstdc++: Add missing std::move to ranges::copy/move/reverse_copy [PR101599]
In passing, this also renames the template parameter _O2 to _Out2 in
ranges::partition_copy and uglifies two of its function parameters,
out_true and out_false.

	PR libstdc++/101599

libstdc++-v3/ChangeLog:

	* include/bits/ranges_algo.h (__reverse_copy_fn::operator()):
	Add missing std::move in return statement.
	(__partition_copy_fn::operator()): Rename template parameter
	_O2 to _Out2.  Uglify function parameters out_true and out_false.
	* include/bits/ranges_algobase.h (__copy_or_move): Add missing
	std::move to recursive call that unwraps a __normal_iterator
	output iterator.
	* testsuite/25_algorithms/copy/constrained.cc (test06): New test.
	* testsuite/25_algorithms/move/constrained.cc (test05): New test.
2021-08-02 15:30:15 -04:00
Patrick Palka
4414057186 libstdc++: Fix up implementation of LWG 3533 [PR101589]
In r12-569 I accidentally applied the LWG 3533 change to
elements_view::iterator::base instead of to elements_view::base.

This patch corrects this, and also applies the corresponding LWG 3533
change to lazy_split_view::inner-iter::base now that we implement P2210.

	PR libstdc++/101589

libstdc++-v3/ChangeLog:

	* include/std/ranges (lazy_split_view::_InnerIter::base): Make
	the const& overload unconstrained and return a const reference
	as per LWG 3533.  Make unconditionally noexcept.
	(elements_view::base): Revert accidental r12-569 change.
	(elements_view::_Iterator::base): Make the const& overload
	unconstrained and return a const reference as per LWG 3533.
	Make unconditionally noexcept.
2021-08-02 15:30:13 -04:00
Patrick Palka
0e1bb3c88c libstdc++: Add missing std::move to join_view::iterator ctor [PR101483]
PR libstdc++/101483

libstdc++-v3/ChangeLog:

	* include/std/ranges (join_view::_Iterator::_Iterator): Add
	missing std::move.
2021-08-02 15:30:10 -04:00
H.J. Lu
af863ef935 x86: Also pass -mno-sse to vect8-ret.c
Also pass -mno-sse to vect8-ret.c to disable XMM load/store when running
GCC tests with "-march=x86-64 -m32".

	* gcc.target/i386/vect8-ret.c: Also pass -mno-sse.
2021-08-02 10:40:50 -07:00
H.J. Lu
ff12cc3d4e x86: Update gcc.target/i386/incoming-11.c
Expect no stack realignment since we no longer realign the stack when
copying data.

	* gcc.target/i386/incoming-11.c: Expect no stack realignment.
2021-08-02 10:40:50 -07:00
H.J. Lu
dadbb1a886 x86: Also pass -mno-avx to sw-1.c for ia32
Also pass -mno-avx to sw-1.c for ia32 since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on the
stack.

	* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
2021-08-02 10:40:50 -07:00
H.J. Lu
20a1c9aae0 x86: Also pass -mno-avx to cold-attribute-1.c
Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM or
ZMM registers.

	* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
2021-08-02 10:40:50 -07:00
H.J. Lu
d7d74754a0 x86: Also pass -mno-avx to pr72839.c
Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

	* gcc.target/i386/pr72839.c: Also pass -mno-avx.
2021-08-02 10:40:50 -07:00
H.J. Lu
0d3be08a23 x86: Add tests for piecewise move and store
* gcc.target/i386/pieces-memcpy-10.c: New test.
	* gcc.target/i386/pieces-memcpy-11.c: Likewise.
	* gcc.target/i386/pieces-memcpy-12.c: Likewise.
	* gcc.target/i386/pieces-memcpy-13.c: Likewise.
	* gcc.target/i386/pieces-memcpy-14.c: Likewise.
	* gcc.target/i386/pieces-memcpy-15.c: Likewise.
	* gcc.target/i386/pieces-memcpy-16.c: Likewise.
	* gcc.target/i386/pieces-memset-1.c: Likewise.
	* gcc.target/i386/pieces-memset-2.c: Likewise.
	* gcc.target/i386/pieces-memset-3.c: Likewise.
	* gcc.target/i386/pieces-memset-4.c: Likewise.
	* gcc.target/i386/pieces-memset-5.c: Likewise.
	* gcc.target/i386/pieces-memset-6.c: Likewise.
	* gcc.target/i386/pieces-memset-7.c: Likewise.
	* gcc.target/i386/pieces-memset-8.c: Likewise.
	* gcc.target/i386/pieces-memset-9.c: Likewise.
	* gcc.target/i386/pieces-memset-10.c: Likewise.
	* gcc.target/i386/pieces-memset-11.c: Likewise.
	* gcc.target/i386/pieces-memset-12.c: Likewise.
	* gcc.target/i386/pieces-memset-13.c: Likewise.
	* gcc.target/i386/pieces-memset-14.c: Likewise.
	* gcc.target/i386/pieces-memset-15.c: Likewise.
	* gcc.target/i386/pieces-memset-16.c: Likewise.
	* gcc.target/i386/pieces-memset-17.c: Likewise.
	* gcc.target/i386/pieces-memset-18.c: Likewise.
	* gcc.target/i386/pieces-memset-19.c: Likewise.
	* gcc.target/i386/pieces-memset-20.c: Likewise.
	* gcc.target/i386/pieces-memset-21.c: Likewise.
	* gcc.target/i386/pieces-memset-22.c: Likewise.
	* gcc.target/i386/pieces-memset-23.c: Likewise.
	* gcc.target/i386/pieces-memset-24.c: Likewise.
	* gcc.target/i386/pieces-memset-25.c: Likewise.
	* gcc.target/i386/pieces-memset-26.c: Likewise.
	* gcc.target/i386/pieces-memset-27.c: Likewise.
	* gcc.target/i386/pieces-memset-28.c: Likewise.
	* gcc.target/i386/pieces-memset-29.c: Likewise.
	* gcc.target/i386/pieces-memset-30.c: Likewise.
	* gcc.target/i386/pieces-memset-31.c: Likewise.
	* gcc.target/i386/pieces-memset-32.c: Likewise.
	* gcc.target/i386/pieces-memset-33.c: Likewise.
	* gcc.target/i386/pieces-memset-34.c: Likewise.
	* gcc.target/i386/pieces-memset-35.c: Likewise.
	* gcc.target/i386/pieces-memset-36.c: Likewise.
	* gcc.target/i386/pieces-memset-37.c: Likewise.
	* gcc.target/i386/pieces-memset-38.c: Likewise.
	* gcc.target/i386/pieces-memset-39.c: Likewise.
	* gcc.target/i386/pieces-memset-40.c: Likewise.
	* gcc.target/i386/pieces-memset-41.c: Likewise.
	* gcc.target/i386/pieces-memset-42.c: Likewise.
	* gcc.target/i386/pieces-memset-43.c: Likewise.
	* gcc.target/i386/pieces-memset-44.c: Likewise.
2021-08-02 10:40:32 -07:00
H.J. Lu
bf159e5e12 x86: Add AVX2 tests for PR middle-end/90773
PR middle-end/90773
	* gcc.target/i386/pr90773-20.c: New test.
	* gcc.target/i386/pr90773-21.c: Likewise.
	* gcc.target/i386/pr90773-22.c: Likewise.
	* gcc.target/i386/pr90773-23.c: Likewise.
	* gcc.target/i386/pr90773-26.c: Likewise.
2021-08-02 10:38:19 -07:00
H.J. Lu
29f0e955c9 x86: Update piecewise move and store
We can use TImode/OImode/XImode integers for piecewise move and store.

1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
bytes that a single instruction can move quickly between memory and
registers or between two memory locations.
2. Define MOVE_MAX to the maximum number of bytes we can move from memory
to memory in one reasonably fast instruction.  The difference between
MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX must be a constant,
independent of compiler options, since it is used in reload.h to define
struct target_reload, whereas MOVE_MAX can vary depending on compiler options.
3. When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed, since no vector register spill is
required for piecewise move and store.  Since stack_realign_needed is
set to true based on stack_alignment_estimated, which is set by pseudo
vector register usage, we also need to check stack_realign_needed when
deciding whether the frame pointer can be eliminated.
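
A hedged illustration of the kind of copy affected (not one of the new or
updated tests): with AVX2 enabled, a 32-byte block copy like this can now
be expanded by pieces as a single YMM load/store pair rather than several
general-register moves.

  struct block { char bytes[32]; };

  void copy_block (block *dst, const block *src)
  {
    *dst = *src;   // small enough for move-by-pieces with a vector register
  }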

gcc/

	* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
	check stack_realign_needed for stack realignment.
	(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
	than the largest integer supported by vector register.
	* config/i386/i386.h (MAX_MOVE_MAX): New.  Set to 64.
	(MOVE_MAX): Set to bytes of the largest integer supported by
	vector register.
	(STORE_MAX_PIECES): New.

gcc/testsuite/

	* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
	* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
	* gcc.target/i386/pr90773-15.c: Likewise.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-24.c: Likewise.
	* gcc.target/i386/pr90773-25.c: Likewise.
	* gcc.target/i386/pr100865-1.c: Likewise.
	* gcc.target/i386/pr100865-2.c: Likewise.
	* gcc.target/i386/pr100865-3.c: Likewise.
	* gcc.target/i386/pr90773-14.c: Also run for 32-bit and expect
	XMM movd to store 4 bytes.
	* gcc.target/i386/pr100865-4a.c: Also run for 32-bit and expect
	YMM registers.
	* gcc.target/i386/pr100865-4b.c: Likewise.
	* gcc.target/i386/pr100865-10a.c: Expect YMM registers.
	* gcc.target/i386/pr100865-10b.c: Likewise.
2021-08-02 10:38:19 -07:00
H.J. Lu
7f4c3943f7 x86: Avoid stack realignment when copying data
To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

	* config/i386/i386-expand.c (ix86_expand_vector_move): Call
	ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
	data from one memory location to another.

gcc/testsuite/

	* gcc.target/i386/eh_return-1.c: New test.
2021-08-02 10:38:18 -07:00
H.J. Lu
1bee034e01 x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX
Define TARGET_GEN_MEMSET_SCRATCH_RTX to ix86_gen_scratch_sse_rtx to
return a scratch SSE register for memset.

gcc/

	PR middle-end/90773
	* config/i386/i386.c (TARGET_GEN_MEMSET_SCRATCH_RTX): New.

gcc/testsuite/

	PR middle-end/90773
	* gcc.target/i386/pr90773-5.c: Updated to expect XMM register.
	* gcc.target/i386/pr90773-14.c: Likewise.
	* gcc.target/i386/pr90773-15.c: New test.
	* gcc.target/i386/pr90773-16.c: Likewise.
	* gcc.target/i386/pr90773-17.c: Likewise.
	* gcc.target/i386/pr90773-18.c: Likewise.
	* gcc.target/i386/pr90773-19.c: Likewise.
2021-08-02 10:38:06 -07:00
Jonathan Wakely
38fb24ba4d libstdc++: Fix filesystem::temp_directory_path [PR101709]
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	PR libstdc++/101709
	* src/filesystem/ops-common.h (get_temp_directory_from_env):
	Add error_code parameter.
	* src/c++17/fs_ops.cc (fs::temp_directory_path): Pass error_code
	argument to get_temp_directory_from_env and check it.
	* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.
2021-08-02 16:33:44 +01:00
Jonathan Wakely
2aaf69133f libstdc++: Add dg-error for additional error in C++11 mode
When the comparison with a nullptr_t is ill-formed, there is an
additional error for C++11 mode due to the constexpr function body being
invalid.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
	Add dg-error for c++11_only target.
2021-08-02 16:22:24 +01:00
Aldy Hernandez
cac2353f8b Remove --param=threader-iterative.
This was meant to be an internal construct, but I see folks are using
it and submitting PRs against it.  Let's just remove this to avoid
further confusion.

Tested on x86-64 Linux.

gcc/ChangeLog:

	PR tree-optimization/101724
	* params.opt: Remove --param=threader-iterative.
	* tree-ssa-threadbackward.c (pass_thread_jumps::execute): Remove
	iterative mode.
2021-08-02 16:58:07 +02:00
Tom de Vries
7d8577dd46 [gcc/doc] Improve nonnull attribute documentation
Improve nonnull attribute documentation in a number of ways:

Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).

Mention -Wnonnull-compare.
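
A hedged illustration of a nonnull-marked declaration (an example of the
attribute being documented, not text from the documentation itself):

  /* Both pointer arguments must be non-null; passing a literal NULL can
     trigger -Wnonnull, and the attribute also lets the compiler assume,
     when compiling the definition, that the parameters are non-null.  */
  extern void fill_buffer (void *dst, const void *src, unsigned long n)
    __attribute__ ((nonnull (1, 2)));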

gcc/ChangeLog:

2021-07-28  Tom de Vries  <tdevries@suse.de>

	PR middle-end/101665
	* doc/extend.texi (nonnull attribute): Improve documentation.
2021-08-02 16:49:27 +02:00
Andrew Pinski
99b520f031 Fix PR 101683: FP exceptions for float->unsigned
Just like the old bug PR9651, unsigned_fix rtl should
also be handled as a trapping instruction.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR rtl-optimization/101683
	* rtlanal.c (may_trap_p_1): Handle UNSIGNED_FIX.
2021-08-02 14:47:03 +00:00
Patrick Palka
f48c3cd2e3 c++: Improve memory usage of subsumption [PR100828]
Constraint subsumption is implemented in two steps.  The first step
computes the disjunctive (or conjunctive) normal form of one of the
constraints, and the second step verifies that each clause in the
decomposed form implies the other constraint.   Performing these two
steps separately is problematic because in the first step the DNF/CNF
can be exponentially larger than the original constraint, and by
computing it ahead of time we'd have to keep all of it in memory.

This patch fixes this exponential blowup in memory usage by interleaving
the two steps, so that as soon as we decompose one clause we check
implication for it.  In turn, memory usage during subsumption is now
worst case linear in the size of the constraints rather than
exponential, and so we can safely remove the hard limit of 16 clauses
without introducing runaway memory usage on some inputs.  (Note the
_time_ complexity of subsumption is still exponential in the worst case.)

In order for this to work we need to make formula::branch() insert the
copy of the current clause directly after the current clause rather than
at the end of the list, so that we fully decompose a clause shortly
after creating it.  Otherwise we'd end up accumulating exponentially
many (partially decomposed) clauses in memory anyway.
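
A hedged, self-contained sketch of the interleaved scheme.  The helper
names mirror decompose_clause and derive_proof from the ChangeLog below,
but the types and bodies here are placeholders, not the real logic.cc code:

  #include <list>

  struct Clause { /* atoms of one DNF clause */ };

  // Fully decompose *it in place; branching may insert new clauses
  // directly after it (placeholder body).
  static void decompose_clause (std::list<Clause>&, std::list<Clause>::iterator) { }

  // Does the fully decomposed clause imply the other constraint?  (placeholder)
  static bool derive_proof (const Clause&) { return true; }

  static bool subsumes_sketch (std::list<Clause>& f)
  {
    for (auto it = f.begin (); it != f.end (); )
      {
        decompose_clause (f, it);   // step 1: normalise just this clause
        if (!derive_proof (*it))    // step 2: check implication immediately,
          return false;             // so the full DNF never exists at once
        it = f.erase (it);          // free the clause before moving on
      }
    return true;
  }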

	PR c++/100828

gcc/cp/ChangeLog:

	* logic.cc (formula::formula): Use emplace_back instead of
	push_back.
	(formula::branch): Insert a copy of m_current directly after
	m_current instead of at the end of the list.
	(formula::erase): Define.
	(decompose_formula): Remove.
	(decompose_antecedents): Remove.
	(decompose_consequents): Remove.
	(derive_proofs): Remove.
	(max_problem_size): Remove.
	(diagnose_constraint_size): Remove.
	(subsumes_constraints_nonnull): Rewrite directly in terms of
	decompose_clause and derive_proof, interleaving decomposition
	with implication checking.  Remove limit on constraint complexity.
	Use formula::erase to free the current clause before moving on to
	the next one.
2021-08-02 09:59:56 -04:00
Roger Sayle
f9fcf75482 Optimize x ? bswap(x) : 0 in tree-ssa-phiopt
Many thanks again to Jakub Jelinek for a speedy fix for PR 101642.
Interestingly, that test case "bswap16(x) ? : x" also reveals a
missed optimization opportunity.  The resulting "x ? bswap(x) : 0"
can be further simplified to just bswap(x).

Conveniently, tree-ssa-phiopt.c already recognizes/optimizes the
related "x ? popcount(x) : 0", so this patch simply makes that
transformation make general, additionally handling bswap, parity,
ffs and clrsb.  All of the required infrastructure is already
present thanks to Jakub previously adding support for clz/ctz.
To reflect this generalization, the name of the function is changed
from cond_removal_in_popcount_clz_ctz_pattern to the hopefully
equally descriptive cond_removal_in_builtin_zero_pattern.
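
A hedged illustration of the new case (not the actual phi-opt-25.c test):

  unsigned short f (unsigned short x)
  {
    /* __builtin_bswap16 (0) == 0, so the conditional is redundant and the
       whole expression can be simplified to __builtin_bswap16 (x).  */
    return x ? __builtin_bswap16 (x) : 0;
  }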

2021-08-02  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern):
	Renamed from cond_removal_in_popcount_clz_ctz_pattern.
	Add support for BSWAP, FFS, PARITY and CLRSB builtins.
	(tree_ssa_phiopt_worker): Update call to function above.

gcc/testsuite/ChangeLog
	* gcc.dg/tree-ssa/phi-opt-25.c: New test case.
2021-08-02 13:30:38 +01:00
H.J. Lu
6f0c43e978 i386: Improve SImode constant - __builtin_clzll for -mno-lzcnt
Add a zero_extend pattern for bsr_rex64_1 and use it to split SImode
constant - __builtin_clzll to avoid an unnecessary zero_extend.
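
A hedged example of the kind of source pattern involved (illustrative,
not the pr78103 tests themselves):

  int msb_index (unsigned long long x)
  {
    /* For x != 0 this is the bit index of the most significant set bit.
       The new bsr_rex64_1_zext pattern lets the splitter produce the
       SImode result without a separate zero_extend.  */
    return 63 - __builtin_clzll (x);
  }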

gcc/

	PR target/78103
	* config/i386/i386.md (bsr_rex64_1_zext): New.
	(combine splitter for constant - clzll): Replace gen_bsr_rex64_1
	with gen_bsr_rex64_1_zext.

gcc/testsuite/

	PR target/78103
	* gcc.target/i386/pr78103-2.c: Also scan incl.
	* gcc.target/i386/pr78103-3.c: Scan leal|addl|incl for x32.  Also
	scan incq.
2021-08-01 13:32:55 -07:00
Jonathan Wakely
8dd1644734 Add missing descriptions gcc/testsuite/ChangeLog 2021-08-01 19:37:52 +01:00
Joseph Myers
9a89a0643c Update gcc fr.po.
* fr.po: Update.
2021-07-31 19:30:11 +00:00
Jason Merrill
af76342b44 c++: ICE on anon struct with base [PR96636]
pinski pointed out that my recent change to reject anonymous structs with
bases was relevant to this PR.  But we still ICEd after giving that error;
this fixes the ICE.
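
A hedged illustration of the construct (not the actual anon-struct9.C test):

  struct B { };
  struct S
  {
    struct : B { int i; };   // anonymous struct with a base: rejected with
                             // an error, and no longer ICEs afterwards
  };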

	PR c++/96636

gcc/cp/ChangeLog:

	* decl.c (fixup_anonymous_aggr): Clear TYPE_NEEDS_CONSTRUCTING
	after error.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/anon-struct9.C: New test.
2021-07-31 10:43:42 -04:00
Jason Merrill
5b759cdcb7 c++: pretty-print TYPE_PACK_EXPANSION better
gcc/cp/ChangeLog:

	* ptree.c (cxx_print_type) [TYPE_PACK_EXPANSION]: Also print
	PACK_EXPANSION_PATTERN.
2021-07-31 10:43:07 -04:00
Roger Sayle
4c4249b71d [Committed] Tweak new test case gcc.target/i386/dec-cmov-2.c
With -m32, this test case is sensitive to the instruction timings of
the target (for ifcvt to normalize bar() to foo() during the ce1 pass,
prior to the transformations actually being tested here).  Specifying
-march=core2 prevents these failures.  Committed as obvious.

2021-07-31  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
	* gcc.target/i386/dec-cmov-2.c: Require -march=core2 with -m32.
2021-07-31 11:09:31 +01:00
Jakub Jelinek
05bcef5a88 openmp: Handle OpenMP directives in attribute syntax in attribute-declaration
Now that we parse attribute-declaration (outside of functions), the following
patch handles OpenMP directives in its attribute(s).
What still needs to be handled incrementally is diagnosing mismatched
begin/end pairs like
 [[omp::directive (declare target)]];
 int a;
 #pragma omp end declare target
or
 #pragma omp declare target
 int b;
 [[omp::directive (end declare target)]];
and handling declare simd/declare variant on declarations (function
definitions and declarations), which needs to be done in two different spots.

2021-07-31  Jakub Jelinek  <jakub@redhat.com>

	* parser.c (cp_parser_declaration): Handle OpenMP directives
	in attribute-declaration.

	* g++.dg/gomp/attrs-9.C: New test.
2021-07-31 09:35:25 +02:00