213455 Commits

Author SHA1 Message Date
Jakub Jelinek
b64980b077 testsuite: Fix optimize_one.c FAIL on i686-linux
The test FAILs on i686-linux because -mfpmath=sse is used without
-msse2 being enabled.

2024-09-02  Jakub Jelinek  <jakub@redhat.com>

	* gcc.target/i386/optimize_one.c: Add -msse2 to dg-options.
2024-09-02 20:14:49 +02:00
Alexandre Oliva
af1500dd8c [libstdc++-v3] [testsuite] improve future/*/poll.cc calibration
30_threads/future/members/poll.cc has calibration code that, on
systems with very low clock resolution, may spuriously fail to run.
Even when it does run, low resolution and reasonable
timeouts severely limit the viability of increasing the loop counts so
as to reduce measurement noise, so we end up with very noisy results.

On various vxworks targets, high iteration count (low-noise)
measurements confirmed that some of the operations that we expected to
be up to 100x slower than the fastest ones can run a little slower
than that and, with significant noise, may seem to be even slower,
comparatively.

Bump the factors up to 200x, so that we have plenty of margin over
measured results.


for  libstdc++-v3/ChangeLog

	* testsuite/30_threads/future/members/poll.cc: Factor out
	calibration, and run it unconditionally.  Lower its
	strictness.  Bump wait_until_*'s slowness factor.
2024-09-02 11:32:03 -03:00
Alexandre Oliva
410061b15a [libstdc++] [testsuite] avoid async.cc loss of precision [PR91486]
When we get to test_pr91486_wait_until(), we're about 10s past the
float_steady_clock epoch.  This is enough for the 1s delta for the
timeout to come out slightly lower when the futex-less wait_until
converts the deadline from float_steady_clock to __clock_t.  So we may
wake up a little too early, and end up looping one extra time to sleep
for e.g. another 954ns until we hit the deadline.
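
As an illustration of where a figure like 954ns can come from (not part of the patch): the sketch below assumes float_steady_clock counts seconds in a 32-bit IEEE-754 float, which the message does not spell out. Under that assumption, adjacent representable values around 10s are 2^-20 s apart, so a deadline converted through such a float can land one step early. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Next representable float above a positive, finite x (IEEE-754 binary32
   assumed): incrementing the bit pattern steps to the adjacent value.  */
float next_float_up (float x)
{
  assert (x > 0.0f);
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits++;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Size, in nanoseconds, of one float step at `seconds` past the epoch,
   assuming the clock's tick is one second held in a float.  */
double float_step_ns (float seconds)
{
  return ((double) next_float_up (seconds) - (double) seconds) * 1e9;
}
```

Calling float_step_ns (10.0f) gives a step just under 954ns, while float_step_ns (1.0f) gives roughly 119ns, which is why starting the epoch at the beginning of the test leaves far less room for rounding.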

Each iteration calls float_steady_clock::now(), bumping the call_count
that we VERIFY() at the end of the subtest.  Since we expect at most 3
calls, and we're going to have at the very least 3 on futex-less
targets (one in the test proper, one before wait_until_impl to compute
the deadline, and one after wait_until_impl to check whether the
deadline was hit), any such imprecision that causes an extra iteration
will reach 5 and cause the test to fail.

Initializing the epoch in the beginning of the test makes such
spurious fails due to loss of precision far less likely.  I don't
suppose allowing for an extra couple of calls would be desirable.

While at that, I'm annotating unused status variables as such.


for  libstdc++-v3/ChangeLog

	PR libstdc++/91486
	* testsuite/30_threads/async/async.cc
	(test_pr91486_wait_for): Mark status as unused.
	(test_pr91486_wait_until): Likewise.  Initialize epoch later.
2024-09-02 11:31:59 -03:00
Alexandre Oliva
9223d17159 [testsuite] add linkonly to dg-additional-sources [PR115295]
The D testsuite shows it was a mistake to assume that
dg-additional-sources are never to be used for compilation tests.
Even if an output file is specified for compilation, extra module
files can be named and used in the compilation without being flagged
as errors.

Introduce a 'linkonly' flag for dg-additional-sources, and use it in
pr95401.cc and other vector tests that default to run, so that its
additional sources get discarded when vector tests downgrade to
compile-only.  This reverts previous workarounds for this very
circumstance, that relied on being able to run vector tests anyway,
even after failing to detect runtime or hardware vector support.


for  gcc/ChangeLog

	PR d/115295
	* doc/sourcebuild.texi (dg-additional-sources): Add linkonly.

for  gcc/testsuite/ChangeLog

	PR d/115295
	* g++.dg/vect/pr95401.cc: Add linkonly to dg-additional-sources.
	* g++.dg/vect/pr68762-1.cc: Likewise.
	* g++.dg/vect/simd-clone-3.cc: Likewise.
	* g++.dg/vect/simd-clone-5.cc: Likewise.
	* gcc.dg/vect/vect-simd-clone-10.c: Likewise.  Drop dg-do run.
	* gcc.dg/vect/vect-simd-clone-12.c: Likewise.  Likewise.
	* lib/gcc-defs.exp (additional_sources_omit_on_compile): New.
	(dg-additional-sources): Add to it on linkonly.
	(dg-additional-files-options): Omit select sources on compile.
2024-09-02 11:31:51 -03:00
Andrew Stubbs
b9bf0c3f54 amdgcn: Remove TARGET_GCN5_PLUS
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so
we can make that code unconditional, and remove all the "else" cases.

The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS,
TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also
redundant and can be made unconditional.

The naming of the "gcn_version" attribute has been confusing since the "rdna"
attribute was added and this makes it worse, so it has been renamed to "cdna".

The add-with-carry assembler mnemonics no longer have two forms, so '%^' can be
removed.

gcc/ChangeLog:

	* config/gcn/gcn-opts.h (TARGET_GCN5_PLUS): Delete.
	(TARGET_GLOBAL_ADDRSPACE): Delete.
	(TARGET_FLAT_OFFSETS): Delete.
	(TARGET_EXPLICIT_CARRY): Delete.
	(TARGET_MULTIPLY_IMMEDIATE): Delete.
	* config/gcn/gcn-valu.md (*mov<mode>): Rename "gcn_version" to "cdna".
	(*mov<mode>_4reg): Likewise.
	(@mov<mode>_sgprbase): Likewise.
	(gather<mode>_insn_1offset<exec>): Likewise.
	(gather<mode>_insn_1offset_ds<exec>): Likewise.
	(gather<mode>_insn_2offsets<exec>): Likewise.
	(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
	(scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise.
	(scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
	(gather<mode>_insn_1offset<exec>): Remove TARGET_FLAT_OFFSETS
	conditionals.
	(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
	(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
	(add<mode>3<exec_clobber>): Use "_co" instead of "%^".
	(add<mode>3_dup<exec_clobber>): Likewise.
	(add<mode>3_vcc<exec_vcc>): Likewise.
	(add<mode>3_vcc_dup<exec_vcc>): Likewise.
	(addc<mode>3<exec_vcc>): Likewise.
	(sub<mode>3<exec_clobber>): Likewise.
	(sub<mode>3_vcc<exec_vcc>): Likewise.
	(subc<mode>3<exec_vcc>): Likewise.
	(*plus_carry_dpp_shr_<mode>): Likewise.
	(*plus_carry_in_dpp_shr_<mode>): Likewise.
	* config/gcn/gcn.cc (gcn_flat_address_p): Remove TARGET_FLAT_OFFSETS
	conditionals.
	(gcn_addr_space_legitimate_address_p): Likewise.
	(gcn_addr_space_legitimize_address): Likewise.
	(gcn_expand_scalar_to_vector_address): Likewise.
	(print_operand_address): Likewise, and TARGET_GLOBAL_ADDRSPACE also.
	(print_operand): Remove "%^" operand code.
	Remove TARGET_GLOBAL_ADDRSPACE assertion.
	* config/gcn/gcn.h (STACK_ADDR_SPACE): Remove GCN5 conditional.
	* config/gcn/gcn.md (gcn_version): Rename attribute ...
	(cdna): ... to this, and remove the gcn3 and gcn5 values.
	(enabled): Replace old "gcn_version" logic with new "cdna" logic.
	(*mov<mode>_insn): Rename "gcn_version" to "cdna".
	(*movti_insn): Likewise.
	(addsi3): Use "_co" instead of "%^".
	(addsi3_scalar_carry): Likewise.
	(addsi3_scalar_carry_cst): Likewise.
	(addcsi3_scalar): Likewise.
	(addcsi3_scalar_zero): Likewise.
	(addptrdi3): Likewise.
	(subsi3): Likewise.
	(<su>mulsi3_highpart): Remove TARGET_MULTIPLY_IMMEDIATE conditions.
	(<su>mulsi3_highpart_reg): Remove "gcn_version" attribute.
	(muldi3): Likewise.
	(atomic_fetch_<bare_mnemonic><mode>): Likewise.
	(atomic_<bare_mnemonic><mode>): Likewise.
	(sync_compare_and_swap<mode>_insn): Likewise.
	(atomic_load<mode>): Likewise.
	(atomic_store<mode>): Likewise.
	(atomic_exchange<mode>): Likewise.
	(<su>mulsi3_highpart_imm): Remove both TARGET_MULTIPLY_IMMEDIATE and
	"gcn_version".
	(<su>mulsidi3): Likewise.
	(<su>mulsidi3_imm): Likewise.
2024-09-02 13:08:46 +00:00
Andrew Stubbs
023641d97c amdgcn: Remove TARGET_GCN3
The only GCN3 ISA device was removed (Fiji, gfx803) so all the GCN3-specific
code and features can be removed from the back-end.

gcc/ChangeLog:

	* config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3.
	(TARGET_GCN3): Delete.
	(TARGET_GCN3_PLUS): Delete.
	(TARGET_M0_LDS_LIMIT): Delete.
	* config/gcn/gcn-valu.md
	(gather<mode>_insn_1offset<exec>): Remove TARGET_GCN3 from conditions.
	(*<reduc_op>_dpp_shr_<mode>): Likewise.
	* config/gcn/gcn.cc (enum gcn_isa): Change default to ISA_GCN5.
	(gcn_expand_prologue): Remove TARGET_M0_LDS_LIMIT feature.
	(gcn_expand_reduc_scalar): Remove TARGET_GCN3 conditions.
	* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Remove TARGET_GCN3.
2024-09-02 13:08:46 +00:00
Andrew Stubbs
57af002207 amdgcn: remove gfx803 "Fiji" support
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and
hasn't worked properly with the drivers since about ROCm 4.

This patch removes the device from GCC options and documentation, and removes
the direct mentions from the internals.

The TARGET_GCN3 support in the back-end is now unused and can be removed (in a
follow-up patch).

gcc/ChangeLog:

	* config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks.
	* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative.
	(NO_XNACK): Likewise.
	(NO_SRAM_ECC): Likewise.
	(ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC.
	* config/gcn/gcn-opts.h (enum processor_type): Remove PROCESSOR_FIJI.
	(TARGET_FIJI): Delete.
	* config/gcn/gcn.cc (gcn_option_override): Remove Fiji.
	(gcn_omp_device_kind_arch_isa): Likewise.
	(output_file_start): Likewise.
	* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise.
	* config/gcn/gcn.opt (gpu_type): Likewise.
	(march, mtune): Change default to PROCESSOR_VEGA10.
	* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete.
	(copy_early_debug_info): Remove elf_flags_actual.
	Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally.
	(get_arch): Remove Fiji.
	(main): Remove gfx803.
	* config/gcn/t-omp-device
	(omp-device-properties-gcn): Remove fiji and gfx803.
	* doc/install.texi (amdgcn*-*-*): Remove fiji and special instructions.
	* doc/invoke.texi: Remove fiji.

libgomp/ChangeLog:

	* libgomp.texi: Remove fiji and gfx803.
	* testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803.
	* testsuite/libgomp.c/declare-variant-4-fiji.c: Removed.
	* testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed.
2024-09-02 13:08:46 +00:00
Gaius Mulley
78dc2e2575 PR modula2/116557 Remove physical address from the GPL header comment
This patch removes the physical address from all the header comments
in the m2 subdirectory.  The physical address is replaced with the
text "You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>." instead.

gcc/m2/ChangeLog:

	PR modula2/116557
	* gm2-lang.cc: Replace physical address with URL in GPL header.
	* gm2-lang.h: Ditto.
	* images/LICENSE.IMG: Ditto.
	* m2-tree.def: Ditto.
	* mc-boot/GIndexing.cc: Ditto.
	* mc-boot/Gkeyc.cc: Ditto.
	* mc-boot/Glists.cc: Ditto.
	* mc-boot/GmcComp.cc: Ditto.
	* mc-boot/GmcDebug.cc: Ditto.
	* mc-boot/GmcFileName.cc: Ditto.
	* mc-boot/GmcMetaError.cc: Ditto.
	* mc-boot/GmcOptions.cc: Ditto.
	* mc-boot/GmcPreprocess.cc: Ditto.
	* mc-boot/GmcPretty.cc: Ditto.
	* mc-boot/GmcPrintf.cc: Ditto.
	* mc-boot/GmcQuiet.cc: Ditto.
	* mc-boot/GmcReserved.cc: Ditto.
	* mc-boot/GmcSearch.cc: Ditto.
	* mc-boot/GmcStack.cc: Ditto.
	* mc/Indexing.mod: Ditto.
	* mc/keyc.mod: Ditto.
	* mc/lists.mod: Ditto.
	* mc/mcComp.mod: Ditto.
	* mc/mcDebug.mod: Ditto.
	* mc/mcFileName.mod: Ditto.
	* mc/mcMetaError.mod: Ditto.
	* mc/mcOptions.mod: Ditto.
	* mc/mcPreprocess.mod: Ditto.
	* mc/mcPretty.mod: Ditto.
	* mc/mcPrintf.mod: Ditto.
	* mc/mcQuiet.mod: Ditto.
	* mc/mcReserved.mod: Ditto.
	* mc/mcSearch.mod: Ditto.
	* mc/mcStack.mod: Ditto.
	* tools-src/buildpg: Ditto.
	* tools-src/calcpath: Ditto.
	* tools-src/checkmeta.py: Ditto.
	* tools-src/def2doc.py: Ditto.
	* tools-src/makeSystem: Ditto.
	* tools-src/tidydates.py: Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-09-02 13:29:25 +01:00
Andreas Schwab
4bf758b212 libsupc++: Fix handling of m68k extended real in <compare>
PR libstdc++/116513
	* libsupc++/compare (_S_fp_bits) [__fmt == _M68k_80bit]: Shift
	padding out of exponent word.
2024-09-02 11:50:27 +02:00
Alex Coplan
e4d3e7f9ad testsuite: Rename scanltranstree.exp -> scanltrans.exp
Since r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
added scan-ltrans-rtl* variants to scanltranstree.exp, it no longer
makes sense to have "tree" in the name.  This renames the file
accordingly and updates users.

libatomic/ChangeLog:

	* testsuite/lib/libatomic.exp: Load scanltrans.exp instead of
	scanltranstree.exp.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp: Load scanltrans.exp instead of
	scanltranstree.exp.

libitm/ChangeLog:

	* testsuite/lib/libitm.exp: Load scanltrans.exp instead of
	scanltranstree.exp.

libphobos/ChangeLog:

	* testsuite/lib/libphobos-dg.exp: Load scanltrans.exp instead of
	scanltranstree.exp.

libvtv/ChangeLog:

	* testsuite/lib/libvtv.exp: Load scanltrans.exp instead of
	scanltranstree.exp.

gcc/testsuite/ChangeLog:

	* gcc.dg-selftests/dg-final.exp: Load scanltrans.exp instead of
	scanltranstree.exp.
	* lib/gcc-dg.exp: Likewise.
	* lib/scanltranstree.exp: Rename to ...
	* lib/scanltrans.exp: ... this.
2024-09-02 10:07:09 +01:00
Richard Sandiford
2865719efb Rename gimple_asm_input_p to gimple_asm_basic_p
Following on from the earlier tree rename, this patch renames
gimple_asm_input_p to gimple_asm_basic_p, and similarly for
related names.

gcc/
	* doc/gimple.texi (gimple_asm_basic_p): Document.
	(gimple_asm_set_basic): Likewise.
	* gimple.h (GF_ASM_INPUT): Rename to...
	(GF_ASM_BASIC): ...this.
	(gimple_asm_set_input): Rename to...
	(gimple_asm_set_basic): ...this.
	(gimple_asm_input_p): Rename to...
	(gimple_asm_basic_p): ...this.
	* cfgexpand.cc (expand_asm_stmt): Update after above renaming.
	* gimple.cc (gimple_asm_clobbers_memory_p): Likewise.
	* gimplify.cc (gimplify_asm_expr): Likewise.
	* ipa-icf-gimple.cc (func_checker::compare_gimple_asm): Likewise.
	* tree-cfg.cc (stmt_can_terminate_bb_p): Likewise.
2024-09-02 09:56:56 +01:00
Richard Sandiford
a4b6c6ab0b Rename ASM_INPUT_P to ASM_BASIC_P
ASM_INPUT_P is so named because it causes the eventual rtl insn
pattern to be a top-level ASM_INPUT rather than an ASM_OPERANDS.
However, this name has caused confusion, partly due to earlier
documentation.  The name also sounds related to ASM_INPUTS but
is for a different piece of state.

This patch renames it to ASM_BASIC_P, with the inverse meaning
an extended asm.  ("Basic asm" is the term used in extend.texi.)
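
For readers unfamiliar with the distinction, here is a minimal sketch (not taken from the patch) of the two source-level forms. The function names are hypothetical; the asm syntax is the GNU C syntax described in extend.texi.

```c
#include <assert.h>

/* Basic asm: a bare template with no colon-separated operand lists.
   This is the form that ASM_BASIC_P describes.  */
void basic_asm_example (void)
{
  __asm__ ("");
}

/* Extended asm: the template is followed by output and input operands.
   The "0" constraint ties the input to the register of output 0, so the
   empty template simply passes x through to y.  */
int extended_asm_example (int x)
{
  int y;
  __asm__ ("" : "=r" (y) : "0" (x));
  return y;
}

/* Small self-check exercising both forms.  */
void asm_forms_selfcheck (void)
{
  basic_asm_example ();
  assert (extended_asm_example (42) == 42);
}
```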

gcc/
	* doc/generic.texi (ASM_BASIC_P): Document.
	* tree.h (ASM_INPUT_P): Rename to...
	(ASM_BASIC_P): ...this.
	(ASM_VOLATILE_P, ASM_INLINE_P): Reindent.
	* gimplify.cc (gimplify_asm_expr): Update after above renaming.
	* tree-core.h (tree_base): Likewise.

gcc/c/
	* c-typeck.cc (build_asm_expr): Rename ASM_INPUT_P to ASM_BASIC_P.

gcc/cp/
	* pt.cc (tsubst_stmt): Rename ASM_INPUT_P to ASM_BASIC_P.
	* parser.cc (cp_parser_asm_definition): Likewise.

gcc/d/
	* toir.cc (IRVisitor): Rename ASM_INPUT_P to ASM_BASIC_P.

gcc/jit/
	* jit-playback.cc (playback::block::add_extended_asm):  Rename
	ASM_INPUT_P to ASM_BASIC_P.

gcc/m2/
	* gm2-gcc/m2block.cc (flush_pending_note): Rename ASM_INPUT_P
	to ASM_BASIC_P.
	* gm2-gcc/m2statement.cc (m2statement_BuildAsm): Likewise.
2024-09-02 09:56:56 +01:00
Tobias Burnus
5cbfb3a799 lto/lto.cc: Fix build with not HAVE_WORKING_FORK
gcc/lto/ChangeLog:

	* lto.cc: Add missing HAVE_WORKING_FORK.
2024-09-02 10:29:36 +02:00
Tobias Burnus
6640a59fed lto-wrapper: Honor -save-temps for ltrans' makefile
gcc/ChangeLog:

	* lto-wrapper.cc (run_gcc): Honor -save-temps for
	makefile name.
2024-09-02 10:28:29 +02:00
Eric Botcazou
571d0450b2 ada: Diagnose too large size clause on floating-point type
The problem is that the size clause changes the floating-point format used
for the type, but it must not when this format is the widest format that is
supported in hardware on the target.  Instead a padding type must be built
and the associated warning given.

gcc/ada/

	* gcc-interface/decl.cc (gnat_to_gnu_entity): Cap the Esize of a
	floating-point type to the size of the widest format supported in
	hardware if it is explicitly defined.
2024-09-02 10:22:50 +02:00
Viljar Indus
1c9a6d8203 ada: Create usage entry for -gnatw_l
gcc/ada/

	* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Update
	documentation for the -gnatw_l switch.
	* usage.adb: Add -gnatw_l entry.
	* gnat_ugn.texi: Regenerate.
2024-09-02 10:22:50 +02:00
Ronan Desplanques
2df253f35e ada: Fix standard output stream for gnatcmd output
Before this patch, the gnat command sent to standard error pieces of
information that are a better match for standard output. This patch
makes this information go to standard output.

gcc/ada/

	* gnatcmd.adb (GNATCmd): Fix standard output stream.
2024-09-02 10:22:50 +02:00
Ronan Desplanques
91f0a3a5a5 ada: Fix minor issues in -gnaty0's documentation
Before this patch, the documentation of -gnaty0 used 0-based indexing
for column numbers while 1-based indexing is used everywhere else. This
patch makes this documentation use 1-based indexing, and also adds a
missing parenthesis.

gcc/ada/

	* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix
	minor issues.
	* gnat_ugn.texi: Regenerate.
2024-09-02 10:22:50 +02:00
Bob Duff
a004c28c50 ada: Documentation for generic type inference
...plus minor improvements to existing documentation.

gcc/ada/

	* doc/gnat_rm/gnat_language_extensions.rst: I assume "extended set
	of extensions" was a typo for "experimental set of extensions",
	because "extended extensions" is repetitive and redundant. "in
	addition" clarifies that the one subsumes the other. Add a
	reminder at the start of each subsection about what switch/pragma
	enables what extensions. Add new section about "Inference of
	Dependent Types in Generic Instantiations".
	* gnat_rm.texi: Regenerate.
2024-09-02 10:22:50 +02:00
Patrick Bernardi
34437eb472 ada: Small fixes for FreeBSD
Sizes of pthread data types now need to be defined for FreeBSD ports.
Traceback support for AArch64 FreeBSD is now defined.

gcc/ada/

	* s-oscons-tmplt.c: Define sizes of pthread data types on FreeBSD.
	* tracebak.c: Use GCC unwinder and adjust PC appropriately on
	aarch64-freebsd.
2024-09-02 10:22:50 +02:00
Marc Poulhiès
cb690aa1ce ada: Also reset scope for some nested declaration
When changing the scope for entities found in the entry body that is
mutated into a procedure, the compiler needs to look deeper than only
the top level entities as expansion may produce object declarations
whose scopes are also the entry. For example, the tree after expansion
may look like:

  procedure This_Is_An_Entry_Proc is
     ...
     O1 : Typ := do
       TMP1 : OTyp := ...;
       ...
     in TMP1;

O1's scope needs to be reset to This_Is_An_Entry_Proc, but so does
TMP1's scope.

This change also fixes a small oversight where
N_Implicit_Label_Declaration scope must be reset and its content
skipped.

gcc/ada/

	* exp_ch9.adb (Reset_Scopes_To): Adjust comment.
	(Reset_Scopes_To.Reset_Scope): Adjust the scope reset for object
	declaration. In particular, visit the children nodes if any. Also
	extend the handling of other declarations to
	N_Implicit_Label_Declaration.
2024-09-02 10:22:49 +02:00
Piotr Trojanek
905ab329cc ada: Cleanup expansion of object declarations
Replace repeated calls to Sloc with uses of local constant Loc.

Code cleanup; behavior is unaffected.

gcc/ada/

	* exp_ch3.adb (Expand_N_Object_Declaration): Replace calls to Sloc
	with uses of Loc; turn variable Prag into constant.
2024-09-02 10:22:49 +02:00
Piotr Trojanek
78acc6d85f ada: Remove repeated guards in validity checks
Routine Insert_Valid_Check only applies checks when Expr_Known_Valid
query returns False; there is no need to call this query before
inserting checks.

Code cleanup; behavior is unaffected.

gcc/ada/

	* exp_imgv.adb (Expand_User_Defined_Enumeration_Image)
	(Expand_Image_Attribute): Remove redundant guards.
2024-09-02 10:22:49 +02:00
Jakub Jelinek
25d51fb7d0 ranger: Fix up range computation for CLZ [PR116486]
The initial CLZ gimple-range-op.cc implementation handled just the
case where second argument to .CLZ is equal to prec, but in
r15-1014 I've added also handling of the -1 case.  As the following
testcase shows, incorrectly though for the case where the first argument
has [0,0] range.  If the second argument is prec, then the result should
be [prec,prec] and that was handled correctly, but when the second argument
is -1, the result should be [-1,-1] but instead it was incorrectly computed
as [prec-1,prec-1] (when second argument is prec, mini is 0 and maxi is
prec, while when second argument is -1, mini is -1 and maxi is prec-1).

Fixed thusly (the actual handling is then similar to the CTZ [0,0] case).
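
The intended behaviour can be modelled with a simplified sketch. This is an illustration only, not the code in gimple-range-op.cc: it assumes a 32-bit unsigned operand, and clz_or, clz_range and struct int_range are hypothetical names. The second argument of .CLZ is modelled as value_at_zero.

```c
#include <assert.h>
#include <stdint.h>

#define PREC 32  /* precision of the operand type in this sketch */

struct int_range { int lo, hi; };

/* Count leading zeros of a PREC-bit value; returns value_at_zero for
   x == 0, mirroring the second argument of .CLZ.  */
static int clz_or (uint32_t x, int value_at_zero)
{
  if (x == 0)
    return value_at_zero;
  int n = 0;
  while (!(x & (UINT32_C (1) << (PREC - 1))))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Result range of clz_or over every x in [lo, hi].  */
struct int_range clz_range (uint32_t lo, uint32_t hi, int value_at_zero)
{
  assert (lo <= hi);
  struct int_range r;
  if (hi == 0)
    {
      /* Operand is exactly [0,0]: the result is the value at zero,
         i.e. [prec,prec] or [-1,-1], never [prec-1,prec-1].  */
      r.lo = r.hi = value_at_zero;
      return r;
    }
  /* For nonzero operands clz decreases as x grows.  */
  r.lo = clz_or (hi, 0);
  r.hi = clz_or (lo ? lo : 1, 0);
  if (lo == 0)
    {
      /* Zero is also possible, so widen to include its result.  */
      if (value_at_zero < r.lo)
        r.lo = value_at_zero;
      if (value_at_zero > r.hi)
        r.hi = value_at_zero;
    }
  return r;
}
```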

2024-09-02  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/116486
	* gimple-range-op.cc (cfn_clz::fold_range): If lh is [0,0]
	and mini is -1, return [-1,-1] range rather than [prec-1,prec-1].

	* gcc.dg/bitint-109.c: New test.
2024-09-02 09:44:09 +02:00
Richard Biener
9aaedfc414 load and store-lanes with SLP
The following is a prototype for how to represent load/store-lanes
within SLP.  I've for now settled with having a single load node
with multiple permute nodes acting as selection, one for each loaded lane
and a single store node fed from all stored lanes.  For

  for (int i = 0; i < 1024; ++i)
    {
      a[2*i] = b[2*i] + 7;
      a[2*i+1] = b[2*i+1] * 3;
    }

you have the following SLP graph where I explain how things are set
up and code-generated:

t.c:23:21: note:   SLP graph after lowering permutations:
t.c:23:21: note:   node 0x50dc8b0 (max_nunits=1, refcnt=1) vector(4) int
t.c:23:21: note:   op template: *_6 = _7;
t.c:23:21: note:        stmt 0 *_6 = _7;
t.c:23:21: note:        stmt 1 *_12 = _13;
t.c:23:21: note:        children 0x50dc488 0x50dc6e8

This is the store node, it's marked with ldst_lanes = true during
SLP discovery.  This node code-generates

  vect_array.65[0] = vect__7.61_29;
  vect_array.65[1] = vect__13.62_28;
  MEM <int[8]> [(int *)vectp_a.63_27] = .STORE_LANES (vect_array.65);

...
t.c:23:21: note:   node 0x50dc520 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note:   op: VEC_PERM_EXPR
t.c:23:21: note:        stmt 0 _5 = *_4;
t.c:23:21: note:        lane permutation { 0[0] }
t.c:23:21: note:        children 0x50dc948
t.c:23:21: note:   node 0x50dc780 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note:   op: VEC_PERM_EXPR
t.c:23:21: note:        stmt 0 _11 = *_10;
t.c:23:21: note:        lane permutation { 0[1] }
t.c:23:21: note:        children 0x50dc948

These are the selection nodes, marked with ldst_lanes = true.
They code generate nothing.

t.c:23:21: note:   node 0x50dc948 (max_nunits=4, refcnt=3) vector(4) int
t.c:23:21: note:   op template: _5 = *_4;
t.c:23:21: note:        stmt 0 _5 = *_4;
t.c:23:21: note:        stmt 1 _11 = *_10;
t.c:23:21: note:        load permutation { 0 1 }

This is the load node, marked with ldst_lanes = true (the load
permutation is only accurate when taking into account the lane permute
in the selection nodes).  It code generates

  vect_array.58 = .LOAD_LANES (MEM <int[8]> [(int *)vectp_b.56_33]);
  vect__5.59_31 = vect_array.58[0];
  vect__5.60_30 = vect_array.58[1];

This scheme allows leaving code generation in vectorizable_load/store
mostly as-is.
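
To make the data movement concrete, here is a scalar emulation of the example loop (an illustration only, not vectorizer output). It assumes VF == 4 and two lanes; load_lanes and store_lanes are hypothetical helpers standing in for .LOAD_LANES and .STORE_LANES, which de-interleave and re-interleave memory respectively.

```c
#include <assert.h>

#define VF 4     /* vector elements per lane in this sketch */
#define LANES 2  /* interleaved lanes, as in the a[2*i] / a[2*i+1] loop */

/* Stand-in for .LOAD_LANES: split interleaved memory into one
   "vector" per lane.  */
void load_lanes (const int *mem, int out[LANES][VF])
{
  for (int v = 0; v < VF; v++)
    for (int l = 0; l < LANES; l++)
      out[l][v] = mem[v * LANES + l];
}

/* Stand-in for .STORE_LANES: interleave the per-lane vectors back.  */
void store_lanes (int *mem, int in[LANES][VF])
{
  for (int v = 0; v < VF; v++)
    for (int l = 0; l < LANES; l++)
      mem[v * LANES + l] = in[l][v];
}

/* Same result as the scalar loop over n iterations; n must be a
   multiple of VF.  */
void lanes_kernel (int *a, const int *b, int n)
{
  assert (n % VF == 0);
  for (int i = 0; i < n; i += VF)
    {
      int vect_array[LANES][VF];
      load_lanes (b + LANES * i, vect_array);
      for (int v = 0; v < VF; v++)
        {
          vect_array[0][v] += 7;  /* lane 0: b[2*i] + 7 */
          vect_array[1][v] *= 3;  /* lane 1: b[2*i+1] * 3 */
        }
      store_lanes (a + LANES * i, vect_array);
    }
}
```

Each lane is processed as a whole vector between the two lane operations, which is why the selection nodes in the SLP graph have nothing left to code-generate.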

While this should support both load-lanes and (masked) store-lanes
the decision to do either is done during SLP discovery time and
cannot be reversed without altering the SLP tree - as-is the SLP
tree is not usable for non-store-lanes on the store side, the
load side is OK representation-wise but will very likely fail
permute handling as the lowering to deal with the two input vector
restriction isn't done - but of course since the permute node is
marked as to be ignored that doesn't work out.  So I've put
restrictions in place that fail vectorization if a load/store-lane
SLP tree is later classified differently by get_load_store_type.

I'll note that for example gcc.target/aarch64/sve/mask_struct_store_3.c
will not get SLP store-lanes used because the full store SLPs just
fine though we then fail to handle the "splat" load-permutation

t2.c:5:21: note:   node 0x4db2630 (max_nunits=4, refcnt=2) vector([4,4]) int
t2.c:5:21: note:   op template: _6 = *_5;
t2.c:5:21: note:        stmt 0 _6 = *_5;
t2.c:5:21: note:        stmt 1 _6 = *_5;
t2.c:5:21: note:        stmt 2 _6 = *_5;
t2.c:5:21: note:        stmt 3 _6 = *_5;
t2.c:5:21: note:        load permutation { 0 0 0 0 }

the load permute lowering code currently doesn't consider it worth
lowering single loads from a group (or in this case not grouped loads).
The expectation is the target can handle this by two interleaves with
itself.

So what we see here is that while the explicit SLP representation is
helpful in some cases, in cases like this it would require changing
it when we make decisions how to vectorize.  My idea is that this
all will change a lot when we re-do SLP discovery (for loops) and
when we get rid of non-SLP as I think vectorizable_* should be
allowed to alter the SLP graph during analysis.

The patch also removes the code cancelling SLP if we can use
load/store-lanes from the main loop vector analysis code and
re-implements it as re-discovering the SLP instance with
forced single-lane splits so SLP load/store-lanes scheme can be
used.

This is now done after SLP discovery and SLP pattern recog are
complete to not disturb the latter but per SLP instance instead
of being a global decision on the whole loop.

This is a behavioral change that for example shows in
gcc.dg/vect/slp-perm-6.c on ARM where we formerly used SLP permutes
but now a mix of SLP without permutes and load/store lanes.  The
previous flaky heuristic is now flaky in a different way.

Testing on RISC-V and aarch64 reveals several testcases that require
adjustment as to now expect SLP even when load/store lanes are being
used.  If in doubt I've adjusted them to the final expectation which
will lead to one or two new FAILs where we still do the SLP cancelling.
I have a followup that implements that while remaining in SLP that's
in final testing.

Note that gcc.dg/vect/slp-42.c and gcc.dg/vect/pr68445.c will FAIL
on aarch64 with SVE because for some odd reason vect_stridedN
is true for any N for check_effective_target_vect_fully_masked
targets but SVE cannot do ld8 while risc-v can.

I have not bothered to adjust target tests that now fail assembly-scan.

	* tree-vectorizer.h (_slp_tree::ldst_lanes): New flag to mark
	load, store and permute nodes.
	* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize ldst_lanes.
	(vect_build_slp_instance): For stores iff the target prefers
	store-lanes discover single-lane sub-groups, do not perform
	interleaving lowering but mark the node with ldst_lanes.
	Also allow i == 0 - fatal failure - for splitting up a store group
	when we're not doing single-lane discovery already.
	(vect_lower_load_permutations): When the target supports
	load lanes and the loads all fit the pattern split out
	a single level of permutes only and mark the load and
	permute nodes with ldst_lanes.
	(vectorizable_slp_permutation_1): Handle the load-lane permute
	forwarding of vector defs.
	(vect_analyze_slp): After SLP pattern recog is finished see if
	there are any SLP instances that would benefit from using
	load/store-lanes and re-discover those with forced single lanes.
	* tree-vect-stmts.cc (get_group_load_store_type): Support
	load/store-lanes for SLP.
	(vectorizable_store): Support SLP code generation for store-lanes.
	(vectorizable_load): Support SLP code generation for load-lanes.
	* tree-vect-loop.cc (vect_analyze_loop_2): Do not cancel SLP
	when store-lanes can be used.

	* gcc.dg/vect/slp-55.c: New testcase.
	* gcc.dg/vect/slp-56.c: Likewise.
	* gcc.dg/vect/slp-11c.c: Adjust.
	* gcc.dg/vect/slp-53.c: Likewise.
	* gcc.dg/vect/slp-cond-1.c: Likewise.
	* gcc.dg/vect/vect-complex-5.c: Likewise.
	* gcc.dg/vect/slp-1.c: Likewise.
	* gcc.dg/vect/slp-54.c: Remove riscv XFAIL.
	* gcc.dg/vect/slp-perm-5.c: Adjust.
	* gcc.dg/vect/slp-perm-7.c: Likewise.
	* gcc.dg/vect/slp-perm-8.c: Likewise.
	* gcc.dg/vect/slp-multitypes-11.c: Likewise.
	* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
	* gcc.dg/vect/slp-perm-9.c: Remove expected SLP fail due to
	three-vector permute.
	* gcc.dg/vect/slp-perm-6.c: Remove XFAIL.
	* gcc.dg/vect/slp-perm-1.c: Adjust.
	* gcc.dg/vect/slp-perm-2.c: Likewise.
	* gcc.dg/vect/slp-perm-3.c: Likewise.
	* gcc.dg/vect/slp-perm-4.c: Likewise.
	* gcc.dg/vect/pr68445.c: Likewise.
	* gcc.dg/vect/slp-11b.c: Likewise.
	* gcc.dg/vect/slp-2.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-33.c: Likewise.
	* gcc.dg/vect/slp-42.c: Likewise.
	* gcc.dg/vect/slp-46.c: Likewise.
	* gcc.dg/vect/slp-perm-10.c: Likewise.
2024-09-02 08:50:32 +02:00
Richard Biener
464067a242 lower SLP load permutation to interleaving
The following emulates classical interleaving for SLP load permutes
that we are unlikely to handle natively.  This is to handle cases
where interleaving (or load/store-lanes) is the optimal choice for
vectorizing even when we are doing that within SLP.  An example
would be

void foo (int * __restrict a, int * b)
{
  for (int i = 0; i < 16; ++i)
    {
      a[4*i + 0] = b[4*i + 0] * 3;
      a[4*i + 1] = b[4*i + 1] + 3;
      a[4*i + 2] = (b[4*i + 2] * 3 + 3);
      a[4*i + 3] = b[4*i + 3] * 3;
    }
}

where currently the SLP store is merging four single-lane SLP
sub-graphs but none of the loads in it can be code-generated
with V4SImode vectors and a VF of four as the permutes would need
three vectors.

The patch introduces a lowering phase after SLP discovery but
before SLP pattern recognition or permute optimization that
analyzes all loads from the same dataref group and creates an
interleaving scheme starting from an unpermuted load.

What can be handled is power-of-two group size and a group size of
three.  The possibility for doing the interleaving with a load-lanes
like instruction is done as followup.

For a group-size of three this is done by using
the non-interleaving fallback code which then creates at VF == 4 from
{ { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
to produce { c0, c1, c2, c3 }.  This turns out to be more effective
than the scheme implemented for non-SLP for SSE and only slightly
worse for AVX512 and somewhat worse for AVX2.  It seems to me that
this would extend to other non-power-of-two group-sizes though (but
the patch does not).  Optimal schemes are likely difficult to lay out
in VF agnostic form.
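
A scalar sketch of the group-size-three case at VF == 4 follows. It is an illustration only: the permute indices are one way to obtain the intermediate vectors named above and are not taken from the patch, and perm2 and extract_c are hypothetical names. Every step is a permute of at most two input vectors.

```c
#include <assert.h>

#define VF 4

/* Two-input permute: selector values 0..VF-1 pick from x,
   VF..2*VF-1 pick from y.  OUT must not alias X or Y.  */
void perm2 (const int *x, const int *y, const int *sel, int *out)
{
  for (int i = 0; i < VF; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 2 * VF);
      out[i] = sel[i] < VF ? x[sel[i]] : y[sel[i] - VF];
    }
}

/* Extract { c0, c1, c2, c3 } from memory laid out as
   a0 b0 c0 a1 | b1 c1 a2 b2 | c2 a3 b3 c3.  */
void extract_c (const int *mem, int *c)
{
  const int *v0 = mem, *v1 = mem + VF, *v2 = mem + 2 * VF;
  static const int sel_lo[VF] = { 2, 2, 5, 5 };    /* c0 c0 c1 c1 */
  static const int sel_hi[VF] = { 0, 0, 3, 3 };    /* c2 c2 c3 c3 */
  static const int sel_even[VF] = { 0, 2, 4, 6 };  /* even extract */
  int lo[VF], hi[VF];

  perm2 (v0, v1, sel_lo, lo);
  perm2 (v2, v2, sel_hi, hi);
  perm2 (lo, hi, sel_even, c);
}
```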

I'll note that while the lowering assumes even/odd extract is
generally available for all vector element sizes (which is probably
a good assumption), it doesn't in any way constrain the other
permutes it generates based on target availability.  Again difficult
to do in a VF agnostic way (but at least currently the vector type
is fixed).

I'll also note that the SLP store side merges lanes in a way
producing three-vector permutes for store group-size of three, so
the testcase uses a store group-size of four.

The patch has a fallback for when there are multi-lane groups
and the resulting permutes do not fit interleaving.  Code
generation is not optimal when this triggers and might be
worse than doing single-lane group interleaving.

The patch handles gaps by representing them with NULL
entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node.
The SLP discovery changes could be elided if we manually build the
load node instead.

SLP load nodes covering enough lanes to not need intermediate
permutes are retained as having a load-permutation and do not
use the single SLP load node for each dataref group.  That's
something we might want to change, making load-permutation
something purely local to SLP discovery (but then SLP discovery
could do part of the lowering).

The patch misses CSEing intermediate generated permutes and
registering them with the bst_map which is possibly required
for SLP pattern detection in some cases - this re-spin of the
patch moves the lowering after SLP pattern detection.

	* tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt.
	(vect_build_slp_tree_2): Likewise.  Release load permutation
	when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's
	no actual permutation in that case.
	(vllp_cmp): New function.
	(vect_lower_load_permutations): Likewise.
	(vect_analyze_slp): Call it.

	* gcc.dg/vect/slp-11a.c: Expect SLP.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-51.c: New testcase.
	* gcc.dg/vect/slp-52.c: New testcase.
2024-09-02 08:49:20 +02:00
Xianmiao Qu
eca320bfe3 [PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.
Currently, in RV32, even with the D extension enabled, the cost of DFmode
register moves is still set to 'COSTS_N_INSNS (2)'. This results in the
'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG
register moves, leading to the generation of many redundant instructions.

As an example, consider the following test case:
  double foo (int t, double a, double b)
  {
    if (t > 0)
      return a;
    else
      return b;
  }

When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated:
          .cfi_startproc
          addi    sp,sp,-32
          .cfi_def_cfa_offset 32
          fsd     fa0,8(sp)
          fsd     fa1,16(sp)
          lw      a4,8(sp)
          lw      a5,12(sp)
          lw      a2,16(sp)
          lw      a3,20(sp)
          bgt     a0,zero,.L1
          mv      a4,a2
          mv      a5,a3
  .L1:
          sw      a4,24(sp)
          sw      a5,28(sp)
          fld     fa0,24(sp)
          addi    sp,sp,32
          .cfi_def_cfa_offset 0
          jr      ra
          .cfi_endproc

After adjusting the DFmode register move's cost to 'COSTS_N_INSNS (1)', the
generated code is as follows, with a significant reduction in the number
of instructions.
          .cfi_startproc
          ble     a0,zero,.L5
          ret
  .L5:
          fmv.d   fa0,fa1
          ret
          .cfi_endproc

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the
	DFmode register move for RV32.

gcc/testsuite/
	* gcc.target/riscv/rv32-movdf-cost.c: New test.
2024-09-01 22:28:38 -06:00
Jeff Law
0562976d62 [committed][PR rtl-optimization/116544] Fix test for promoted subregs
This is a small bug in the ext-dce code's handling of promoted subregs.

Essentially when we see a promoted subreg we need to make additional bit groups
live as various parts of the RTL path know that an extension of a suitably
promoted subreg can be trivially eliminated.

When I added support for dealing with this quirk I failed to account for the
larger modes properly and it ignored the case when the size of the inner object
was > 32 bits.  Oops.

This does _not_ fix the outstanding x86 issue.  That's caused by something
completely different and more concerning ;(

Bootstrapped and regression tested on x86.  Obviously fixes the testcase on
riscv as well.

Pushing to the trunk.

	PR rtl-optimization/116544
gcc/
	* ext-dce.cc (ext_dce_process_uses): Fix thinko in promoted subreg
	handling.

gcc/testsuite/
	* gcc.dg/torture/pr116544.c: New test.
2024-09-01 22:18:29 -06:00
Levy Hsu
f77435aa39 i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2
gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode
	for int mask cmp.
	* config/i386/sse.md (vec_cmp<mode><avx512fmaskmodelower>): New
	vec_cmp expand for VBF modes.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: New test.
	* gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Ditto.
2024-09-02 10:25:35 +08:00
Levy Hsu
e19f65b0be i386: Support vectorized BF16 sqrt with AVX10.2 instruction
gcc/ChangeLog:

	* config/i386/sse.md: Expand VF2H to VF2HB with VBF modes.
2024-09-02 10:24:48 +08:00
Levy Hsu
29ef601973 i386: Support vectorized BF16 smaxmin with AVX10.2 instructions
gcc/ChangeLog:

	* config/i386/sse.md
	(<code><mode>3): New define expand pattern for BF smaxmin.

gcc/testsuite/ChangeLog:
	* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test.
	* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test.
2024-09-02 10:24:47 +08:00
Levy Hsu
6d294fb8ac i386: Support vectorized BF16 FMA with AVX10.2 instructions
gcc/ChangeLog:

	* config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test.
	* gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test.
2024-09-02 10:24:46 +08:00
Levy Hsu
f82fa0da4d i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions
AVX10.2 introduces several non-exception instructions for BF16 vector.
Enable vectorized BF add/sub/mul/div operation by supporting standard
optab for them.

gcc/ChangeLog:

	* config/i386/sse.md (div<mode>3): New expander for BFmode div.
	(VF_BHSD): New mode iterator with vector BFmodes.
	(<insn><mode>3<mask_name><round_name>): Change mode to VF_BHSD.
	(mul<mode>3<mask_name><round_name>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: New test.
	* gcc.target/i386/avx10_2-bf-vector-operations-1.c: Ditto.
2024-09-02 10:24:45 +08:00
Hu, Lin1
3b1decef83 i386: Optimize generate insn for AVX10.2 compare
gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_fp_compare): Add UNSPEC to
	support the optimization.
	* config/i386/i386.cc (ix86_fp_compare_code_to_integer): Add NE/EQ.
	* config/i386/i386.md (*cmpx<unord><MODEF:mode>): New define_insn.
	(*cmpx<unord>hf): Ditto.
	* config/i386/predicates.md (ix86_trivial_fp_comparison_operator):
	Add ne/eq.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-compare-1b.c: New test.
2024-09-02 10:24:36 +08:00
Hu, Lin1
86f5031c80 i386: Optimize ordered and nonequal
Currently, when we input !__builtin_isunordered (a, b) && (a != b), gcc
will emit
  ucomiss %xmm1, %xmm0
  movl $1, %ecx
  setp %dl
  setnp %al
  cmovne %ecx, %edx
  andl %edx, %eax
  movzbl %al, %eax

In fact,
  xorl %eax, %eax
  ucomiss %xmm1, %xmm0
  setne %al
is better.
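
A minimal sketch of why the rewrite is valid (the function names are
hypothetical and not part of the patch): "ordered and not equal" is the
negation of "unordered or equal", so both forms agree for every pair of
inputs, including NaNs.

```c
#include <stdbool.h>

/* The source-level form quoted above.  */
bool
ordered_and_nonequal (float a, float b)
{
  return !__builtin_isunordered (a, b) && (a != b);
}

/* The form the match.pd rule rewrites it to.  */
bool
not_unordered_or_equal (float a, float b)
{
  return !(__builtin_isunordered (a, b) || a == b);
}
```

Both return false when either operand is a NaN and otherwise return
a != b.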

gcc/ChangeLog:

	* match.pd: Optimize (and ordered non-equal) to
	(not (or unordered  equal))

gcc/testsuite/ChangeLog:

	* gcc.target/i386/optimize_one.c: New test.
2024-09-02 10:24:31 +08:00
Haochen Jiang
b1f9fbb6da i386: Auto vectorize sdot_prod, usdot_prod, udot_prod with AVX10.2 instructions
gcc/ChangeLog:

	* config/i386/sse.md (VI1_AVX512VNNIBW): New.
	(VI2_AVX10_2): Ditto.
	(sdot_prod<mode>): Add AVX10.2
	to auto vectorize and combine 512 bit part.
	(udot_prod<mode>): Ditto.
	(sdot_prodv64qi): Removed.
	(udot_prodv64qi): Ditto.
	(usdot_prod<mode>): Add AVX10.2 to auto vectorize.
	(udot_prod<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vnniint16-auto-vectorize-2.c: Only define
	TEST when not defined.
	* gcc.target/i386/vnniint8-auto-vectorize-2.c: Ditto.
	* gcc.target/i386/vnniint16-auto-vectorize-3.c: New test.
	* gcc.target/i386/vnniint16-auto-vectorize-4.c: Ditto.
	* gcc.target/i386/vnniint8-auto-vectorize-3.c: Ditto.
	* gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.
2024-09-02 10:24:29 +08:00
Pan Li
5239902210 RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 3
This patch adds test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 3, i.e.:

Form 3:
  #define DEF_SAT_U_TRUC_FMT_3(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
  {                                        \
    WT max = (WT)(NT)-1;                   \
    return x <= max ? (NT)x : (NT) max;    \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_3 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint64_t)

The following test passes with this patch:
* The rv64gcv regression test.
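
For reference, a sketch of what the three instantiations above compute
when compiled as ordinary C (only <stdint.h> is added here; the macro
body is the one quoted above):

```c
#include <stdint.h>

#define DEF_SAT_U_TRUC_FMT_3(NT, WT)     \
NT __attribute__((noinline))             \
sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
{                                        \
  WT max = (WT)(NT)-1;                   \
  return x <= max ? (NT)x : (NT) max;    \
}

/* QUAD: the narrow type is a quarter of the wide type's width.  */
DEF_SAT_U_TRUC_FMT_3 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint32_t)

/* OCT: the narrow type is an eighth of the wide type's width.  */
DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint64_t)
```

Values that fit in the narrow type are returned unchanged; anything
larger saturates to the narrow type's maximum, e.g.
sat_u_truc_uint32_t_to_uint8_t_fmt_3 (300) yields 255.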

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_u_trunc-16.c: New test.
	* gcc.target/riscv/sat_u_trunc-17.c: New test.
	* gcc.target/riscv/sat_u_trunc-18.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-16.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-17.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-18.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02 09:26:42 +08:00
Pan Li
ea81e21d53 RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
This patch adds test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 2, i.e.:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {                                        \
    WT max = (WT)(NT)-1;                   \
    return x > max ? (NT) max : (NT)x;     \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)

The following test passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_u_trunc-10.c: New test.
	* gcc.target/riscv/sat_u_trunc-11.c: New test.
	* gcc.target/riscv/sat_u_trunc-12.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-10.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-11.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-12.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02 09:26:37 +08:00
Pan Li
56ed1dfa79 RISC-V: Add testcases for form 4 of unsigned vector .SAT_ADD IMM
This patch adds test cases for the unsigned vector .SAT_ADD
when one of the operands is an immediate (IMM).

Form 4:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_4(T, IMM)                               \
  T __attribute__((noinline))                                               \
  vec_sat_u_add_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)      \
  {                                                                         \
    unsigned i;                                                             \
    T ret;                                                                  \
    for (i = 0; i < limit; i++)                                             \
      {                                                                     \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) == 0 ? ret : -1; \
      }                                                                     \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_4(uint64_t, 123)

The following test passes with this patch:
* The full rv64gcv regression test.
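
A stand-alone sketch of the form 4 semantics follows.  One deliberate
difference from the macro quoted above, made only for this sketch: the
return type is void, since the function returns nothing.

```c
#include <stdint.h>

/* Sketch only: same body as form 4 above, but declared void.  */
#define DEF_VEC_SAT_U_ADD_IMM_FMT_4(T, IMM)                               \
void __attribute__((noinline))                                            \
vec_sat_u_add_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)      \
{                                                                         \
  unsigned i;                                                             \
  T ret;                                                                  \
  for (i = 0; i < limit; i++)                                             \
    {                                                                     \
      /* No overflow: keep the sum.  Overflow: saturate to all-ones.  */  \
      out[i] = __builtin_add_overflow (in[i], IMM, &ret) == 0 ? ret : -1; \
    }                                                                     \
}

DEF_VEC_SAT_U_ADD_IMM_FMT_4 (uint64_t, 123)
```

With IMM = 123, an input of UINT64_MAX - 123 still yields the exact sum
UINT64_MAX, while UINT64_MAX - 122 overflows and saturates to
UINT64_MAX.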

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-13.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-14.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-15.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-16.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-13.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-14.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-15.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02 09:25:45 +08:00
Pan Li
72f3e9021e RISC-V: Add testcases for form 3 of unsigned vector .SAT_ADD IMM
This patch adds test cases for the unsigned vector .SAT_ADD
when one of the operands is an immediate (IMM).

Form 3:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    T ret;                                                             \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      }                                                                \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 123)

The following test passes with this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-9.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02 09:25:39 +08:00
Pan Li
e96d4bf6a6 RISC-V: Refactor gen zero_extend rtx for SAT_* when expand SImode in RV64
Previously, we had special handling for both .SAT_ADD and
.SAT_SUB for unsigned int.  Both need to take care of SImode
in RV64 by zero-extending.  Thus merge these two helper functions
into one to avoid code duplication.

The following test suite passes with this patch:
* The full rv64gcv regression test.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Merge
	the zero_extend handling from func riscv_gen_unsigned_xmode_reg.
	(riscv_gen_unsigned_xmode_reg): Remove.
	(riscv_expand_ussub): Leverage riscv_gen_zero_extend_rtx
	instead of riscv_gen_unsigned_xmode_reg.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_u_sub-11.c: Adjust asm check.
	* gcc.target/riscv/sat_u_sub-15.c: Ditto.
	* gcc.target/riscv/sat_u_sub-19.c: Ditto.
	* gcc.target/riscv/sat_u_sub-23.c: Ditto.
	* gcc.target/riscv/sat_u_sub-27.c: Ditto.
	* gcc.target/riscv/sat_u_sub-3.c: Ditto.
	* gcc.target/riscv/sat_u_sub-31.c: Ditto.
	* gcc.target/riscv/sat_u_sub-35.c: Ditto.
	* gcc.target/riscv/sat_u_sub-39.c: Ditto.
	* gcc.target/riscv/sat_u_sub-43.c: Ditto.
	* gcc.target/riscv/sat_u_sub-47.c: Ditto.
	* gcc.target/riscv/sat_u_sub-7.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-11.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-11_1.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-11_2.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-15.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-15_1.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-15_2.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-3.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-3_1.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-3_2.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-7.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-7_1.c: Ditto.
	* gcc.target/riscv/sat_u_sub_imm-7_2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02 09:24:44 +08:00
GCC Administrator
880834d3e7 Daily bump. 2024-09-02 00:16:51 +00:00
Andrew Pinski
592a335de5 slsr: Use simple_dce_from_worklist in SLSR [PR116554]
While working on a phiopt patch, it was noticed that
SLSR would leave around some unused ssa names. Let's
add simple_dce_from_worklist usage to SLSR to remove
the dead statements. This should give a small improvement
for passes afterwards.
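
The general idea of a worklist-driven DCE can be sketched on a toy IR.
This is not GCC's simple_dce_from_worklist; struct toy_stmt and
toy_dce_from_worklist are hypothetical names used only to show the
technique: remove a statement whose result has no uses, then revisit
the definitions of its operands, which may have just become dead.

```c
#include <stdbool.h>

/* Toy statement: defines one value from up to two operands.  OP1 and
   OP2 are indices of the defining statements, or -1 if absent.  */
struct toy_stmt
{
  int op1, op2;
  int uses;   /* Number of remaining uses of the defined value.  */
  bool live;  /* False once the statement has been removed.  */
};

/* WORKLIST holds N candidate statement indices.  It must have room
   for N plus the number of statements, as each statement is pushed at
   most once here.  Returns the number of statements removed.  */
int
toy_dce_from_worklist (struct toy_stmt *stmts, int *worklist, int n)
{
  int removed = 0;
  while (n > 0)
    {
      int i = worklist[--n];
      if (!stmts[i].live || stmts[i].uses != 0)
        continue;

      /* The result is unused: remove the statement.  */
      stmts[i].live = false;
      removed++;

      /* Its operands lose a use and may now be dead as well.  */
      int ops[2] = { stmts[i].op1, stmts[i].op2 };
      for (int k = 0; k < 2; k++)
        if (ops[k] >= 0 && --stmts[ops[k]].uses == 0)
          worklist[n++] = ops[k];
    }
  return removed;
}
```

Seeding the worklist only with the operands of replaced statements
keeps the cleanup proportional to what the pass actually changed.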

Bootstrapped and tested on x86_64.

gcc/ChangeLog:

	PR tree-optimization/116554
	* gimple-ssa-strength-reduction.cc: Include tree-ssa-dce.h.
	(replace_mult_candidate): Add sdce_worklist argument, mark
	the rhs1/rhs2 for maybe dceing.
	(replace_unconditional_candidate): Add sdce_worklist argument,
	Update call to replace_mult_candidate.
	(replace_conditional_candidate): Add sdce_worklist argument,
	update call to replace_mult_candidate.
	(replace_uncond_cands_and_profitable_phis): Add sdce_worklist argument,
	update call to replace_conditional_candidate,
	replace_unconditional_candidate, and replace_uncond_cands_and_profitable_phis.
	(replace_one_candidate): Add sdce_worklist argument, mark
	the orig_rhs1/orig_rhs2 for maybe dceing.
	(replace_profitable_candidates): Add sdce_worklist argument,
	update call to replace_one_candidate and replace_profitable_candidates.
	(analyze_candidates_and_replace): Call simple_dce_from_worklist and
	update calls to replace_profitable_candidates, and
	replace_uncond_cands_and_profitable_phis.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-31 23:42:59 -07:00
Hans-Peter Nilsson
f22788c7c0 testsuite: Prune compilation messages for modules tests
All testsuite compiler-calls pass default_target_compile in the
dejagnu installation (typically /usr/share/dejagnu/target.exp) which
also calls the dejagnu-installed prune_warnings.

Normally, tests using the dg framework (most or all tests these days)
compile and link by calling various wrappers that end up calling
dg-test in the dejagnu installation, typically installed as
/usr/share/dejagnu/dg.exp.  That, besides the compiler call, also
calls ${tool}-dg-prune (g++-dg-prune) on the messages, which in turn
ends up calling prune_gcc_output in gcc/testsuite/lib/prune.exp.  That
gcc-specific "pruning" function handles more cases than the dejagnu
prune_warnings, and also has updated patterns.

But, module_do_it in modules.exp calls the lower-level
${tool}_target_compile "directly", i.e. g++_target_compile defined in
gcc/testsuite/lib/g++.exp.  That does not call ${tool}-dg-prune,
meaning those test-cases miss the gcc-specific pruning.

Noticed while testing a dejagnu update that handled the minuscule "in"
in the warning (line-breaks added below besides the original one after
"(void*)':")

"/path/to/cris-elf/bin/ld:
/gccobj/cris-elf/./libstdc++-v3/src/.libs/libstdc++.a(random.o): in
function `std::(anonymous namespace)::__libc_getentropy(void*)':
/gccsrc/libstdc++-v3/src/c++11/random.cc:183: warning: _getentropy is
not implemented and will always fail"

The line saying "in function" rather than "In function" (from the
binutils linker since 2018) is pruned by prune_gcc_output. The
prune_warnings in dejagnu-1.6.3 and earlier handles the second line
separately.  It's an unfortunate wart that neither consumes the
delimiting line-break, leaving it to the callers to prune residual empty
lines.  See prune_warnings in dejagnu (default_target_compile and
dg-test) for those other line-break fixups, as alluded to in the comment.

	* g++.dg/modules/modules.exp (module_do_it): Prune compilation
	messages.
2024-09-01 02:27:56 +02:00
GCC Administrator
49fd9b33bd Daily bump. 2024-09-01 00:25:25 +00:00
Roger Sayle
bac00c3422 i386: Support read-modify-write memory operands in STV.
This patch enables STV when the first operand of a TImode binary
logic operation (AND, IOR or XOR) is a memory operand, which is commonly
the case with read-modify-write instructions.

A different motivating example from the one given previously is:

__int128 m, p, q;
void foo() {
    m ^= (p & q);
}

Currently with -O2 -mavx the RMW instructions are rejected by STV,
resulting in scalar code:

foo:	movq    p(%rip), %rax
        movq    p+8(%rip), %rdx
        andq    q(%rip), %rax
        andq    q+8(%rip), %rdx
        xorq    %rax, m(%rip)
        xorq    %rdx, m+8(%rip)
        ret

With this patch they become scalar-to-vector candidates:

foo:	vmovdqa p(%rip), %xmm0
        vpand   q(%rip), %xmm0, %xmm0
        vpxor   m(%rip), %xmm0, %xmm0
        vmovdqa %xmm0, m(%rip)
        ret
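
To make the two code sequences easier to compare, here is a sketch (not
from the patch; the names u128, rmw_xor_and and rmw_xor_and_halves are
hypothetical) of the same operation written once on a 128-bit value and
once on two 64-bit halves, as the scalar andq/xorq sequence does.  It
assumes a target where __int128 is available.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* The read-modify-write form from the example: m ^= (p & q).  */
void
rmw_xor_and (u128 *m, u128 p, u128 q)
{
  *m ^= (p & q);
}

/* The same operation on two 64-bit halves (index 0 is the low half),
   mirroring the scalar code above.  */
void
rmw_xor_and_halves (uint64_t m[2], const uint64_t p[2],
                    const uint64_t q[2])
{
  m[0] ^= p[0] & q[0];
  m[1] ^= p[1] & q[1];
}
```

Because AND and XOR act on each bit independently, the two forms give
the same result, which is what allows the whole operation to be done in
one vector register.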

2024-08-31  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-features.cc (timode_scalar_to_vector_candidate_p):
	Support the first operand of AND, IOR and XOR being MEM_P, i.e. a
	read-modify-write insn.

gcc/testsuite/ChangeLog
	* gcc.target/i386/movti-2.c: Change dg-options to -Os.
	* gcc.target/i386/movti-4.c: Expected output of original movti-2.c.
2024-08-31 14:19:33 -06:00
Andrew Pinski
2ac27bd503 libobjc: Add cast to void* to disable warning for casting between incompatible function types [PR89586]
Even though __objc_get_forward_imp returns an IMP type, it will be cast to a compatible function
type before being called.  So adding a cast to `void*` disables the warning about the incompatible type.
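
A minimal sketch of the pattern in GNU C (not the libobjc code: imp_t,
forward_double, get_forward_imp and call_forward are hypothetical
stand-ins).  The function is returned through a generic function
pointer type and cast back to its real type before the call:

```c
/* Hypothetical stand-in for libobjc's IMP type.  */
typedef void *(*imp_t) (void *, const char *);

/* A forwarding function whose real type differs from imp_t.  */
static double
forward_double (void *receiver, const char *selector)
{
  (void) receiver;
  (void) selector;
  return 1.5;
}

imp_t
get_forward_imp (void)
{
  /* A direct `(imp_t) forward_double' is what -Wcast-function-type
     complains about; going through `void *' avoids the warning.  */
  return (imp_t) (void *) forward_double;
}

double
call_forward (void *receiver, const char *selector)
{
  /* Cast back to the function's real type before calling it.  */
  double (*fn) (void *, const char *)
    = (double (*) (void *, const char *)) (void *) get_forward_imp ();
  return fn (receiver, selector);
}
```

Converting between function pointers and `void *' is a GNU extension
rather than ISO C, so this relies on the compiler accepting it.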

Pushed after bootstrap/test on x86_64.

libobjc/ChangeLog:

	PR libobjc/89586
	* sendmsg.c (__objc_get_forward_imp): Add cast to `void*` before casting to IMP.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-31 12:04:22 -07:00
Georg-Johann Lay
df89afbb77 AVR: Run pass avr-fuse-add a second time after pass_cprop_hardreg.
gcc/
	* config/avr/avr-passes.cc (avr_pass_fuse_add) <clone>: Override.
	* config/avr/avr-passes.def (avr_pass_fuse_add): Run again
	after pass_cprop_hardreg.
2024-08-31 20:30:40 +02:00
Georg-Johann Lay
60fc5501dd AVR: Tidy pass avr-fuse-add.
gcc/
	* config/avr/avr-protos.h (avr_split_tiny_move): Rename to
	avr_split_fake_addressing_move.
	* config/avr/avr-passes.def: Same.
	* config/avr/avr-passes.cc: Same.
	(avr_pass_data_fuse_add) <tv_id>: Set to TV_MACH_DEP.
	* config/avr/avr.md (split-lpmx): Remove a define_split.  Such
	splits are performed by avr_split_fake_addressing_move.
2024-08-31 20:28:27 +02:00
Iain Sandoe
7f27d1f1b9 testsuite, c++, coroutines: Avoid 'unused' warnings [NFC].
The 'torture' section of the coroutine tests is primarily about checking
correct operation of the generated code.  It should, ideally, be possible
to run this part of the testsuite with '-Wall' and expect no fails.  In
the case that we wish to test for a specific diagnostic (and that it does
not appear over a range of optimisation/debug conditions) then we should
make that explicit (as done, for example, in pr109867.C).

The tests amended here have warnings because of unused entities; in many
cases those are relevant to the test, and so we just mark them with
__attribute__((__unused__)).

We amend the debug output in coro.h to avoid similar warnings when print
output is disabled (the default).
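
Both techniques can be sketched as follows.  This is not the actual
content of coro.h; the macro definition and the sum_to function are
assumptions made for illustration, using an OUTPUT macro to enable
printing:

```c
#include <stdio.h>

/* With output disabled, the variadic macro discards its arguments
   entirely, so nothing is evaluated and no "value computed is not
   used" style warning can arise from them.  */
#ifdef OUTPUT
#  define PRINTF(...) printf (__VA_ARGS__)
#else
#  define PRINTF(...) ((void) 0)
#endif

int
sum_to (int n)
{
  int total = 0;
  /* Only referenced by PRINTF, so mark it to keep -Wall quiet when
     output is disabled.  */
  int steps __attribute__ ((__unused__)) = 0;

  for (int i = 1; i <= n; i++)
    {
      total += i;
      steps++;
      PRINTF ("step %d: total = %d\n", steps, total);
    }
  return total;
}
```

Building with -DOUTPUT prints each step; without it the function
compiles to the same computation with no output.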

gcc/testsuite/ChangeLog:

	* g++.dg/coroutines/coro.h: Use a variadic macro for PRINTF to
	avoid unused warnings when output is disabled.
	* g++.dg/coroutines/torture/co-await-04-control-flow.C: Avoid
	unused warnings.
	* g++.dg/coroutines/torture/co-ret-13-template-2.C: Likewise.
	* g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C: Likewise.
	* g++.dg/coroutines/torture/local-var-04-hiding-nested-scopes.C:
	Likewise.
	* g++.dg/coroutines/torture/pr109867.C: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-31 17:33:31 +01:00