While making a patch to replace TARGET_P8_VECTOR, I noticed
that for *eqv<BOOL_128:mode>3_internal1, unlike the other logical
operations, we only exploit the VSX version. I think this is an
oversight; this patch considers veqv as well.
gcc/ChangeLog:
* config/rs6000/rs6000.md (*eqv<BOOL_128:mode>3_internal1): Generate
insn veqv if TARGET_ALTIVEC and operands are altivec_register_operand.
While working to get rid of the mask bit OPTION_MASK_P8_VECTOR,
I noticed that the check on ISA_3_0_MASKS_IEEE is effectively a
check of TARGET_P9_VECTOR, since we check all three mask bits
together and P9 vector guarantees that P8 vector and VSX are
enabled. So this patch adjusts that check first, as a preparatory
patch for the following patch that changes all uses of
OPTION_MASK_P8_VECTOR and TARGET_P8_VECTOR.
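An abridged sketch of why the two checks are equivalent (reconstructed
from memory, not the exact source):

  /* rs6000-cpus.def: the IEEE 128-bit FP mask is just the three
     vector-related option mask bits.  */
  #define ISA_3_0_MASKS_IEEE \
    (OPTION_MASK_VSX | OPTION_MASK_P8_VECTOR | OPTION_MASK_P9_VECTOR)

  /* rs6000.cc, before (body omitted): */
  if ((rs6000_isa_flags & ISA_3_0_MASKS_IEEE) == ISA_3_0_MASKS_IEEE)
  /* after, since enabling P9 vector forces P8 vector and VSX on: */
  if (TARGET_P9_VECTOR)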
gcc/ChangeLog:
* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_IEEE): Remove.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
ISA_3_0_MASKS_IEEE check with TARGET_P9_VECTOR.
When I was making a patch to rework TARGET_P8_VECTOR, I
noticed that there are some redundant checks and dead code
related to TARGET_DIRECT_MOVE, so I split this out as a
separate preparatory patch. It consists of the following
(see the sketch after this list):
- Check either TARGET_DIRECT_MOVE or TARGET_P8_VECTOR only,
according to the context, rather than checking both of them,
since they are actually the same (TARGET_DIRECT_MOVE is
defined as TARGET_P8_VECTOR).
- Simplify TARGET_VSX && TARGET_DIRECT_MOVE to
TARGET_DIRECT_MOVE, since direct move ensures VSX is enabled.
- Replace some TARGET_POWERPC64 && TARGET_DIRECT_MOVE with
TARGET_DIRECT_MOVE_64BIT to simplify them.
- Remove some dead code guarded by TARGET_DIRECT_MOVE where
the condition can never hold.
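The sketch below summarizes the equivalences being relied on (the
define is quoted from memory, abridged):

  /* rs6000.h */
  #define TARGET_DIRECT_MOVE TARGET_P8_VECTOR

  /* Hence, in conditions:
       TARGET_DIRECT_MOVE || TARGET_P8_VECTOR  ->  TARGET_P8_VECTOR
       TARGET_VSX && TARGET_DIRECT_MOVE        ->  TARGET_DIRECT_MOVE
       TARGET_DIRECT_MOVE && TARGET_POWERPC64  ->  TARGET_DIRECT_MOVE_64BIT  */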
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Simplify
TARGET_P8_VECTOR && TARGET_DIRECT_MOVE as TARGET_P8_VECTOR.
(rs6000_output_move_128bit): Simplify TARGET_VSX && TARGET_DIRECT_MOVE
as TARGET_DIRECT_MOVE.
* config/rs6000/rs6000.h (TARGET_XSCVDPSPN): Simplify conditions
TARGET_DIRECT_MOVE || TARGET_P8_VECTOR as TARGET_P8_VECTOR.
(TARGET_XSCVSPDPN): Likewise.
(TARGET_DIRECT_MOVE_128): Simplify TARGET_DIRECT_MOVE &&
TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
(TARGET_VEXTRACTUB): Likewise.
(TARGET_DIRECT_MOVE_64BIT): Simplify TARGET_P8_VECTOR &&
TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE.
* config/rs6000/rs6000.md (signbit<mode>2, @signbit<mode>2_dm,
*signbit<mode>2_dm_mem, floatsi<mode>2_lfiwax,
floatsi<SFDF:mode>2_lfiwax_<QHI:mode>_mem_zext,
floatunssi<mode>2_lfiwzx, float<QHI:mode><SFDF:mode>2,
*float<QHI:mode><SFDF:mode>2_internal, floatuns<QHI:mode><SFDF:mode>2,
*floatuns<QHI:mode><SFDF:mode>2_internal, p8_mtvsrd_v16qidi2,
p8_mtvsrd_df, p8_xxpermdi_<mode>, reload_vsx_from_gpr<mode>,
p8_mtvsrd_sf, reload_vsx_from_gprsf, p8_mfvsrd_3_<mode>,
reload_gpr_from_vsx<mode>, reload_gpr_from_vsxsf, unpack<mode>_dm):
Simplify TARGET_DIRECT_MOVE && TARGET_POWERPC64 as
TARGET_DIRECT_MOVE_64BIT.
(unpack<mode>_nodm): Simplify !TARGET_DIRECT_MOVE || !TARGET_POWERPC64
as !TARGET_DIRECT_MOVE_64BIT.
(fix_trunc<mode>si2, fix_trunc<mode>si2_stfiwx,
fix_trunc<mode>si2_internal): Simplify TARGET_P8_VECTOR &&
TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE.
(fix_trunc<mode>si2_stfiwx, fixuns_trunc<mode>si2_stfiwx): Remove some
dead code as the guard TARGET_DIRECT_MOVE there never holds.
(fixuns_trunc<mode>si2_stfiwx): Replace TARGET_P8_VECTOR with
TARGET_DIRECT_MOVE, which is a better fit.
* config/rs6000/vsx.md (define_peephole2 for SFmode in GPR): Simplify
TARGET_DIRECT_MOVE && TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.
gcc/testsuite/ChangeLog:
* g++.dg/opt/pr69175.C: Add option "-mcpu=unset".
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.
gcc/testsuite/ChangeLog:
* g++.dg/ext/pr57735.C: Use effective-target arm_cpu_xscale_arm.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.
gcc/testsuite/ChangeLog:
* g++.target/arm/mve/general-c++/nomve_fp_1.c: Add option
"-mcpu=unset".
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.
gcc/testsuite/ChangeLog:
* gcc.target/arm/vect-early-break-cbranch.c: Use
effective-target arm_arch_v8a_hard.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.
gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/crc_hf_1.c: Use effective-target
arm_arch_v8a_crc_hard.
* lib/target-supports.exp: Define effective-target
arm_arch_v8a_crc_hard.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
While testing future 64-bit location_t support, I ran into an
-fcompare-debug issue that was traced back here. Despite the name,
next_discriminator_for_locus() is meant to take an integer line number
argument, not a location_t. There is one call site which has been passing a
location_t instead. For the most part that is harmless, although in the case
where there are two CALL stmts on the same line with different location_t, it
may fail to generate a unique discriminator where it should. If/when location_t
changes to be 64-bit, however, it will produce an -fcompare-debug
failure. Fix it by passing the line number rather than the location_t.
I am not aware of a testcase that demonstrates any observable wrong
behavior, but the file debug/pr53466.C is an example where the discriminator
assignment is indeed different before and after this change.
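An abridged sketch of the shape of the fix (surrounding code omitted;
names are as in the GCC sources):

  /* assign_discriminators: pass the line number rather than the
     location_t itself, which is what the callee expects.  */
  location_t loc = gimple_location (stmt);
  int disc = next_discriminator_for_locus (LOCATION_LINE (loc));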
gcc/ChangeLog:
* tree-cfg.cc (assign_discriminators): Fix incorrect value passed to
next_discriminator_for_locus().
Bump the libgm2 version ready for the GCC 15 release.
libgm2/ChangeLog:
PR modula2/117703
* configure: Regenerate.
* configure.ac (libtool_VERSION): Bump to 20:0:0.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
When a symbol was use-associated in the ancestor of a submodule, its
PROTECTED attribute was ignored in the submodule or its descendants.
Find the real ancestor of such symbols when they are used in a variable
definition context in a submodule.
PR fortran/83135
gcc/fortran/ChangeLog:
* expr.cc (sym_is_from_ancestor): New helper function.
(gfc_check_vardef_context): Refine checking of PROTECTED attribute
of symbols that are indirectly use-associated in a submodule.
gcc/testsuite/ChangeLog:
* gfortran.dg/protected_10.f90: New test.
As reported in bug 114266, GCC fails to pedwarn for a compound
literal, whose type is an array of unknown size, initialized with an
empty initializer. This case is disallowed by C23 (which doesn't have
zero-size objects); the case of a named object is diagnosed as
expected, but not that for compound literals. (Before C23, the
pedwarn for empty initializers sufficed.) Add a check for this
specific case with a pedwarn.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
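For reference, a minimal illustration of the newly diagnosed case (my
own example, not the new test):

  /* Compound literal of array type with unknown size and an empty
     initializer: invalid in C23, now gets a pedwarn.  */
  int *p = (int []){ };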
PR c/114266
gcc/c/
* c-decl.cc (build_compound_literal): Diagnose array of unknown
size with empty initializer for C23.
gcc/testsuite/
* gcc.dg/c23-empty-init-4.c: New test.
On i686, compiling the PR116587 test resulted in LRA failing to find
registers for a reload insn pseudo. The insn requires 6 regs for 4
reload insn pseudos, where two of them require 2 regs each. But we
have only 5 free regs, as sp is a fixed reg, bp is fixed because of
-fno-omit-frame-pointer, and bx is assigned to pic_offset_table_pseudo
because of -fPIC. LRA spills pic_offset_table_pseudo as a last-chance
approach to allocate registers to the reload pseudo. Although that
frees 2 registers for the unallocated reload pseudo, which also
requires 2 regs, the pseudo still cannot be allocated because the 2
free regs are disjoint. The patch spills all pseudos conflicting with
the unallocated reload pseudo, including already allocated reload insn
pseudos; then the standard LRA code allocates the spilled pseudos
requiring more than one register first and avoids the situation of
disjoint regs for reload pseudos requiring more than one reg.
gcc/ChangeLog:
PR target/116587
* lra-assigns.cc (find_all_spills_for): Consider all pseudos whose
classes intersect the given pseudo's class.
gcc/testsuite/ChangeLog:
PR target/116587
* gcc.target/i386/pr116587.c: New test.
gcc/jit/ChangeLog:
PR jit/108762
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_32): New ABI tag.
* docs/topics/functions.rst: Add documentation for the function
gcc_jit_context_get_target_builtin_function.
* dummy-frontend.cc: Include headers target.h, jit-recording.h,
print-tree.h, unordered_map and string, new variables (target_builtins,
target_function_types, and target_builtins_ctxt), new function
(tree_type_to_jit_type).
* jit-builtins.cc: Specify that the function types are not from
target builtins.
* jit-playback.cc: New argument is_target_builtin to new_function.
* jit-playback.h: New argument is_target_builtin to
new_function.
* jit-recording.cc: New argument is_target_builtin to
new_function_type, function_type constructor and function
constructor, new function
(get_target_builtin_function).
* jit-recording.h: Include headers string and unordered_map, new
variable target_function_types, new argument is_target_builtin
to new_function_type, function_type and function, new functions
(get_target_builtin_function, copy).
* libgccjit.cc: New function
(gcc_jit_context_get_target_builtin_function).
* libgccjit.h: New function
(gcc_jit_context_get_target_builtin_function).
* libgccjit.map: New function
(gcc_jit_context_get_target_builtin_function).
gcc/testsuite:
PR jit/108762
* jit.dg/all-non-failing-tests.h: Add test-target-builtins.c.
* jit.dg/test-target-builtins.c: New test.
This fixes a few aarch64-specific testcases after the move to defaulting to GNU C23.
For the SME testcases, the GNU C23 change is that `()` now means `(void)` instead
of declaring a function without a prototype; the non-prototype declaration merging
was confusing some of the time, so the updated way is the expected way even for that.
For pic-*.c, `-Wno-old-style-definition` was added so that old-style definitions are
not warned about.
For pr113573.c, I added `-std=gnu17` since I was not sure if `(...)` with C23 would
invoke the same issue.
Tested for aarch64-linux-gnu.
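As a minimal illustration of the C23 change behind the SME test updates
(my own example, not taken from the testsuite):

  void f ();          /* C17: no prototype; C23: same as void f (void).  */
  void f (int x) { }  /* Accepted with -std=gnu17, rejected with -std=gnu23
                         since it conflicts with the (void) prototype above.  */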
PR testsuite/117680
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pic-constantpool1.c: Add -Wno-old-style-definition.
* gcc.target/aarch64/pic-symrefplus.c: Likewise.
* gcc.target/aarch64/pr113573.c: Add `-std=gnu17`.
* gcc.target/aarch64/sme/streaming_mode_1.c: Correct testcase.
* gcc.target/aarch64/sme/za_state_1.c: Likewise.
* gcc.target/aarch64/sme/za_state_2.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
reuse_rtx is not documented, nor is the format for using it ever documented,
so it should not be supported for .md files.
This also fixes the problem that supplying an invalid index to reuse_rtx
caused an ICE; a real error message is now emitted instead. Note that since
this code still uses atoi, an invalid index can still be accepted in some
cases, but that is recorded as part of PR 44574.
Note I grepped the sources to make sure that this is only used by the RTL
reader in GCC proper rather than while reading in .md files.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* read-md.h (class rtx_reader): Don't include m_reuse_rtx_by_id
when GENERATOR_FILE is defined.
* read-rtl.cc (rtx_reader::read_rtx_code): Disable reuse_rtx
support when GENERATOR_FILE is defined.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
RISC-V vector currently does not support big endian, so post-commit testing
was hitting the "sorry, not implemented" error on vector targets. Restrict
the testcase to non-vector targets.
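One way to express such a restriction in the testsuite (an illustrative
directive, not necessarily the exact one used in the fix):

  /* { dg-do compile { target { ! riscv_v } } } */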
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117595.c: Restrict to non-vector targets.
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
When diverting to VMAT_GATHER_SCATTER we fail to zero *poffset,
which was previously set if a load was classified as
VMAT_CONTIGUOUS_REVERSE. The following refactors
get_group_load_store_type a bit to avoid this, but this all needs
some serious TLC.
PR tree-optimization/117709
* tree-vect-stmts.cc (get_group_load_store_type): Only
set *poffset when we end up with VMAT_CONTIGUOUS_DOWN
or VMAT_CONTIGUOUS_REVERSE.
When SLP vectorizing, we fail to mark the general alignment check
as irrelevant when using VMAT_STRIDED_SLP (the implementation checks
for itself), and when using VMAT_INVARIANT the override isn't effective.
This results in extra FAILs on sparc, which the following fixes.
PR tree-optimization/117698
* tree-vect-stmts.cc (get_group_load_store_type): Properly
disregard alignment for VMAT_STRIDED_SLP and VMAT_INVARIANT.
(vectorizable_load): Adjust the guard for dumping whether we
vectorize an unaligned access.
(vectorizable_store): Likewise.
gcc/jit/ChangeLog:
* docs/topics/contexts.rst: Add documentation for new option.
* jit-recording.cc (recording::context::get_str_option): New
method.
* jit-recording.h (get_str_option): New method.
* libgccjit.cc (gcc_jit_context_new_function): Allow special
characters in function names.
* libgccjit.h (enum gcc_jit_str_option): New option.
gcc/testsuite/ChangeLog:
* jit.dg/test-special-chars.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives.
* c-c++-common/gomp/adjust-args-1.c: New test.
* c-c++-common/gomp/adjust-args-2.c: New test.
* c-c++-common/gomp/declare-variant-dup-match-clause.c: New test.
* c-c++-common/gomp/dispatch-1.c: New test.
* c-c++-common/gomp/dispatch-2.c: New test.
* c-c++-common/gomp/dispatch-3.c: New test.
* c-c++-common/gomp/dispatch-4.c: New test.
* c-c++-common/gomp/dispatch-5.c: New test.
* c-c++-common/gomp/dispatch-6.c: New test.
* c-c++-common/gomp/dispatch-7.c: New test.
* c-c++-common/gomp/dispatch-8.c: New test.
* c-c++-common/gomp/dispatch-9.c: New test.
* c-c++-common/gomp/dispatch-10.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/dispatch-1.c: New test.
* testsuite/libgomp.c-c++-common/dispatch-2.c: New test.
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits included in the corresponding C front-end
patch for pragmas and attributes.
Additional C/C++ common testcases are provided in a subsequent patch in the
series.
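For illustration, a small example using standard OpenMP 5.1 syntax for the
constructs involved (the names are invented; this is not one of the new tests):

  void variant_fn (double *p);
  #pragma omp declare variant (variant_fn) match (construct={dispatch}) \
              adjust_args (need_device_ptr : p)
  void base_fn (double *p);

  void use (double *p, int n)
  {
    #pragma omp dispatch novariants (n < 8)
    base_fn (p);
  }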
gcc/cp/ChangeLog:
* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add 'tree parms' parameter. Update
call to cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired by
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.
(tsubst_stmt): Handle OMP_DISPATCH.
(tsubst_expr): Handle IFN_GOMP_DISPATCH.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/adjust-args-3.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
* g++.dg/gomp/dispatch-3.C: New test.
* g++.dg/gomp/dispatch-4.C: New test.
* g++.dg/gomp/dispatch-5.C: New test.
* g++.dg/gomp/dispatch-6.C: New test.
* g++.dg/gomp/dispatch-7.C: New test.
This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.
Additional common C/C++ testcases are in a later patch in the series.
gcc/c-family/ChangeLog:
* c-attribs.cc (c_common_gnu_attributes): Add attribute for adjust_args
need_device_ptr.
* c-omp.cc (c_omp_directives): Uncomment dispatch.
* c-pragma.cc (omp_pragmas): Add dispatch.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_DISPATCH.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_NOCONTEXT and
PRAGMA_OMP_CLAUSE_NOVARIANTS.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Handle nocontext and novariants
clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.
* gcc.dg/gomp/dispatch-2.c: New test.
* gcc.dg/gomp/dispatch-3.c: New test.
* gcc.dg/gomp/dispatch-4.c: New test.
* gcc.dg/gomp/dispatch-5.c: New test.
This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists of
emitting a call to `omp_get_mapped_ptr` for the appropriate device.
For dispatch, the following steps are performed (see the sketch after this list):
* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.
* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.
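Conceptually, for something like `#pragma omp dispatch device(dev)` around a
call with a need_device_ptr argument, the emitted code behaves roughly like
this hand-written sketch (illustrative only, not the actual gimplified output):

  #include <omp.h>

  void base_fn (double *p);

  void sketch (double *p, int dev)
  {
    int saved = omp_get_default_device ();
    omp_set_default_device (dev);
    /* A need_device_ptr argument is translated via omp_get_mapped_ptr.  */
    base_fn ((double *) omp_get_mapped_ptr (p, dev));
    omp_set_default_device (saved);
  }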
gcc/ChangeLog:
* builtins.cc (builtin_fnspec): Handle BUILT_IN_OMP_GET_MAPPED_PTR.
* gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_DISPATCH.
* gimple-pretty-print.cc (dump_gimple_omp_dispatch): New function.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_DISPATCH.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_dispatch): New function.
(gimple_copy): Handle GIMPLE_OMP_DISPATCH.
* gimple.def (GIMPLE_OMP_DISPATCH): Define.
* gimple.h (gimple_build_omp_dispatch): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_DISPATCH.
(gimple_omp_dispatch_clauses): New function.
(gimple_omp_dispatch_clauses_ptr): Likewise.
(gimple_omp_dispatch_set_clauses): Likewise.
(gimple_return_set_retval): Handle GIMPLE_OMP_DISPATCH.
* gimplify.cc (enum omp_region_type): Add ORT_DISPATCH.
(struct gimplify_omp_ctx): Add in_call_args.
(gimplify_call_expr): Handle need_device_ptr arguments.
(is_gimple_stmt): Handle OMP_DISPATCH.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DEVICE in a dispatch
construct. Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(omp_has_novariants): New function.
(omp_has_nocontext): Likewise.
(omp_construct_selector_matches): Handle OMP_DISPATCH with nocontext
clause.
(find_ifn_gomp_dispatch): New function.
(gimplify_omp_dispatch): Likewise.
(gimplify_expr): Handle OMP_DISPATCH.
* gimplify.h (omp_has_novariants): Declare.
* internal-fn.cc (expand_GOMP_DISPATCH): New function.
* internal-fn.def (GOMP_DISPATCH): Define.
* omp-builtins.def (BUILT_IN_OMP_GET_MAPPED_PTR): Define.
(BUILT_IN_OMP_GET_DEFAULT_DEVICE): Define.
(BUILT_IN_OMP_SET_DEFAULT_DEVICE): Define.
* omp-general.cc (omp_construct_traits_to_codes): Add OMP_DISPATCH.
(struct omp_ts_info): Add dispatch.
(omp_resolve_declare_variant): Handle novariants. Adjust
DECL_ASSEMBLER_NAME.
* omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_DISPATCH.
(lower_omp_dispatch): New function.
(lower_omp_1): Call it.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_DISPATCH.
(estimate_num_insns): Handle GIMPLE_OMP_DISPATCH.
This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.
gcc/fortran/ChangeLog:
* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.
This patch adds support for the SVE_B16B16 extension, which provides
non-widening BF16 versions of existing instructions.
Mostly it's just a simple extension of iterators. The main
complications are:
(1) The new instructions have no immediate forms. This is easy to
handle for the cond_* patterns (the ones that have an explicit
else value) since those are already divided into register and
non-register versions. All we need to do is tighten the predicates.
However, the @aarch64_pred_<optab><mode> patterns handle the
immediates directly. Rather than complicate them further,
it seemed best to add a single @aarch64_pred_<optab><mode> for
all BF16 arithmetic.
(2) There is no BFSUBR, so the usual method of handling reversed
operands breaks down. The patch deals with this using some
new attributes that together disable the "BFSUBR" alternative.
(3) Similarly, there are no BFMAD or BFMSB instructions, so we need
to disable those forms in the BFMLA and BFMLS patterns.
The patch includes support for generic bf16 vectors too.
It would be possible to use these instructions for scalars, as with
the recent FLOGB patch, but that's left as future work.
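For example, with the extension enabled, the ACLE intrinsics gain
non-widening BF16 forms along these lines (a rough sketch; the exact
command-line options shown are an assumption):

  /* Compile with something like: -march=armv9-a+sve2+sve-b16b16  */
  #include <arm_sve.h>

  svbfloat16_t
  bf16_mla (svbool_t pg, svbfloat16_t acc, svbfloat16_t x, svbfloat16_t y)
  {
    return svmla_bf16_x (pg, acc, x, y);   /* non-widening BFMLA */
  }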
gcc/
* config/aarch64/aarch64-option-extensions.def
(sve-b16b16): New extension.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64.h (TARGET_SME_B16B16, TARGET_SVE2_OR_SME2)
(TARGET_SSVE_B16B16): New macros.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Conditionally define __ARM_FEATURE_SVE_B16B16.
* config/aarch64/aarch64-sve-builtins-sve2.def: Add AARCH64_FL_SVE2
to the SVE2p1 requirements. Add SVE_B16B16 forms of existing
intrinsics.
* config/aarch64/aarch64-sve-builtins.cc (type_suffixes): Treat
bfloat as a floating-point type.
(TYPES_h_bfloat): New macro.
* config/aarch64/aarch64.md (is_bf16, is_rev, supports_bf16_rev)
(mode_enabled): New attributes.
(enabled): Test mode_enabled.
* config/aarch64/iterators.md (SVE_FULL_F_BF): New mode iterator.
(SVE_CLAMP_F): Likewise.
(SVE_Fx24): Add BF16 modes when TARGET_SSVE_B16B16.
(sve_lane_con): Handle BF16 modes.
(b): Handle SF and DF modes.
(is_bf16): New mode attribute.
(supports_bf16, supports_bf16_rev): New int attributes.
* config/aarch64/predicates.md
(aarch64_sve_float_maxmin_immediate): Reject BF16 modes.
* config/aarch64/aarch64-sve.md
(*post_ra_<sve_fp_op><mode>3): Add BF16 support, and likewise
for the associated define_split.
(<optab:SVE_COND_FP_BINARY_OPTAB><mode>): Add BF16 support.
(@cond_<optab:SVE_COND_FP_BINARY><mode>): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_2_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_2_strict): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_3_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_3_strict): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_any_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_BINARY><mode>_any_strict): Likewise.
(@aarch64_mul_lane_<mode>): Likewise.
(<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
(@aarch64_pred_<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
(@cond_<optab:SVE_COND_FP_TERNARY><mode>): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_4_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_4_strict): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_any_relaxed): Likewise.
(*cond_<optab:SVE_COND_FP_TERNARY><mode>_any_strict): Likewise.
(@aarch64_<optab:SVE_FP_TERNARY_LANE>_lane_<mode>): Likewise.
* config/aarch64/aarch64-sve2.md
(@aarch64_pred_<optab:SVE_COND_FP_BINARY><mode>): Define BF16 version.
(@aarch64_sve_fclamp<mode>): Add BF16 support.
(*aarch64_sve_fclamp<mode>_x): Likewise.
(*aarch64_sve_<maxmin_uns_op><SVE_Fx24:mode>): Likewise.
(*aarch64_sve_single_<maxmin_uns_op><SVE_Fx24:mode>): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Return false for BF16 modes.
gcc/testsuite/
* lib/target-supports.exp: Test the assembler for sve-b16b16 support.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test the new B16B16
macros.
* gcc.target/aarch64/sve/fmad_1.c: Test bfloat16 too.
* gcc.target/aarch64/sve/fmla_1.c: Likewise.
* gcc.target/aarch64/sve/fmls_1.c: Likewise.
* gcc.target/aarch64/sve/fmsb_1.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_9.c: New test.
* gcc.target/aarch64/sme2/acle-asm/clamp_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/clamp_bf16_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/max_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/max_bf16_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/maxnm_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/maxnm_bf16_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/min_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/min_bf16_x4.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/minnm_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/minnm_bf16_x4.c: Likewise.
* gcc.target/aarch64/sve/bf16_arith_1.c: Likewise.
* gcc.target/aarch64/sve/bf16_arith_1.h: Likewise.
* gcc.target/aarch64/sve/bf16_arith_2.c: Likewise.
* gcc.target/aarch64/sve/bf16_arith_3.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/add_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/clamp_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/max_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/maxnm_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/min_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/minnm_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mla_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mla_lane_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mls_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mls_lane_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mul_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mul_lane_bf16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sub_bf16.c: Likewise.
@aarch64_sme_write<mode> and *aarch64_sme_write<mode>_plus
were using UNSPEC_SME_READ instead of UNSPEC_SME_WRITE.
gcc/
* config/aarch64/aarch64-sme.md (@aarch64_sme_write<mode>)
(*aarch64_sme_write<mode>_plus): Use UNSPEC_SME_WRITE instead
of UNSPEC_SME_READ.
This patch just renames the iterators SME_READ and SME_WRITE to
SME_READ_HV and SME_WRITE_HV, to distinguish them from other forms
of ZA read and write.
gcc/
* config/aarch64/iterators.md (SME_READ): Rename to...
(SME_READ_HV): ...this.
(SME_WRITE): Rename to...
(SME_WRITE_HV): ...this.
* config/aarch64/aarch64-sme.md: Update accordingly.
There are separate patterns for predicated FADD, FSUB, and FMUL.
Previously they each had their own in-built split to convert the
instruction to unpredicated form where appropriate. However, it's
more convenient for later patches if we use a single separate split
instead.
gcc/
* config/aarch64/iterators.md (SVE_COND_FP): New code attribute.
* config/aarch64/aarch64-sve.md: Use a single define_split to
handle the conversion of predicated FADD, FSUB, and FMUL into
unpredicated forms.
Many of the SME ZA intrinsics have two type suffixes: one for ZA
and one for the vectors. The ZA suffix only conveys an element
size, while the vector suffix conveys both an element type and
an element size. Internally, the ZA suffix maps to an integer mode;
e.g. za32 maps to VNx4SI.
For SME2, it was relatively convenient to use the modes associated
with both suffixes directly. For example, the (non-widening) FMLA
intrinsics used SME_ZA_SDF_I to iterate over the possible ZA modes,
used SME_ZA_SDFx24 to iterate over the possible vector tuple modes,
and used a C++ condition to make sure that the element sizes agree.
However, for later patches it's more convenient to rely only on
the vector mode in cases where the ZA and vector element sizes
are the same. This means splitting the widening MOPA/S patterns
from the non-widening ones, but otherwise it's not a big change.
gcc/
* config/aarch64/iterators.md (SME_ZA_SDF_I): Delete.
(SME_MOP_HSDF): Replace with...
(SME_MOP_SDF): ...this.
* config/aarch64/aarch64-sme.md: Change the non-widening FMLA and
FMLS patterns so that both mode parameters are the same, rather than
using both SME_ZA_SDF_I and SME_ZA_SDFx24 and checking that their
element sizes are the same. Split the FMOPA and FMOPS patterns
into separate non-widening and widening forms, then update the
non-widening forms in a similar way to FMLA and FMLS.
* config/aarch64/aarch64-sve-builtins-functions.h
(sme_2mode_function_t::expand): If the two type suffixes have the same
element size, use the vector tuple mode for both mode parameters.
Evaluate the BACK argument of MINLOC/MAXLOC once before the
scalarization loops in the case where the DIM argument is present.
This is a follow-up to r15-1994-ga55d24b3cf7f4d07492bb8e6fcee557175b47ea3,
which added knowledge of BACK to the scalarizer, to
r15-2701-ga10436a8404ad2f0cc5aa4d6a0cc850abe5ef49e, which removed it to
handle it outside of scalarization instead, and to the more immediately
preceding patches that added inlining support for MINLOC/MAXLOC with DIM.
The inlining support for MINLOC/MAXLOC with DIM introduced nested loops,
which left the evaluation of BACK (removed from the scalarizer's knowledge
by the aforementioned commit) wrapped in a loop, so it could be executed
more than once. This change adds BACK to the scalarization chain if
MINLOC/MAXLOC will use nested loops, so that it is evaluated by the
scalarizer only once, before the outermost loop, in that case.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc
(walk_inline_intrinsic_minmaxloc): Add a scalar element for BACK as
first item of the chain if BACK is present and there will be nested
loops.
(gfc_conv_intrinsic_minmaxloc): Evaluate BACK using an inherited
scalarization chain if there is a nested loop.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_8.f90: New test.
* gfortran.dg/minloc_9.f90: New test.