1983 Commits

Author SHA1 Message Date
Thomas Schwinge
7ab75a6e6d Fix 'libgomp.fortran/reverse-offload-6.f90' nvptx offloading compilation
Fix-up for recent commit 0b1ce70a81
"libgomp: Fix reverse offload issues".

	libgomp/
	* testsuite/libgomp.fortran/reverse-offload-6.f90: Fix nvptx
	offloading compilation.
2023-02-07 23:44:33 +01:00
GCC Administrator
49e52115b0 Daily bump. 2023-02-04 00:16:24 +00:00
Tobias Burnus
0b1ce70a81 libgomp: Fix reverse offload issues
If there is nothing to map, skip the mapping and avoid attempting to
copy 0 bytes from addrs, sizes and kinds.

Additionally, it could happen that a non-allocated address was deallocated,
such as a pointer set, leading to a free for the actual data.
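
The mapnum == 0 case can be illustrated with a minimal sketch. This is not the code in target.c: the struct layout and the function names copy_map_args and free_map_args are hypothetical, chosen only to show the guard that skips allocation and copying when there is nothing to map.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-mapping arrays; not libgomp's layout.  */
struct map_args
{
  size_t mapnum;
  void **addrs;
  size_t *sizes;
};

/* Copy the mapping arrays from SRC into DST.  Returns 0 on success and
   -1 on allocation failure.  With mapnum == 0 nothing is allocated or
   copied, so the (possibly NULL) source pointers are never touched.  */
int
copy_map_args (struct map_args *dst, const struct map_args *src)
{
  dst->mapnum = src->mapnum;
  dst->addrs = NULL;
  dst->sizes = NULL;
  if (src->mapnum == 0)
    return 0;

  dst->addrs = malloc (src->mapnum * sizeof *dst->addrs);
  dst->sizes = malloc (src->mapnum * sizeof *dst->sizes);
  if (dst->addrs == NULL || dst->sizes == NULL)
    {
      free (dst->addrs);
      free (dst->sizes);
      dst->addrs = NULL;
      dst->sizes = NULL;
      dst->mapnum = 0;
      return -1;
    }
  memcpy (dst->addrs, src->addrs, src->mapnum * sizeof *dst->addrs);
  memcpy (dst->sizes, src->sizes, src->mapnum * sizeof *dst->sizes);
  return 0;
}

/* Release only what copy_map_args allocated; free (NULL) is a no-op.  */
void
free_map_args (struct map_args *m)
{
  free (m->addrs);
  free (m->sizes);
  m->addrs = NULL;
  m->sizes = NULL;
  m->mapnum = 0;
}
```

Keeping the pointers NULL in the empty case means the cleanup path never frees memory that was not allocated by the copy itself.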

libgomp/
	* target.c (gomp_target_rev): Handle mapnum == 0 and avoid
	freeing not allocated memory.
	* testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
2023-02-03 11:31:53 +01:00
Tobias Burnus
f84fdb134d libgomp: enable reverse offload for AMDGCN
libgomp/ChangeLog:

	* libgomp.texi (5.0 Impl. Status, gcn specifics): Update for
	reverse offload.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept
	reverse-offload requirement.
2023-02-03 08:33:17 +01:00
GCC Administrator
a37a0cb303 Daily bump. 2023-02-03 00:16:44 +00:00
Andrew Stubbs
f6fff8a6fc amdgcn, libgomp: Manually allocated stacks
Switch from using stacks in the "private segment" to using a memory block
allocated on the host side.  The primary reason is to permit the reverse
offload implementation to access values located on the device stack, but
there may also be performance benefits, especially with repeated kernel
invocations.

This implementation unifies the stacks with the "team arena" optimization
feature, and now allows both to have run-time configurable sizes.

A new ABI is needed, so all libraries must be rebuilt, and newlib must be
version 4.3.0.20230120 or newer.
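
As a rough illustration of a run-time configurable size, the sketch below reads GCN_STACK_SIZE from the environment and falls back to a default. The helper names and the default value are assumptions made for this example; the commit message does not give the value of DEFAULT_GCN_STACK_SIZE or show the actual parsing code.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed default for illustration only; the real DEFAULT_GCN_STACK_SIZE
   value is not stated in the commit message.  */
#define EXAMPLE_DEFAULT_STACK_SIZE (32UL * 1024UL)

/* Parse a byte count from TEXT.  Returns FALLBACK when TEXT is NULL,
   does not start with a digit, has trailing characters, is zero, or is
   out of range for unsigned long.  */
unsigned long
parse_size_or_default (const char *text, unsigned long fallback)
{
  if (text == NULL || *text < '0' || *text > '9')
    return fallback;

  char *end;
  errno = 0;
  unsigned long value = strtoul (text, &end, 10);
  if (errno != 0 || *end != '\0' || value == 0)
    return fallback;
  return value;
}

/* Hypothetical wrapper: the stack size requested through the
   GCN_STACK_SIZE environment variable, or the example default.  */
unsigned long
example_stack_size (void)
{
  return parse_size_or_default (getenv ("GCN_STACK_SIZE"),
                                EXAMPLE_DEFAULT_STACK_SIZE);
}
```

Rejecting malformed values instead of guessing keeps a typo in the environment from silently producing a tiny or enormous stack.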

gcc/ChangeLog:

	* config/gcn/gcn-run.cc: Include libgomp-gcn.h.
	(struct kernargs): Replace the common content with kernargs_abi.
	(struct heap): Delete.
	(main): Read GCN_STACK_SIZE envvar.
	Allocate space for the device stacks.
	Write the new kernargs fields.
	* config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt.
	(default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and
	PRIVATE_SEGMENT_WAVE_OFFSET_ARG.
	(gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content.
	(gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top.
	Set up the stacks from the values in the kernargs, not private.
	(gcn_expand_builtin_1): Match the stack configuration in the prologue.
	(gcn_hsa_declare_function_name): Turn off the private segment.
	(gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed.
	* config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register.
	* config/gcn/gcn.opt (mstack-size): Change the description.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION_GCN): Bump.

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define.
	(DEFAULT_TEAM_ARENA_SIZE): New define.
	(struct heap): Move to this file.
	(struct kernargs_abi): Likewise.
	* config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from
	the kernargs.
	* libgomp.h: Include libgomp-gcn.h.
	(TEAM_ARENA_SIZE): Remove.
	(team_malloc): Update the error message.
	* plugin/plugin-gcn.c (struct kernargs): Move common content to
	struct kernargs_abi.
	(struct agent_info): Rename team arenas to ephemeral memories.
	(struct team_arena_list): Rename ....
	(struct ephemeral_memories_list): to this.
	(struct heap): Delete.
	(team_arena_size): New variable.
	(stack_size): New variable.
	(print_kernel_dispatch): Update debug messages.
	(init_environment_variables): Read GCN_TEAM_ARENA_SIZE.
	Read GCN_STACK_SIZE.
	(get_team_arena): Rename ...
	(configure_ephemeral_memories): ... to this, and set up stacks.
	(release_team_arena): Rename ...
	(release_ephemeral_memories): ... to this.
	(destroy_team_arenas): Rename ...
	(destroy_ephemeral_memories): ... to this.
	(create_kernel_dispatch): Add num_threads parameter.
	Adjust for kernargs_abi refactor and ephemeral memories.
	(release_kernel_dispatch): Adjust for ephemeral memories.
	(run_kernel): Pass thread-count to create_kernel_dispatch.
	(GOMP_OFFLOAD_init_device): Adjust for ephemeral memories.
	(GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr47237.c: Xfail on amdgcn.
	* gcc.dg/builtin-apply3.c: Xfail for amdgcn.
	* gcc.dg/builtin-apply4.c: Xfail for amdgcn.
	* gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn.
	* gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
2023-02-02 11:47:03 +00:00
Tobias Burnus
8da7476c5f libgomp.texi (OpenMP TR11 impl. status): Fix 'strict' item
Fix the 'strict' modifier status: it is already listed (as 'Y') for OpenMP
5.1 for num_tasks and grainsize; only strict on num_threads is new with TR11.

libgomp/
	* libgomp.texi (OpenMP TR11): Fix item for 'strict' modifier.
2023-02-02 12:05:58 +01:00
GCC Administrator
0a251e7497 Daily bump. 2023-02-02 00:17:43 +00:00
Tobias Burnus
bf2cf6f3f1 Fortran: Extend align-clause checks of OpenMP's allocate directive
gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_clauses): Check also for
	power of two.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-3.f90: Fix ALIGN
	usage, remove unused -fdump-tree-original.
	* testsuite/libgomp.fortran/allocate-4.f90: New.
2023-02-01 14:51:00 +01:00
Tobias Burnus
eda38850a7 libgomp.texi: Reverse-offload updates
libgomp/
	* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
	(GCN): Add item about 'omp requires'.
	(nvptx): Likewise; add item about reverse offload.
2023-02-01 12:19:27 +01:00
GCC Administrator
338eb0f04b Daily bump. 2023-01-28 00:16:39 +00:00
Tobias Burnus
2325c8920b OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]
gcc/fortran/ChangeLog:

	PR fortran/108558
	* trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.

libgomp/ChangeLog:

	PR fortran/108558
	* testsuite/libgomp.fortran/has_device_addr.f90: New test.
2023-01-27 11:33:46 +01:00
GCC Administrator
607f278a35 Daily bump. 2023-01-24 00:17:23 +00:00
Tobias Burnus
20552407ae libgomp.texi: Impl. status - non-rect loop nest only partial
libgomp/
	* libgomp.texi (OpenMP 5.0): Set non-rectangular
	loop nest back to 'P' as Fortran support is incomplete.
2023-01-23 09:40:41 +01:00
GCC Administrator
0846336de5 Daily bump. 2023-01-20 00:17:40 +00:00
Jakub Jelinek
46644ec99c openmp: Fix up OpenMP expansion of non-rectangular loops [PR108459]
expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...).  This
will just return NULL if it can't be folded, we need fold_build1
instead.
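
For readers unfamiliar with the loop shape involved, here is a small sketch of a non-rectangular collapse(2) nest whose inner lower bound is a run-time multiple of the outer iterator while the upper bound is independent of it. It is an illustration written for this note, not the pr108459.c test case, and the function name count_nonrect is made up.

```c
/* Count the iterations of a non-rectangular collapse(2) loop nest.
   The inner lower bound is K * I, where K is only known at run time,
   while the inner upper bound M does not depend on I.  */
long
count_nonrect (int n, int m, int k)
{
  long count = 0;
  int i, j;
#pragma omp parallel for collapse(2) reduction(+:count)
  for (i = 0; i < n; i++)
    for (j = k * i; j < m; j++)
      count++;
  return count;
}
```

Compiled without OpenMP support the pragma is ignored and the nest runs sequentially, which gives a convenient reference result to compare against.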

2023-01-19  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/108459
	* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
	than fold_unary for NEGATE_EXPR.

	* testsuite/libgomp.c/pr108459.c: New test.
2023-01-19 21:00:08 +01:00
GCC Administrator
8d07b193d7 Daily bump. 2023-01-18 00:17:21 +00:00
Martin Liska
42bf66e4a7 Regenerate Makefile.in files.
libbacktrace/ChangeLog:

	* Makefile.in: Regenerate.

libgomp/ChangeLog:

	* Makefile.in: Regenerate.
	* configure: Regenerate.

libphobos/ChangeLog:

	* Makefile.in: Regenerate.
	* libdruntime/Makefile.in: Regenerate.

libstdc++-v3/ChangeLog:

	* src/libbacktrace/Makefile.in: Regenerate.
2023-01-17 12:20:05 +01:00
Jakub Jelinek
83ffe9cde7 Update copyright years. 2023-01-16 11:52:17 +01:00
GCC Administrator
d901bf8a44 Daily bump. 2023-01-08 00:16:59 +00:00
LIU Hao
902c755930 Always define WIN32_LEAN_AND_MEAN before <windows.h>
Recently, mingw-w64 has got updated <msxml.h> from Wine which is included
indirectly by <windows.h> if `WIN32_LEAN_AND_MEAN` is not defined. The
`IXMLDOMDocument` class has a member function named `abort()`, which gets
affected by our `abort()` macro in "system.h".

`WIN32_LEAN_AND_MEAN` should, nevertheless, always be defined. This
can exclude 'APIs such as Cryptography, DDE, RPC, Shell, and Windows
Sockets' [1], and speed up compilation of these files a bit.

[1] https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers
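
The pattern applied across the files listed below can be sketched as follows. The #ifndef guard and the helper example_sleep_ms are choices made for this illustration, not something quoted from the patch.

```c
/* Define WIN32_LEAN_AND_MEAN before the first inclusion of <windows.h>
   so that the rarely used API headers are left out.  The #ifndef avoids
   a redefinition warning if the build already passes
   -DWIN32_LEAN_AND_MEAN on the command line.  */
#if defined (_WIN32)
# ifndef WIN32_LEAN_AND_MEAN
#  define WIN32_LEAN_AND_MEAN
# endif
# include <windows.h>
#endif

/* Hypothetical helper: sleep for roughly MS milliseconds on Windows and
   do nothing elsewhere, so the sketch still compiles on other hosts.  */
void
example_sleep_ms (unsigned int ms)
{
#if defined (_WIN32)
  Sleep (ms);
#else
  (void) ms;
#endif
}
```

The define only has an effect if it appears before the header is first pulled in, which is why the ChangeLog entries consistently say "before <windows.h>".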

gcc/

	PR middle-end/108300
	* config/xtensa/xtensa-dynconfig.c: Define `WIN32_LEAN_AND_MEAN`
	before <windows.h>.
	* diagnostic-color.cc: Likewise.
	* plugin.cc: Likewise.
	* prefix.cc: Likewise.

gcc/ada/

	PR middle-end/108300
	* adaint.c: Define `WIN32_LEAN_AND_MEAN` before `#include
	<windows.h>`.
	* cio.c: Likewise.
	* ctrl_c.c: Likewise.
	* expect.c: Likewise.
	* gsocket.h: Likewise.
	* mingw32.h: Likewise.
	* mkdir.c: Likewise.
	* rtfinal.c: Likewise.
	* rtinit.c: Likewise.
	* seh_init.c: Likewise.
	* sysdep.c: Likewise.
	* terminals.c: Likewise.
	* tracebak.c: Likewise.

gcc/jit/

	PR middle-end/108300
	* jit-w32.h: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.

libatomic/

	PR middle-end/108300
	* config/mingw/lock.c: Define `WIN32_LEAN_AND_MEAN` before
	<windows.h>.

libffi/

	PR middle-end/108300
	* src/aarch64/ffi.c: Define `WIN32_LEAN_AND_MEAN` before
	<windows.h>.

libgcc/

	PR middle-end/108300
	* config/i386/enable-execute-stack-mingw32.c: Define
	`WIN32_LEAN_AND_MEAN` before <windows.h>.
	* libgcc2.c: Likewise.
	* unwind-generic.h: Likewise.

libgfortran/

	PR middle-end/108300
	* intrinsics/sleep.c: Define `WIN32_LEAN_AND_MEAN` before
	<windows.h>.

libgomp/

	PR middle-end/108300
	* config/mingw32/proc.c: Define `WIN32_LEAN_AND_MEAN` before
	<windows.h>.

libiberty/

	PR middle-end/108300
	* make-temp-file.c: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.
	* pex-win32.c: Likewise.

libssp/

	PR middle-end/108300
	* ssp.c: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.

libstdc++-v3/

	PR middle-end/108300
	* src/c++11/system_error.cc: Define `WIN32_LEAN_AND_MEAN` before
	<windows.h>.
	* src/c++11/thread.cc: Likewise.
	* src/c++17/fs_ops.cc: Likewise.
	* src/filesystem/ops.cc: Likewise.

libvtv/

	PR middle-end/108300
	* vtv_malloc.cc: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.
	* vtv_rts.cc: Likewise.
	* vtv_utils.cc: Likewise.
2023-01-07 06:51:06 +00:00
GCC Administrator
53ef7c1d9a Daily bump. 2023-01-06 00:17:35 +00:00
Jakub Jelinek
29c3218618 openmp: Fix up finish_omp_target_clauses [PR108286]
The comment in the loop says that we shouldn't add a map clause if such
a clause exists already, but the loop was actually using OMP_CLAUSE_DECL
on any clause.  Target construct can have various clauses which don't
have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clause
where it means something different (e.g. privatization clauses, allocate,
depend).

So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.
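
The idea of the fix can be shown with a simplified model. GCC represents clauses as tree nodes accessed through macros such as OMP_CLAUSE_DECL; the enum, struct and function below are hypothetical stand-ins used only to show the kind check happening before the decl is examined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified clause representation.  */
enum clause_kind { CLAUSE_MAP, CLAUSE_NOWAIT, CLAUSE_DEVICE, CLAUSE_PRIVATE };

struct clause
{
  enum clause_kind kind;
  const void *decl;     /* Names a mapped object only for CLAUSE_MAP.  */
  struct clause *next;
};

/* Return true if DECL is already named by a map clause in CLAUSES.
   Clauses of any other kind are skipped before their decl field is
   looked at, since it may be unset or mean something else.  */
bool
already_mapped (const struct clause *clauses, const void *decl)
{
  for (const struct clause *c = clauses; c != NULL; c = c->next)
    {
      if (c->kind != CLAUSE_MAP)
        continue;
      if (c->decl == decl)
        return true;
    }
  return false;
}
```

In this model a private clause naming the same variable does not count as an existing map, which mirrors the behaviour the commit message describes.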

2023-01-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/108286
	* semantics.cc (finish_omp_target_clauses): Ignore clauses other than
	OMP_CLAUSE_MAP.

	* testsuite/libgomp.c++/pr108286.C: New test.
2023-01-05 11:57:30 +01:00
GCC Administrator
fee53a3194 Daily bump. 2023-01-03 00:17:09 +00:00
Jakub Jelinek
74d5206fb6 Update copyright dates.
Manual part of copyright year updates.

2023-01-02  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gcc.cc (process_command): Update copyright notice dates.
	* gcov-dump.cc (print_version): Ditto.
	* gcov.cc (print_version): Ditto.
	* gcov-tool.cc (print_version): Ditto.
	* gengtype.cc (create_file): Ditto.
	* doc/cpp.texi: Bump @copying's copyright year.
	* doc/cppinternals.texi: Ditto.
	* doc/gcc.texi: Ditto.
	* doc/gccint.texi: Ditto.
	* doc/gcov.texi: Ditto.
	* doc/install.texi: Ditto.
	* doc/invoke.texi: Ditto.
gcc/ada/
	* gnat_ugn.texi: Bump @copying's copyright year.
	* gnat_rm.texi: Likewise.
gcc/d/
	* gdc.texi: Bump @copyrights-d year.
gcc/fortran/
	* gfortranspec.cc (lang_specific_driver): Update copyright notice
	dates.
	* gfc-internals.texi: Bump @copying's copyright year.
	* gfortran.texi: Ditto.
	* intrinsic.texi: Ditto.
	* invoke.texi: Ditto.
gcc/go/
	* gccgo.texi: Bump @copyrights-go year.
libgomp/
	* libgomp.texi: Bump @copying's copyright year.
libitm/
	* libitm.texi: Bump @copying's copyright year.
libquadmath/
	* libquadmath.texi: Bump @copying's copyright year.
2023-01-02 09:26:59 +01:00
Jakub Jelinek
68127a8e87 Update Copyright year in ChangeLog files
2022 -> 2023
2023-01-02 09:23:36 +01:00
GCC Administrator
de282a2012 Daily bump. 2022-12-22 00:17:29 +00:00
Chung-Lin Tang
fdc7469cf5 nvptx: reimplement libgomp barriers [PR99555]
Instead of trying to have the GPU do CPU-with-OS-like things, this new barriers
implementation for NVPTX uses simplistic bar.* synchronization instructions.
Tasks are processed after threads have joined, and only if team->task_count != 0

It is noted that: there might be a little bit of performance forfeited for
cases where earlier arriving threads could've been used to process tasks ahead
of other threads, but that has the requirement of implementing complex
futex-wait/wake like behavior, which is what we're trying to avoid with this patch.
It is deemed that task processing is not what GPU target offloading is usually
used for.

Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
   the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() now is implemented using "bar.arrive"

4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction. This reduces
   across all threads the condition (team->task_count != 0), to enable the task
   processing down below if any thread created a task.
   (this bar.red usage means that this patch is dependent on the prior NVPTX
   bar.red GCC patch)

	PR target/99555

libgomp/ChangeLog:

	* config/nvptx/bar.c (generation_to_barrier): Remove.
	(futex_wait,futex_wake,do_spin,do_wait): Remove.
	(GOMP_WAIT_H): Remove.
	(#include "../linux/bar.c"): Remove.
	(gomp_barrier_wait_end): New function.
	(gomp_barrier_wait): Likewise.
	(gomp_barrier_wait_last): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait): Likewise.
	(gomp_team_barrier_wait_final): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	(gomp_team_barrier_wait_cancel): Likewise.
	(gomp_team_barrier_cancel): Likewise.
	* config/nvptx/bar.h (gomp_barrier_t): Remove waiters, lock fields.
	(gomp_barrier_init): Remove init of waiters, lock fields.
	(gomp_team_barrier_wake): Remove prototype, add new static inline
	function.
2022-12-21 05:58:49 -08:00
Jakub Jelinek
1119902b6c openmp: Don't try to destruct DECL_OMP_PRIVATIZED_MEMBER vars [PR108180]
DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable,
we should only destruct the privatized copies of them created in omp
lowering.

Fixed thusly.

2022-12-21  Jakub Jelinek  <jakub@redhat.com>

	PR c++/108180
	* pt.cc (tsubst_expr): Don't call cp_finish_decl on
	DECL_OMP_PRIVATIZED_MEMBER vars.

	* testsuite/libgomp.c++/pr108180.C: New test.
2022-12-21 09:10:03 +01:00
GCC Administrator
5fb1e67453 Daily bump. 2022-12-17 00:17:56 +00:00
Tobias Burnus
18af26fc37 Remove libgomp/testsuite/libgomp.fortran/allocate-4.f90 [PR108056]
Commit r13-4716-ge205ec03f0794aeac3e8a89e947c12624d5a274e accidentally
included a testcase of another patch that is pending review:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html

libgomp/
	PR libfortran/108056

	* testsuite/libgomp.fortran/allocate-4.f90: Remove
	accidentally added file.
2022-12-16 08:56:03 +01:00
GCC Administrator
c8f767b2c0 Daily bump. 2022-12-16 00:17:46 +00:00
Tobias Burnus
e205ec03f0 libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]
Since GCC 12, the conversion between the array descriptors formats - the
internal (GFC) and the C binding one (CFI) - moved to the compiler itself
such that the cfi_desc_to_gfc_desc/gfc_desc_to_cfi_desc functions are only
used with older code (GCC 9 to 11).  The newly added checks caused asserts
as older code did not pass the proper values (e.g. real(4) as effective
argument arrived as BT_ASSUME type as the effective type got lost in between).

As proposed in the PR, revert to the GCC 11 version - known bugs are better
than some fixes and new issues. Still, GCC 12 is much better in terms of
TS29113 support and should really be used.

This patch uses the current libgomp version of the GCC 11 branch, except
it fixes the GFC version number (which is 0), uses calloc instead of malloc,
and sets the lower bound to 1 instead of keeping it as is for
CFI_attribute_other.

libgfortran/ChangeLog:

	PR libfortran/108056
	* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
	gfc_desc_to_cfi_desc): Mostly revert to GCC 11 version for
	those backward-compatibility-only functions.
2022-12-15 12:26:06 +01:00
GCC Administrator
26f4aefaeb Daily bump. 2022-12-15 00:17:29 +00:00
Julian Brown
9316ad3b43 OpenMP/Fortran: Combined directives with map/firstprivate of same symbol
This patch fixes a case where a combined directive (e.g. "!$omp target
parallel ...") contains both a map and a firstprivate clause for the
same variable.  When the combined directive is split into two nested
directives, the outer "target" gets the "map" clause, and the inner
"parallel" gets the "firstprivate" clause, like so:

  !$omp target parallel map(x) firstprivate(x)

  -->

  !$omp target map(x)
    !$omp parallel firstprivate(x)
      ...

When there is no map of the same variable, the firstprivate is distributed
to both directives, e.g. for 'y' in:

  !$omp target parallel map(x) firstprivate(y)

  -->

  !$omp target map(x) firstprivate(y)
    !$omp parallel firstprivate(y)
      ...

This is not a recent regression, but appears to fix a long-standing ICE.
(The included testcase is based on one by Tobias.)

2022-12-06  Julian Brown  <julian@codesourcery.com>

gcc/fortran/
	* trans-openmp.cc (gfc_add_firstprivate_if_unmapped): New function.
	(gfc_split_omp_clauses): Call above.

libgomp/
	* testsuite/libgomp.fortran/combined-directive-splitting-1.f90: New
	test.
2022-12-14 14:11:45 +00:00
GCC Administrator
c6b12b802c Daily bump. 2022-12-11 00:17:43 +00:00
Tobias Burnus
ea4b23d9c8 libgomp: Handle OpenMP's reverse offloads
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called.  And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
device variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
additionally, the just-mapped variables are recorded in a separate
data structure, which requires an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
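
The ChangeLog below describes splay_tree_foreach_lazy as a traversal that exits early once its callback returns nonzero. A minimal sketch of that idea on a plain binary tree follows; the node type, foreach_lazy and match_key are hypothetical names for this example and do not reproduce libgomp's splay-tree code.

```c
#include <stddef.h>

/* Hypothetical plain binary tree; libgomp's splay tree differs.  */
struct node
{
  int key;
  struct node *left, *right;
};

/* Callback type: return nonzero to stop the walk.  */
typedef int (*visit_stop_fn) (const struct node *, void *);

/* Visit the nodes of the tree rooted at N until the callback asks to
   stop.  Returns nonzero if the walk was cut short.  */
int
foreach_lazy (const struct node *n, visit_stop_fn fn, void *data)
{
  if (n == NULL)
    return 0;
  if (fn (n, data))
    return 1;
  if (foreach_lazy (n->left, fn, data))
    return 1;
  return foreach_lazy (n->right, fn, data);
}

/* Lookup state: the key searched for, the match (if any), and how many
   nodes the callback has seen.  */
struct lookup
{
  int wanted;
  const struct node *found;
  int visited;
};

/* Callback that records a match and stops the traversal.  */
int
match_key (const struct node *n, void *data)
{
  struct lookup *l = data;
  l->visited++;
  if (n->key != l->wanted)
    return 0;
  l->found = n;
  return 1;
}
```

A successful lookup stops as soon as the match is seen, while an unsuccessful one still visits every node, which is the full-walk cost mentioned above.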

libgomp/ChangeLog:

	* libgomp.h (struct target_mem_desc): Predeclare; move
	below after 'reverse_splay_tree_node' and add rev_array
	member.
	(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
	(reverse_splay_tree_node, reverse_splay_tree,
	reverse_splay_tree_key): New typedef.
	(struct gomp_device_descr): Add mem_map_rev member.
	* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
	support for GOMP_REQUIRES_REVERSE_OFFLOAD.
	* splay-tree.h (splay_tree_callback_stop): New typedef; like
	splay_tree_callback but returning int not void.
	(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
	taking splay_tree_callback_stop as argument.
	* splay-tree.c (splay_tree_foreach_internal_lazy,
	splay_tree_foreach_lazy): New; but early exit if callback returns
	nonzero.
	* target.c: Instantiate splay_tree_c with splay_tree_prefix 'reverse'.
	(gomp_map_lookup_rev): New.
	(gomp_load_image_to_device): Handle reverse-offload function
	lookup table.
	(gomp_unload_image_from_device): Free devicep->mem_map_rev.
	(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
	gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
	gomp_map_cdata_lookup): New auxiliary structs and functions for
	gomp_target_rev.
	(gomp_target_rev): Implement reverse offloading and its mapping.
	(gomp_target_init): Init current_device.mem_map_rev.root.
	* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
	mapping of on-device allocated variables.
2022-12-10 13:42:08 +01:00
GCC Administrator
40ce6485f3 Daily bump. 2022-12-10 00:17:39 +00:00
Tobias Burnus
b2e1c49b4a Fortran/OpenMP: align/allocator modifiers to the allocate clause
gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
	output.
	* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
	(gfc_free_omp_namelist): Add bool arg.
	* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
	* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
	gfc_match_omp_flush): Update call.
	(gfc_match_omp_clauses): Match 'align'/'allocator' modifiers in
	'allocate' clause.
	(resolve_omp_clauses): Resolve align.
	* st.cc (gfc_free_statement): Update call
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.

libgomp/ChangeLog:

	* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
	item about 'align'; mark clause as 'Y' and directive as 'N'.
	* testsuite/libgomp.fortran/allocate-2.f90: New test.
	* testsuite/libgomp.fortran/allocate-3.f90: New test.
2022-12-09 21:45:37 +01:00
GCC Administrator
3fe66f7f9f Daily bump. 2022-12-07 00:18:44 +00:00
Marcel Vollweiler
81476bc4f4 OpenMP: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.

Additionally, a limitation of the number of teams on gcn offload devices is
implemented.  The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit).  This avoids queueing
unnecessarily many teams and a corresponding allocation of large amounts of
memory.  Without that limitation the memory allocation for a large number of
user-specified teams can result in a "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).
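
The gcn limit described above amounts to a simple clamp. The function below is a sketch of that arithmetic only; its name and signature are assumptions and are not taken from plugin-gcn.c.

```c
/* Hypothetical helper: clamp a requested team count to twice the number
   of compute units, following the rule stated in the commit message.  */
int
clamp_teams (int requested_teams, int compute_units)
{
  int max_teams = 2 * compute_units;
  if (requested_teams > max_teams)
    return max_teams;
  return requested_teams;
}
```

For example, with 60 compute units a request for 1000 teams would be reduced to 120, while a request for 8 teams is left unchanged.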

gcc/ChangeLog:

	* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
	to "-2" instead of "1" for non-existing num_teams clause in order to
	disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

	* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
	allow processing of device-specific values.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* icv-device.c (omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	(omp_set_teams_thread_limit): Likewise.
	* icv.c (omp_set_teams_thread_limit): Removed.
	(omp_get_teams_thread_limit): Likewise.
	(ialias): Likewise.
	* libgomp.texi: Updated documentation for nvptx and gcn corresponding
	to the limitation of the number of teams.
	* plugin/plugin-gcn.c (limit_teams): New helper function that limits
	the number of teams by twice the number of compute units.
	(parse_target_attributes): Limit the number of teams on gcn offload
	devices.
	* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
	handling.
	(gomp_load_image_to_device): Added a size check for the ICVs struct
	variable.
	(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
	copy back the ICV values from device to host.
	(GOMP_target_ext): Update the number of teams and threads in the kernel
	args also considering device-specific values.
	* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
	of OMP_TEAMS_THREAD_LIMIT from the environment.
	* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
	* testsuite/libgomp.c-c++-common/icv-9.c: New test.
	* testsuite/libgomp.fortran/icv-5.f90: New test.
	* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
	num_teams from "1" to "-2" in cases without num_teams clause.
	* g++.dg/gomp/target-teams-1.C: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
2022-12-06 06:03:50 -08:00
Tobias Burnus
9f80367e53 libgomp.texi: Fix an OpenMP 5.2 and a TR11 impl-status item
libgomp/
	* libgomp.texi (OpenMP 5.2): Add missing 'the'.
	(TR11): Add missing '@tab N @tab'.
2022-12-06 09:51:12 +01:00
GCC Administrator
6eea85a95e Daily bump. 2022-12-01 00:17:51 +00:00
Tobias Burnus
e0b95c2e8b libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors
libgomp/ChangeLog:

	* libgomp.texi (OpenMP Context Selectors): Add 'gfx803' to gcn's isa.
2022-11-30 11:23:41 +01:00
Paul-Antoine Arras
1fd508744e amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors
Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.
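
A context selector using the new alias might look like the sketch below. The function names are invented for this example, and the snippet is not copied from the declare-variant-4 tests.

```c
/* Variant intended for AMD GCN devices whose ISA is gfx803.  */
int
isa_id_gfx803 (void)
{
  return 803;
}

/* Base function; used wherever the selector below does not match,
   including on the host and when OpenMP is disabled.  */
#pragma omp declare variant (isa_id_gfx803) match (device = {isa ("gfx803")})
int
isa_id (void)
{
  return 0;
}
```

Called on the host, isa_id returns 0; the variant is only meant to be selected for code compiled for a matching offload device.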

gcc/ChangeLog:

	* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
	* config/gcn/t-omp-device: Add gfx803.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
	* testsuite/libgomp.c/declare-variant-4.h: New header file.
2022-11-30 10:51:42 +01:00
GCC Administrator
b774853514 Daily bump. 2022-11-29 00:18:09 +00:00
Tobias Burnus
091b6dbc48 OpenMP/Fortran: Permit end-clause on directive
gcc/fortran/ChangeLog:

	* openmp.cc (OMP_DO_CLAUSES, OMP_SCOPE_CLAUSES,
	OMP_SECTIONS_CLAUSES): Add 'nowait'.
	(OMP_SINGLE_CLAUSES): Add 'nowait' and 'copyprivate'.
	(gfc_match_omp_distribute_parallel_do,
	gfc_match_omp_distribute_parallel_do_simd,
	gfc_match_omp_parallel_do,
	gfc_match_omp_parallel_do_simd,
	gfc_match_omp_parallel_sections,
	gfc_match_omp_teams_distribute_parallel_do,
	gfc_match_omp_teams_distribute_parallel_do_simd): Disallow 'nowait'.
	(gfc_match_omp_workshare): Match 'nowait' clause.
	(gfc_match_omp_end_single): Use clause matcher for 'nowait'.
	(resolve_omp_clauses): Reject 'nowait' + 'copyprivate'.
	* parse.cc (decode_omp_directive): Break too long line.
	(parse_omp_do, parse_omp_structured_block): Diagnose duplicated
	'nowait' clause.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.2): Mark end-directive as Y.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/copyprivate-1.f90: New test.
	* gfortran.dg/gomp/copyprivate-2.f90: New test.
	* gfortran.dg/gomp/nowait-2.f90: Move dg-error tests ...
	* gfortran.dg/gomp/nowait-4.f90: ... to this new file.
	* gfortran.dg/gomp/nowait-5.f90: New test.
	* gfortran.dg/gomp/nowait-6.f90: New test.
	* gfortran.dg/gomp/nowait-7.f90: New test.
	* gfortran.dg/gomp/nowait-8.f90: New test.
2022-11-28 11:10:31 +01:00
GCC Administrator
d769c50408 Daily bump. 2022-11-26 00:17:08 +00:00
Sandra Loosemore
309e2d95e3 OpenMP: Generate SIMD clones for functions with "declare target"
This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.
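
A function of the kind the filter is described as accepting could look like the sketch below: it writes no memory and calls nothing else. The names scale_add and apply are made up for this illustration, and whether a clone is actually generated depends on the compiler version, the options used and the offload configuration.

```c
/* A small "declare target" function with no memory writes and no calls,
   so it appears suitable for SIMD execution.  */
#pragma omp declare target
int
scale_add (int a, int b)
{
  return 3 * a + b;
}
#pragma omp end declare target

/* Call the function from a SIMD loop inside a target region.  Without
   an offload device, or without OpenMP, the loop simply runs on the
   host.  */
void
apply (int *out, const int *in, int n)
{
#pragma omp target parallel for simd map(to: in[0:n]) map(from: out[0:n])
  for (int i = 0; i < n; i++)
    out[i] = scale_add (in[i], i);
}
```

No "declare simd" directive is written on scale_add here; the point of the new option is that the clone may be created anyway.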

gcc/ChangeLog:

	* common.opt (fopenmp-target-simd-clone): New option.
	(target_simd_clone_device): New enum to go with it.
	* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
	* flag-types.h (enum omp_target_simd_clone_device_kind): New.
	* omp-simd-clone.cc (auto_simd_fail): New function.
	(auto_simd_check_stmt): New function.
	(plausible_type_for_simd_clone): New function.
	(ok_for_auto_simd_clone): New function.
	(simd_clone_create): Add force_local argument, make the symbol
	have internal linkage if it is true.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass explicit_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool explicit_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/target-simd-clone-1.C: New.
	* g++.dg/gomp/target-simd-clone-2.C: New.
	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
	* gcc.dg/gomp/target-simd-clone-5.c: New.
	* gcc.dg/gomp/target-simd-clone-6.c: New.
	* gcc.dg/gomp/target-simd-clone-7.c: New.
	* gcc.dg/gomp/target-simd-clone-8.c: New.
	* lib/scanoffloadipa.exp: New.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp: Load scanoffloadipa.exp library.
	* testsuite/libgomp.c/target-simd-clone-1.c: New.
	* testsuite/libgomp.c/target-simd-clone-2.c: New.
	* testsuite/libgomp.c/target-simd-clone-3.c: New.
2022-11-25 18:13:22 +00:00
Tobias Burnus
9f9d128f45 libgomp: Add no-target-region rev offload test + fix plugin-nvptx
OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case.  This commit adds a testcase for this.

For nvptx, the absence of an on-device 'GOMP_target_ext' call means that
neither it nor the associated on-device GOMP_REV_OFFLOAD_VAR variable is
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.
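
The situation covered by the new test can be sketched as follows: a reverse-offload construct that is not nested in any target region. This is an illustration written for this note rather than the contents of reverse-offload-2.c, and the function name double_on_host is hypothetical.

```c
#pragma omp requires reverse_offload

/* Double *VALUE inside a 'target device(ancestor:1)' construct that is
   not enclosed in a target region; as described above, the current
   device, i.e. the host, is used in that case.  */
void
double_on_host (int *value)
{
  int v = *value;
#pragma omp target device (ancestor: 1) map (tofrom: v)
  v *= 2;
  *value = v;
}
```

The 'requires reverse_offload' directive has to appear at file scope before the construct; compiled without OpenMP the pragmas are ignored and the function still doubles its argument.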

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
	for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
	as valid and the code having no reverse-offload code.
	* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.
2022-11-25 13:48:17 +01:00