gcc/libgomp
Tom de Vries f07178ca3c [nvptx] Disable warp sync in simt region
I ran into a hang for this code:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
    {
      #pragma omp atomic update
      counter_N0 = counter_N0 + 1 ;
    }
...

This has to do with the nature of -muniform-simt.  It has two modes of
operation: inside and outside an SIMT region.

Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent.  This approach works unless the insn that is executed is a syscall
or an atomic insn.  In that case, the insn is predicated, such that it
executes in only one thread.  If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.

Inside an SIMT region, a warp executes in all threads.  However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code.  Care has been taken
though to ensure that the predication and propagation is a nop.  That is,
inside an SIMT region:
- the predicate evalutes to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.

That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
  @%r33 atom.add.u32		_, [%r29], 1;
	shfl.sync.idx.b32	%r30, %r30, %r32, 31, 0xffffffff;
...

The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.

Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-08  Tom de Vries  <tdevries@suse.de>

	PR target/104783
	* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
	(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
	(nvptx_get_unisimt_outside_simt_predicate): New function.
	(predicate_insn): New function, factored out of ...
	(nvptx_reorg_uniform_simt): ... here.  Predicate all emitted insns.
	* config/nvptx/nvptx.h (struct machine_function): Add
	unisimt_outside_simt_predicate field.
	* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
	(define_insn "nvptx_uniform_warp_check"): Make predicable.

libgomp/ChangeLog:

2022-03-10  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/pr104783.c: New test.
2022-03-10 12:20:44 +01:00
..
config [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end 2022-02-22 15:48:03 +01:00
plugin amdgcn: Tune default OpenMP/OpenACC GPU utilization 2022-01-16 17:25:36 +01:00
testsuite [nvptx] Disable warp sync in simt region 2022-03-10 12:20:44 +01:00
.gitattributes libgomp: Fixes + cleanup for OpenACC's Fortran module + openacc_lib.h 2020-02-19 09:13:44 +01:00
acc_prof.h Update copyright years. 2022-01-03 10:42:10 +01:00
acinclude.m4 Add mold detection for libs. 2022-01-31 09:46:44 +01:00
aclocal.m4 libgomp: Regenerate configure files with automake 1.15.1 2020-10-02 12:08:47 +02:00
affinity-fmt.c Update copyright years. 2022-01-03 10:42:10 +01:00
affinity.c Update copyright years. 2022-01-03 10:42:10 +01:00
alloc.c Update copyright years. 2022-01-03 10:42:10 +01:00
allocator.c Update copyright years. 2022-01-03 10:42:10 +01:00
atomic.c Update copyright years. 2022-01-03 10:42:10 +01:00
barrier.c Update copyright years. 2022-01-03 10:42:10 +01:00
ChangeLog Daily bump. 2022-03-05 00:16:31 +00:00
ChangeLog.graphite
config.h.in offload-defaulted: Config option to silently ignore uninstalled offload compilers 2021-04-28 18:46:47 +02:00
configure make -Werror optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer 2022-02-03 16:10:18 +01:00
configure.ac make -Werror optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer 2022-02-03 16:10:18 +01:00
configure.tgt [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' some more [PR101484] 2021-07-20 09:14:28 +02:00
critical.c Update copyright years. 2022-01-03 10:42:10 +01:00
env.c Update copyright years. 2022-01-03 10:42:10 +01:00
error.c Update copyright years. 2022-01-03 10:42:10 +01:00
fortran.c Update copyright years. 2022-01-03 10:42:10 +01:00
hashtab.h Update copyright years. 2022-01-03 10:42:10 +01:00
icv-device.c Update copyright years. 2022-01-03 10:42:10 +01:00
icv.c Update copyright years. 2022-01-03 10:42:10 +01:00
iter_ull.c Update copyright years. 2022-01-03 10:42:10 +01:00
iter.c Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp_f.h.in Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp_g.h Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp-plugin.c Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp-plugin.h Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp.h Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp.map openmp: Honor OpenMP 5.1 num_teams lower bound 2021-11-12 12:41:22 +01:00
libgomp.spec.in
libgomp.texi C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct. 2022-02-09 23:47:12 -08:00
lock.c Update copyright years. 2022-01-03 10:42:10 +01:00
loop_ull.c Update copyright years. 2022-01-03 10:42:10 +01:00
loop.c Update copyright years. 2022-01-03 10:42:10 +01:00
Makefile.am openmp: Implement OpenMP 5.1 scope construct 2021-08-17 09:30:09 +02:00
Makefile.in openmp: Implement OpenMP 5.1 scope construct 2021-08-17 09:30:09 +02:00
oacc-async.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-cuda.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-host.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-init.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-int.h Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-mem.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-parallel.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-plugin.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-plugin.h Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-profiling.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-target.c GCN libgomp port 2019-11-13 12:38:04 +00:00
omp_lib.f90.in Update copyright years. 2022-01-03 10:42:10 +01:00
omp_lib.h.in Update copyright years. 2022-01-03 10:42:10 +01:00
omp.h.in Update copyright years. 2022-01-03 10:42:10 +01:00
openacc_lib.h Update copyright years. 2022-01-03 10:42:10 +01:00
openacc.f90 Update copyright years. 2022-01-03 10:42:10 +01:00
openacc.h Update copyright years. 2022-01-03 10:42:10 +01:00
ordered.c Update copyright years. 2022-01-03 10:42:10 +01:00
parallel.c Update copyright years. 2022-01-03 10:42:10 +01:00
priority_queue.c Update copyright years. 2022-01-03 10:42:10 +01:00
priority_queue.h Update copyright years. 2022-01-03 10:42:10 +01:00
scope.c Update copyright years. 2022-01-03 10:42:10 +01:00
sections.c Update copyright years. 2022-01-03 10:42:10 +01:00
secure_getenv.h Update copyright years. 2022-01-03 10:42:10 +01:00
single.c Update copyright years. 2022-01-03 10:42:10 +01:00
splay-tree.c Update copyright years. 2022-01-03 10:42:10 +01:00
splay-tree.h Update copyright years. 2022-01-03 10:42:10 +01:00
target.c C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct. 2022-02-09 23:47:12 -08:00
task.c libgomp: Fix segfault with posthumous orphan tasks [PR104385] 2022-02-08 09:30:17 +01:00
taskloop.c Update copyright years. 2022-01-03 10:42:10 +01:00
team.c Update copyright years. 2022-01-03 10:42:10 +01:00
teams.c Update copyright years. 2022-01-03 10:42:10 +01:00
work.c Update copyright years. 2022-01-03 10:42:10 +01:00