gcc/libgomp
Tom de Vries 5ed77fb3ed [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
2022-02-22 15:48:03 +01:00
..
config [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end 2022-02-22 15:48:03 +01:00
plugin amdgcn: Tune default OpenMP/OpenACC GPU utilization 2022-01-16 17:25:36 +01:00
testsuite [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end 2022-02-22 15:48:03 +01:00
.gitattributes libgomp: Fixes + cleanup for OpenACC's Fortran module + openacc_lib.h 2020-02-19 09:13:44 +01:00
acc_prof.h Update copyright years. 2022-01-03 10:42:10 +01:00
acinclude.m4 Add mold detection for libs. 2022-01-31 09:46:44 +01:00
aclocal.m4 libgomp: Regenerate configure files with automake 1.15.1 2020-10-02 12:08:47 +02:00
affinity-fmt.c Update copyright years. 2022-01-03 10:42:10 +01:00
affinity.c Update copyright years. 2022-01-03 10:42:10 +01:00
alloc.c Update copyright years. 2022-01-03 10:42:10 +01:00
allocator.c Update copyright years. 2022-01-03 10:42:10 +01:00
atomic.c Update copyright years. 2022-01-03 10:42:10 +01:00
barrier.c Update copyright years. 2022-01-03 10:42:10 +01:00
ChangeLog Daily bump. 2022-02-16 00:16:26 +00:00
ChangeLog.graphite
config.h.in offload-defaulted: Config option to silently ignore uninstalled offload compilers 2021-04-28 18:46:47 +02:00
configure make -Werror optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer 2022-02-03 16:10:18 +01:00
configure.ac make -Werror optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer 2022-02-03 16:10:18 +01:00
configure.tgt [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' some more [PR101484] 2021-07-20 09:14:28 +02:00
critical.c Update copyright years. 2022-01-03 10:42:10 +01:00
env.c Update copyright years. 2022-01-03 10:42:10 +01:00
error.c Update copyright years. 2022-01-03 10:42:10 +01:00
fortran.c Update copyright years. 2022-01-03 10:42:10 +01:00
hashtab.h Update copyright years. 2022-01-03 10:42:10 +01:00
icv-device.c Update copyright years. 2022-01-03 10:42:10 +01:00
icv.c Update copyright years. 2022-01-03 10:42:10 +01:00
iter_ull.c Update copyright years. 2022-01-03 10:42:10 +01:00
iter.c Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp_f.h.in Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp_g.h Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp-plugin.c Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp-plugin.h Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp.h Update copyright years. 2022-01-03 10:42:10 +01:00
libgomp.map openmp: Honor OpenMP 5.1 num_teams lower bound 2021-11-12 12:41:22 +01:00
libgomp.spec.in
libgomp.texi C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct. 2022-02-09 23:47:12 -08:00
lock.c Update copyright years. 2022-01-03 10:42:10 +01:00
loop_ull.c Update copyright years. 2022-01-03 10:42:10 +01:00
loop.c Update copyright years. 2022-01-03 10:42:10 +01:00
Makefile.am openmp: Implement OpenMP 5.1 scope construct 2021-08-17 09:30:09 +02:00
Makefile.in openmp: Implement OpenMP 5.1 scope construct 2021-08-17 09:30:09 +02:00
oacc-async.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-cuda.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-host.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-init.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-int.h Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-mem.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-parallel.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-plugin.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-plugin.h Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-profiling.c Update copyright years. 2022-01-03 10:42:10 +01:00
oacc-target.c GCN libgomp port 2019-11-13 12:38:04 +00:00
omp_lib.f90.in Update copyright years. 2022-01-03 10:42:10 +01:00
omp_lib.h.in Update copyright years. 2022-01-03 10:42:10 +01:00
omp.h.in Update copyright years. 2022-01-03 10:42:10 +01:00
openacc_lib.h Update copyright years. 2022-01-03 10:42:10 +01:00
openacc.f90 Update copyright years. 2022-01-03 10:42:10 +01:00
openacc.h Update copyright years. 2022-01-03 10:42:10 +01:00
ordered.c Update copyright years. 2022-01-03 10:42:10 +01:00
parallel.c Update copyright years. 2022-01-03 10:42:10 +01:00
priority_queue.c Update copyright years. 2022-01-03 10:42:10 +01:00
priority_queue.h Update copyright years. 2022-01-03 10:42:10 +01:00
scope.c Update copyright years. 2022-01-03 10:42:10 +01:00
sections.c Update copyright years. 2022-01-03 10:42:10 +01:00
secure_getenv.h Update copyright years. 2022-01-03 10:42:10 +01:00
single.c Update copyright years. 2022-01-03 10:42:10 +01:00
splay-tree.c Update copyright years. 2022-01-03 10:42:10 +01:00
splay-tree.h Update copyright years. 2022-01-03 10:42:10 +01:00
target.c C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct. 2022-02-09 23:47:12 -08:00
task.c libgomp: Fix segfault with posthumous orphan tasks [PR104385] 2022-02-08 09:30:17 +01:00
taskloop.c Update copyright years. 2022-01-03 10:42:10 +01:00
team.c Update copyright years. 2022-01-03 10:42:10 +01:00
teams.c Update copyright years. 2022-01-03 10:42:10 +01:00
work.c Update copyright years. 2022-01-03 10:42:10 +01:00