[gdb/build] Workaround tsan select bug

When building gdb with -O0 and -fsanitize=thread, I run into a large number of
timeouts caused by gdb hanging, for instance:
...
(gdb) continue^M
Continuing.^M
[Inferior 1 (process 378) exited normally]^M
FAIL: gdb.multi/stop-all-on-exit.exp: continue until exit (timeout)
...

What happens is the following:
- two inferiors are added, stopped at main
- inferior 1 is set up to exit after 1 second
- inferior 2 is set up to exit after 10 seconds
- the continue command is issued
- because of set schedule-multiple on, both inferiors continue
- the first inferior exits
- gdb sends a SIGSTOP to the second inferior
- the second inferior receives the SIGSTOP and stops, which causes a SIGCHLD
  to be sent to gdb
- gdb calls select, and blocks
- the signal arrives, and interrupts select
- ThreadSanitizer's signal handler is called, which marks the signal pending
  internally
- select returns -1 with errno == EINTR
- gdb calls select again, and blocks
- gdb hangs, waiting for gdb's sigchild_handler to be called
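For comparison, the flow gdb relies on can be sketched outside gdb.  This is
a minimal sketch, not gdb code: event_pipe, demo_sigchld_handler and
wait_for_child_event are hypothetical names, and the handler only stands in
for what the message calls sigchild_handler.  Without ThreadSanitizer the
handler runs as soon as the signal is delivered, so the retried select sees
the wakeup instead of blocking forever.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for gdb's event pipe; not gdb's data structure.
static int event_pipe[2];

// Hypothetical stand-in for gdb's SIGCHLD handler: wake up the select loop
// by making the read end of the pipe readable.
static void
demo_sigchld_handler (int)
{
  int saved_errno = errno;
  char c = '+';
  ssize_t ignored = write (event_pipe[1], &c, 1);
  (void) ignored;
  errno = saved_errno;
}

// Fork a child that exits at once, then block in select until the handler
// has reported the SIGCHLD.  Returns the number of EINTR retries, or -1 on
// error.
int
wait_for_child_event ()
{
  if (pipe (event_pipe) != 0)
    return -1;

  struct sigaction sa = {};
  sa.sa_handler = demo_sigchld_handler;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGCHLD, &sa, nullptr) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (0);

  int retries = 0;
  while (true)
    {
      fd_set readfds;
      FD_ZERO (&readfds);
      FD_SET (event_pipe[0], &readfds);

      // Blocking variant: timeout == nullptr.
      int res = select (event_pipe[0] + 1, &readfds, nullptr, nullptr,
                        nullptr);
      if (res == -1 && errno == EINTR)
        {
          // Interrupted by the signal; the handler has already run, so the
          // next select returns immediately.
          retries++;
          continue;
        }
      if (res <= 0)
        return -1;
      break;
    }

  // Tidy up: reap the child and restore the default disposition.
  waitpid (pid, nullptr, 0);
  signal (SIGCHLD, SIG_DFL);
  close (event_pipe[0]);
  close (event_pipe[1]);
  return retries;
}
```

Calling wait_for_child_event () returns 0 or more depending on whether the
child exits before or during the select call.  In the hang described above,
the second select never returns because the handler has not been run yet.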

This is a bug [1] in ThreadSanitizer.  When select is called with
timeout == nullptr, it is blocking, but ThreadSanitizer does not treat it as
such, and consequently does not see the need to call the pending
sigchild_handler.

Work around this by:
- instead of using the blocking select variant, forcing a small timeout and
- upon timeout calling a function that ThreadSanitizer does consider
  blocking: usleep, forcing sigchild_handler to be called.
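The two steps above can be sketched as a standalone loop.  This is a minimal
sketch under stated assumptions, not the patched interruptible_select:
select_with_forced_timeout and its timeouts counter are hypothetical, and only
the read set is handled.  Because select may modify its fd sets and, on Linux,
its timeout, the sketch reinitializes both on every iteration.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: emulate a blocking select by polling with a 1 ms
// timeout.  *timeouts is incremented for every forced timeout.
int
select_with_forced_timeout (int n, fd_set *readfds, int *timeouts)
{
  const fd_set wanted = *readfds;

  while (true)
    {
      // Fresh copies for each call; select overwrites its arguments.
      fd_set ready = wanted;
      struct timeval tv;
      tv.tv_sec = 0;
      tv.tv_usec = 1000;

      int res = select (n, &ready, nullptr, nullptr, &tv);
      if (res == -1 && errno == EINTR)
        continue;

      if (res == 0)
        {
          // Forced timeout: call usleep so that, under ThreadSanitizer,
          // handlers for pending signals get a chance to run, then retry.
          ++*timeouts;
          usleep (0);
          continue;
        }

      if (res > 0)
        *readfds = ready;
      return res;
    }
}
```

From the caller's point of view the function still behaves like a blocking
select: it only returns once a descriptor is ready or a real error occurs.
The cost is one wakeup per millisecond while idle, which is why the commit
limits the workaround to ThreadSanitizer builds.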

Tested on x86_64-linux.

PR build/32295
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32295

[1] https://github.com/google/sanitizers/issues/1813
Tom de Vries 2024-11-22 12:54:57 +01:00
parent dcc4d67866
commit c4df8ad79c

@@ -1344,6 +1344,31 @@ interruptible_select (int n,
  if (n <= fd)
    n = fd + 1;
  bool tsan_forced_timeout = false;
#if defined (__SANITIZE_THREAD__)
  struct timeval tv;
  if (timeout == nullptr)
    {
      /* A nullptr timeout means select is blocking, and ThreadSanitizer has
         a bug that it considers select non-blocking, and consequently when
         intercepting select it will not call signal handlers for pending
         signals, and gdb will hang in select waiting for those signal
         handlers to be called.
         Filed here ( https://github.com/google/sanitizers/issues/1813 ).
         Work around this by:
         - forcing a small timeout, and
         - upon timeout calling a function that ThreadSanitizer does consider
           blocking: usleep, forcing signal handlers to be called for pending
           signals. */
      tv.tv_sec = 0;
      tv.tv_usec = 1000;
      timeout = &tv;
      tsan_forced_timeout = true;
    }
#endif
    {
      fd_set ret_readfds, ret_writefds, ret_exceptfds;
      struct timeval ret_timeout;
@@ -1359,6 +1384,12 @@ interruptible_select (int n,
      if (res == -1 && errno == EINTR)
        continue;
      if (tsan_forced_timeout && res == 0)
        {
          usleep (0);
          continue;
        }
      break;
    }
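
The workaround is compiled in only when __SANITIZE_THREAD__ is defined, which
GCC does under -fsanitize=thread.  As a minimal sketch of such a compile-time
check (DEMO_BUILT_WITH_TSAN and built_with_tsan are hypothetical names, and
the __has_feature branch is an addition for Clang that the patch itself does
not use):

```cpp
#include <cassert>

// GCC defines __SANITIZE_THREAD__ when compiling with -fsanitize=thread.
#if defined (__SANITIZE_THREAD__)
# define DEMO_BUILT_WITH_TSAN 1
// Clang exposes the same information through __has_feature.
#elif defined (__has_feature)
# if __has_feature (thread_sanitizer)
#  define DEMO_BUILT_WITH_TSAN 1
# endif
#endif

#ifndef DEMO_BUILT_WITH_TSAN
# define DEMO_BUILT_WITH_TSAN 0
#endif

// True if this translation unit was compiled with ThreadSanitizer enabled.
constexpr bool
built_with_tsan ()
{
  return DEMO_BUILT_WITH_TSAN != 0;
}
```

In a regular build built_with_tsan () is false and none of the forced-timeout
code would be needed, matching the #if guard in the hunk above.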