The `ready` flag of async event handlers is cleared by the async event
handler system right before invoking the associated callback, in
check_async_event_handlers.
This is not ideal with how the infrun subsystem consumes events: all
targets' async event handler callbacks essentially just invoke
`inferior_event_handler`, which eventually calls `fetch_inferior_event`
and `do_target_wait`. `do_target_wait` picks an inferior at random,
and thus a target at random (it could be the target whose `ready` flag
was cleared, or not), and pulls one event from it.
So it's possible that:
- the async event handler for a target A is called
- we end up consuming an event for target B
- all threads of target B are stopped, target_async(0) is called on it,
so its async event handler is cleared (e.g.
record_btrace_target::async)
As a result, target A still has events to report while its async event
handler is left unmarked, so these events are not consumed. To counter
this, at the end of their async event handler callbacks, targets check
if they still have something to report and re-mark their async event
handler (e.g. remote_async_inferior_event_handler).
The linux_nat target does not suffer from this because it doesn't use an
async event handler at the moment. It only uses a pipe registered with
the event loop. It is written to in the SIGCHLD handler (and in other
spots that want to get target wait method called) and read from in
the target's wait method. So if linux_nat happened to be target A in
the example above, the pipe would just stay readable, and the event loop
would wake up again, until linux_nat's wait method is finally called and
consumes the contents of the pipe.
I think it would be nicer if targets using async_event_handler worked in
a similar way, where the flag would stay set until the target's wait
method is actually called. As a first step towards that, this patch
moves the responsibility of clearing the ready flags of async event
handlers to the invoked callback.
All async event handler callbacks are modified to clear their ready flag
before doing anything else. So in practice, nothing changes with this
patch. It's only the responsibility of clearing the flag that is
shifted toward the callee.
gdb/ChangeLog:
* async-event.h (async_event_handler_func): Add documentation.
* async-event.c (check_async_event_handlers): Don't clear
async_event_handler ready flag.
* infrun.c (infrun_async_inferior_event_handler): Clear ready
flag.
* record-btrace.c (record_btrace_handle_async_inferior_event):
Likewise.
* record-full.c (record_full_async_inferior_event_handler):
Likewise.
* remote-notif.c (remote_async_get_pending_events_handler):
Likewise.
* remote.c (remote_async_inferior_event_handler): Likewise.
Change-Id: I179ef8e99580eae642d332846fd13664dbddc0c1
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the
raw, unadorned name of CTF types, have one unfortunate wrinkle: they
return NULL not only on error but when returning the name of types
without a name in writable dicts. This was unintended: it not only
makes it impossible to reliably tell if a given call to
ctf_type_name_raw failed (due to a bad string offset say), but also
complicates all its callers, who now have to check for both NULL and "".
The written-out form of CTF has no concept of a NULL pointer instead of
a string: all null strings are strtab offset 0, "". So the more we can
do to remove this distinction from the writable form, the less complex
the rest of our code needs to be.
Armour against NULL in multiple places, arranging to return "" from
ctf_type_name_raw if offset 0 is passed in, and removing a risky
optimization from ctf_str_add* that avoided doing anything if a NULL was
passed in: this added needless irregularity to the functions' API
surface, since "" and NULL should be treated identically, and in the
case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to
the list of references to be updated no matter what the content of the
string happens to be.
This means we can simplify the deduplicator a tiny bit, also fixing a
bug (latent when used by ld) where if the input dict was writable,
we failed to realise when types were nameless and could end up creating
deeply unhelpful synthetic forwards with no name, which we just banned
a few commits ago, so the link failed.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-string.c (ctf_str_add): Treat adding a NULL as adding "".
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Likewise.
* ctf-types.c (ctf_type_name_raw): Always return "" for offset 0.
* ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour
against NULL name.
(ctf_dedup_maybe_synthesize_forward): Likewise.
We declare a variable to hold errors at two scopes, and then initialize
the inner one and jump to a scope where only the outer one is in scope.
The consequences are minor: only the version of the error message
printed in the debugging stream is impacted.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-create.c (ctf_serialize): Fix shadowing.
Now that "anonymous typedef nodes" have been extirpated, we can mandate
that things that have names in C must have names in CTF too. (Unlike
the no-forwards embarrassment, the deduplicator does nothing special
with names: types that have names in C will have the same name in CTF.
So we can assume that the CTF rules and the C rules are the same.)
include/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ECTF_NONAME): New.
(ECTF_NERR): Adjust.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-create.c (ctf_add_encoded): Add check for non-empty name.
(ctf_add_forward): Likewise.
(ctf_add_typedef): Likewise.
There is special code in libctf to handle typedefs with no name, which
the code calls "anonymous typedef nodes".
These monsters are obviously not something C programs can include: the
whole point of a ttypedef is to introduce a new name. Looking back at
the history of DWARF in GCC, the only thing (outside C++ anonymous
namespaces) which can generate a DW_TAG_typedef without a DW_AT_name is
obsolete code to handle the long-removed -feliminate-dwarf2-dups option.
Looking at OpenSolaris, typedef nodes with no name couldn't be generated
by the DWARF->CTF converter at all (and its deduplicator barfed on
them): the only reason for the existence of this code is a special case
working around a peculiarity of stabs whereby types could sometimes be
referenced before they were introduced.
We don't need to carry code in libctf to handle special cases in an
obsolete OpenSolaris converter (that yields a format that isn't readable
by libctf anyway). So drop it.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-open.c (init_types): Rip out code to check anonymous typedef
nodes.
* ctf-create.c (ctf_add_reftype): Likewise.
* ctf-lookup.c (refresh_pptrtab): Likewise.
The variable section in a CTF dict is meant to contain the types of
variables that do not appear in the symbol table (mostly file-scope
static declarations). We implement this by having the compiler emit
all potential data symbols into both sections, then delete those
symbols from the variable section that correspond to data symbols the
linker has reported.
Unfortunately, the check for this in ctf_serialize is wrong: rather than
checking the set of linker-reported symbols, we check the set of names
in the data object symtypetab section: if the linker has reported no
symbols at all (usually if ld -r has been run, or if a non-linker
program that does not use symbol tables is calling ctf_link) this will
include every single symbol, emptying the variable section completely.
Worse, when ld -r is in use, we want to force writeout of every
symtypetab entry on the inputs, in an indexed section, whether or not
the linker has reported them, since this isn't a final link yet and the
symbol table is not finalized (and may grow more symbols than the linker
has yet reported). But the check for this is flawed too: we were
relying on ctf_link_shuffle_syms not having been called if no symbols
exist, but that function is *always* called by ld even when ld -r is in
use: ctf_link_add_linker_symbol is the one that's not called when there
are no symbols.
We clearly need to rethink this. Using the emptiness of the set of
reported symbols as a test for ld -r is just ugly: the linker already
knows if ld -r is underway and can just tell us. So add a new linker
flag CTF_LINK_NO_FILTER_REPORTED_SYMS that is set to stop the linker
filtering the symbols in the symtypetab sections using the set that the
linker has reported: use the presence or absence of this flag to
determine whether to emit unindexed symtabs: we only remove entries from
the variable section when filtering symbols, and we only remove them if
they are in the reported symbol set, fixing the case where no symbols
are reported by the linker at all.
(The negative sense of the new CTF_LINK flag is intentional: the common
case, both for ld and for simple tools that want to do a ctf_link with
no ELF symbol table in sight, is probably to filter out symbols that no
linker has reported: i.e., for the simple tools, all of them.)
There's another wrinkle, though. It is quite possible for a non-linker
to add symbols to a dict via ctf_add_*_sym and then write it out via the
ctf_write APIs: perhaps it's preparing a dict for a later linker
invocation. Right now this would not lead to anything terribly
meaningful happening: ctf_serialize just assumes it was called via
ctf_link if symbols are present. So add an (internal-to-libctf) flag
that indicates that a writeout is happening via ctf_link_write, and set
it there (propagating it to child dicts as needed). ctf_serialize can
then spot when it is not being called by a linker, and arrange to always
write out an indexed, sorted symtypetab for fastest possible future
symbol lookup by name in that case. (The writeouts done by ld -r are
unsorted, because the only thing likely to use those symtabs is the
linker, which doesn't benefit from symtypetab sorting.)
Tests added for all three linking cases (ld -r, ld -shared, ld), with a
bit of testsuite framework enhancement to stop it unconditionally
linking the CTF to be checked by the lookup program with -shared, so
tests can now examine CTF linked with -r or indeed with no flags at all,
though the output filename is still foo.so even in this case.
Another test added for the non-linker case that endeavours to determine
whether the symtypetab is sorted by examining the order of entries
returned from ctf_symbol_next: nobody outside libctf should rely on
this ordering, but this test is not outside libctf :)
include/ChangeLog
2021-01-26 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (CTF_LINK_NO_FILTER_REPORTED_SYMS): New.
ld/ChangeLog
2021-01-26 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (lang_merge_ctf): Set CTF_LINK_NO_FILTER_REPORTED_SYMS
when appropriate.
libctf/ChangeLog
2021-01-27 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.c (_libctf_nonnull_): Add parameters.
(LCTF_LINKING): New flag.
(ctf_dict_t) <ctf_link_flags>: Mention it.
* ctf-link.c (ctf_link): Keep LCTF_LINKING set across call.
(ctf_write): Likewise, including in child dictionaries.
(ctf_link_shuffle_syms): Make sure ctf_dynsyms is NULL if there
are no reported symbols.
* ctf-create.c (symtypetab_delete_nonstatic_vars): Make sure
the variable has been reported as a symbol by the linker.
(symtypetab_skippable): Mention relationship between SYMFP and the
flags.
(symtypetab_density): Adjust nonnullity. Exit early if no symbols
were reported and force-indexing is off (i.e., we are doing a
final link).
(ctf_serialize): Handle the !LCTF_LINKING case by writing out an
indexed, sorted symtypetab (and allow SYMFP to be NULL in this
case). Turn sorting off if this is a non-final link. Only delete
nonstatic vars if we are filtering symbols and the linker has
reported some.
* testsuite/libctf-regression/nonstatic-var-section-ld-r*:
New test of variable and symtypetab section population when
ld -r is used.
* testsuite/libctf-regression/nonstatic-var-section-ld-executable.lk:
Likewise, when ld of an executable is used.
* testsuite/libctf-regression/nonstatic-var-section-ld.lk:
Likewise, when ld -shared alone is used.
* testsuite/libctf-regression/nonstatic-var-section-ld*.c:
Lookup programs for the above.
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.*: New
test, testing survival of symbols across ctf_write paths.
* testsuite/lib/ctf-lib.exp (run_lookup_test): New option,
nonshared, suppressing linking of the SOURCE with -shared.
bfd/
PR 27311
* elflink.c (elf_link_add_object_symbols): Don't pull in as-needed
libraries for IR references on pass over libraries after LTO
recompilation.
ld/
* testsuite/ld-plugin/pr27311d.c: New test.
* testsuite/ld-plugin/lto.exp: Rename pr27311 to pr27311-1, compile
and link new test as pr27311-2.
Moving it to an inner scope makes it clearer where it's used (only while
handling the TARGET_WAITKIND_LOADED event).
gdb/ChangeLog:
* infrun.c (handle_inferior_event): Move stop_soon variable to
inner scope.
Change-Id: Ic57685a21714cfbb38f1487ee96cea1d12b44652
This does exactly the same as making decisions based on an override
in _bfd_elf_add_default_symbol, and is simpler.
PR 27311
* elflink.c (_bfd_elf_add_default_symbol): Revert last two changes.
(elf_link_add_object_symbols): Here too. Don't pull in as-needed
libraries when H is an indirect symbol after calling
_bfd_elf_add_default_symbol.
PR 27270
PR 27284
PR 26945
* ar.c: Don't include libbfd.h.
(write_archive): Replace xmalloc+strcpy with xstrdup. Use
bfd_stat rather than fstat on iostream. Move stat and fd tests
outside of _WIN32 ifdef. Delete skip_stat variable.
* arsup.c (temp_name, real_ofd): New static variables.
(ar_open): Use make_tempname and bfd_fdopenw.
(ar_save): Adjust to suit ar_open changes. Move stat output
of _WIN32 ifdef.
* objcopy.c: Don't include libbfd.h.
(copy_file): Use bfd_stat.
bfd/
PR 27311
* elflink.c (_bfd_elf_add_default_symbol): Clear override when
undecorated symbol will have a different version.
ld/
* testsuite/ld-ifunc/ifunc.exp (libpr16467b.so, libpr16467bn.so):
Link with --as-needed.
This adds a testcase that exercises detaching while GDB is stepping
over a breakpoint, in all combinations of:
- maint target non-stop off/on
- set non-stop on/off
- displaced stepping on/off
This exercises the bugs fixed in the previous 8 patches.
gdb/testsuite/ChangeLog:
* gdb.threads/detach-step-over.c: New file.
* gdb.threads/detach-step-over.exp: New file.
A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process, while threads are running. If we have more than one inferior
running, and we detach from just one of the inferiors, we expect that
the remaining inferior continues running. However, in all-stop, if
GDB needs to pause the target for the detach, nothing is re-resuming
the other inferiors after the detach. "info threads" shows the
threads as running, but they really aren't. This fixes it.
gdb/ChangeLog:
* infcmd.c (detach_command): Hold strong reference to target, and
if all-stop on entry, restart threads on exit.
* infrun.c (switch_back_to_stepped_thread): Factor out bits to ...
(restart_stepped_thread): ... this new function. Also handle
trap_expected.
(restart_after_all_stop_detach): New function.
* infrun.h (restart_after_all_stop_detach): Declare.
A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process. That testcase exercises both "set displaced-stepping
on/off". Testing with "set displaced-stepping off" reveals that GDB
does not handle the case of the user typing "detach" just while some
thread is in the middle of an in-line step over. If that thread
belongs to the inferior that is being detached, then the step-over
never finishes, and threads of other inferiors are never re-resumed.
This fixes it.
gdb/ChangeLog:
* infrun.c (struct step_over_info): Initialize fields.
(prepare_for_detach): Handle ongoing in-line step over.
A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process. That testcase sometimes fails with the inferior crashing
with SIGTRAP after the detach because of the bug fixed by this patch,
when tested with the native target.
The problem is that target_detach removes breakpoints from the target
immediately, and that does not work with the native GNU/Linux target
(and probably no other native target) currently. The test wouldn't
fail with this issue when testing against gdbserver, because gdbserver
does allow accessing memory while the current thread is running, by
transparently pausing all threads temporarily, without GDB noticing.
Implementing that in gdbserver was a lot of work, so I'm not looking
forward right now to do the same in the native target. Instead, I
came up with a simpler solution -- push the breakpoints removal down
to the targets. The Linux target conveniently already pauses all
threads before detaching them, since PTRACE_DETACH only works with
stopped threads, so we move removing breakpoints to after that. Only
the remote and GNU/Linux targets support support async execution, so
no other target should really need this.
gdb/ChangeLog:
* linux-nat.c (linux_nat_target::detach): Remove breakpoints
here...
* remote.c (remote_target::remote_detach_1): ... and here ...
* target.c (target_detach): ... instead of here.
* target.h (target_ops::detach): Add comment.
I noticed that "detach" while a program was running sometimes resulted
in the process crashing. I tracked it down to this change to
prepare_for_detach in commit 187b041e ("gdb: move displaced stepping
logic to gdbarch, allow starting concurrent displaced steps"):
/* Is any thread of this process displaced stepping? If not,
there's nothing else to do. */
- if (displaced->step_thread == nullptr)
+ if (displaced_step_in_progress (inf))
return;
The problem above is that the condition was inadvertently flipped. It
should have been:
if (!displaced_step_in_progress (inf))
So I fixed it, and wrote a testcase to exercise it. The testcase has
a number of threads constantly stepping over a breakpoint, and then
GDB detaches the process, while threads are running and stepping over
the breakpoint. And then I was surprised that my testcase would hang
-- GDB would get stuck in an infinite loop in prepare_for_detach,
here:
while (displaced_step_in_progress (inf))
{
...
What is going on is that since we now have two displaced stepping
buffers, as one displaced step finishes, GDB starts another, and
there's another one already in progress, and on and on, so the
displaced_step_in_progress condition never turns false. This happens
because we go via the whole handle_inferior_event, which tries to
start new step overs when one finishes. And also because while we
remove breakpoints from the target before prepare_for_detach is
called, handle_inferior_event ends up calling insert_breakpoints via
e.g. keep_going.
Thinking through all this, I came to the conclusion that going through
the whole handle_inferior_event isn't ideal. A _lot_ is done by that
function, e.g., some thread may get a signal which is passed to the
inferior, and gdb decides to try to get over the signal handler, which
reinstalls breakpoints. Or some process may exit. We can end up
reporting these events via normal_stop while detaching, maybe end up
running some breakpoint commands, or maybe even something runs an
inferior function call. Etc. All this after the user has already
declared they don't want to debug the process anymore, by asking to
detach.
I came to the conclusion that it's better to do the minimal amount of
work possible, in a more controlled fashion, without going through
handle_inferior_event. So in the new approach implemented by this
patch, if there are threads of the inferior that we're detaching in
the middle of a displaced step, stop them, and cancel the displaced
step. This is basically what stop_all_threads already does, via
wait_one and (the now factored out) handle_one, so I'm reusing those.
gdb/ChangeLog:
* infrun.c (struct wait_one_event): Move higher up.
(prepare_for_detach): Abort in-progress displaced steps instead of
letting them complete.
(handle_one): If the inferior is detaching, don't add the thread
back to the global step-over chain.
(restart_threads): Don't restart threads if detaching.
(handle_signal_stop): Remove inferior::detaching reference.
After detaching from a process, the inf->detaching flag is
inadvertently left set to true. If you afterwards reuse the same
inferior to start a new process, GDB will mishave...
The problem is that prepare_for_detach discards the scoped_restore at
the end, while the intention is for the flag to be set only for the
duration of prepare_for_detach.
This was already a bug in the original commit that added
prepare_for_detach, commit 24291992da ("PR gdb/11321"), by yours
truly. Back then, we still used cleanups, and the function called
discard_cleanups instead of do_cleanups, by mistake.
gdb/ChangeLog:
* infrun.c (prepare_for_detach): Don't release scoped_restore
before returning.
This moves the code handling an event out of wait_one to a separate
function, to be used in another context in a following patch.
gdb/ChangeLog:
* infrun.c (handle_one): New function, factored out from ...
(stop_all_threads): ... here.
A following patch will add a new testcase that has two processes, each
with a number of threads constantly tripping a breakpoint and stepping
over it, because the breakpoint has a condition that evals false.
Then GDB detaches from one of the processes, while both processes are
running. And then the testcase sends a SIGUSR1 to the other process.
When run against gdbserver, that would occasionaly fail like this:
(gdb) PASS: gdb.threads/detach-step-over.exp: iter 1: detach
Executing on target: kill -SIGUSR1 208303 (timeout = 300)
spawn -ignore SIGHUP kill -SIGUSR1 208303
Thread 2.5 "detach-step-ove" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 208303.208305]
0x000055555555522a in thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54
54 counter++; /* Set breakpoint here. */
What happened was that GDBserver is doing a step-over for process A
when a detach request for process B arrives. And that generates a
spurious SIGTRAP report for process A, as seen above.
The GDBserver logs reveal what happened:
- GDB manages to detach while a step over is in progress. That reaches
linux_process_target::complete_ongoing_step_over(), which does:
/* Passing NULL_PTID as filter indicates we want all events to
be left pending. Eventually this returns when there are no
unwaited-for children left. */
ret = wait_for_event_filtered (minus_one_ptid, null_ptid, &wstat,
__WALL);
As the comment say, this leaves all events pending, _including_ the
just finished step SIGTRAP. We never discard that SIGTRAP. So
GDBserver reports the SIGTRAP to GDB. GDB can't explain the
SIGTRAP, so it reports it to the user.
The GDBserver log looks like this. The LWP of interest is 208305:
Need step over [LWP 208305]? yes, found breakpoint at 0x555555555227
proceed_all_lwps: found thread 208305 needing a step-over
Starting step-over on LWP 208305. Stopping all threads
208305 starts a step-over.
>>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
stop_all_lwps (stop-and-suspend, except=LWP 208303.208305)
Sending sigstop to lwp 208303
Sending sigstop to lwp 207755
wait_for_sigstop: pulling events
LWFE: waitpid(-1, ...) returned 207755, ERRNO-OK
LLW: waitpid 207755 received Stopped (signal) (stopped)
pc is 0x7f7e045593bf
Expected stop.
LLW: SIGSTOP caught for LWP 207755.207755 while stopping threads.
LWFE: waitpid(-1, ...) returned 208303, ERRNO-OK
LLW: waitpid 208303 received Stopped (signal) (stopped)
pc is 0x7ffff7e743bf
Expected stop.
LLW: SIGSTOP caught for LWP 208303.208303 while stopping threads.
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
LLW: exit (no unwaited-for LWP)
stop_all_lwps done, setting stopping_threads back to !stopping
<<<< exiting void linux_process_target::stop_all_lwps(int, lwp_info*)
Done stopping all threads for step-over.
pc is 0x555555555227
Writing 8b to 0x555555555227 in process 208305
Could not findsigchld_handler
fast tracepoint jump at 0x555555555227 in list (uninserting).
pending reinsert at 0x555555555227
step from pc 0x555555555227
Resuming lwp 208305 (step, signal 0, stop expected)
<<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
handling possible serial event
getpkt ("D;32b8b"); [no ack sent]
The detach request arrives.
sigchld_handler
Tracing is already off, ignoring
detach: step over in progress, finish it first
GDBserver realizes a step over for 208305 was in progress, let's it
finish.
LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
LLW: waitpid 208305 received Stopped (signal) (stopped)
pc is 0x555555555227
Expected stop.
LLW: step LWP 208303.208305, 0, 0 (discard delayed SIGSTOP)
pending reinsert at 0x555555555227
step from pc 0x555555555227
Resuming lwp 208305 (step, signal 0, stop not expected)
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
sigsuspend'ing
LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
LLW: waitpid 208305 received Trace/breakpoint trap (stopped)
pc is 0x55555555522a
CSBB: LWP 208303.208305 stopped by trace
LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
LLW: exit (no unwaited-for LWP)
Finished step over.
The step-over for 208305 finishes.
Writing cc to 0x555555555227 in process 208305
Could not find fast tracepoint jump at 0x555555555227 in list (reinserting).
>>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
stop_all_lwps (stop, except=none)
wait_for_sigstop: pulling events
The detach proceeds (snipped).
...
proceed_one_lwp: lwp 208305
LWP 208305 has pending status, leaving stopped
Later on, 208305 has a pending status (the step SIGTRAP from the
step-over), so GDBserver starts the process of reporting it.
...
wait_1 ret = LWP 208303.208305, 1, 5
<<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
...
and eventually GDB receives the stop notification (T05 == SIGTRAP):
getpkt ("vStopped"); [no ack sent]
sigchld_handler
vStopped: acking 3
Writing resume reply for LWP 208303.208305:1
putpkt ("$T0506:f0ee58f7ff7f0* ;07:f0ee58f7ff7f0* ;10:2a525*"550* ;thread:p32daf.32db1;core:c;#37"); [noack mode]
From the GDB side, we see:
[infrun] fetch_inferior_event: enter
[infrun] fetch_inferior_event: fetch_inferior_event enter
[infrun] do_target_wait: Found 2 inferiors, starting at #1
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: 208303.208305.0 [Thread 208303.208305],
[infrun] print_target_wait_results: status->kind = stopped, signal = GDB_SIGNAL_TRAP
[infrun] handle_inferior_event: status->kind = stopped, signal = GDB_SIGNAL_TRAP
[infrun] start_step_over: enter
[infrun] start_step_over: stealing global queue of threads to step, length = 6
[infrun] operator(): putting back 6 threads to step in global queue
[infrun] start_step_over: exit
[infrun] handle_signal_stop: context switch
[infrun] context_switch: Switching context from process 0 to Thread 208303.208305
[infrun] handle_signal_stop: stop_pc=0x55555555522a
[infrun] handle_signal_stop: random signal (GDB_SIGNAL_TRAP)
[infrun] stop_waiting: stop_waiting
[infrun] stop_all_threads: starting
The fix is to discard the step SIGTRAP, unless GDB wanted the thread
to step.
gdbserver/ChangeLog:
* linux-low.cc (linux_process_target::complete_ongoing_step_over):
Discard step SIGTRAP, unless GDB wanted the thread to step.
A following patch will add a testcase that has two processes with
threads stepping over a breakpoint continuously, and then detaches
from one of the processes while threads are running. The other
process continues stepping over its breakpoint. And then the testcase
sends a SIGUSR1, expecting that GDB reports it. That would sometimes
hang against gdbserver, due to the bugs fixed here. Both bugs are
related, in that they're about remote protocol asynchronous Stop
notifications. There's a bug in GDB, and another in GDBserver.
The GDB bug:
- when we detach from a process, the remote target discards any
pending RSP notification related to that process, including the
in-flight, yet-unacked notification. Discarding the in-flight
notification is the problem. Until the in-flight notification is
acked with a vStopped packet, the server won't send another %Stop
notification. As a result, the debug session gets messed up. In
the new testcase's case, GDB would hang inside stop_all_threads,
waiting for a stop for one of the process'es threads, which never
arrived -- its stop reply was permanently stuck in the stop reply
queue, waiting for a vStopped packet that never arrived.
In summary:
1. GDBserver sends stop notification about thread X, the remote
target receives it and stores it
2. At the same time, GDB detaches thread X's inferior
3. The remote target discards the received stop notification
4. GDBserver waits forever for the ack
The GDBserver bug:
GDBserver has the opposite bug. It also discards notifications for
the process being detached. If that discards the head of the
notification queue, when gdb sends an ack, it ends up acking the
_next_ notification. Meaning, gdb loses one notification. In the
testcase, this results in a similar hang in stop_all_threads.
So we have two very similar bugs in GDB and GDBserver, both resulting
in a similar symptom. That's why I'm fixing them both at the same
time.
gdb/ChangeLog:
* remote.c (remote_notif_stop_ack): Don't error out on
TARGET_WAITKIND_IGNORE; instead, just ignore the notification.
(remote_target::discard_pending_stop_replies): Don't delete
in-flight notification; instead, clear its contents.
gdbserver/ChangeLog:
* server.cc (discard_queued_stop_replies): Don't ever discard the
notification at the head of the list.
This adds a testcase exercising attaching to a multi-threaded process,
in all combinations of:
- set non-stop on/off
- maint target non-stop off/on
- "attach" vs "attach &"
This exercises the bugs fixed in the two previous patches.
gdb/testsuite/ChangeLog:
* gdb.threads/attach-non-stop.c: New file.
* gdb.threads/attach-non-stop.exp: New file.
With "target extended-remote" + "maint set target-non-stop", attaching
hangs like so:
(gdb) attach 1244450
Attaching to process 1244450
[New Thread 1244450.1244450]
[New Thread 1244450.1244453]
[New Thread 1244450.1244454]
[New Thread 1244450.1244455]
[New Thread 1244450.1244456]
[New Thread 1244450.1244457]
[New Thread 1244450.1244458]
[New Thread 1244450.1244459]
[New Thread 1244450.1244461]
[New Thread 1244450.1244462]
[New Thread 1244450.1244463]
* hang *
Attaching to the hung GDB shows that GDB is busy in an infinite loop
in stop_all_threads:
(top-gdb) bt
#0 stop_all_threads () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:4755
#1 0x000055555597b424 in stop_waiting (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:7738
#2 0x0000555555976fba in handle_signal_stop (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5868
#3 0x0000555555975f6a in handle_inferior_event (ecs=0x7fffffffd930) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:5527
#4 0x0000555555971da4 in fetch_inferior_event () at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:3910
#5 0x00005555559540b2 in inferior_event_handler (event_type=INF_REG_EVENT) at /home/pedro/gdb/binutils-gdb/src/gdb/inf-loop.c:42
#6 0x000055555597e825 in infrun_async_inferior_event_handler (data=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/infrun.c:9162
#7 0x0000555555687d1d in check_async_event_handlers () at /home/pedro/gdb/binutils-gdb/src/gdb/async-event.c:328
#8 0x0000555555e48284 in gdb_do_one_event () at /home/pedro/gdb/binutils-gdb/src/gdbsupport/event-loop.cc:216
#9 0x00005555559e7512 in start_event_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:347
#10 0x00005555559e765d in captured_command_loop () at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:407
#11 0x00005555559e8f80 in captured_main (data=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1239
#12 0x00005555559e8ff2 in gdb_main (args=0x7fffffffdb70) at /home/pedro/gdb/binutils-gdb/src/gdb/main.c:1254
#13 0x0000555555627c86 in main (argc=12, argv=0x7fffffffdc88) at /home/pedro/gdb/binutils-gdb/src/gdb/gdb.c:32
The problem is that the remote sends stops for all the threads:
Packet received: l/home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/attach-non-stop/attach-non-stop
Sending packet: $vStopped#55...Packet received: T0006:f06e25edec7f0000;07:f06e25edec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd2f;core:15;
Sending packet: $vStopped#55...Packet received: T0006:f0dea5f0ec7f0000;07:f0dea5f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd27;core:4;
Sending packet: $vStopped#55...Packet received: T0006:f0ee25f1ec7f0000;07:f0ee25f1ec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd26;core:5;
Sending packet: $vStopped#55...Packet received: T0006:f0bea5efec7f0000;07:f0bea5efec7f0000;10:f14190ccf4550000;thread:p12fd22.12fd29;core:1;
Sending packet: $vStopped#55...Packet received: T0006:f0ce25f0ec7f0000;07:f0ce25f0ec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd28;core:a;
Sending packet: $vStopped#55...Packet received: T0006:f07ea5edec7f0000;07:f07ea5edec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2e;core:f;
Sending packet: $vStopped#55...Packet received: T0006:f0ae25efec7f0000;07:f0ae25efec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd2a;core:6;
Sending packet: $vStopped#55...Packet received: T0006:0000000000000000;07:c0e8a381fe7f0000;10:bf43b4f1ec7f0000;thread:p12fd22.12fd22;core:2;
Sending packet: $vStopped#55...Packet received: T0006:f0fea5f1ec7f0000;07:f0fea5f1ec7f0000;10:df4190ccf4550000;thread:p12fd22.12fd25;core:8;
Sending packet: $vStopped#55...Packet received: T0006:f09ea5eeec7f0000;07:f09ea5eeec7f0000;10:e84190ccf4550000;thread:p12fd22.12fd2b;core:b;
Sending packet: $vStopped#55...Packet received: OK
But then wait_one never consumes them, always hitting this path:
4473 if (nfds == 0)
4474 {
4475 /* No waitable targets left. All must be stopped. */
4476 return {NULL, minus_one_ptid, {TARGET_WAITKIND_NO_RESUMED}};
4477 }
Resulting in GDB constanly calling target_stop to stop threads, but
the remote target never reporting back the stops to infrun.
That TARGET_WAITKIND_NO_RESUMED path shown above is always taken
because here, in wait_one too, just above:
4428 for (inferior *inf : all_inferiors ())
4429 {
4430 process_stratum_target *target = inf->process_target ();
4431 if (target == NULL
4432 || !target->is_async_p ()
^^^^^^^^^^^^^^^^^^^^^
4433 || !target->threads_executing)
4434 continue;
... the remote target is not async.
And in turn that happened because extended_remote_target::attach
misses enabling async in the target-non-stop path.
A testcase exercising this will be added in a following patch.
gdb/ChangeLog:
* remote.c (extended_remote_target::attach): Set target async in
the target-non-stop path too.
Attaching in non-stop mode currently misbehaves, like so:
(gdb) attach 1244450
Attaching to process 1244450
[New LWP 1244453]
[New LWP 1244454]
[New LWP 1244455]
[New LWP 1244456]
[New LWP 1244457]
[New LWP 1244458]
[New LWP 1244459]
[New LWP 1244461]
[New LWP 1244462]
[New LWP 1244463]
No unwaited-for children left.
At this point, GDB's stopped/running thread state is out of sync with
the inferior:
(gdb) info threads
Id Target Id Frame
* 1 LWP 1244450 "attach-non-stop" 0xf1b443bf in ?? ()
2 LWP 1244453 "attach-non-stop" (running)
3 LWP 1244454 "attach-non-stop" (running)
4 LWP 1244455 "attach-non-stop" (running)
5 LWP 1244456 "attach-non-stop" (running)
6 LWP 1244457 "attach-non-stop" (running)
7 LWP 1244458 "attach-non-stop" (running)
8 LWP 1244459 "attach-non-stop" (running)
9 LWP 1244461 "attach-non-stop" (running)
10 LWP 1244462 "attach-non-stop" (running)
11 LWP 1244463 "attach-non-stop" (running)
(gdb)
(gdb) interrupt -a
(gdb)
*nothing*
The problem is that attaching installs an inferior continuation,
called when the target reports the initial attach stop, here, in
inf-loop.c:inferior_event_handler:
/* Do all continuations associated with the whole inferior (not
a particular thread). */
if (inferior_ptid != null_ptid)
do_all_inferior_continuations (0);
However, currently in non-stop mode, inferior_ptid is still null_ptid
when we get here.
If you try to do "set debug infrun 1" to debug the problem, however,
then the attach completes correctly, with GDB reporting a stop for
each thread.
The bug is that we're missing a switch_to_thread/context_switch call
when handling the initial stop, here:
if (stop_soon == STOP_QUIETLY_NO_SIGSTOP
&& (ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_STOP
|| ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_TRAP
|| ecs->event_thread->suspend.stop_signal == GDB_SIGNAL_0))
{
stop_print_frame = true;
stop_waiting (ecs);
ecs->event_thread->suspend.stop_signal = GDB_SIGNAL_0;
return;
}
Note how the STOP_QUIETLY / STOP_QUIETLY_REMOTE case above that does
call context_switch.
And the reason "set debug infrun 1" "fixes" it, is that the debug path
has a switch_to_thread call.
This patch fixes it by moving the main context_switch call earlier.
It also removes the:
if (ecs->ptid != inferior_ptid)
check at the same time because:
#1 - that is half of what context_switch already does
#2 - deprecated_context_hook is only used in Insight, and all it does
is set an int. It won't care if we call it when the current
thread hasn't actually changed.
A testcase exercising this will be added in a following patch.
gdb/ChangeLog:
PR gdb/27055
* infrun.c (handle_signal_stop): Move main context_switch call
earlier, before STOP_QUIETLY_NO_SIGSTOP.
This patch makes the inferior command display information about the
current inferior when called with no argument. This behavior is similar
to the one of the thread command.
Before patch:
(gdb) info inferior
Num Description Connection Executable
* 1 process 19221 1 (native) /home/lsix/tmp/a.out
2 process 19239 1 (native) /home/lsix/tmp/a.out
(gdb) inferior 2
[Switching to inferior 2 [process 19239] (/home/lsix/tmp/a.out)]
[Switching to thread 2.1 (process 19239)]
#0 0x0000000000401146 in main ()
(gdb) inferior
Argument required (expression to compute).
After patch:
(gdb) info inferior
Num Description Connection Executable
* 1 process 18699 1 (native) /home/lsix/tmp/a.out
2 process 18705 1 (native) /home/lsix/tmp/a.out
(gdb) inferior 2
[Switching to inferior 2 [process 18705] (/home/lsix/tmp/a.out)]
[Switching to thread 2.1 (process 18705)]
#0 0x0000000000401146 in main ()
(gdb) inferior
[Current inferior is 2 [process 18705] (/home/lsix/tmp/a.out)]
gdb/doc/ChangeLog:
* gdb.texinfo (Inferiors Connections and Programs): Document the
inferior command when used without argument.
gdb/ChangeLog:
* NEWS: Add entry for the behavior change of the inferior command.
* inferior.c (inferior_command): When no argument is given to the
inferior command, display info about the currently selected
inferior.
gdb/testsuite/ChangeLog:
* gdb.base/inferior-noarg.c: New test.
* gdb.base/inferior-noarg.exp: New test.
It is possible for the tables in the .debug_{rng,loc}lists sections to
not have an array of offsets. In that case, the offset_entry_count
field of the header is 0. The forms DW_FORM_{rng,loc}listx (reference
by index) can't be used with that table. Instead, the
DW_FORM_sec_offset form, which references a {rng,loc}list by direct
offset in the section, must be used. From what I saw, this is what GCC
currently produces.
Add tests for this case. I didn't see any bug related to this, I just
think that it would be nice to have coverage for this. A new
`-with-offset-array` option is added to the `table` procs, used when
generating {rng,loc}lists, to decide whether to generate the offset
array.
gdb/testsuite/ChangeLog:
* lib/dwarf.exp (rnglists): Add -no-offset-array option to
table proc.
* gdb.dwarf2/rnglists-sec-offset.exp: Add test for
.debug_rnglists table without offset array.
* gdb.dwarf2/loclists-sec-offset.exp: Add test for
.debug_loclists table without offset array.
Change-Id: I8e34a7bf68c9682215ffbbf66600da5b7db91ef7
I think it's wrong that read_loclist_index and read_rnglist_index return
a CORE_ADDR. A CORE_ADDR is an address in the program. These functions
return offset in sections (.debug_loclists and .debug_rnglists). I
think sect_offset is more appropriate.
I'm wondering if struct attribute should have a "set_sect_offset"
method, that takes a sect_offset parameter, or if it's better to be
left as a simple "unsigned".
gdb/ChangeLog:
* dwarf2/read.c (read_loclist_index, read_rnglist_index): Return
a sect_offset.
(read_attribute_reprocess): Adjust.
Change-Id: I0e22e0864130fb490072b41ae099762918b8ad4d
Consider the test case added in this patch. It defines a compilation
unit with a DW_AT_rnglists_base attribute (used for attributes of form
DW_FORM_rnglistx), but also uses DW_AT_ranges of form
DW_FORM_sec_offset:
0x00000027: DW_TAG_compile_unit
DW_AT_ranges [DW_FORM_sec_offset] (0x0000004c
[0x0000000000005000, 0x0000000000006000))
DW_AT_rnglists_base [DW_FORM_sec_offset] (0x00000044)
The DW_AT_rnglists_base does not play a role in reading the DW_AT_ranges of
form DW_FORM_sec_offset, but it should also not do any harm.
This case is currently not handled correctly by GDB. This is not
something that a compiler is likely to emit, but in my opinion there's
no reason why GDB should fail reading it.
The problem is that in partial_die_info::read and a few other places
where the same logic is replicated, the cu->ranges_base value,
containing the DW_AT_rnglists_base value, is wrongfully added to the
DW_AT_ranges value.
It is quite messy how to decide whether cu->ranges_base should be added
to the attribute's value or not. But to summarize, the only time we
want to add it is when the attribute comes from a pre-DWARF 5 split unit
file (a .dwo) [1]. In this case, the DW_AT_ranges attribute from the
split unit file will have form DW_FORM_sec_offset, pointing somewhere in
the linked file's .debug_ranges section. *But* it's not a "true"
DW_FORM_sec_offset, in that it's an offset relative to the beginning of
that CU's contribution in the section, not relative to the beginning of
the section. So in that case, and only that case, do we want to add the
ranges base value, which we found from the DW_AT_GNU_ranges_base
attribute on the skeleton unit.
Almost all instances of the DW_AT_ranges attribute will be found in the
split unit (on DW_TAG_subprogram, for example), and therefore need to
have the ranges base added. However, the DW_TAG_compile_unit DIE in the
skeleton may also have a DW_AT_ranges attribute. For that one, the
ranges base must not be added. Once the DIEs have been loaded in GDB,
however, the distinction between what's coming from the skeleton and
what's coming from the split unit is not clear. It is all merged in one
big happy tree. So how do we know if a given attribute comes from the
split unit or not?
We use the fact that in pre-DWARF 5 split DWARF, DW_AT_ranges is found
on the skeleton's DW_TAG_compile_unit (in the linked file) and never in
the split unit's DW_TAG_compile_unit. This is why you have this in
partial_die_info::read:
int need_ranges_base = (tag != DW_TAG_compile_unit
&& attr.form != DW_FORM_rnglistx);
However, with the corner case described above (where we have a
DW_AT_rnglists_base attribute and a DW_AT_ranges attribute of form
DW_FORM_sec_offset) the condition gets it wrong when it encounters an
attribute like DW_TAG_subprogram with a DW_AT_ranges attribute of
DW_FORM_sec_offset form: it thinks that it is necessary to add the base,
when it reality it is not.
The problem boils down to failing to differentiate these cases:
- a DW_AT_ranges attribute of form DW_FORM_sec_offset in a
pre-DWARF 5 split unit (in which case we need to add the base)
- a DW_AT_ranges attribute of form DW_FORM_sec_offset in a DWARF 5
non-split unit (in which case we must not add the base)
What makes it unnecessarily complex is that the cu->ranges_base field is
overloaded, used to hold the pre-DWARF 5, non-standard
DW_AT_GNU_ranges_base and the DWARF 5 DW_AT_rnglists_base. In reality,
these two are called "bases" but are not the same thing. The result is
that we need twisted conditions to try to determine whether or not we
should add the base to the attribute's value.
To fix it, split the field in two distinct fields. I renamed everything
related to the "old" ranges base to "gnu_ranges_base", to make it clear
that it's about the non-standard, pre-DWARF 5 thing. And everything
related to the DWARF 5 thing gets renamed "rnglists". I think it
becomes much easier to reason this way.
The issue described above gets fixed by the fact that the
DW_AT_rnglists_base value does not end up in cu->gnu_ranges_base, so
cu->gnu_ranges_base stays 0. The condition to determine whether
gnu_ranges_base should be added can therefore be simplified back to:
tag != DW_TAG_compile_unit
... as it was before rnglistx support was added.
Extend the gdb.dwarf2/rnglists-sec-offset.exp to cover this case. I
also extended the test case for loclists similarly, just to see if there
would be some similar problem. There wasn't, but I think it's not a bad
idea to test that case for loclists as well, so I left it in the patch.
[1] https://gcc.gnu.org/wiki/DebugFission
gdb/ChangeLog:
* dwarf2/die.h (struct die_info) <ranges_base>: Split in...
<gnu_ranges_base>: ... this...
<rnglists_base>: ... and this.
* dwarf2/read.c (struct dwarf2_cu) <ranges_base>: Split in...
<gnu_ranges_base>: ... this...
<rnglists_base>: ... and this.
(read_cutu_die_from_dwo): Adjust
(dwarf2_get_pc_bounds): Adjust
(dwarf2_record_block_ranges): Adjust.
(read_full_die_1): Adjust
(partial_die_info::read): Adjust.
(read_rnglist_index): Adjust.
gdb/testsuite/ChangeLog:
* gdb.dwarf2/rnglists-sec-offset.exp: Add test for DW_AT_ranges
of DW_FORM_sec_offset form plus DW_AT_rnglists_base attribute.
* gdb.dwarf2/loclists-sec-offset.exp: Add test for
DW_AT_location of DW_FORM_sec_offset plus DW_AT_loclists_base
attribute
Change-Id: Icd109038634b75d0e6e9d7d1dcb62fb9eb951d83
Add tests for the various issues fixed in the previous patches.
Add a new "loclists" procedure to the DWARF assembler, to allow
generating .debug_loclists sections.
gdb/testsuite/ChangeLog:
PR gdb/26813
* lib/dwarf.exp (_handle_DW_FORM): Handle DW_FORM_loclistx.
(loclists): New proc.
* gdb.dwarf2/loclists-multiple-cus.c: New.
* gdb.dwarf2/loclists-multiple-cus.exp: New.
* gdb.dwarf2/loclists-sec-offset.c: New.
* gdb.dwarf2/loclists-sec-offset.exp: New.
Change-Id: I209bcb2a9482762ae943e518998d1f7761f76928
The _location proc is used to assemble a location description. It needs
to know some contextual information:
- size of an address
- size of an offset (into another DWARF section)
- DWARF version
It currently get all this directly from global variables holding the
compilation unit information. This is fine because as of now, all
location descriptions are generated in the context of creating a
compilation unit. However, a subsequent patch will generate location
descriptions while generating a .debug_loclists section. _location
should therefore no longer rely on the current compilation unit's
properties.
Change it to accept these values as parameters instead of accessing the
values for the CU.
No functional changes intended.
gdb/testsuite/ChangeLog:
* lib/dwarf.exp (_location): Add parameters.
(_handle_DW_FORM): Adjust.
Change-Id: Ib94981979c83ffbebac838081d645ad71c221637
Add tests for the various issues fixed in the previous patches.
Add a new "rnglists" procedure to the DWARF assembler, to allow
generating .debug_rnglists sections. A trivial change is required to
support the DWARF 5 CU header layout.
gdb/testsuite/ChangeLog:
PR gdb/26813
* lib/dwarf.exp (_handle_DW_FORM): Handle DW_FORM_rnglistx.
(cu): Generate header for DWARF 5.
(rnglists): New proc.
* gdb.dwarf2/rnglists-multiple-cus.exp: New.
* gdb.dwarf2/rnglists-sec-offset.exp: New.
Change-Id: I5b297e59c370c60cf671dec19796a6c3b9a9f632
When loading the binary from PR 26813 in GDB, we get:
DW_FORM_rnglistx index pointing outside of .debug_rnglists offset array [in module /home/simark/build/binutils-gdb/gdb/MagicPurse]
... and the symbols fail to load.
In read_rnglist_index and read_loclist_index, we read the header
(documented in sections 7.28 and 7.29 of DWARF 5) of the CU's
contribution to the .debug_rnglists / .debug_loclists sections to
validate that the index we want to read makes sense. However, we always
read the header at the beginning of the section, rather than the header
for the contribution from which we want to read the index.
To illustrate, here's what the binary from PR 26813 contains. There are
two compile units:
0x0000000c: DW_TAG_compile_unit 1
DW_AT_ranges [DW_FORM_rnglistx]: 0x0
DW_AT_rnglists_base [DW_FORM_sec_offset]: 0xC
0x00003ec9: DW_TAG_compile_unit 2
DW_AT_ranges [DW_FORM_rnglistx]: 0xB
DW_AT_rnglists_base [DW_FORM_sec_offset]: 0x85
The layout of the .debug_rnglists is the following:
[0x00, 0x0B]: header for CU 1's contribution
[0x0C, 0x0F]: list of offsets for CU 1 (1 element)
[0x10, 0x78]: range lists data for CU 1
[0x79, 0x84]: header for CU 2's contribution
[0x85, 0xB4]: list of offsets for CU 2 (12 elements)
[0xB5, 0xBD7]: range lists data for CU 2
The DW_AT_rnglists_base attrbute points to the beginning of the list of
offsets for that CU, relative to the start of the .debug_rnglists
section. That's right after the header for that contribution.
When we try to read the DW_AT_ranges attribute for CU 2,
read_rnglist_index reads the header for CU 1 instead of the one for CU
2. Since there's only one element in CU 1's offset list, it believes
(wrongfully) that the index 0xB is out of range.
Fix it by reading the header just before where DW_AT_rnglists_base
points to. With this patch, I am able to load GDB built with clang-11
and -gdwarf-5 in itself, with and without -readnow.
gdb/ChangeLog:
PR gdb/26813
* dwarf2/read.c (read_loclists_rnglists_header): Add
header_offset parameter and use it.
(read_loclist_index): Read header of the current contribution,
not the one at the beginning of the section.
(read_rnglist_index): Likewise.
Change-Id: Ie53ff8251af8c1556f0a83a31aa8572044b79e3d
We hit an assertion when loading the binary from PR 26813. When fixing
it, execution goes a up bit further but then hits another assert, and
another, and another. With these fours fixes, I am able to load the
binary and get to the prompt. An error is shown (index pointing outside
of the section), because the DW_FORM_rnglistx attribute is not read
correctly, but that one is taken care of by the next patch.
The four fixes are:
- attribute::form_requires_reprocessing needs to handle forms
DW_FORM_rnglistx and DW_FORM_loclistx, because set_unsigned_reprocess
is called for them in read_attribute_value.
- read_attribute_reprocess must call set_unsigned for them, not
set_address. The parameter of set_address is a CORE_ADDR, meaning
it's for program addresses. Post-reprocess, DW_FORM_rnglistx and
DW_FORM_loclistx are offsets into their respective sections
(.debug_rnglists and .debug_loclists). set_unsigned is the current
attribute value setter that fits the best. But perhaps we should have
a setter that takes a sect_offset?
- read_attribute_process must call as_unsigned_reprocess instead of
as_unsigned to get the pre-reprocess value, otherwise we hit the
assert inside as_unsigned that makes sure the attribute doesn't need
reprocessing.
- attribute::set_unsigned needs to clear the requires_reprocessing flag,
otherwise it stays set when reprocessing DW_FORM_rnglistx and
DW_FORM_loclistx attributes.
There's another assert that we hit once the next patch is applied, but
since it's in the same vein as the changes in this patch, I included it
in this patch:
- attribute::form_is_unsigned must handle form DW_FORM_loclistx,
otherwise we hit the assert when trying to call set_unsigned for an
attribute of this form. DW_FORM_rnglistx is already handled.
gdb/ChangeLog:
PR gdb/26813
* dwarf2/attribute.h (struct attribute) <set_unsigned>: Clear
requires_reprocessing flag.
* dwarf2/attribute.c (attribute::form_is_unsigned): Handle
DW_FORM_loclistx.
(attribute::form_requires_reprocessing): Handle DW_FORM_rnglistx
and DW_FORM_loclistx.
* dwarf2/read.c (read_attribute_reprocess): Use set_unsigned
instead of set_address for DW_FORM_loclistx and
DW_FORM_rnglistx.
Change-Id: I06c156fa3913ca98e4e39085f4ef171645b4bc1e
In read_rnglist_index and read_loclist_index, we check that both the
start and end of the offset that we read from the offset table are
within the section. I think it's unecessary to do both: if the end of
the offset is within the section, then surely the start of the offset is
within it.
Remove the check for the start of the offset in both functions.
gdb/ChangeLog:
* dwarf2/read.c (read_loclist_index): Remove bound check for
start of offset.
(read_rnglist_index): Likewise.
Change-Id: I7b57ddf4f8a8a28971738f0e3f3af62108f9e19a
read_rnglist_index has a bound check to make sure that we don't go past
the end of the section while reading the offset, but read_loclist_index
doesn't. Add it to read_loclist_index.
gdb/ChangeLog:
* dwarf2/read.c (read_loclist_index): Add bound check for the end
of the offset.
Change-Id: Ic4b55c88860fdc3e007740949c78ec84cdb4da60
I think this check in read_rnglist_index is wrong:
/* Validate that reading won't go beyond the end of the section. */
if (start_offset + cu->header.offset_size > rnglist_base + section->size)
error (_("Reading DW_FORM_rnglistx index beyond end of"
".debug_rnglists section [in module %s]"),
objfile_name (objfile));
The addition `rnglist_base + section->size` doesn't make sense.
rnglist_base is an offset into `section`, so it doesn't make sense to
add it to `section`'s size. `start_offset` also is an offset into
`section`, so we should just compare it to just `section->size`.
gdb/ChangeLog:
* dwarf2/read.c (read_rnglist_index): Fix bound check.
Change-Id: If0ff7c73f4f80f79aac447518f4e8f131f2db8f2
Unlike read_rnglists_index, read_loclist_index uses complaints when it
detects an inconsistency (a DW_FORM_loclistx value without a
.debug_loclists section or an offset outside of the section). I really
think they should be errors, since there's no point in continuing if
this situation happens, we will likely segfault or read garbage.
gdb/ChangeLog:
* dwarf2/read.c (read_loclist_index): Change complaints into
errors.
Change-Id: Ic3a1cf6e682d47cb6e739dd76fd7ca5be2637e10
Add "R (retain)" and "D (mbind)" to "Key to Flags:".
PR binutils/27281
* readelf.c (process_section_headers): Add 'R' and 'D' to
"Key to Flags:".
* testsuite/binutils-all/retain1a.d: Updated.
A default versioned symbol definition in a shared library is
overridden by an unversioned definition in a regular object file, and
thus should not be reason to make an as-needed library needed.
bfd/
PR 27311
* elflink.c (_bfd_elf_add_default_symbol): Add override parameter.
Use when handling default versioned symbol. Rename existing
override variable to nondef_override and use for non-default
versioned symbol.
(elf_link_add_object_symbols): Adjust call to suit. Don't
pull in as-needed libraries when override is set.
ld/
* testsuite/ld-plugin/pr27311.d,
* testsuite/ld-plugin/pr27311.ver,
* testsuite/ld-plugin/pr27311a.c,
* testsuite/ld-plugin/pr27311b.c,
* testsuite/ld-plugin/pr27311c.c: New testcase.
* testsuite/ld-plugin/lto.exp: Run it. Correct PR14918 and
PR12982 entries.
When running test-case gdb.dwarf2/fission-reread.exp with target board
cc-with-gdb-index, we run into an abort during the generation of the gdb-index
by cc-with-tweaks.sh:
...
build/gdb/testsuite/cache/gdb.sh: line 1: 27275 Aborted (core dumped)
...
This can be reproduced on the command line like this:
...
$ gdb -batch ./outputs/gdb.dwarf2/fission-reread/fission-reread \
-ex 'save gdb-index ./outputs/gdb.dwarf2/fission-reread'
warning: Could not find DWO TU fission-reread.dwo(0x9022f1ceac7e8b19) \
referenced by TU at offset 0x0 [in module fission-reread]
warning: Could not find DWO CU fission-reread.dwo(0x807060504030201) \
referenced by CU at offset 0x561 [in module fission-reread]
Aborted (core dumped)
...
The abort is a segfault due to a using a nullptr psymtab in
write_one_signatured_type.
The problem is that we're trying to write index entries for the type unit
with signature:
...
(gdb) p /x entry->signature
$2 = 0x9022f1ceac7e8b19
...
which is a skeleton type unit:
...
Contents of the .debug_types section:
Compilation Unit @ offset 0x0:
Length: 0x4a (32-bit)
Version: 4
Abbrev Offset: 0x165
Pointer Size: 4
Signature: 0x9022f1ceac7e8b19
Type Offset: 0x0
<0><17>: Abbrev Number: 2 (DW_TAG_type_unit)
<18> DW_AT_comp_dir : /tmp/src/gdb/testsuite
<2f> DW_AT_GNU_dwo_name: fission-reread.dwo
<42> DW_AT_GNU_pubnames: 0x0
<46> DW_AT_GNU_pubtypes: 0x0
<4a> DW_AT_GNU_addr_base: 0x0
...
referring to a .dwo file, but as the warnings show, the .dwo file is not
found.
Fix this by skipping the type unit in write_one_signatured_type if
psymtab == nullptr.
Tested on x86_64-linux.
gdb/ChangeLog:
2021-02-02 Tom de Vries <tdevries@suse.de>
PR symtab/24620
* dwarf2/index-write.c (write_one_signatured_type): Skip if
psymtab == nullptr.
gdb/testsuite/ChangeLog:
2021-02-02 Tom de Vries <tdevries@suse.de>
PR symtab/24620
* gdb.dwarf2/fission-reread.exp: Add test-case.
When running test-case gdb.dwarf2/fission-reread.exp with target board
cc-with-gdb-index, we run into:
...
gdb compile failed, warning: Could not find DWO TU \
fission-reread.dwo(0x9022f1ceac7e8b19) referenced by TU at offset 0x0 \
[in module outputs/gdb.dwarf2/fission-reread/fission-reread]
...
The problem is that the .dwo file is not found.
There's code added in the .exp file to make sure the .dwo can be found:
...
# Make sure we can find the .dwo file, regardless of whether we're
# running in parallel mode.
gdb_test_no_output "set debug-file-directory [file dirname $binfile]" \
"set debug-file-directory"
...
This works normally, but not for the gdb invocation done by cc-with-tweaks.sh
for target board cc-with-gdb-index.
Fix this by finding the full path to the .dwo file and passing it
to the compilation.
Tested on x86_64-linux with native and target boards cc-with-gdb-index,
cc-with-debug-names and readnow.
gdb/testsuite/ChangeLog:
2021-02-01 Tom de Vries <tdevries@suse.de>
* gdb.dwarf2/fission-base.S: Pass -DDWO=$dwo.
* gdb.dwarf2/fission-loclists-pie.S: Same.
* gdb.dwarf2/fission-loclists.S: Same.
* gdb.dwarf2/fission-multi-cu.S: Same.
* gdb.dwarf2/fission-reread.S: Same.
* gdb.dwarf2/fission-base.exp: Use DWO.
* gdb.dwarf2/fission-loclists-pie.exp: Same.
* gdb.dwarf2/fission-loclists.exp: Same.
* gdb.dwarf2/fission-multi-cu.exp: Same.
* gdb.dwarf2/fission-reread.exp: Same.