While looking at the recent line number styling commit I noticed a few
places where we could add more file name styling. So lets do that.
Approved-By: Tom Tromey <tom@tromey.com>
In PR gdb/32025, a fatal error was reported when sending a SIGINT to gdb while
disassembling.
I managed to reproduce this on aarch64-linux in a Leap 15.5 container using
this trigger patch:
...
gdb_disassembler_memory_reader::dis_asm_read_memory
(bfd_vma memaddr, gdb_byte *myaddr, unsigned int len,
struct disassemble_info *info) noexcept
{
+ set_quit_flag ();
return target_read_code (memaddr, myaddr, len);
}
...
and a simple gdb command line calling the disassemble command:
...
$ gdb -q -batch a.out -ex "disassemble main"
...
The following scenario leads to the fatal error:
- the disassemble command is executed,
- set_quit_flag is called in
gdb_disassembler_memory_reader::dis_asm_read_memory, pretending that a
user pressed ^C,
- target_read_code calls QUIT, which throws a
gdb_exception_quit,
- the exception propagation mechanism reaches c code in libopcodes and a fatal
error triggers because the c code is not compiled with -fexception.
Fix this by:
- wrapping the body of gdb_disassembler_memory_reader::dis_asm_read_memory in
catch_exceptions (which consequently needs moving to a header file), and
- reraising the caught exception in default_print_insn using QUIT.
Tested on aarch64-linux.
Approved-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32025
Most files including gdbcmd.h currently rely on it to access things
actually declared in cli/cli-cmds.h (setlist, showlist, etc). To make
things easy, replace all includes of gdbcmd.h with includes of
cli/cli-cmds.h. This might lead to some unused includes of
cli/cli-cmds.h, but it's harmless, and much faster than going through
the 170 or so files by hand.
Change-Id: I11f884d4d616c12c05f395c98bbc2892950fb00f
Approved-By: Tom Tromey <tom@tromey.com>
Move some declarations related to the "quit" machinery from defs.h to
event-top.h. Most of the definitions associated to these declarations
are in event-top.c. The exceptions are `quit()` and `maybe_quit()`,
that are defined in utils.c. For consistency, move these two
definitions to event-top.c.
Include "event-top.h" in many files that use these things.
Change-Id: I6594f6df9047a9a480e7b9934275d186afb14378
Approved-By: Tom Tromey <tom@tromey.com>
Now that defs.h, server.h and common-defs.h are included via the
`-include` option, it is no longer necessary for source files to include
them. Remove all the inclusions of these files I could find. Update
the generation scripts where relevant.
Change-Id: Ia026cff269c1b7ae7386dd3619bc9bb6a5332837
Approved-By: Pedro Alves <pedro@palves.net>
I noticed that the disassembler_options code uses manual memory
management. It seemed simpler to replace this with std::string.
Approved-By: John Baldwin <jhb@FreeBSD.org>
This commit is the result of the following actions:
- Running gdb/copyright.py to update all of the copyright headers to
include 2024,
- Manually updating a few files the copyright.py script told me to
update, these files had copyright headers embedded within the
file,
- Regenerating gdbsupport/Makefile.in to refresh it's copyright
date,
- Using grep to find other files that still mentioned 2023. If
these files were updated last year from 2022 to 2023 then I've
updated them this year to 2024.
I'm sure I've probably missed some dates. Feel free to fix them up as
you spot them.
Since GDB now requires C++17, we don't need the internally maintained
gdb::optional implementation. This patch does the following replacing:
- gdb::optional -> std::optional
- gdb::in_place -> std::in_place
- #include "gdbsupport/gdb_optional.h" -> #include <optional>
This change has mostly been done automatically. One exception is
gdbsupport/thread-pool.* which did not use the gdb:: prefix as it
already lives in the gdb namespace.
Change-Id: I19a92fa03e89637bab136c72e34fd351524f65e9
Approved-By: Tom Tromey <tom@tromey.com>
Approved-By: Pedro Alves <pedro@palves.net>
This function is just a wrapper around the current inferior's gdbarch.
I find that having that wrapper just obscures where the arch is coming
from, and that it's often used as "I don't know which arch to use so
I'll use this magical target_gdbarch function that gets me an arch" when
the arch should in fact come from something in the context (a thread,
objfile, symbol, etc). I think that removing it and inlining
`current_inferior ()->arch ()` everywhere will make it a bit clearer
where that arch comes from and will trigger people into reflecting
whether this is the right place to get the arch or not.
Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b
Reviewed-By: John Baldwin <jhb@FreeBSD.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
Latest libc++[1] causes transitive include to <locale> when
<mutex> or <thread> header is included. This causes
gdb to not build[2] since <locale> defines isupper/islower etc.
functions that are explicitly macroed-out in safe-ctype.h to
prevent their use.
Use the suggestion from libc++ to include <locale> internally when
building in C++ mode to avoid build errors.
Use safe-gdb-ctype.h as the include instead of "safe-ctype.h"
to keep this isolated to gdb since rest of binutils
does not seem to use much C++.
[1]: https://reviews.llvm.org/D144331
[2]: https://issuetracker.google.com/issues/277967395
Simon pointed out a line table regression, and after a couple of false
starts, I was able to reproduce it by hand using his instructions.
The bug is that most of the code in do_mixed_source_and_assembly uses
unrelocated addresses, but one spot does:
pc = low;
... after the text offset has been removed.
This patch fixes the problem by introducing a new type to represent
unrelocated addresses in the line table. This prevents this sort of
bug to some degree (it's still possible to manipulate a CORE_ADDR in a
bad way, this is unavoidable).
However, this did let the compiler flag a few spots in that function,
and now it's not possible to compare an unrelocated address from a
line table with an ordinary CORE_ADDR.
Regression tested on x86-64 Fedora 36, though note this setup never
reproduced the bug in the first place. I also tested it by hand on
the disasm-optim test program.
Linetables no longer change after they are created. This patch
applies const to them.
Note there is one hack to cast away const in mdebugread.c. This code
allocates a linetable using 'malloc', then later copies it to the
obstack. While this could be cleaned up, I chose not to do so because
I have no way of testing it.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This changes linetables to not add the text offset to the addresses
they contain. I did this in a few steps, necessarily combined
together in one patch: I renamed the 'pc' member to 'm_pc', added the
appropriate accessors, and then recompiled. Then I fixed all the
errors. Where possible I generally chose to use the raw_pc accessor,
as it is less expensive.
Note that this patch discounts the possibility that the text section
offset might cause wraparound in the addresses in the line table.
However, this was already discounted -- in particular,
objfile_relocate1 did not re-sort the table in this scenario. (There
was a bug open about this, but as far as I can tell this has never
happened, it's not even clear what inspired that bug.)
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This adds a couple of comparison operators to linetable_entry, and
simplifies both the calls to sort and one other spot that checks for
equality.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
While working on another patch, Simon pointed out that GDB could be
improved by marking the functions passed to the disassembler as
noexcept.
https://sourceware.org/pipermail/gdb-patches/2022-October/193084.html
The reason this is important is the on some hosts, libopcodes, being C
code, will not be compiled with support for handling exceptions. As
such, an attempt to throw an exception over libopcodes code will cause
GDB to terminate.
See bug gdb/29712 for an example of when this happened.
In this commit all the functions that are passed to the disassembler,
and which might be used as callbacks by libopcodes are marked
noexcept.
Ideally, I would have liked to change these typedefs:
using read_memory_ftype = decltype (disassemble_info::read_memory_func);
using memory_error_ftype = decltype (disassemble_info::memory_error_func);
using print_address_ftype = decltype (disassemble_info::print_address_func);
using fprintf_ftype = decltype (disassemble_info::fprintf_func);
using fprintf_styled_ftype = decltype (disassemble_info::fprintf_styled_func);
which are declared in disasm.h, as including the noexcept keyword.
However, when I tried this, I ran into this warning/error:
In file included from ../../src/gdb/disasm.c:25:
../../src/gdb/disasm.h: In constructor ‘gdb_printing_disassembler::gdb_printing_disassembler(gdbarch*, ui_file*, gdb_disassemble_info::read_memory_ftype, gdb_disassemble_info::memory_error_ftype, gdb_disassemble_info::print_address_ftype)’:
../../src/gdb/disasm.h:116:3: error: mangled name for ‘gdb_printing_disassembler::gdb_printing_disassembler(gdbarch*, ui_file*, gdb_disassemble_info::read_memory_ftype, gdb_disassemble_info::memory_error_ftype, gdb_disassemble_info::print_address_ftype)’ will change in C++17 because the exception specification is part of a function type [-Werror=noexcept-type]
116 | gdb_printing_disassembler (struct gdbarch *gdbarch,
| ^~~~~~~~~~~~~~~~~~~~~~~~~
So I've left that change out. This does mean that if somebody adds a
new use of the disassembler classes in the future, and forgets to mark
the callbacks as noexcept, this will compile fine. We'll just have to
manually check for that during review.
The preferred way of rethrowing an exception is by using throw without
expression, because it avoids object slicing of the exception [1].
Fix this in gdb_pretty_print_disassembler::pretty_print_insn.
Tested on x86_64-linux.
[1] https://en.cppreference.com/w/cpp/language/throw
Approved-By: Andrew Burgess <aburgess@redhat.com>
While working on another issue relating to GDB's use of the Python
Pygments package for disassembly styling I noticed an issue in the
case where the Pygments package raises an exception.
The intention of the current code is that, should the Pygments package
raise an exception, GDB will disable future attempts to call into the
Pygments code. This was intended to prevent repeated errors during
disassembly if, for some reason, the Pygments code isn't working.
Since the Pygments based styling was added, GDB now supports
disassembly styling using libopcodes, but this is only available for
some architectures. For architectures not covered by libopcodes
Pygments is still the only option.
What I observed is that, if I disable the libopcodes styling, then
setup GDB so that the Pygments based styling code will indicate an
error (by returning None), GDB does, as expected, stop using the
Pygments based styling. However, the libopcodes based styling will
instead be used, despite this feature having been disabled.
The problem is that the disassembler output is produced into a
string_file buffer. When we are using Pygments, this buffer is
created without styling support. However, when Pygments fails, we
recreate the buffer with styling support.
The problem is that we should only recreate the buffer with styling
support only if libopcodes styling is enabled. This was an oversight
in commit:
commit 4cbe4ca5da
Date: Mon Feb 14 14:40:52 2022 +0000
gdb: add support for disassembler styling using libopcodes
This commit:
1. Factors out some of the condition checking logic into two new
helper functions use_ext_lang_for_styling and
use_libopcodes_for_styling,
2. Reorders gdb_disassembler::m_buffer and gdb_disassembler::m_dest,
this allows these fields to be initialised m_dest first, which means
that the new condition checking functions can rely on m_dest being
set, even when called from the gdb_disassembler constructor,
3. Make use of the new condition checking functions each time
m_buffer is initialised,
4. Add a new test that forces the Python disassembler styling
function to return None, this will cause GDB to disable use of
Pygments for styling, and
5. When we reinitialise m_buffer, and re-disassemble the
instruction, call reset the in-comment flag. If the instruction
being disassembler ends in a comment then the first disassembly pass
will have set the in-comment flag to true. This shouldn't be a
problem as we will only be using Pygments, and thus performing a
re-disassembly pass, if libopcodes is disabled, so the in-comment
flag will never be checked, even if it is set incorrectly. However,
I think that having the flag set correctly is a good thing, even if
we don't check it (you never know what future uses might come up).
This commit changes the format of 'disassemble /r' to match GNU
objdump. Specifically, GDB will now display the instruction bytes in
as 'objdump --wide --disassemble' does.
Here is an example for RISC-V before this patch:
(gdb) disassemble /r 0x0001018e,0x0001019e
Dump of assembler code from 0x1018e to 0x1019e:
0x0001018e <call_me+66>: 03 26 84 fe lw a2,-24(s0)
0x00010192 <call_me+70>: 83 25 c4 fe lw a1,-20(s0)
0x00010196 <call_me+74>: 61 65 lui a0,0x18
0x00010198 <call_me+76>: 13 05 85 6a addi a0,a0,1704
0x0001019c <call_me+80>: f1 22 jal 0x10368 <printf>
End of assembler dump.
And here's an example after this patch:
(gdb) disassemble /r 0x0001018e,0x0001019e
Dump of assembler code from 0x1018e to 0x1019e:
0x0001018e <call_me+66>: fe842603 lw a2,-24(s0)
0x00010192 <call_me+70>: fec42583 lw a1,-20(s0)
0x00010196 <call_me+74>: 6561 lui a0,0x18
0x00010198 <call_me+76>: 6a850513 addi a0,a0,1704
0x0001019c <call_me+80>: 22f1 jal 0x10368 <printf>
End of assembler dump.
There are two differences here. First, the instruction bytes after
the patch are grouped based on the size of the instruction, and are
byte-swapped to little-endian order.
Second, after the patch, GDB now uses the bytes-per-line hint from
libopcodes to add whitespace padding after the opcode bytes, this
means that in most cases the instructions are nicely aligned.
It is still possible for a very long instruction to intrude into the
disassembled text space. The next example is x86-64, before the
patch:
(gdb) disassemble /r main
Dump of assembler code for function main:
0x0000000000401106 <+0>: 55 push %rbp
0x0000000000401107 <+1>: 48 89 e5 mov %rsp,%rbp
0x000000000040110a <+4>: c7 87 d8 00 00 00 01 00 00 00 movl $0x1,0xd8(%rdi)
0x0000000000401114 <+14>: b8 00 00 00 00 mov $0x0,%eax
0x0000000000401119 <+19>: 5d pop %rbp
0x000000000040111a <+20>: c3 ret
End of assembler dump.
And after the patch:
(gdb) disassemble /r main
Dump of assembler code for function main:
0x0000000000401106 <+0>: 55 push %rbp
0x0000000000401107 <+1>: 48 89 e5 mov %rsp,%rbp
0x000000000040110a <+4>: c7 87 d8 00 00 00 01 00 00 00 movl $0x1,0xd8(%rdi)
0x0000000000401114 <+14>: b8 00 00 00 00 mov $0x0,%eax
0x0000000000401119 <+19>: 5d pop %rbp
0x000000000040111a <+20>: c3 ret
End of assembler dump.
Most instructions are aligned, except for the very long instruction.
Notice too that for x86-64 libopcodes doesn't request that GDB group
the instruction bytes. This matches the behaviour of objdump.
In case the user really wants the old behaviour, I have added a new
modifier 'disassemble /b', this displays the instruction byte at a
time. For x86-64, which never groups instruction bytes, /b and /r are
equivalent, but for RISC-V, using /b gets the old layout back (except
that the whitespace for alignment is still present). Consider our
original RISC-V example, this time using /b:
(gdb) disassemble /b 0x0001018e,0x0001019e
Dump of assembler code from 0x1018e to 0x1019e:
0x0001018e <call_me+66>: 03 26 84 fe lw a2,-24(s0)
0x00010192 <call_me+70>: 83 25 c4 fe lw a1,-20(s0)
0x00010196 <call_me+74>: 61 65 lui a0,0x18
0x00010198 <call_me+76>: 13 05 85 6a addi a0,a0,1704
0x0001019c <call_me+80>: f1 22 jal 0x10368 <printf>
End of assembler dump.
Obviously, this patch is a potentially significant change to the
behaviour or /r. I could have added /b with the new behaviour and
left /r alone. However, personally, I feel the new behaviour is
significantly better than the old, hence, I made /r be what I consider
the "better" behaviour.
The reason I prefer the new behaviour is that, when I use /r, I almost
always want to manually decode the instruction for some reason, and
having the bytes displayed in "instruction order" rather than memory
order, just makes this easier.
The 'record instruction-history' command also takes a /r modifier, and
has been modified in the same way as disassemble; /r gets the new
behaviour, and /b has been added to retain the old behaviour.
Finally, the MI command -data-disassemble, is unchanged in behaviour,
this command now requests the raw bytes of the instruction, which is
equivalent to the /b modifier. This means that the MI output will
remain backward compatible.
This commit reduces the number of times we call read_code when
printing the instruction opcode bytes during disassembly.
I've added a new gdb::byte_vector within the
gdb_pretty_print_disassembler class, in line with all the other
buffers that gdb_pretty_print_disassembler needs. This byte_vector is
then resized as needed, and filled with a single read_code call for
each instruction.
There should be no user visible changes after this commit.
I went through all the uses of dynamic_cast<> in gdb, looking for ones
that could be replaced with checked_static_cast. This patch is the
result. Regression tested on x86-64 Fedora 34.
This is paired with "opcodes: Add non-enum disassembler options".
There is a portable mechanism for disassembler options and used on some
architectures:
- ARC
- Arm
- MIPS
- PowerPC
- RISC-V
- S/390
However, it only supports following forms:
- [NAME]
- [NAME]=[ENUM_VALUE]
Valid values for [ENUM_VALUE] must be predefined in
disasm_option_arg_t.values. For instance, for -M cpu=[CPU] in ARC
architecture, opcodes/arc-dis.c builds valid CPU model list from
include/elf/arc-cpu.def.
In this commit, it adds following format:
- [NAME]=[ARBITRARY_VALUE] (cannot contain "," though)
This is identified by NULL value of disasm_option_arg_t.values
(normally, this is a non-NULL pointer to a NULL-terminated list).
gdb/ChangeLog:
* gdb/disasm.c (set_disassembler_options): Add support for
non-enum disassembler options.
(show_disassembler_options_sfunc): Likewise.
In commit:
commit 4f46c0bc36
Date: Mon Jul 4 17:45:25 2022 +0100
opcodes: add new sub-mnemonic disassembler style
I added a new disassembler style dis_style_sub_mnemonic, but forgot to
add GDB support for this style. Fix this oversight in this commit.
This commit extends GDB to make use of libopcodes styling support
where available, currently this is just i386 based architectures, and
RISC-V.
For architectures that don't support styling using libopcodes GDB will
fall back to using the Python Pygments package, when the package is
available.
The new libopcodes based styling has the disassembler identify parts
of the disassembled instruction, e.g. registers, immediates,
mnemonics, etc, and can style these components differently.
Additionally, as the styling is now done in GDB we can add settings to
allow the user to configure which colours are used right from the GDB
CLI.
There's some new maintenance commands:
maintenance set libopcodes-styling enabled on|off
maintenance show libopcodes-styling
These can be used to manually disable use of libopcodes styling. This
is a maintenance command as it's not anticipated that a user should
need to do this. But, this could be useful for testing, or, in some
rare cases, a user might want to override the Python hook used for
disassembler styling, and then disable libopcode styling so that GDB
falls back to using Python. Right now I would consider this second
use case a rare situation, which is why I think a maintenance command
is appropriate.
When libopcodes is being used for styling then the user can make use
of the following new styles:
set/show style disassembler comment
set/show style disassembler immediate
set/show style disassembler mnemonic
set/show style disassembler register
The disassembler also makes use of the 'address' and 'function'
styles to style some parts of the disassembler output. I have also
added the following aliases though:
set/show style disassembler address
set/show style disassembler symbol
these are aliases for:
set/show style address
set/show style function
respectively, and exist to make it easier for users to discover
disassembler related style settings. The 'address' style is used to
style numeric addresses in the disassembler output, while the 'symbol'
or 'function' style is used to style the names of symbols in
disassembler output.
As not every architecture supports libopcodes styling, the maintenance
setting 'libopcodes-styling enabled' has an "auto-off" type behaviour.
Consider this GDB session:
(gdb) show architecture
The target architecture is set to "auto" (currently "i386:x86-64").
(gdb) maintenance show libopcodes-styling enabled
Use of libopcodes styling support is "on".
the setting defaults to "on" for architectures that support libopcodes
based styling.
(gdb) set architecture sparc
The target architecture is set to "sparc".
(gdb) maintenance show libopcodes-styling enabled
Use of libopcodes styling support is "off" (not supported on architecture "sparc")
the setting will show as "off" if the user switches to an architecture
that doesn't support libopcodes styling. The underlying setting is
still "on" at this point though, if the user switches back to
i386:x86-64 then the setting would go back to being "on".
(gdb) maintenance set libopcodes-styling enabled off
(gdb) maintenance show libopcodes-styling enabled
Use of libopcodes styling support is "off".
now the setting is "off" for everyone, even if the user switches back
to i386:x86-64 the setting will still show as "off".
(gdb) maintenance set libopcodes-styling enabled on
Use of libopcodes styling not supported on architecture "sparc".
(gdb) maintenance show libopcodes-styling enabled
Use of libopcodes styling support is "off".
attempting to switch the setting "on" for an unsupported architecture
will give an error, and the setting will remain "off".
(gdb) set architecture auto
The target architecture is set to "auto" (currently "i386:x86-64").
(gdb) maintenance show libopcodes-styling enabled
Use of libopcodes styling support is "off".
(gdb) maintenance set libopcodes-styling enabled on
(gdb) maintenance show libopcodes-styling enabled
Use of libopcodes styling support is "on".
the user will need to switch back to a supported architecture before
they can one again turn this setting "on".
The gdb_disassemble_info class is a wrapper around the libopcodes
disassemble_info struct. The 'stream' field of disassemble_info is
passed as an argument to the fprintf_func and fprintf_styled_func
callbacks when the disassembler wants to print anything.
Previously, GDB would store a pointer to a ui_file object in the
'stream' field, then, when the disassembler wanted to print anything,
the content would be written to the ui_file object. An example of an
fprintf_func callback, from gdb/disasm.c is:
int
gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
{
/* Write output to STREAM here. */
}
This is fine, but has one limitation, within the print callbacks we
only have access to STREAM, we can't access any additional state
stored within the gdb_disassemble_info object.
Right now this isn't a problem, but in a future commit this will
become an issue, how we style the output being written to STREAM will
depend on the state of the gdb_disassemble_info object, and this state
might need to be updated, depending on what is being printed.
In this commit I propose changing the 'stream' field of the
disassemble_info to carry a pointer to the gdb_disassemble_info
sub-class, rather than the stream itself.
We then have the two sub-classes of gdb_disassemble_info to consider,
the gdb_non_printing_disassembler class never cared about the stream,
previously, for this class, the stream was nullptr. With the change
to make stream be a gdb_disassemble_info pointer, no further updates
are needed for gdb_non_printing_disassembler.
The other sub-class is gdb_printing_disassembler. In this case the
sub-class now carries around a pointer to the stream object. The
print callbacks are updated to cast the incoming stream object back to
a gdb_printing_disassembler, and then extract the stream.
This is purely a refactoring commit. A later commit will add
additional state to the gdb_printing_disassembler, and update the
print callbacks to access this state.
There should be no user visible changes after this commit.
After the recent restructuring of the disassembler code, GDB has ended
up with two identical class static functions, both called
dis_asm_read_memory, with identical implementations.
My first thought was to move these out of their respective classes,
and just make them global functions, then I'd only need a single
copy.
And maybe that's the right way to go. But I disliked that by doing
that I loose the encapsulation of the method with the corresponding
disassembler class.
So, instead, I placed the static method into its own class, and had
both the gdb_non_printing_memory_disassembler and gdb_disassembler
classes inherit from this new class as an additional base-class.
In terms of code generated, I don't think there's any significant
difference with this approach, but I think this better reflects how
the function is closely tied to the disassembler.
There should be no user visible changes after this commit.
This commit started from an observation I made while working on some
other disassembler patches, that is, that the function
gdb_buffered_insn_length, is broken ... sort of.
I noticed that the gdb_buffered_insn_length function doesn't set up
the application data field if the disassemble_info structure.
Further, I noticed that some architectures, for example, ARM, require
that the application_data field be set, see gdb_print_insn_arm in
arm-tdep.c.
And so, if we ever use gdb_buffered_insn_length for ARM, then GDB will
likely crash. Which is why I said only "sort of" broken. Right now
we don't use gdb_buffered_insn_length with ARM, so maybe it isn't
broken yet?
Anyway to prove to myself that there was a problem here I extended the
disassembler self tests in disasm-selftests.c to include a test of
gdb_buffered_insn_length. As I run the test for all architectures, I
do indeed see GDB crash for ARM.
To fix this we need gdb_buffered_insn_length to create a disassembler
that inherits from gdb_disassemble_info, but we also need this new
disassembler to not print anything.
And so, I introduce a new gdb_non_printing_disassembler class, this is
a disassembler that doesn't print anything to the output stream.
I then observed that both ARC and S12Z also create non-printing
disassemblers, but these are slightly different. While the
disassembler in gdb_non_printing_disassembler reads the instruction
from a buffer, the ARC and S12Z disassemblers read from target memory
using target_read_code.
And so, I further split gdb_non_printing_disassembler into two
sub-classes, gdb_non_printing_memory_disassembler and
gdb_non_printing_buffer_disassembler.
The new selftests now pass, but otherwise, there should be no user
visible changes after this commit.
This commit is setup for the next commit.
In the next commit I will add a Python API to intercept the print_insn
calls within GDB, each print_insn call is responsible for
disassembling, and printing one instruction. After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.
This commit does not add any new Python API.
What this commit does is put the extension language framework in place
for a print_insn hook. There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.
Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.
After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.
There should be no user visible change after this commit.
The motivation for this change is an upcoming Python disassembler API
that I would like to add. As part of that change I need to create a
new disassembler like class that contains a disassemble_info and a
gdbarch. The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.
The problem however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method. In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.
What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.
My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.
In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.
Now, my new Python disassembler sub-class will still want to print
things to an output stream, and so we will want access to the
dis_asm_fprintf functionality for printing.
However, rather than move this printing code into the
gdb_disassemble_info base class, I have added yet another level of
hierarchy, a gdb_printing_disassembler, thus the class structure is
now:
struct gdb_disassemble_info {};
struct gdb_printing_disassembler : public gdb_disassemble_info {};
struct gdb_disassembler : public gdb_printing_disassembler {};
In a later commit my new Python disassembler will inherit from
gdb_printing_disassembler.
The reason for adding the additional layer to the class hierarchy is
that in yet another commit I intend to rewrite the function
gdb_buffered_insn_length, and to do this I will be creating yet more
disassembler like classes, however, these will not print anything,
thus I will add a gdb_non_printing_disassembler class that also
inherits from gdb_disassemble_info. Knowing that that change is
coming, I've gone with the above class hierarchy now.
There should be no user visible changes after this commit.
Commit:
commit 60a3da00bd
Date: Sat Jan 22 11:38:18 2022 +0000
objdump/opcodes: add syntax highlighting to disassembler output
Introduced a new use of vfprintf_filtered, which has been deprecated.
This commit replaces this with gdb_vprintf instead.
This commit adds the _option_ of having disassembler output syntax
highlighted in objdump. This option is _off_ by default. The new
command line options are:
--disassembler-color=off # The default.
--disassembler-color=color
--disassembler-color=extended-color
I have implemented two colour modes, using the same option names as we
use of --visualize-jumps, a basic 8-color mode ("color"), and an
extended 8bit color mode ("extended-color").
The syntax highlighting requires that each targets disassembler be
updated; each time the disassembler produces some output we now pass
through an additional parameter indicating what style should be
applied to the text.
As updating all target disassemblers is a large task, the old API is
maintained. And so, a user of the disassembler (i.e. objdump, gdb)
must provide two functions, the current non-styled print function, and
a new, styled print function.
I don't currently have a plan for converting every single target
disassembler, my hope is that interested folk will update the
disassemblers they are interested in. But it is possible some might
never get updated.
In this initial series I intend to convert the RISC-V disassembler
completely, and also do a partial conversion of the x86 disassembler.
Hopefully having the x86 disassembler at least partial converted will
allow more people to try this out easily and provide feedback.
In this commit I have focused on objdump. The changes to GDB at this
point are the bare minimum required to get things compiling, GDB makes
no use of the styling information to provide any colors, that will
come later, if this commit is accepted.
This first commit in the series doesn't convert any target
disassemblers at all (the next two commits will update some targets),
so after this commit, the only color you will see in the disassembler
output, is that produced from objdump itself, e.g. from
objdump_print_addr_with_sym, where we print an address and a symbol
name, these are now printed with styling information, and so will have
colors applied (if the option is on).
Finally, my ability to pick "good" colors is ... well, terrible. I'm
in no way committed to the colors I've picked here, so I encourage
people to suggest new colors, or wait for this commit to land, and
then patch the choice of colors.
I do have an idea about using possibly an environment variable to
allow the objdump colors to be customised, but I haven't done anything
like that in this commit, the color choices are just fixed in the code
for now.
binutils/ChangeLog:
* NEWS: Mention new feature.
* doc/binutils.texi (objdump): Describe --disassembler-color
option.
* objdump.c (disassembler_color): New global.
(disassembler_extended_color): Likewise.
(disassembler_in_comment): Likewise.
(usage): Mention --disassembler-color option.
(long_options): Add --disassembler-color option.
(objdump_print_value): Use fprintf_styled_func instead of
fprintf_func.
(objdump_print_symname): Likewise.
(objdump_print_addr_with_sym): Likewise.
(objdump_color_for_disassembler_style): New function.
(objdump_styled_sprintf): New function.
(fprintf_styled): New function.
(disassemble_jumps): Use disassemble_set_printf, and reset
disassembler_in_comment.
(null_styled_print): New function.
(disassemble_bytes): Use disassemble_set_printf, and reset
disassembler_in_comment.
(disassemble_data): Update init_disassemble_info call.
(main): Handle --disassembler-color option.
include/ChangeLog:
* dis-asm.h (enum disassembler_style): New enum.
(struct disassemble_info): Add fprintf_styled_func field, and
created_styled_output field.
(disassemble_set_printf): Declare.
(init_disassemble_info): Add additional parameter.
(INIT_DISASSEMBLE_INFO): Add additional parameter.
opcodes/ChangeLog:
* dis-init.c (init_disassemble_info): Take extra parameter,
initialize the new fprintf_styled_func and created_styled_output
fields.
* disassembler.c (disassemble_set_printf): New function definition.
Now that filtered and unfiltered output can be treated identically, we
can unify the printf family of functions. This is done under the name
"gdb_printf". Most of this patch was written by script.
Now that filtered and unfiltered output can be treated identically, we
can unify the puts family of functions. This is done under the name
"gdb_puts". Most of this patch was written by script.
Now that filtered and unfiltered output can be treated identically, we
can unify the vprintf family of functions: vprintf_filtered,
vprintf_unfiltered, vfprintf_filtered and vfprintf_unfiltered. (For
the gdb_stdout variants, recall that only printf_unfiltered gets truly
unfiltered output at this point.) This removes one such function and
renames the remaining two to "gdb_vprintf". All callers are updated.
Much of this patch was written by script.
This commit adds styling support to the disassembler output, as such
two new commands are added to GDB:
set style disassembler enabled on|off
show style disassembler enabled
In this commit I make use of the Python Pygments package to provide
the styling. I did investigate making use of libsource-highlight,
however, I found the highlighting results to be inferior to those of
Pygments; only some mnemonics were highlighted, and highlighting of
register names such as r9d and r8d (on x86-64) was incorrect.
To enable disassembler highlighting via Pygments, I've added a new
extension language hook, which is then implemented for Python. This
hook is very similar to the existing hook for source code
colorization.
One possibly odd choice I made with the new hook is to pass a
gdb.Architecture through, even though this is currently unused. The
reason this argument is not used is that, currently, styling is
performed identically for all architectures.
However, even though the Python function used to perform styling of
disassembly output is not part of any documented API, I don't want
to close the door on a user overriding this function to provide
architecture specific styling. To do this, the user would inevitably
require access to the gdb.Architecture, and so I decided to add this
field now.
The styling is applied within gdb_disassembler::print_insn, to achieve
this, gdb_disassembler now writes its output into a temporary buffer,
styling is then applied to the contents of this buffer. Finally the
gdb_disassembler buffer is copied out to its final destination stream.
There's a new test to check that the disassembler output includes some
escape sequences, though I don't check for specific colours; the
precise colors will depend on which instructions are in the
disassembler output, and, I guess, how pygments is configured.
The only negative change with this commit is how we currently style
addresses in GDB.
Currently, when the disassembler wants to print an address, we call
back into GDB, and GDB prints the address value using the `address`
styling, and the symbol name using `function` styling. After this
commit, if pygments is used, then all disassembler styling is done
through pygments, and this include the address and symbol name parts
of the disassembler output.
I don't know how much of an issue this will be for people. There's
already some precedent for this in GDB when we look at source styling.
For example, function names in styled source listings are not styled
using the `function` style, but instead, either GNU Source Highlight,
or pygments gets to decide how the function name should be styled.
If the Python pygments library is not present then GDB will continue
to behave as it always has, the disassembler output is mostly
unstyled, but the address and symbols are styled using the `address`
and `function` styles, as they are today.
However, if the user does `set style disassembler enabled off`, then
all disassembler styling is switched off. This obviously covers the
use of pygments, but also includes the minimal styling done by GDB
when pygments is not available.
We have three places in gdb where we initialise a disassembler that
will not print anything (used for figuring out the length of
instructions, or collecting other information from the disassembler).
Each of these places has its own stub function to act as a print like
callback, the stub function is identical in each case, and just does
nothing.
In this commit I create a new function to initialise a disassembler
that doesn't print anything, and have all three locations use this new
function. There's now only one non-printing stub function.
There should be no user visible changes after this commit.
Add a getter and a setter for a symtab's linetable. Remove the
corresponding macro and adjust all callers.
Change-Id: I159183fc0ccd8e18ab937b3c2f09ef2244ec6e9c
I have warnings like these showing in my editor all the time, so I
thought I'd run clang-tidy with this diagnostic on all the files (that I
can compile) and fix them.
There is still one warning, in utils.c, but that's because some code
is mixed up with preprocessor macros (#ifdef TUI), so I think there no
good solution there.
Change-Id: I345175fc7dd865318f0fbe61ac026c88c3b6a96b
I think it only really makes sense to call wrap_here with an argument
consisting solely of spaces. Given this, it seemed better to me that
the argument be an int, rather than a string. This patch is the
result. Much of it was written by a script.
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commits were
performed by the script.
While investigating some disassembler problems I ran into this case;
GDB compiled on a 32-bit arm target, with --enable-targets=all. Then
in GDB:
(gdb) set architecture i386
(gdb) disassemble 0x0,+4
unknown disassembler error (error = -1)
This is interesting because it shows a case where the libopcodes
disassembler is returning -1 without first calling the
memory_error_func callback. Indeed, the return from libopcodes
happens from this code snippet in i386-dis.c in the print_insn
function:
if (address_mode == mode_64bit && sizeof (bfd_vma) < 8)
{
(*info->fprintf_func) (info->stream,
_("64-bit address is disabled"));
return -1;
}
Notice how, prior to the return the disassembler tries to print a
helpful message out, but GDB doesn't print this message.
The reason this message goes missing is the call stack, it looks like
this:
gdb_pretty_print_disassembler::pretty_print_insn
gdb_disassembler::print_insn
gdbarch_print_insn
...
i386-dis.c:print_insn
When i386-dis.c:print_insn returns -1 this is handled in
gdb_disassembler::print_insn, where an exception is thrown. However,
the actual printing of the disassembler output is done in
gdb_pretty_print_disassembler::pretty_print_insn, and is only done if
an exception is not thrown.
In this commit I change this. The pretty_print_insn now uses
try/catch around the call to gdb_disassembler::print_insn, if we catch
an error then we first print any pending output in the instruction
buffer, before rethrowing the exception. As a result, even if an
exception is thrown we still print any pending disassembler output to
the screen; in the above case the helpful message will now be shown.
Before my patch we might expect to see this output:
(gdb) disassemble 0x0,+4
Dump of assembler code from 0x0 to 0x4:
0x0000000000000000: unknown disassembler error (error = -1)
(gdb)
But now we see this:
(gdb) disassemble 0x0,+4
Dump of assembler code from 0x0 to 0x4:
0x0000000000000000: 64-bit address is disabled
unknown disassembler error (error = -1)
If the disassembler returns -1 without printing a helpful message then
we would still expect a change in output, something like:
(gdb) disassemble 0x0,+4
Dump of assembler code from 0x0 to 0x4:
0x0000000000000000:
unknown disassembler error (error = -1)
Which I think is still acceptable, though at this point I think a
strong case can be made that this is a disassembler bug (not printing
anything, but still returning -1).
Notice however, that the error message is always printed on a new line
now. This is also true for the memory error case, where before we
might see this:
(gdb) disassemble 0x0,+4
Dump of assembler code from 0x0 to 0x4:
0x00000000: Cannot access memory at address 0x0
We now get this:
(gdb) disassemble 0x0,+4
Dump of assembler code from 0x0 to 0x4:
0x00000000:
Cannot access memory at address 0x0
For me, I'm happy to accept this change, having the error on a line by
itself, rather than just appended to the end of the previous line,
seems like an improvement, but I'm aware others might feel
differently, so I'd appreciate any feedback.
The disassemble_info structure has four callbacks, we have three of
them as static member functions within gdb_disassembler, the fourth is
just a global static function.
However, this fourth callback, is still only used from the
disassemble_info struct, so there's no real reason for its special
handling.
This commit makes fprintf_disasm a static method within
gdb_disassembler.
There should be no user visible changes after this commit.
If the libopcodes disassembler returns a negative value then this
indicates that the disassembly failed for some reason. In disas.c, in
the function gdb_disassembler::print_insn we can see how this is
handled; when we get a negative value back, we call the memory_error
function, which throws an exception.
The problem here is that the address used in the memory_error call is
gdb_disassembler::m_err_memaddr, which is set in
gdb_disassembler::dis_asm_memory_error, which is called from within
the libopcodes disassembler through the
disassembler_info::memory_error_func callback.
However, for this to work correctly, every time the libopcodes
disassembler returns a negative value, the libopcodes disassembler
must have first called the memory_error_func callback.
My first plan was to make m_err_memaddr a gdb::optional, and assert
that it always had a value prior to calling memory_error, however, a
quick look in opcodes/*-dis.c shows that there _are_ cases where a
negative value is returned without first calling the memory_error_func
callback, for example in arc-dis.c and cris-dis.c.
Now, I think that a good argument can be made that these disassemblers
must therefore be broken, except for the case where we can't read
memory, we should always be able to disassemble the memory contents to
_something_, even if it's just '.word 0x....'. However, I certainly
don't plan to go and fix all of the disassemblers.
What I do propose to do then, is make m_err_memaddr a gdb::optional,
but now, instead of always calling memory_error, I add a new path
which just calls error complaining about an unknown error. This new
path is only used if m_err_memaddr doesn't have a value (indicating
that the memory_error_func callback was not called).
To test this I just augmented one of the disassemblers to always
return -1, before this patch I see this:
Dump of assembler code for function main:
0x000101aa <+0>: Cannot access memory at address 0x0
And after this commit I now see:
Dump of assembler code for function main:
0x000101aa <+0>: unknown disassembler error (error = -1)
This doesn't really help much, but that's because there's no way to
report non memory errors out of the disasembler, because, it was not
expected that the disassembler would ever report non memory errors.
String-like settings (var_string, var_filename, var_optional_filename,
var_string_noescape) currently take a pointer to a `char *` storage
variable (typically global) that holds the setting's value. I'd like to
"mordernize" this by changing them to use an std::string for storage.
An obvious reason is that string operations on std::string are often
easier to write than with C strings. And they avoid having to do any
manual memory management.
Another interesting reason is that, with `char *`, nullptr and an empty
string often both have the same meaning of "no value". String settings
are initially nullptr (unless initialized otherwise). But when doing
"set foo" (where `foo` is a string setting), the setting now points to
an empty string. For example, solib_search_path is nullptr at startup,
but points to an empty string after doing "set solib-search-path". This
leads to some code that needs to check for both to check for "no value".
Or some code that converts back and forth between NULL and "" when
getting or setting the value. I find this very error-prone, because it
is very easy to forget one or the other. With std::string, we at least
know that the variable is not "NULL". There is only one way of
representing an empty string setting, that is with an empty string.
I was wondering whether the distinction between NULL and "" would be
important for some setting, but it doesn't seem so. If that ever
happens, it would be more C++-y and self-descriptive to use
optional<string> anyway.
Actually, there's one spot where this distinction mattered, it's in
init_history, for the test gdb.base/gdbinit-history.exp. init_history
sets the history filename to the default ".gdb_history" if it sees that
the setting was never set - if history_filename is nullptr. If
history_filename is an empty string, it means the setting was explicitly
cleared, so it leaves it as-is. With the change to std::string, this
distinction doesn't exist anymore. This can be fixed by moving the code
that chooses a good default value for history_filename to
_initialize_top. This is ran before -ex commands are processed, so an
-ex command can then clear that value if needed (what
gdb.base/gdbinit-history.exp tests).
Another small improvement, in my opinion is that we can now easily
give string parameters initial values, by simply initializing the global
variables, instead of xstrdup-ing it in the _initialize function.
In Python and Guile, when registering a string-like parameter, we
allocate (with new) an std::string that is owned by the param_smob (in
Guile) and the parmpy_object (in Python) objects.
This patch started by changing all relevant add_setshow_* commands to
take an `std::string *` instead of a `char **` and fixing everything
that failed to build. That includes of course all string setting
variable and their uses.
string_option_def now uses an std::string also, because there's a
connection between options and settings (see
add_setshow_cmds_for_options).
The add_path function in source.c is really complex and twisted, I'd
rather not try to change it to work on an std::string right now.
Instead, I added an overload that copies the std:string to a `char *`
and back. This means more copying, but this is not used in a hot path
at all, so I think it is acceptable.
Change-Id: I92c50a1bdd8307141cdbacb388248e4e4fc08c93
Co-authored-by: Lancelot SIX <lsix@lancelotsix.com>
Some add_set_show commands return a single cmd_list_element, the one for
the "set" command. A subsequent patch will need to access the show
command's cmd_list_element as well. Change these functions to return a
new structure type that holds both pointers.
I initially only modified add_setshow_boolean_cmd (the one I needed),
but I think it's better to change the whole chain to keep everything in
sync.
gdb/ChangeLog:
* command.h (set_show_commands): New.
(add_setshow_enum_cmd, add_setshow_auto_boolean_cmd,
add_setshow_boolean_cmd, add_setshow_filename_cmd,
add_setshow_string_cmd, add_setshow_string_noescape_cmd,
add_setshow_optional_filename_cmd, add_setshow_integer_cmd,
add_setshow_uinteger_cmd, add_setshow_zinteger_cmd,
add_setshow_zuinteger_cmd, add_setshow_zuinteger_unlimited_cmd):
Return set_show_commands. Adjust callers.
* cli/cli-decode.c (add_setshow_cmd_full): Return
set_show_commands, remove result parameters, adjust callers.
Change-Id: I17492b01b76002d09effc84830f9c6db26f1db7a