Assembler Annotations
=====================

Copyright (c) 2017-2019 Jiri Slaby

This document describes the new macros for annotation of data and code in
assembly. In particular, it contains information about ``SYM_FUNC_START``,
``SYM_FUNC_END``, ``SYM_CODE_START``, and similar.

Rationale
---------
Some code like entries, trampolines, or boot code needs to be written in
assembly. The same as in C, such code is grouped into functions and
accompanied by data. Standard assemblers do not force users into precisely
marking these pieces as code, data, or even specifying their length.
Nevertheless, assemblers provide developers with such annotations to aid
debuggers throughout assembly. On top of that, developers also want to mark
some functions as *global* in order to be visible outside of their translation
units.

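For illustration, the directives that such annotations boil down to can be
written by hand. The following is a minimal sketch in GNU assembler syntax for
an x86-64 ELF target; the name ``my_func`` is hypothetical and not taken from
the kernel::

    .text
    .globl my_func                  /* visible outside this translation unit */
    .type my_func, @function        /* mark the symbol as code */
    my_func:
        xorl %eax, %eax             /* return 0 */
        ret
    .size my_func, . - my_func      /* record the length of the function */
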
Over time, the Linux kernel has adopted macros from various projects (like
``binutils``) to facilitate such annotations. So for historic reasons,
developers have been using ``ENTRY``, ``END``, ``ENDPROC``, and other
annotations in assembly. Because these macros were never documented, they are
used in the wrong contexts at some locations. Clearly, ``ENTRY`` was
intended to denote the beginning of global symbols (be it data or code).
``END`` was used to mark the end of data or the end of special functions with
a *non-standard* calling convention. In contrast, ``ENDPROC`` should annotate
only the ends of *standard* functions.

When these macros are used correctly, they help assemblers generate a nice
object with both sizes and types set correctly. For example, the result of
``arch/x86/lib/putuser.S``::

   Num:    Value          Size Type    Bind   Vis      Ndx Name
    25: 0000000000000000    33 FUNC    GLOBAL DEFAULT    1 __put_user_1
    29: 0000000000000030    37 FUNC    GLOBAL DEFAULT    1 __put_user_2
    32: 0000000000000060    36 FUNC    GLOBAL DEFAULT    1 __put_user_4
    35: 0000000000000090    37 FUNC    GLOBAL DEFAULT    1 __put_user_8

This is not only important for debugging purposes. When there are properly
annotated objects like this, tools can be run on them to generate more useful
information. In particular, on properly annotated objects, ``objtool`` can be
run to check and fix the object if needed. Currently, ``objtool`` can report
missing frame pointer setup/destruction in functions. It can also
automatically generate annotations for the :doc:`ORC unwinder <x86/orc-unwinder>`
for most code. Both of these are especially important to support reliable
stack traces which are in turn necessary for :doc:`Kernel live patching
<livepatch/livepatch>`.

Caveat and Discussion
---------------------
As one might realize, there were only three macros previously. That is indeed
insufficient to cover all the combinations of cases:

* standard/non-standard function
* code/data
* global/local symbol

There was a discussion_ and instead of extending the current ``ENTRY/END*``
macros, it was decided that brand new macros should be introduced::

    So how about using macro names that actually show the purpose, instead
    of importing all the crappy, historic, essentially randomly chosen
    debug symbol macro names from the binutils and older kernels?

.. _discussion: https://lkml.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz

Macros Description
------------------

The new macros are prefixed with the ``SYM_`` prefix and can be divided into
three main groups:

1. ``SYM_FUNC_*`` -- to annotate C-like functions. This means functions with
   standard C calling conventions. For example, on x86, this means that the
   stack contains a return address at the predefined place and a return from
   the function can happen in a standard way. When frame pointers are enabled,
   save/restore of the frame pointer shall happen at the start/end of a
   function, respectively, too.

   Checking tools like ``objtool`` should ensure such marked functions conform
   to these rules. The tools can also easily annotate these functions with
   debugging information (like *ORC data*) automatically.

2. ``SYM_CODE_*`` -- special functions called with a special stack, be it
   interrupt handlers with special stack content, trampolines, or startup
   functions.

   Checking tools mostly skip these functions. But some debug
   information still can be generated automatically. For correct debug data,
   this code needs hints like ``UNWIND_HINT_REGS`` provided by developers.

3. ``SYM_DATA*`` -- obviously data belonging to ``.data`` sections and not to
   ``.text``. Data do not contain instructions, so they have to be treated
   specially by the tools: they should not treat the bytes as instructions,
   nor assign any debug information to them.

Instruction Macros
~~~~~~~~~~~~~~~~~~
This section covers ``SYM_FUNC_*`` and ``SYM_CODE_*`` enumerated above.

``objtool`` requires that all code must be contained in an ELF symbol. Symbol
names that have a ``.L`` prefix do not emit symbol table entries. ``.L``
prefixed symbols can be used within a code region, but should be avoided for
denoting a range of code via ``SYM_*_START/END`` annotations.

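As a rough sketch of this rule, the fragment below uses a ``.L`` label only as
a branch target inside an annotated region, while the region itself carries an
ordinary name. The names ``my_thunk_restore`` and ``.Lskip`` are hypothetical,
and the instructions are x86-64 placeholders::

    SYM_CODE_START_LOCAL(my_thunk_restore)
        testl %eax, %eax
        jz .Lskip               /* fine: .L label used inside the region */
        ...
    .Lskip:
        ret
    SYM_CODE_END(my_thunk_restore)

By contrast, writing ``SYM_CODE_START_LOCAL(.Lmy_thunk_restore)`` would leave
the region without a symbol table entry, so ``objtool`` could not refer to the
code.
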
* ``SYM_FUNC_START`` and ``SYM_FUNC_START_LOCAL`` are supposed to be **the
  most frequent markings**. They are used for functions with standard calling
  conventions -- global and local. Like in C, they both align the functions to
  architecture specific ``__ALIGN`` bytes. There are also ``_NOALIGN`` variants
  for special cases where developers do not want this implicit alignment.

  ``SYM_FUNC_START_WEAK`` and ``SYM_FUNC_START_WEAK_NOALIGN`` markings are
  also offered as an assembler counterpart to the *weak* attribute known from
  C.

  All of these **shall** be coupled with ``SYM_FUNC_END``. First, it marks
  the sequence of instructions as a function and records its size in the
  generated object file. Second, it also eases checking and processing such
  object files as the tools can trivially find exact function boundaries.

  So in most cases, developers should write something like the following
  example, having some asm instructions in between the macros, of course::

    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)

  In fact, this kind of annotation corresponds to the now deprecated ``ENTRY``
  and ``ENDPROC`` macros.

* ``SYM_FUNC_START_ALIAS`` and ``SYM_FUNC_START_LOCAL_ALIAS`` serve for those
  who decided to have two or more names for one function. The typical use is::

    SYM_FUNC_START_ALIAS(__memset)
    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)
    SYM_FUNC_END_ALIAS(__memset)

  In this example, one can call ``__memset`` or ``memset`` with the same
  result, except the debug information for the instructions is generated to
  the object file only once -- for the non-``ALIAS`` case.

* ``SYM_CODE_START`` and ``SYM_CODE_START_LOCAL`` should be used only in
  special cases -- if you know what you are doing. This is used exclusively
  for interrupt handlers and similar where the calling convention is not the C
  one. ``_NOALIGN`` variants exist too. The use is the same as for the ``FUNC``
  category above::

    SYM_CODE_START_LOCAL(bad_put_user)
        ... asm insns ...
    SYM_CODE_END(bad_put_user)

  Again, every ``SYM_CODE_START*`` **shall** be coupled with ``SYM_CODE_END``.

  To some extent, this category corresponds to the deprecated ``ENTRY`` and
  ``END``, except that ``END`` had several other meanings too.

* ``SYM_INNER_LABEL*`` is used to denote a label inside some
  ``SYM_{CODE,FUNC}_START`` and ``SYM_{CODE,FUNC}_END``. They are very similar
  to C labels, except they can be made global. An example of use::

    SYM_CODE_START(ftrace_caller)
        /* save_mcount_regs fills in first two parameters */
        ...

    SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
        /* Load the ftrace_ops into the 3rd parameter */
        ...

    SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
        call ftrace_stub
        ...
        retq
    SYM_CODE_END(ftrace_caller)

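To round off the function markings described above, a weak default
implementation could be sketched as follows. The name ``arch_example_hook`` is
hypothetical, and the body assumes x86-64 and simply returns zero::

    SYM_FUNC_START_WEAK(arch_example_hook)
        xorl %eax, %eax         /* default: return 0 */
        ret
    SYM_FUNC_END(arch_example_hook)

Another object file may then provide a non-weak ``arch_example_hook``, which
the linker prefers over this default.
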
Data Macros
~~~~~~~~~~~
Similar to instructions, there are a couple of macros to describe data in the
assembly.

* ``SYM_DATA_START`` and ``SYM_DATA_START_LOCAL`` mark the start of some data
  and shall be used in conjunction with either ``SYM_DATA_END``, or
  ``SYM_DATA_END_LABEL``. The latter also adds a label to the end, so that
  people can use ``lstack`` and (local) ``lstack_end`` in the following
  example::

    SYM_DATA_START_LOCAL(lstack)
        .skip 4096
    SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)

* ``SYM_DATA`` and ``SYM_DATA_LOCAL`` are variants for simple, mostly one-line
  data::

    SYM_DATA(HEAP,     .long rm_heap)
    SYM_DATA(heap_end, .long rm_stack)

  In the end, they expand to ``SYM_DATA_START`` with ``SYM_DATA_END``
  internally.

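For data that spans several lines but needs no end label, ``SYM_DATA_START``
can be paired with plain ``SYM_DATA_END``. A minimal sketch, in which
``example_table`` and its contents are made up for illustration::

    SYM_DATA_START(example_table)
        .long 1
        .long 2
        .long 3
    SYM_DATA_END(example_table)
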
Support Macros
~~~~~~~~~~~~~~
All of the above ultimately reduce to some invocation of ``SYM_START``,
``SYM_END``, or ``SYM_ENTRY``. Normally, developers should avoid using
these directly.

Further, in the above examples, one could see ``SYM_L_LOCAL``. There are also
``SYM_L_GLOBAL`` and ``SYM_L_WEAK``. All are intended to denote linkage of a
symbol marked by them. They are used either in ``_LABEL`` variants of the
earlier macros, or in ``SYM_START``.

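To illustrate the reduction, the sketch below shows how two of the function
macros can be expressed via the support macros. It is modelled on
``include/linux/linkage.h`` but simplified; ``SYM_A_ALIGN`` (the alignment
argument) is not described in this document, so treat the details as an
assumption and consult the header for the authoritative definitions::

    /* start of a global, aligned function */
    #define SYM_FUNC_START(name)                        \
        SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)

    /* end of a function: sets the symbol type and size */
    #define SYM_FUNC_END(name)                          \
        SYM_END(name, SYM_T_FUNC)
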
Overriding Macros
~~~~~~~~~~~~~~~~~
Architectures can also override any of the macros in their own
``asm/linkage.h``, including macros specifying the type of a symbol
(``SYM_T_FUNC``, ``SYM_T_OBJECT``, and ``SYM_T_NONE``). As every macro
described in this file is surrounded by ``#ifndef`` + ``#endif``, it is enough
to define the macros differently in the aforementioned architecture-dependent
header.

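A hypothetical override could look like the sketch below. The path
``arch/foo/include/asm/linkage.h`` and the ``ARCH_FOO_LANDING_PAD`` macro are
invented for illustration, and ``SYM_A_ALIGN`` is assumed to be the generic
alignment argument from ``include/linux/linkage.h``::

    /* arch/foo/include/asm/linkage.h (hypothetical) */
    #define SYM_FUNC_START(name)                        \
        SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)      \
        ARCH_FOO_LANDING_PAD

Because the generic header defines a macro only when it is not already
defined, a definition like this one would take precedence on that
architecture.
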