/* Symbol, variable and name lookup.
   Copyright (C) 2019-2024 Free Software Foundation, Inc.

libctf: creation functions
The CTF creation process looks roughly like (error handling elided):
int err;
ctf_file_t *foo = ctf_create (&err);
ctf_id_t type = ctf_add_THING (foo, ...);
ctf_update (foo);
ctf_*write (...);
Some ctf_add_THING functions accept other type IDs as arguments,
depending on the type: cv-quals, pointers, and structure and union
members all take other types as arguments. So do 'slices', which
let you take an existing integral type and recast it as a type
with a different bitness or offset within a byte, for bitfields.
One class of THING is not a type: "variables", which are mappings
of names (in the internal string table) to types. These are mostly
useful when encoding variables that do not appear in a symbol table
but which some external user has some other way to figure out the
address of at runtime (dynamic symbol lookup or querying a VM
interpreter or something).
You can snapshot the creation process at any point: rolling back to a
snapshot deletes all types and variables added since that point.
You can make arbitrary type queries on the CTF container during the
creation process, but you must call ctf_update() first, which
translates the growing dynamic container into a static one (this uses
the CTF opening machinery, added in a later commit), which is quite
expensive. This function must also be called after adding types
and before writing the container out.
Because addition of types involves looking up existing types, we add a
little of the type lookup machinery here, as well: only enough to
look up types in dynamic containers under construction.
libctf/
* ctf-create.c: New file.
* ctf-lookup.c: New file.
include/
* ctf-api.h (zlib.h): New include.
(ctf_sect_t): New.
(ctf_sect_names_t): Likewise.
(ctf_encoding_t): Likewise.
(ctf_membinfo_t): Likewise.
(ctf_arinfo_t): Likewise.
(ctf_funcinfo_t): Likewise.
(ctf_lblinfo_t): Likewise.
(ctf_snapshot_id_t): Likewise.
(CTF_FUNC_VARARG): Likewise.
(ctf_simple_open): Likewise.
(ctf_bufopen): Likewise.
(ctf_create): Likewise.
(ctf_add_array): Likewise.
(ctf_add_const): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_float): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_function): Likewise.
(ctf_add_integer): Likewise.
(ctf_add_slice): Likewise.
(ctf_add_pointer): Likewise.
(ctf_add_type): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_restrict): Likewise.
(ctf_add_struct): Likewise.
(ctf_add_union): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_volatile): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_member_encoded): Likewise.
(ctf_add_variable): Likewise.
(ctf_set_array): Likewise.
(ctf_update): Likewise.
(ctf_snapshot): Likewise.
(ctf_rollback): Likewise.
(ctf_discard): Likewise.
(ctf_write): Likewise.
(ctf_gzwrite): Likewise.
(ctf_compress_write): Likewise.

   This file is part of libctf.

   libctf is free software; you can redistribute it and/or modify it under
   the terms of the GNU General Public License as published by the Free
   Software Foundation; either version 3, or (at your option) any later
   version.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
   See the GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; see the file COPYING.  If not see
   <http://www.gnu.org/licenses/>.  */

#include <ctf-impl.h>
#include <elf.h>
#include <string.h>

/* libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.  */
#include <assert.h>

/* libctf: fix lookups of pointers by name in parent dicts
When you look up a type by name using ctf_lookup_by_name, in most cases
libctf can just strip off any qualifiers and look for the name, but for
pointer types this doesn't work, since the caller will want the pointer
type itself. But pointer types are nameless, and while they cite the
types they point to, looking up a type by name requires a link going the
*other way*, from the type pointed to to the pointer type that points to
it.
libctf has always built this up at open time: ctf_ptrtab is an array of
type indexes pointing from the index of every type to the index of the
type that points to it. But because it is built up at open time (and
because it uses type indexes and not type IDs) it is restricted to
working within a single dict and ignoring parent/child
relationships. This is normally invisible, unless you manage to get a
dict with a type in the parent but the only pointer to it in a child.
The ctf_ptrtab will not track this relationship, so lookups of this
pointer type by name will fail. Since which type is in the parent and
which in the child is largely opaque to the user (which goes where is up
to the deduplicator, and it can and does reshuffle things to save
space), this leads to a very bad user experience, with an
obviously-visible pointer type which ctf_lookup_by_name claims doesn't
exist.
The fix is to have another array, ctf_pptrtab, which is populated in
child dicts: like the parent's ctf_ptrtab, it has one element per type
in the parent, but is all zeroes except for those types which are
pointed to by types in the child: so it maps parent dict indices to
child dict indices. The array is grown, and new child types scanned,
whenever a lookup happens and new types have been added to the child
since the last time a lookup happened that might need the pptrtab.
(So for non-writable dicts, this only happens once, since new types
cannot be added to non-writable dicts at all.)
Since this introduces new complexity (involving updating only part of
the ctf_pptrtab) which is only seen when a writable dict is in use, we
introduce a new libctf-writable testsuite that contains lookup tests
with no corresponding CTF-containing .c files (which can thus be run
even on platforms with no .ctf-section support in the linker yet), and
add a test to check that creation of pointers in children to types in
parents and a following lookup by name works as expected. The non-
writable case is tested in a new libctf-regression testsuite which is
used to track now-fixed outright bugs in libctf.
libctf/ChangeLog
2021-01-05 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_pptrtab>: New.
<ctf_pptrtab_len>: New.
<ctf_pptrtab_typemax>: New.
* ctf-create.c (ctf_serialize): Update accordingly.
(ctf_add_reftype): Note that we don't need to update pptrtab here,
despite updating ptrtab.
* ctf-open.c (ctf_dict_close): Destroy the pptrtab.
(ctf_import): Likewise.
(ctf_import_unref): Likewise.
* ctf-lookup.c (grow_pptrtab): New.
(refresh_pptrtab): New, update a pptrtab.
(ctf_lookup_by_name): Turn into a wrapper around (and rename to)...
(ctf_lookup_by_name_internal): ... this: construct the pptrtab, and
use it in addition to the parent's ptrtab when parent dicts are
searched.
* testsuite/libctf-regression/regression.exp: New testsuite for
regression tests.
* testsuite/libctf-regression/pptrtab*: New test.
* testsuite/libctf-writable/writable.exp: New testsuite for tests of
writable CTF dicts.
* testsuite/libctf-writable/pptrtab*: New test.  */

/* Grow the pptrtab so that it is at least NEW_LEN long.  */
static int
grow_pptrtab (ctf_dict_t *fp, size_t new_len)
{
  uint32_t *new_pptrtab;

  if ((new_pptrtab = realloc (fp->ctf_pptrtab, sizeof (uint32_t)
                              * new_len)) == NULL)
    return (ctf_set_errno (fp, ENOMEM));

  fp->ctf_pptrtab = new_pptrtab;

  memset (fp->ctf_pptrtab + fp->ctf_pptrtab_len, 0,
          sizeof (uint32_t) * (new_len - fp->ctf_pptrtab_len));

  fp->ctf_pptrtab_len = new_len;
  return 0;
}

/* Update entries in the pptrtab that relate to types newly added in the
   child.  */
static int
refresh_pptrtab (ctf_dict_t *fp, ctf_dict_t *pfp)
{
  uint32_t i;
  for (i = fp->ctf_pptrtab_typemax; i <= fp->ctf_typemax; i++)
    {
      ctf_id_t type = LCTF_INDEX_TO_TYPE (fp, i, 1);
      ctf_id_t reffed_type;

      if (ctf_type_kind (fp, type) != CTF_K_POINTER)
        continue;

      reffed_type = ctf_type_reference (fp, type);

      if (LCTF_TYPE_ISPARENT (fp, reffed_type))
        {
          uint32_t idx = LCTF_TYPE_TO_INDEX (fp, reffed_type);

          /* Guard against references to invalid types.  No need to consider
             the CTF dict corrupt in this case: this pointer just can't be a
             pointer to any type we know about.  */
          if (idx <= pfp->ctf_typemax)
            {
              if (idx >= fp->ctf_pptrtab_len
                  && grow_pptrtab (fp, pfp->ctf_ptrtab_len) < 0)
                return -1;                      /* errno is set for us.  */

              fp->ctf_pptrtab[idx] = i;
            }
        }
    }

  fp->ctf_pptrtab_typemax = fp->ctf_typemax;

  return 0;
}

/* Compare the given input string and length against a table of known C storage
   qualifier keywords.  We just ignore these in ctf_lookup_by_name, below.  To
   do this quickly, we use a pre-computed Perfect Hash Function similar to the
   technique originally described in the classic paper:

   R.J. Cichelli, "Minimal Perfect Hash Functions Made Simple",
   Communications of the ACM, Volume 23, Issue 1, January 1980, pp. 17-19.

   For an input string S of length N, we use hash H = S[N - 1] + N - 105, which
   for the current set of qualifiers yields a unique H in the range [0 .. 20].
   The hash can be modified when the keyword set changes as necessary.  We also
   store the length of each keyword and check it prior to the final strncmp().

   TODO: just use gperf.  */

static int
isqualifier (const char *s, size_t len)
{
  static const struct qual
  {
    const char *q_name;
    size_t q_len;
  } qhash[] = {
    {"static", 6}, {"", 0}, {"", 0}, {"", 0},
    {"volatile", 8}, {"", 0}, {"", 0}, {"", 0}, {"", 0},
    {"", 0}, {"auto", 4}, {"extern", 6}, {"", 0}, {"", 0},
    {"", 0}, {"", 0}, {"const", 5}, {"register", 8},
    {"", 0}, {"restrict", 8}, {"_Restrict", 9}
  };

  int h = s[len - 1] + (int) len - 105;
  const struct qual *qp;

  if (h < 0 || (size_t) h >= sizeof (qhash) / sizeof (qhash[0]))
    return 0;

  qp = &qhash[h];

  return ((size_t) len == qp->q_len &&
          strncmp (qp->q_name, s, qp->q_len) == 0);
}

/* Attempt to convert the given C type name into the corresponding CTF type ID.
   It is not possible to do complete and proper conversion of type names
   without implementing a more full-fledged parser, which is necessary to
   handle things like types that are function pointers to functions that
   have arguments that are function pointers, and fun stuff like that.
   Instead, this function implements a very simple conversion algorithm that
   finds the things that we actually care about: structs, unions, enums,
   integers, floats, typedefs, and pointers to any of these named types.  */

static ctf_id_t
ctf_lookup_by_name_internal (ctf_dict_t *fp, ctf_dict_t *child,
                             const char *name)
{
  static const char delimiters[] = " \t\n\r\v\f*";

  const ctf_lookup_t *lp;
  const char *p, *q, *end;
  ctf_id_t type = 0;
  ctf_id_t ntype, ptype;

  if (name == NULL)
    return (ctf_set_typed_errno (fp, EINVAL));

  for (p = name, end = name + strlen (name); *p != '\0'; p = q)
    {
      /* Cast via unsigned char: passing a negative char to isspace() is
         undefined behaviour.  */
      while (isspace ((unsigned char) *p))
        p++;                    /* Skip leading whitespace.  */

      if (p == end)
        break;

      if ((q = strpbrk (p + 1, delimiters)) == NULL)
        q = end;                /* Compare until end.  */

      if (*p == '*')
        {
          /* Find a pointer to type by looking in child->ctf_pptrtab (if child
             is set) and fp->ctf_ptrtab. If we can't find a pointer to the
             given type, see if we can compute a pointer to the type resulting
             from resolving the type down to its base type and use that instead.
             This helps with cases where the CTF data includes "struct foo *"
             but not "foo_t *" and the user tries to access "foo_t *" in the
             debugger.

             There is extra complexity here because uninitialized elements in
             the pptrtab and ptrtab are set to zero, but zero (as the type ID
             meaning the unimplemented type) is a valid return type from
             ctf_lookup_by_name. (Pointers to types are never of type 0, so
             this is unambiguous, just fiddly to deal with.) */

          uint32_t idx = LCTF_TYPE_TO_INDEX (fp, type);
          int in_child = 0;

          ntype = CTF_ERR;
          if (child && idx < child->ctf_pptrtab_len)
            {
              ntype = child->ctf_pptrtab[idx];
              if (ntype)
                in_child = 1;
              else
                ntype = CTF_ERR;
            }

          if (ntype == CTF_ERR)
            {
              ntype = fp->ctf_ptrtab[idx];
              if (ntype == 0)
                ntype = CTF_ERR;
            }

          /* Try resolving to its base type and check again. */
          if (ntype == CTF_ERR)
            {
              if (child)
                ntype = ctf_type_resolve_unsliced (child, type);
              else
                ntype = ctf_type_resolve_unsliced (fp, type);

              if (ntype == CTF_ERR)
                goto notype;

              idx = LCTF_TYPE_TO_INDEX (fp, ntype);

              ntype = CTF_ERR;
              if (child && idx < child->ctf_pptrtab_len)
                {
                  ntype = child->ctf_pptrtab[idx];
                  if (ntype)
                    in_child = 1;
                  else
                    ntype = CTF_ERR;
                }

              if (ntype == CTF_ERR)
                {
                  ntype = fp->ctf_ptrtab[idx];
                  if (ntype == 0)
                    ntype = CTF_ERR;
                }
libctf: fix lookups of pointers by name in parent dicts
When you look up a type by name using ctf_lookup_by_name, in most cases
libctf can just strip off any qualifiers and look for the name, but for
pointer types this doesn't work, since the caller will want the pointer
type itself. But pointer types are nameless, and while they cite the
types they point to, looking up a type by name requires a link going the
*other way*, from the type pointed to to the pointer type that points to
it.
libctf has always built this up at open time: ctf_ptrtab is an array of
type indexes pointing from the index of every type to the index of the
type that points to it. But because it is built up at open time (and
because it uses type indexes and not type IDs) it is restricted to
working within a single dict and ignoring parent/child
relationships. This is normally invisible, unless you manage to get a
dict with a type in the parent but the only pointer to it in a child.
The ctf_ptrtab will not track this relationship, so lookups of this
pointer type by name will fail. Since which type is in the parent and
which in the child is largely opaque to the user (which goes where is up
to the deduplicator, and it can and does reshuffle things to save
space), this leads to a very bad user experience, with an
obviously-visible pointer type which ctf_lookup_by_name claims doesn't
exist.
The fix is to have another array, ctf_pptrtab, which is populated in
child dicts: like the parent's ctf_ptrtab, it has one element per type
in the parent, but is all zeroes except for those types which are
pointed to by types in the child: so it maps parent dict indices to
child dict indices. The array is grown, and new child types scanned,
whenever a lookup happens and new types have been added to the child
since the last time a lookup happened that might need the pptrtab.
(So for non-writable dicts, this only happens once, since new types
cannot be added to non-writable dicts at all.)
Since this introduces new complexity (involving updating only part of
the ctf_pptrtab) which is only seen when a writable dict is in use, we
introduce a new libctf-writable testsuite that contains lookup tests
with no corresponding CTF-containing .c files (which can thus be run
even on platforms with no .ctf-section support in the linker yet), and
add a test to check that creation of pointers in children to types in
parents and a following lookup by name works as expected. The non-
writable case is tested in a new libctf-regression testsuite which is
used to track now-fixed outright bugs in libctf.
libctf/ChangeLog
2021-01-05 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_pptrtab>: New.
<ctf_pptrtab_len>: New.
<ctf_pptrtab_typemax>: New.
* ctf-create.c (ctf_serialize): Update accordingly.
(ctf_add_reftype): Note that we don't need to update pptrtab here,
despite updating ptrtab.
* ctf-open.c (ctf_dict_close): Destroy the pptrtab.
(ctf_import): Likewise.
(ctf_import_unref): Likewise.
* ctf-lookup.c (grow_pptrtab): New.
(refresh_pptrtab): New, update a pptrtab.
(ctf_lookup_by_name): Turn into a wrapper around (and rename to)...
(ctf_lookup_by_name_internal): ... this: construct the pptrtab, and
use it in addition to the parent's ptrtab when parent dicts are
searched.
* testsuite/libctf-regression/regression.exp: New testsuite for
regression tests.
* testsuite/libctf-regression/pptrtab*: New test.
* testsuite/libctf-writable/writable.exp: New testsuite for tests of
writable CTF dicts.
* testsuite/libctf-writable/pptrtab*: New test.
2021-01-05 21:25:56 +08:00
              if (ntype == CTF_ERR)
                goto notype;
2019-04-24 18:15:33 +08:00
            }
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
          type = LCTF_INDEX_TO_TYPE (fp, ntype, (fp->ctf_flags & LCTF_CHILD)
                                                || in_child);

          /* We are looking up a type in the parent, but the pointed-to type is
             in the child. Switch to looking in the child: if we need to go
             back into the parent, we can recurse again. */
          if (in_child)
            {
              fp = child;
              child = NULL;
            }
2019-04-24 18:15:33 +08:00
          q = p + 1;
          continue;
        }

      if (isqualifier (p, (size_t) (q - p)))
        continue; /* Skip qualifier keyword. */

      for (lp = fp->ctf_lookups; lp->ctl_prefix != NULL; lp++)
        {
          /* TODO: This is not MT-safe. */
          if ((lp->ctl_prefix[0] == '\0' ||
               strncmp (p, lp->ctl_prefix, (size_t) (q - p)) == 0) &&
              (size_t) (q - p) >= lp->ctl_len)
            {
2020-07-13 23:05:15 +08:00
              for (p += lp->ctl_len; isspace ((int) *p); p++)
2019-04-24 18:15:33 +08:00
                continue; /* Skip prefix and next whitespace. */

              if ((q = strchr (p, '*')) == NULL)
                q = end; /* Compare until end. */
2020-07-13 23:05:15 +08:00
              while (isspace ((int) q[-1]))
2019-04-24 18:15:33 +08:00
                q--; /* Exclude trailing whitespace. */

              /* Expand and/or allocate storage for a slice of the name, then
                 copy it in. */

              if (fp->ctf_tmp_typeslicelen >= (size_t) (q - p) + 1)
                {
                  memcpy (fp->ctf_tmp_typeslice, p, (size_t) (q - p));
                  fp->ctf_tmp_typeslice[(size_t) (q - p)] = '\0';
                }
              else
                {
                  free (fp->ctf_tmp_typeslice);
2019-06-06 21:10:08 +08:00
                  fp->ctf_tmp_typeslice = xstrndup (p, (size_t) (q - p));
2019-04-24 18:15:33 +08:00
                  if (fp->ctf_tmp_typeslice == NULL)
2023-09-13 17:02:36 +08:00
                    return ctf_set_typed_errno (fp, ENOMEM);
2019-04-24 18:15:33 +08:00
                }
libctf: remove static/dynamic name lookup distinction
libctf internally maintains a set of hash tables for type name lookups,
one for each valid C type namespace (struct, union, enum, and everything
else).
Or, rather, it maintains *two* sets of hash tables: one, a ctf_hash *,
is meant for lookups in ctf_(buf)open()ed dicts with fixed content; the
other, a ctf_dynhash *, is meant for lookups in ctf_create()d dicts.
This distinction was somewhat valuable in the far pre-binutils past when
two different hashtable implementations were used (one expanding, the
other fixed-size), but those days are long gone: the hash table
implementations are almost identical, both wrappers around the libiberty
hashtab. The ctf_dynhash has many more capabilities than the ctf_hash
(iteration, deletion, etc etc) and has no downsides other than starting
at a fixed, arbitrary small size.
That limitation is easy to lift (via a new ctf_dynhash_create_sized()),
following which we can throw away nearly all the ctf_hash
implementation, and all the code to choose between readable and writable
hashtabs; the few convenience functions that are still useful (for
insertion of name -> type mappings) can also be generalized a bit so
that the extra string verification they do is potentially available to
other string lookups as well.
(libctf still has two hashtable implementations, ctf_dynhash, above,
and ctf_dynset, which is a key-only hashtab that can avoid a great many
malloc()s, used for high-volume applications in the deduplicator.)
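The phrase "one table per C type namespace" can be illustrated with the sketch below. It keeps separate name-to-type tables for struct, union, enum and everything else, so that struct foo and a typedef named foo can map to different types.

This is an assumption-laden stand-in, not ctf_dynhash:

- A linear array replaces the libiberty hashtab.
- The hypothetical toy_names_create_sized only mimics the idea of preallocating from a known size while still growing on demand.
- A miss returns 0, matching how the name lookup treats a zero result as "not found".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a name -> type table.  A linear array keeps
   the sketch short; libctf's ctf_dynhash wraps libiberty's hashtab.  */
typedef struct toy_names
{
  const char **names;
  uint32_t *types;
  size_t count, size;
} toy_names;

/* One table per C type namespace.  */
enum toy_ns { TOY_NS_STRUCT, TOY_NS_UNION, TOY_NS_ENUM, TOY_NS_OTHER, TOY_NS_MAX };

/* Preallocate SIZE slots, e.g. from a type count known at open time.  */
static int
toy_names_create_sized (toy_names *t, size_t size)
{
  t->count = 0;
  t->size = size > 0 ? size : 1;
  t->names = malloc (t->size * sizeof (*t->names));
  t->types = malloc (t->size * sizeof (*t->types));
  if (t->names == NULL || t->types == NULL)
    {
      free (t->names);
      free (t->types);
      return -1;
    }
  return 0;
}

/* Insert NAME -> TYPE, doubling capacity when full, so a table created
   small can still accept any number of entries.  */
static int
toy_names_insert (toy_names *t, const char *name, uint32_t type)
{
  if (t->count == t->size)
    {
      size_t nsize = t->size * 2;
      const char **n = realloc (t->names, nsize * sizeof (*n));
      uint32_t *ty;

      if (n == NULL)
        return -1;
      t->names = n;
      if ((ty = realloc (t->types, nsize * sizeof (*ty))) == NULL)
        return -1;
      t->types = ty;
      t->size = nsize;
    }
  t->names[t->count] = name;
  t->types[t->count++] = type;
  return 0;
}

/* Return the type for NAME, or 0 if it is not present.  */
static uint32_t
toy_names_lookup (const toy_names *t, const char *name)
{
  size_t i;

  assert (t != NULL && name != NULL);
  for (i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      return t->types[i];
  return 0;
}

static void
toy_names_destroy (toy_names *t)
{
  free (t->names);
  free (t->types);
}
```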
libctf/
* ctf-create.c (ctf_create): Eliminate ctn_writable.
(ctf_dtd_insert): Likewise.
(ctf_dtd_delete): Likewise.
(ctf_rollback): Likewise.
(ctf_name_table): Eliminate ctf_names_t.
* ctf-hash.c (ctf_dynhash_create): Comment update.
Reimplement in terms of...
(ctf_dynhash_create_sized): ... this new function.
(ctf_hash_create): Remove.
(ctf_hash_size): Remove.
(ctf_hash_define_type): Remove.
(ctf_hash_destroy): Remove.
(ctf_hash_lookup_type): Rename to...
(ctf_dynhash_lookup_type): ... this.
(ctf_hash_insert_type): Rename to...
(ctf_dynhash_insert_type): ... this, moving validation to...
* ctf-string.c (ctf_strptr_validate): ... this new function.
* ctf-impl.h (struct ctf_names): Extirpate.
(struct ctf_lookup.ctl_hash): Now a ctf_dynhash_t.
(struct ctf_dict): All ctf_names_t fields are now ctf_dynhash_t.
(ctf_name_table): Now returns a ctf_dynhash_t.
(ctf_lookup_by_rawhash): Remove.
(ctf_hash_create): Likewise.
(ctf_hash_insert_type): Likewise.
(ctf_hash_define_type): Likewise.
(ctf_hash_lookup_type): Likewise.
(ctf_hash_size): Likewise.
(ctf_hash_destroy): Likewise.
(ctf_dynhash_create_sized): New.
(ctf_dynhash_insert_type): New.
(ctf_dynhash_lookup_type): New.
(ctf_strptr_validate): New.
* ctf-lookup.c (ctf_lookup_by_name_internal): Adapt.
* ctf-open.c (init_types): Adapt.
(ctf_set_ctl_hashes): Adapt.
(ctf_dict_close): Adapt.
* ctf-serialize.c (ctf_serialize): Adapt.
* ctf-types.c (ctf_lookup_by_rawhash): Remove.
2023-12-19 01:47:48 +08:00
              if ((type = (ctf_id_t) (uintptr_t)
                   ctf_dynhash_lookup (lp->ctl_hash,
                                       fp->ctf_tmp_typeslice)) == 0)
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
                goto notype;
2019-04-24 18:15:33 +08:00
              break;
            }
        }

      if (lp->ctl_prefix == NULL)
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
        goto notype;
2019-04-24 18:15:33 +08:00
    }

  if (*p != '\0' || type == 0)
2023-09-13 17:02:36 +08:00
    return (ctf_set_typed_errno (fp, ECTF_SYNTAX));
2019-04-24 18:15:33 +08:00
  return type;
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
 notype:
  ctf_set_errno (fp, ECTF_NOTYPE);
  if (fp->ctf_parent != NULL)
    {
      /* Need to look up in the parent, from the child's perspective.
         Make sure the pptrtab is up to date. */

      if (fp->ctf_pptrtab_typemax < fp->ctf_typemax)
        {
          if (refresh_pptrtab (fp, fp->ctf_parent) < 0)
2023-09-13 17:02:36 +08:00
            return CTF_ERR; /* errno is set for us. */
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
        }

      if ((ptype = ctf_lookup_by_name_internal (fp->ctf_parent, fp,
                                                name)) != CTF_ERR)
        return ptype;
2023-09-13 17:02:36 +08:00
      return (ctf_set_typed_errno (fp, ctf_errno (fp->ctf_parent)));
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
    }
2019-04-24 18:15:33 +08:00
  return CTF_ERR;
}
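The loop in ctf_lookup_by_name_internal splits a type name in several steps:

1. Skip qualifier keywords.
2. Match a struct/union/enum prefix and skip the whitespace after it.
3. Compare up to the first '*'.
4. Drop trailing whitespace and copy out the slice.

Those steps can be sketched standalone as follows. All toy_* names are hypothetical, and the sketch deliberately simplifies: it accepts only '*' and whitespace after the name (so "int * const" is rejected), and it stops at parsing rather than resolving pointer types.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting a C type name.  */
typedef struct toy_parsed
{
  const char *prefix;   /* "struct", "union", "enum", or "" for none.  */
  char name[64];        /* Bare name, e.g. "foo" or "unsigned long".  */
  int nptrs;            /* Number of '*' after the name.  */
} toy_parsed;

static int
toy_isqualifier (const char *s, size_t len)
{
  static const char *const quals[] = { "const", "volatile", "restrict" };
  size_t i;

  for (i = 0; i < sizeof quals / sizeof quals[0]; i++)
    if (strlen (quals[i]) == len && strncmp (s, quals[i], len) == 0)
      return 1;
  return 0;
}

/* Return 0 on success, -1 on a syntax error or an over-long name.  */
static int
toy_parse_type_name (const char *name, toy_parsed *out)
{
  static const char *const prefixes[] = { "struct", "union", "enum" };
  const char *p, *q, *end;
  size_t i, len;

  assert (name != NULL && out != NULL);
  p = name;
  end = name + strlen (name);
  out->prefix = "";
  out->nptrs = 0;

  /* Skip whitespace and leading qualifier keywords; Q ends the word.  */
  for (;;)
    {
      while (isspace ((unsigned char) *p))
        p++;
      for (q = p; *q != '\0' && *q != '*' && !isspace ((unsigned char) *q); q++)
        continue;
      if (!toy_isqualifier (p, (size_t) (q - p)))
        break;
      p = q;
    }

  /* Match a namespace prefix, then skip it and the whitespace after it.  */
  for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
    if (strlen (prefixes[i]) == (size_t) (q - p)
        && strncmp (p, prefixes[i], (size_t) (q - p)) == 0)
      {
        out->prefix = prefixes[i];
        for (p = q; isspace ((unsigned char) *p); p++)
          continue;
        break;
      }

  /* Compare until the first '*' or the end, minus trailing whitespace.  */
  if ((q = strchr (p, '*')) == NULL)
    q = end;
  while (q > p && isspace ((unsigned char) q[-1]))
    q--;

  len = (size_t) (q - p);
  if (len == 0 || len >= sizeof out->name)
    return -1;
  memcpy (out->name, p, len);
  out->name[len] = '\0';

  /* Only '*' and whitespace may follow the name in this sketch.  */
  for (; *q != '\0'; q++)
    {
      if (*q == '*')
        out->nptrs++;
      else if (!isspace ((unsigned char) *q))
        return -1;
    }
  return 0;
}
```

libctf reuses a per-dict ctf_tmp_typeslice buffer, which is the reason for the MT-safety TODO. This version instead copies into a fixed-size field of the caller's result. That keeps it reentrant at the cost of rejecting names of 64 characters or more.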
libctf: fix lookups of pointers by name in parent dicts
2021-01-05 21:25:56 +08:00
ctf_id_t
ctf_lookup_by_name (ctf_dict_t *fp, const char *name)
{
  return ctf_lookup_by_name_internal (fp, NULL, name);
}

libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need to keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
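With both sections reduced to plain arrays of type IDs, a single routine can serve object and function lookups alike. The sketch below is hypothetical (toy_* names). It assumes a pad entry is type ID 0, and it assumes each section can be indexed directly by position. Real libctf instead translates symbol indexes into section positions via the ctf_objtidx_sxlate / ctf_funcidx_sxlate fields listed in this commit's ChangeLog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: both sections are arrays of 32-bit type IDs,
   one per symbol, with 0 assumed to be a pad meaning "type unknown".  */
typedef struct
{
  const uint32_t *objt;   /* Data object section.  */
  size_t nobjt;
  const uint32_t *func;   /* Function info section: CTF_K_FUNCTION IDs.  */
  size_t nfunc;
} toy_symtypetabs_t;

/* One routine for both sections: only the choice of array differs.  */
static uint32_t
toy_lookup_sym_type (const toy_symtypetabs_t *t, size_t pos, int is_func)
{
  const uint32_t *sect = is_func ? t->func : t->objt;
  size_t n = is_func ? t->nfunc : t->nobjt;

  return pos < n ? sect[pos] : 0;
}
```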
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
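The indexed lookup path (sort the index by name when the sorted flag is absent, then binary-search it) can be sketched with the standard qsort and bsearch. This is an illustration under stated assumptions. libctf keeps names (as string offsets) and types in separate parallel arrays, whereas this hypothetical toy pairs them in one struct of C strings. The `sorted` argument stands in for the CTF_F_IDXSORTED header flag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: the type of the symbol called NAME.  */
typedef struct
{
  const char *name;
  uint32_t type;
} toy_idx_entry_t;

static int
toy_entry_cmp (const void *a, const void *b)
{
  const toy_idx_entry_t *x = a;
  const toy_idx_entry_t *y = b;

  return strcmp (x->name, y->name);
}

/* Sort by name unless the producer already did (cf. CTF_F_IDXSORTED).  */
static void
toy_idx_prepare (toy_idx_entry_t *idx, size_t n, int sorted)
{
  if (!sorted)
    qsort (idx, n, sizeof (toy_idx_entry_t), toy_entry_cmp);
}

/* Binary-search the sorted index; return 0 if NAME is absent.  */
static uint32_t
toy_idx_lookup (const toy_idx_entry_t *idx, size_t n, const char *name)
{
  toy_idx_entry_t key = { name, 0 };
  const toy_idx_entry_t *hit;

  hit = bsearch (&key, idx, n, sizeof (toy_idx_entry_t), toy_entry_cmp);
  return hit ? hit->type : 0;
}
```

Sorting once on open means every later lookup costs O(log n), and the compiler is spared from having to emit symbols in any particular order.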
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
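A minimal sketch of why iteration needs no sorting: it simply walks the section in storage order and skips pads. This is hypothetical and is not the signature of the real ctf_symbol_next, which uses a ctf_next_t iterator and also returns symbol names. It assumes, as above, that pads are type ID 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  size_t pos;   /* Next section entry to examine.  */
} toy_symiter_t;

/* Return the position of the next entry with a known type, storing the
   type in *TYPEP, or -1 once the section is exhausted.  */
static long
toy_symbol_next (const uint32_t *sect, size_t n, toy_symiter_t *it,
                 uint32_t *typep)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;

      if (sect[i] != 0)
        {
          *typep = sect[i];
          return (long) i;
        }
    }
  return -1;
}
```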
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
/* Return the pointer to the internal CTF type data corresponding to the
   given type ID.  If the ID is invalid, the function returns NULL.
   This function is not exported outside of the library.  */

const ctf_type_t *
ctf_lookup_by_id (ctf_dict_t **fpp, ctf_id_t type)
2019-04-24 18:15:33 +08:00
{
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
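The split between the read-only static range and newly added types reduces to a simple index comparison. The sketch below is hypothetical. The toy return codes stand in for libctf's ECTF_RDONLY and ECTF_BADID, and `stypes` mirrors the role of the ctf_stypes count described in the ChangeLog below.

```c
#include <assert.h>

/* Hypothetical stand-ins for ECTF_RDONLY and ECTF_BADID.  */
enum { TOY_OK = 0, TOY_ERR_RDONLY, TOY_ERR_BADID };

typedef struct
{
  unsigned long stypes;   /* Types 1..stypes came from the opened buffer.  */
  unsigned long typemax;  /* Highest type index, static or added later.  */
} toy_dict_t;

static int
toy_type_is_static (const toy_dict_t *fp, unsigned long idx)
{
  return idx > 0 && idx <= fp->stypes;
}

/* May this type be modified (say, by adding struct members)?  */
static int
toy_check_modifiable (const toy_dict_t *fp, unsigned long idx)
{
  if (idx == 0 || idx > fp->typemax)
    return TOY_ERR_BADID;
  if (toy_type_is_static (fp, idx))
    return TOY_ERR_RDONLY;
  return TOY_OK;
}
```

A dict created from scratch has no static types (stypes == 0), so every type in it remains modifiable, which matches the old behaviour of writable dicts.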
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
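The naming rule from the second item above (refuse names already used by the ctf_open()ed portion, but keep tolerating repeats among newly added types) can be sketched as a check against the static range. This is a hypothetical flat list with toy_* names. Real libctf keeps separate name tables per namespace (structs, unions, enums and other types), which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
  const char *name;
  unsigned long idx;   /* Type index bearing this name.  */
} toy_named_type_t;

/* Return nonzero if NAME may be given to a newly added type.  */
static int
toy_name_addable (const toy_named_type_t *names, size_t n,
                  unsigned long stypes, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (names[i].idx <= stypes && strcmp (names[i].name, name) == 0)
      return 0;   /* Would shadow a type from the opened dict.  */
  return 1;
}
```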
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
  ctf_dict_t *fp = *fpp;
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
  ctf_id_t idx;

  if ((fp = ctf_get_dict (fp, type)) == NULL)
    {
      (void) ctf_set_errno (*fpp, ECTF_NOPARENT);
      return NULL;
    }

  idx = LCTF_TYPE_TO_INDEX (fp, type);
  if (idx > 0 && (unsigned long) idx <= fp->ctf_typemax)
    {
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
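The last bug above comes from caching symbol lookups by name alone. The sketch below shows the fix using a hypothetical fixed-size linear cache in place of the hash tables libctf uses (`ctf_symhash_func` and `ctf_symhash_objt` in the ChangeLog below). With one cache per symbol kind, an object lookup can never answer a later function lookup for the same name.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_MAX 16

/* Hypothetical cache of name -> symbol index, one instance per kind.  */
typedef struct toy_symcache
{
  const char *names[TOY_CACHE_MAX];
  unsigned long idx[TOY_CACHE_MAX];
  size_t n;
} toy_symcache;

typedef struct toy_symcaches
{
  toy_symcache func;   /* Function symbols only.  */
  toy_symcache objt;   /* Data object symbols only.  */
} toy_symcaches;

static toy_symcache *
toy_cache_for (toy_symcaches *c, int is_function)
{
  return is_function ? &c->func : &c->objt;
}

/* Remember that NAME resolved to symbol IDX for this kind of lookup.  */
static void
toy_cache_add (toy_symcaches *c, int is_function, const char *name,
               unsigned long idx)
{
  toy_symcache *sc = toy_cache_for (c, is_function);

  assert (name != NULL);
  if (sc->n < TOY_CACHE_MAX)
    {
      sc->names[sc->n] = name;
      sc->idx[sc->n++] = idx;
    }
}

/* Return 1 and set *IDX if NAME was cached for this kind, else 0.  */
static int
toy_cache_find (toy_symcaches *c, int is_function, const char *name,
                unsigned long *idx)
{
  toy_symcache *sc = toy_cache_for (c, is_function);

  for (size_t i = 0; i < sc->n; i++)
    if (strcmp (sc->names[i], name) == 0)
      {
        *idx = sc->idx[i];
        return 1;
      }
  return 0;
}
```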
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
*fpp = fp; /* Possibly the parent CTF dict. */
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
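Because both sections are now flat arrays of type IDs, a single reader can serve either one. The sketch below assumes the unindexed layout in native byte order and treats a zero entry as a pad with no type. It also assumes entry i has already been mapped to its symbol; that translation is left out. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read entry I of a symtypetab section (function info or data objects):
   both are arrays of uint32_t type IDs.  Returns 0 for pads and for
   out-of-range entries.  SECT may be unaligned, hence the memcpy.  */
static uint32_t
toy_symtypetab_entry (const unsigned char *sect, size_t sect_size, size_t i)
{
  uint32_t type;

  assert (sect != NULL || sect_size == 0);
  if (i >= sect_size / sizeof (uint32_t))
    return 0;
  memcpy (&type, sect + i * sizeof (uint32_t), sizeof (uint32_t));
  return type;
}
```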
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
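The opening rules for the new flag can be modelled as two small predicates. This sketch rests on two assumptions. First, the flag values are hypothetical placeholders; the real `CTF_F_NEWFUNCINFO` is defined in `ctf.h`. Second, old readers are modelled as rejecting any flag bit they do not know. That follows the behaviour described above and is not a transcription of libctf's actual checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only.  */
#define TOY_F_OLDFLAG      0x1   /* Some flag both readers understand.  */
#define TOY_F_NEWFUNCINFO  0x2
#define TOY_F_KNOWN_OLD    (TOY_F_OLDFLAG)
#define TOY_F_KNOWN_NEW    (TOY_F_OLDFLAG | TOY_F_NEWFUNCINFO)

enum toy_funcinfo_use { TOY_REJECT, TOY_IGNORE_FUNCINFO, TOY_USE_FUNCINFO };

/* An old reader refuses dicts carrying flag bits it does not know.  */
static int
toy_old_reader_accepts (uint32_t flags)
{
  return (flags & ~(uint32_t) TOY_F_KNOWN_OLD) == 0;
}

/* A new reader uses the function info section only if the dict says it
   is in the new format, and disregards it otherwise.  */
static enum toy_funcinfo_use
toy_new_reader_funcinfo (uint32_t flags)
{
  if (flags & ~(uint32_t) TOY_F_KNOWN_NEW)
    return TOY_REJECT;
  return (flags & TOY_F_NEWFUNCINFO) ? TOY_USE_FUNCINFO : TOY_IGNORE_FUNCINFO;
}
```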
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
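From the API's point of view, both calls simply record name -> type mappings in separate tables. The sketch below uses hypothetical fixed-size tables in place of `ctf_funchash` and `ctf_objthash`. The caller supplies the kind; real code would derive it from the type. The serialization-time pass is loosely modelled on `symtypetab_delete_nonstatic_vars` from the ChangeLog below.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical kinds and a tiny name -> type table.  */
enum toy_kind { TOY_K_INTEGER, TOY_K_POINTER, TOY_K_FUNCTION };
enum { TOY_OK = 0, TOY_ERR_NOTFUNC = -1, TOY_ERR_FULL = -2 };

#define TOY_MAXSYMS 8

typedef struct toy_symtab
{
  const char *name[TOY_MAXSYMS];
  unsigned long type[TOY_MAXSYMS];
  size_t n;
} toy_symtab;

static int
toy_symtab_put (toy_symtab *t, const char *name, unsigned long type)
{
  assert (name != NULL);
  if (t->n == TOY_MAXSYMS)
    return TOY_ERR_FULL;
  t->name[t->n] = name;
  t->type[t->n++] = type;
  return TOY_OK;
}

/* Function symbols must have a function type...  */
static int
toy_add_func_sym (toy_symtab *funcs, const char *name, unsigned long type,
                  enum toy_kind kind)
{
  if (kind != TOY_K_FUNCTION)
    return TOY_ERR_NOTFUNC;
  return toy_symtab_put (funcs, name, type);
}

/* ...while data object symbols may have any type at all.  */
static int
toy_add_objt_sym (toy_symtab *objts, const char *name, unsigned long type)
{
  return toy_symtab_put (objts, name, type);
}

static int
toy_symtab_has (const toy_symtab *t, const char *name)
{
  for (size_t i = 0; i < t->n; i++)
    if (strcmp (t->name[i], name) == 0)
      return 1;
  return 0;
}

/* At serialization time, drop variables that are also data objects,
   compacting VARS in place.  */
static void
toy_delete_nonstatic_vars (toy_symtab *vars, const toy_symtab *objts)
{
  size_t out = 0;

  for (size_t i = 0; i < vars->n; i++)
    if (!toy_symtab_has (objts, vars->name[i]))
      {
        vars->name[out] = vars->name[i];
        vars->type[out++] = vars->type[i];
      }
  vars->n = out;
}
```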
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
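An indexed section pairs a name-sorted array of string offsets with the type array, so a lookup becomes a binary search on names. The threshold value and pad-counting rule in this sketch are hypothetical, and libctf's real `CTF_INDEX_PAD_THRESHOLD` and density calculation may differ. The key struct follows the style of the `ctf_lookup_idx_key_t` defined further down this file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule: index once an unindexed section would need more
   than this many pad entries.  */
#define TOY_INDEX_PAD_THRESHOLD 16

static int
toy_should_index (size_t nsyms_total, size_t nsyms_typed)
{
  assert (nsyms_typed <= nsyms_total);
  return nsyms_total - nsyms_typed > TOY_INDEX_PAD_THRESHOLD;
}

/* bsearch key: the name sought and the string table that the index
   entries (string offsets) point into.  */
typedef struct toy_idx_key
{
  const char *strtab;
  const char *name;
} toy_idx_key;

static int
toy_idx_cmp (const void *key_, const void *entry_)
{
  const toy_idx_key *key = key_;
  const uint32_t *entry = entry_;
  return strcmp (key->name, key->strtab + *entry);
}

/* Look NAME up in index NAMES (sorted by name); TYPES is the symtypetab
   section in the same order.  Returns 0 if NAME is absent.  */
static uint32_t
toy_lookup_indexed (const char *strtab, const uint32_t *names,
                    const uint32_t *types, size_t n, const char *name)
{
  assert (strtab != NULL || n == 0);
  toy_idx_key key = { strtab, name };
  const uint32_t *hit = bsearch (&key, names, n, sizeof (uint32_t),
                                 toy_idx_cmp);
  return hit ? types[hit - names] : 0;
}
```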
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
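If the sorted flag is missing, the reader must sort the index and the type array together before it can binary-search them. This sketch uses a hypothetical flag value and sorts the caller's arrays in place for brevity. A real reader working on mapped, read-only sections would instead sort a copy or a separate permutation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_F_IDXSORTED 0x4   /* Hypothetical flag value.  */

typedef struct toy_idx_pair
{
  const char *name;   /* Resolved name, used only for sorting.  */
  uint32_t nameoff;   /* Index section entry.  */
  uint32_t type;      /* Matching symtypetab entry.  */
} toy_idx_pair;

static int
toy_pair_cmp (const void *a_, const void *b_)
{
  const toy_idx_pair *a = a_, *b = b_;
  return strcmp (a->name, b->name);
}

/* Unless FLAGS has TOY_F_IDXSORTED, sort index NAMES (offsets into
   STRTAB) and the parallel TYPES by name.  Returns -1 on OOM.  */
static int
toy_sort_index_if_needed (uint32_t flags, const char *strtab,
                          uint32_t *names, uint32_t *types, size_t n)
{
  toy_idx_pair *pairs;

  if ((flags & TOY_F_IDXSORTED) || n < 2)
    return 0;
  assert (strtab != NULL);
  if ((pairs = malloc (n * sizeof (*pairs))) == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    {
      pairs[i].name = strtab + names[i];
      pairs[i].nameoff = names[i];
      pairs[i].type = types[i];
    }
  qsort (pairs, n, sizeof (*pairs), toy_pair_cmp);
  for (size_t i = 0; i < n; i++)
    {
      names[i] = pairs[i].nameoff;
      types[i] = pairs[i].type;
    }
  free (pairs);
  return 0;
}
```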
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
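An iterator of this kind needs only a cursor, not a sorted view. In this sketch, a hypothetical `toy_sym_iter` cursor stands in for libctf's `ctf_next_t`. The name array runs parallel to the type array, and zero is the pad value. The real `ctf_symbol_next` also chooses between function and data symbols, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state.  */
typedef struct toy_sym_iter
{
  size_t pos;
} toy_sym_iter;

/* Return the next typed entry, skipping pads (type 0), and set *NAME
   from the parallel NAMES array.  Returns 0 at the end.  Entries come
   back in section order, so no sorting is required.  */
static uint32_t
toy_symbol_next (toy_sym_iter *it, const char *const *names,
                 const uint32_t *types, size_t n, const char **name)
{
  assert (it != NULL && name != NULL);
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *name = names[i];
          return types[i];
        }
    }
  return 0;
}
```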
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
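The change amounts to resolving external strtab offsets from a mapping that is built as the linker hands strings over, rather than from the written strtab. This is a minimal sketch. It uses a hypothetical fixed-size table, and libctf's own bookkeeping is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EXT 8

/* Hypothetical record of strings the linker says live at a given offset
   in the external strtab, which has not been written out yet.  */
typedef struct toy_ext_strtab
{
  uint32_t off[TOY_MAX_EXT];
  const char *str[TOY_MAX_EXT];
  size_t n;
} toy_ext_strtab;

/* Record a string as the linker feeds us its strtab.  */
static int
toy_str_add_external (toy_ext_strtab *ext, const char *str, uint32_t off)
{
  if (ext->n == TOY_MAX_EXT)
    return -1;
  ext->off[ext->n] = off;
  ext->str[ext->n++] = str;
  return 0;
}

/* Resolve an external offset to its string, so symbols named by offset
   can be matched up before the strtab is written.  */
static const char *
toy_ext_strptr (const toy_ext_strtab *ext, uint32_t off)
{
  assert (ext != NULL);
  for (size_t i = 0; i < ext->n; i++)
    if (ext->off[i] == off)
      return ext->str[i];
  return NULL;
}
```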
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
return (LCTF_INDEX_TO_TYPEPTR (fp, idx));
}
(void) ctf_set_errno (*fpp, ECTF_BADID);
return NULL;
}
typedef struct ctf_lookup_idx_key
{
ctf_dict_t *clik_fp;
const char *clik_name;
uint32_t *clik_names;
} ctf_lookup_idx_key_t;
2019-04-24 18:15:33 +08:00
/* A bsearch function for variable names. */
static int
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
ctf_lookup_var (const void *key_, const void *lookup_)
2019-04-24 18:15:33 +08:00
{
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
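Resolving an external string before its table exists only requires remembering the offset -> string pairs as they are added. The sketch below illustrates this. The `ext_str` table and `ext_strptr` function are hypothetical: they stand in for what ctf_strptr can now consult, not for libctf's actual atom tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an "external" string: the offset the linker's
   strtab will give it is known, but that strtab is not written yet.  */
struct ext_str
{
  uint32_t offset;
  const char *str;
};

/* Resolve OFFSET from the recorded pairs rather than from a written-out
   string table.  Returns NULL if the offset was never recorded.  */
static const char *
ext_strptr (const struct ext_str *tab, size_t n, uint32_t offset)
{
  for (size_t i = 0; i < n; i++)
    if (tab[i].offset == offset)
      return tab[i].str;
  return NULL;
}
```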
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
const ctf_lookup_idx_key_t *key = key_;
|
|
|
|
const ctf_varent_t *lookup = lookup_;
|
2019-04-24 18:15:33 +08:00
|
|
|
|
2020-11-20 21:34:04 +08:00
|
|
|
return (strcmp (key->clik_name, ctf_strptr (key->clik_fp, lookup->ctv_name)));
|
2019-04-24 18:15:33 +08:00
|
|
|
}
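The comparator above follows the bsearch() convention: the search key comes first and an array element second. The key carries the dict because each entry stores only a string-table offset, so the comparator needs the table to turn that offset into a name. The self-contained sketch below reproduces the pattern. It is hypothetical: `var_key` and `var_ent` are simplified stand-ins for ctf_lookup_idx_key_t and ctf_varent_t, and a raw string table replaces the dict plus ctf_strptr().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins: the key carries the string table so that entry
   names, stored only as offsets, can be resolved during comparison.  */
struct var_key
{
  const char *strtab;
  const char *name;
};

struct var_ent
{
  uint32_t name_off;   /* Offset of the variable name in strtab.  */
  uint32_t type;
};

/* bsearch comparator: KEY_ is the search key, LOOKUP_ an array element.  */
static int
var_cmp (const void *key_, const void *lookup_)
{
  const struct var_key *key = key_;
  const struct var_ent *lookup = lookup_;

  return strcmp (key->name, key->strtab + lookup->name_off);
}

/* VARS must be sorted by name.  Returns 0 if NAME is not found.  */
static uint32_t
var_lookup (const char *strtab, const struct var_ent *vars, size_t n,
            const char *name)
{
  struct var_key key = { strtab, name };
  const struct var_ent *ent = bsearch (&key, vars, n, sizeof (vars[0]),
                                       var_cmp);
  return ent ? ent->type : 0;
}
```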
|
|
|
|
|
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
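Making a *range* of types read-only reduces to a comparison against the number of types present at open time. The sketch below models that check. The struct and function names are hypothetical, and the model ignores parent/child type-ID offsets. It assumes type IDs start at 1 and that IDs up to the static count are the ones read in, which is the role the commit gives to ctf_stypes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a dict opened with ctf_open() and then extended.  */
struct dict_range_sketch
{
  uint32_t stypes;   /* Types present when the dict was opened.  */
  uint32_t ntypes;   /* Total types: static plus newly added.  */
};

static bool
type_is_static (const struct dict_range_sketch *d, uint32_t id)
{
  return id >= 1 && id <= d->stypes;
}

/* Returns 0 if type ID may be modified, or -1 if it does not exist or was
   read in at open time (where libctf would report ECTF_RDONLY).  */
static int
check_modifiable (const struct dict_range_sketch *d, uint32_t id)
{
  if (id == 0 || id > d->ntypes || type_is_static (d, id))
    return -1;
  return 0;
}
```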
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
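The fix for the last bug above is to keep one lookup cache per symbol kind, so a name cached by an object lookup cannot satisfy a function lookup. The sketch below shows that split. It is hypothetical: linear fixed-size arrays stand in for the ctf_symhash_func and ctf_symhash_objt hashtables, and only names are cached, not the symbol indexes libctf would associate with them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 16   /* Arbitrary size for the sketch.  */

struct name_cache
{
  const char *names[CACHE_MAX];
  size_t n;
};

/* One cache per symbol kind.  */
struct sym_caches
{
  struct name_cache func;
  struct name_cache objt;
};

static bool
cache_has (const struct name_cache *c, const char *name)
{
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return true;
  return false;
}

static void
cache_add (struct name_cache *c, const char *name)
{
  if (c->n < CACHE_MAX && !cache_has (c, name))
    c->names[c->n++] = name;
}

/* Consult or update only the cache matching the kind of lookup.  */
static bool
cached (const struct sym_caches *sc, const char *name, bool is_function)
{
  return cache_has (is_function ? &sc->func : &sc->objt, name);
}

static void
remember (struct sym_caches *sc, const char *name, bool is_function)
{
  cache_add (is_function ? &sc->func : &sc->objt, name);
}
```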
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
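Splitting ctf_lookup_variable so that a "this dict only" helper is wrapped by a lookup that falls back to the parent can be sketched as follows. The `vdict` structure and both functions are hypothetical simplifications. A real child dict would also search its dynamic (newly added) variables and deal with parent/child type-ID translation, neither of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dict with a flat variable table and an optional parent.  */
struct var
{
  const char *name;
  uint32_t type;
};

struct vdict
{
  const struct var *vars;
  size_t nvars;
  const struct vdict *parent;
};

/* Look only in D itself (the role ctf_lookup_variable_here plays).  */
static uint32_t
var_here (const struct vdict *d, const char *name)
{
  for (size_t i = 0; i < d->nvars; i++)
    if (strcmp (d->vars[i].name, name) == 0)
      return d->vars[i].type;
  return 0;
}

/* Look in D first, then in its parent if there is one.  */
static uint32_t
var_lookup_with_parent (const struct vdict *d, const char *name)
{
  uint32_t type = var_here (d, name);
  if (type == 0 && d->parent != NULL)
    type = var_here (d->parent, name);
  return type;
}
```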
2023-12-20 00:58:19 +08:00
|
|
|
/* Given a variable name, return the type of the variable with that name.
|
|
|
|
Look only in this dict, not in the parent. */
|
2019-04-24 18:15:33 +08:00
|
|
|
|
|
|
|
ctf_id_t
|
2023-12-20 00:58:19 +08:00
|
|
|
ctf_lookup_variable_here (ctf_dict_t *fp, const char *name)
|
2019-04-24 18:15:33 +08:00
|
|
|
{
|
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
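The central rule above is that read-only status now attaches to a range of type IDs rather than to the whole dict. Below is a minimal sketch of that rule under simplifying assumptions: a toy dict in which IDs 1 to `stypes` were read in and later IDs were added afterwards, with names kept in a plain array. `toy_tdict`, `toy_set_array`, `toy_add_named` and the `TOY_E*` codes are hypothetical stand-ins, not libctf's API; per the description above, libctf reports the first kind of failure as ECTF_RDONLY.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; not libctf's.  */
enum { TOY_OK = 0, TOY_ERDONLY = -1, TOY_EDUPLICATE = -2, TOY_EFULL = -3 };

#define TOY_MAXTYPES 16

/* Toy dict: IDs 1..ntypes are live; IDs 1..stypes came from the opened
   file and are immutable, later IDs were added afterwards.  */
typedef struct toy_tdict
{
  const char *names[TOY_MAXTYPES + 1];
  size_t array_nelems[TOY_MAXTYPES + 1];
  size_t ntypes;   /* Highest type ID in use.  */
  size_t stypes;   /* Number of static (read-in) types.  */
} toy_tdict;

static int
toy_is_static (const toy_tdict *fp, size_t id)
{
  return id >= 1 && id <= fp->stypes;
}

/* Changing an existing type is only allowed for dynamic types.  */
static int
toy_set_array (toy_tdict *fp, size_t id, size_t nelems)
{
  assert (id >= 1 && id <= fp->ntypes);
  if (toy_is_static (fp, id))
    return TOY_ERDONLY;
  fp->array_nelems[id] = nelems;
  return TOY_OK;
}

/* Adding a named type fails only if the name collides with a static
   type; duplicate names among newly added types are still tolerated.
   Returns the new type ID, or a negative error code.  */
static long
toy_add_named (toy_tdict *fp, const char *name)
{
  size_t i;

  for (i = 1; i <= fp->stypes; i++)
    if (fp->names[i] != NULL && strcmp (fp->names[i], name) == 0)
      return TOY_EDUPLICATE;

  if (fp->ntypes >= TOY_MAXTYPES)
    return TOY_EFULL;

  fp->ntypes++;
  fp->names[fp->ntypes] = name;
  fp->array_nelems[fp->ntypes] = 0;
  return (long) fp->ntypes;
}
```

In this sketch, modifying type 1 of a dict opened with two types fails, while a freshly added type can be changed; re-adding a name from the static range is rejected, but repeating a newly added name is accepted, in line with the compatibility argument given above.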
2023-12-20 00:58:19 +08:00
ctf_dvdef_t *dvd = ctf_dvd_lookup (fp, name);
2019-04-24 18:15:33 +08:00
ctf_varent_t *ent;
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
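The indexed symtypetab layout described above amounts to a name-sorted table searched by binary search, sorted on load when the producer did not set CTF_F_IDXSORTED. A minimal sketch, assuming the index section (names) and the type section (type IDs) have already been decoded into name/type pairs; the real code keeps them as separate arrays and resolves names through string-table offsets. `toy_symtype`, `toy_symidx_prepare` and `toy_lookup_indexed` are hypothetical, and returning 0 for a missing symbol is a convention of this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One decoded entry: a symbol name from the index section paired with
   the type ID at the same position in the type section.  */
typedef struct toy_symtype
{
  const char *name;
  uint32_t type;
} toy_symtype;

static int
toy_symtype_cmp (const void *a, const void *b)
{
  const toy_symtype *x = a;
  const toy_symtype *y = b;
  return strcmp (x->name, y->name);
}

/* Sort by name unless the producer promised the entries already are
   (the role CTF_F_IDXSORTED plays in the description above).  */
static void
toy_symidx_prepare (toy_symtype *ents, size_t n, int already_sorted)
{
  if (!already_sorted)
    qsort (ents, n, sizeof (toy_symtype), toy_symtype_cmp);
}

/* Binary-search the sorted entries; 0 means "not found" in this sketch.  */
static uint32_t
toy_lookup_indexed (const toy_symtype *ents, size_t n, const char *name)
{
  toy_symtype key = { name, 0 };
  const toy_symtype *hit;

  assert (name != NULL);
  hit = bsearch (&key, ents, n, sizeof (toy_symtype), toy_symtype_cmp);
  return hit != NULL ? hit->type : 0;
}
```

Sorting once at load time keeps each later lookup logarithmic, which is why a compiler-emitted index that arrives unsorted is worth the one-off sort.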
2020-11-20 21:34:04 +08:00
ctf_lookup_idx_key_t key = { fp, name, NULL };
2019-04-24 18:15:33 +08:00
2023-12-20 00:58:19 +08:00
if (dvd != NULL)
return dvd->dvd_type;
2019-04-24 18:15:33 +08:00
/* This array is sorted, so we can bsearch for it. */
ent = bsearch (&key, fp->ctf_vars, fp->ctf_nvars, sizeof (ctf_varent_t),
ctf_lookup_var);
if (ent == NULL)
2023-12-20 00:58:19 +08:00
return (ctf_set_typed_errno (fp, ECTF_NOTYPEDAT));
|
|
|
|
|
|
|
|
return ent->ctv_type;
|
|
|
|
}
|
2023-04-06 00:21:32 +08:00
|
|
|
|
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
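The read-only rule described above can be pictured with a toy model. This is a self-contained sketch, not libctf code. `toy_dict`, `toy_add_type`, `toy_modify_type` and the `TOY_*` codes are hypothetical. It also assumes, as the `ctf_stypes` changelog entry suggests, that static types occupy IDs 1 to `stypes` and that added types are numbered after them. The sketch shows both halves of the change. Modifying a static type fails, which is the analogue of ECTF_RDONLY. A new type may not reuse a name from the static range, but repeating a name that was itself only just added is tolerated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for libctf's ECTF_* values.  */
enum { TOY_OK = 0, TOY_ERR_RDONLY = -1, TOY_ERR_CONFLICT = -2,
       TOY_ERR_FULL = -3, TOY_ERR_BADID = -4 };

#define TOY_MAX_TYPES 16

/* Toy dict: IDs 1..stypes were read in when the dict was opened; IDs
   stypes+1..ntypes were added afterwards.  names[id] is the type's name
   (NULL if anonymous); slot 0 is unused because type IDs start at 1.  */
struct toy_dict
{
  size_t stypes;
  size_t ntypes;
  const char *names[TOY_MAX_TYPES + 1];
};

static bool
toy_static_type (const struct toy_dict *d, size_t id)
{
  return id >= 1 && id <= d->stypes;
}

/* Add a type.  Reusing a name from the static range is refused, since that
   could disturb existing type relationships; repeating a name that was only
   just added is tolerated.  Returns the new ID or a negative error.  */
static long
toy_add_type (struct toy_dict *d, const char *name)
{
  if (name != NULL)
    for (size_t id = 1; id <= d->stypes; id++)
      if (d->names[id] != NULL && strcmp (d->names[id], name) == 0)
        return TOY_ERR_CONFLICT;
  if (d->ntypes >= TOY_MAX_TYPES)
    return TOY_ERR_FULL;
  d->names[++d->ntypes] = name;
  return (long) d->ntypes;
}

/* Modifying a type (adding members, setting array info, ...) is allowed
   only outside the static range.  */
static int
toy_modify_type (const struct toy_dict *d, size_t id)
{
  if (id == 0 || id > d->ntypes)
    return TOY_ERR_BADID;
  return toy_static_type (d, id) ? TOY_ERR_RDONLY : TOY_OK;
}
```

A real dict finds names through hash tables rather than a linear scan. The loop here only keeps the sketch short.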
2023-12-20 00:58:19 +08:00
/* As above, but look in the parent too.  */

2019-04-24 18:15:33 +08:00
|
|
|
|
2023-12-20 00:58:19 +08:00
ctf_id_t
ctf_lookup_variable (ctf_dict_t *fp, const char *name)
{
  ctf_id_t type;

  if ((type = ctf_lookup_variable_here (fp, name)) == CTF_ERR)
    {
      if (ctf_errno (fp) == ECTF_NOTYPEDAT && fp->ctf_parent != NULL)
        {
          if ((type = ctf_lookup_variable_here (fp->ctf_parent, name)) != CTF_ERR)
            return type;
          return (ctf_set_typed_errno (fp, ctf_errno (fp->ctf_parent)));
        }

      return -1;                /* errno is set for us.  */
2019-04-24 18:15:33 +08:00
    }

2023-12-20 00:58:19 +08:00
  return type;
2019-04-24 18:15:33 +08:00
}

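The control flow of `ctf_lookup_variable` can be exercised in isolation. It searches the child with `ctf_lookup_variable_here`, falls back to the parent only on a not-found error, and copies the parent's error back onto the child. The sketch below is a toy with hypothetical names (`toy_vardict`, `toy_lookup_here`, `toy_lookup`). Plain integers stand in for `ctf_id_t` and the ECTF_* codes.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ERR_ID ((long) -1)   /* Stands in for CTF_ERR.  */
enum { TOY_NOTFOUND = 1 };       /* Stands in for ECTF_NOTYPEDAT.  */

struct toy_var { const char *name; long type; };

struct toy_vardict
{
  const struct toy_var *vars;
  size_t nvars;
  struct toy_vardict *parent;    /* NULL for a dict with no parent.  */
  int err;                       /* Last error, like ctf_errno().  */
};

/* Look only in D itself (the analogue of ctf_lookup_variable_here).  */
static long
toy_lookup_here (struct toy_vardict *d, const char *name)
{
  for (size_t i = 0; i < d->nvars; i++)
    if (strcmp (d->vars[i].name, name) == 0)
      return d->vars[i].type;
  d->err = TOY_NOTFOUND;
  return TOY_ERR_ID;
}

/* Look in D, then in its parent, copying any parent error back onto D so
   callers only ever need to inspect the dict they passed in.  */
static long
toy_lookup (struct toy_vardict *d, const char *name)
{
  long type = toy_lookup_here (d, name);

  if (type != TOY_ERR_ID)
    return type;
  if (d->err == TOY_NOTFOUND && d->parent != NULL)
    {
      if ((type = toy_lookup_here (d->parent, name)) != TOY_ERR_ID)
        return type;
      d->err = d->parent->err;
    }
  return TOY_ERR_ID;
}
```

Because the child is consulted first, a variable defined in both dicts resolves to the child's entry.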
libctf, include: new functions for looking up enumerators
Three new functions for looking up the enum type containing a given
enumeration constant, and optionally that constant's value.
The simplest, ctf_lookup_enumerator, looks up a root-visible enumerator by
name in one dict: if the dict contains multiple such constants (which is
possible for dicts created by older versions of the libctf deduplicator),
ECTF_DUPLICATE is returned.
The next simplest, ctf_lookup_enumerator_next, is an iterator which returns
all enumerators with a given name in a given dict, whether root-visible or
not.
The most elaborate, ctf_arc_lookup_enumerator_next, finds all
enumerators with a given name across all dicts in an entire CTF archive,
whether root-visible or not, starting looking in the shared parent dict;
opened dicts are cached (as with all other ctf_arc_*lookup functions) so
that repeated use does not incur repeated opening costs.
All three of these return enumerator values as int64_t: unfortunately, API
compatibility concerns prevent us from doing the same with the other older
enum-related functions, which all return enumerator constant values as ints.
We may be forced to add symbol-versioning compatibility aliases that fix the
other functions in due course, bumping the soname for platforms that do not
support such things.
ctf_arc_lookup_enumerator_next is implemented as a nested ctf_archive_next
iterator, and inside that, a nested ctf_lookup_enumerator_next iterator
within each dict. To aid in this, add support to ctf_next_t iterators for
iterators that are implemented in terms of two simultaneous nested iterators
at once. (It has always been possible for callers to use as many nested or
semi-overlapping ctf_next_t iterators as they need, which is one of the
advantages of this style over the _iter style that calls a function for each
thing iterated over: the iterator change here permits *ctf_next_t iterators
themselves* to be implemented by iterating using multiple other iterators as
part of their internal operation, transparently to the caller.)
Also add a testcase that tests all these functions (which is fairly easy
because ctf_arc_lookup_enumerator_next is implemented in terms of
ctf_lookup_enumerator_next) in addition to enumeration addition in
ctf_open()ed dicts, ctf_add_enumerator duplicate enumerator addition, and
conflicting enumerator constant deduplication.
include/
* ctf-api.h (ctf_lookup_enumerator): New.
(ctf_lookup_enumerator_next): Likewise.
(ctf_arc_lookup_enumerator_next): Likewise.
libctf/
* libctf.ver: Add them.
* ctf-impl.h (ctf_next_t) <ctn_next_inner>: New.
* ctf-util.c (ctf_next_copy): Copy it.
(ctf_next_destroy): Destroy it.
* ctf-lookup.c (ctf_lookup_enumerator): New.
(ctf_lookup_enumerator_next): New.
* ctf-archive.c (ctf_arc_lookup_enumerator_next): New.
* testsuite/libctf-lookup/enumerator-iteration.*: New test.
* testsuite/libctf-lookup/enum-ctf-2.c: New test CTF, used by the
above.
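The nested-iterator arrangement described for `ctf_arc_lookup_enumerator_next` can be sketched in plain C. This is a toy under stated assumptions, not the libctf implementation, and all types and functions are hypothetical. A plain dict index stands in for the outer `ctf_archive_next` iterator. The inner iterator pointer kept inside the outer state plays the part the commit assigns to `ctn_next_inner`. As with `ctf_next_t`, the caller starts with a NULL iterator pointer, and each iterator frees itself and resets the pointer when it runs out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins, not libctf API: a "dict" is an array of enumerators and an
   "archive" is an array of dicts.  */
struct toy_enumerator { const char *name; int64_t value; };
struct toy_edict { const struct toy_enumerator *ens; size_t n; };
struct toy_archive { const struct toy_edict *dicts; size_t n; };

/* Inner iterator state: position within one dict.  */
struct toy_inner { size_t pos; };

/* Outer iterator state: current dict plus the live inner iterator.  */
struct toy_outer { size_t dict; struct toy_inner *inner; };

/* Inner step.  Returns 1 and sets *VAL on a match; returns 0, freeing *IT
   and resetting it to NULL, once the dict is exhausted; -1 on ENOMEM.  */
static int
toy_dict_next (const struct toy_edict *d, const char *name,
               struct toy_inner **it, int64_t *val)
{
  if (*it == NULL && (*it = calloc (1, sizeof (**it))) == NULL)
    return -1;

  while ((*it)->pos < d->n)
    {
      const struct toy_enumerator *e = &d->ens[(*it)->pos++];
      if (strcmp (e->name, name) == 0)
        {
          *val = e->value;
          return 1;
        }
    }
  free (*it);
  *it = NULL;
  return 0;
}

/* Outer step: run one inner iterator per dict, moving on whenever it
   finishes, and report which dict each match came from.  */
static int
toy_archive_next (const struct toy_archive *a, const char *name,
                  struct toy_outer **it, size_t *dict, int64_t *val)
{
  if (*it == NULL && (*it = calloc (1, sizeof (**it))) == NULL)
    return -1;

  while ((*it)->dict < a->n)
    {
      int r = toy_dict_next (&a->dicts[(*it)->dict], name,
                             &(*it)->inner, val);
      if (r == 1)
        {
          *dict = (*it)->dict;
          return 1;
        }
      if (r < 0)
        break;
      (*it)->dict++;
    }

  /* Finished or out of memory: tear down, as ctf_next_destroy would.  */
  int ret = (*it)->dict < a->n ? -1 : 0;
  free (*it);
  *it = NULL;
  return ret;
}
```

Each level owns and frees its own state, so a caller can run several of these iterators side by side. That property is what makes nesting one inside another straightforward.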
2024-06-12 03:58:00 +08:00
/* Look up a single enumerator by enumeration constant name.  Returns the ID of
   the enum it is contained within and optionally its value.  Error out with
   ECTF_DUPLICATE if multiple exist (which can happen in some older dicts).  See
   ctf_lookup_enumerator_next in that case.  Enumeration constants in non-root
   types are not returned, but constants in parents are, if not overridden by
   an enum in the child.  */

ctf_id_t
ctf_lookup_enumerator (ctf_dict_t *fp, const char *name, int64_t *enum_value)
{
  ctf_id_t type;
  int enum_int_value;

  if (ctf_dynset_lookup (fp->ctf_conflicting_enums, name))
    return (ctf_set_typed_errno (fp, ECTF_DUPLICATE));

  /* CTF_K_UNKNOWN suffices for things like enumeration constants that aren't
     actually types at all (ending up in the global name table).  */
  type = ctf_lookup_by_rawname (fp, CTF_K_UNKNOWN, name);
  /* Nonexistent type? It may be in the parent.  */
  if (type == 0 && fp->ctf_parent)
    {
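      /* The recursive call reports failure as CTF_ERR (never 0), with the
         error set on the parent: copy it to this dict.  */
      if ((type = ctf_lookup_enumerator (fp->ctf_parent, name, enum_value))
          == CTF_ERR)
        return ctf_set_typed_errno (fp, ctf_errno (fp->ctf_parent));
      return type;
    }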

  /* Nothing more to do if this type didn't exist or we don't have to look up
     the enum value.  */
  if (type == 0)
    return ctf_set_typed_errno (fp, ECTF_NOENUMNAM);

  if (enum_value == NULL)
    return type;

  if (ctf_enum_value (fp, type, name, &enum_int_value) < 0)
    return CTF_ERR;
  *enum_value = enum_int_value;

  return type;
}

/* Return all enumeration constants with a given name in a given dict, similar
   to ctf_lookup_enumerator above but capable of returning multiple values.
   Enumerators in parent dictionaries are not returned: enumerators in
   hidden types *are* returned.  */

ctf_id_t
ctf_lookup_enumerator_next (ctf_dict_t *fp, const char *name,
                            ctf_next_t **it, int64_t *val)
{
  ctf_next_t *i = *it;
  int found = 0;

  /* We use ctf_type_next() to iterate across all types, but then traverse each
     enumerator found by hand: traversing enumerators is very easy, and it would
     probably be more confusing to use two nested iterators than to do it this
     way.  We use ctn_next to work over enums, then ctn_en and ctn_n to work
     over enumerators within each enum.  */
  if (!i)
    {
      if ((i = ctf_next_create ()) == NULL)
        return ctf_set_typed_errno (fp, ENOMEM);

      i->cu.ctn_fp = fp;
      i->ctn_iter_fun = (void (*) (void)) ctf_lookup_enumerator_next;
      i->ctn_increment = 0;
      i->ctn_tp = NULL;
      i->u.ctn_en = NULL;
      i->ctn_n = 0;
      *it = i;
    }

  if ((void (*) (void)) ctf_lookup_enumerator_next != i->ctn_iter_fun)
    return (ctf_set_typed_errno (fp, ECTF_NEXT_WRONGFUN));

  if (fp != i->cu.ctn_fp)
    return (ctf_set_typed_errno (fp, ECTF_NEXT_WRONGFP));

  do
    {
      const char *this_name;

      /* At end of enum? Traverse to next one, if any are left.  */

      if (i->u.ctn_en == NULL || i->ctn_n == 0)
        {
          const ctf_type_t *tp;
          ctf_dtdef_t *dtd;

          do
            i->ctn_type = ctf_type_next (i->cu.ctn_fp, &i->ctn_next, NULL, 1);
          while (i->ctn_type != CTF_ERR
                 && ctf_type_kind_unsliced (i->cu.ctn_fp, i->ctn_type)
                 != CTF_K_ENUM);

          if (i->ctn_type == CTF_ERR)
            {
              /* Conveniently, when the iterator over all types is done, so is
                 the iteration as a whole: so we can just pass all errors from
                 the internal iterator straight back out.  */
              ctf_next_destroy (i);
              *it = NULL;
              return CTF_ERR;           /* errno is set for us.  */
            }

          if ((tp = ctf_lookup_by_id (&fp, i->ctn_type)) == NULL)
            return CTF_ERR;             /* errno is set for us.  */
          i->ctn_n = LCTF_INFO_VLEN (fp, tp->ctt_info);

          dtd = ctf_dynamic_type (fp, i->ctn_type);

          if (dtd == NULL)
            {
              (void) ctf_get_ctt_size (fp, tp, NULL, &i->ctn_increment);
              i->u.ctn_en = (const ctf_enum_t *) ((uintptr_t) tp +
                                                  i->ctn_increment);
            }
          else
            i->u.ctn_en = (const ctf_enum_t *) dtd->dtd_vlen;
        }

      this_name = ctf_strptr (fp, i->u.ctn_en->cte_name);

      i->ctn_n--;

      if (strcmp (name, this_name) == 0)
        {
          if (val)
            *val = i->u.ctn_en->cte_value;
          found = 1;

          /* Constant found in this enum: try the next one.  (Constant names
             cannot be duplicated within a given enum.)  */

          i->ctn_n = 0;
        }

      i->u.ctn_en++;
    }
  while (!found);

  return i->ctn_type;
}

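`ctf_lookup_enumerator_next` flattens a two-level walk into one resumable loop. It steps through the types, then through the enumerators of each enum, and keeps its position in `ctn_type`, `ctn_en` and `ctn_n`. The toy below reproduces that shape over plain arrays so the state transitions can be tested without libctf. `toy_type`, `toy_enum_iter` and `toy_enumerator_next` are hypothetical names, and array indices stand in for type IDs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_kind { TOY_K_INT, TOY_K_ENUM, TOY_K_STRUCT };

struct toy_en { const char *name; int64_t value; };
struct toy_type { enum toy_kind kind; const struct toy_en *ens; size_t vlen; };

/* Resumable state, mirroring ctn_type / ctn_en / ctn_n: which type comes
   next, where we are in the current enum, and how many enumerators remain.
   Start with { 0, NULL, 0 }.  */
struct toy_enum_iter
{
  size_t type;               /* Index of the next type to examine.  */
  const struct toy_en *en;   /* Next enumerator in the current enum.  */
  size_t n;                  /* Enumerators left in the current enum.  */
};

/* Returns the index of the next enum containing NAME (setting *VAL), or -1
   when no enums are left.  */
static long
toy_enumerator_next (const struct toy_type *types, size_t ntypes,
                     const char *name, struct toy_enum_iter *it,
                     int64_t *val)
{
  for (;;)
    {
      /* Current enum exhausted: advance to the next enum type.  */
      while (it->n == 0)
        {
          while (it->type < ntypes && types[it->type].kind != TOY_K_ENUM)
            it->type++;
          if (it->type >= ntypes)
            return -1;
          it->en = types[it->type].ens;
          it->n = types[it->type].vlen;
          it->type++;
        }

      const struct toy_en *e = it->en++;
      it->n--;
      if (strcmp (e->name, name) == 0)
        {
          *val = e->value;
          it->n = 0;   /* Names are unique within an enum: skip the rest.  */
          return (long) (it->type - 1);
        }
    }
}
```

Written this way, the `while (it->n == 0)` loop also steps over enums that have no enumerators at all. Resetting `n` to zero after a match skips the rest of that enum, because constant names are unique within one enum.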
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
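The two symtypetab layouts described above can be illustrated with a small standalone sketch. It is not libctf code: the sketch_* names are hypothetical, symbol names are plain C strings rather than string-table offsets, and 0 is assumed to be the pad value. An unindexed section holds one uint32_t per symbol in symbol-table order, so lookup is an array access by symbol index. An indexed section holds only the typed symbols, sorted by name, with a parallel name table, so lookup is a binary search on the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CTF type ID; this sketch uses 0 as the pad.  */
typedef uint32_t sketch_type_id;

/* Emit an unindexed (padded) section into OUT: one entry per symbol, in
   symbol-table order, holding that symbol's type, or 0 if it has none.
   TYPED_NAMES/TYPES are the name -> type pairs known to the dict.
   Returns the number of pads emitted.  */
static size_t
sketch_emit_padded (const char *const *symnames, size_t nsyms,
                    const char *const *typed_names,
                    const sketch_type_id *types, size_t ntyped,
                    sketch_type_id *out)
{
  size_t pads = 0;

  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = 0;
      for (size_t j = 0; j < ntyped; j++)
        if (strcmp (symnames[i], typed_names[j]) == 0)
          {
            out[i] = types[j];
            break;
          }
      if (out[i] == 0)
        pads++;
    }
  return pads;
}

/* Unindexed lookup: the symbol index is the array index.  */
static sketch_type_id
sketch_lookup_padded (const sketch_type_id *section, size_t nsyms,
                      size_t symidx)
{
  return symidx < nsyms ? section[symidx] : 0;
}

/* bsearch comparator: KEY is a name, ELEM points into the name table.  */
static int
sketch_cmp_name (const void *key, const void *elem)
{
  const char *name = key;
  const char *const *entry = elem;

  return strcmp (name, *entry);
}

/* Indexed lookup: SORTED_NAMES is sorted by strcmp and TYPES is parallel to
   it, so a binary search on the name gives the position of the type.  */
static sketch_type_id
sketch_lookup_indexed (const char *const *sorted_names,
                       const sketch_type_id *types, size_t n,
                       const char *name)
{
  const char *const *hit = bsearch (name, sorted_names, n,
                                    sizeof (const char *), sketch_cmp_name);

  return hit != NULL ? types[hit - sorted_names] : 0;
}
```

When most symbols are untyped, the padded array is mostly zeroes; that is the trade-off CTF_INDEX_PAD_THRESHOLD governs. The sketch returns the pad count so a caller could make the same choice, but the real threshold value is not reproduced here.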
typedef struct ctf_symidx_sort_arg_cb
{
  ctf_dict_t *fp;
  uint32_t *names;
} ctf_symidx_sort_arg_cb_t;

static int
sort_symidx_by_name (const void *one_, const void *two_, void *arg_)
{
  const uint32_t *one = one_;
  const uint32_t *two = two_;
  ctf_symidx_sort_arg_cb_t *arg = arg_;

  return (strcmp (ctf_strptr (arg->fp, arg->names[*one]),
                  ctf_strptr (arg->fp, arg->names[*two])));
}

/* Sort a symbol index section by name.  IDX is the index section, LEN bytes
   long: a 1:1 mapping from entries in the corresponding func info / data
   object section to symbol names (as string offsets).  Returns a newly
   allocated array of positions in IDX, ordered so that the names they refer
   to sort lexicographically (these positions are therefore also indexes into
   the func info / data object section), and sets *NIDX to its length.  If the
   section is already flagged CTF_F_IDXSORTED, the result is the identity
   mapping.  */

static uint32_t *
ctf_symidx_sort (ctf_dict_t *fp, uint32_t *idx, size_t *nidx,
                 size_t len)
{
  uint32_t *sorted;
  size_t i;

  if ((sorted = malloc (len)) == NULL)
    {
      ctf_set_errno (fp, ENOMEM);
      return NULL;
    }

  *nidx = len / sizeof (uint32_t);
  for (i = 0; i < *nidx; i++)
    sorted[i] = i;

  if (!(fp->ctf_header->cth_flags & CTF_F_IDXSORTED))
    {
      ctf_symidx_sort_arg_cb_t arg = { fp, idx };
2024-04-02 23:09:11 +08:00
      ctf_dprintf ("Index section unsorted: sorting.\n");
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
      ctf_qsort_r (sorted, *nidx, sizeof (uint32_t), sort_symidx_by_name, &arg);
      fp->ctf_header->cth_flags |= CTF_F_IDXSORTED;
    }

  return sorted;
}
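ctf_symidx_sort never reorders the index section itself: it sorts a separate permutation of entry numbers, comparing the names those entries refer to. A portable sketch of the same idea follows, with hypothetical sketch_* names and C strings standing in for string offsets. ISO C qsort has no user-data argument, so this version passes the name table through a file-scope pointer (which makes it non-reentrant); the libctf code instead uses ctf_qsort_r, which takes the context argument directly. A binary search through the permutation then finds a name and yields its position in the original, unsorted array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparator context.  ISO C qsort has no user-data argument, so this sketch
   uses a file-scope pointer; it is therefore not reentrant.  */
static const char *const *sketch_sort_names;

static int
sketch_cmp_by_name (const void *one_, const void *two_)
{
  const uint32_t *one = one_;
  const uint32_t *two = two_;

  return strcmp (sketch_sort_names[*one], sketch_sort_names[*two]);
}

/* Return a malloc'd permutation of 0..N-1 that orders NAMES
   lexicographically, leaving NAMES itself untouched.  Returns NULL on
   allocation failure.  */
static uint32_t *
sketch_sort_permutation (const char *const *names, size_t n)
{
  uint32_t *perm = malloc (n * sizeof (uint32_t));

  if (perm == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    perm[i] = (uint32_t) i;

  sketch_sort_names = names;
  qsort (perm, n, sizeof (uint32_t), sketch_cmp_by_name);
  return perm;
}

/* Binary-search NAME through the permutation.  Returns its position in the
   original (unsorted) NAMES array, or -1 if it is absent.  */
static long
sketch_find (const char *const *names, const uint32_t *perm, size_t n,
             const char *name)
{
  size_t lo = 0, hi = n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp (name, names[perm[mid]]);

      if (cmp == 0)
        return (long) perm[mid];
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return -1;
}
```

For names { "zeta", "alpha", "mu" } the permutation is { 1, 2, 0 }, and looking up "mu" returns 2, its position in the unsorted array.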

/* Given a symbol index, return the name of that symbol from the table provided
   by ctf_link_shuffle_syms, or failing that from the secondary string table, or
   the null string.  */
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We expect
this lookup to be used many orders of magnitude less heavily than ld.so
symbol lookup, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
each was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more so isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
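The linear-search-plus-cache strategy described for ctf_lookup_symbol_idx can be sketched in a few dozen lines. This is an illustration under stated assumptions, not libctf's implementation: the symbol table is a plain array of C strings, the cache is a small open-addressing hash table (FNV-1a hash) rather than a ctf_dynhash, and all sketch_* names are hypothetical. A miss resumes the scan where the previous miss stopped and caches every name it passes, so across any sequence of lookups the symbol table is walked at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache: name -> symbol index, plus the scan position.  */
typedef struct sketch_symcache
{
  const char *const *symnames;  /* The symbol table, by index.  */
  size_t nsyms;
  size_t latest;                /* Next symbol index not yet scanned.  */
  size_t cap;                   /* Power of two, always > 2 * nsyms.  */
  long *slots;                  /* Symbol index, or -1 for an empty slot.  */
} sketch_symcache;

static uint32_t
sketch_hash (const char *s)
{
  uint32_t h = 2166136261u;     /* FNV-1a.  */

  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

static int
sketch_cache_init (sketch_symcache *c, const char *const *symnames,
                   size_t nsyms)
{
  c->symnames = symnames;
  c->nsyms = nsyms;
  c->latest = 0;
  for (c->cap = 1; c->cap <= nsyms * 2; c->cap *= 2)
    ;
  if ((c->slots = malloc (c->cap * sizeof (long))) == NULL)
    return -1;
  for (size_t i = 0; i < c->cap; i++)
    c->slots[i] = -1;
  return 0;
}

static void
sketch_cache_free (sketch_symcache *c)
{
  free (c->slots);
  c->slots = NULL;
}

/* Find the slot holding NAME, or the empty slot where it would go.  The
   table is never more than half full, so this always terminates.  */
static size_t
sketch_slot (const sketch_symcache *c, const char *name)
{
  size_t i = sketch_hash (name) & (c->cap - 1);

  while (c->slots[i] != -1 && strcmp (c->symnames[c->slots[i]], name) != 0)
    i = (i + 1) & (c->cap - 1);
  return i;
}

/* Return the symbol index of NAME, or -1 if no symbol has that name.  */
static long
sketch_lookup_symbol_idx (sketch_symcache *c, const char *name)
{
  size_t slot = sketch_slot (c, name);

  if (c->slots[slot] != -1)
    return c->slots[slot];      /* Cache hit.  */

  /* Miss: resume the linear scan, caching every name we pass.  */
  while (c->latest < c->nsyms)
    {
      size_t idx = c->latest++;
      const char *sym = c->symnames[idx];
      size_t s = sketch_slot (c, sym);

      if (c->slots[s] == -1)    /* Keep the first index for duplicates.  */
        c->slots[s] = (long) idx;
      if (strcmp (sym, name) == 0)
        return (long) idx;
    }
  return -1;
}
```

The ctf_symhash and ctf_symhash_latest fields listed in the ChangeLog suggest a similar resumable scan, but that is an inference from their names, not something this sketch confirms.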
static const char *
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a "CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldemul.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict>: ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
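The compatibility pattern used above has three parts. The type is renamed, the old typedef is kept, and the old function name becomes a trivial wrapper. A minimal sketch of that pattern follows. Every name in it (ex_dict_t, ex_file_t, ex_dict_close, ex_file_close) is hypothetical and is not libctf's API. The int return value exists only so the behaviour can be checked.

```c
#include <stdlib.h>

/* Hypothetical miniature of the rename: one struct, two type names.  */
struct ex_dict
{
  int refcnt;                     /* freed when this drops to zero */
};

typedef struct ex_dict ex_dict_t; /* new name */
typedef struct ex_dict ex_file_t; /* old name, kept for compatibility */

/* New entry point: drop a reference and free the dict on the last one.
   Returns 1 if the dict was freed, 0 otherwise.  */
int
ex_dict_close (ex_dict_t *fp)
{
  if (fp == NULL)
    return 0;
  if (--fp->refcnt > 0)
    return 0;
  free (fp);
  return 1;
}

/* Old entry point: a trivial wrapper, so only one real implementation
   exists and old callers keep compiling and linking.  */
int
ex_file_close (ex_file_t *fp)
{
  return ex_dict_close (fp);
}
```

Because both typedefs name the same struct, old and new callers can pass the same pointer to either function.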
2020-11-20 21:34:04 +08:00
ctf_lookup_symbol_name (ctf_dict_t *fp, unsigned long symidx)
2019-04-24 18:15:33 +08:00
{
2024-01-05 20:17:27 +08:00
const ctf_sect_t *sp = &fp->ctf_ext_symtab;
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not keep
any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
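The new layout makes a symtypetab lookup a single array index. The sketch below assumes the unindexed form has one uint32_t type ID per symbol slot, with 0 as a pad for "no type recorded". It ignores any mapping from symbol-table index to section slot that the real reader may apply. The names are hypothetical. Function info and data object sections share this layout, so one accessor serves both. Two symbols with the same prototype can point at the same deduplicated type ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor for an unindexed symtypetab: returns the type ID
   recorded for symbol slot SLOT, or 0 if the slot is out of range or is a
   pad.  For the function info section, a nonzero entry is expected to name
   a CTF_K_FUNCTION type; that check is left to the caller.  */
uint32_t
ex_symtypetab_lookup (const uint32_t *sect, size_t nentries, size_t slot)
{
  if (slot >= nentries)
    return 0;                   /* no such symbol slot */
  return sect[slot];            /* 0 means a pad: no type recorded */
}
```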
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
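The compatibility behaviour of the new header flag can be sketched as two small decisions. It assumes that readers refuse any flag bit outside a known-flags mask. The flag values and names here are illustrative only; the real definitions live in ctf.h and may differ.

```c
#include <stdint.h>

/* Illustrative values only; not taken from ctf.h.  */
#define EX_F_COMPRESS     0x1
#define EX_F_NEWFUNCINFO  0x2
#define EX_F_IDXSORTED    0x4

#define EX_F_MAX_OLD  (EX_F_COMPRESS)
#define EX_F_MAX_NEW  (EX_F_COMPRESS | EX_F_NEWFUNCINFO | EX_F_IDXSORTED)

/* An old reader rejects any header carrying flag bits it does not know,
   which is how a dict with the new flag set gets refused.  */
int
ex_old_reader_accepts (uint32_t flags)
{
  return (flags & ~EX_F_MAX_OLD) == 0;
}

enum ex_funcinfo_action { EX_REFUSE, EX_IGNORE_FUNCINFO, EX_USE_FUNCINFO };

/* A new reader still refuses unknown bits, but without the new flag it
   disregards the function info section rather than misparse it.  */
enum ex_funcinfo_action
ex_new_reader_action (uint32_t flags)
{
  if (flags & ~EX_F_MAX_NEW)
    return EX_REFUSE;
  if (!(flags & EX_F_NEWFUNCINFO))
    return EX_IGNORE_FUNCINFO;
  return EX_USE_FUNCINFO;
}
```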
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
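The two-phase scheme above works in two steps. First, symbols are recorded by name with their types. Later, once the linker reports the final symbol order, they are laid out by symbol index. The sketch below models this with hypothetical names. Small fixed arrays with linear search stand in for the hash tables. The caller passes the type's kind directly, whereas the real library would presumably look it up from the type ID.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MAX_SYMS 16

enum ex_kind { EX_K_INTEGER, EX_K_POINTER, EX_K_FUNCTION };

typedef struct
{
  const char *name;
  uint32_t type;
} ex_symtype_t;

/* Hypothetical stand-in for the name -> type function-symbol hash.  */
typedef struct
{
  ex_symtype_t funcs[EX_MAX_SYMS];
  size_t nfuncs;
} ex_symtab_builder_t;

/* Record a function symbol by name; refuse types that are not functions.  */
int
ex_add_func_sym (ex_symtab_builder_t *b, const char *name, uint32_t type,
                 enum ex_kind kind)
{
  if (kind != EX_K_FUNCTION || b->nfuncs >= EX_MAX_SYMS)
    return -1;
  b->funcs[b->nfuncs].name = name;
  b->funcs[b->nfuncs].type = type;
  b->nfuncs++;
  return 0;
}

/* Given the linker's symbol names in symbol-index order, fill OUT with one
   type ID per symbol, using 0 as a pad where no type was recorded.  */
void
ex_shuffle_syms (const ex_symtab_builder_t *b, const char *const *symnames,
                 size_t nsyms, uint32_t *out)
{
  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = 0;
      for (size_t j = 0; j < b->nfuncs; j++)
        if (strcmp (b->funcs[j].name, symnames[i]) == 0)
          {
            out[i] = b->funcs[j].type;
            break;
          }
    }
}
```

Deferring the layout is what allows repeated relinks: the name-keyed records stay valid whatever order the final symbol table ends up in.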
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
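The serialization-time removal of duplicated variables amounts to a filter. The hypothetical sketch below drops every variable whose name also appears among the data-object symbols, compacting the array in place. It uses linear search for brevity. A real implementation would more likely consult the object-symbol hash.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: remove from VARS every name also present in OBJTS,
   preserving the order of the survivors.  Returns the new count.  */
size_t
ex_delete_vars_also_objts (const char **vars, size_t nvars,
                           const char *const *objts, size_t nobjts)
{
  size_t kept = 0;

  for (size_t i = 0; i < nvars; i++)
    {
      int dup = 0;

      for (size_t j = 0; j < nobjts && !dup; j++)
        dup = strcmp (vars[i], objts[j]) == 0;
      if (!dup)
        vars[kept++] = vars[i];
    }
  return kept;
}
```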
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
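The choice between a padded and an indexed section, and the name-keyed lookup an index allows, can be sketched as follows. The threshold value and the exact rule are hypothetical; the real CTF_INDEX_PAD_THRESHOLD value and density test are not reproduced. The index is modelled as a sorted array of names parallel to an array of type IDs, searched with the standard bsearch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_PAD_THRESHOLD 3      /* hypothetical, not libctf's value */

/* Emit indexed (1) rather than padded (0) when more than the threshold
   number of the NSYMS symbol slots would be pads.  */
int
ex_use_index (size_t nsyms, size_t ntyped)
{
  return ntyped < nsyms && nsyms - ntyped > EX_PAD_THRESHOLD;
}

static int
ex_name_cmp (const void *key, const void *elt)
{
  return strcmp ((const char *) key, *(const char *const *) elt);
}

/* NAMES is sorted and NAMES[i] labels TYPES[i]; return the type for NAME,
   or 0 if the index has no entry for it.  */
uint32_t
ex_indexed_lookup (const char *const *names, const uint32_t *types,
                   size_t n, const char *name)
{
  const char *const *hit = bsearch (name, names, n, sizeof (names[0]),
                                    ex_name_cmp);
  return hit ? types[hit - names] : 0;
}
```

An index costs one name per typed symbol, while padding costs one slot per symbol. That is why sparse sections favour the index.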
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
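Sorting an index only when the header does not claim it is already sorted might look like the sketch below. The flag value is illustrative. The entries are hypothetical name/type pairs, so one qsort keeps names and types together; the real code sorts separate arrays.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_F_IDXSORTED 0x4      /* illustrative value, not from ctf.h */

typedef struct
{
  const char *name;
  uint32_t type;
} ex_idx_entry_t;

static int
ex_entry_cmp (const void *a, const void *b)
{
  const ex_idx_entry_t *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort IDX by name unless HDR_FLAGS says it is already sorted, so that
   later lookups can binary-search it.  Returns 1 if a sort was done.  */
int
ex_ensure_sorted (ex_idx_entry_t *idx, size_t n, uint32_t hdr_flags)
{
  if (hdr_flags & EX_F_IDXSORTED)
    return 0;
  qsort (idx, n, sizeof idx[0], ex_entry_cmp);
  return 1;
}
```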
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
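An iterator that needs no sorting can walk the section in storage order and skip pads. The sketch below is hypothetical and does not reproduce ctf_symbol_next's signature. It keeps its position in a caller-owned struct and returns 0 when exhausted.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
  size_t pos;                   /* next slot to examine */
} ex_next_t;

/* Return the next non-pad type in TYPES and set *NAME from the parallel
   SYMNAMES array; return 0 once every slot has been visited.  */
uint32_t
ex_symbol_next (ex_next_t *it, const uint32_t *types,
                const char *const *symnames, size_t n, const char **name)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;

      if (types[i] != 0)
        {
          *name = symnames[i];
          return types[i];
        }
    }
  return 0;
}
```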
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
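Resolving a symbol named only by an external strtab offset, before any CTF string table exists, reduces to a bounds-checked lookup in the buffer the linker handed over. A minimal hypothetical sketch follows. It also rejects strings that run off the end of the buffer without a terminator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: return the NUL-terminated name at OFF in an externally
   supplied string table of LEN bytes, or NULL if OFF is out of range or
   the string is not terminated within the table.  */
const char *
ex_ext_strptr (const char *strtab, size_t len, uint32_t off)
{
  if (off >= len || memchr (strtab + off, '\0', len - off) == NULL)
    return NULL;
  return strtab + off;
}
```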
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
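The ctf_lookup_symbol_name change described above falls back to the parent dict when a child lookup fails. That fallback can be sketched with a hypothetical dict struct holding an optional symbol-name table and a parent pointer. None of these names are libctf's.

```c
#include <stddef.h>

typedef struct ex_dict
{
  const char *const *symnames;  /* NULL if this dict has no symbol table */
  size_t nsyms;
  const struct ex_dict *parent; /* NULL for a top-level dict */
} ex_dict_t;

/* Look SYMIDX up in FP; if that fails, retry in each ancestor in turn.
   Returns NULL if no dict in the chain can name the symbol.  */
const char *
ex_lookup_symbol_name (const ex_dict_t *fp, size_t symidx)
{
  for (; fp != NULL; fp = fp->parent)
    if (fp->symnames != NULL && symidx < fp->nsyms)
      return fp->symnames[symidx];
  return NULL;
}
```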
2020-11-20 21:34:04 +08:00
ctf_link_sym_t sym;
int err;
2019-04-24 18:15:33 +08:00
2020-11-20 21:34:04 +08:00
if (fp->ctf_dynsymidx)
2019-04-24 18:15:33 +08:00
{
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
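The ctf-lookup.c entries above describe an indexed symbol lookup. sort_symidx_by_name sorts an index that arrived unsorted, ctf_lookup_idx_name bsearches it by name, and a lookup that misses in a child dict retries in the parent. The pattern can be sketched in standalone C. This is a minimal sketch under stated assumptions, not libctf's implementation: `toy_entry`, `toy_dict`, `toy_sort_index` and `toy_lookup` are hypothetical names invented here. The index is modelled as (name, type) pairs, the `sorted` field stands in for the CTF_F_IDXSORTED header flag, and 0 is used as a "not found" sentinel.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of a name-indexed symtypetab entry:
   a symbol name paired with a type ID.  Not libctf's real layout.  */
typedef struct toy_entry
{
  const char *name;
  unsigned long type;
} toy_entry;

/* Hypothetical toy dict: an index plus an optional parent dict.  */
typedef struct toy_dict
{
  toy_entry *idx;
  size_t n;
  int sorted;                   /* Stand-in for CTF_F_IDXSORTED.  */
  struct toy_dict *parent;      /* NULL if there is no parent.  */
} toy_dict;

/* Order entries by name; used by both qsort and bsearch.  */
static int
toy_entry_cmp (const void *a, const void *b)
{
  const toy_entry *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* Sort the index by name once, if it was not already sorted
   (as an index produced by a compiler might not be).  */
static void
toy_sort_index (toy_dict *d)
{
  if (!d->sorted)
    {
      qsort (d->idx, d->n, sizeof (toy_entry), toy_entry_cmp);
      d->sorted = 1;
    }
}

/* Binary-search NAME in D's index; on a miss, retry in each parent
   in turn.  Returns 0 (this sketch's sentinel) if no dict has it.  */
static unsigned long
toy_lookup (toy_dict *d, const char *name)
{
  for (; d != NULL; d = d->parent)
    {
      toy_entry key = { name, 0 };
      toy_entry *hit;

      toy_sort_index (d);
      hit = bsearch (&key, d->idx, d->n, sizeof (toy_entry),
                     toy_entry_cmp);
      if (hit != NULL)
        return hit->type;
    }
  return 0;
}
```

For example, with a child index of {"zeta", "alpha", "mid"} whose parent holds {"main", "errno"}, looking up "mid" in the child succeeds after the child index is sorted, while "errno" misses in the child and is found in the parent. Sorting lazily on first lookup keeps the cost off dicts that are never queried by name.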
2020-11-20 21:34:04 +08:00
      err = EINVAL;
      if (symidx > fp->ctf_dynsymmax)
        goto try_parent;

      ctf_link_sym_t *symp = fp->ctf_dynsymidx[symidx];

      if (!symp)
        goto try_parent;

      return symp->st_name;
2019-04-24 18:15:33 +08:00
    }

2020-11-20 21:34:04 +08:00
  err = ECTF_NOSYMTAB;
  if (sp->cts_data == NULL)
    goto try_parent;

2019-04-24 18:15:33 +08:00
  if (symidx >= fp->ctf_nsyms)
2020-11-20 21:34:04 +08:00
    goto try_parent;

  switch (sp->cts_entsize)
2019-04-24 18:15:33 +08:00
    {
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
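The entries for sort_symidx_by_name, ctf_symidx_sort and ctf_lookup_idx_name describe an index that is sorted by name when it arrives unsorted, then searched with bsearch. Here is a standalone sketch of that pattern using the standard `qsort` and `bsearch`.

The `idx_entry` struct pairing a name with a type ID is an assumption made for brevity. In the file format the index and the symbol type section are parallel arrays, and names are strtab offsets rather than pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: a symbol name paired with its type ID.  */
struct idx_entry
{
  const char *name;
  uint32_t type;
};

static int
idx_entry_cmp (const void *a, const void *b)
{
  const struct idx_entry *x = a;
  const struct idx_entry *y = b;
  return strcmp (x->name, y->name);
}

/* Sort an index that arrived unsorted, e.g. one emitted by a compiler
   that does not bother ordering its symbols.  */
void
idx_sort (struct idx_entry *idx, size_t n)
{
  qsort (idx, n, sizeof (idx[0]), idx_entry_cmp);
}

/* Binary-search a sorted index by name; 0 means "not found".  */
uint32_t
idx_lookup (const struct idx_entry *idx, size_t n, const char *name)
{
  struct idx_entry key = { name, 0 };
  const struct idx_entry *hit;

  assert (name != NULL);
  hit = bsearch (&key, idx, n, sizeof (idx[0]), idx_entry_cmp);
  return hit ? hit->type : 0;
}
```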
2020-11-20 21:34:04 +08:00
    case sizeof (Elf64_Sym):
      {
        const Elf64_Sym *symp = (Elf64_Sym *) sp->cts_data + symidx;
        ctf_elf64_to_link_sym (fp, &sym, symp, symidx);
      }
      break;
    case sizeof (Elf32_Sym):
      {
        const Elf32_Sym *symp = (Elf32_Sym *) sp->cts_data + symidx;
        ctf_elf32_to_link_sym (fp, &sym, symp, symidx);
      }
      break;
    default:
      ctf_set_errno (fp, ECTF_SYMTAB);
2019-04-24 18:15:33 +08:00
      return _CTF_NULLSTR;
    }
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
  assert (!sym.st_nameidx_set);
2019-04-24 18:15:33 +08:00
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
  return sym.st_name;
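The switch on `sp->cts_entsize` above, part of ctf_lookup_symbol_name, picks the ELF symbol layout by comparing the symtab's entry size with `sizeof (Elf64_Sym)` and `sizeof (Elf32_Sym)`. This works because the two sizes differ: 24 and 16 bytes respectively.

Here is a standalone sketch of just that dispatch. It assumes a system `<elf.h>`, as on glibc. The hypothetical `raw_sym_name_offset` returns only the `st_name` strtab offset instead of converting the entry to a `ctf_link_sym_t`.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return the st_name (strtab offset) of symbol SYMIDX in a raw symbol
   table whose entries are ENTSIZE bytes long, or -1 if ENTSIZE
   matches neither ELF class.  */
int64_t
raw_sym_name_offset (const void *data, size_t entsize, size_t symidx)
{
  assert (data != NULL);
  switch (entsize)
    {
    case sizeof (Elf64_Sym):
      return ((const Elf64_Sym *) data)[symidx].st_name;
    case sizeof (Elf32_Sym):
      return ((const Elf32_Sym *) data)[symidx].st_name;
    default:
      return -1;    /* The fragment above sets ECTF_SYMTAB here.  */
    }
}
```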
2019-04-24 18:15:33 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
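A simplified model of resolving a name reference that may point into the external strtab. It assumes, for this sketch, that the top bit of a 32-bit reference selects the external table. The external table is usable as soon as the linker hands it over, well before anything is written out. `struct strtabs`, `EXT_BIT` and `strptr` are hypothetical stand-ins, not ctf_strptr itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: the top bit of a name reference selects the
   external string table; the low 31 bits are an offset into it. */
#define EXT_BIT 0x80000000u

struct strtabs
{
  const char *internal;   /* this dict's own string table */
  size_t internal_len;
  const char *external;   /* external strtab, handed over early by the linker */
  size_t external_len;    /* 0 until the linker has supplied it */
};

/* Resolve a name reference to a string, or return NULL if the offset is out
   of range or its table has not been supplied yet. */
static const char *
strptr (const struct strtabs *t, uint32_t ref)
{
  assert (t != NULL);
  const char *base = (ref & EXT_BIT) ? t->external : t->internal;
  size_t len = (ref & EXT_BIT) ? t->external_len : t->internal_len;
  uint32_t off = ref & ~EXT_BIT;

  if (base == NULL || off >= len)
    return NULL;
  return base + off;
}
```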
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
try_parent:
if (fp->ctf_parent)
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
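The shape of that refactoring, sketched over a hypothetical unindexed table. One worker accepts either an index or a name, translating the name to an index when one is given, and two thin wrappers sit on top of it. `lookup_sym_or_name`, `lookup_by_symbol` and `lookup_by_symbol_name` are illustrative names that echo the libctf functions but do not reproduce them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical unindexed table: symbol I has name names[I] and type
   types[I]. */
struct symtypes
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

/* Shared worker: look up by index when NAME is NULL; otherwise translate
   the name to an index first.  Returns 0 if the symbol is unknown. */
static uint32_t
lookup_sym_or_name (const struct symtypes *t, size_t symidx, const char *name)
{
  assert (t != NULL);
  if (name != NULL)
    {
      size_t i;
      for (i = 0; i < t->n; i++)        /* name -> index translation */
        if (strcmp (t->names[i], name) == 0)
          break;
      if (i == t->n)
        return 0;
      symidx = i;
    }
  return symidx < t->n ? t->types[symidx] : 0;
}

/* Thin wrapper: lookup by symbol index. */
static uint32_t
lookup_by_symbol (const struct symtypes *t, size_t symidx)
{
  return lookup_sym_or_name (t, symidx, NULL);
}

/* Thin wrapper: lookup by symbol name. */
static uint32_t
lookup_by_symbol_name (const struct symtypes *t, const char *name)
{
  return lookup_sym_or_name (t, 0, name);
}
```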
The actual name->index lookup is done by ctf_lookup_symbol_idx. We expect
this lookup to be used many orders of magnitude less heavily than ld.so
symbol lookup, so reading the ELF symbol hashes would probably cost more
time than using them saves, and would add a lot of complexity. Instead, do a
linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive_internal field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
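A sketch of a linear search that caches every name -> index mapping it passes and resumes where the previous scan stopped, so that the symtab is walked at most once in total. The fixed-size open-addressing cache and the `scans` counter are assumptions made to keep the example self-contained. `struct symcache` and `lookup_symbol_idx` are hypothetical. Sharing one `struct symcache` between dicts would give the cross-dict effect described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 64   /* must exceed the symtab size in this sketch */

struct symcache
{
  const char *const *symtab;    /* symbol names, in symtab order */
  size_t nsyms;
  size_t latest;                /* next symtab entry not yet scanned */
  const char *keys[CACHE_SLOTS];
  size_t vals[CACHE_SLOTS];
  size_t scans;                 /* entries examined by the linear scan */
};

static size_t
hash_name (const char *s)
{
  size_t h = 5381;              /* djb2 string hash */
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % CACHE_SLOTS;
}

static void
cache_put (struct symcache *c, const char *name, size_t idx)
{
  size_t h = hash_name (name);
  while (c->keys[h] != NULL && strcmp (c->keys[h], name) != 0)
    h = (h + 1) % CACHE_SLOTS;  /* linear probing */
  if (c->keys[h] == NULL)       /* keep the first index seen for a name */
    {
      c->keys[h] = name;
      c->vals[h] = idx;
    }
}

static long
cache_get (const struct symcache *c, const char *name)
{
  size_t h = hash_name (name);
  while (c->keys[h] != NULL)
    {
      if (strcmp (c->keys[h], name) == 0)
        return (long) c->vals[h];
      h = (h + 1) % CACHE_SLOTS;
    }
  return -1;
}

/* Return the symtab index of NAME, or -1 if it is not present. */
static long
lookup_symbol_idx (struct symcache *c, const char *name)
{
  long hit;

  assert (c->nsyms < CACHE_SLOTS);   /* probing always finds a free slot */
  if ((hit = cache_get (c, name)) >= 0)
    return hit;
  /* Resume the scan where the last one stopped, caching every name passed
     along the way so that later lookups can hit in the cache. */
  while (c->latest < c->nsyms)
    {
      size_t i = c->latest++;
      c->scans++;
      cache_put (c, c->symtab[i], i);
      if (strcmp (c->symtab[i], name) == 0)
        return (long) i;
    }
  return -1;
}
```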
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t * it
was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
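A sketch of the archive-level cache. It remembers which dict a name was found in, not the type, and on a cache hit it simply repeats the cheap per-dict lookup. The linear cache list, `MAX_CACHED` and the `dict_probes` counter are illustrative assumptions. `struct archive`, `struct dict` and `arc_lookup_symbol_name` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CACHED 16

/* Hypothetical dict: symbol names with their types (0 means "no type"). */
struct dict
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

struct archive
{
  const struct dict *dicts;
  size_t ndicts;
  /* name -> dict cache, the role ctfi_symnamedicts plays.  Stores the
     caller's name pointer; a real cache would copy the string. */
  const char *cached_name[MAX_CACHED];
  size_t cached_dict[MAX_CACHED];
  size_t ncached;
  size_t dict_probes;           /* dicts searched, to observe the cache */
};

static uint32_t
dict_lookup (const struct dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return 0;
}

/* Look NAME up across all dicts, remembering which dict it was found in.
   Only the dict is cached, not the type. */
static uint32_t
arc_lookup_symbol_name (struct archive *a, const char *name)
{
  assert (a != NULL && name != NULL);
  for (size_t i = 0; i < a->ncached; i++)
    if (strcmp (a->cached_name[i], name) == 0)
      {
        a->dict_probes++;
        return dict_lookup (&a->dicts[a->cached_dict[i]], name);
      }
  for (size_t d = 0; d < a->ndicts; d++)
    {
      uint32_t type;
      a->dict_probes++;
      if ((type = dict_lookup (&a->dicts[d], name)) != 0)
        {
          if (a->ncached < MAX_CACHED)
            {
              a->cached_name[a->ncached] = name;
              a->cached_dict[a->ncached++] = d;
            }
          return type;
        }
    }
  return 0;
}
```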
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicting types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
{
const char *ret;
ret = ctf_lookup_symbol_name (fp->ctf_parent, symidx);
if (ret == NULL)
ctf_set_errno (fp, ctf_errno (fp->ctf_parent));
return ret;
}
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
else
{
ctf_set_errno (fp, err);
return _CTF_NULLSTR;
}
2019-04-24 18:15:33 +08:00
}
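The child-to-parent fallback in the function above, reduced to a self-contained sketch. A dict that cannot name a symbol asks its parent. If the parent fails too, the parent's error is copied onto the child, so that the caller, who only holds the child, can see it. `struct dict`, `SKETCH_ENOENT` and this `lookup_symbol_name` are hypothetical. Unlike the code above, the sketch signals failure with NULL rather than with an empty-string sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_ENOENT = 1 };      /* hypothetical error code */

struct dict
{
  struct dict *parent;
  const char *const *names;      /* symbol names this dict can resolve */
  size_t nnames;
  int err;                       /* last error, like a per-dict errno */
};

/* Return the name of symbol SYMIDX, falling back to the parent when D
   cannot answer.  On failure, set D's error (copying the parent's error if
   the parent was consulted) and return NULL. */
static const char *
lookup_symbol_name (struct dict *d, size_t symidx)
{
  assert (d != NULL);
  if (symidx < d->nnames)
    return d->names[symidx];

  if (d->parent != NULL)
    {
      const char *ret = lookup_symbol_name (d->parent, symidx);
      if (ret == NULL)
        d->err = d->parent->err;   /* propagate the parent's error */
      return ret;
    }

  d->err = SKETCH_ENOENT;
  return NULL;
}
```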
libctf, include: find types of symbols by name
2021-02-17 23:21:12 +08:00
/* Given a symbol name, return the index of that symbol, or -1 on error or if
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
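The split between a read-only static range and a writable dynamic range can be sketched with a toy dict. This is a minimal illustration, not libctf code. It assumes that static types live in one array and added types in a separate growable array; libctf itself uses one big buffer plus per-type hashtables. Only the idea of a static-type count corresponds to libctf's ctf_stypes. Every other struct, function and error name is hypothetical, and ERR_RDONLY merely stands in for ECTF_RDONLY.

```c
/* Hypothetical toy model of static vs. dynamic type ranges; not libctf's API. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ERR_NOTYPE (-1)
#define ERR_RDONLY (-2)   /* hypothetical stand-in for ECTF_RDONLY */

typedef struct toy_type
{
  char name[32];
  int nmembers;           /* something a caller might try to modify */
} toy_type_t;

typedef struct toy_dict
{
  toy_type_t *static_types;   /* read in at open time: IDs 1 .. stypes */
  size_t stypes;              /* count of static, read-only types */
  toy_type_t *dyn_types;      /* types added after opening */
  size_t ndyn;
  size_t dyn_alloc;
} toy_dict_t;

/* Resolve a type ID: IDs up to stypes index the static buffer, later IDs
   the dynamic array.  Returns NULL for unknown IDs.  */
toy_type_t *
toy_lookup (toy_dict_t *fp, size_t id)
{
  if (id == 0)
    return NULL;
  if (id <= fp->stypes)
    return &fp->static_types[id - 1];
  if (id - fp->stypes <= fp->ndyn)
    return &fp->dyn_types[id - fp->stypes - 1];
  return NULL;
}

/* Append a new dynamic type; returns its ID, or 0 on allocation failure.  */
size_t
toy_add (toy_dict_t *fp, const char *name)
{
  if (fp->ndyn == fp->dyn_alloc)
    {
      size_t n = fp->dyn_alloc ? fp->dyn_alloc * 2 : 4;
      toy_type_t *p = realloc (fp->dyn_types, n * sizeof (toy_type_t));
      if (p == NULL)
        return 0;
      fp->dyn_types = p;
      fp->dyn_alloc = n;
    }
  toy_type_t *t = &fp->dyn_types[fp->ndyn++];
  strncpy (t->name, name, sizeof (t->name) - 1);
  t->name[sizeof (t->name) - 1] = '\0';
  t->nmembers = 0;
  return fp->stypes + fp->ndyn;
}

/* Modify a type (here: add a member).  Types in the static range are
   refused; newly added types may be changed freely.  */
int
toy_add_member (toy_dict_t *fp, size_t id)
{
  toy_type_t *t = toy_lookup (fp, id);
  if (t == NULL)
    return ERR_NOTYPE;
  if (id <= fp->stypes)
    return ERR_RDONLY;
  t->nmembers++;
  return 0;
}
```

New types can still point at static ones by ID. Only modifications inside the static range are rejected.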
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
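The name-collision rule from the second item above can be sketched in isolation. This sketch assumes a flat array holding the names of the types that came from the opened dict. A new name is rejected only if it matches one of those static names, so repeated additions of the same name among newly added types keep working as before. The function name and ERR_CONFLICT are hypothetical; per the ChangeLog, the real check lives in ctf_add_generic.

```c
/* Hypothetical sketch of the collision rule; not libctf's API. */
#include <stddef.h>
#include <string.h>

#define ERR_CONFLICT (-3)   /* hypothetical error code */

/* STATIC_NAMES holds the names of the STYPES types present when the dict
   was opened.  Returns 0 if NAME may be added, ERR_CONFLICT otherwise.
   Names of dynamically added types are deliberately not checked.  */
int
check_new_type_name (const char *const *static_names, size_t stypes,
                     const char *name)
{
  if (name == NULL || name[0] == '\0')
    return 0;               /* anonymous types never collide */
  for (size_t i = 0; i < stypes; i++)
    if (static_names[i] != NULL && strcmp (static_names[i], name) == 0)
      return ERR_CONFLICT;
  return 0;
}
```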
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
|
|
|
not found. If is_function is >= 0, return only function or data object
|
|
|
|
symbols, respectively. */
|
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
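The object-file "symbol number" can be made concrete with a small sketch. It is simply a position in the name-sorted symtypetab. This assumes the section has already been decoded into a lexicographically sorted array of names. The array and function names are hypothetical, and using bsearch here is an illustrative choice, not a claim about libctf's implementation.

```c
/* Hypothetical sketch: object-file symbol number == rank in sorted names. */
#include <stdlib.h>
#include <string.h>

/* bsearch comparison callback over an array of C strings.  */
static int
cmp_name (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* SORTED_NAMES are the symtypetab's symbol names in lexicographic order.
   Returns the 0-based "symbol number" of SYMNAME, or -1 if absent.  */
long
objfile_symbol_number (const char *const *sorted_names, size_t n,
                       const char *symname)
{
  const char *const *hit = bsearch (symname, sorted_names, n,
                                    sizeof (sorted_names[0]), cmp_name);
  return hit ? (long) (hit - sorted_names) : -1;
}
```

The number only means "rank within this table", which is why it is of little use to callers outside libctf.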
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
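The wrapper structure can be sketched roughly as follows. A single internal worker accepts either an index or a name, and each public entry point fills in one of the two. The toy symbol table and all names below are hypothetical stand-ins for ctf_lookup_by_sym_or_name, ctf_lookup_by_symbol and ctf_lookup_by_symbol_name. The real functions return ctf_id_t and set the dict's error code. This sketch shows only the name-to-index direction.

```c
/* Hypothetical sketch of the wrapper refactoring; not libctf's API. */
#include <stddef.h>
#include <string.h>

typedef struct toy_sym
{
  const char *name;
  long type_id;
} toy_sym_t;

/* Internal worker: look up by SYMIDX, or by SYMNAME when it is non-NULL.
   A name is first turned into an index, so the index path is shared.  */
static long
lookup_sym_or_name (const toy_sym_t *syms, size_t nsyms, size_t symidx,
                    const char *symname)
{
  if (symname != NULL)
    {
      size_t i;
      for (i = 0; i < nsyms; i++)
        if (strcmp (syms[i].name, symname) == 0)
          break;
      if (i == nsyms)
        return -1;              /* no such symbol */
      symidx = i;
    }
  if (symidx >= nsyms)
    return -1;
  return syms[symidx].type_id;
}

/* Thin public wrappers.  */
long
toy_lookup_by_symbol (const toy_sym_t *syms, size_t nsyms, size_t symidx)
{
  return lookup_sym_or_name (syms, nsyms, symidx, NULL);
}

long
toy_lookup_by_symbol_name (const toy_sym_t *syms, size_t nsyms,
                           const char *symname)
{
  return lookup_sym_or_name (syms, nsyms, 0, symname);
}
```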
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably cost
more time than using them would save, and would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx calls in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
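The scan-and-cache strategy can be sketched as follows. This is a minimal, hypothetical model. It assumes a toy symbol table of names plus a function/object flag. It also uses a small fixed-size open-addressing hash table as the cache, whereas libctf uses its own dynhash.

The design has three parts:
- Every symbol passed during a scan is cached.
- The latest field records how far the scan has got, so no symbol is examined twice. This is similar in spirit to the ctf_symhash_latest field named in the ChangeLog.
- Separate function and object caches mirror the later ctf_symhash_func/ctf_symhash_objt split.

Passing the same cache to lookups in several dicts that share one symtab illustrates the crossdict idea.

```c
/* Hypothetical sketch of linear search with name->index caching. */
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 256   /* toy capacity; assumed larger than the symtab */

typedef struct toy_elfsym
{
  const char *name;
  int is_function;        /* 1 = function, 0 = data object */
} toy_elfsym_t;

typedef struct name_cache
{
  const char *name[CACHE_SLOTS];
  size_t idx[CACHE_SLOTS];
} name_cache_t;

typedef struct symidx_cache
{
  name_cache_t func, objt;  /* separate caches per symbol kind */
  size_t latest;            /* all symbols before this index are cached */
} symidx_cache_t;

static size_t
hash_str (const char *s)
{
  size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

static void
cache_put (name_cache_t *c, const char *name, size_t idx)
{
  size_t i, slot = hash_str (name) % CACHE_SLOTS;
  for (i = 0; i < CACHE_SLOTS; i++, slot = (slot + 1) % CACHE_SLOTS)
    {
      if (c->name[slot] == NULL)
        {
          c->name[slot] = name;
          c->idx[slot] = idx;
          return;
        }
      if (strcmp (c->name[slot], name) == 0)
        return;             /* keep the first index seen for this name */
    }
  /* Cache full: skip silently; it is only a cache.  */
}

static int
cache_get (const name_cache_t *c, const char *name, size_t *idx)
{
  size_t i, slot = hash_str (name) % CACHE_SLOTS;
  for (i = 0; i < CACHE_SLOTS && c->name[slot] != NULL;
       i++, slot = (slot + 1) % CACHE_SLOTS)
    if (strcmp (c->name[slot], name) == 0)
      {
        *idx = c->idx[slot];
        return 1;
      }
  return 0;
}

/* Return the index of SYMNAME of the requested kind, or (size_t) -1 if
   absent.  CACHE may be shared by several dicts using the same symtab.  */
size_t
toy_lookup_symbol_idx (const toy_elfsym_t *symtab, size_t nsyms,
                       symidx_cache_t *cache, const char *symname,
                       int is_function)
{
  name_cache_t *want = is_function ? &cache->func : &cache->objt;
  size_t idx;

  if (cache_get (want, symname, &idx))
    return idx;

  /* Resume the linear scan where the previous one stopped, caching every
     symbol passed, until a symbol of the right name and kind turns up.  */
  while (cache->latest < nsyms)
    {
      const toy_elfsym_t *sym = &symtab[cache->latest];
      size_t i = cache->latest++;

      cache_put (sym->is_function ? &cache->func : &cache->objt,
                 sym->name, i);
      if (sym->is_function == !!is_function
          && strcmp (sym->name, symname) == 0)
        return i;
    }
  return (size_t) -1;
}
```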
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
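The archive-level layer can be sketched as a name -> dict memo. The first successful search records which dict held the symbol, and later lookups go straight to that dict. Only the cheap per-dict type lookup is redone; the type ID itself is not cached. This is a minimal sketch under stated assumptions. The toy dict type and the fixed-size linear memo are hypothetical, and libctf keeps the real map in ctfi_symnamedicts.

```c
/* Hypothetical sketch of an archive-wide symbol-name -> dict memo. */
#include <stddef.h>
#include <string.h>

#define MEMO_MAX 64

typedef struct toy_member_dict
{
  const char *const *symnames;  /* symbols this dict has types for */
  const long *types;
  size_t nsyms;
} toy_member_dict_t;

typedef struct toy_archive
{
  toy_member_dict_t *dicts;
  size_t ndicts;
  const char *memo_name[MEMO_MAX];  /* toy: caller's strings must outlive it */
  toy_member_dict_t *memo_dict[MEMO_MAX];
  size_t nmemo;
  size_t dict_searches;             /* per-dict searches done on memo misses */
} toy_archive_t;

/* Per-dict lookup: return the type ID for SYMNAME, or 0 if absent.  */
static long
dict_lookup_name (const toy_member_dict_t *d, const char *symname)
{
  for (size_t i = 0; i < d->nsyms; i++)
    if (strcmp (d->symnames[i], symname) == 0)
      return d->types[i];
  return 0;
}

/* Look SYMNAME up across the archive.  On success, store the owning dict
   in *DICTP and return the type ID; otherwise return 0.  */
long
toy_arc_lookup_symbol_name (toy_archive_t *arc, const char *symname,
                            toy_member_dict_t **dictp)
{
  for (size_t i = 0; i < arc->nmemo; i++)
    if (strcmp (arc->memo_name[i], symname) == 0)
      {
        *dictp = arc->memo_dict[i];
        /* Only the dict is memoized: redo the cheap type lookup.  */
        return dict_lookup_name (*dictp, symname);
      }

  for (size_t i = 0; i < arc->ndicts; i++)
    {
      long type;
      arc->dict_searches++;
      if ((type = dict_lookup_name (&arc->dicts[i], symname)) != 0)
        {
          if (arc->nmemo < MEMO_MAX)
            {
              arc->memo_name[arc->nmemo] = symname;
              arc->memo_dict[arc->nmemo++] = &arc->dicts[i];
            }
          *dictp = &arc->dicts[i];
          return type;
        }
    }
  return 0;
}
```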
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
|
|
|
static unsigned long
|
2023-12-20 00:58:19 +08:00
|
|
|
ctf_lookup_symbol_idx (ctf_dict_t *fp, const char *symname, int try_parent,
|
|
|
|
int is_function)
|
2021-02-17 23:21:12 +08:00
|
|
|
{
|
2024-01-05 20:17:27 +08:00
|
|
|
const ctf_sect_t *sp = &fp->ctf_ext_symtab;
|
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
  ctf_link_sym_t sym;
  void *known_idx;
  int err;
  ctf_dict_t *cache = fp;

  if (fp->ctf_dynsyms)
    {
      err = EINVAL;

      ctf_link_sym_t *symp;
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
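Here is a minimal sketch of the "read-only range" rule. It assumes a dict that simply records how many types it had when it was opened. `struct dict`, `add_type`, `add_member` and the `ADD_*` codes are hypothetical. In libctf, the count is `ctf_stypes` and the refusal is reported as `ECTF_RDONLY`. In this model, IDs start at 1. libctf also distinguishes type IDs from type indexes (parent and child dicts), and this sketch omits that distinction.

```c
#include <stdbool.h>

/* Hypothetical result codes standing in for success and ECTF_RDONLY.  */
enum { ADD_OK = 0, ADD_RDONLY = -1 };

/* Hypothetical dict: type IDs 1..stypes were read in and are read-only;
   IDs above stypes were added since.  */
struct dict
{
  unsigned long stypes;
  unsigned long ntypes;
};

static bool
type_is_static (const struct dict *d, unsigned long id)
{
  return id <= d->stypes;
}

/* Adding a new type is always allowed: it lands above the static range.  */
static unsigned long
add_type (struct dict *d)
{
  return ++d->ntypes;
}

/* Modifying a type (for example, adding a struct member) is refused for
   types in the static range.  */
static int
add_member (struct dict *d, unsigned long struct_id)
{
  if (type_is_static (d, struct_id))
    return ADD_RDONLY;
  /* ... record the member here ...  */
  return ADD_OK;
}
```

New types always land above the static range. A new type that merely refers to an existing one, such as the synthesised pointer types mentioned above, therefore does not need to modify anything in that range.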
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
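The function/object fix in the last item comes down to keying the symbol cache by kind as well as by name. The sketch below is hypothetical: `enum sym_kind`, `struct symcaches` and `lookup_sym` are illustrative names. Its two memos loosely mirror the `ctf_symhash_func`/`ctf_symhash_objt` split in the ChangeLog. Its kind check is analogous to the `st_type` test libctf applies to dynamic symbols.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol kinds, standing in for STT_OBJECT and STT_FUNC.  */
enum sym_kind { SYM_OBJECT, SYM_FUNC };

struct sym
{
  const char *name;
  enum sym_kind kind;
  size_t idx;
};

/* Hypothetical per-kind memo: one table for data objects, one for
   functions, so a name cached by an object lookup can never satisfy a
   function lookup (or vice versa).  */
#define MEMO_MAX 16
struct memo
{
  const struct sym *hit[MEMO_MAX];
  size_t n;
};

struct symcaches
{
  struct memo objt;
  struct memo func;
};

/* Return the index of NAME if it is a symbol of the requested kind, else
   (size_t) -1.  */
static size_t
lookup_sym (const struct sym *syms, size_t nsyms, struct symcaches *c,
            const char *name, int is_function)
{
  struct memo *m = is_function ? &c->func : &c->objt;
  enum sym_kind want = is_function ? SYM_FUNC : SYM_OBJECT;

  for (size_t i = 0; i < m->n; i++)
    if (strcmp (m->hit[i]->name, name) == 0)
      return m->hit[i]->idx;

  for (size_t i = 0; i < nsyms; i++)
    {
      /* A symbol of the wrong kind does not match, even if the name does.  */
      if (syms[i].kind != want || strcmp (syms[i].name, name) != 0)
        continue;
      if (m->n < MEMO_MAX)
        m->hit[m->n++] = &syms[i];
      return syms[i].idx;
    }
  return (size_t) -1;
}
```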
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
      if (((symp = ctf_dynhash_lookup (fp->ctf_dynsyms, symname)) == NULL)
          || (symp->st_type != STT_OBJECT && is_function == 0)
          || (symp->st_type != STT_FUNC && is_function == 1))
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We do
not anticipate this lookup to be as heavily used as ld.so symbol lookup
by many orders of magnitude, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
each was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
        goto try_parent;

      return symp->st_symidx;
    }

  err = ECTF_NOSYMTAB;
  if (sp->cts_data == NULL)
    goto try_parent;

  /* First, try a hash lookup to see if we have already spotted this symbol
libctf: support addition of types to dicts read via ctf_open()
2023-12-20 00:58:19 +08:00
     during a past iteration: create the hash first if need be. The
     lifespan of the strings is equal to the lifespan of the cts_data, so we
     don't need to strdup them. If this dict was opened as part of an
     archive, and this archive has a crossdict_cache to cache results that
libctf, include: find types of symbols by name
2021-02-17 23:21:12 +08:00
     are the same across all dicts in an archive, use it. */

  if (fp->ctf_archive && fp->ctf_archive->ctfi_crossdict_cache)
    cache = fp->ctf_archive->ctfi_crossdict_cache;
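The `ctfi_crossdict_cache` test above decides which dict owns the symbol name->index caches. The following is a minimal sketch of that choice. It assumes heavily simplified, hypothetical structs (`xd_dict`, `xd_archive`) in place of libctf's internal `ctf_dict_t` and `struct ctf_archive_internal`, and it is not libctf's actual code:

```c
#include <assert.h>
#include <stddef.h>

struct xd_archive;

/* Hypothetical stand-in for ctf_dict_t: only the fields this sketch needs.  */
typedef struct xd_dict
{
  struct xd_archive *archive;    /* Archive the dict was opened from, or NULL.  */
  unsigned long symhash_latest;  /* Example of cached symtab-scan state.  */
} xd_dict;

/* Hypothetical stand-in for struct ctf_archive_internal.  */
typedef struct xd_archive
{
  xd_dict *crossdict_cache;      /* First dict cached from this archive.  */
} xd_archive;

/* Pick the dict whose caches should hold symbol name->index mappings:
   the archive-wide crossdict cache when one has been chosen, else FP.  */
static xd_dict *
xd_cache_owner (xd_dict *fp)
{
  assert (fp != NULL);
  if (fp->archive && fp->archive->crossdict_cache)
    return fp->archive->crossdict_cache;
  return fp;
}
```

Every dict opened from the same archive resolves to the same owner. A symtab scan done on behalf of one dict is therefore visible to all the others, which is what lets the symtab be scanned at most once per archive.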
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
  if (!cache->ctf_symhash_func)
    if ((cache->ctf_symhash_func = ctf_dynhash_create (ctf_hash_string,
                                                       ctf_hash_eq_string,
                                                       NULL, NULL)) == NULL)
      goto oom;

  if (!cache->ctf_symhash_objt)
    if ((cache->ctf_symhash_objt = ctf_dynhash_create (ctf_hash_string,
                                                       ctf_hash_eq_string,
                                                       NULL, NULL)) == NULL)
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We do
not anticipate this lookup to be as heavily used as ld.so symbol lookup
by many orders of magnitude, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more so isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
      goto oom;
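The name->index lookup described in the commit message above is a resumable linear scan. Each miss continues from the point where the previous scan stopped (the role `ctf_symhash_latest` plays) and records every name it passes, so later lookups of those names hit the cache. The following is a minimal sketch of that technique. It assumes a toy open-addressing table in place of `ctf_dynhash`, and all `ls_*` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LS_NBUCKETS 64                   /* Hypothetical fixed capacity.  */
#define LS_NOTFOUND ((unsigned long) -1)

/* Hypothetical stand-in for an ELF symtab: just the symbol names.  */
typedef struct ls_symtab
{
  const char **names;
  unsigned long nsyms;
} ls_symtab;

/* Name->index cache plus a cursor recording how far the scan has got.  */
typedef struct ls_cache
{
  const char *keys[LS_NBUCKETS];
  unsigned long vals[LS_NBUCKETS];
  unsigned long latest;
} ls_cache;

/* Return the slot holding NAME, or the empty slot where it would go.  */
static size_t
ls_slot (const ls_cache *c, const char *name)
{
  unsigned long h = 5381;               /* djb2 string hash.  */
  for (const char *p = name; *p; p++)
    h = h * 33 + (unsigned char) *p;

  size_t i = h % LS_NBUCKETS;           /* Linear probing.  */
  while (c->keys[i] != NULL && strcmp (c->keys[i], name) != 0)
    i = (i + 1) % LS_NBUCKETS;
  return i;
}

static unsigned long
ls_lookup (ls_cache *c, const ls_symtab *st, const char *name)
{
  /* Fewer symbols than buckets, so probing always finds a free slot.  */
  assert (st->nsyms < LS_NBUCKETS);

  size_t slot = ls_slot (c, name);
  if (c->keys[slot] != NULL)            /* Hit: seen by an earlier scan.  */
    return c->vals[slot];

  /* Miss: resume the scan where it last stopped, caching every name.  */
  for (; c->latest < st->nsyms; c->latest++)
    {
      const char *sym = st->names[c->latest];
      size_t s = ls_slot (c, sym);

      if (c->keys[s] == NULL)           /* First occurrence wins.  */
        {
          c->keys[s] = sym;
          c->vals[s] = c->latest;
        }
      if (strcmp (sym, name) == 0)
        {
          c->latest++;                  /* Skip this entry next time.  */
          return c->vals[s];
        }
    }
  return LS_NOTFOUND;
}
```

Because the cursor only moves forward, each symtab entry is examined at most once across any sequence of lookups. A single miss for an absent name caches the whole table.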
2023-12-20 00:58:19 +08:00
  if (is_function != 0 &&
      ctf_dynhash_lookup_kv (cache->ctf_symhash_func, symname, NULL, &known_idx))
    return (unsigned long) (uintptr_t) known_idx;

  if (is_function != 1 &&
      ctf_dynhash_lookup_kv (cache->ctf_symhash_objt, symname, NULL, &known_idx))
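The two checks above keep function and data-object symbols in separate caches, so a name cached as an object can no longer be returned for a function lookup. The code above shows only that 1 skips the object cache and 0 skips the function cache. The following is a minimal sketch of that selection logic. It uses a hypothetical linear-search `fo_table` in place of each `ctf_dynhash`, and it assumes that any other value, such as -1, means "either kind":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FO_NOTFOUND ((unsigned long) -1)

/* Hypothetical tiny name->index table standing in for a ctf_dynhash.  */
typedef struct fo_entry { const char *name; unsigned long idx; } fo_entry;
typedef struct fo_table { const fo_entry *ents; size_t n; } fo_table;

static int
fo_table_lookup (const fo_table *t, const char *name, unsigned long *idx)
{
  for (size_t i = 0; i < t->n; i++)
    if (strcmp (t->ents[i].name, name) == 0)
      {
        *idx = t->ents[i].idx;
        return 1;
      }
  return 0;
}

/* IS_FUNCTION: 1 = functions only, 0 = data objects only, anything else
   (assumed -1) = either kind, function cache consulted first.  */
static unsigned long
fo_lookup (const fo_table *funcs, const fo_table *objts, const char *name,
           int is_function)
{
  unsigned long idx;

  assert (funcs != NULL && objts != NULL);
  if (is_function != 0 && fo_table_lookup (funcs, name, &idx))
    return idx;
  if (is_function != 1 && fo_table_lookup (objts, name, &idx))
    return idx;
  return FO_NOTFOUND;
}
```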
2021-02-17 23:21:12 +08:00
    return (unsigned long) (uintptr_t) known_idx;

  /* Hash lookup unsuccessful: linear search, populating the hashtab for later
     lookups as we go. */

  for (; cache->ctf_symhash_latest < sp->cts_size / sp->cts_entsize;
       cache->ctf_symhash_latest++)
    {
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
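The name-collision rule from the second item can be sketched as follows. This assumes a toy table of type names in which IDs 1..stypes came from the opened dict. The names_* identifiers are hypothetical. Per the ChangeLog below, libctf makes the equivalent check in ctf_add_generic, and its actual error code is not modelled here.

```c
#include <assert.h>
#include <string.h>

#define NAMES_MAX 16

/* Hypothetical toy: names[i] names type ID i + 1.  IDs 1..stypes came from
   the opened dict; later IDs were added since.  */
typedef struct names_dict
{
  const char *names[NAMES_MAX];
  unsigned long ntypes;
  unsigned long stypes;
} names_dict;

/* Nonzero if NAME is already used by a static (opened) type.  */
static int
names_static_collision (const names_dict *t, const char *name)
{
  for (unsigned long i = 0; i < t->stypes; i++)
    if (t->names[i] != NULL && strcmp (t->names[i], name) == 0)
      return 1;
  return 0;
}

/* Add a named type, returning its ID or 0 on refusal.  Collisions with
   static names are refused; duplicates among newly-added types are still
   tolerated, as they always were.  */
static unsigned long
names_add (names_dict *t, const char *name)
{
  assert (t->stypes <= t->ntypes);
  if (names_static_collision (t, name) || t->ntypes >= NAMES_MAX)
    return 0;
  t->names[t->ntypes++] = name;
  return t->ntypes;
}
```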
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
ctf_dynhash_t *h;
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
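The "thin wrappers around one internal function" structure can be sketched like this. The toy symbol table is assumed to be parallel arrays of names and type IDs. ctf_lookup_by_symbol, ctf_lookup_by_symbol_name and ctf_lookup_by_sym_or_name are the real function names, but the wrap_* versions and their signatures are simplified, hypothetical stand-ins. The name-to-index step here is an uncached linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy symbol table: parallel arrays of names and type IDs.  */
typedef struct wrap_symtab
{
  const char *const *names;
  const long *types;
  size_t n;
} wrap_symtab;

/* One internal worker handles both kinds of query: if NAME is non-NULL it
   is first turned into an index, then the index is used as usual.  */
static long
wrap_lookup_by_sym_or_name (const wrap_symtab *t, size_t idx, const char *name)
{
  if (name != NULL)
    for (idx = 0; idx < t->n; idx++)
      if (strcmp (t->names[idx], name) == 0)
        break;
  if (idx >= t->n)
    return -1;                    /* No such symbol.  */
  return t->types[idx];
}

/* Thin public-style wrappers.  */
static long
wrap_lookup_by_symbol (const wrap_symtab *t, size_t idx)
{
  assert (t != NULL);
  return wrap_lookup_by_sym_or_name (t, idx, NULL);
}

static long
wrap_lookup_by_symbol_name (const wrap_symtab *t, const char *name)
{
  return wrap_lookup_by_sym_or_name (t, 0, name);
}
```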
The actual name->index lookup is done by ctf_lookup_symbol_idx. We do
not anticipate this lookup to be as heavily used as ld.so symbol lookup
by many orders of magnitude, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefits from the previous linear search, and the
symtab only needs to be scanned at most once.
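A minimal sketch of the "linear search, caching as we go" strategy, assuming the symbol table is just an array of name strings. In libctf the entries are ELF symbols, converted one at a time with ctf_elf64_to_link_sym or ctf_elf32_to_link_sym according to cts_entsize (as in the code below), with progress tracked in ctf_symhash_latest and mappings held in a ctf_dynhash_t. The fixed-size open-addressing table and all symc_* names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SYMC_SIZE 64  /* Power of two, larger than the symbol count.  */

/* Hypothetical name -> index cache.  Symbols [0, latest) of the symbol
   table have already been scanned and inserted (cf. ctf_symhash_latest).  */
typedef struct symc_cache
{
  const char *keys[SYMC_SIZE];
  size_t vals[SYMC_SIZE];
  size_t latest;
  size_t scanned;     /* Symbols examined by the linear scan (statistics).  */
} symc_cache;

static size_t
symc_hash (const char *s)
{
  size_t h = 2166136261u;                 /* FNV-1a.  */
  for (; *s; s++)
    h = (h ^ (unsigned char) *s) * 16777619u;
  return h & (SYMC_SIZE - 1);
}

/* Insert NAME -> IDX with linear probing; the first index seen for a
   duplicated name wins.  */
static void
symc_put (symc_cache *c, const char *name, size_t idx)
{
  size_t h = symc_hash (name);
  while (c->keys[h] != NULL && strcmp (c->keys[h], name) != 0)
    h = (h + 1) & (SYMC_SIZE - 1);
  if (c->keys[h] == NULL)
    {
      c->keys[h] = name;
      c->vals[h] = idx;
    }
}

static int
symc_get (const symc_cache *c, const char *name, size_t *idx)
{
  for (size_t h = symc_hash (name); c->keys[h] != NULL;
       h = (h + 1) & (SYMC_SIZE - 1))
    if (strcmp (c->keys[h], name) == 0)
      {
        *idx = c->vals[h];
        return 1;
      }
  return 0;
}

/* Index of NAME in SYMS[0, NSYMS), or (size_t) -1 if absent.  Cache hits
   are answered at once; misses resume the linear scan where the previous
   one stopped, caching every symbol passed over, so the symbol table is
   scanned at most once in total.  */
static size_t
symc_lookup_symbol_idx (symc_cache *c, const char *const *syms,
                        size_t nsyms, const char *name)
{
  size_t idx;

  if (symc_get (c, name, &idx))
    return idx;

  for (; c->latest < nsyms; c->latest++)
    {
      c->scanned++;
      symc_put (c, syms[c->latest], c->latest);
      if (strcmp (syms[c->latest], name) == 0)
        return c->latest++;   /* Next scan starts after this symbol.  */
    }
  return (size_t) -1;
}
```

This sketch keeps a single cache. The later ChangeLog entry that splits ctf_symhash into ctf_symhash_func and ctf_symhash_objt would correspond to two such caches, one per symbol kind, so that an object symbol can never satisfy a function lookup.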
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more so isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
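The archive-level policy of caching *which dict* answered for a symbol name (as ctfi_symnamedicts does) instead of the resulting ctf_id_t can be sketched like this. The arc_* types and functions are hypothetical toys. The real ctf_arc_lookup_symbol_name has a different signature, and real code would use hash tables and copy the name rather than keep the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARC_NSYMS 4
#define ARC_CACHE 8

/* Hypothetical toy dict: the symbols it has types for, and those types.  */
typedef struct arc_dict
{
  const char *syms[ARC_NSYMS];
  long types[ARC_NSYMS];
} arc_dict;

/* Hypothetical toy archive with a name -> dict cache (cf. ctfi_symnamedicts).
   The cache stores the caller's name pointer; real code would copy it.  */
typedef struct arc_archive
{
  arc_dict *dicts;
  size_t ndicts;
  const char *cached_name[ARC_CACHE];
  size_t cached_dict[ARC_CACHE];
  size_t ncached;
  size_t dict_queries;      /* Per-dict lookups performed (statistics).  */
} arc_archive;

/* Look NAME up in dict D: its type ID, or -1 if absent.  */
static long
arc_dict_lookup (arc_archive *a, size_t d, const char *name)
{
  a->dict_queries++;
  for (size_t i = 0; i < ARC_NSYMS; i++)
    if (a->dicts[d].syms[i] != NULL && strcmp (a->dicts[d].syms[i], name) == 0)
      return a->dicts[d].types[i];
  return -1;
}

/* Archive-wide lookup.  Only the dict that answered is cached, not the type
   ID: re-asking that one dict is cheap, so a repeat lookup costs exactly one
   per-dict query instead of a walk over every dict.  */
static long
arc_lookup_symbol_name (arc_archive *a, const char *name, size_t *which)
{
  for (size_t i = 0; i < a->ncached; i++)
    if (strcmp (a->cached_name[i], name) == 0)
      {
        *which = a->cached_dict[i];
        return arc_dict_lookup (a, *which, name);
      }

  for (size_t d = 0; d < a->ndicts; d++)
    {
      long type = arc_dict_lookup (a, d, name);
      if (type < 0)
        continue;
      if (a->ncached < ARC_CACHE)   /* Toy cache: silently stops filling.  */
        {
          a->cached_name[a->ncached] = name;
          a->cached_dict[a->ncached++] = d;
        }
      *which = d;
      return type;
    }
  return -1;
}
```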
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
switch (sp->cts_entsize)
{
case sizeof (Elf64_Sym):
{
Elf64_Sym *symp = (Elf64_Sym *) sp->cts_data;
2023-12-20 00:58:19 +08:00
2021-02-17 23:21:12 +08:00
ctf_elf64_to_link_sym (fp, &sym, &symp[cache->ctf_symhash_latest],
cache->ctf_symhash_latest);
2021-03-18 20:37:52 +08:00
}
break;
2021-02-17 23:21:12 +08:00
        case sizeof (Elf32_Sym):
          {
            Elf32_Sym *symp = (Elf32_Sym *) sp->cts_data;
            ctf_elf32_to_link_sym (fp, &sym, &symp[cache->ctf_symhash_latest],
                                   cache->ctf_symhash_latest);
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
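A minimal sketch of that rule, assuming type IDs are numbered upward from 1 and that a counter like ctf_stypes records how many types the dict held when it was opened. Everything named toy_* here is hypothetical and stands in for libctf internals; it only illustrates the range check, not libctf's real data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libctf error codes; not the real API.  */
enum { TOY_OK = 0, TOY_ERR_RDONLY = -1, TOY_ERR_BADID = -2 };

/* Hypothetical dict: only the two counters the rule needs.  */
typedef struct toy_dict
{
  unsigned long n_types;   /* Highest type ID currently in the dict.  */
  unsigned long n_stypes;  /* IDs 1..n_stypes came from the opened file.  */
} toy_dict;

/* A type is read-only if it was already present when the dict was opened.  */
static bool
toy_type_is_static (const toy_dict *fp, unsigned long id)
{
  return id >= 1 && id <= fp->n_stypes;
}

/* Adding a type is always allowed: the new ID lands above the static range.  */
static unsigned long
toy_add_type (toy_dict *fp)
{
  assert (fp->n_stypes <= fp->n_types);
  return ++fp->n_types;
}

/* Modifying a type (compare ctf_set_array) is refused in the static range.  */
static int
toy_modify_type (const toy_dict *fp, unsigned long id)
{
  if (id == 0 || id > fp->n_types)
    return TOY_ERR_BADID;
  if (toy_type_is_static (fp, id))
    return TOY_ERR_RDONLY;
  return TOY_OK;
}
```

New types may still point at static ones; only mutation of the static range is rejected.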
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
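The first and last points above (check both the statically laid-out and the dynamically added symbols, and keep function and object symbols apart) can be sketched roughly as follows. This assumes a dict holding two plain arrays of symbols; the toy_* names and the array representation are hypothetical, since libctf itself uses hashtables and symtypetab sections.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol kinds, mirroring ELF STT_FUNC / STT_OBJECT.  */
typedef enum { SYM_FUNC, SYM_OBJECT } sym_kind;

typedef struct toy_sym
{
  const char *name;
  sym_kind kind;
  long type_id;
} toy_sym;

/* Hypothetical dict: symbols from the opened file ("static") and symbols
   added afterwards ("dynamic").  */
typedef struct toy_symdict
{
  const toy_sym *static_syms;
  size_t n_static;
  const toy_sym *dynamic_syms;
  size_t n_dynamic;
} toy_symdict;

static long
toy_search (const toy_sym *syms, size_t n, const char *name, sym_kind kind)
{
  for (size_t i = 0; i < n; i++)
    if (syms[i].kind == kind && strcmp (syms[i].name, name) == 0)
      return syms[i].type_id;
  return -1;
}

/* Look in the dynamically added symbols first, then in the static ones.
   Matching on KIND means an object symbol is never returned for a
   function lookup, and vice versa.  */
static long
toy_lookup_symbol (const toy_symdict *d, const char *name, sym_kind kind)
{
  long id;

  assert (name != NULL);
  if ((id = toy_search (d->dynamic_syms, d->n_dynamic, name, kind)) >= 0)
    return id;
  return toy_search (d->static_syms, d->n_static, name, kind);
}
```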
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
            break;
2021-03-18 20:37:52 +08:00
          }
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
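The shape of that refactoring can be sketched with a toy symbol table: one internal worker takes either an index or a name, and the two public-style entry points are thin wrappers around it. The toy_* names and the parallel-array symtab are hypothetical; only the wrapper structure mirrors what is described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symtab: names[i] is symbol i, types[i] its CTF type ID
   (0 meaning "no type recorded").  */
typedef struct toy_symtab
{
  const char *const *names;
  const long *types;
  size_t n;
} toy_symtab;

/* Shared worker: look up by SYMIDX, or by NAME if NAME is non-NULL.
   A name is first turned into an index, so the rest is shared.  */
static long
toy_lookup_by_sym_or_name (const toy_symtab *st, size_t symidx,
                           const char *name)
{
  if (name != NULL)
    {
      size_t i;

      for (i = 0; i < st->n; i++)
        if (strcmp (st->names[i], name) == 0)
          break;
      if (i == st->n)
        return -1;              /* No such symbol.  */
      symidx = i;
    }
  if (symidx >= st->n || st->types[symidx] == 0)
    return -1;
  return st->types[symidx];
}

/* Thin wrappers, analogous in shape to ctf_lookup_by_symbol and
   ctf_lookup_by_symbol_name.  */
static long
toy_lookup_by_symbol (const toy_symtab *st, size_t symidx)
{
  return toy_lookup_by_sym_or_name (st, symidx, NULL);
}

static long
toy_lookup_by_symbol_name (const toy_symtab *st, const char *name)
{
  assert (name != NULL);
  return toy_lookup_by_sym_or_name (st, 0, name);
}
```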
The actual name->index lookup is done by ctf_lookup_symbol_idx. We expect
this lookup to be used many orders of magnitude less heavily than ld.so
symbol lookup, so reading the ELF symbol hash tables would probably cost
more time than using them saves, and would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
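A rough sketch of the resumable linear search with a name->index cache, assuming a symbol table reduced to an array of names and a small fixed-size open-addressing hashtable (FNV-1a) in place of libctf's ctf_dynhash. The toy_* names, TOY_CACHE_SIZE and the hash function are illustrative assumptions; the cursor corresponds roughly to ctf_symhash_latest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CACHE_SIZE 64       /* Must exceed the number of symbols cached.  */

/* Hypothetical cache: a name->index hashtable plus a cursor recording how
   far the linear scan of the symbol table has got.  */
typedef struct toy_symcache
{
  const char *keys[TOY_CACHE_SIZE];
  size_t vals[TOY_CACHE_SIZE];
  size_t latest;                /* Next symtab index to scan.  */
} toy_symcache;

static uint32_t
toy_hash (const char *s)
{
  uint32_t h = 2166136261u;     /* FNV-1a.  */
  for (; *s; s++)
    h = (h ^ (unsigned char) *s) * 16777619u;
  return h;
}

/* Return the slot holding NAME, or the empty slot where it would go.  */
static size_t
toy_slot (const toy_symcache *c, const char *name)
{
  size_t i = toy_hash (name) % TOY_CACHE_SIZE;
  while (c->keys[i] != NULL && strcmp (c->keys[i], name) != 0)
    i = (i + 1) % TOY_CACHE_SIZE;
  return i;
}

/* Find NAME in SYMS[0..N): consult the cache first, otherwise resume the
   linear scan where the last call stopped, caching every name seen on
   the way.  Returns (size_t) -1 if not found.  */
static size_t
toy_lookup_symbol_idx (toy_symcache *c, const char *const *syms, size_t n,
                       const char *name)
{
  size_t slot = toy_slot (c, name);

  if (c->keys[slot] != NULL)
    return c->vals[slot];       /* Cache hit.  */

  for (; c->latest < n; c->latest++)
    {
      size_t s;

      assert (c->latest < (size_t) TOY_CACHE_SIZE - 1);
      s = toy_slot (c, syms[c->latest]);
      if (c->keys[s] == NULL)   /* Keep the first index for duplicate names.  */
        {
          c->keys[s] = syms[c->latest];
          c->vals[s] = c->latest;
        }
      if (strcmp (syms[c->latest], name) == 0)
        return c->latest++;     /* Return this index; resume after it.  */
    }
  return (size_t) -1;           /* Scanned everything: not found.  */
}
```

Because the cursor only moves forward, each symtab entry is scanned at most once however many names are looked up, and names already passed over are answered from the cache.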
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
each was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
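The archive-level cache can be sketched in the same spirit: remember which dict a symbol name was found in, and recompute the type ID on each call rather than caching it. The toy_* names, the fixed-size cache and the parallel-array dicts are hypothetical; the sketch also assumes each NAME string outlives the archive, since only the pointer is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-dict symbol table: parallel arrays of names and IDs.  */
typedef struct toy_arc_dict
{
  const char *const *names;
  const long *types;
  size_t n;
} toy_arc_dict;

#define TOY_MAX_CACHED 16

/* Hypothetical archive, loosely modelled on ctfi_symnamedicts: it records
   the dict each successfully looked-up name was found in, not the ID.  */
typedef struct toy_archive
{
  const toy_arc_dict *dicts;
  size_t ndicts;
  const char *cached_name[TOY_MAX_CACHED];    /* Assumed to outlive us.  */
  const toy_arc_dict *cached_dict[TOY_MAX_CACHED];
  size_t ncached;
} toy_archive;

static long
toy_dict_lookup (const toy_arc_dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return -1;
}

/* Look NAME up across the archive, returning its type ID and, if FOUND is
   non-NULL, the dict it was found in.  */
static long
toy_arc_lookup_symbol_name (toy_archive *arc, const char *name,
                            const toy_arc_dict **found)
{
  assert (arc != NULL && name != NULL);

  /* Cached: go straight to the right dict and redo the cheap lookup.  */
  for (size_t i = 0; i < arc->ncached; i++)
    if (strcmp (arc->cached_name[i], name) == 0)
      {
        if (found)
          *found = arc->cached_dict[i];
        return toy_dict_lookup (arc->cached_dict[i], name);
      }

  /* Not cached: search every dict in turn.  */
  for (size_t i = 0; i < arc->ndicts; i++)
    {
      long id = toy_dict_lookup (&arc->dicts[i], name);

      if (id < 0)
        continue;
      if (arc->ncached < TOY_MAX_CACHED)      /* Cache the dict, not the ID.  */
        {
          arc->cached_name[arc->ncached] = name;
          arc->cached_dict[arc->ncached++] = &arc->dicts[i];
        }
      if (found)
        *found = &arc->dicts[i];
      return id;
    }
  return -1;
}
```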
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
        default:
          ctf_set_errno (fp, ECTF_SYMTAB);
          return (unsigned long) -1;
        }
2023-12-20 00:58:19 +08:00

      if (sym.st_type == STT_FUNC)
        h = cache->ctf_symhash_func;
      else if (sym.st_type == STT_OBJECT)
        h = cache->ctf_symhash_objt;
      else
        continue; /* Not of interest. */

      if (!ctf_dynhash_lookup_kv (h, sym.st_name,
                                  NULL, NULL))
        if (ctf_dynhash_cinsert (h, sym.st_name,
                                 (const void *) (uintptr_t)
                                 cache->ctf_symhash_latest) < 0)
          goto oom;
      if (strcmp (sym.st_name, symname) == 0)
        return cache->ctf_symhash_latest++;
2021-02-17 23:21:12 +08:00
    }

  /* Searched everything, still not found. */

  return (unsigned long) -1;

 try_parent:
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
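The name-collision rule from the second item above can be sketched as a hypothetical toy. This is not libctf code. It uses one flat namespace, whereas C has separate struct, union, enum and ordinary-identifier namespaces. It also scans names linearly instead of using hashtables. Only collisions with types from the opened (static) range are rejected. Repeats among newly added types stay legal, as before.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 16

/* Hypothetical result codes for illustration only.  */
enum { TOY_OK = 0, TOY_ERR_DUPLICATE = 1, TOY_ERR_FULL = 2 };

typedef struct toy_dict
{
  const char *names[TOY_MAX + 1];  /* names[id]; index 0 unused.  */
  unsigned long ntypes;            /* Highest type ID in use.  */
  unsigned long stypes;            /* IDs 1..stypes came from the opened dict.  */
} toy_dict;

/* Add a named type, refusing only names already used by a static type.  */
static int
toy_add_named_type (toy_dict *d, const char *name, unsigned long *idp)
{
  unsigned long i;

  assert (d != NULL && name != NULL && idp != NULL);
  for (i = 1; i <= d->stypes; i++)
    if (d->names[i] != NULL && strcmp (d->names[i], name) == 0)
      return TOY_ERR_DUPLICATE;

  if (d->ntypes >= TOY_MAX)
    return TOY_ERR_FULL;

  d->names[++d->ntypes] = name;
  *idp = d->ntypes;
  return TOY_OK;
}
```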
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
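The ChangeLog above splits ctf_symhash into separate function and object caches. A minimal sketch of that split follows. It assumes a tiny fixed-size linear cache in place of libctf's hashtables, and every name in it is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_SIZE 8
#define TOY_NOT_FOUND ((unsigned long) -1)

/* A tiny fixed-size name -> symbol index cache (linear, for clarity).  */
typedef struct toy_symcache
{
  const char *names[TOY_CACHE_SIZE];
  unsigned long idx[TOY_CACHE_SIZE];
  size_t n;
} toy_symcache;

/* One cache for function symbols and one for data-object symbols.  */
typedef struct toy_dict
{
  toy_symcache func_cache;
  toy_symcache objt_cache;
} toy_dict;

static toy_symcache *
toy_cache_for (toy_dict *d, int is_function)
{
  return is_function ? &d->func_cache : &d->objt_cache;
}

/* Remember NAME -> IDX in the cache matching IS_FUNCTION (drops if full).  */
static void
toy_cache_insert (toy_dict *d, const char *name, unsigned long idx,
                  int is_function)
{
  toy_symcache *c = toy_cache_for (d, is_function);

  if (c->n < TOY_CACHE_SIZE)
    {
      c->names[c->n] = name;
      c->idx[c->n] = idx;
      c->n++;
    }
}

/* Only the cache matching IS_FUNCTION is consulted, so an object symbol
   looked up earlier cannot turn up in a function-symbol lookup.  */
static unsigned long
toy_cache_lookup (toy_dict *d, const char *name, int is_function)
{
  toy_symcache *c = toy_cache_for (d, is_function);
  size_t i;

  assert (name != NULL);
  for (i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return c->idx[i];
  return TOY_NOT_FOUND;
}
```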
2023-12-20 00:58:19 +08:00
if (fp->ctf_parent && try_parent)
2023-04-06 00:21:32 +08:00
{
unsigned long psym;
2023-12-20 00:58:19 +08:00
if ((psym = ctf_lookup_symbol_idx (fp->ctf_parent, symname, try_parent,
is_function))
2023-04-06 00:21:32 +08:00
!= (unsigned long) -1)
return psym;
ctf_set_errno (fp, ctf_errno (fp->ctf_parent));
return (unsigned long) -1;
}
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
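The wrapper arrangement described above can be shown with a self-contained toy. It assumes a three-entry in-memory symbol table. The toy functions only mimic the shape of ctf_lookup_by_symbol, ctf_lookup_by_symbol_name and the internal ctf_lookup_by_sym_or_name. The public libctf functions also take a ctf_dict_t *, return a ctf_id_t, and deal with indexed sections and error reporting, none of which is modelled here.

```c
#include <assert.h>
#include <string.h>

#define TOY_NSYMS 3
#define TOY_NO_TYPE 0L   /* Toy "no type found" result.  */

/* Hypothetical symbol table: symbol index -> name and type ID.  */
static const char *const toy_symnames[TOY_NSYMS] = { "main", "counter", "helper" };
static const long toy_symtypes[TOY_NSYMS] = { 7, 3, 9 };

/* One internal worker handles both lookup styles: if NAME is non-NULL it
   is first resolved to an index, otherwise SYMIDX is used directly.  */
static long
toy_lookup_by_sym_or_name (unsigned long symidx, const char *name)
{
  if (name != NULL)
    {
      unsigned long i;

      for (i = 0; i < TOY_NSYMS; i++)
        if (strcmp (toy_symnames[i], name) == 0)
          break;
      if (i == TOY_NSYMS)
        return TOY_NO_TYPE;
      symidx = i;
    }
  if (symidx >= TOY_NSYMS)
    return TOY_NO_TYPE;
  return toy_symtypes[symidx];
}

/* Thin wrappers: lookup by index, and lookup by name.  */
static long
toy_lookup_by_symbol (unsigned long symidx)
{
  return toy_lookup_by_sym_or_name (symidx, NULL);
}

static long
toy_lookup_by_symbol_name (const char *name)
{
  assert (name != NULL);
  return toy_lookup_by_sym_or_name (0, name);
}
```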
The actual name->index lookup is done by ctf_lookup_symbol_idx. We expect
this lookup to be used many orders of magnitude less heavily than ld.so
symbol lookup, so reading the ELF symbol hashes would probably take more
time than using them saves,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefits from the previous linear search, and the
symtab only needs to be scanned at most once.
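The scan-and-cache strategy can be sketched as an illustrative toy under several stated assumptions. The symbol table is an in-memory array of names. The cache is a linear array rather than a hashtab. The scan resumes where the previous one stopped, so each symtab entry is visited at most once. That resume-point design is assumed here, not taken from libctf. Sharing one `toy_cache` between several dicts would play the part of the crossdict cache.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAXSYMS 16
#define TOY_NOT_FOUND ((unsigned long) -1)

/* Stand-in for the ELF symbol table.  */
typedef struct toy_symtab
{
  const char *names[TOY_MAXSYMS];
  unsigned long nsyms;
} toy_symtab;

/* Cached name -> index mappings, plus how far the scan has got.  */
typedef struct toy_cache
{
  const char *names[TOY_MAXSYMS];
  unsigned long idx[TOY_MAXSYMS];
  unsigned long n;
  unsigned long scanned;
} toy_cache;

static unsigned long
toy_cache_find (const toy_cache *c, const char *name)
{
  unsigned long i;

  for (i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return c->idx[i];
  return TOY_NOT_FOUND;
}

/* Try the cache; on a miss, continue the linear scan, caching every
   symbol passed on the way, until NAME is found or the symtab ends.  */
static unsigned long
toy_lookup_symbol_idx (const toy_symtab *st, toy_cache *c, const char *name)
{
  unsigned long found;

  assert (st->nsyms <= TOY_MAXSYMS);
  if ((found = toy_cache_find (c, name)) != TOY_NOT_FOUND)
    return found;

  while (c->scanned < st->nsyms)
    {
      unsigned long i = c->scanned++;

      c->names[c->n] = st->names[i];
      c->idx[c->n] = i;
      c->n++;
      if (strcmp (st->names[i], name) == 0)
        return i;
    }
  return TOY_NOT_FOUND;
}
```

In this toy, a later lookup for a symbol that the scan already passed is answered from the cache without touching the symtab again.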
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
each was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
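Here is a hypothetical sketch of the archive-level caching just described. It remembers which dict a symbol name was found in, but deliberately not the resulting type ID. The type ID is recomputed by a cheap per-dict call each time. The dicts are stand-in functions that return 0 for "not found". Cached names are stored by pointer, so the sketch assumes they outlive the archive, as the string literals in the example do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NDICTS 3
#define TOY_MAXNAMES 8

/* Stand-in for a per-dict lookup by name: type ID, or 0 if absent.  */
typedef long (*toy_dict_lookup_fn) (const char *name);

typedef struct toy_archive
{
  toy_dict_lookup_fn dicts[TOY_NDICTS];
  const char *cached_names[TOY_MAXNAMES];  /* name -> dict it was found in */
  int cached_dict[TOY_MAXNAMES];
  int ncached;
  int dict_searches;                       /* per-dict calls, for the demo */
} toy_archive;

/* Example dicts: the first knows "counter", the last knows "helper".  */
static long toy_dict0 (const char *name) { return strcmp (name, "counter") == 0 ? 3 : 0; }
static long toy_dict1 (const char *name) { (void) name; return 0; }
static long toy_dict2 (const char *name) { return strcmp (name, "helper") == 0 ? 9 : 0; }

static long
toy_arc_lookup_symbol_name (toy_archive *arc, const char *name)
{
  int i;
  long type;

  assert (arc != NULL && name != NULL);

  /* Cache hit: go straight to the right dict, but redo its cheap lookup.  */
  for (i = 0; i < arc->ncached; i++)
    if (strcmp (arc->cached_names[i], name) == 0)
      {
        arc->dict_searches++;
        return arc->dicts[arc->cached_dict[i]] (name);
      }

  /* Cache miss: try each dict in turn, memoizing the one that succeeds.  */
  for (i = 0; i < TOY_NDICTS; i++)
    {
      arc->dict_searches++;
      if ((type = arc->dicts[i] (name)) != 0)
        {
          if (arc->ncached < TOY_MAXNAMES)
            {
              arc->cached_names[arc->ncached] = name;
              arc->cached_dict[arc->ncached] = i;
              arc->ncached++;
            }
          return type;
        }
    }
  return 0;
}
```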
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
else
{
ctf_set_errno (fp, err);
return (unsigned long) -1;
}
oom:
ctf_set_errno (fp, ENOMEM);
2024-04-12 21:46:00 +08:00
ctf_err_warn (fp, 0, 0, _("cannot allocate memory for symbol "
"lookup hashtab"));
2021-02-17 23:21:12 +08:00
return (unsigned long) -1;
}
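The control flow of the fragment above can be modelled with a small self-contained toy. It searches the current dict first and then, optionally, falls back to the parent. When the parent lookup fails, it reports the parent's error on the child. The structure and names are hypothetical. The toy also falls back on any local miss, which may not match the exact conditions under which libctf consults the parent.

```c
#include <assert.h>
#include <string.h>

#define TOY_NOT_FOUND ((unsigned long) -1)
enum { TOY_ENOENT = 1 };    /* Hypothetical "no such symbol" error.  */

typedef struct toy_dict
{
  struct toy_dict *parent;
  const char *const *syms;  /* This dict's symbol names, by index.  */
  unsigned long nsyms;
  int errnum;               /* Last error, like a per-dict errno.  */
} toy_dict;

static unsigned long
toy_lookup_symbol_idx (toy_dict *d, const char *name, int try_parent)
{
  unsigned long i;

  assert (d != NULL && name != NULL);
  for (i = 0; i < d->nsyms; i++)
    if (strcmp (d->syms[i], name) == 0)
      return i;

  if (d->parent && try_parent)
    {
      unsigned long psym = toy_lookup_symbol_idx (d->parent, name, try_parent);

      if (psym != TOY_NOT_FOUND)
        return psym;
      /* Surface the parent's failure on the dict the caller asked about.  */
      d->errnum = d->parent->errnum;
      return TOY_NOT_FOUND;
    }

  d->errnum = TOY_ENOENT;
  return TOY_NOT_FOUND;
}
```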
|
|
|
|
|
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
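The core of this change is that a dict now counts the types it was opened with (ctf_stypes) instead of relying on a read/write flag. Type IDs up to that count are static and cannot be modified, and anything added afterwards is dynamic. The sketch below shows that rule under some assumptions: it uses a hypothetical `toy_dict` with only two counters and ignores parent/child type ID translation. It is not libctf's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ctf_dict_t, reduced to two counters.  */
typedef struct toy_dict
{
  unsigned long nstypes;   /* Types read from the opened buffer (cf. ctf_stypes).  */
  unsigned long ntypes;    /* All types: static plus dynamically added.  */
} toy_dict;

/* A type is static if it was already present when the dict was opened.
   IDs start at 1 in this sketch; parent/child ID translation is ignored.  */
static bool
toy_type_is_static (const toy_dict *d, unsigned long id)
{
  assert (id >= 1 && id <= d->ntypes);
  return id <= d->nstypes;
}

/* Existing (static) types stay immutable; types added after opening
   may still be changed, e.g. by adding struct members.  */
static bool
toy_can_modify (const toy_dict *d, unsigned long id)
{
  return !toy_type_is_static (d, id);
}

/* Adding a type appends a new dynamic type and returns its ID.  */
static unsigned long
toy_add_type (toy_dict *d)
{
  return ++d->ntypes;
}
```

A single counter like this is enough for a dict opened from a buffer to accept new types while still refusing changes to the old ones.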
2023-12-20 00:58:19 +08:00
ctf_id_t
ctf_symbol_next_static (ctf_dict_t *fp, ctf_next_t **it, const char **name,
                        int functions);

/* Iterate over all symbols with types: if FUNCTIONS, function symbols,
   otherwise, data symbols. The name argument is not optional. The return
   order is arbitrary, though is likely to be in symbol index or name order.
   Changing the value of 'functions' in the middle of iteration has
   unpredictable effects (probably skipping symbols, etc) and is not
   recommended. Adding symbols while iteration is underway may also lead
   to other symbols being skipped. */
2019-04-24 18:15:33 +08:00
ctf_id_t
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
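The new section layout amounts to a plain array lookup. The sketch below assumes a hypothetical `toy_funcinfo` that holds the section's uint32_t entries in function-symbol order, with 0 marking a symbol that has no recorded type. A `kinds` table stands in for a real type-section lookup, and the kind value 5 for functions is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_K_FUNCTION 5   /* Illustrative kind code for function types.  */

/* Unindexed function info section: one uint32_t type ID per function
   symbol, in symbol order; 0 marks a symbol whose type is unknown.  */
typedef struct toy_funcinfo
{
  const uint32_t *types;
  size_t ntypes;
} toy_funcinfo;

/* Return the type of the Nth function symbol, or 0 if it has none.
   KINDS[id] gives the kind of type ID, standing in for the type section.  */
static uint32_t
toy_func_sym_type (const toy_funcinfo *fi, size_t n,
                   const int *kinds, size_t nkinds)
{
  uint32_t type;

  if (n >= fi->ntypes || fi->types[n] == 0)
    return 0;

  type = fi->types[n];
  /* Every non-pad entry must name a function prototype.  */
  assert (type < nkinds && kinds[type] == TOY_K_FUNCTION);
  return type;
}
```

Each entry is just a type ID, so two function symbols with the same prototype can share one deduplicated CTF_K_FUNCTION type. The old per-symbol layout could not express that.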
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
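The compatibility rule above comes down to two checks at open time. The sketch below assumes hypothetical `TOY_F_*` flag values (the real constants live in ctf.h). It models an older reader simply as one that knows fewer flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; the real ones are in ctf.h.  */
#define TOY_F_COMPRESS    0x1
#define TOY_F_NEWFUNCINFO 0x2

typedef enum toy_open_action
{
  TOY_OPEN_REJECT,            /* Unknown header flag: refuse the dict.  */
  TOY_OPEN_IGNORE_FUNCINFO,   /* Old-format function info: disregard it.  */
  TOY_OPEN_USE_FUNCINFO       /* New-format function info: use it.  */
} toy_open_action;

/* KNOWN_FLAGS is the set of flags the reader understands, so an older
   reader is modelled as one whose KNOWN_FLAGS lacks TOY_F_NEWFUNCINFO.  */
static toy_open_action
toy_check_flags (uint8_t flags, uint8_t known_flags)
{
  if (flags & ~known_flags)
    return TOY_OPEN_REJECT;
  if (!(flags & TOY_F_NEWFUNCINFO))
    return TOY_OPEN_IGNORE_FUNCINFO;
  return TOY_OPEN_USE_FUNCINFO;
}
```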
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
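The name -> type bookkeeping described above can be sketched as follows. The sketch assumes a hypothetical fixed-size `toy_symhash` in place of libctf's ctf_funchash/ctf_objthash dynhashes, and an illustrative kind value for functions. Symbol indexes are deliberately absent, because they only arrive later when the linker shuffles symbols. The sketch also rejects duplicate names; that is an assumption, not documented behaviour.

```c
#include <assert.h>
#include <string.h>

#define TOY_K_FUNCTION 5   /* Illustrative kind code for function types.  */
#define TOY_MAXSYMS 16

/* Hypothetical name -> type table, standing in for ctf_funchash or
   ctf_objthash.  No symbol index is stored: the linker supplies that
   mapping later.  */
typedef struct toy_symhash
{
  const char *names[TOY_MAXSYMS];
  unsigned long types[TOY_MAXSYMS];
  int n;
} toy_symhash;

/* Record NAME -> TYPE.  Returns 0 on success, -1 if the name is already
   present or the table is full.  */
static int
toy_add_sym (toy_symhash *h, const char *name, unsigned long type)
{
  assert (h != NULL && name != NULL);
  for (int i = 0; i < h->n; i++)
    if (strcmp (h->names[i], name) == 0)
      return -1;
  if (h->n == TOY_MAXSYMS)
    return -1;
  h->names[h->n] = name;
  h->types[h->n] = type;
  h->n++;
  return 0;
}

/* Function symbols must have a function type (KIND is that type's kind).
   Data objects go through toy_add_sym directly and may have any type,
   including a pointer to a function.  */
static int
toy_add_func_sym (toy_symhash *funcs, const char *name, unsigned long type,
                  int kind)
{
  if (kind != TOY_K_FUNCTION)
    return -1;
  return toy_add_sym (funcs, name, type);
}
```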
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
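Choosing between the padded and the indexed layout is a size trade-off, loosely corresponding to what the ChangeLog below calls symtypetab_density. The sketch makes these assumptions:

- The padded form costs one uint32_t per symbol, with trailing pads dropped.
- The indexed form costs two uint32_t per typed symbol (type plus name).
- The threshold is a hypothetical 2 pads. The real CTF_INDEX_PAD_THRESHOLD value and libctf's exact rule may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; not the real CTF_INDEX_PAD_THRESHOLD value.  */
#define TOY_INDEX_PAD_THRESHOLD 2

typedef struct toy_density
{
  size_t padded_size;    /* Bytes if emitted unindexed, pads included.  */
  size_t indexed_size;   /* Bytes for types plus a parallel name index.  */
  int use_index;         /* Nonzero if the indexed form is chosen.  */
} toy_density;

/* TYPES[i] is the type of symbol i, or 0 if this dict has no type for it.  */
static toy_density
toy_symtypetab_density (const uint32_t *types, size_t nsyms)
{
  toy_density d = { 0, 0, 0 };
  size_t typed = 0, last = 0;

  assert (types != NULL || nsyms == 0);
  for (size_t i = 0; i < nsyms; i++)
    if (types[i] != 0)
      {
        typed++;
        last = i + 1;          /* Trailing pads need not be emitted.  */
      }

  d.padded_size = last * sizeof (uint32_t);
  d.indexed_size = typed * 2 * sizeof (uint32_t);
  d.use_index = (last - typed) > TOY_INDEX_PAD_THRESHOLD;
  return d;
}
```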
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
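Sorting on open and binary search on lookup can be sketched with the C library's qsort and bsearch. The sketch makes these assumptions:

- A hypothetical `toy_idx_entry` pairs each name with its type. The real index instead stores string offsets in a separate section.
- `TOY_F_IDXSORTED` has a hypothetical value.
- The symbol-index-to-name step has already happened via the symbol table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_F_IDXSORTED 0x4   /* Hypothetical value; see ctf.h.  */

/* One indexed symtypetab entry: a symbol name and its type.  */
typedef struct toy_idx_entry
{
  const char *name;
  uint32_t type;
} toy_idx_entry;

static int
toy_idx_cmp (const void *a, const void *b)
{
  const toy_idx_entry *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Compiler-generated indexes may arrive unsorted: sort them once, on
   open, unless the header flags say they are already sorted.  */
static void
toy_idx_prepare (toy_idx_entry *idx, size_t n, uint8_t flags)
{
  if (!(flags & TOY_F_IDXSORTED))
    qsort (idx, n, sizeof (*idx), toy_idx_cmp);
}

/* Map a symbol name to its type by binary search; 0 means not found.  */
static uint32_t
toy_idx_lookup (const toy_idx_entry *idx, size_t n, const char *name)
{
  toy_idx_entry key = { name, 0 };
  const toy_idx_entry *e;

  assert (name != NULL);
  e = bsearch (&key, idx, n, sizeof (*idx), toy_idx_cmp);
  return e ? e->type : 0;
}
```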
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
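The iteration pattern can be sketched as follows, under these assumptions:

- A hypothetical `toy_symtab` holds parallel name and type arrays.
- A `toy_next` cursor is allocated on the first call and freed at the end, loosely mirroring the ctf_next_t convention.
- TOY_ERR is returned both at the end and on allocation failure, whereas libctf distinguishes the two through ctf_errno.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ERR ((long) -1)

typedef struct toy_symtab
{
  const char *const *names;   /* Symbol names, by symbol index.  */
  const long *types;          /* Type per symbol; 0 for no type.  */
  size_t n;
} toy_symtab;

/* Iterator state, allocated on the first call and freed at the end.  */
typedef struct toy_next
{
  size_t pos;
} toy_next;

/* Return the next symbol with a type and store its name in *NAME, or
   return TOY_ERR at the end (or on allocation failure) with *IT NULL.  */
static long
toy_symbol_next (const toy_symtab *tab, toy_next **it, const char **name)
{
  assert (name != NULL);                /* The name argument is not optional.  */
  if (*it == NULL && (*it = calloc (1, sizeof (toy_next))) == NULL)
    return TOY_ERR;

  while ((*it)->pos < tab->n)
    {
      size_t i = (*it)->pos++;
      if (tab->types[i] != 0)           /* Skip symbols with no type.  */
        {
          *name = tab->names[i];
          return tab->types[i];
        }
    }

  free (*it);
  *it = NULL;
  return TOY_ERR;
}
```

With the real API, the equivalent loop calls ctf_symbol_next (fp, &it, &name, functions) until it returns CTF_ERR. It then checks that ctf_errno (fp) is ECTF_NEXT_END to tell normal termination from an error.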
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
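Resolving a name through either string table can be sketched as below. The two-table encoding is modelled on CTF string references: the top bit selects the external table and the remaining bits are an offset. Treat that encoding as an assumption here, along with the hypothetical `toy_strtabs` struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding: top bit selects the external table, rest is an offset.  */
#define TOY_NAME_STID(ref)   ((ref) >> 31)
#define TOY_NAME_OFFSET(ref) ((ref) & 0x7fffffffu)

typedef struct toy_strtabs
{
  const char *internal;
  size_t internal_len;
  const char *external;      /* As fed in by the linker; may be NULL.  */
  size_t external_len;
} toy_strtabs;

/* Resolve a string reference to a pointer, or NULL if it is invalid.  */
static const char *
toy_strptr (const toy_strtabs *s, uint32_t ref)
{
  const char *base;
  size_t len;
  uint32_t off;

  assert (s != NULL);
  base = TOY_NAME_STID (ref) ? s->external : s->internal;
  len = TOY_NAME_STID (ref) ? s->external_len : s->internal_len;
  off = TOY_NAME_OFFSET (ref);

  /* Reject offsets past the table and strings not terminated within it.  */
  if (base == NULL || off >= len
      || memchr (base + off, '\0', len - off) == NULL)
    return NULL;
  return base + off;
}
```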
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
ctf_symbol_next (ctf_dict_t *fp, ctf_next_t **it, const char **name,
                 int functions)
2019-04-24 18:15:33 +08:00
{
2023-03-14 01:32:53 +08:00
  ctf_id_t sym = CTF_ERR;
2020-11-20 21:34:04 +08:00
  ctf_next_t *i = *it;
  int err;
2019-04-24 18:15:33 +08:00
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
  if (!i)
    {
      if ((i = ctf_next_create ()) == NULL)
2023-09-13 17:02:36 +08:00
        return ctf_set_typed_errno (fp, ENOMEM);
2019-04-24 18:15:33 +08:00

2020-11-20 21:34:04 +08:00
      i->cu.ctn_fp = fp;
      i->ctn_iter_fun = (void (*) (void)) ctf_symbol_next;
      i->ctn_n = 0;
      *it = i;
    }

  if ((void (*) (void)) ctf_symbol_next != i->ctn_iter_fun)
2023-09-13 17:02:36 +08:00
    return (ctf_set_typed_errno (fp, ECTF_NEXT_WRONGFUN));
2019-04-24 18:15:33 +08:00

2020-11-20 21:34:04 +08:00
  if (fp != i->cu.ctn_fp)
2023-09-13 17:02:36 +08:00
    return (ctf_set_typed_errno (fp, ECTF_NEXT_WRONGFP));
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
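The new layout can be illustrated with a minimal sketch. It assumes the caller has already translated a symbol-table index into a slot in the section (libctf does this translation internally, skipping symbols it cannot describe). It also assumes 0 is the pad value. The struct and function names are hypothetical, not libctf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only view of an unindexed symtypetab section (function
   info or data object): one uint32_t type ID per symbol slot, with 0
   standing in here for a pad (no type known).  */
typedef struct example_symtypetab
{
  const uint32_t *types;   /* Section contents.  */
  size_t ntypes;           /* Number of uint32_t entries.  */
} example_symtypetab_t;

/* Return the type ID recorded for SLOT, or 0 for a pad or an out-of-range
   slot.  Functions and data objects share this code because both sections
   now have the same layout; for functions the ID names a CTF_K_FUNCTION
   type (the prototype).  */
static uint32_t
example_symtypetab_lookup (const example_symtypetab_t *tab, size_t slot)
{
  assert (tab != NULL);
  if (slot >= tab->ntypes)
    return 0;
  return tab->types[slot];
}
```

Because each function entry is just a type ID, two functions with the same prototype can share one deduplicated CTF_K_FUNCTION type.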
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
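A reader's handling of the new flag can be sketched as below. The flag bit values are placeholders, since the real CTF_F_* values live in ctf.h. The mask-based "unknown flag" check is an assumption standing in for however a given libctf release compares header flags against CTF_F_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bits standing in for CTF_F_COMPRESS and CTF_F_NEWFUNCINFO;
   consult ctf.h for the real values.  */
#define EXAMPLE_F_COMPRESS    0x1u
#define EXAMPLE_F_NEWFUNCINFO 0x2u

/* Flags understood by an older and a newer reader, respectively.  */
#define EXAMPLE_OLD_KNOWN (EXAMPLE_F_COMPRESS)
#define EXAMPLE_NEW_KNOWN (EXAMPLE_F_COMPRESS | EXAMPLE_F_NEWFUNCINFO)

/* A reader rejects any dict carrying flags it does not know about, which
   is why an old reader refuses dicts with the new flag set.  */
static int
example_flags_acceptable (uint32_t flags, uint32_t known)
{
  return (flags & ~known) == 0;
}

/* A new reader disregards the function info section of a dict that lacks
   the flag, here by treating that section as empty.  */
static uint32_t
example_usable_funcinfo_len (uint32_t flags, uint32_t funcinfo_len)
{
  return (flags & EXAMPLE_F_NEWFUNCINFO) ? funcinfo_len : 0;
}
```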
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a function pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
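The two-step mapping described above can be sketched with plain arrays. First, the linker supplies a symbol index to name mapping. Second, a name to type table is filled by ctf_add_func_sym or ctf_add_objt_sym. libctf itself keeps hash tables (ctf_funchash, ctf_objthash) for the name to type step. The arrays, linear search, and names below are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long example_type_id;  /* Stand-in for ctf_id_t.  */

/* One name -> type mapping, as a symbol-addition call might record it.  */
typedef struct
{
  const char *name;
  example_type_id type;
} example_sym_type;

/* Name -> type step; returns 0 if no type was recorded for NAME.  */
static example_type_id
example_type_by_name (const example_sym_type *syms, size_t n,
                      const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (syms[i].name, name) == 0)
      return syms[i].type;
  return 0;
}

/* Symbol index -> type: consult the linker-supplied SYMIDX_NAMES array
   (index -> name, NULL for unnamed slots) first, then the name table.  */
static example_type_id
example_type_by_symidx (const char *const *symidx_names, size_t nsyms,
                        const example_sym_type *syms, size_t n,
                        size_t symidx)
{
  if (symidx >= nsyms || symidx_names[symidx] == NULL)
    return 0;
  return example_type_by_name (syms, n, symidx_names[symidx]);
}
```

Keying the stored mapping by name is what lets repeated relinks renumber symbols without touching the recorded types.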
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
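Dropping variables that duplicate data-object symbols amounts to a filter over the variable list. The ChangeLog calls this step symtypetab_delete_nonstatic_vars. This sketch uses plain name arrays and linear search, and is an illustration rather than libctf's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if NAME is among the NOBJT data-object symbol names.  */
static int
example_is_data_object (const char *const *objts, size_t nobjt,
                        const char *name)
{
  for (size_t i = 0; i < nobjt; i++)
    if (strcmp (objts[i], name) == 0)
      return 1;
  return 0;
}

/* Compact VARS in place, dropping every variable that is also a data
   object symbol, preserving order; returns the new count.  */
static size_t
example_drop_symbol_vars (const char **vars, size_t nvars,
                          const char *const *objts, size_t nobjt)
{
  size_t kept = 0;

  for (size_t i = 0; i < nvars; i++)
    if (!example_is_data_object (objts, nobjt, vars[i]))
      vars[kept++] = vars[i];
  return kept;
}
```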
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
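A name-sorted index turns a lookup into a binary search. In this sketch the index is an array of C strings parallel to an array of type IDs, both ordered by name. In the file itself the index holds references into a string table rather than pointers. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: KEY is the sought name, ELT points at one entry of
   the name-sorted index (an array of const char *).  */
static int
example_idx_cmp (const void *key, const void *elt)
{
  const char *name = key;
  const char *const *entry = elt;

  return strcmp (name, *entry);
}

/* Look NAME up in an index sorted by name, where NAMES[i] names the symbol
   whose type is TYPES[i].  Returns 0 if NAME is absent.  */
static uint32_t
example_indexed_lookup (const char *const *names, const uint32_t *types,
                        size_t n, const char *name)
{
  const char *const *hit;

  assert (names != NULL && types != NULL);
  hit = bsearch (name, names, n, sizeof (names[0]), example_idx_cmp);
  return hit ? types[hit - names] : 0;
}
```

Pads cost nothing here, because symbols without a type simply have no index entry, which is why sparse sections favour the indexed form.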
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
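Sorting an index that arrives without CTF_F_IDXSORTED can be sketched with qsort over (name, type) pairs, so that names and types move together. The flag value is a placeholder. libctf's own sort (sort_symidx_by_name in the ChangeLog) works on its index arrays rather than on a struct array like this one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder bit standing in for CTF_F_IDXSORTED; see ctf.h.  */
#define EXAMPLE_F_IDXSORTED 0x4u

typedef struct
{
  const char *name;   /* Symbol name.  */
  uint32_t type;      /* Its type ID.  */
} example_idx_entry;

static int
example_entry_cmp (const void *a, const void *b)
{
  const example_idx_entry *x = a, *y = b;

  return strcmp (x->name, y->name);
}

/* Sort IDX by name unless FLAGS says it is already sorted, so that later
   lookups can binary-search it.  */
static void
example_ensure_sorted (example_idx_entry *idx, size_t n, uint32_t flags)
{
  if (!(flags & EXAMPLE_F_IDXSORTED))
    qsort (idx, n, sizeof (idx[0]), example_entry_cmp);
}
```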
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
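The iterator protocol can be sketched in a self-contained form, including the checks in the ctf_symbol_next code above: lazy creation of the cursor, then the wrong-function and wrong-dict errors. Everything here is hypothetical. The real function returns ctf_id_t, reports errors through the dict, and also yields each symbol's name. This miniature returns negative codes and walks a plain array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dict: just an array of type IDs to iterate over.  */
typedef struct example_dict
{
  const unsigned *types;
  size_t ntypes;
} example_dict;

/* Cursor tagged with the iterator and dict that created it.  */
typedef struct example_next
{
  void (*iter_fun) (void);   /* Which iterator owns this cursor.  */
  const example_dict *fp;    /* Which dict it walks.  */
  size_t n;                  /* Next position.  */
} example_next;

enum { EXAMPLE_END = -1, EXAMPLE_ENOMEM = -2,
       EXAMPLE_WRONGFUN = -3, EXAMPLE_WRONGFP = -4 };

static long
example_symbol_next (const example_dict *fp, example_next **it)
{
  example_next *i = *it;

  if (!i)
    {
      /* First call: allocate and tag the cursor.  */
      if ((i = calloc (1, sizeof (*i))) == NULL)
        return EXAMPLE_ENOMEM;
      i->fp = fp;
      i->iter_fun = (void (*) (void)) example_symbol_next;
      i->n = 0;
      *it = i;
    }

  /* Refuse cursors created by another iterator or for another dict.  */
  if ((void (*) (void)) example_symbol_next != i->iter_fun)
    return EXAMPLE_WRONGFUN;
  if (fp != i->fp)
    return EXAMPLE_WRONGFP;

  /* Exhausted: free the cursor and reset the caller's handle.  */
  if (i->n >= fp->ntypes)
    {
      free (i);
      *it = NULL;
      return EXAMPLE_END;
    }
  return (long) fp->types[i->n++];
}
```

A caller starts with a NULL cursor and loops until the end code, as in the checks below.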
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
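Consulting external strings before any strtab is written can be sketched as a mapping that is filled in when the string is added and checked by the pointer lookup. The top-bit encoding for external references, the fixed-size table, and all names here are assumptions for illustration, not libctf's internal layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_EXTERNAL_BIT 0x80000000u  /* Placeholder "external" marker.  */
#define EXAMPLE_MAX_EXT 16

typedef struct
{
  const char *internal;                  /* Internal string table bytes.  */
  uint32_t ext_off[EXAMPLE_MAX_EXT];     /* External offsets registered...  */
  const char *ext_str[EXAMPLE_MAX_EXT];  /* ...and the strings they name.  */
  size_t next;                           /* Entries in use.  */
} example_strtab;

/* Record an external offset -> string mapping at addition time, so the
   string can be resolved long before any strtab is written out.  */
static int
example_str_add_external (example_strtab *st, const char *str, uint32_t off)
{
  if (st->next >= EXAMPLE_MAX_EXT)
    return -1;
  st->ext_off[st->next] = off;
  st->ext_str[st->next] = str;
  st->next++;
  return 0;
}

/* Resolve a name reference: internal offsets index the local buffer,
   external ones go through the mapping (NULL if unknown).  */
static const char *
example_strptr (const example_strtab *st, uint32_t ref)
{
  if (!(ref & EXAMPLE_EXTERNAL_BIT))
    return st->internal + ref;
  for (size_t i = 0; i < st->next; i++)
    if (st->ext_off[i] == (ref & ~EXAMPLE_EXTERNAL_BIT))
      return st->ext_str[i];
  return NULL;
}
```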
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
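The name-collision rule described in the second point above can be modelled as follows. A new named type is refused only if a *static* type already uses the name. Repeats among dynamically-added types are still tolerated, so existing code that relied on that keeps working. This is a sketch with hypothetical names (`struct named_type`, `model_name_addable`), not the check in ctf_add_generic itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct named_type
{
  const char *name;
  int is_static;		/* Nonzero if read in via ctf_open().  */
};

/* Return 1 if a type called NAME may be added, 0 if it would collide with
   a name already used by the ctf_open()ed portion.  */
static int
model_name_addable (const struct named_type *types, size_t n,
		    const char *name)
{
  assert (types != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (types[i].is_static && strcmp (types[i].name, name) == 0)
      return 0;
  return 1;
}
```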
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
  /* Check the dynamic set of names first, to allow previously-written names
     to be replaced with dynamic ones (there is still no way to remove them,
     though).

     We intentionally use raw access, not ctf_lookup_by_symbol, to avoid
     incurring additional sorting cost for unsorted symtypetabs coming from the
     compiler, to allow ctf_symbol_next to work in the absence of a symtab, and
     finally because it's easier to work out what the name of each symbol is if
     we do that.  */
  ctf_dynhash_t *dynh = functions ? fp->ctf_funchash : fp->ctf_objthash;
  void *dyn_name = NULL, *dyn_value = NULL;
  size_t dyn_els = dynh ? ctf_dynhash_elements (dynh) : 0;
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
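A minimal sketch of reading the new-format section described above, assuming one uint32_t entry per symbol in symbol-table order and a zero entry meaning "no type recorded" (a pad). The struct and function names are hypothetical, not libctf API; real libctf also translates symbol indexes to skip symbols that can never carry CTF types, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an unindexed function-info (or data-object)
   section: a plain array of uint32_t type IDs, one per symbol, in
   symbol-table order.  A zero entry is a pad: no type in this dict.  */
typedef struct
{
  const uint32_t *entries;	/* Section contents.  */
  size_t nentries;		/* Number of uint32_t entries.  */
} symtypetab_sketch;

/* Return the type ID recorded for symbol SYMIDX, or 0 if it is a pad
   or out of range.  For the function info section, a real reader would
   then expect the type to be of kind CTF_K_FUNCTION.  */
static uint32_t
sketch_symtypetab_lookup (const symtypetab_sketch *tab, size_t symidx)
{
  if (symidx >= tab->nentries)
    return 0;
  return tab->entries[symidx];
}
```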
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
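The reading rule above amounts to a simple gate on the header flags. This sketch uses a hypothetical SKETCH_F_NEWFUNCINFO constant as a stand-in for the real CTF_F_NEWFUNCINFO in ctf.h; the bit value here is illustrative only, so check that header for the actual definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only: real code should use CTF_F_NEWFUNCINFO
   from ctf.h rather than this hypothetical constant.  */
#define SKETCH_F_NEWFUNCINFO 0x2u

/* Decide whether a reader should consult the function info section:
   if the new-format flag is absent the section is disregarded, since
   the old format was never emitted in practice.  */
static bool
sketch_use_funcinfo (uint32_t header_flags, uint32_t funcinfo_len)
{
  if ((header_flags & SKETCH_F_NEWFUNCINFO) == 0)
    return false;
  return funcinfo_len > 0;
}
```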
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
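To make the name -> type model concrete, here is a hypothetical sketch of how a linker-supplied symbol table (symbol index -> name) could be turned into a per-index type array once names are known. This is not libctf's implementation, which uses hash tables and the ctf_dynsyms/ctf_dynsymidx fields; a linear scan keeps the illustration short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for a symbol added by name only, as with
   ctf_add_func_sym or ctf_add_objt_sym.  */
typedef struct
{
  const char *name;
  uint32_t type;
} sketch_named_sym;

/* Fill OUT[symidx] with the type added under SYMTAB[symidx]'s name,
   or 0 if no type was added for that name.  */
static void
sketch_shuffle_syms (const sketch_named_sym *added, size_t nadded,
		     const char *const *symtab, size_t nsyms, uint32_t *out)
{
  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = 0;
      for (size_t j = 0; j < nadded; j++)
	if (strcmp (symtab[i], added[j].name) == 0)
	  {
	    out[i] = added[j].type;
	    break;
	  }
    }
}
```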
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
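The variable-pruning rule can be sketched as an in-place filter. The names are hypothetical, and a linear search stands in for the hash lookup a real implementation would likely use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if NAME is among the NSYMS data-object symbol names.  */
static int
sketch_is_objt_sym (const char *name, const char *const *syms, size_t nsyms)
{
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp (name, syms[i]) == 0)
      return 1;
  return 0;
}

/* Compact VARS in place, dropping any variable that is also exposed as
   a data-object symbol; return the number of variables kept.  */
static size_t
sketch_drop_symbol_vars (const char **vars, size_t nvars,
			 const char *const *syms, size_t nsyms)
{
  size_t kept = 0;
  for (size_t i = 0; i < nvars; i++)
    if (!sketch_is_objt_sym (vars[i], syms, nsyms))
      vars[kept++] = vars[i];
  return kept;
}
```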
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
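Once an indexed section is sorted by name, a symbol lookup reduces to mapping the symbol index to a name and binary-searching the index. This sketch uses plain C strings in place of string-table offsets, and sketch_indexed_tab and sketch_indexed_lookup are hypothetical names, not libctf's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical indexed symtypetab: NAMES[i] names the symbol whose type
   is TYPES[i], and NAMES is sorted by strcmp.  */
typedef struct
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} sketch_indexed_tab;

/* bsearch passes the key first, then a pointer to an array element.  */
static int
sketch_name_cmp (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Binary-search for NAME; return its type ID, or 0 if absent.  */
static uint32_t
sketch_indexed_lookup (const sketch_indexed_tab *tab, const char *name)
{
  const char *const *hit = bsearch (name, tab->names, tab->n,
				    sizeof (tab->names[0]), sketch_name_cmp);
  if (hit == NULL)
    return 0;
  return tab->types[hit - tab->names];
}
```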
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
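When CTF_F_IDXSORTED is absent, the index has to be sorted before binary search can work. A minimal sketch, assuming the name/type pairs can be sorted together as structs; libctf's own sort_symidx_by_name operates on its in-memory index structures and may differ, so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry pairing a symbol name with its type ID.  */
typedef struct
{
  const char *name;
  uint32_t type;
} sketch_idx_entry;

static int
sketch_entry_cmp (const void *a, const void *b)
{
  const sketch_idx_entry *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* If the "already sorted" header flag was absent (IDXSORTED == 0), sort
   the entries by name so each type stays attached to its name.  */
static void
sketch_sort_index_if_needed (sketch_idx_entry *ents, size_t n, int idxsorted)
{
  if (!idxsorted)
    qsort (ents, n, sizeof (ents[0]), sketch_entry_cmp);
}
```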
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
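An iterator of this kind only needs to carry a cursor between calls. The sketch below is hypothetical and assumes two plain arrays; it mirrors the dynamic-then-static ordering that ctf_symbol_next later adopted (compare the `i->ctn_n < dyn_els` test in the fragment of that function in this file).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: one cursor spanning both phases.  */
typedef struct
{
  size_t n;
} sketch_sym_iter;

typedef struct
{
  const char *name;
  uint32_t type;
} sketch_sym;

/* Return 1 and set *OUT to the next symbol, walking the dynamically
   added symbols first and then the ones read from the file; return 0
   when both are exhausted.  */
static int
sketch_symbol_next (sketch_sym_iter *it,
		    const sketch_sym *dyn, size_t ndyn,
		    const sketch_sym *stat, size_t nstat,
		    sketch_sym *out)
{
  if (it->n < ndyn)
    *out = dyn[it->n];
  else if (it->n - ndyn < nstat)
    *out = stat[it->n - ndyn];
  else
    return 0;
  it->n++;
  return 1;
}
```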
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
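Resolving external strings before the strtab is written out only requires that the external table be held in memory. This sketch is modelled on CTF's convention that the top bit of a 32-bit name reference selects the string table (CTF_NAME_STID and CTF_NAME_OFFSET in ctf.h); the SKETCH_ macros and sketch_strptr are local, hypothetical stand-ins, not libctf's ctf_strptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins: top bit 0 = the dict's own strtab, 1 = external.  */
#define SKETCH_NAME_STID(n)   ((n) >> 31)
#define SKETCH_NAME_OFFSET(n) ((n) & 0x7fffffffu)

typedef struct
{
  const char *buf;
  size_t len;
} sketch_strtab;

/* Resolve NAME against whichever table it refers to, or return NULL
   if that table is missing or the offset is out of range.  */
static const char *
sketch_strptr (uint32_t name, const sketch_strtab *internal,
	       const sketch_strtab *external)
{
  const sketch_strtab *tab = SKETCH_NAME_STID (name) ? external : internal;
  uint32_t off = SKETCH_NAME_OFFSET (name);

  if (tab == NULL || off >= tab->len)
    return NULL;
  return tab->buf + off;
}
```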
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
libctf: support addition of types to dicts read via ctf_open()
2023-12-20 00:58:19 +08:00
if (i->ctn_n < dyn_els)
{
libctf, include: support unnamed structure members better
libctf has no intrinsic support for the GCC unnamed structure member
extension. This principally means that you can't look up named members
inside unnamed struct or union members via ctf_member_info: you have to
tiresomely find out the type ID of the unnamed members via iteration,
then look in each of these.
This is ridiculous. Fix it by extending ctf_member_info so that it
recurses into unnamed members for you: this is still unambiguous because
GCC won't let you create ambiguously-named members even in the presence
of this extension.
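The recursion that ctf_member_info now performs can be sketched over a toy struct model. The types and functions here are hypothetical, and reporting nested offsets relative to the outermost struct is this sketch's assumption; consult the libctf source for its exact behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct sketch_struct sketch_struct;

/* A member with an empty name is an unnamed struct/union member whose
   own members are in SUB.  */
typedef struct
{
  const char *name;		/* "" for an unnamed member.  */
  unsigned long offset;		/* Bit offset within the parent.  */
  const sketch_struct *sub;	/* Non-NULL for unnamed aggregates.  */
} sketch_member;

struct sketch_struct
{
  const sketch_member *members;
  size_t nmembers;
};

/* Find NAME in S, descending into unnamed members and accumulating
   their offsets.  Returns 1 and sets *OFFSET on success, 0 if absent.  */
static int
sketch_member_info (const sketch_struct *s, const char *name,
		    unsigned long *offset)
{
  for (size_t i = 0; i < s->nmembers; i++)
    {
      const sketch_member *m = &s->members[i];

      if (m->name[0] == '\0' && m->sub != NULL)
	{
	  unsigned long inner;
	  if (sketch_member_info (m->sub, name, &inner))
	    {
	      *offset = m->offset + inner;
	      return 1;
	    }
	}
      else if (strcmp (m->name, name) == 0)
	{
	  *offset = m->offset;
	  return 1;
	}
    }
  return 0;
}
```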
For consistency, and because the release hasn't happened and we can
still do this, break the ctf_member_next API and add flags: we specify
one flag, CTF_MN_RECURSE, which if set causes ctf_member_next to
automatically recurse into unnamed members for you, returning not only
the members themselves but all their contained members, so that you can
use ctf_member_next to identify every member that it would be valid to
call ctf_member_info with.
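The effect of a recurse flag on iteration can be illustrated with a recursive visitor; a real ctf_member_next-style iterator would need to keep explicit nested state instead. SKETCH_MN_RECURSE and the other names are hypothetical stand-ins for CTF_MN_RECURSE and libctf's types.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MN_RECURSE 0x1	/* Stand-in for CTF_MN_RECURSE.  */

typedef struct sketch_walk_struct sketch_walk_struct;

typedef struct
{
  const char *name;			/* "" for an unnamed member.  */
  const sketch_walk_struct *sub;	/* Non-NULL for unnamed aggregates.  */
} sketch_walk_member;

struct sketch_walk_struct
{
  const sketch_walk_member *members;
  size_t nmembers;
};

typedef void (*sketch_member_fn) (const sketch_walk_member *m, void *arg);

/* Visit every member of S.  With SKETCH_MN_RECURSE, an unnamed member is
   reported itself and then its contained members are visited as well.  */
static void
sketch_member_walk (const sketch_walk_struct *s, int flags,
		    sketch_member_fn fn, void *arg)
{
  for (size_t i = 0; i < s->nmembers; i++)
    {
      const sketch_walk_member *m = &s->members[i];
      fn (m, arg);
      if ((flags & SKETCH_MN_RECURSE) && m->name[0] == '\0' && m->sub != NULL)
	sketch_member_walk (m->sub, flags, fn, arg);
    }
}

/* Example callback: count visited members into a size_t.  */
static void
sketch_count_member (const sketch_walk_member *m, void *arg)
{
  (void) m;
  ++*(size_t *) arg;
}
```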
New lookup tests are added for all of this.
include/ChangeLog
2021-01-05 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (CTF_MN_RECURSE): New.
(ctf_member_next): Add flags argument.
libctf/ChangeLog
2021-01-05 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (struct ctf_next) <u.ctn_next>: Move to...
<ctn_next>: ... here.
* ctf-util.c (ctf_next_destroy): Unconditionally destroy it.
* ctf-lookup.c (ctf_symbol_next): Adjust accordingly.
* ctf-types.c (ctf_member_iter): Reimplement in terms of...
(ctf_member_next): ... this. Support recursive unnamed member
iteration (off by default).
(ctf_member_info): Look up members in unnamed sub-structs.
* ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_member_next call.
(ctf_dedup_emit_struct_members): Likewise.
* testsuite/libctf-lookup/struct-iteration-ctf.c: Test empty unnamed
members, and a normal member after the end.
* testsuite/libctf-lookup/struct-iteration.c: Verify that
ctf_member_count is consistent with the number of successful returns
from a non-recursive ctf_member_next.
* testsuite/libctf-lookup/struct-iteration-*: New, test iteration
over struct members.
* testsuite/libctf-lookup/struct-lookup.c: New test.
* testsuite/libctf-lookup/struct-lookup.lk: New test.
2021-01-05 21:25:56 +08:00
err = ctf_dynhash_next (dynh, &i->ctn_next, &dyn_name, &dyn_value);
libctf: support addition of types to dicts read via ctf_open()
2023-12-20 00:58:19 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
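The symtypetab_density, CTF_SYMTYPETAB_EMIT_PAD and CTF_SYMTYPETAB_FORCE_INDEXED entries above describe a choice between two layouts. One is a padded section with one slot per symbol. The other is a compact, name-sorted section accompanied by an index. The sketch below shows the shape of that choice only. The threshold value, the `pads > threshold` test and the byte counts are assumptions standing in for CTF_INDEX_PAD_THRESHOLD and libctf's real sizing logic, which may differ. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; the real CTF_INDEX_PAD_THRESHOLD and the
   exact decision rule in libctf may differ.  */
#define TOY_INDEX_PAD_THRESHOLD 3

#define TOY_FORCE_INDEXED 0x1   /* Like CTF_SYMTYPETAB_FORCE_INDEXED.  */

typedef struct
{
  int indexed;
  size_t section_bytes;         /* Size of the symtypetab itself.  */
  size_t index_bytes;           /* Size of the accompanying name index.  */
} toy_layout;

/* TYPES[i] is the type ID of symbol i, or 0 if it has no type.  */
toy_layout
toy_symtypetab_density (const uint32_t *types, size_t nsyms, int flags)
{
  toy_layout l = { 0, 0, 0 };
  size_t typed = 0, pads, i;

  assert (types != NULL || nsyms == 0);
  for (i = 0; i < nsyms; i++)
    if (types[i] != 0)
      typed++;
  pads = nsyms - typed;

  if ((flags & TOY_FORCE_INDEXED) || pads > TOY_INDEX_PAD_THRESHOLD)
    {
      /* Indexed: one type ID per typed symbol, plus one name per entry.  */
      l.indexed = 1;
      l.section_bytes = typed * sizeof (uint32_t);
      l.index_bytes = typed * sizeof (uint32_t);
    }
  else
    /* Unindexed: one slot per symbol, with pads (0) for untyped ones.  */
    l.section_bytes = nsyms * sizeof (uint32_t);

  return l;
}
```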
2020-11-20 21:34:04 +08:00
      /* This covers errors and also end-of-iteration. */
      if (err != 0)
        {
          ctf_next_destroy (i);
          *it = NULL;
2023-09-13 17:02:36 +08:00
          return ctf_set_typed_errno (fp, err);
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
        }

      *name = dyn_name;
      sym = (ctf_id_t) (uintptr_t) dyn_value;
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
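Put differently, "read-only" becomes a property of a range of type IDs rather than of the whole dict. The following toy model of that check loosely follows the ctf_stypes counter and ctf_static_type helper named in the ChangeLog below. It ignores parent/child type ID numbering. The toy_* names and the error value are hypothetical.

```c
#include <assert.h>

#define TOY_RDONLY 1001         /* Hypothetical stand-in for ECTF_RDONLY.  */

typedef struct
{
  unsigned long stypes;         /* Types 1..stypes came from the file.  */
  unsigned long ntypes;         /* Highest type ID allocated so far.  */
  int err;
} toy_dict;

/* Was TYPE already present when the dict was opened?  */
int
toy_static_type (const toy_dict *fp, unsigned long type)
{
  return type <= fp->stypes;
}

/* A modifying operation (think adding a struct member, or setting array
   info) is refused on static types but allowed on newly-added ones.  */
int
toy_modify_type (toy_dict *fp, unsigned long type)
{
  assert (type >= 1 && type <= fp->ntypes);
  if (toy_static_type (fp, type))
    {
      fp->err = TOY_RDONLY;
      return -1;
    }
  return 0;
}

/* Adding types is always allowed: new IDs follow the static range.  */
unsigned long
toy_add_type (toy_dict *fp)
{
  return ++fp->ntypes;
}
```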
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
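The name-collision point in the list above amounts to a simple rule. Names taken by the ctf_open()ed portion are off limits, while repeats among newly-added types are still tolerated. This sketch uses hypothetical toy_* names and a made-up error code. It also searches a single flat list, whereas libctf keeps separate name tables (for example, for struct, union and enum tags), which this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DUPLICATE 1002      /* Hypothetical error code.  */

typedef struct
{
  const char **static_names;    /* Names of types read in at open time.  */
  size_t nstatic;
  const char *dyn_names[32];    /* Names of types added since.  */
  size_t ndyn;
  int err;
} toy_dict;

static int
toy_static_name_exists (const toy_dict *fp, const char *name)
{
  size_t i;

  for (i = 0; i < fp->nstatic; i++)
    if (strcmp (fp->static_names[i], name) == 0)
      return 1;
  return 0;
}

/* Add a named type.  Collisions with the opened (static) portion are
   refused; collisions among newly-added types are still accepted, so
   existing callers that relied on that see no new errors.  */
int
toy_add_named_type (toy_dict *fp, const char *name)
{
  assert (name != NULL);
  if (toy_static_name_exists (fp, name))
    {
      fp->err = TOY_DUPLICATE;
      return -1;
    }
  if (fp->ndyn >= sizeof fp->dyn_names / sizeof fp->dyn_names[0])
    return -1;
  fp->dyn_names[fp->ndyn++] = name;
  return 0;
}
```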
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
      i->ctn_n++;

      return sym;
2019-04-24 18:15:33 +08:00
    }
libctf: support addition of types to dicts read via ctf_open()
2023-12-20 00:58:19 +08:00
  return ctf_symbol_next_static (fp, it, name, functions);
}
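The single counter ctn_n spans both halves of the iteration. ctf_symbol_next hands out entries from the dynamic hash, incrementing ctn_n each time, and then falls through to ctf_symbol_next_static. When ctf_symbol_next_static creates a fresh cursor, it sets ctn_n to dyn_els, so a static-only walk skips the dynamic part. The following self-contained model of that numbering uses hypothetical toy_* names and plain name arrays instead of hashtables and symtypetab sections.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
  const char **dyn;             /* Symbols added since the dict was opened.  */
  size_t ndyn;
  const char **stat;            /* Symbols present in the opened file.  */
  size_t nstat;
} toy_symtab;

/* Return the next symbol name, or NULL at the end.  *N is the single
   position counter: values below ndyn index the dynamic set, and larger
   values index the static set, offset by ndyn.  */
const char *
toy_symbol_next (const toy_symtab *t, size_t *n)
{
  size_t pos;

  assert (t != NULL && n != NULL);
  pos = *n;
  if (pos < t->ndyn)
    {
      (*n)++;
      return t->dyn[pos];
    }
  if (pos - t->ndyn < t->nstat)
    {
      (*n)++;
      return t->stat[pos - t->ndyn];
    }
  return NULL;
}

/* Starting position for a static-only walk, analogous to a fresh
   ctf_symbol_next_static cursor starting at dyn_els.  */
size_t
toy_static_start (const toy_symtab *t)
{
  return t->ndyn;
}
```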
/* ctf_symbol_next, but only for static symbols. Mostly an internal
   implementation detail of ctf_symbol_next, but also used to simplify
   serialization. */
ctf_id_t
ctf_symbol_next_static (ctf_dict_t *fp, ctf_next_t **it, const char **name,
                        int functions)
{
  ctf_id_t sym = CTF_ERR;
  ctf_next_t *i = *it;
  ctf_dynhash_t *dynh = functions ? fp->ctf_funchash : fp->ctf_objthash;
  size_t dyn_els = dynh ? ctf_dynhash_elements (dynh) : 0;

  /* Only relevant for direct internal-to-library calls, not via
     ctf_symbol_next (but important then). */

  if (!i)
    {
      if ((i = ctf_next_create ()) == NULL)
        return ctf_set_typed_errno (fp, ENOMEM);

      i->cu.ctn_fp = fp;
      i->ctn_iter_fun = (void (*) (void)) ctf_symbol_next;
      i->ctn_n = dyn_els;
      *it = i;
    }

  if ((void (*) (void)) ctf_symbol_next != i->ctn_iter_fun)
    return (ctf_set_typed_errno (fp, ECTF_NEXT_WRONGFUN));

  if (fp != i->cu.ctn_fp)
    return (ctf_set_typed_errno (fp, ECTF_NEXT_WRONGFP));

  /* TODO-v4: Indexed after non-indexed portions? */

  if ((!functions && fp->ctf_objtidx_names) ||
      (functions && fp->ctf_funcidx_names))
2019-04-24 18:15:33 +08:00
    {
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
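Because both sections are now plain arrays of uint32_t type IDs, one lookup routine can serve either. A minimal sketch of an unindexed lookup, assuming (i) a per-symbol translation table, loosely modelled on the sxlate tables that init_symtab fills in, maps each symbol-table index to a section slot, or to UINT32_MAX for symbols the section skips, and (ii) a slot holding 0 is a pad meaning "no type recorded". All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks a symbol that has no slot in the section (assumed: skippable
   symbols such as undefined ones).  */
#define SKETCH_NO_ENTRY UINT32_MAX

/* Look up the type of symbol SYMIDX.  TAB is the objt or func section
   (NTAB uint32_t type IDs); SXLATE maps symbol indexes to slots in TAB.
   Returns 0 for pads, skipped symbols and out-of-range indexes.  */
static uint32_t
sketch_unindexed_lookup (const uint32_t *tab, size_t ntab,
                         const uint32_t *sxlate, size_t nsyms,
                         size_t symidx)
{
  if (symidx >= nsyms || sxlate[symidx] == SKETCH_NO_ENTRY
      || sxlate[symidx] >= ntab)
    return 0;
  return tab[sxlate[symidx]];
}
```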
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
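The compatibility rule amounts to a three-way decision when a dict is opened. A minimal sketch, assuming an old reader refuses any header flag bit it does not recognise (the text says only that old libctf refuses such dicts); the flag values, names and enum are hypothetical and do not reproduce ctf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in ctf.h.  */
#define SKETCH_F_COMPRESS    0x1u
#define SKETCH_F_NEWFUNCINFO 0x2u

enum sketch_funcinfo_action
{
  SKETCH_REJECT_DICT,           /* Flag unknown to this reader: refuse.  */
  SKETCH_IGNORE_FUNCINFO,       /* Old-format function info: disregard.  */
  SKETCH_USE_FUNCINFO           /* New-format function info: read it.  */
};

/* KNOWN_FLAGS is the set of header flags this reader understands.  */
static enum sketch_funcinfo_action
sketch_check_funcinfo (uint32_t hdr_flags, uint32_t known_flags)
{
  if (hdr_flags & ~known_flags)
    return SKETCH_REJECT_DICT;
  if (hdr_flags & SKETCH_F_NEWFUNCINFO)
    return SKETCH_USE_FUNCINFO;
  return SKETCH_IGNORE_FUNCINFO;
}
```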
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
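The choice between a padded and an indexed section can be pictured as a pad count compared against a threshold. A minimal sketch, assuming pads are stored as 0; `threshold` stands in for CTF_INDEX_PAD_THRESHOLD, whose actual value and exact comparison in libctf's serializer are not reproduced here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count pad slots (assumed to be type ID 0) in a would-be unindexed
   section of N entries.  */
static size_t
sketch_count_pads (const uint32_t *types, size_t n)
{
  size_t pads = 0;

  for (size_t i = 0; i < n; i++)
    if (types[i] == 0)
      pads++;
  return pads;
}

/* Nonzero if the section has enough pads to be worth emitting indexed.  */
static int
sketch_want_indexed (const uint32_t *types, size_t n, size_t threshold)
{
  return sketch_count_pads (types, n) > threshold;
}
```

Since an indexed entry costs a name offset plus a type ID (8 bytes) against 4 bytes per padded slot, indexing only saves space once pads outnumber the typed entries, which is the situation the archive case above describes.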
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
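Name-sorted index sections turn symbol lookup into a binary search, and an index the compiler left unsorted needs only one sort first. A minimal sketch using the standard qsort and bsearch over a hypothetical array of (name, type) pairs; the real sections keep names and types in separate parallel arrays, and the `sorted` argument merely models the CTF_F_IDXSORTED flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory model of one indexed symtypetab entry.  */
struct sketch_idx_entry
{
  const char *name;
  uint32_t type;
};

/* Order entries by symbol name (used for both qsort and bsearch).  */
static int
sketch_entry_cmp (const void *a, const void *b)
{
  const struct sketch_idx_entry *ea = a;
  const struct sketch_idx_entry *eb = b;

  return strcmp (ea->name, eb->name);
}

/* Sort the index by name unless the producer already did.  */
static void
sketch_sort_if_needed (struct sketch_idx_entry *ents, size_t n, int sorted)
{
  if (!sorted)
    qsort (ents, n, sizeof (*ents), sketch_entry_cmp);
}

/* Binary-search for NAME; return its type, or 0 if it is absent.  */
static uint32_t
sketch_indexed_lookup (const struct sketch_idx_entry *ents, size_t n,
                       const char *name)
{
  struct sketch_idx_entry key = { name, 0 };
  const struct sketch_idx_entry *hit
    = bsearch (&key, ents, n, sizeof (*ents), sketch_entry_cmp);

  return hit ? hit->type : 0;
}
```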
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
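ctf_symbol_next_static keeps a single position counter that starts after the entries already returned from the dynamic hash, so the static slot to read is the counter minus `dyn_els`. A minimal sketch of that shape, assuming pad entries (type 0) are skipped and using 0 as an end marker in place of the error return libctf uses when iteration finishes; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: N counts every entry handed out so far,
   dynamic ones first, so static slot N - DYN_ELS comes next.  */
struct sketch_symiter
{
  size_t n;
  size_t dyn_els;
};

/* Return the next non-pad type in TAB (parallel to NAMES, LEN entries),
   storing its name in *NAME, or 0 once the static portion is used up.  */
static uint32_t
sketch_symbol_next_static (struct sketch_symiter *it,
                           const char *const *names, const uint32_t *tab,
                           size_t len, const char **name)
{
  uint32_t sym;

  do
    {
      if (it->n - it->dyn_els >= len)
        return 0;               /* End of iteration.  */
      *name = names[it->n - it->dyn_els];
      sym = tab[it->n - it->dyn_els];
      it->n++;
    }
  while (sym == 0);             /* Skip pads.  */

  return sym;
}
```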
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
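Consulting external strings before the strtab exists needs only a side table filled in as each external string is added; in libctf, ctf_syn_ext_strtab plays that role and this change populates it at string-addition time. A minimal sketch, assuming the top bit of a name offset selects the external string table; the structs, the fixed capacity and the linear search are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed: offsets with this bit set refer to the external strtab.  */
#define SKETCH_EXTERNAL_BIT 0x80000000u

struct sketch_ext_string
{
  uint32_t offset;              /* Offset in the not-yet-written strtab.  */
  const char *str;
};

/* Synthetic external strtab, filled in as external strings are added.  */
struct sketch_ext_strtab
{
  struct sketch_ext_string ents[16];
  size_t n;
};

/* Record that STR will live at OFFSET in the external strtab.  */
static int
sketch_add_external (struct sketch_ext_strtab *t, const char *str,
                     uint32_t offset)
{
  if (t->n >= sizeof (t->ents) / sizeof (t->ents[0]))
    return -1;
  t->ents[t->n].offset = offset;
  t->ents[t->n].str = str;
  t->n++;
  return 0;
}

/* Resolve NAME via the internal table or the synthetic external one.  */
static const char *
sketch_strptr (const char *internal, const struct sketch_ext_strtab *t,
               uint32_t name)
{
  if (!(name & SKETCH_EXTERNAL_BIT))
    return internal + name;

  uint32_t off = name & ~SKETCH_EXTERNAL_BIT;
  for (size_t i = 0; i < t->n; i++)
    if (t->ents[i].offset == off)
      return t->ents[i].str;
  return NULL;                  /* Not added yet.  */
}
```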
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
      ctf_header_t *hp = fp->ctf_header;
      uint32_t *idx = functions ? fp->ctf_funcidx_names : fp->ctf_objtidx_names;
      uint32_t *tab;
      size_t len;

      if (functions)
        {
          len = (hp->cth_varoff - hp->cth_funcidxoff) / sizeof (uint32_t);
          tab = (uint32_t *) (fp->ctf_buf + hp->cth_funcoff);
        }
      else
        {
          len = (hp->cth_funcidxoff - hp->cth_objtidxoff) / sizeof (uint32_t);
          tab = (uint32_t *) (fp->ctf_buf + hp->cth_objtoff);
        }

      do
        {
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
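The new rule reduces to a range check on the type ID. A minimal sketch, assuming type IDs are numbered from 1 with the opened types first and a counter like ctf_stypes marking the end of the frozen range; the struct, function names and error codes are hypothetical (libctf reports the refusal as ECTF_RDONLY).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dict model: IDs 1..STYPES came from ctf_open () and are
   frozen; IDs above that were added afterwards.  */
struct sketch_dict
{
  unsigned long stypes;         /* Count of static (opened) types.  */
  unsigned long ntypes;         /* Total types, static plus added.  */
};

enum { SKETCH_OK = 0, SKETCH_ERDONLY = -1, SKETCH_EBADID = -2 };

static int
sketch_static_type (const struct sketch_dict *d, unsigned long type)
{
  return type >= 1 && type <= d->stypes;
}

/* Gate for operations such as adding struct members: only types added
   after opening may be modified.  */
static int
sketch_check_modifiable (const struct sketch_dict *d, unsigned long type)
{
  if (type == 0 || type > d->ntypes)
    return SKETCH_EBADID;
  if (sketch_static_type (d, type))
    return SKETCH_ERDONLY;
  return SKETCH_OK;
}

/* Adding a type leaves the static range alone and extends the total.  */
static unsigned long
sketch_add_type (struct sketch_dict *d)
{
  return ++d->ntypes;
}
```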
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
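The name-collision rule from the second point above can be sketched as a check against the names of the opened types only, so repeats among newly added types stay legal. A minimal sketch under those assumptions; it uses one flat name list and ignores the separate struct, union and enum name spaces a real check would have to respect, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_ADD_OK = 0, SKETCH_ECONFLICT = -1 };

/* Refuse NAME if a type read in via ctf_open () already uses it.
   Names of types added since opening are deliberately not consulted.  */
static int
sketch_may_add_name (const char *const *static_names, size_t nstatic,
                     const char *name)
{
  if (name == NULL || name[0] == '\0')
    return SKETCH_ADD_OK;       /* Anonymous types never collide.  */
  for (size_t i = 0; i < nstatic; i++)
    if (strcmp (static_names[i], name) == 0)
      return SKETCH_ECONFLICT;
  return SKETCH_ADD_OK;
}
```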
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
          if (i->ctn_n - dyn_els >= len)
2020-11-20 21:34:04 +08:00
            goto end;

libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
*name = ctf_strptr (fp, idx[i->ctn_n - dyn_els]);
sym = tab[i->ctn_n - dyn_els];
i->ctn_n++;
2021-03-18 20:37:52 +08:00
}
while (sym == -1u || sym == 0);
2019-04-24 18:15:33 +08:00
}
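The code fragments above step through an indexed symtypetab. They fetch a name from the index and a type from the parallel table, and they loop while the type is a pad (0 or -1u). Below is a standalone sketch of that loop, assuming a hypothetical iterator struct that holds already-resolved name strings instead of string-table offsets. The struct and `sketch_symbol_next` are illustrations only, not libctf's `ctf_next_t` machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator: names[i] names the symbol whose type is tab[i].  */
struct symtypetab_iter
{
  const char *const *names;
  const uint32_t *tab;
  size_t len;
  size_t n;  /* Next entry to examine.  */
};

/* Return the next non-pad type ID and set *NAME, or 0 at the end.
   Entries with type 0 or (uint32_t) -1 are pads and are skipped.  */
static uint32_t
sketch_symbol_next (struct symtypetab_iter *it, const char **name)
{
  uint32_t sym;

  do
    {
      if (it->n >= it->len)
        return 0;
      *name = it->names[it->n];
      sym = it->tab[it->n];
      it->n++;
    }
  while (sym == (uint32_t) -1 || sym == 0);

  return sym;
}
```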
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not keep
any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
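With this layout, finding a function symbol's prototype is one array load plus a kind check. Here is a minimal sketch, assuming an unindexed function info section with one `uint32_t` per symbol-table entry (0 meaning "no type"), and a hypothetical `kinds[]` array standing in for the type section. Two symbols sharing type ID 2 in the test show the deduplication the new format permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds; kinds[id] gives the kind of type ID id.  */
enum sketch_kind { K_UNKNOWN = 0, K_INTEGER, K_FUNCTION, K_POINTER };

/* Return the prototype type ID of function symbol SYMIDX, or 0 if the symbol
   is out of range, typeless, or (corruptly) not of function kind.  */
static uint32_t
sketch_func_type (const uint32_t *funcinfo, size_t nsyms,
                  const enum sketch_kind *kinds, size_t ntypes,
                  size_t symidx)
{
  uint32_t type;

  if (symidx >= nsyms)
    return 0;
  type = funcinfo[symidx];
  if (type == 0 || type >= ntypes || kinds[type] != K_FUNCTION)
    return 0;
  return type;
}
```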
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
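The compatibility behaviour of the new flag can be sketched as two predicates, under stated assumptions. The flag values below are assumed for illustration (consult ctf.h for the real ones). The sketch also assumes that an old reader rejects any flag bit it does not know, which is one plausible way to get the "refuse to open" behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed for illustration only; see ctf.h for the real flags.  */
#define SKETCH_F_COMPRESS    0x1
#define SKETCH_F_NEWFUNCINFO 0x2

/* An old reader that only knows SKETCH_F_COMPRESS rejects anything else.  */
static int
sketch_old_reader_accepts (uint32_t flags)
{
  return (flags & ~(uint32_t) SKETCH_F_COMPRESS) == 0;
}

/* A new reader consults the function info section only when the flag is
   set, and otherwise disregards it.  */
static int
sketch_use_funcinfo (uint32_t flags)
{
  return (flags & SKETCH_F_NEWFUNCINFO) != 0;
}
```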
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
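Conceptually, the shuffle step joins two tables. One is the name -> type mappings built by `ctf_add_func_sym`/`ctf_add_objt_sym`. The other is the linker's (name, symbol index) pairs. The result is a per-index type array. Below is a minimal sketch of that join, assuming hypothetical structs and a linear search for brevity (real code would use hash tables). It is not the actual `ctf_link_shuffle_syms` implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inputs: names with known types, and the linker's symbols.  */
struct name_type { const char *name; uint32_t type; };
struct link_sym  { const char *name; size_t symidx; };

/* Fill out[0..nout) so that out[symidx] is the type of the linker symbol at
   that index, or 0 if no type is known for it.  */
static void
sketch_shuffle_syms (const struct name_type *types, size_t ntypes,
                     const struct link_sym *syms, size_t nsyms,
                     uint32_t *out, size_t nout)
{
  memset (out, 0, nout * sizeof (uint32_t));
  for (size_t i = 0; i < nsyms; i++)
    {
      if (syms[i].symidx >= nout)
        continue;
      for (size_t j = 0; j < ntypes; j++)
        if (strcmp (syms[i].name, types[j].name) == 0)
          {
            out[syms[i].symidx] = types[j].type;
            break;
          }
    }
}
```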
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
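Because an indexed section is sorted by name, a lookup is a binary search over the names. The ChangeLog below describes `ctf_lookup_idx_name` as a bsearch. Here is a minimal sketch using the standard `bsearch`, assuming a hypothetical array of (name, type) pairs; the on-disk format instead keeps string offsets and type IDs in parallel `uint32_t` arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry, sorted by name.  */
struct idx_entry { const char *name; uint32_t type; };

static int
sketch_cmp_name (const void *key, const void *elt)
{
  const char *name = key;
  const struct idx_entry *e = elt;
  return strcmp (name, e->name);
}

/* Return the type recorded for NAME, or 0 if it is not in the index.  */
static uint32_t
sketch_lookup_indexed (const struct idx_entry *idx, size_t n, const char *name)
{
  const struct idx_entry *e = bsearch (name, idx, n, sizeof (*idx),
                                       sketch_cmp_name);
  return e ? e->type : 0;
}
```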
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
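The "sort only if needed" rule can be sketched with `qsort`. Assume a hypothetical (name, type) entry array and an assumed value for the sorted-index flag (see ctf.h for the real one). Sorting the pairs together keeps each name attached to its type, which a real implementation over parallel arrays must also guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_F_IDXSORTED 0x4  /* Assumed value, for illustration only.  */

/* Hypothetical index entry.  */
struct idx_entry { const char *name; uint32_t type; };

static int
sketch_cmp_entries (const void *a, const void *b)
{
  const struct idx_entry *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* Sort the index by name unless the header flags say it is already sorted.  */
static void
sketch_maybe_sort_index (struct idx_entry *idx, size_t n, uint32_t flags)
{
  if (flags & SKETCH_F_IDXSORTED)
    return;
  qsort (idx, n, sizeof (*idx), sketch_cmp_entries);
}
```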
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
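Resolving a name that may live in an external string table can be sketched as follows. The sketch assumes the CTF name-reference encoding, in which the top bit selects the table (0 for the dict's own strtab, 1 for the external one) and the low 31 bits give the offset. The macros are local mirrors defined so the example stands alone, and `struct strtabs` and `sketch_strptr` are hypothetical; ctf.h and `ctf_strptr` are authoritative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the name encoding: top bit = table, low 31 bits = offset.  */
#define SKETCH_NAME_STID(name)   ((name) >> 31)
#define SKETCH_NAME_OFFSET(name) ((name) & 0x7fffffffu)

/* Hypothetical string tables: the external one is fed in before write-out.  */
struct strtabs
{
  const char *internal;
  size_t internal_len;
  const char *external;
  size_t external_len;
};

/* Resolve NAME to a string, or NULL if its table is missing or the offset is
   out of range.  */
static const char *
sketch_strptr (const struct strtabs *s, uint32_t name)
{
  uint32_t off = SKETCH_NAME_OFFSET (name);
  const char *tab = SKETCH_NAME_STID (name) ? s->external : s->internal;
  size_t len = SKETCH_NAME_STID (name) ? s->external_len : s->internal_len;

  if (tab == NULL || off >= len)
    return NULL;
  return tab + off;
}
```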
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
else
{
libctf: support addition of types to dicts read via ctf_open()
2023-12-20 00:58:19 +08:00
/* Skip over pads in ctf_sxlate, padding for typeless symbols in the
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
symtypetab itself, and symbols in the wrong table. */
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both kinds of type has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
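As a rough illustration of the new rule, here is a minimal sketch in which the read-only boundary is just a count of the types that came from the opened buffer. This is the role the ChangeLog below assigns to ctf_dict.ctf_stypes. Everything in the sketch is hypothetical and is not the library's API: `struct toy_dict`, `toy_add_member`, `toy_add_type`, and the `TOY_*` error codes all stand in for libctf's real structures and for ECTF_RDONLY.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes standing in for libctf's ECTF_RDONLY etc.  */
enum toy_err { TOY_OK = 0, TOY_RDONLY = 1, TOY_BADID = 2 };

/* Hypothetical toy dict: IDs 1..nstatic were read in from the serialized
   dict; IDs above that were added after opening.  */
struct toy_dict
{
  unsigned long nstatic;        /* Analogue of ctf_stypes.  */
  unsigned long ntypes;         /* Static + dynamic types.  */
  unsigned long nmembers[16];   /* Per-type member count, indexed by ID.  */
};

static int
toy_is_static (const struct toy_dict *d, unsigned long id)
{
  return id >= 1 && id <= d->nstatic;
}

/* Add a member to a struct type: refused for any read-in type.  */
static int
toy_add_member (struct toy_dict *d, unsigned long id)
{
  if (id == 0 || id > d->ntypes || id >= 16)
    return TOY_BADID;
  if (toy_is_static (d, id))
    return TOY_RDONLY;
  d->nmembers[id]++;
  return TOY_OK;
}

/* Appending new types is always allowed.  */
static unsigned long
toy_add_type (struct toy_dict *d)
{
  assert (d->ntypes + 1 < 16);
  return ++d->ntypes;
}
```

The check needs nothing more than the boundary count. That is why a single counter can replace a dict-wide read-only flag.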
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
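The name-collision rule from the second item above might be sketched like this. The sketch assumes a flat array of names, where libctf really uses hashtables. `struct toy_names`, `toy_static_name_exists`, and `toy_add_named` are hypothetical. Only names from the read-in range are checked, so repeated additions of the same new name keep succeeding, as they did before.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical toy dict: names[i] is the name of type ID i + 1.  The first
   nstatic entries were read in from the serialized dict.  */
struct toy_names
{
  const char *names[TOY_MAX];
  size_t nstatic;
  size_t ntypes;
};

/* Return 1 if NAME is already used by a read-in (static) type.  */
static int
toy_static_name_exists (const struct toy_names *d, const char *name)
{
  for (size_t i = 0; i < d->nstatic; i++)
    if (d->names[i] != NULL && strcmp (d->names[i], name) == 0)
      return 1;
  return 0;
}

/* Add a named type.  Return the new ID, or 0 if the name would shadow a
   static type.  Collisions among newly added types are tolerated.  */
static size_t
toy_add_named (struct toy_names *d, const char *name)
{
  assert (d->ntypes < TOY_MAX);
  if (name != NULL && toy_static_name_exists (d, name))
    return 0;
  d->names[d->ntypes] = name;
  return ++d->ntypes;
}
```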
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
for (; i->ctn_n - dyn_els < fp->ctf_nsyms; i->ctn_n++)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not keep
any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
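Here is a minimal sketch of reading the new format. It assumes an unindexed section with one uint32_t entry per symbol, in symbol-table order, where 0 is the pad for symbols with no type. `toy_func_type` and the `TOY_K_*` kinds are hypothetical stand-ins for the real section layout and CTF_K_FUNCTION.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds, standing in for CTF_K_FUNCTION and friends.  */
enum toy_kind { TOY_K_UNKNOWN, TOY_K_INTEGER, TOY_K_FUNCTION };

/* Look up the type of function symbol SYMIDX in an unindexed function info
   section: one uint32_t type ID per symbol, 0 meaning "no type" (a pad).
   KINDS maps type IDs to kinds.  Return the type ID, or 0 on failure.  */
static uint32_t
toy_func_type (const uint32_t *funcinfo, size_t nsyms,
               const enum toy_kind *kinds, size_t ntypes, size_t symidx)
{
  if (symidx >= nsyms)
    return 0;
  uint32_t type = funcinfo[symidx];
  if (type == 0 || type >= ntypes)
    return 0;
  /* The format requires every entry to name a function type.  */
  if (kinds[type] != TOY_K_FUNCTION)
    return 0;
  return type;
}
```

Because each entry is just a type ID, two functions with the same prototype can share one CTF_K_FUNCTION type. The old inline format prevented exactly this deduplication.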
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
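The reader-side decision can be sketched as below, with several assumptions. The bit values are placeholders, not the ones in ctf.h. The idea that an old reader refuses any flag bit it does not know about is inferred from the CTF_F_MAX adjustment in the include/ChangeLog entry. `toy_funcinfo_policy` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values; the real CTF_F_* values live in ctf.h.  */
#define TOY_F_COMPRESS     0x1u
#define TOY_F_NEWFUNCINFO  0x2u
#define TOY_F_MAX          (TOY_F_COMPRESS | TOY_F_NEWFUNCINFO)

enum toy_open_result
{
  TOY_OPEN_REJECT,              /* Unknown flag: refuse the dict.  */
  TOY_OPEN_IGNORE_FUNCINFO,     /* Old-format dict: skip the section.  */
  TOY_OPEN_USE_FUNCINFO         /* New-format dict: read the section.  */
};

/* Decide what to do with the function info section, given the header
   FLAGS and the set of flags KNOWN to this reader.  */
static enum toy_open_result
toy_funcinfo_policy (uint32_t flags, uint32_t known)
{
  if (flags & ~known)
    return TOY_OPEN_REJECT;
  if (flags & TOY_F_NEWFUNCINFO)
    return TOY_OPEN_USE_FUNCINFO;
  return TOY_OPEN_IGNORE_FUNCINFO;
}
```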
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
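In miniature, the two-step mapping might look like this. The dict only knows names, and the linker later supplies which symbol index carries which name. The arrays stand in for ctf_objthash/ctf_funchash and the ctf_dynsyms family. `struct toy_namemap` and `toy_type_of_symidx` are hypothetical, and a linear scan replaces the real hash lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name -> type mapping, as built by ctf_add_objt_sym-style
   calls (libctf uses a hash; a flat array keeps the sketch short).  */
struct toy_namemap
{
  const char *name;
  unsigned long type;
};

/* Resolve a symbol index to a type in two steps: symidx -> name via the
   table the linker supplied, then name -> type via the dict's mapping.
   Return 0 if either step fails.  */
static unsigned long
toy_type_of_symidx (const char *const *symidx_to_name, size_t nsyms,
                    const struct toy_namemap *map, size_t nmap,
                    size_t symidx)
{
  if (symidx >= nsyms || symidx_to_name[symidx] == NULL)
    return 0;
  const char *name = symidx_to_name[symidx];
  for (size_t i = 0; i < nmap; i++)
    if (strcmp (map[i].name, name) == 0)
      return map[i].type;
  return 0;
}
```

Keeping the dict keyed by name means a relink that renumbers symbols only has to replace the index -> name table.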
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads (as defined by
CTF_INDEX_PAD_THRESHOLD), whether because they are in dicts with symbols
where most types are unknown or in archives where most types are defined
in some child or parent dict rather than in this specific dict, are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
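Here is a sketch of both halves, under stated assumptions. On the emit side, the choice compares a plain pad count against a made-up `TOY_PAD_THRESHOLD`; the real CTF_INDEX_PAD_THRESHOLD value and exact heuristic are not reproduced here. On the read side, the standard `bsearch` searches a name-sorted index. `toy_want_index` and `toy_indexed_lookup` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CTF_INDEX_PAD_THRESHOLD.  */
#define TOY_PAD_THRESHOLD 3

/* Emit side: if laying the section out by symbol index would need more
   than TOY_PAD_THRESHOLD pads (symbols with no type in this dict), sort
   by name and emit an index instead.  */
static int
toy_want_index (size_t nsyms, size_t ntyped)
{
  assert (ntyped <= nsyms);
  return nsyms - ntyped > TOY_PAD_THRESHOLD;
}

/* bsearch comparator: KEY is the name sought, ELEM points at an entry.  */
static int
toy_cmp_name (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Read side: NAMES is sorted and TYPES[i] is the type of NAMES[i].
   Return the type, or 0 if the symbol is not in this section.  */
static unsigned long
toy_indexed_lookup (const char *const *names, const unsigned long *types,
                    size_t n, const char *symname)
{
  const char *const *hit = bsearch (symname, names, n, sizeof (names[0]),
                                    toy_cmp_name);
  return hit == NULL ? 0 : types[hit - names];
}
```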
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
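Sorting an unsorted index at open time could be sketched as follows. The sketch pairs each name with its type in one struct, so a single `qsort` keeps them together; the real sections store names and types separately. `TOY_F_IDXSORTED` is a placeholder bit, not CTF_F_IDXSORTED's value, and `toy_ensure_sorted` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder bit standing in for CTF_F_IDXSORTED.  */
#define TOY_F_IDXSORTED 0x4u

/* One index entry: a symbol name and the type recorded for it.  */
struct toy_idx_ent
{
  const char *name;
  unsigned long type;
};

static int
toy_ent_cmp (const void *a, const void *b)
{
  const struct toy_idx_ent *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* A producer such as the compiler may emit the index in any order and
   leave the sorted flag clear; in that case sort by name so that later
   lookups can binary-search.  Return 1 if a sort was needed.  */
static int
toy_ensure_sorted (struct toy_idx_ent *ents, size_t n, unsigned int flags)
{
  if (flags & TOY_F_IDXSORTED)
    return 0;                   /* Already sorted by the producer.  */
  qsort (ents, n, sizeof (ents[0]), toy_ent_cmp);
  return 1;
}
```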
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
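An iterator in this style needs only a cursor. It skips symbols the section does not describe, and it skips pads. The sketch mirrors the `ctf_sxlate[n] == -1u` test visible in the code excerpts of this listing. It simplifies the translation table to hold slot indexes into the section. `struct toy_sym_iter` and `toy_symbol_next` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_ENTRY UINT32_MAX /* Plays the role of -1u in ctf_sxlate.  */

/* Hypothetical iterator state: current position in the symbol table.  */
struct toy_sym_iter
{
  size_t n;
};

/* Yield the next symbol that has a type.  SXLATE maps each symbol index
   to a slot in SECTION, or TOY_NO_ENTRY if the section does not describe
   that symbol; a slot holding 0 is a pad.  Return 1 and fill *SYMIDX and
   *TYPE, or return 0 when the iteration is exhausted.  */
static int
toy_symbol_next (struct toy_sym_iter *it, const uint32_t *sxlate,
                 size_t nsyms, const uint32_t *section,
                 size_t *symidx, uint32_t *type)
{
  for (; it->n < nsyms; it->n++)
    {
      if (sxlate[it->n] == TOY_NO_ENTRY)
        continue;
      uint32_t t = section[sxlate[it->n]];
      if (t == 0)
        continue;               /* Pad: no type recorded.  */
      *symidx = it->n;
      *type = t;
      it->n++;                  /* Resume after this symbol next time.  */
      return 1;
    }
  return 0;
}
```

No sorting is involved: the walk follows symbol-table order and simply skips whatever is absent.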
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
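The essential step is resolving a strtab offset against an in-memory table before anything is written out. Here is a minimal sketch, assuming the linker hands over a buffer of NUL-terminated strings. `struct toy_ext_strtab` and `toy_ext_strptr` are hypothetical; this is not ctf_strptr itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of an external string table handed over by the
   linker before it is written out: a buffer of NUL-terminated strings.  */
struct toy_ext_strtab
{
  const char *buf;
  size_t len;
};

/* Resolve a symbol name given as a strtab offset.  Return NULL if the
   offset is out of range or the string is not terminated in the table.  */
static const char *
toy_ext_strptr (const struct toy_ext_strtab *st, size_t offset)
{
  if (offset >= st->len)
    return NULL;
  if (memchr (st->buf + offset, '\0', st->len - offset) == NULL)
    return NULL;
  return st->buf + offset;
}
```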
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
{
ctf_header_t *hp = fp->ctf_header;
2023-12-20 00:58:19 +08:00
size_t n = i->ctn_n - dyn_els;
2019-04-24 18:15:33 +08:00
2023-12-20 00:58:19 +08:00
if (fp->ctf_sxlate[n] == -1u)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
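To make the layout concrete, here is a minimal C sketch of an *unindexed* section read as a plain `uint32_t` array. Entry N is the type ID for symbol N, and a zero entry is a pad (no type recorded). The names `symtypetab_lookup` and `EX_NO_TYPE` are hypothetical, not libctf API. Real libctf also skips uninteresting symbols and translates symbol numbers through `ctf_sxlate`, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker: type ID 0 means "no type recorded" (a pad).  */
#define EX_NO_TYPE 0u

/* Return the type of symbol SYMIDX in an unindexed symtypetab section,
   where entry N holds the type ID of symbol N.  Pads and out-of-range
   indexes both yield EX_NO_TYPE.  */
static uint32_t
symtypetab_lookup (const uint32_t *section, size_t nentries, size_t symidx)
{
  assert (section != NULL || nentries == 0);

  if (symidx >= nentries)
    return EX_NO_TYPE;
  return section[symidx];
}
```

For example, with a section of `{ 0, 7, 0, 12 }`, symbol 1 has type 7 and symbol 2 has no recorded type.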
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
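The compatibility rule can be summarised as a small decision function. This is a sketch under stated assumptions. The `EX_F_*` values are hypothetical stand-ins for the real `ctf.h` flags. `known_flags` models the set of header flags that a given libctf version understands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ctf.h header flags; real values may differ.  */
#define EX_F_COMPRESS    0x1u
#define EX_F_NEWFUNCINFO 0x2u
#define EX_F_IDXSORTED   0x4u

enum ex_open_result
{
  EX_OPEN_OK,                   /* open normally */
  EX_OPEN_IGNORE_FUNCINFO,      /* open, but disregard function info */
  EX_OPEN_REFUSE                /* refuse to open the dict at all */
};

/* Decide how a reader that understands KNOWN_FLAGS treats a dict whose
   header carries FLAGS.  */
static enum ex_open_result
check_funcinfo_flags (uint32_t flags, uint32_t known_flags)
{
  /* Unknown flags: refuse.  This is how an old reader reacts to a dict
     with NEWFUNCINFO set.  */
  if (flags & ~known_flags)
    return EX_OPEN_REFUSE;

  /* A new reader meeting an old-format dict skips its function info.  */
  if ((known_flags & EX_F_NEWFUNCINFO) && !(flags & EX_F_NEWFUNCINFO))
    return EX_OPEN_IGNORE_FUNCINFO;

  return EX_OPEN_OK;
}
```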
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, i.e. a prototype). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
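Here is a rough model of the two addition calls. It uses a tiny fixed-size table in place of the real `ctf_funchash` and `ctf_objthash` dynhashes. Everything in it is hypothetical and chosen only to show the kind restriction: `ex_add_func_sym`, `ex_add_objt_sym`, the `EX_*` codes, and passing the kind explicitly instead of deriving it from the type ID. In this sketch, re-adding a name simply overwrites the earlier entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type kinds and return codes for this sketch.  */
enum ex_kind { EX_K_INTEGER, EX_K_POINTER, EX_K_FUNCTION };
enum { EX_OK = 0, EX_ENOTFUNC = -1, EX_EFULL = -2 };

#define EX_MAXSYMS 16

/* A tiny name -> type table standing in for ctf_funchash/ctf_objthash.  */
struct ex_symhash
{
  const char *names[EX_MAXSYMS];
  unsigned long types[EX_MAXSYMS];
  size_t n;
};

static int
ex_hash_insert (struct ex_symhash *h, const char *name, unsigned long type)
{
  assert (h->n <= EX_MAXSYMS);

  for (size_t i = 0; i < h->n; i++)
    if (strcmp (h->names[i], name) == 0)
      {
        h->types[i] = type;     /* sketch choice: overwrite */
        return EX_OK;
      }
  if (h->n == EX_MAXSYMS)
    return EX_EFULL;
  h->names[h->n] = name;
  h->types[h->n] = type;
  h->n++;
  return EX_OK;
}

/* Function symbols must refer to a function type (a prototype).  */
static int
ex_add_func_sym (struct ex_symhash *funcs, const char *name,
                 unsigned long type, enum ex_kind kind)
{
  if (kind != EX_K_FUNCTION)
    return EX_ENOTFUNC;
  return ex_hash_insert (funcs, name, type);
}

/* Object symbols may refer to any type, including function pointers.  */
static int
ex_add_objt_sym (struct ex_symhash *objts, const char *name,
                 unsigned long type)
{
  return ex_hash_insert (objts, name, type);
}
```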
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
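The shuffle step turns an unordered list of (name, symbol index) pairs reported by the linker into an array indexed by symbol number. The following is a minimal sketch of that step. The hypothetical `ex_linker_sym` stands in for libctf's in-flight records, and the returned array plays roughly the role of `ctf_dynsymidx`. This is not the real implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-flight record: a symbol the linker told us about.  */
struct ex_linker_sym
{
  const char *name;
  size_t symidx;
};

/* Build an array indexed by symbol number from the in-flight list; *MAXP
   receives the highest index seen.  Slots for symbols the linker never
   mentioned stay NULL.  The caller frees the result.  */
static const char **
ex_shuffle_syms (const struct ex_linker_sym *syms, size_t n, size_t *maxp)
{
  size_t max = 0;
  const char **idx;

  assert (syms != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (syms[i].symidx > max)
      max = syms[i].symidx;

  idx = calloc (max + 1, sizeof (const char *));
  if (idx == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    idx[syms[i].symidx] = syms[i].name;
  *maxp = max;
  return idx;
}
```

Once names are reachable by index, a lookup by symbol index can go index -> name -> type through the name-keyed hashes.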
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
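The commit message does not spell out the exact density rule. The sketch below assumes a simple one, purely for illustration: emit the section indexed when the pad count exceeds a threshold. `EX_INDEX_PAD_THRESHOLD` and `ex_should_index` are hypothetical and are not libctf's actual logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; libctf's CTF_INDEX_PAD_THRESHOLD has its own
   value and its real decision may weigh other factors.  */
#define EX_INDEX_PAD_THRESHOLD 3

/* Count pads (zero type IDs) in a would-be unindexed section and decide
   whether to emit it indexed instead.  */
static int
ex_should_index (const uint32_t *types, size_t n)
{
  size_t pads = 0;

  assert (types != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (types[i] == 0)
      pads++;
  return pads > EX_INDEX_PAD_THRESHOLD;
}
```

The size intuition is as follows. An unindexed section costs one 4-byte entry per symbol, pads included. An indexed one costs roughly two 4-byte words (a type ID and a name offset) per *typed* symbol only. Heavy padding therefore favours the index.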
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
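Here is a sketch of the indexed lookup path. It assumes the index is held in memory as (name, type) pairs. Real index sections store string-table offsets rather than C strings, and libctf sorts a separate index rather than the entries themselves. `ex_lookup_indexed` and `ex_idx_entry` are hypothetical. Sorting inside the lookup function is only for brevity: a real reader would sort once and remember that it had done so.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One index entry: a symbol name and its type ID.  */
struct ex_idx_entry
{
  const char *name;
  uint32_t type;
};

static int
ex_cmp_entry (const void *a, const void *b)
{
  const struct ex_idx_entry *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort the index if the producer did not mark it sorted (compare the
   CTF_F_IDXSORTED header flag), then binary-search it by name.
   Returns 0 if the name is absent.  */
static uint32_t
ex_lookup_indexed (struct ex_idx_entry *idx, size_t n, int sorted,
                   const char *name)
{
  struct ex_idx_entry key = { name, 0 };
  struct ex_idx_entry *hit;

  assert (idx != NULL || n == 0);
  if (!sorted)
    qsort (idx, n, sizeof (*idx), ex_cmp_entry);

  hit = bsearch (&key, idx, n, sizeof (*idx), ex_cmp_entry);
  return hit ? hit->type : 0;
}
```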
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
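The per-symbol loop over a static (read-in) section can be sketched as follows. It loosely mirrors the `ctf_sxlate`-based checks visible elsewhere in this file. A translation entry of `-1u` means the symbol has no slot, and a zero type means a pad. The iterator struct and function name are hypothetical. `memcpy` is used instead of a pointer cast to avoid relying on alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical iterator state.  SXLATE maps each symbol number to the byte
   offset of its entry in BUF, or to (uint32_t) -1 if it has no entry.  */
struct ex_symiter
{
  const unsigned char *buf;
  const uint32_t *sxlate;
  size_t nsyms;
  size_t next;                  /* next symbol number to examine */
};

/* Advance to the next symbol that has a type.  On success store its
   number in *SYMIDX and its type ID in *TYPE and return 1; return 0 once
   every symbol has been examined.  */
static int
ex_symbol_next (struct ex_symiter *it, size_t *symidx, uint32_t *type)
{
  assert (it != NULL && symidx != NULL && type != NULL);

  while (it->next < it->nsyms)
    {
      size_t n = it->next++;
      uint32_t t;

      if (it->sxlate[n] == (uint32_t) -1)
        continue;               /* symbol has no slot in the section */
      memcpy (&t, it->buf + it->sxlate[n], sizeof (t));
      if (t == 0)
        continue;               /* pad: no type recorded */
      *symidx = n;
      *type = t;
      return 1;
    }
  return 0;
}
```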
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
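Here is a minimal model of resolving a name reference before the external strtab exists on disk. It assumes that the top bit of a 32-bit name reference selects the external table. The `ex_ext_string` table stands in for whatever mapping libctf records when the linker supplies strings. `ex_strptr` is a hypothetical analogue of `ctf_strptr`, not its real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed convention: top bit selects the external table.  */
#define EX_NAME_EXTERNAL(ref) ((ref) >> 31)
#define EX_NAME_OFFSET(ref)   ((ref) & 0x7fffffffu)

/* Hypothetical record of an external string the linker reported before
   the external strtab was written.  */
struct ex_ext_string
{
  uint32_t offset;
  const char *str;
};

/* Resolve REF against the dict's own string table or, for external
   references, against the recorded offset -> string mapping.  */
static const char *
ex_strptr (const char *internal, size_t internal_len,
           const struct ex_ext_string *ext, size_t next, uint32_t ref)
{
  uint32_t off = EX_NAME_OFFSET (ref);

  assert (internal != NULL || internal_len == 0);
  if (!EX_NAME_EXTERNAL (ref))
    return off < internal_len ? internal + off : NULL;

  for (size_t i = 0; i < next; i++)   /* linear scan; a hash in practice */
    if (ext[i].offset == off)
      return ext[i].str;
  return NULL;
}
```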
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
continue;
2019-04-24 18:15:33 +08:00
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both kinds of type has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
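The rule amounts to a range check on type IDs. The following is a sketch that assumes a hypothetical `ex_dict` whose `stypes` field plays the role of `ctf_stypes`. It ignores parent/child ID translation. The error codes are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dict state: types 1..stypes came from the opened file and
   are immutable; higher IDs were added afterwards and may be modified.  */
struct ex_dict
{
  uint32_t stypes;              /* number of static (read-in) types */
  uint32_t ntypes;              /* total types, static plus added */
};

enum { EX_OK = 0, EX_ERDONLY = -1, EX_EBADID = -2 };

static int
ex_static_type (const struct ex_dict *fp, uint32_t id)
{
  return id >= 1 && id <= fp->stypes;
}

/* Gate for operations that modify an existing type, such as adding a
   struct member or changing array bounds.  */
static int
ex_check_modifiable (const struct ex_dict *fp, uint32_t id)
{
  assert (fp->stypes <= fp->ntypes);

  if (id == 0 || id > fp->ntypes)
    return EX_EBADID;
  return ex_static_type (fp, id) ? EX_ERDONLY : EX_OK;
}
```

Adding a *new* type that points at a static one needs no such check. Only modification of the static range is refused.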
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
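The name-collision rule from the second item can be sketched as a check against the names of read-in types only. Repeated additions among newly added types therefore remain allowed. This is a simplification. The table and error code are hypothetical. Real CTF also keeps separate namespaces (for example, struct tags versus typedef names), and this sketch does not model them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: names of types present when the dict was opened.  Names
   of types added since then are deliberately not consulted.  */
struct ex_static_names
{
  const char *const *names;
  size_t n;
};

enum { EX_OK = 0, EX_EDUPLICATE = -1 };

static int
ex_check_new_type_name (const struct ex_static_names *st, const char *name)
{
  assert (st->names != NULL || st->n == 0);

  if (name == NULL || name[0] == '\0')
    return EX_OK;               /* anonymous types never collide */
  for (size_t i = 0; i < st->n; i++)
    if (strcmp (st->names[i], name) == 0)
      return EX_EDUPLICATE;     /* would shadow a read-in type */
  return EX_OK;
}
```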
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
2023-12-20 00:58:19 +08:00
sym = *(uint32_t *) ((uintptr_t) fp->ctf_buf + fp->ctf_sxlate[n]);
2019-04-24 18:15:33 +08:00
2020-11-20 21:34:04 +08:00
if (sym == 0)
continue;
if (functions)
{
|
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
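The fix for the last point is to keep one lookup cache per symbol kind. The ChangeLog below calls these ctf_symhash_func and ctf_symhash_objt. The following is a rough sketch of why that prevents cross-talk. It is a toy with fixed-size arrays and linear search instead of hash tables, and every name in it is hypothetical.

```c
#include <string.h>

#define TOY_CACHE_MAX 16

/* One cache of name -> symbol index mappings.  */
struct toy_symcache
{
  const char *names[TOY_CACHE_MAX];
  long idx[TOY_CACHE_MAX];
  size_t n;
};

/* Hypothetical dict holding a separate cache per symbol kind.  */
struct toy_symdict
{
  struct toy_symcache func;
  struct toy_symcache objt;
};

static struct toy_symcache *
toy_cache_for (struct toy_symdict *d, int is_function)
{
  return is_function ? &d->func : &d->objt;
}

/* Remember that NAME resolved to symbol index IDX for the given kind.
   Returns 0 on success, -1 if the cache is full.  */
static int
toy_cache_remember (struct toy_symdict *d, int is_function,
		    const char *name, long idx)
{
  struct toy_symcache *c = toy_cache_for (d, is_function);
  if (c->n == TOY_CACHE_MAX)
    return -1;
  c->names[c->n] = name;
  c->idx[c->n] = idx;
  c->n++;
  return 0;
}

/* Look NAME up only in the cache for the requested kind, so an object
   symbol seen earlier is never returned for a function query.
   Returns the symbol index, or -1 if not cached.  */
static long
toy_cache_lookup (struct toy_symdict *d, int is_function, const char *name)
{
  const struct toy_symcache *c = toy_cache_for (d, is_function);
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return c->idx[i];
  return -1;
}
```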
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
if (fp->ctf_sxlate[n] >= hp->cth_funcoff
&& fp->ctf_sxlate[n] < hp->cth_objtidxoff)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
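Because both symbol type sections are now plain arrays of uint32_t type IDs, a reader only needs to know which section a symbol's translated offset (its sxlate entry) points into. The range checks shown in this file place data objects in [cth_objtoff, cth_funcoff) and functions in [cth_funcoff, cth_objtidxoff). The following is a sketch of that classification over a toy buffer. The header struct is a hypothetical subset, and real CTF offsets are relative to the data following the header, which this toy ignores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the CTF header: byte offsets of the data
   object section, the function info section, and the object index that
   follows the function info section.  */
struct toy_header
{
  uint32_t objtoff;
  uint32_t funcoff;
  uint32_t objtidxoff;
};

enum toy_symkind { TOY_SYM_NONE, TOY_SYM_OBJT, TOY_SYM_FUNC };

/* Classify a symbol's translated offset by the section it points into.  */
static enum toy_symkind
toy_classify (const struct toy_header *hp, uint32_t off)
{
  if (off >= hp->objtoff && off < hp->funcoff)
    return TOY_SYM_OBJT;
  if (off >= hp->funcoff && off < hp->objtidxoff)
    return TOY_SYM_FUNC;
  return TOY_SYM_NONE;
}

/* Both sections hold uint32_t type IDs (0 = pad), so reading an entry
   is identical for either kind.  memcpy avoids unaligned access.  */
static uint32_t
toy_read_type (const unsigned char *buf, uint32_t off)
{
  uint32_t type;
  memcpy (&type, buf + off, sizeof (type));
  return type;
}
```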
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
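The compatibility story rests on two checks. A new reader ignores the function info section unless the flag is present. An old reader rejects any flag bits it does not know about. The following is a sketch of both. The flag values are hypothetical placeholders, not the definitions in include/ctf.h. The "reject unknown bits" mask is an assumption about how an old reader behaves, not a transcription of libctf's check.

```c
#include <stdint.h>

/* Hypothetical flag bits; see include/ctf.h for the real values.  */
#define TOY_F_COMPRESS    0x1
#define TOY_F_NEWFUNCINFO 0x2

/* New reader: consult the function info section only if the dict says it
   uses the new format; otherwise treat it as empty.  */
static uint32_t
toy_usable_funcinfo_len (uint32_t flags, uint32_t funcoff, uint32_t next_off)
{
  if (!(flags & TOY_F_NEWFUNCINFO))
    return 0;
  return next_off - funcoff;
}

/* Old reader that only understands compression: any other bit makes the
   dict unreadable, which is how new-format dicts get refused.  */
#define TOY_OLD_F_MAX TOY_F_COMPRESS

static int
toy_old_reader_accepts (uint32_t flags)
{
  return (flags & ~(uint32_t) TOY_OLD_F_MAX) == 0;
}
```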
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
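The difference between the two addition calls is essentially a kind check in front of two separate name -> type tables. The following toy model shows that split. Every name in it is hypothetical. The type's kind is passed in explicitly, whereas libctf would look it up from the type ID. Fixed arrays stand in for ctf_funchash and ctf_objthash.

```c
/* Hypothetical kind codes; only "function" vs. "anything else" matters.  */
enum toy_kind { TOY_K_INTEGER, TOY_K_POINTER, TOY_K_FUNCTION };

#define TOY_MAXSYMS 8

struct toy_symtab
{
  const char *name[TOY_MAXSYMS];
  unsigned long type[TOY_MAXSYMS];
  int n;
};

/* Separate name -> type tables for function and data object symbols.  */
struct toy_symtypes
{
  struct toy_symtab funcs;
  struct toy_symtab objts;
};

static int
toy_symtab_add (struct toy_symtab *t, const char *name, unsigned long type)
{
  if (t->n == TOY_MAXSYMS)
    return -1;
  t->name[t->n] = name;
  t->type[t->n] = type;
  t->n++;
  return 0;
}

/* Function symbols must refer to a function type.  */
static int
toy_add_func_sym (struct toy_symtypes *d, const char *name,
		  unsigned long type, enum toy_kind kind)
{
  if (kind != TOY_K_FUNCTION)
    return -1;
  return toy_symtab_add (&d->funcs, name, type);
}

/* Data object symbols accept any kind, including function pointers.  */
static int
toy_add_objt_sym (struct toy_symtypes *d, const char *name,
		  unsigned long type)
{
  return toy_symtab_add (&d->objts, name, type);
}
```

Symbol indexes are deliberately absent here. In the design described above, they are attached later, when the linker shuffles symbols.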
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
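That serialization step is a filter: a variable is dropped if the same name is also emitted as a data object symbol. The following is a minimal sketch over plain string arrays. The function names are hypothetical, and libctf performs this on its own hash tables rather than on arrays.

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if NAME is among the data object symbols.  */
static int
toy_is_objt_sym (const char *name, const char *const *syms, size_t nsyms)
{
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp (name, syms[i]) == 0)
      return 1;
  return 0;
}

/* Compact VARS in place, dropping every variable that is also emitted as
   a data object symbol; return the new count.  */
static size_t
toy_drop_symbol_vars (const char **vars, size_t nvars,
		      const char *const *syms, size_t nsyms)
{
  size_t kept = 0;
  for (size_t i = 0; i < nvars; i++)
    if (!toy_is_objt_sym (vars[i], syms, nsyms))
      vars[kept++] = vars[i];
  return kept;
}
```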
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
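The choice between the two layouts is a density decision. An unindexed section spends one slot per symbol, padding untyped symbols with type 0. An indexed section stores only the typed entries plus a name table. The following sketch shows one plausible form of that decision, assuming a pads-per-real-entry ratio. The threshold value and the exact comparison are hypothetical, not libctf's CTF_INDEX_PAD_THRESHOLD logic.

```c
#include <stddef.h>

/* Hypothetical threshold; not the real CTF_INDEX_PAD_THRESHOLD value.  */
#define TOY_PAD_THRESHOLD 3

/* NSYMS: symbols of this kind in the symbol table.
   NTYPED: how many of them have a type in this dict (NTYPED <= NSYMS).
   Return nonzero if the section should be emitted indexed.  */
static int
toy_should_index (size_t nsyms, size_t ntyped)
{
  size_t npads = nsyms - ntyped;

  if (ntyped == 0)
    return 0;			/* Nothing to emit either way.  */
  return npads / ntyped >= TOY_PAD_THRESHOLD;
}
```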
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
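The lookup side of an index section is a sort-once, binary-search-many pattern. The following sketch uses the standard qsort and bsearch over a toy array of name/type pairs. The entry struct and function names are hypothetical. The already_sorted argument stands in for the CTF_F_IDXSORTED header flag, and libctf's real index keeps names and types in separate parallel tables.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of an indexed symbol type section.  */
struct toy_idx_entry
{
  const char *name;
  uint32_t type;
};

static int
toy_entry_cmp (const void *a, const void *b)
{
  const struct toy_idx_entry *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort the index unless the producer already promised sorted order.  */
static void
toy_idx_prepare (struct toy_idx_entry *ents, size_t n, int already_sorted)
{
  if (!already_sorted)
    qsort (ents, n, sizeof (ents[0]), toy_entry_cmp);
}

/* Binary-search the sorted index for NAME; return its type ID, or 0
   (no type) if absent.  */
static uint32_t
toy_idx_lookup (const struct toy_idx_entry *ents, size_t n, const char *name)
{
  struct toy_idx_entry key = { name, 0 };
  const struct toy_idx_entry *hit
    = bsearch (&key, ents, n, sizeof (ents[0]), toy_entry_cmp);
  return hit ? hit->type : 0;
}
```

With this design, a compiler can emit its index in any order and pay nothing, while the one-time sort cost falls on the first consumer that needs lookups.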
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
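An iterator like this keeps its position in a caller-owned state object, so a dict that has both static (opened) and dynamic (added) symbols can be walked in two phases. The later commit in this blame splits ctf_symbol_next in just that way. The following sketch shows the shape of such a two-phase iterator. The state struct loosely mimics a ctf_next_t, but every type and function here is hypothetical.

```c
#include <stddef.h>

/* Hypothetical symbol entry: name plus type ID (0 = pad, no type).  */
struct toy_sym
{
  const char *name;
  unsigned long type;
};

/* Symbols from the opened buffer (static) and added later (dynamic).  */
struct toy_symset
{
  const struct toy_sym *stat;
  size_t nstat;
  const struct toy_sym *dyn;
  size_t ndyn;
};

/* Iterator state; zero-initialize before the first call.  */
struct toy_sym_iter
{
  int phase;			/* 0 = static, 1 = dynamic, 2 = done.  */
  size_t pos;
};

/* Return the next typed symbol, or NULL once both phases are exhausted.
   Static pad entries (type 0) are skipped.  */
static const struct toy_sym *
toy_symbol_next (const struct toy_symset *s, struct toy_sym_iter *it)
{
  while (it->phase == 0)
    {
      if (it->pos >= s->nstat)
	{
	  it->phase = 1;
	  it->pos = 0;
	  break;
	}
      const struct toy_sym *sym = &s->stat[it->pos++];
      if (sym->type != 0)
	return sym;
    }
  if (it->phase == 1)
    {
      if (it->pos < s->ndyn)
	return &s->dyn[it->pos++];
      it->phase = 2;
    }
  return NULL;
}
```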
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
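Resolving a name that is known only by its offset into the external string table reduces to a bounds-checked pointer into a buffer of NUL-terminated strings. The following is a minimal sketch of that step. The struct and function are hypothetical, and libctf's own path goes through ctf_strptr and its string-atom tables rather than through this helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of an ELF-style external string table.  */
struct toy_ext_strtab
{
  const char *data;
  size_t len;
};

/* Resolve OFF to a name.  Returns NULL if the offset is out of range or
   the string runs off the end of the table without a terminator.  */
static const char *
toy_ext_strptr (const struct toy_ext_strtab *st, size_t off)
{
  if (off >= st->len)
    return NULL;
  if (memchr (st->data + off, '\0', st->len - off) == NULL)
    return NULL;
  return st->data + off;
}
```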
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
break;
}
else
{
2023-12-20 00:58:19 +08:00
if (fp->ctf_sxlate[n] >= hp->cth_objtoff
&& fp->ctf_sxlate[n] < hp->cth_funcoff)
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
break;
}
}
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
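The read-only rule above amounts to a range check on type IDs. The toy sketch below is not libctf code: sketch_dict_t, sketch_is_static, sketch_modify_type, sketch_add_type and the status codes are all hypothetical, and parent/child ID offsets are ignored. The only thing taken from this commit is the idea that IDs up to the count of static types (ctf_stypes) are immutable, while anything added afterwards may still be modified.

```c
#include <stdint.h>

/* Hypothetical status codes for this sketch only; libctf reports the
   read-only case as ECTF_RDONLY, whose value is not reproduced here.  */
enum { SKETCH_OK = 0, SKETCH_ERR_RDONLY = -1, SKETCH_ERR_BADID = -2 };

/* Hypothetical, heavily simplified dict.  IDs 1..stypes were present
   when the dict was opened (static); IDs stypes+1..ntypes were added
   afterwards (dynamic).  Parent/child ID offsets are ignored.  */
typedef struct sketch_dict
{
  uint32_t stypes;   /* Count of static types, cf. ctf_stypes.  */
  uint32_t ntypes;   /* Static plus dynamic types.  */
} sketch_dict_t;

/* Nonzero if TYPE was already present when the dict was read in.  */
static int
sketch_is_static (const sketch_dict_t *fp, uint32_t type)
{
  return type != 0 && type <= fp->stypes;
}

/* Stand-in for any modifying call (adding a member, ctf_set_array):
   refused on static types, permitted on dynamic ones.  */
static int
sketch_modify_type (const sketch_dict_t *fp, uint32_t type)
{
  if (type == 0 || type > fp->ntypes)
    return SKETCH_ERR_BADID;
  if (sketch_is_static (fp, type))
    return SKETCH_ERR_RDONLY;
  return SKETCH_OK;
}

/* Adding a type changes nothing that already exists, so it is always
   permitted; the new type takes the next free ID.  */
static uint32_t
sketch_add_type (sketch_dict_t *fp)
{
  return ++fp->ntypes;
}
```

A dict opened with three types refuses changes to type 2, but a fourth type added afterwards can still be modified, and may point at any of the first three.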
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
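The name-collision rule from the second point can be sketched as follows. Everything here is hypothetical (skc_dict_t, skc_add_named, a flat name table instead of libctf's hashtables); it only illustrates the policy described: a name already owned by a type from the ctf_open()ed portion is refused, while repeating a name that only newly-added types use is still allowed, so existing code that did this keeps working.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKC_OK = 0, SKC_ERR_DUPLICATE = -1, SKC_ERR_FULL = -2 };

#define SKC_MAX_NAMES 16

typedef struct skc_name
{
  const char *name;
  uint32_t type;
} skc_name_t;

/* Hypothetical dict: a flat name table plus the static/dynamic split.  */
typedef struct skc_dict
{
  uint32_t stypes;        /* IDs <= stypes came from ctf_open().  */
  uint32_t ntypes;
  size_t nnames;
  skc_name_t names[SKC_MAX_NAMES];
} skc_dict_t;

static skc_name_t *
skc_find (skc_dict_t *fp, const char *name)
{
  for (size_t i = 0; i < fp->nnames; i++)
    if (strcmp (fp->names[i].name, name) == 0)
      return &fp->names[i];
  return NULL;
}

/* Add a named type.  A name already owned by a static type is refused;
   a name used only by types added since opening may be reused, and in
   this sketch (an arbitrary choice) the newest type takes the name.  */
static int
skc_add_named (skc_dict_t *fp, const char *name, uint32_t *typep)
{
  skc_name_t *existing = skc_find (fp, name);

  if (existing && existing->type <= fp->stypes)
    return SKC_ERR_DUPLICATE;
  if (!existing && fp->nnames == SKC_MAX_NAMES)
    return SKC_ERR_FULL;

  *typep = ++fp->ntypes;
  if (existing)
    existing->type = *typep;
  else
    {
      fp->names[fp->nnames].name = name;
      fp->names[fp->nnames++].type = *typep;
    }
  return SKC_OK;
}
```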
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
if (i->ctn_n - dyn_els >= fp->ctf_nsyms)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
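A reader could validate the new layout along these lines. This is a sketch under stated assumptions: the kind values are defined locally and only assumed to mirror ctf.h, the kinds[] lookup table is a hypothetical stand-in for real type lookup, and a type ID of 0 is assumed to be a pad for a symbol with no type. Note that two symbols may legitimately share one function type ID, which is exactly the deduplication the new format allows.

```c
#include <stddef.h>
#include <stdint.h>

/* Kind values for this sketch; assumed to mirror ctf.h but defined
   locally so the example stands alone.  */
enum { SKF_KIND_UNKNOWN = 0, SKF_KIND_INTEGER = 1, SKF_KIND_POINTER = 3,
       SKF_KIND_FUNCTION = 5 };

/* Check a function info section laid out as one uint32_t type ID per
   entry, each either 0 (assumed pad) or the ID of a function type.
   KINDS[id] gives the kind of type ID; NKINDS bounds it.  Returns the
   index of the first bad entry, or NENTRIES if every entry is valid.  */
static size_t
skf_check_funcinfo (const uint32_t *entries, size_t nentries,
                    const int *kinds, size_t nkinds)
{
  for (size_t i = 0; i < nentries; i++)
    {
      uint32_t id = entries[i];

      if (id == 0)
        continue;
      if (id >= nkinds || kinds[id] != SKF_KIND_FUNCTION)
        return i;
    }
  return nentries;
}
```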
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
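The flag handling above can be modelled as a three-way decision. The flag values below are assumptions defined locally for the sketch (check ctf.h for the real ones), and the "unknown flag means refuse" branch is an assumed mechanism standing in for how an older libctf, whose mask of known flags lacks the new bit, ends up rejecting such dicts.

```c
#include <stdint.h>

/* Flag values assumed for this sketch; consult ctf.h for the real ones.  */
#define SKH_F_COMPRESS    0x1
#define SKH_F_NEWFUNCINFO 0x2
#define SKH_F_IDXSORTED   0x4

enum skh_open_result
{
  SKH_OPEN_REFUSE,           /* Header carries a flag the reader lacks.  */
  SKH_OPEN_IGNORE_FUNCINFO,  /* Old-style function info: disregard it.  */
  SKH_OPEN_USE_FUNCINFO      /* New-style function info: use it.  */
};

/* Decide how a reader that understands every flag in KNOWN_FLAGS
   treats a dict whose header carries FLAGS.  */
static enum skh_open_result
skh_classify (uint32_t flags, uint32_t known_flags)
{
  if (flags & ~known_flags)
    return SKH_OPEN_REFUSE;
  if (!(flags & SKH_F_NEWFUNCINFO))
    return SKH_OPEN_IGNORE_FUNCINFO;
  return SKH_OPEN_USE_FUNCINFO;
}
```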
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
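The two symbol-addition calls keep separate name -> type tables. The toy model below is hypothetical throughout (sks_dict_t, sks_add_func_sym, sks_add_objt_sym and the error codes are illustrations, not the libctf API) and uses locally assumed kind values. It shows the kind check on function symbols and why keeping function and object symbols apart means an object symbol never turns up in a function lookup.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKS_OK = 0, SKS_ERR_NOTFUNC = -1, SKS_ERR_FULL = -2 };
/* Kind values assumed for this sketch only.  */
enum { SKS_KIND_INTEGER = 1, SKS_KIND_POINTER = 3, SKS_KIND_FUNCTION = 5 };

#define SKS_MAX 8

/* One name -> type table; the sketch keeps two of them, standing in
   for ctf_funchash and ctf_objthash.  */
typedef struct sks_table
{
  size_t n;
  const char *names[SKS_MAX];
  uint32_t types[SKS_MAX];
} sks_table_t;

typedef struct sks_dict
{
  sks_table_t funcs;
  sks_table_t objts;
  const int *kinds;   /* kinds[id] is the kind of type ID id.  */
} sks_dict_t;

static int
sks_insert (sks_table_t *t, const char *name, uint32_t type)
{
  if (t->n == SKS_MAX)
    return SKS_ERR_FULL;
  t->names[t->n] = name;
  t->types[t->n++] = type;
  return SKS_OK;
}

/* Function symbols: only function types are accepted.  */
static int
sks_add_func_sym (sks_dict_t *fp, const char *name, uint32_t type)
{
  if (fp->kinds[type] != SKS_KIND_FUNCTION)
    return SKS_ERR_NOTFUNC;
  return sks_insert (&fp->funcs, name, type);
}

/* Object symbols: any kind, function pointers included.  */
static int
sks_add_objt_sym (sks_dict_t *fp, const char *name, uint32_t type)
{
  return sks_insert (&fp->objts, name, type);
}

/* Look NAME up in one table only.  Returns 0 if absent.  */
static uint32_t
sks_lookup (const sks_table_t *t, const char *name)
{
  for (size_t i = 0; i < t->n; i++)
    if (strcmp (t->names[i], name) == 0)
      return t->types[i];
  return 0;
}
```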
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
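The serialization-time pruning can be pictured as an in-place filter. This is only a sketch of the rule (the function name skv_prune_vars is hypothetical, and it uses a quadratic linear search where real code would use a hashtable): any variable whose name is also a data-object symbol is dropped, because the data object section already records its type.

```c
#include <stddef.h>
#include <string.h>

/* Remove from VARS, in place, every variable name that also appears
   among OBJT_SYMS (the data-object symbols); return the new count.  */
static size_t
skv_prune_vars (const char **vars, size_t nvars,
                const char **objt_syms, size_t nobjt)
{
  size_t kept = 0;

  for (size_t i = 0; i < nvars; i++)
    {
      int is_sym = 0;

      for (size_t j = 0; j < nobjt && !is_sym; j++)
        is_sym = strcmp (vars[i], objt_syms[j]) == 0;
      if (!is_sym)
        vars[kept++] = vars[i];
    }
  return kept;
}
```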
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
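The indexed-versus-unindexed choice can be illustrated with a size comparison. The threshold value and the exact test below are assumptions made for this sketch (the real CTF_INDEX_PAD_THRESHOLD and the logic in symtypetab_density may differ), as is the assumption that each index entry costs one uint32_t name reference alongside its uint32_t type.

```c
#include <stddef.h>

/* Hypothetical threshold: go indexed once there are more than this
   many pads per typed entry.  Not the real libctf value.  */
#define SKP_PAD_THRESHOLD 3

/* NSYMS slots in the symbol table, NTYPED of which have a type in this
   dict; every other slot would need a pad in an unindexed section.
   Assumes NTYPED <= NSYMS.  */
static int
skp_use_index (size_t nsyms, size_t ntyped)
{
  size_t npads = nsyms - ntyped;

  return npads > ntyped * SKP_PAD_THRESHOLD;
}

/* Unindexed: one uint32_t per symbol slot, pads included.  */
static size_t
skp_unindexed_size (size_t nsyms)
{
  return nsyms * 4;
}

/* Indexed (assumed layout): one uint32_t type in the section plus one
   uint32_t name reference in the index, per typed symbol only.  */
static size_t
skp_indexed_size (size_t ntyped)
{
  return ntyped * 4 * 2;
}
```

With 100 symbols of which only 10 are typed here, the index wins easily (80 bytes against 400); with 90 typed, the padded layout is smaller and needs no name lookups.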
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
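Sorting an unsorted index once and then binary-searching it by name might look like this. The flag value is assumed and defined locally, and the entry layout (name and type side by side, so that sorting keeps them in step) is a simplification of the real separate index and type sections; the names ski_prepare and ski_lookup are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKI_F_IDXSORTED 0x4   /* Assumed value, local to this sketch.  */

/* One entry of an indexed symtypetab, name and type kept together so
   that sorting moves them in step.  */
typedef struct ski_entry
{
  const char *name;
  uint32_t type;
} ski_entry_t;

static int
ski_cmp (const void *a, const void *b)
{
  return strcmp (((const ski_entry_t *) a)->name,
                 ((const ski_entry_t *) b)->name);
}

/* Sort the index once if the producer did not, then record that the
   in-memory copy is now sorted.  */
static void
ski_prepare (ski_entry_t *ents, size_t n, uint32_t *flags)
{
  if (!(*flags & SKI_F_IDXSORTED))
    {
      qsort (ents, n, sizeof (ski_entry_t), ski_cmp);
      *flags |= SKI_F_IDXSORTED;
    }
}

/* Binary-search a sorted index by symbol name; 0 means "no type".  */
static uint32_t
ski_lookup (const ski_entry_t *ents, size_t n, const char *name)
{
  ski_entry_t key = { name, 0 };
  const ski_entry_t *hit = bsearch (&key, ents, n, sizeof (ski_entry_t),
                                    ski_cmp);

  return hit ? hit->type : 0;
}
```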
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
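An iterator over both dynamically-added and static entries can share one counter, with static entries numbered after the dynamic ones; the `i->ctn_n - dyn_els` arithmetic in this file points at that kind of scheme. The sketch below is hypothetical (skn_next_t and skn_dict_t are not libctf types), assumes dynamic entries are yielded first, and assumes a type of 0 marks a pad that should be skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state, standing in for ctf_next_t.  */
typedef struct skn_next
{
  size_t n;
} skn_next_t;

typedef struct skn_dict
{
  size_t dyn_els;             /* Symbols added since the dict was opened.  */
  const char **dyn_names;
  const uint32_t *dyn_types;
  size_t nsyms;               /* Slots in the static symtypetab.  */
  const char **sym_names;     /* Name of each symbol-table slot.  */
  const uint32_t *sym_types;  /* 0 marks a pad.  */
} skn_dict_t;

/* Return the next typed symbol and set *NAME, dynamic entries first,
   then static ones; return 0 once both are exhausted.  */
static uint32_t
skn_symbol_next (const skn_dict_t *fp, skn_next_t *i, const char **name)
{
  if (i->n < fp->dyn_els)
    {
      *name = fp->dyn_names[i->n];
      return fp->dyn_types[i->n++];
    }

  /* Static entries follow the dynamic ones, so subtract dyn_els to
     get a symbol-table index.  */
  for (; i->n - fp->dyn_els < fp->nsyms; i->n++)
    {
      size_t symidx = i->n - fp->dyn_els;

      if (fp->sym_types[symidx] == 0)
        continue;                 /* Skip pads.  */
      *name = fp->sym_names[symidx];
      i->n++;
      return fp->sym_types[symidx];
    }
  return 0;
}
```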
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
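Resolving external string offsets before the external strtab exists can be sketched as a lookup in recorded offset -> string pairs. The names are hypothetical, and the use of bit 31 of a name offset to select the external table is an assumption of this sketch (it matches my understanding of ctf.h, but verify against CTF_NAME_STID there).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed: bit 31 of a name offset selects the external string table;
   the low 31 bits are the offset within it.  */
#define SKX_EXTERNAL_BIT 0x80000000u

typedef struct skx_ext
{
  uint32_t offset;     /* Offset in the not-yet-written external strtab.  */
  const char *str;
} skx_ext_t;

typedef struct skx_strs
{
  const char *internal;   /* This dict's own string table.  */
  size_t ninternal;
  const skx_ext_t *ext;   /* Mappings learned from the linker's strtab.  */
  size_t n_ext;
} skx_strs_t;

/* Resolve NAME to a string, consulting the recorded external mappings
   even though no external strtab has been written yet.  NULL if
   unknown.  */
static const char *
skx_strptr (const skx_strs_t *s, uint32_t name)
{
  if (!(name & SKX_EXTERNAL_BIT))
    return name < s->ninternal ? s->internal + name : NULL;

  uint32_t off = name & ~SKX_EXTERNAL_BIT;
  for (size_t i = 0; i < s->n_ext; i++)
    if (s->ext[i].offset == off)
      return s->ext[i].str;
  return NULL;
}
```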
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
goto end;
2023-12-20 00:58:19 +08:00
*name = ctf_lookup_symbol_name (fp, i->ctn_n - dyn_els);
i->ctn_n++;
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
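A simplified sketch of resolving linker symbols that arrive named only by strtab offsets, assuming an ELF-style string table (NUL-terminated strings, offset 0 being the empty string). `inflight_sym_t`, `strtab_ptr` and `resolve_names` are hypothetical illustrations of what ctf_link_shuffle_syms has to do, not libctf's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight linker symbol, named only by an offset into the
   external string table until that table is known. */
typedef struct
{
  uint32_t name_off;
  uint32_t symidx;
  const char *name;		/* Filled in once the strtab is known. */
} inflight_sym_t;

/* Resolve OFF against a SIZE-byte strtab of NUL-terminated strings.
   Returns NULL for an out-of-range offset. */
static const char *
strtab_ptr (const char *strtab, size_t size, uint32_t off)
{
  if (off >= size)
    return NULL;
  return strtab + off;
}

/* Give every in-flight symbol its name; returns how many resolved. */
static size_t
resolve_names (inflight_sym_t *syms, size_t n, const char *strtab,
	       size_t size)
{
  size_t ok = 0;
  assert (strtab != NULL && size > 0 && strtab[size - 1] == '\0');
  for (size_t i = 0; i < n; i++)
    {
      syms[i].name = strtab_ptr (strtab, size, syms[i].name_off);
      if (syms[i].name != NULL)
	ok++;
    }
  return ok;
}
```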
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
}
return sym;
end:
ctf_next_destroy (i);
*it = NULL;
2023-09-13 17:02:36 +08:00
return (ctf_set_typed_errno (fp, ECTF_NEXT_END));
2019-04-24 18:15:33 +08:00
}
2020-11-20 21:34:04 +08:00
/* A bsearch function for function and object index names. */
static int
ctf_lookup_idx_name (const void *key_, const void *idx_)
libctf, next: introduce new class of easier-to-use iterators
The libctf machinery currently only provides one way to iterate over its
data structures: ctf_*_iter functions that take a callback and an arg
and repeatedly call it.
This *works*, but if you are doing a lot of iteration it is really quite
inconvenient: you have to package up your local variables into
structures over and over again and spawn lots of little functions even
if it would be clearer in a single run of code. Look at ctf-string.c
for an extreme example of how unreadable this can get, with
three-line-long functions proliferating wildly.
The deduplicator takes this to the Nth level. It iterates over a whole
bunch of things: if we'd had to use _iter-class iterators for all of
them there would be twenty additional functions in the deduplicator
alone, for no other reason than that the iterator API requires it.
Let's do something better. strtok_r gives us half the design: generators
in a number of other languages give us the other half.
The *_next API allows you to iterate over CTF-like entities in a single
function using a normal while loop. e.g. here we are iterating over all
the types in a dict:
ctf_next_t *i = NULL;
int hidden;
ctf_id_t id;
while ((id = ctf_type_next (fp, &i, &hidden, 1)) != CTF_ERR)
{
/* do something with 'hidden' and 'id' */
}
if (ctf_errno (fp) != ECTF_NEXT_END)
{
/* iteration error */
}
Here we are walking through the members of a struct with CTF ID
'struct_type':
ctf_next_t *i = NULL;
ssize_t offset;
const char *name;
ctf_id_t membtype;
while ((offset = ctf_member_next (fp, struct_type, &i, &name,
&membtype)) >= 0)
{
/* do something with offset, name, and membtype */
}
if (ctf_errno (fp) != ECTF_NEXT_END)
{
/* iteration error */
}
Like every other while loop, this means you have access to all the local
variables outside the loop while inside it, with no need to tiresomely
package things up in structures, move the body of the loop into a
separate function, etc, as you would with an iterator taking a callback.
ctf_*_next allocates 'i' for you on first entry (when it must be NULL),
and frees and NULLs it and returns a _next-dependent flag value when the
iteration is over: the fp errno is set to ECTF_NEXT_END when the
iteration ends normally. If you want to exit early, call
ctf_next_destroy on the iterator. You can copy iterators using
ctf_next_copy, which copies their current iteration position so you can
remember loop positions and go back to them later (or ctf_next_destroy
them if you don't need them after all).
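The iterator lifecycle described above (allocate on first call, free and NULL at the end, destroy for early exit, copy to remember a position) can be modelled in a few lines. This is a simplified sketch over a plain int array that returns a status code rather than setting an errno on a dict; `sketch_next_t`, `int_array_next`, `sketch_next_destroy`, `sketch_next_copy` and the `SKETCH_*` codes are hypothetical stand-ins for ctf_next_t, a ctf_*_next function, ctf_next_destroy, ctf_next_copy and ECTF_NEXT_END. Wrong-function and wrong-dict checks are not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes; SKETCH_NEXT_END plays ECTF_NEXT_END's role. */
enum { SKETCH_OK = 0, SKETCH_NEXT_END = -1, SKETCH_NOMEM = -2 };

/* Hypothetical iterator state, standing in for ctf_next_t. */
typedef struct
{
  size_t pos;
} sketch_next_t;

/* Yield the next element of ARR via *OUT.  *IT must be NULL on the first
   call: it is allocated then, and freed and reset to NULL at the end. */
static int
int_array_next (const int *arr, size_t n, sketch_next_t **it, int *out)
{
  assert (it != NULL);
  if (*it == NULL && (*it = calloc (1, sizeof (**it))) == NULL)
    return SKETCH_NOMEM;
  if ((*it)->pos >= n)
    {
      free (*it);
      *it = NULL;
      return SKETCH_NEXT_END;
    }
  *out = arr[(*it)->pos++];
  return SKETCH_OK;
}

/* Early exit: release the iterator, as ctf_next_destroy does. */
static void
sketch_next_destroy (sketch_next_t **it)
{
  free (*it);
  *it = NULL;
}

/* Remember the current position so iteration can resume from it later. */
static sketch_next_t *
sketch_next_copy (const sketch_next_t *it)
{
  sketch_next_t *copy = malloc (sizeof (*copy));
  if (copy != NULL)
    *copy = *it;
  return copy;
}
```

Keeping all loop state in one small heap object is what lets callers write an ordinary while loop instead of packaging locals into a callback argument.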
Each _next function returns an always-likely-to-be-useful property of
the thing being iterated over, and takes pointers to parameters for the
others: with very few exceptions all those parameters can be NULLs if
you're not interested in them, so e.g. you can iterate over only the
offsets of members of a structure this way:
while ((offset = ctf_member_next (fp, struct_id, &i, NULL, NULL)) >= 0)
If you pass an iterator in use by one iteration function to another one,
you get the new error ECTF_NEXT_WRONGFUN back; if you try to change
ctf_file_t in mid-iteration, you get ECTF_NEXT_WRONGFP back.
Internally the ctf_next_t remembers the iteration function in use,
various sizes and increments useful for almost all iterations, then
uses unions to overlap the actual entities being iterated over to keep
ctf_next_t size down.
Iterators available in the public API so far (all tested in actual use
in the deduplicator):
/* Iterate over the members of a STRUCT or UNION, returning each member's
offset and optionally name and member type in turn. On end-of-iteration,
returns -1. */
ssize_t
ctf_member_next (ctf_file_t *fp, ctf_id_t type, ctf_next_t **it,
const char **name, ctf_id_t *membtype);
/* Iterate over the members of an enum TYPE, returning each enumerand's
NAME or NULL at end of iteration or error, and optionally passing
back the enumerand's integer VALue. */
const char *
ctf_enum_next (ctf_file_t *fp, ctf_id_t type, ctf_next_t **it,
int *val);
/* Iterate over every type in the given CTF container (not including
parents), optionally including non-user-visible types, returning
each type ID and optionally the hidden flag in turn. Returns CTF_ERR
on end of iteration or error. */
ctf_id_t
ctf_type_next (ctf_file_t *fp, ctf_next_t **it, int *flag,
int want_hidden);
/* Iterate over every variable in the given CTF container, in arbitrary
order, returning the name and type of each variable in turn. The
NAME argument is not optional. Returns CTF_ERR on end of iteration
or error. */
ctf_id_t
ctf_variable_next (ctf_file_t *fp, ctf_next_t **it, const char **name);
/* Iterate over all CTF files in an archive, returning each dict in turn as a
ctf_file_t, and NULL on error or end of iteration. It is the caller's
responsibility to close it. Parent dicts may be skipped. Regardless of
whether they are skipped or not, the caller must ctf_import the parent if
need be. */
ctf_file_t *
ctf_archive_next (const ctf_archive_t *wrapper, ctf_next_t **it,
const char **name, int skip_parent, int *errp);
ctf_label_next is prototyped but not implemented yet.
include/
* ctf-api.h (ECTF_NEXT_END): New error.
(ECTF_NEXT_WRONGFUN): Likewise.
(ECTF_NEXT_WRONGFP): Likewise.
(ECTF_NERR): Adjust.
(ctf_next_t): New.
(ctf_next_create): New prototype.
(ctf_next_destroy): Likewise.
(ctf_next_copy): Likewise.
(ctf_member_next): Likewise.
(ctf_enum_next): Likewise.
(ctf_type_next): Likewise.
(ctf_label_next): Likewise.
(ctf_variable_next): Likewise.
libctf/
* ctf-impl.h (ctf_next): New.
(ctf_get_dict): New prototype.
* ctf-lookup.c (ctf_get_dict): New, split out of...
(ctf_lookup_by_id): ... here.
* ctf-util.c (ctf_next_create): New.
(ctf_next_destroy): New.
(ctf_next_copy): New.
* ctf-types.c (includes): Add <assert.h>.
(ctf_member_next): New.
(ctf_enum_next): New.
(ctf_type_iter): Document the lack of iteration over parent
types.
(ctf_type_next): New.
(ctf_variable_next): New.
* ctf-archive.c (ctf_archive_next): New.
* libctf.ver: Add new public functions.
2020-06-03 22:13:24 +08:00
{
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
const ctf_lookup_idx_key_t *key = key_;
const uint32_t *idx = idx_;
libctf, next: introduce new class of easier-to-use iterators
The libctf machinery currently only provides one way to iterate over its
data structures: ctf_*_iter functions that take a callback and an arg
and repeatedly call it.
This *works*, but if you are doing a lot of iteration it is really quite
inconvenient: you have to package up your local variables into
structures over and over again and spawn lots of little functions even
if it would be clearer in a single run of code. Look at ctf-string.c
for an extreme example of how unreadable this can get, with
three-line-long functions proliferating wildly.
The deduplicator takes this to the Nth level. It iterates over a whole
bunch of things: if we'd had to use _iter-class iterators for all of
them there would be twenty additional functions in the deduplicator
alone, for no other reason than that the iterator API requires it.
Let's do something better. strtok_r gives us half the design: generators
in a number of other languages give us the other half.
The *_next API allows you to iterate over CTF-like entities in a single
function using a normal while loop. e.g. here we are iterating over all
the types in a dict:
ctf_next_t *i = NULL;
int hidden;
ctf_id_t id;
while ((id = ctf_type_next (fp, &i, &hidden, 1)) != CTF_ERR)
{
/* do something with 'hidden' and 'id' */
}
if (ctf_errno (fp) != ECTF_NEXT_END)
/* iteration error */
Here we are walking through the members of a struct with CTF ID
'struct_type':
ctf_next_t *i = NULL;
ssize_t offset;
const char *name;
ctf_id_t membtype;
while ((offset = ctf_member_next (fp, struct_type, &i, &name,
&membtype)) >= 0)
{
/* do something with offset, name, and membtype */
}
if (ctf_errno (fp) != ECTF_NEXT_END)
/* iteration error */
Like every other while loop, this means you have access to all the local
variables outside the loop while inside it, with no need to tiresomely
package things up in structures, move the body of the loop into a
separate function, etc, as you would with an iterator taking a callback.
ctf_*_next allocates 'i' for you on first entry (when it must be NULL),
and frees and NULLs it and returns a _next-dependent flag value when the
iteration is over: the fp errno is set to ECTF_NEXT_END when the
iteration ends normally. If you want to exit early, call
ctf_next_destroy on the iterator. You can copy iterators using
ctf_next_copy, which copies their current iteration position so you can
remember loop positions and go back to them later (or ctf_next_destroy
them if you don't need them after all).
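A minimal sketch of the allocate-on-first-call, free-at-end protocol described above, in plain C. It is not libctf code: sketch_next_t, int_array_next, sum_all and SKETCH_NEXT_END are hypothetical stand-ins for ctf_next_t, a ctf_*_next function and ECTF_NEXT_END, and the thing iterated over is a plain int array rather than a CTF dict.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for ECTF_NEXT_END: "iteration ended normally".  */
#define SKETCH_NEXT_END 1000

/* Hypothetical stand-in for ctf_next_t: what we iterate over, and where.  */
typedef struct sketch_next
{
  const int *arr;               /* Entity being iterated over.  */
  size_t pos;                   /* Index of the next element to return.  */
} sketch_next_t;

/* Return 0 and set *VAL to the next element of ARR, or return -1 with
   *ERRP set: SKETCH_NEXT_END at the normal end, else an errno value.
   The iterator is allocated on the first call (*IT must be NULL) and is
   freed and NULLed at the normal end.  */
static int
int_array_next (const int *arr, size_t n, sketch_next_t **it, int *val,
                int *errp)
{
  sketch_next_t *i = *it;

  if (i == NULL)
    {
      if ((i = malloc (sizeof (*i))) == NULL)
        {
          *errp = ENOMEM;
          return -1;
        }
      i->arr = arr;
      i->pos = 0;
      *it = i;
    }
  else if (i->arr != arr)
    {
      /* Like ECTF_NEXT_WRONGFP: iterator reused on something else.  */
      *errp = EINVAL;
      return -1;
    }

  if (i->pos >= n)
    {
      free (i);
      *it = NULL;
      *errp = SKETCH_NEXT_END;
      return -1;
    }

  if (val != NULL)              /* Optional output, as in the _next API.  */
    *val = arr[i->pos];
  i->pos++;
  return 0;
}

/* Usage: an ordinary while loop.  'sum' is a plain local: no callback
   argument structure is needed.  Returns -1 on iteration error.  */
static long
sum_all (const int *arr, size_t n)
{
  sketch_next_t *i = NULL;
  int v = 0, err = 0;
  long sum = 0;

  while (int_array_next (arr, n, &i, &v, &err) == 0)
    sum += v;

  if (err != SKETCH_NEXT_END)
    return -1;
  return sum;
}
```

An iterator abandoned early (or rejected, as in the EINVAL case here) is the caller's to free; in libctf that is the job of ctf_next_destroy.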
Each _next function returns an always-likely-to-be-useful property of
the thing being iterated over, and takes pointers to parameters for the
others: with very few exceptions all those parameters can be NULL if
you're not interested in them, so e.g. you can iterate over only the
offsets of members of a structure this way:
while ((offset = ctf_member_next (fp, struct_id, &i, NULL, NULL)) >= 0)
If you pass an iterator in use by one iteration function to another one,
you get the new error ECTF_NEXT_WRONGFUN back; if you try to change
ctf_file_t in mid-iteration, you get ECTF_NEXT_WRONGFP back.
Internally the ctf_next_t remembers the iteration function in use,
various sizes and increments useful for almost all iterations, then
uses unions to overlap the actual entities being iterated over to keep
ctf_next_t size down.
Iterators available in the public API so far (all tested in actual use
in the deduplicator):
/* Iterate over the members of a STRUCT or UNION, returning each member's
offset and optionally name and member type in turn. On end-of-iteration,
returns -1. */
ssize_t
ctf_member_next (ctf_file_t *fp, ctf_id_t type, ctf_next_t **it,
const char **name, ctf_id_t *membtype);
/* Iterate over the members of an enum TYPE, returning each enumerand's
NAME or NULL at end of iteration or error, and optionally passing
back the enumerand's integer VALue. */
const char *
ctf_enum_next (ctf_file_t *fp, ctf_id_t type, ctf_next_t **it,
int *val);
/* Iterate over every type in the given CTF container (not including
parents), optionally including non-user-visible types, returning
each type ID and optionally the hidden flag in turn. Returns CTF_ERR
on end of iteration or error. */
ctf_id_t
ctf_type_next (ctf_file_t *fp, ctf_next_t **it, int *flag,
int want_hidden);
/* Iterate over every variable in the given CTF container, in arbitrary
order, returning the name and type of each variable in turn. The
NAME argument is not optional. Returns CTF_ERR on end of iteration
or error. */
ctf_id_t
ctf_variable_next (ctf_file_t *fp, ctf_next_t **it, const char **name);
/* Iterate over all CTF files in an archive, returning each dict in turn as a
ctf_file_t, and NULL on error or end of iteration. It is the caller's
responsibility to close it. Parent dicts may be skipped. Regardless of
whether they are skipped or not, the caller must ctf_import the parent if
need be. */
ctf_file_t *
ctf_archive_next (const ctf_archive_t *wrapper, ctf_next_t **it,
const char **name, int skip_parent, int *errp);
ctf_label_next is prototyped but not implemented yet.
include/
* ctf-api.h (ECTF_NEXT_END): New error.
(ECTF_NEXT_WRONGFUN): Likewise.
(ECTF_NEXT_WRONGFP): Likewise.
(ECTF_NERR): Adjust.
(ctf_next_t): New.
(ctf_next_create): New prototype.
(ctf_next_destroy): Likewise.
(ctf_next_copy): Likewise.
(ctf_member_next): Likewise.
(ctf_enum_next): Likewise.
(ctf_type_next): Likewise.
(ctf_label_next): Likewise.
(ctf_variable_next): Likewise.
libctf/
* ctf-impl.h (ctf_next): New.
(ctf_get_dict): New prototype.
* ctf-lookup.c (ctf_get_dict): New, split out of...
(ctf_lookup_by_id): ... here.
* ctf-util.c (ctf_next_create): New.
(ctf_next_destroy): New.
(ctf_next_copy): New.
* ctf-types.c (includes): Add <assert.h>.
(ctf_member_next): New.
(ctf_enum_next): New.
(ctf_type_iter): Document the lack of iteration over parent
types.
(ctf_type_next): New.
(ctf_variable_next): New.
* ctf-archive.c (ctf_archive_next): New.
* libctf.ver: Add new public functions.
2020-06-03 22:13:24 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
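To make the unindexed layout concrete, here is a rough sketch, assuming (as the lookup comment in this file suggests) that a pad is type ID 0. The name -> type records stand in for what ctf_add_func_sym and ctf_add_objt_sym store, and the symbol-name array stands in for the symtab order the linker reports via ctf_link_shuffle_syms. All names here are hypothetical, SKETCH_PAD_THRESHOLD is a made-up value, and the real decision in symtypetab_density may weigh other factors.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value, not the real CTF_INDEX_PAD_THRESHOLD.  */
#define SKETCH_PAD_THRESHOLD 3

/* Hypothetical name -> type record, as ctf_add_func_sym or
   ctf_add_objt_sym might store it.  */
typedef struct name_type
{
  const char *name;
  uint32_t type;
} name_type_t;

/* Build an unindexed symtypetab: one uint32_t per symbol in symbol-table
   order, holding the symbol's type ID or 0 (a pad) if it has none.
   *NPADS receives the number of pads.  Caller frees.  */
static uint32_t *
emit_unindexed (const char *const *symnames, size_t nsyms,
                const name_type_t *map, size_t nmap, size_t *npads)
{
  /* calloc zero-fills, so every slot starts out as a pad.  */
  uint32_t *tab = calloc (nsyms ? nsyms : 1, sizeof (uint32_t));

  if (tab == NULL)
    return NULL;

  *npads = 0;
  for (size_t s = 0; s < nsyms; s++)
    {
      for (size_t m = 0; m < nmap; m++)
        if (strcmp (symnames[s], map[m].name) == 0)
          {
            tab[s] = map[m].type;
            break;
          }
      if (tab[s] == 0)
        (*npads)++;
    }
  return tab;
}

/* With this many pads, prefer a name-sorted section plus an index.  */
static int
want_index (size_t npads)
{
  return npads > SKETCH_PAD_THRESHOLD;
}

/* Reading an unindexed section back is a plain array access.  */
static uint32_t
lookup_unindexed (const uint32_t *tab, size_t nsyms, size_t symidx)
{
  return symidx < nsyms ? tab[symidx] : 0;
}
```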
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
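The indexed layout can be sketched the same way. The comparator mirrors the bsearch callback fragment in this file, which compares key->clik_name against ctf_strptr (key->clik_fp, key->clik_names[*idx]), but it looks names up in a plain array of C strings rather than a CTF string table. sort_index_by_name plays the role of sorting an unsorted, compiler-generated index. The names idx_key_t, sort_index_by_name and lookup_indexed are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key, loosely modelled on ctf_lookup_idx_key_t: the name
   sought plus the name table that index entries refer to.  */
typedef struct idx_key
{
  const char *name;
  const char *const *names;
} idx_key_t;

/* qsort has no context argument, so this sketch uses a file-scope
   pointer (not reentrant).  */
static const char *const *sort_names;

static int
cmp_by_name (const void *a, const void *b)
{
  uint32_t ia = *(const uint32_t *) a, ib = *(const uint32_t *) b;
  return strcmp (sort_names[ia], sort_names[ib]);
}

/* Sort an unsorted index: return positions 0..n-1 ordered by name.
   Caller frees.  */
static uint32_t *
sort_index_by_name (const char *const *names, size_t n)
{
  uint32_t *perm = malloc ((n ? n : 1) * sizeof (uint32_t));

  if (perm == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    perm[i] = (uint32_t) i;
  sort_names = names;
  qsort (perm, n, sizeof (uint32_t), cmp_by_name);
  return perm;
}

/* bsearch comparator: the sought name versus the name an index entry
   refers to.  */
static int
lookup_idx_name (const void *key_, const void *idx_)
{
  const idx_key_t *key = key_;
  const uint32_t *idx = idx_;

  return strcmp (key->name, key->names[*idx]);
}

/* Return the type of symbol NAME, or 0 if it is not in the index.  */
static uint32_t
lookup_indexed (const char *name, const char *const *names,
                const uint32_t *types, const uint32_t *sorted, size_t n)
{
  idx_key_t key = { name, names };
  const uint32_t *hit = bsearch (&key, sorted, n, sizeof (uint32_t),
                                 lookup_idx_name);

  return hit ? types[*hit] : 0;
}
```

The file-scope pointer is only a simplification for this sketch; real code would want a reentrant sort.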
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
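A small sketch of what consulting external strings before writeout amounts to: the strtab callback supplies offset -> string pairs, and symbols handed over by offset get their names resolved against those pairs. ext_str_t, inflight_sym_t, ext_strptr and resolve_inflight are hypothetical, and a linear search stands in for libctf's real string tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical external string: the linker says STR will live at OFFSET
   in its (not yet written) strtab.  */
typedef struct ext_str
{
  uint32_t offset;
  const char *str;
} ext_str_t;

/* Hypothetical in-flight linker symbol, named only by a strtab offset.  */
typedef struct inflight_sym
{
  uint32_t name_offset;
  const char *name;             /* Filled in once resolvable.  */
} inflight_sym_t;

/* Resolve an external offset to its string, or NULL if unknown.  */
static const char *
ext_strptr (const ext_str_t *ext, size_t n, uint32_t offset)
{
  for (size_t i = 0; i < n; i++)
    if (ext[i].offset == offset)
      return ext[i].str;
  return NULL;
}

/* Give every in-flight symbol its name.  Returns the number of symbols
   whose names could not be resolved.  */
static size_t
resolve_inflight (inflight_sym_t *syms, size_t nsyms,
                  const ext_str_t *ext, size_t n)
{
  size_t unresolved = 0;

  for (size_t i = 0; i < nsyms; i++)
    if ((syms[i].name = ext_strptr (ext, n, syms[i].name_offset)) == NULL)
      unresolved++;
  return unresolved;
}
```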
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
return (strcmp (key->clik_name, ctf_strptr (key->clik_fp, key->clik_names[*idx])));
}
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably cost
more time than using them would save, and would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefits from the previous linear search, and the
symtab only needs to be scanned at most once.
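A rough sketch of the cached linear search, under stated assumptions: symcache_t, lookup_symbol_idx and the fixed-size open-addressing table are hypothetical stand-ins for the ctf_symhash / ctf_symhash_latest pair (libctf has its own hash tables), and the crossdict cache is not modelled. The scans counter exists only to show that repeated lookups never rescan symbols already seen.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed capacity; this toy cache refuses larger symtabs.  */
#define SKETCH_CACHE_SIZE 64

/* Hypothetical stand-in for the symtab plus ctf_symhash and
   ctf_symhash_latest.  */
typedef struct symcache
{
  const char *const *symnames;          /* The "symtab": one name per index.  */
  size_t nsyms;
  const char *keys[SKETCH_CACHE_SIZE];  /* Open-addressing name -> index.  */
  size_t vals[SKETCH_CACHE_SIZE];
  size_t latest;                        /* Symbols [0, latest) are cached.  */
  size_t scans;                         /* Symbols examined by the scan.  */
} symcache_t;

static int
symcache_init (symcache_t *c, const char *const *symnames, size_t nsyms)
{
  if (nsyms >= SKETCH_CACHE_SIZE)
    return -1;                  /* Keeps the table from ever filling.  */
  *c = (symcache_t) { .symnames = symnames, .nsyms = nsyms };
  return 0;
}

static size_t
hash_name (const char *s)
{
  size_t h = 5381;              /* djb2 string hash.  */
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % SKETCH_CACHE_SIZE;
}

static void
cache_put (symcache_t *c, const char *name, size_t idx)
{
  size_t h = hash_name (name);
  while (c->keys[h] != NULL && strcmp (c->keys[h], name) != 0)
    h = (h + 1) % SKETCH_CACHE_SIZE;
  if (c->keys[h] == NULL)       /* First occurrence of a name wins.  */
    {
      c->keys[h] = name;
      c->vals[h] = idx;
    }
}

static int
cache_get (const symcache_t *c, const char *name, size_t *idx)
{
  size_t h = hash_name (name);
  for (; c->keys[h] != NULL; h = (h + 1) % SKETCH_CACHE_SIZE)
    if (strcmp (c->keys[h], name) == 0)
      {
        *idx = c->vals[h];
        return 1;
      }
  return 0;
}

/* Return 1 and set *IDX if NAME is in the symtab, else 0.  Consult the
   cache first; otherwise resume the linear scan where the last one
   stopped, caching every name it passes, so each symbol is scanned at
   most once.  */
static int
lookup_symbol_idx (symcache_t *c, const char *name, size_t *idx)
{
  if (cache_get (c, name, idx))
    return 1;

  while (c->latest < c->nsyms)
    {
      size_t i = c->latest++;

      c->scans++;
      cache_put (c, c->symnames[i], i);
      if (strcmp (c->symnames[i], name) == 0)
        {
          *idx = i;
          return 1;
        }
    }
  return 0;
}
```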
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
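The following is a minimal sketch of the name-to-index search that the ChangeLog attributes to ctf_lookup_symbol_idx. It does a linear scan of the symbol table and caches every name it passes, so repeated lookups hit the cache and the table is walked at most once. Everything here is an assumption made for illustration, not libctf's code:

- the symbol table is a plain array of names;
- the cache is a small fixed-size open-addressing hash table rather than a real libctf hash;
- the names sym_cache and sym_cache_lookup_idx are hypothetical;
- the `latest` resume point is hypothetical too, and may or may not match what ctf_symhash_latest actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cache capacity: a power of two larger than the number of
   symbols, so the open-addressing table always keeps an empty slot.  */
#define SYM_CACHE_CAP 64

typedef struct sym_cache
{
  const char *names[SYM_CACHE_CAP];   /* NULL marks an empty slot.  */
  size_t idxs[SYM_CACHE_CAP];
  size_t latest;                       /* Symbols [0, latest) are cached.  */
} sym_cache;

static uint32_t
hash_name (const char *s)
{
  uint32_t h = 2166136261u;            /* FNV-1a.  */
  for (; *s; s++)
    h = (h ^ (unsigned char) *s) * 16777619u;
  return h;
}

static void
sym_cache_insert (sym_cache *c, const char *name, size_t idx)
{
  size_t slot = hash_name (name) & (SYM_CACHE_CAP - 1);
  while (c->names[slot] != NULL)
    {
      if (strcmp (c->names[slot], name) == 0)
        return;                         /* Keep the first index seen.  */
      slot = (slot + 1) & (SYM_CACHE_CAP - 1);
    }
  c->names[slot] = name;
  c->idxs[slot] = idx;
}

static int
sym_cache_find (const sym_cache *c, const char *name, size_t *idx)
{
  size_t slot = hash_name (name) & (SYM_CACHE_CAP - 1);
  while (c->names[slot] != NULL)
    {
      if (strcmp (c->names[slot], name) == 0)
        {
          *idx = c->idxs[slot];
          return 1;
        }
      slot = (slot + 1) & (SYM_CACHE_CAP - 1);
    }
  return 0;
}

/* Return the index of NAME in SYMTAB (NSYMS entries), or (size_t) -1.
   Cache hits return at once; misses resume the linear scan where the
   previous one stopped, caching every name passed on the way.  */
static size_t
sym_cache_lookup_idx (sym_cache *c, const char *const *symtab,
                      size_t nsyms, const char *name)
{
  size_t idx;

  assert (nsyms < SYM_CACHE_CAP);
  if (sym_cache_find (c, name, &idx))
    return idx;

  while (c->latest < nsyms)
    {
      size_t i = c->latest++;
      sym_cache_insert (c, symtab[i], i);
      if (strcmp (symtab[i], name) == 0)
        return i;
    }
  return (size_t) -1;                   /* Whole table scanned: absent.  */
}
```

A zeroed sym_cache is ready to use. After a miss for an absent name, `latest` equals the symbol count, so any later lookup is answered from the cache alone.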
2021-02-17 23:21:12 +08:00
/* Given a symbol name or (failing that) number, look up that symbol in the
function or object index table (which must exist). Return 0 if not found
there (or pad). */
2019-04-24 05:45:46 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not keep
any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
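The new function info layout is the same flat uint32_t array as the object data section, so an unindexed lookup reduces to an array access plus a kind check. This sketch rests on several assumptions:

- entry i belongs to the i-th symbol (the ChangeLog's sxlate tables suggest the real mapping is more involved);
- the kind codes (SKETCH_K_*) are hypothetical;
- a kinds[] table stands in for real CTF type lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind codes for this sketch only.  */
enum sketch_kind { SKETCH_K_UNKNOWN = 0, SKETCH_K_INTEGER, SKETCH_K_FUNCTION };

/* A symtypetab section viewed as a flat array of uint32_t type IDs,
   one per symbol in symbol order; 0 is a pad (no type known).  */
typedef struct symtypetab
{
  const uint32_t *entries;
  size_t nentries;
} symtypetab;

/* Return the type ID recorded for symbol SYMIDX, or 0 if the symbol is
   out of range or padded.  KINDS maps type ID -> kind (NKINDS entries).
   In a function info section every non-pad entry must be a function.  */
static uint32_t
symtypetab_lookup (const symtypetab *sec, size_t symidx,
                   const enum sketch_kind *kinds, size_t nkinds,
                   int is_function)
{
  uint32_t type;

  assert (sec != NULL);
  if (symidx >= sec->nentries)
    return 0;
  type = sec->entries[symidx];
  if (type == 0 || type >= nkinds)
    return 0;
  if (is_function && kinds[type] != SKETCH_K_FUNCTION)
    return 0;                       /* Malformed: not a function type.  */
  return type;
}
```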
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
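The reader-side flag handling described above can be illustrated as follows. Two things here are assumptions:

- The flag values (SKETCH_F_*) are not the real CTF_F_* definitions.
- The mechanism is assumed. The text only states the behaviour; the idea that an old reader refuses the dict through a generic "unknown header bit" test is a guess.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, for illustration only; see ctf.h for the real
   CTF_F_* definitions.  */
#define SKETCH_F_COMPRESS    0x1u
#define SKETCH_F_NEWFUNCINFO 0x2u
#define SKETCH_F_MAX         (SKETCH_F_COMPRESS | SKETCH_F_NEWFUNCINFO)

enum open_result { OPEN_REJECT, OPEN_IGNORE_FUNCINFO, OPEN_USE_FUNCINFO };

/* A new reader: unknown bits mean the dict cannot be opened at all; a
   missing NEWFUNCINFO bit means the old-format function info section is
   disregarded rather than parsed.  */
static enum open_result
new_reader_classify (uint32_t flags)
{
  if (flags & ~SKETCH_F_MAX)
    return OPEN_REJECT;
  if (!(flags & SKETCH_F_NEWFUNCINFO))
    return OPEN_IGNORE_FUNCINFO;
  return OPEN_USE_FUNCINFO;
}

/* An older reader whose highest known flag is COMPRESS rejects any dict
   that carries NEWFUNCINFO through the same unknown-bit test.  */
static int
old_reader_rejects (uint32_t flags)
{
  return (flags & ~SKETCH_F_COMPRESS) != 0;
}
```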
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
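Symbols are recorded as name -> type mappings and only later placed in symbol order by the linker. Emitting an unindexed section therefore amounts to walking the linker's symbol order and writing a 0 pad wherever no type is known. This is a sketch under simplifying assumptions: a linear array (name_type) stands in for ctf_objthash/ctf_funchash, and emit_unindexed is a hypothetical helper, not libctf's emit_symtypetab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the name -> type hash of a dict.  */
typedef struct name_type
{
  const char *name;
  uint32_t type;
} name_type;

static uint32_t
type_for_name (const name_type *map, size_t nmap, const char *name)
{
  for (size_t i = 0; i < nmap; i++)
    if (strcmp (map[i].name, name) == 0)
      return map[i].type;
  return 0;
}

/* Fill OUT (NSYMS entries) with one type ID per symbol, in the symbol
   order supplied by the linker; symbols without a recorded type get a
   0 pad.  Returns the number of non-pad entries.  */
static size_t
emit_unindexed (const name_type *map, size_t nmap,
                const char *const *syms, size_t nsyms, uint32_t *out)
{
  size_t ntyped = 0;

  assert (out != NULL || nsyms == 0);
  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = type_for_name (map, nmap, syms[i]);
      if (out[i] != 0)
        ntyped++;
    }
  return ntyped;
}
```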
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
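An indexed section pairs a name-sorted index with the type entries. Finding a symbol's type by name is therefore a binary search, roughly the job ctf_try_lookup_indexed performs. In this sketch, which is an illustration rather than libctf's code:

- the index holds plain strings rather than string-table offsets;
- indexed_symtypetab and indexed_lookup are hypothetical names;
- standard C bsearch does the searching.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* NAMES is sorted by strcmp; NAMES[i] is the symbol whose type is
   TYPES[i].  */
typedef struct indexed_symtypetab
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} indexed_symtypetab;

static int
cmp_name_ptrs (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Return the type of SYMNAME, or 0 if it is absent or padded.  */
static uint32_t
indexed_lookup (const indexed_symtypetab *sec, const char *symname)
{
  const char *const *hit;

  assert (sec != NULL && symname != NULL);
  if (sec->n == 0)
    return 0;
  hit = bsearch (&symname, sec->names, sec->n, sizeof (sec->names[0]),
                 cmp_name_ptrs);
  if (hit == NULL)
    return 0;
  return sec->types[hit - sec->names];
}
```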
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
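When the header lacks CTF_F_IDXSORTED, the index must be sorted together with its type entries before a binary search is valid. Below is a sketch using qsort over temporary (name, type) pairs. The flag value SKETCH_F_IDXSORTED and the helper ensure_index_sorted are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_F_IDXSORTED 0x4u   /* Assumed value, illustration only.  */

typedef struct idx_entry
{
  const char *name;
  uint32_t type;
} idx_entry;

static int
cmp_idx_entry (const void *a, const void *b)
{
  const idx_entry *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* If the header does not claim the index is sorted, sort NAMES and TYPES
   together by name.  Returns 0 on success, -1 on allocation failure.  */
static int
ensure_index_sorted (uint32_t hdr_flags, const char **names,
                     uint32_t *types, size_t n)
{
  idx_entry *tmp;

  if ((hdr_flags & SKETCH_F_IDXSORTED) || n < 2)
    return 0;
  tmp = malloc (n * sizeof (*tmp));
  if (tmp == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    tmp[i] = (idx_entry) { names[i], types[i] };
  qsort (tmp, n, sizeof (*tmp), cmp_idx_entry);
  for (size_t i = 0; i < n; i++)
    {
      names[i] = tmp[i].name;
      types[i] = tmp[i].type;
    }
  free (tmp);
  return 0;
}
```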
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
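An iterator in the spirit of ctf_symbol_next only has to walk the stored entries once, skipping pads, which is why it needs no sorting. The sym_iter type and sym_iter_next function are hypothetical. They do not reproduce the real ctf_symbol_next signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: zero-initialize, then call next until it
   returns 0.  */
typedef struct sym_iter
{
  size_t pos;
} sym_iter;

/* Yield the next (name, type) pair with a nonzero type from parallel
   NAMES/TYPES arrays of length N, in stored order.  */
static int
sym_iter_next (sym_iter *it, const char *const *names,
               const uint32_t *types, size_t n,
               const char **name, uint32_t *type)
{
  assert (it != NULL && name != NULL && type != NULL);
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *name = names[i];
          *type = types[i];
          return 1;
        }
    }
  return 0;
}
```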
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
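Resolving a name reference before the external strtab is written means consulting mappings recorded when each external string was added. The sketch makes the following assumptions:

- A name reference's top bit selects the external table. This is similar in spirit to CTF's string-table selector, but the macros here are stand-ins.
- Pending external strings are kept in a small array.
- sketch_strptr is a hypothetical function, not ctf_strptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: top bit selects the external table, the remaining
   31 bits are the offset.  */
#define SKETCH_NAME_EXTERNAL(ref) ((ref) >> 31)
#define SKETCH_NAME_OFFSET(ref)   ((ref) & 0x7fffffffu)

/* An external string recorded at addition time: the offset it will have
   in the external strtab, plus the string itself until writeout.  */
typedef struct ext_str
{
  uint32_t offset;
  const char *str;
} ext_str;

/* Resolve REF: internal references index INTERNAL directly; external
   ones are looked up among the pending mappings.  NULL if unknown.  */
static const char *
sketch_strptr (uint32_t ref, const char *internal, size_t internal_len,
               const ext_str *ext, size_t next)
{
  uint32_t off = SKETCH_NAME_OFFSET (ref);

  if (!SKETCH_NAME_EXTERNAL (ref))
    return off < internal_len ? internal + off : NULL;
  for (size_t i = 0; i < next; i++)
    if (ext[i].offset == off)
      return ext[i].str;
  return NULL;
}
```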
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
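The ChangeLog gives symtypetab_density, together with CTF_INDEX_PAD_THRESHOLD, the job of choosing between an indexed and an unindexed section. That choice can be sketched as a pad-counting heuristic. Only the general rule comes from the description above: pad-heavy sections, or those with CTF_SYMTYPETAB_FORCE_INDEXED, are emitted indexed. The following parts are assumptions:

- the threshold value;
- the exact comparison;
- the size accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CTF_INDEX_PAD_THRESHOLD and
   CTF_SYMTYPETAB_FORCE_INDEXED.  */
#define SKETCH_INDEX_PAD_THRESHOLD 3
#define SKETCH_FORCE_INDEXED 0x1u

/* TYPES[0..NSYMS) is the unindexed form (0 = pad).  Returns 1 for an
   indexed layout (one type word plus one name word per typed symbol) or
   0 for an unindexed one (one type word per symbol); *SIZE receives the
   bytes needed.  */
static int
choose_symtypetab_layout (const uint32_t *types, size_t nsyms,
                          unsigned flags, size_t *size)
{
  size_t ntyped = 0, npads;

  assert (size != NULL);
  for (size_t i = 0; i < nsyms; i++)
    if (types[i] != 0)
      ntyped++;
  npads = nsyms - ntyped;

  if ((flags & SKETCH_FORCE_INDEXED)
      || npads > SKETCH_INDEX_PAD_THRESHOLD * ntyped)
    {
      *size = ntyped * 2 * sizeof (uint32_t);
      return 1;
    }
  *size = nsyms * sizeof (uint32_t);
  return 0;
}
```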
2020-11-20 21:34:04 +08:00
static ctf_id_t
ctf_try_lookup_indexed (ctf_dict_t *fp, unsigned long symidx,
const char *symname, int is_function)
{
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
  struct ctf_header *hp = fp->ctf_header;
  uint32_t *symtypetab;
  uint32_t *names;
  uint32_t *sxlate;
  size_t nidx;
libctf: creation functions
The CTF creation process looks roughly like (error handling elided):
int err;
ctf_file_t *foo = ctf_create (&err);
ctf_id_t type = ctf_add_THING (foo, ...);
ctf_update (foo);
ctf_*write (...);
Some ctf_add_THING functions accept other type IDs as arguments,
depending on the type: cv-quals, pointers, and structure and union
members all take other types as arguments. So do 'slices', which
let you take an existing integral type and recast it as a type
with a different bitness or offset within a byte, for bitfields.
One class of THING is not a type: "variables", which are mappings
of names (in the internal string table) to types. These are mostly
useful when encoding variables that do not appear in a symbol table
but which some external user has some other way to figure out the
address of at runtime (dynamic symbol lookup or querying a VM
interpreter or something).
You can snapshot the creation process at any point: rolling back to a
snapshot deletes all types and variables added since that point.
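If types and variables are only ever appended, snapshot and rollback can be modelled with two counters. This is a toy with hypothetical `toy_` names; libctf's ctf_snapshot_id_t keeps its own bookkeeping, which this does not reproduce.

```c
#include <stddef.h>

#define TOY_MAX 128

/* Hypothetical toy dict: types and variables are only ever appended.  */
struct toy_cdict
{
  int types[TOY_MAX];
  size_t ntypes;
  const char *vars[TOY_MAX];
  size_t nvars;
};

struct toy_snap { size_t ntypes, nvars; };

int toy_add_type (struct toy_cdict *d, int kind)
{
  if (d->ntypes >= TOY_MAX)
    return -1;
  d->types[d->ntypes++] = kind;
  return 0;
}

int toy_add_var (struct toy_cdict *d, const char *name)
{
  if (d->nvars >= TOY_MAX)
    return -1;
  d->vars[d->nvars++] = name;
  return 0;
}

/* A snapshot just remembers how much had been added so far.  */
struct toy_snap toy_take_snapshot (const struct toy_cdict *d)
{
  struct toy_snap s = { d->ntypes, d->nvars };
  return s;
}

/* Discard everything added after S.  Fails if S is newer than the
   dict's current contents (e.g. we already rolled back past it).  */
int toy_rollback (struct toy_cdict *d, struct toy_snap s)
{
  if (s.ntypes > d->ntypes || s.nvars > d->nvars)
    return -1;
  d->ntypes = s.ntypes;
  d->nvars = s.nvars;
  return 0;
}
```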
You can make arbitrary type queries on the CTF container during the
creation process, but you must call ctf_update() first, which
translates the growing dynamic container into a static one (this uses
the CTF opening machinery, added in a later commit), which is quite
expensive. This function must also be called after adding types
and before writing the container out.
Because addition of types involves looking up existing types, we add a
little of the type lookup machinery here, as well: only enough to
look up types in dynamic containers under construction.
libctf/
* ctf-create.c: New file.
* ctf-lookup.c: New file.
include/
* ctf-api.h (zlib.h): New include.
(ctf_sect_t): New.
(ctf_sect_names_t): Likewise.
(ctf_encoding_t): Likewise.
(ctf_membinfo_t): Likewise.
(ctf_arinfo_t): Likewise.
(ctf_funcinfo_t): Likewise.
(ctf_lblinfo_t): Likewise.
(ctf_snapshot_id_t): Likewise.
(CTF_FUNC_VARARG): Likewise.
(ctf_simple_open): Likewise.
(ctf_bufopen): Likewise.
(ctf_create): Likewise.
(ctf_add_array): Likewise.
(ctf_add_const): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_float): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_function): Likewise.
(ctf_add_integer): Likewise.
(ctf_add_slice): Likewise.
(ctf_add_pointer): Likewise.
(ctf_add_type): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_restrict): Likewise.
(ctf_add_struct): Likewise.
(ctf_add_union): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_volatile): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_member_encoded): Likewise.
(ctf_add_variable): Likewise.
(ctf_set_array): Likewise.
(ctf_update): Likewise.
(ctf_snapshot): Likewise.
(ctf_rollback): Likewise.
(ctf_discard): Likewise.
(ctf_write): Likewise.
(ctf_gzwrite): Likewise.
(ctf_compress_write): Likewise.
2019-04-24 05:45:46 +08:00
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably
cost more time than using them saves, and would add a lot of
complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx calls in other
dicts in the same archive benefit from the previous linear search, and
the symtab needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
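The cached linear search described above can be sketched as follows. This is an illustration under stated assumptions, not ctf_lookup_symbol_idx itself. The `toy_` names are hypothetical. The cache is a flat array where real code would use a hash table. Only one cache is shown: sharing a single cache between dicts is what ctfi_crossdict_cache arranges.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_MAX 256

/* Hypothetical cache of name -> symtab index, plus how far the symtab
   has been scanned.  A flat array keeps the sketch short.  */
struct toy_sym_cache
{
  const char *names[TOY_CACHE_MAX];
  size_t idx[TOY_CACHE_MAX];
  size_t n;
  size_t latest;   /* first symtab slot not yet scanned */
};

static int toy_cache_get (const struct toy_sym_cache *c, const char *name,
                          size_t *out)
{
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      {
        *out = c->idx[i];
        return 1;
      }
  return 0;
}

/* Return the symtab index of NAME, or -1 if absent.  Each lookup resumes
   scanning where the last one stopped, caching every name it passes, so
   the symtab is scanned at most once while the cache has room.  */
long toy_lookup_symbol_idx (struct toy_sym_cache *c,
                            const char *const *symtab, size_t nsyms,
                            const char *name)
{
  size_t found;

  if (toy_cache_get (c, name, &found))
    return (long) found;

  for (; c->latest < nsyms; c->latest++)
    {
      size_t i = c->latest;

      if (c->n == TOY_CACHE_MAX)
        break;                       /* cache full: see fallback below */
      c->names[c->n] = symtab[i];
      c->idx[c->n++] = i;
      if (strcmp (symtab[i], name) == 0)
        {
          c->latest++;
          return (long) i;
        }
    }

  /* Fallback: plain search of the part we could not cache.  */
  for (size_t i = c->latest; i < nsyms; i++)
    if (strcmp (symtab[i], name) == 0)
      return (long) i;
  return -1;
}
```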
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
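The archive-level policy is to memoize which dict a symbol name was found in, but not the type ID, which is cheap to re-fetch. The following sketch shows that policy. Its structures and `toy_` functions are hypothetical stand-ins for ctfi_symnamedicts and ctf_arc_lookup_symbol_name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dict: symbol names and their type IDs (0 = none).  */
struct toy_symdict
{
  const char *const *names;
  const long *types;
  size_t n;
};

static long toy_dict_lookup (const struct toy_symdict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return 0;
}

#define TOY_MEMO_MAX 64

struct toy_archive
{
  const struct toy_symdict *dicts;
  size_t ndicts;
  const char *memo_name[TOY_MEMO_MAX];  /* symbol name ...             */
  size_t memo_dict[TOY_MEMO_MAX];       /* ... -> dict it was found in */
  size_t nmemo;
  size_t dict_searches;                 /* dicts probed on memo misses */
};

/* Look NAME up across every dict.  Only the dict is memoized; the type
   ID is fetched afresh each time.  The caller's NAME pointer is stored
   as-is, so it must outlive the archive (real code would copy it).  */
long toy_arc_lookup_symbol_name (struct toy_archive *a, const char *name)
{
  for (size_t i = 0; i < a->nmemo; i++)
    if (strcmp (a->memo_name[i], name) == 0)
      return toy_dict_lookup (&a->dicts[a->memo_dict[i]], name);

  for (size_t d = 0; d < a->ndicts; d++)
    {
      long type;

      a->dict_searches++;
      if ((type = toy_dict_lookup (&a->dicts[d], name)) != 0)
        {
          if (a->nmemo < TOY_MEMO_MAX)
            {
              a->memo_name[a->nmemo] = name;
              a->memo_dict[a->nmemo++] = d;
            }
          return type;
        }
    }
  return 0;
}
```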
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
  if (symname == NULL)
    symname = ctf_lookup_symbol_name (fp, symidx);
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
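The "read-only range" rule reduces to a comparison against the number of static types (the ChangeLog below calls this count ctf_stypes). Here is a minimal sketch that assumes a flat ID space starting at 1 and ignores parent/child ID offsets. The `toy_` names and error codes are hypothetical, and TOY_ERR_RDONLY stands in for ECTF_RDONLY.

```c
#include <stdint.h>

/* Hypothetical error codes; TOY_ERR_RDONLY plays the role of ECTF_RDONLY.  */
enum { TOY_OK = 0, TOY_ERR_RDONLY = 1, TOY_ERR_BADID = 2 };

/* Hypothetical toy dict: IDs 1..nstatic were read in via ctf_open();
   IDs nstatic+1..ntypes were added afterwards.  */
struct toy_rdict { uint32_t nstatic; uint32_t ntypes; };

/* Gate for any operation that modifies an existing type (adding a
   member, changing an array, ...).  */
int toy_check_modifiable (const struct toy_rdict *d, uint32_t id)
{
  if (id == 0 || id > d->ntypes)
    return TOY_ERR_BADID;
  if (id <= d->nstatic)
    return TOY_ERR_RDONLY;   /* inside the ctf_open()ed range */
  return TOY_OK;
}

/* Adding a new type is always allowed; it may point at static types.  */
uint32_t toy_add_new_type (struct toy_rdict *d)
{
  return ++d->ntypes;
}
```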
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
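The type-name collision rule from the list above amounts to a check against the static portion only, which leaves repeated names among newly added types working as before. This is a hypothetical toy (`toy_` names). It ignores C's separate tag namespace for structs, unions and enums, which a real implementation presumably handles separately.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAXT 64

/* Hypothetical toy dict: names[id - 1] is the name of type ID; the
   first nstatic entries came from the opened file.  */
struct toy_ndict
{
  const char *names[TOY_MAXT];
  size_t nstatic;
  size_t ntypes;
};

/* Return the new type's ID, or 0 if NAME clashes with a static type
   or the table is full.  Clashes with other *added* types are allowed.  */
size_t toy_add_named (struct toy_ndict *d, const char *name)
{
  for (size_t i = 0; i < d->nstatic; i++)
    if (strcmp (d->names[i], name) == 0)
      return 0;
  if (d->ntypes >= TOY_MAXT)
    return 0;
  d->names[d->ntypes++] = name;
  return d->ntypes;
}
```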
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
  /* Dynamic dict with no static portion: just return.  */
  if (!hp)
    {
      ctf_dprintf ("%s not found in idx: dict is dynamic\n", symname);
      return 0;
    }
libctf, include: find types of symbols by name
2021-02-17 23:21:12 +08:00
  ctf_dprintf ("Looking up type of object with symtab idx %lx or name %s in "
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
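The sketch below looks up a function symbol's type in an unindexed section of this form. It assumes that entries appear in symbol-table order, one per function symbol, and that a symbol with no known type is padded with type ID 0. The translation table plays the role the ChangeLog gives the sxlate arrays, but the `toy_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_SLOT UINT32_MAX

/* Hypothetical: build the symidx -> slot table for an unindexed
   function section.  Each function symbol gets the next slot, in
   symtab order; every other symbol gets no slot.  */
void toy_build_sxlate (const int *is_func, size_t nsyms, uint32_t *sxlate)
{
  uint32_t slot = 0;

  for (size_t i = 0; i < nsyms; i++)
    sxlate[i] = is_func[i] ? slot++ : TOY_NO_SLOT;
}

/* Type of function symbol SYMIDX.  Returns 0 both for "not a function"
   and for a pad (a function whose type is unknown).  */
uint32_t toy_func_type (const uint32_t *section, const uint32_t *sxlate,
                        size_t nsyms, size_t symidx)
{
  if (symidx >= nsyms || sxlate[symidx] == TOY_NO_SLOT)
    return 0;
  return section[sxlate[symidx]];
}
```

Because each entry is just a type ID, two functions with the same prototype can share one CTF_K_FUNCTION type, which is the deduplication the new format allows.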
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
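The following sketch shows the "symbols are name -> type mappings" model. Symbols are added by name, with a kind check for the function variant. Later, the linker's symbol names, reported in symtab order, turn that mapping into a symbol-index -> type array. A small linear list stands in for `ctf_funchash` and `ctf_objthash`, and the caller passes in the kind because the toy has no type section. All `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds; only "is it a function type?" matters here. */
enum sketch_kind { SK_INTEGER = 1, SK_POINTER, SK_FUNCTION };

struct sketch_sym { const char *name; unsigned type; };

/* Toy stand-in for ctf_funchash / ctf_objthash: a name -> type list. */
struct sketch_symhash
{
  struct sketch_sym entries[16];
  size_t n;
};

static int
sketch_add_sym (struct sketch_symhash *h, const char *name, unsigned type)
{
  if (h->n >= sizeof h->entries / sizeof h->entries[0])
    return -1;
  h->entries[h->n].name = name;
  h->entries[h->n].type = type;
  h->n++;
  return 0;
}

/* Like ctf_add_func_sym: reject anything that is not a function type. */
static int
sketch_add_func_sym (struct sketch_symhash *funcs, const char *name,
                     unsigned type, enum sketch_kind kind)
{
  if (kind != SK_FUNCTION)
    return -1;
  return sketch_add_sym (funcs, name, type);
}

/* Like ctf_add_objt_sym: any kind, including function pointers. */
static int
sketch_add_objt_sym (struct sketch_symhash *objts, const char *name,
                     unsigned type)
{
  return sketch_add_sym (objts, name, type);
}

static unsigned
sketch_type_of (const struct sketch_symhash *h, const char *name)
{
  for (size_t i = 0; i < h->n; i++)
    if (strcmp (h->entries[i].name, name) == 0)
      return h->entries[i].type;
  return 0;  /* 0: no type known for this symbol (a pad) */
}

/* Once the linker reports symbol names in symtab order, convert the
   name -> type mapping into a symidx -> type array. */
static void
sketch_shuffle_syms (const struct sketch_symhash *h,
                     const char *const *linker_names, size_t nsyms,
                     unsigned *type_by_symidx)
{
  for (size_t i = 0; i < nsyms; i++)
    type_by_symidx[i] = sketch_type_of (h, linker_names[i]);
}
```

Because the mapping is keyed by name, a later relink that reports more symbols simply reruns the shuffle over the longer name list.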
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
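A sketch of that serialization-time filtering follows. It compacts a variable list in place, dropping every variable whose name also appears among the data-object symbols. The flat arrays and `sketch_*` names are hypothetical and not libctf's internal structures.

```c
#include <stddef.h>
#include <string.h>

struct sketch_var { const char *name; unsigned type; };

/* Nonzero if `name` is among the data-object symbol names. */
static int
sketch_is_data_sym (const char *name, const char *const *data_syms,
                    size_t ndata)
{
  for (size_t i = 0; i < ndata; i++)
    if (strcmp (name, data_syms[i]) == 0)
      return 1;
  return 0;
}

/* Drop variables that are also data symbols, preserving the order of
   the survivors; returns the new variable count. */
static size_t
sketch_drop_symbol_vars (struct sketch_var *vars, size_t nvars,
                         const char *const *data_syms, size_t ndata)
{
  size_t kept = 0;
  for (size_t i = 0; i < nvars; i++)
    if (!sketch_is_data_sym (vars[i].name, data_syms, ndata))
      vars[kept++] = vars[i];
  return kept;
}
```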
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
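The sketch below covers both halves of that paragraph. The first half is the pads-versus-index decision. The exact comparison libctf makes against `CTF_INDEX_PAD_THRESHOLD` is not given here, so both `SKETCH_PAD_THRESHOLD` and its rule are hypothetical. The second half is the indexed lookup: map a symbol index to its name through a toy symbol table, then binary-search the name-sorted index and return the type at the matching position. Names and types are kept in parallel arrays, mirroring the separate index and symtypetab sections.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold and rule: index when pads outnumber typed
   entries by more than this factor. */
#define SKETCH_PAD_THRESHOLD 3

static int
sketch_want_index (size_t npads, size_t ntyped)
{
  return npads > SKETCH_PAD_THRESHOLD * ntyped;
}

/* An indexed symtypetab: names[i] (sorted by strcmp) is the symbol whose
   type is types[i]. */
struct sketch_indexed
{
  const char *const *names;
  const unsigned *types;
  size_t n;
};

/* bsearch passes the key first and an array element second. */
static int
sketch_cmp_name (const void *key, const void *elt)
{
  const char *name = key;
  const char *const *entry = elt;
  return strcmp (name, *entry);
}

/* symidx -> name via the symbol table, then name -> type via the index.
   Returns 0 if the symbol has no type in this dict. */
static unsigned
sketch_lookup_by_symbol (const struct sketch_indexed *idx,
                         const char *const *symtab, size_t nsyms,
                         size_t symidx)
{
  if (symidx >= nsyms)
    return 0;

  const char *const *hit = bsearch (symtab[symidx], idx->names, idx->n,
                                    sizeof idx->names[0], sketch_cmp_name);
  return hit ? idx->types[hit - idx->names] : 0;
}
```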
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
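The following sketch shows how a reader can honour a missing sorted flag. If the flag is absent, it sorts the index by name and keeps each name paired with its type, because an index entry is only meaningful at the same position as its type entry. The flag value is an assumed stand-in for `CTF_F_IDXSORTED`. Pairing name and type in one struct is a simplification of the real separate sections.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_F_IDXSORTED 0x4  /* assumed stand-in for CTF_F_IDXSORTED */

struct sketch_idx_entry { const char *name; unsigned type; };

static int
sketch_entry_cmp (const void *a, const void *b)
{
  const struct sketch_idx_entry *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* Sort the index unless the header says it is already sorted.
   Returns nonzero if a sort was performed. */
static int
sketch_ensure_sorted (uint32_t flags, struct sketch_idx_entry *entries,
                      size_t n)
{
  if (flags & SKETCH_F_IDXSORTED)
    return 0;
  qsort (entries, n, sizeof entries[0], sketch_entry_cmp);
  return 1;
}
```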
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
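The iterator style can be sketched without libctf. A cursor pointer starts out NULL, is allocated on the first call and is freed automatically when iteration ends, at which point the function returns an error value. This mirrors the general `ctf_next_t` convention, but the structures and `sketch_*` names are hypothetical. Here -1 plays the role of `CTF_ERR`, and an allocation failure also returns -1.

```c
#include <stdlib.h>

struct sketch_symtypetab
{
  const char *const *names;
  const long *types;
  size_t n;
};

/* Iterator state: allocated on first call, freed when exhausted. */
struct sketch_next { size_t pos; };

/* Return the next symbol's type and set *name; return -1 when done
   (the cursor is then freed and reset to NULL). No sorting is needed. */
static long
sketch_symbol_next (const struct sketch_symtypetab *tab,
                    struct sketch_next **it, const char **name)
{
  if (*it == NULL)
    {
      *it = calloc (1, sizeof **it);
      if (*it == NULL)
        return -1;
    }

  if ((*it)->pos >= tab->n)
    {
      free (*it);
      *it = NULL;
      return -1;
    }

  *name = tab->names[(*it)->pos];
  return tab->types[(*it)->pos++];
}
```

A caller loops as `while ((t = sketch_symbol_next (&tab, &it, &name)) != -1)`, with `it` initialized to NULL.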
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
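The idea of resolving external strings early can be sketched as a table of offset -> string records. Each record is populated when the string is added, not when the strtab is written, so a `strptr`-style lookup works at any time. The structures and `sketch_*` names are hypothetical. The sketch ignores how libctf actually tags external offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An "external" string: its text lives in the linker's strtab at
   `offset`; the CTF dict will only ever write out the offset. */
struct sketch_ext_str { uint32_t offset; const char *str; };

struct sketch_ext_table
{
  struct sketch_ext_str entries[32];
  size_t n;
};

/* Record the mapping at string-addition time (cf. ctf_str_add_external)
   rather than waiting for strtab writeout. */
static int
sketch_add_external (struct sketch_ext_table *t, const char *str,
                     uint32_t offset)
{
  if (t->n >= sizeof t->entries / sizeof t->entries[0])
    return -1;
  t->entries[t->n].offset = offset;
  t->entries[t->n].str = str;
  t->n++;
  return 0;
}

/* strptr-style lookup of an external offset; NULL if unknown. */
static const char *
sketch_strptr_external (const struct sketch_ext_table *t, uint32_t offset)
{
  for (size_t i = 0; i < t->n; i++)
    if (t->entries[i].offset == offset)
      return t->entries[i].str;
  return NULL;
}
```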
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
"indexed symtypetab\n", symidx, symname);
libctf: creation functions
The CTF creation process looks roughly like (error handling elided):
int err;
ctf_file_t *foo = ctf_create (&err);
ctf_id_t type = ctf_add_THING (foo, ...);
ctf_update (foo);
ctf_*write (...);
Some ctf_add_THING functions accept other type IDs as arguments,
depending on the type: cv-quals, pointers, and structure and union
members all take other types as arguments. So do 'slices', which
let you take an existing integral type and recast it as a type
with a different bitness or offset within a byte, for bitfields.
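What a slice describes can be sketched as a base integral type plus a bit offset and width, used to extract a bitfield from its storage word. The struct is a hypothetical simplification. The sketch assumes bits are counted from the least-significant end and that `0 < bits` and `offset + bits <= 64`.

```c
#include <stdint.h>

/* Hypothetical slice: reinterpret `base_type` as a `bits`-wide field
   starting `offset` bits into its storage. */
struct sketch_slice
{
  unsigned base_type;
  unsigned offset;
  unsigned bits;
};

/* Extract the value the slice describes from the underlying storage.
   Assumes 0 < bits and offset + bits <= 64 (LSB-first bit numbering). */
static uint64_t
sketch_slice_value (const struct sketch_slice *s, uint64_t storage)
{
  uint64_t mask = s->bits == 64 ? UINT64_MAX
                                : ((UINT64_C (1) << s->bits) - 1);
  return (storage >> s->offset) & mask;
}
```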
One class of THING is not a type: "variables", which are mappings
of names (in the internal string table) to types. These are mostly
useful when encoding variables that do not appear in a symbol table
but which some external user has some other way to figure out the
address of at runtime (dynamic symbol lookup or querying a VM
interpreter or something).
You can snapshot the creation process at any point: rolling back to a
snapshot deletes all types and variables added since that point.
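Snapshot and rollback can be sketched by treating a snapshot as a high-water mark on an append-only type list. Rolling back truncates everything added after it, and later additions reuse the freed IDs. This is a hypothetical simplification that tracks only types, not variables. It is not the layout of `ctf_snapshot_id_t`.

```c
#include <stddef.h>

/* Toy dict under construction: types are appended in ID order, with
   type IDs starting at 1. */
struct sketch_dict
{
  const char *types[64];
  size_t ntypes;
};

/* A snapshot is just the number of types present when it was taken. */
typedef struct { size_t ntypes; } sketch_snapshot_id;

static long
sketch_add_type (struct sketch_dict *d, const char *name)
{
  if (d->ntypes >= sizeof d->types / sizeof d->types[0])
    return -1;
  d->types[d->ntypes++] = name;
  return (long) d->ntypes;  /* the new type's ID equals the new count */
}

static sketch_snapshot_id
sketch_snapshot (const struct sketch_dict *d)
{
  sketch_snapshot_id id = { d->ntypes };
  return id;
}

/* Discard every type added after the snapshot was taken. */
static int
sketch_rollback (struct sketch_dict *d, sketch_snapshot_id id)
{
  if (id.ntypes > d->ntypes)
    return -1;  /* snapshot is newer than the current state */
  d->ntypes = id.ntypes;
  return 0;
}
```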
You can make arbitrary type queries on the CTF container during the
creation process, but you must call ctf_update() first, which
translates the growing dynamic container into a static one (this uses
the CTF opening machinery, added in a later commit), which is quite
expensive. This function must also be called after adding types
and before writing the container out.
Because addition of types involves looking up existing types, we add a
little of the type lookup machinery here, as well: only enough to
look up types in dynamic containers under construction.
libctf/
* ctf-create.c: New file.
* ctf-lookup.c: New file.
include/
* ctf-api.h (zlib.h): New include.
(ctf_sect_t): New.
(ctf_sect_names_t): Likewise.
(ctf_encoding_t): Likewise.
(ctf_membinfo_t): Likewise.
(ctf_arinfo_t): Likewise.
(ctf_funcinfo_t): Likewise.
(ctf_lblinfo_t): Likewise.
(ctf_snapshot_id_t): Likewise.
(CTF_FUNC_VARARG): Likewise.
(ctf_simple_open): Likewise.
(ctf_bufopen): Likewise.
(ctf_create): Likewise.
(ctf_add_array): Likewise.
(ctf_add_const): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_float): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_function): Likewise.
(ctf_add_integer): Likewise.
(ctf_add_slice): Likewise.
(ctf_add_pointer): Likewise.
(ctf_add_type): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_restrict): Likewise.
(ctf_add_struct): Likewise.
(ctf_add_union): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_volatile): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_member_encoded): Likewise.
(ctf_add_variable): Likewise.
(ctf_set_array): Likewise.
(ctf_update): Likewise.
(ctf_snapshot): Likewise.
(ctf_rollback): Likewise.
(ctf_discard): Likewise.
(ctf_write): Likewise.
(ctf_gzwrite): Likewise.
(ctf_compress_write): Likewise.
2019-04-24 05:45:46 +08:00
2020-11-20 21:34:04 +08:00
if (symname[0] == '\0')
2023-09-13 17:02:36 +08:00
return CTF_ERR; /* errno is set for us. */
2019-04-24 05:45:46 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
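The index handling listed above (sort_symidx_by_name, ctf_lookup_idx_name) can be illustrated with a small self-contained sketch. It assumes an index is an array of strtab name offsets that runs parallel to an array of type IDs. When the index is not already sorted, the sketch builds a name-ordered translation array and then binary-searches it. All names here (symtab_index, sort_index_by_name, lookup_type_by_name) are hypothetical, and returning 0 for "not found" is only this sketch's convention.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of an index section: IDX[i] is the
   strtab offset of the name of entry i, TYPES[i] is its type ID.  */
typedef struct symtab_index
{
  const char *strtab;      /* String table the offsets point into.  */
  const uint32_t *idx;     /* Name offsets, one per entry.  */
  const uint32_t *types;   /* Type IDs, parallel to IDX.  */
  size_t n;
} symtab_index;

/* ISO C qsort has no context argument, so the index being sorted is
   passed through this file-scope pointer (not reentrant).  */
static const symtab_index *sort_arg;

static int
cmp_by_name (const void *a, const void *b)
{
  uint32_t ia = *(const uint32_t *) a;
  uint32_t ib = *(const uint32_t *) b;
  return strcmp (sort_arg->strtab + sort_arg->idx[ia],
		 sort_arg->strtab + sort_arg->idx[ib]);
}

/* Build a translation array of entry positions ordered by name,
   analogous to sorting an index that lacks CTF_F_IDXSORTED.
   Returns a malloc'd array the caller frees, or NULL on failure.  */
static uint32_t *
sort_index_by_name (const symtab_index *si)
{
  uint32_t *sxlate = malloc ((si->n ? si->n : 1) * sizeof (uint32_t));

  if (sxlate == NULL)
    return NULL;
  for (size_t i = 0; i < si->n; i++)
    sxlate[i] = (uint32_t) i;
  sort_arg = si;
  qsort (sxlate, si->n, sizeof (uint32_t), cmp_by_name);
  sort_arg = NULL;
  return sxlate;
}

/* Binary-search the name-ordered translation for NAME and return its
   type ID, or 0 if no entry has that name.  */
static uint32_t
lookup_type_by_name (const symtab_index *si, const uint32_t *sxlate,
		     const char *name)
{
  size_t lo = 0, hi = si->n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      uint32_t pos = sxlate[mid];
      int c = strcmp (name, si->strtab + si->idx[pos]);

      if (c == 0)
	return si->types[pos];
      if (c < 0)
	hi = mid;
      else
	lo = mid + 1;
    }
  return 0;
}
```

Sorting a translation array leaves the on-disk index and type arrays untouched. The sort is paid once, on first lookup, which matches how the code below lazily fills fp->ctf_funcidx_sxlate.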
2020-11-20 21:34:04 +08:00
if (is_function)
libctf: creation functions
2019-04-24 05:45:46 +08:00
{
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
if (!fp->ctf_funcidx_sxlate)
libctf: creation functions
2019-04-24 05:45:46 +08:00
{
libctf: symbol type linking support
2020-11-20 21:34:04 +08:00
if ((fp->ctf_funcidx_sxlate
     = ctf_symidx_sort (fp, (uint32_t *)
                        (fp->ctf_buf + hp->cth_funcidxoff),
                        &fp->ctf_nfuncidx,
                        hp->cth_varoff - hp->cth_funcidxoff))
    == NULL)
{
ctf_err_warn (fp, 0, 0, _("cannot sort function symidx"));
2023-09-13 17:02:36 +08:00
|
|
|
return CTF_ERR; /* errno is set for us. */
|
2020-11-20 21:34:04 +08:00
}
2019-04-24 05:45:46 +08:00
}
2020-11-20 21:34:04 +08:00
symtypetab = (uint32_t *) (fp->ctf_buf + hp->cth_funcoff);
sxlate = fp->ctf_funcidx_sxlate;
names = fp->ctf_funcidx_names;
nidx = fp->ctf_nfuncidx;
2019-04-24 05:45:46 +08:00
}
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not keep
any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
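To make the "array of uint32_t type IDs" layout concrete, here is a minimal sketch of how a reader might fetch the type recorded for one symbol in an unindexed section. It is a simplified model, not libctf's code: the function symtypetab_lookup, the NO_SLOT marker and the sxlate translation array (symbol-table index to section slot) are hypothetical, and treating type ID 0 as a pad (a symbol with no recorded type) is an assumption borrowed from the "pads" terminology used later in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified model, not libctf's API: an unindexed
   function info (or data object) section is an array of uint32_t type
   IDs, one per non-skipped symbol, in symbol-table order.  A type ID of
   0 is assumed to be a pad: that symbol has no type in this dict.  */
#define NO_SLOT UINT32_MAX /* Marks symbols that have no slot (skipped).  */

/* Return the type ID recorded for symbol SYMIDX, or 0 if there is none.
   SXLATE maps each symbol-table index to its slot in SYMTYPETAB.  */
static uint32_t
symtypetab_lookup (const uint32_t *symtypetab, size_t nslots,
                   const uint32_t *sxlate, size_t nsyms, size_t symidx)
{
  assert (symtypetab != NULL && sxlate != NULL);

  if (symidx >= nsyms)
    return 0;                   /* No such symbol.  */

  uint32_t slot = sxlate[symidx];
  if (slot == NO_SLOT || slot >= nslots)
    return 0;                   /* Skipped symbol or bad translation.  */

  return symtypetab[slot];      /* May itself be 0 (a pad).  */
}
```

Because every slot holds a plain type ID, function prototypes live in the type section and can be shared between symbols, which is the deduplication benefit the message describes.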
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
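The block below is a minimal sketch of the choice between the indexed and unindexed layouts. It counts the pads an unindexed section would need and switches to a name-sorted indexed layout once they pass a threshold.

PAD_THRESHOLD_PERCENT is a hypothetical stand-in for CTF_INDEX_PAD_THRESHOLD, and its value is not libctf's. The 50 here is chosen for a size reason. An indexed entry costs two 32-bit words (type plus name), while an unindexed section costs one word per slot. Indexing therefore stops costing extra space once at least half the slots are pads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CTF_INDEX_PAD_THRESHOLD; not libctf's value.
   At 50%, an indexed layout (type word + name word per typed symbol) is
   no larger than an unindexed one (one word per symbol, pads included).  */
#define PAD_THRESHOLD_PERCENT 50

/* TYPES is the unindexed layout, one slot per symbol, 0 meaning pad.
   Return true if the section should be emitted indexed instead.  */
static bool
should_index (const uint32_t *types, size_t nsyms)
{
  size_t pads = 0;

  assert (types != NULL || nsyms == 0);
  if (nsyms == 0)
    return false;
  for (size_t i = 0; i < nsyms; i++)
    if (types[i] == 0)
      pads++;
  return pads * 100 >= nsyms * PAD_THRESHOLD_PERCENT;
}
```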
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
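When CTF_F_IDXSORTED is absent, the index has to be sorted before it can be searched. The block below is a minimal sketch of that approach: build a name-sorted translation table from the index once, then binary-search it for each lookup.

Names are plain C strings here, whereas the real index sections hold string-table offsets. idx_entry_t, idx_sort and idx_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical translation-table entry: a symbol name from the index
   section plus the position of its entry in the symbol type section.  */
typedef struct
{
  const char *name;
  uint32_t pos;
} idx_entry_t;

static int
idx_entry_cmp (const void *a, const void *b)
{
  const idx_entry_t *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Build a name-sorted translation table for N index entries.  The caller
   frees the result; NULL means out of memory.  */
static idx_entry_t *
idx_sort (const char *const *names, size_t n)
{
  assert (names != NULL || n == 0);
  idx_entry_t *xlate = malloc ((n ? n : 1) * sizeof (*xlate));
  if (xlate == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    {
      xlate[i].name = names[i];
      xlate[i].pos = (uint32_t) i;
    }
  qsort (xlate, n, sizeof (*xlate), idx_entry_cmp);
  return xlate;
}

/* Binary-search the table for NAME and return its type from TYPES
   (indexed by original position), or 0 if NAME is not in the index.  */
static uint32_t
idx_lookup (const idx_entry_t *xlate, size_t n, const uint32_t *types,
            const char *name)
{
  idx_entry_t key = { name, 0 };
  const idx_entry_t *hit = bsearch (&key, xlate, n, sizeof (*xlate),
                                    idx_entry_cmp);
  return hit ? types[hit->pos] : 0;
}
```

Sorting a separate table leaves the section itself untouched, so the original order stays available to anything that walks entries in storage order.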
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
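Iteration just walks entries in storage order, so it needs no sorted index. The block below is a minimal sketch of such a cursor-style iterator. It is loosely in the spirit of libctf's ctf_next_t iterators, but sym_iter_t and sym_next are hypothetical names. It skips pads and returns 0 at the end, whereas libctf's iterators report the end through an error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: just a position in storage order.  */
typedef struct
{
  size_t pos;
} sym_iter_t;

/* Return the next non-pad type and set *NAME to its symbol name, or
   return 0 once the N entries are exhausted.  */
static uint32_t
sym_next (sym_iter_t *it, const char *const *names, const uint32_t *types,
          size_t n, const char **name)
{
  assert (it != NULL && name != NULL);
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *name = names[i];
          return types[i];
        }
    }
  return 0;
}
```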
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
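The block below is a minimal sketch of consulting external strings by offset before any string table is written. Offsets handed in by the linker are recorded immediately, so a symbol named by an external strtab offset can be resolved at once.

The sketch uses the top bit of an offset to select the external table. This mirrors how CTF name offsets distinguish their two string tables, but here it is an assumption of the sketch. The type and function names are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption of this sketch: the top bit of an offset selects the
   external (linker-provided) string table.  */
#define EXT_BIT 0x80000000u

typedef struct
{
  uint32_t off;     /* Offset in the external strtab.  */
  const char *str;  /* The string the linker said lives there.  */
} ext_str_t;

/* Hypothetical view of a dict's strings before writeout.  */
typedef struct
{
  const char *internal;  /* Internal strtab, NUL-separated.  */
  size_t internal_len;
  const ext_str_t *ext;  /* Recorded as the linker supplies them.  */
  size_t n_ext;
} strtab_view_t;

/* Resolve OFF to a string, or NULL if it is not (yet) known.  */
static const char *
str_lookup (const strtab_view_t *st, uint32_t off)
{
  assert (st != NULL);
  if ((off & EXT_BIT) == 0)
    return off < st->internal_len ? st->internal + off : NULL;

  uint32_t ext_off = off & ~EXT_BIT;
  for (size_t i = 0; i < st->n_ext; i++)
    if (st->ext[i].off == ext_off)
      return st->ext[i].str;
  return NULL;
}
```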
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
  else
    {
      if (!fp->ctf_objtidx_sxlate)
        {
          if ((fp->ctf_objtidx_sxlate
               = ctf_symidx_sort (fp, (uint32_t *)
                                  (fp->ctf_buf + hp->cth_objtidxoff),
                                  &fp->ctf_nobjtidx,
                                  hp->cth_funcidxoff - hp->cth_objtidxoff))
              == NULL)
            {
              ctf_err_warn (fp, 0, 0, _("cannot sort object symidx"));
2023-09-13 17:02:36 +08:00
              return CTF_ERR; /* errno is set for us.  */
2020-11-20 21:34:04 +08:00
            }
        }
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently, a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
(say if you were adding a type that depended on one you just added), you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
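The ctf_names_t idea can be sketched minimally as a pair of tables plus one function that decides which half to consult. Callers then never branch on writability themselves. Arrays with linear search stand in for ctf_hash_t and ctf_dynhash_t. name_ent_t, names_t and names_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
  const char *name;
  uint32_t type;
} name_ent_t;

/* Hypothetical stand-in for ctf_names_t: a fixed table filled at open
   time for read-only dicts, and a growable one for writable dicts.  */
typedef struct
{
  const name_ent_t *fixed;
  size_t nfixed;
  name_ent_t *dyn;
  size_t ndyn;
} names_t;

static uint32_t
search (const name_ent_t *e, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (e[i].name, name) == 0)
      return e[i].type;
  return 0;
}

/* The single place that picks a half, in the spirit of
   ctf_lookup_by_rawhash.  Returns 0 if NAME is absent.  */
static uint32_t
names_lookup (const names_t *np, bool writable, const char *name)
{
  assert (np != NULL && name != NULL);
  return writable ? search (np->dyn, np->ndyn, name)
                  : search (np->fixed, np->nfixed, name);
}
```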
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtbyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
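With every type in a writable dict now dynamic, snapshot and rollback reduce to bookkeeping on the maximum type ID. The block below is a minimal sketch under that assumption:

- A snapshot is just the current maximum ID.
- Rolling back deletes every type above that ID, to any earlier point, even past the last update.
- The update call only records the point that a later discard returns to.

dict_t and the dict_* functions are hypothetical. They omit the name tables and strings that a real rollback also has to unwind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TYPES 64

/* Hypothetical writable dict: type IDs run from 1 to typemax, all of
   them dynamic, so there is no static/dynamic boundary to track.  */
typedef struct
{
  const char *names[MAX_TYPES + 1];  /* names[id]; slot 0 unused.  */
  uint32_t typemax;                  /* Highest type ID in use.  */
  uint32_t discard_point;            /* Where dict_discard rolls back to.  */
} dict_t;

/* Add a type and return its ID, or 0 if the table is full.  */
static uint32_t
dict_add_type (dict_t *d, const char *name)
{
  assert (d != NULL);
  if (d->typemax >= MAX_TYPES)
    return 0;
  d->names[++d->typemax] = name;
  return d->typemax;
}

/* A snapshot is nothing more than the current maximum type ID.  */
static uint32_t
dict_snapshot (const dict_t *d)
{
  return d->typemax;
}

/* Delete every type added after SNAP; any earlier point is allowed.  */
static void
dict_rollback (dict_t *d, uint32_t snap)
{
  while (d->typemax > snap)
    d->names[d->typemax--] = NULL;
}

/* All the update call still does: remember the discard point.  */
static void
dict_update (dict_t *d)
{
  d->discard_point = d->typemax;
}

static void
dict_discard (dict_t *d)
{
  dict_rollback (d, d->discard_point);
}
```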
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_ctl_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need to keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
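The compatibility behaviour described above can be sketched as a flag check. This assumes that a reader rejects any header flag bit outside the set it knows about, which is how old libctf comes to refuse dicts with CTF_F_NEWFUNCINFO set. The bit values and all names below are hypothetical. The real constants are in ctf.h and are not reproduced here. The "skip function info" result only describes the new reader's behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only: the real values live in
   ctf.h and are not reproduced here.  */
#define SKETCH_F_COMPRESS     0x1u
#define SKETCH_F_NEWFUNCINFO  0x2u

/* The flags a reader understands, modelled as a mask.  */
#define SKETCH_OLD_READER_FLAGS  (SKETCH_F_COMPRESS)
#define SKETCH_NEW_READER_FLAGS  (SKETCH_F_COMPRESS | SKETCH_F_NEWFUNCINFO)

enum sketch_open_result
{
  SKETCH_REFUSE,          /* dict uses a flag this reader does not know */
  SKETCH_SKIP_FUNCINFO,   /* flag unset: a new reader disregards funcinfo */
  SKETCH_USE_FUNCINFO     /* flag set: function info is in the new format */
};

static enum sketch_open_result
sketch_check_flags (uint32_t dict_flags, uint32_t reader_flags)
{
  /* Any unknown bit makes the dict unreadable for this reader.  */
  if (dict_flags & ~reader_flags)
    return SKETCH_REFUSE;
  if (dict_flags & SKETCH_F_NEWFUNCINFO)
    return SKETCH_USE_FUNCINFO;
  return SKETCH_SKIP_FUNCINFO;
}
```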
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a function pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
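A name-indexed lookup of this kind can be sketched with bsearch over a table of entry numbers sorted by name. The sketch is loosely modelled on the ctf_lookup_idx_key_t / ctf_lookup_idx_name pair in ctf-lookup.c. It makes two simplifying assumptions: names are plain C strings rather than strtab offsets resolved with ctf_strptr, and a missing symbol yields type 0 rather than an error. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Search key: the name sought plus the name table it is compared
   against (libctf stores strtab offsets; plain strings are used here).  */
typedef struct sketch_idx_key
{
  const char *name;             /* symbol name being looked up */
  const char *const *names;     /* names[i] is the name of entry i */
} sketch_idx_key_t;

/* bsearch comparator: KEY_ is the key, IDX_ points at one entry number.  */
static int
sketch_idx_cmp (const void *key_, const void *idx_)
{
  const sketch_idx_key_t *key = key_;
  const uint32_t *idx = idx_;

  return strcmp (key->name, key->names[*idx]);
}

/* Look NAME up.  SXLATE holds NIDX entry numbers sorted by name.
   Returns the type ID from SYMTYPETAB, or 0 if NAME is absent.  */
static uint32_t
sketch_lookup_indexed (const char *name, const char *const *names,
                       const uint32_t *sxlate, size_t nidx,
                       const uint32_t *symtypetab)
{
  sketch_idx_key_t key = { name, names };
  const uint32_t *idx;

  assert (name != NULL);
  idx = bsearch (&key, sxlate, nidx, sizeof (uint32_t), sketch_idx_cmp);
  return idx ? symtypetab[*idx] : 0;
}
```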
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
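Sorting an index that arrives unsorted amounts to building a permutation of entry numbers ordered by name. libctf does this with a qsort_r-style helper (sort_symidx_by_name). The sketch below substitutes plain qsort over (name, entry) pairs, so no context argument is needed. The names are hypothetical, and names are plain strings rather than strtab offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One index entry paired with its name, so qsort needs no context.  */
struct sketch_pair
{
  const char *name;
  uint32_t entry;
};

static int
sketch_pair_cmp (const void *a_, const void *b_)
{
  const struct sketch_pair *a = a_;
  const struct sketch_pair *b = b_;

  return strcmp (a->name, b->name);
}

/* Return a malloc'd array of the N entry numbers sorted by name, or NULL
   on allocation failure.  The caller frees it.  */
static uint32_t *
sketch_sort_index (const char *const *names, size_t n)
{
  struct sketch_pair *pairs;
  uint32_t *sxlate;
  size_t i;

  assert (names != NULL || n == 0);
  pairs = malloc ((n ? n : 1) * sizeof (*pairs));
  sxlate = malloc ((n ? n : 1) * sizeof (*sxlate));
  if (pairs == NULL || sxlate == NULL)
    {
      free (pairs);
      free (sxlate);
      return NULL;
    }

  for (i = 0; i < n; i++)
    {
      pairs[i].name = names[i];
      pairs[i].entry = (uint32_t) i;
    }
  qsort (pairs, n, sizeof (*pairs), sketch_pair_cmp);
  for (i = 0; i < n; i++)
    sxlate[i] = pairs[i].entry;

  free (pairs);
  return sxlate;
}
```

The resulting array is the kind of sorted translation table that a bsearch-based lookup like ctf_lookup_idx_name expects.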
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
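The iteration pattern can be sketched with a mock dictionary. libctf's *_next iterators conventionally take a cursor that starts out NULL, is allocated on the first call and is released when iteration ends. The sketch assumes ctf_symbol_next follows that convention. Everything here is hypothetical: the types, names, the choice to skip pads, and the use of 0 as the end marker (the real iterator signals ECTF_NEXT_END instead).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dict's symbol type sections: parallel
   arrays of names and type IDs for data objects and for functions.  */
typedef struct sketch_dict
{
  const char *const *objt_names;
  const uint32_t *objt_types;
  size_t nobjt;
  const char *const *func_names;
  const uint32_t *func_types;
  size_t nfunc;
} sketch_dict_t;

/* Opaque cursor in the style of ctf_next_t.  */
typedef struct sketch_next
{
  size_t pos;
} sketch_next_t;

/* Return the next symbol's type ID and set *NAME, walking functions if
   FUNCTIONS is nonzero and data objects otherwise.  Entries come back
   in storage order, so no sorting is needed.  Returns 0 (and frees and
   clears the cursor) at the end; also returns 0 if allocation fails.  */
static uint32_t
sketch_symbol_next (const sketch_dict_t *fp, sketch_next_t **it,
                    const char **name, int functions)
{
  const char *const *names;
  const uint32_t *types;
  size_t n;

  assert (fp != NULL && it != NULL && name != NULL);
  names = functions ? fp->func_names : fp->objt_names;
  types = functions ? fp->func_types : fp->objt_types;
  n = functions ? fp->nfunc : fp->nobjt;

  if (*it == NULL && (*it = calloc (1, sizeof (**it))) == NULL)
    return 0;

  /* Skip pads (type 0): symbols with no known type.  */
  while ((*it)->pos < n && types[(*it)->pos] == 0)
    (*it)->pos++;

  if ((*it)->pos >= n)
    {
      free (*it);
      *it = NULL;
      return 0;
    }

  *name = names[(*it)->pos];
  return types[(*it)->pos++];
}
```

A caller would loop with `while ((type = sketch_symbol_next (&d, &it, &name, 0)) != 0)`.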
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
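The point of moving the offset-to-name bookkeeping to string addition time can be sketched with a small table. libctf keeps this mapping in a dynhash (ctf_syn_ext_strtab). This sketch uses a fixed-size array instead, and all names and the size limit are hypothetical. Because the mapping is recorded as soon as an external string is added, a name given only as an external strtab offset can be resolved immediately, before any strtab is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A tiny offset -> name table standing in for the synthetic external
   strtab.  The size limit is an assumption for illustration.  */
#define SKETCH_MAX_EXT 16

typedef struct sketch_ext_strtab
{
  uint32_t offsets[SKETCH_MAX_EXT];
  const char *strs[SKETCH_MAX_EXT];
  size_t n;
} sketch_ext_strtab_t;

/* Record that STR lives at OFFSET in the external strtab, at addition
   time rather than writeout time.  Returns 1 on success, 0 if full.  */
static int
sketch_add_external (sketch_ext_strtab_t *tab, const char *str,
                     uint32_t offset)
{
  assert (tab != NULL && str != NULL);
  if (tab->n == SKETCH_MAX_EXT)
    return 0;
  tab->offsets[tab->n] = offset;
  tab->strs[tab->n] = str;
  tab->n++;
  return 1;
}

/* Resolve an external strtab offset to a name, or NULL if unknown.  */
static const char *
sketch_ext_strptr (const sketch_ext_strtab_t *tab, uint32_t offset)
{
  size_t i;

  for (i = 0; i < tab->n; i++)
    if (tab->offsets[i] == offset)
      return tab->strs[i];
  return NULL;
}
```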
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
symtypetab = (uint32_t *) (fp->ctf_buf + hp->cth_objtoff);
sxlate = fp->ctf_objtidx_sxlate;
names = fp->ctf_objtidx_names;
nidx = fp->ctf_nobjtidx;
}
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently, a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
(say if you were adding a type that depended on one you just added), you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
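The two-halved name table and the kind-based dispatch can be sketched like this. It is a hypothetical model, not libctf's code. Plain arrays stand in for ctf_hash_t and ctf_dynhash_t, and the kinds are reduced to struct, union, enum and "everything else". The functions only loosely mirror ctf_name_table, ctf_lookup_by_rawhash and ctf_lookup_by_rawname. A lookup picks whichever half matches the dict's mode, so callers never need to know whether the dict is writable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One half of a name table: parallel arrays of names and type IDs.  */
typedef struct sketch_table
{
  const char *const *names;
  const long *ids;
  size_t n;
} sketch_table_t;

/* Model of ctf_names_t: a read-only half and a writable half.  */
typedef struct sketch_names
{
  sketch_table_t static_half;    /* used when the dict is read-only */
  sketch_table_t dynamic_half;   /* used when the dict is writable */
} sketch_names_t;

enum sketch_kind { SKETCH_K_STRUCT, SKETCH_K_UNION, SKETCH_K_ENUM,
                   SKETCH_K_OTHER };

typedef struct sketch_dict
{
  int writable;
  sketch_names_t structs, unions, enums, names;
} sketch_dict_t;

/* Map a kind to its name table.  */
static sketch_names_t *
sketch_name_table (sketch_dict_t *fp, enum sketch_kind kind)
{
  switch (kind)
    {
    case SKETCH_K_STRUCT: return &fp->structs;
    case SKETCH_K_UNION:  return &fp->unions;
    case SKETCH_K_ENUM:   return &fp->enums;
    default:              return &fp->names;
    }
}

/* Look NAME up in NP, using the half that matches the dict's mode.
   Returns 0 if absent.  */
static long
sketch_lookup_by_rawhash (sketch_dict_t *fp, sketch_names_t *np,
                          const char *name)
{
  const sketch_table_t *t = fp->writable ? &np->dynamic_half
                                         : &np->static_half;
  size_t i;

  assert (name != NULL);
  for (i = 0; i < t->n; i++)
    if (strcmp (t->names[i], name) == 0)
      return t->ids[i];
  return 0;
}

/* Kind-based wrapper around the lookup above.  */
static long
sketch_lookup_by_rawname (sketch_dict_t *fp, enum sketch_kind kind,
                          const char *name)
{
  return sketch_lookup_by_rawhash (fp, sketch_name_table (fp, kind), name);
}
```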
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtbyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
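The remaining role of ctf_update can be sketched as pure bookkeeping. This hypothetical model tracks only the highest type ID (standing in for ctf_typemax) and the point that a discard returns to. It assumes that rolling back means lowering that counter. The real code also deletes the dynamic type definitions and their name-table entries, which this sketch omits.

```c
#include <assert.h>

/* Hypothetical model: a writable dict tracks only its highest type ID,
   and "update" merely records the point that "discard" returns to.  */
typedef struct sketch_dict
{
  unsigned long typemax;      /* highest type ID added so far */
  unsigned long last_update;  /* rollback point set by sketch_update */
} sketch_dict_t;

/* Add a type and return its new ID.  */
static unsigned long
sketch_add_type (sketch_dict_t *fp)
{
  return ++fp->typemax;
}

/* Like ctf_update after this change: no serialization, just a marker.  */
static void
sketch_update (sketch_dict_t *fp)
{
  fp->last_update = fp->typemax;
}

/* Like ctf_discard: forget every type added since the last update.  */
static void
sketch_discard (sketch_dict_t *fp)
{
  assert (fp->last_update <= fp->typemax);
  fp->typemax = fp->last_update;
}
```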
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_ctl_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
2020-11-20 21:34:04 +08:00
ctf_lookup_idx_key_t key = { fp, symname, names };
uint32_t *idx;
idx = bsearch (&key, sxlate, nidx, sizeof (uint32_t), ctf_lookup_idx_name);
if (!idx)
libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen. Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
say if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with. The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too). The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type. Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.
Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.
This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards. These are necessarily promoted
to structs, unions or enums, and when they do so *their type ID does not
change*. So all of a sudden we are changing types that already exist in
the static portion. ctf_update gets massively confused by this and
allocates space enough for the forward (with no members), but then emits
the new dynamic type (with all the members) into it. You get an
assertion failure after that, if you're lucky, or a coredump.
So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in read-only dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear". The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones. The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.)
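The split described above can be pictured with a small C sketch. All `ex_` names are hypothetical. The two halves are plain arrays searched linearly, not libctf's `ctf_hash_t` and `ctf_dynhash_t`. The sketch shows only one thing: a single writability check inside the lookup helper picks the half, as the text says `ctf_lookup_by_rawhash` does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one name -> type binding.  */
typedef struct
{
  const char *name;
  long type;
} ex_name_entry;

/* Hypothetical analogue of ctf_names_t: one half is filled once when a
   read-only dict is opened, the other grows while the dict is writable.  */
typedef struct
{
  const ex_name_entry *static_tab;
  size_t static_len;
  const ex_name_entry *dyn_tab;
  size_t dyn_len;
} ex_names_t;

/* Linear search for brevity; the real tables are hashes.  Returns 0 if
   NAME is absent.  */
static long
ex_lookup_in (const ex_name_entry *tab, size_t len, const char *name)
{
  for (size_t i = 0; i < len; i++)
    if (strcmp (tab[i].name, name) == 0)
      return tab[i].type;
  return 0;
}

/* Analogue of ctf_lookup_by_rawhash: writability alone picks the half,
   so callers never need to know which kind of dict they hold.  */
static long
ex_lookup_by_rawhash (const ex_names_t *names, int writable,
                      const char *name)
{
  if (writable)
    return ex_lookup_in (names->dyn_tab, names->dyn_len, name);
  return ex_lookup_in (names->static_tab, names->static_len, name);
}
```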
This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtbyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types). You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.
There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types. This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.
You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to except set the counter used by ctf_discard. It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.
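Here is a toy model of the new `ctf_update` semantics, assuming a dictionary reduced to a type counter. The `ex_` names are hypothetical. The sketch shows only that updating records the point that discarding rolls back to. Serialization, now the job of `ctf_serialize`, is left out entirely.

```c
#include <stdint.h>

/* Toy dictionary (hypothetical): types are represented only by a count.  */
typedef struct
{
  uint32_t typemax;       /* highest type ID added so far */
  uint32_t update_point;  /* typemax as of the last ex_update () */
} ex_dict;

/* New types get the next ID, as ctf_typemax tracking implies.  */
static uint32_t
ex_add_type (ex_dict *fp)
{
  return ++fp->typemax;
}

/* "Update" no longer serializes anything: it just records where a
   subsequent discard should roll back to.  */
static void
ex_update (ex_dict *fp)
{
  fp->update_point = fp->typemax;
}

/* Drop every type added since the last update.  */
static void
ex_discard (ex_dict *fp)
{
  fp->typemax = fp->update_point;
}
```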
With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.
v5: fix tabdamage.
libctf/
* ctf-impl.h (ctf_names_t): New.
(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
(ctf_file_t) <ctf_structs>: Likewise.
<ctf_unions>: Likewise.
<ctf_enums>: Likewise.
<ctf_names>: Likewise.
<ctf_lookups>: Improve comment.
<ctf_ptrtab_len>: New.
<ctf_prov_strtab>: New.
<ctf_str_prov_offset>: New.
<ctf_dtbyname>: Remove, redundant to the names hashes.
<ctf_dtnextid>: Remove, redundant to ctf_typemax.
(ctf_dtdef_t) <dtd_name>: Remove.
<dtd_data>: Note that the ctt_name is now populated.
(ctf_str_atom_t) <csa_offset>: This is now the strtab
offset for internal strings too.
<csa_external_offset>: New, the external strtab offset.
(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
(ctf_name_table): New declaration.
(ctf_lookup_by_rawname): Likewise.
(ctf_lookup_by_rawhash): Likewise.
(ctf_set_ctl_hashes): Likewise.
(ctf_serialize): Likewise.
(ctf_dtd_insert): Adjust.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen_internal): Likewise.
(ctf_list_empty_p): Likewise.
(ctf_str_remove_ref): Likewise.
(ctf_str_add): Returns uint32_t now.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Now returns a boolean (int).
* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
for strings in the appropriate range.
(ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM
when adding the null string to the new strtab.
(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
(ctf_str_add_ref_internal): Add make_provisional argument. If
make_provisional, populate the offset and fill in the
ctf_prov_strtab accordingly.
(ctf_str_add): Return the offset, not the string.
(ctf_str_add_ref): Likewise.
(ctf_str_add_external): Return a success integer.
(ctf_str_remove_ref): New, remove a single ref.
(ctf_str_count_strtab): Do not count the initial null string's
length or the existence or length of any unreferenced internal
atoms.
(ctf_str_populate_sorttab): Skip atoms with no refs.
(ctf_str_write_strtab): Populate the nullstr earlier. Add one
to the cts_len for the null string, since it is no longer done
in ctf_str_count_strtab. Adjust for csa_external_offset rename.
Populate the csa_offset for both internal and external cases.
Flush the ctf_prov_strtab afterwards, and reset the
ctf_str_prov_offset.
* ctf-create.c (ctf_grow_ptrtab): New.
(ctf_create): Call it. Initialize new fields rather than old
ones. Tell ctf_bufopen_internal that this is a writable dictionary.
Set the ctl hashes and data model.
(ctf_update): Rename to...
(ctf_serialize): ... this. Leave a compatibility function behind.
Tell ctf_simple_open_internal that this is a writable dictionary.
Pass the new fields along from the old dictionary. Drop
ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name.
Do not zero out the DTD's ctt_name.
(ctf_prefixed_name): Rename to...
(ctf_name_table): ... this. No longer return a prefixed name: return
the applicable name table instead.
(ctf_dtd_insert): Use it, and use the right name table. Pass in the
kind we're adding. Migrate away from dtd_name.
(ctf_dtd_delete): Adjust similarly. Remove the ref to the
deleted ctt_name.
(ctf_dtd_lookup_type_by_name): Remove.
(ctf_dynamic_type): Always return NULL on read-only dictionaries.
No longer check ctf_dtnextid: check ctf_typemax instead.
(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
(ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use
ctf_name_table and the right name table, and migrate away from
dtd_name as in ctf_dtd_delete.
(ctf_add_generic): Pass in the kind explicitly and pass it to
ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away
from dtd_name to using ctf_str_add_ref to populate the ctt_name.
Grow the ptrtab if needed.
(ctf_add_encoded): Pass in the kind.
(ctf_add_slice): Likewise.
(ctf_add_array): Likewise.
(ctf_add_function): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
ctt_name rather than dtd_name.
(ctf_add_struct_sized): Pass in the kind. Use
ctf_lookup_by_rawname, not ctf_hash_lookup_type /
ctf_dtd_lookup_type_by_name.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_type): Likewise.
(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
being initialized until after the call.
(ctf_write_mem): Likewise.
(ctf_write): Likewise.
* ctf-archive.c (arc_write_one_ctf): Likewise.
* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookup_by_rawhash, not
ctf_hash_lookup_type.
(ctf_lookup_by_id): No longer check the readonly types if the
dictionary is writable.
* ctf-open.c (init_types): Assert that this dictionary is not
writable. Adjust to use the new name hashes, ctf_name_table,
and ctf_ptrtab_len. GNU style fix for the final ptrtab scan.
(ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR
if set. Drop out early when dictionary is writable. Split the
ctf_lookups initialization into...
(ctf_set_ctl_hashes): ... this new function.
(ctf_simple_open_internal): Adjust. New 'writable' parameter.
(ctf_simple_open): Adjust accordingly.
(ctf_bufopen): Likewise.
(ctf_file_close): Destroy the appropriate name hashes. No longer
destroy ctf_dtbyname, which is gone.
(ctf_getdatasect): Remove spurious "extern".
* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
specified name table, given a kind.
(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
(ctf_member_iter): Add support for iterating over the
dynamic type list.
(ctf_enum_iter): Likewise.
(ctf_variable_iter): Likewise.
(ctf_type_rvisit): Likewise.
(ctf_member_info): Add support for types in the dynamic type list.
(ctf_enum_name): Likewise.
(ctf_enum_value): Likewise.
(ctf_func_type_info): Likewise.
(ctf_func_type_args): Likewise.
* ctf-link.c (ctf_accumulate_archive_names): No longer call
ctf_update.
(ctf_link_write): Likewise.
(ctf_link_intern_extern_string): Adjust for new
ctf_str_add_external return value.
(ctf_link_add_strtab): Likewise.
* ctf-util.c (ctf_list_empty_p): New.
2019-08-08 00:55:09 +08:00
{
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
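Here is a minimal reader for the new layout. It assumes an unindexed section where entry *i* belongs to symbol *i* and pads are stored as type ID 0. It ignores the translation libctf performs to skip symbols that can never have types. `ex_symtypetab_lookup` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Unindexed symtypetab (function or data-object section, which now share
   a layout): entry SYMIDX is that symbol's type ID, or 0 for a pad.
   Several symbols may carry the same function type ID, which is what
   lets function types be deduplicated.  */
static uint32_t
ex_symtypetab_lookup (const uint32_t *sect, size_t nentries, size_t symidx)
{
  if (symidx >= nentries)   /* assumption: symbols past the end are untyped */
    return 0;
  return sect[symidx];
}
```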
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
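The compatibility rules for `CTF_F_NEWFUNCINFO` can be sketched as a flag check. The numeric flag values below are assumptions for illustration; check `ctf.h` for the real ones. `ex_check_flags` is hypothetical. An old reader is modelled by a `known_flags` mask that lacks the new bit, so it rejects the dict. A new reader ignores the function info section when the bit is clear.

```c
#include <stdint.h>

/* Assumed flag values, for illustration only.  */
#define EX_F_COMPRESS    0x1u
#define EX_F_NEWFUNCINFO 0x2u

enum ex_funcinfo_action
{
  EX_REJECT,            /* unknown flag set: refuse to open the dict */
  EX_IGNORE_FUNCINFO,   /* old-format section: disregard it */
  EX_USE_FUNCINFO       /* new-format section: read it */
};

/* KNOWN_FLAGS is every flag this reader understands.  */
static enum ex_funcinfo_action
ex_check_flags (uint8_t flags, uint8_t known_flags)
{
  if (flags & ~known_flags)
    return EX_REJECT;
  if (flags & EX_F_NEWFUNCINFO)
    return EX_USE_FUNCINFO;
  return EX_IGNORE_FUNCINFO;
}
```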
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
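This sketch shows the name-first design. Symbols are recorded as name -> type pairs, and only later, once the linker knows the symbol table, are they laid out by symbol index. The `ex_` helpers are hypothetical and simplify the real process:

- they collapse `ctf_link_shuffle_syms` and serialization into one step;
- they use fixed-size arrays instead of hashes;
- they omit the `CTF_K_FUNCTION` kind check the text describes for function symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MAX_SYMS 16

/* Hypothetical name -> type table, standing in for ctf_funchash or
   ctf_objthash.  */
typedef struct
{
  const char *names[EX_MAX_SYMS];
  uint32_t types[EX_MAX_SYMS];
  size_t n;
} ex_symhash;

/* Record NAME -> TYPE, in the spirit of ctf_add_func_sym or
   ctf_add_objt_sym.  Returns -1 if the table is full.  */
static int
ex_add_sym (ex_symhash *h, const char *name, uint32_t type)
{
  if (h->n == EX_MAX_SYMS)
    return -1;
  h->names[h->n] = name;
  h->types[h->n] = type;
  h->n++;
  return 0;
}

/* Given the linker's symbol table (SYMTAB[i] names symbol i), fill OUT
   with one type per symbol, using 0 as a pad where no type was
   recorded.  */
static void
ex_shuffle (const ex_symhash *h, const char *const *symtab, size_t nsyms,
            uint32_t *out)
{
  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = 0;
      for (size_t j = 0; j < h->n; j++)
        if (strcmp (symtab[i], h->names[j]) == 0)
          out[i] = h->types[j];
    }
}
```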
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
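Building an indexed section can be sketched as a density test followed by a sort by name. The ratio rule in `ex_want_index` is a hypothetical stand-in for the `CTF_INDEX_PAD_THRESHOLD` test, whose exact form is not given here. All `ex_` names are assumptions. Sorting with `qsort` keeps names and types aligned entry for entry, which lets a later binary search over names find the matching type. The reader-side sort of an unsorted index, described above, is analogous.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One (name, type) pair destined for an indexed symtypetab.  */
typedef struct
{
  const char *name;
  uint32_t type;
} ex_idx_entry;

/* Hypothetical density rule: index the section when pads outnumber
   typed entries by more than PAD_RATIO to one.  */
static int
ex_want_index (size_t nsyms, size_t ntyped, size_t pad_ratio)
{
  size_t pads = nsyms - ntyped;
  return pads > ntyped * pad_ratio;
}

static int
ex_entry_cmp (const void *a_, const void *b_)
{
  const ex_idx_entry *a = a_;
  const ex_idx_entry *b = b_;
  return strcmp (a->name, b->name);
}

/* Sort entries by name, then split them into the index (NAMES) and the
   section proper (TYPES), which stay aligned.  */
static void
ex_build_index (ex_idx_entry *e, size_t n, const char **names,
                uint32_t *types)
{
  qsort (e, n, sizeof (*e), ex_entry_cmp);
  for (size_t i = 0; i < n; i++)
    {
      names[i] = e[i].name;
      types[i] = e[i].type;
    }
}
```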
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
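An iterator in the spirit of `ctf_symbol_next` needs no sorting: it walks the section in order and skips pads. This sketch differs from the real function in several ways, all of them assumptions:

- it uses a hypothetical `ex_next` cursor instead of `ctf_next_t`;
- it returns 0 at the end instead of raising an end-of-iteration error;
- it assumes an unindexed section with a parallel array of symbol names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor; zero-initialize before the first call.  */
typedef struct
{
  size_t pos;
} ex_next;

/* Return the next typed symbol's type ID and set *NAME to its name, or
   return 0 once the section is exhausted.  Pads (type 0) are skipped.  */
static uint32_t
ex_symbol_next (const uint32_t *sect, const char *const *symnames,
                size_t n, ex_next *it, const char **name)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (sect[i] != 0)
        {
          *name = symnames[i];
          return sect[i];
        }
    }
  return 0;
}
```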
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
ctf_dprintf ("%s not found in idx\n", symname);
return 0;
libctf: avoid the need to ever use ctf_update
2019-08-08 00:55:09 +08:00
}
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
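To make the new layout concrete, here is a minimal sketch of a lookup in an unindexed symtypetab section, assuming a simplified symbol table: every non-skippable symbol, in symbol-table order, owns one uint32_t slot holding a type ID, with 0 as a pad for "no type known". The names sketch_sym, sketch_build_sxlate and sketch_lookup_unindexed are hypothetical and are not libctf API; the real reader builds its translation tables from ELF symbols in init_symtab and keeps separate sections for functions and data objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record (not libctf's ctf_link_sym_t).  */
typedef struct sketch_sym
{
  const char *name;
  int skippable;                /* e.g. undefined or unnamed symbols */
} sketch_sym;

#define SKETCH_NO_SLOT UINT32_MAX

/* Give each non-skippable symbol, in symbol-table order, the next slot in
   the section.  Returns the number of slots the section must hold.  */
static size_t
sketch_build_sxlate (const sketch_sym *syms, size_t nsyms, uint32_t *sxlate)
{
  size_t slot = 0;

  for (size_t i = 0; i < nsyms; i++)
    sxlate[i] = syms[i].skippable ? SKETCH_NO_SLOT : (uint32_t) slot++;
  return slot;
}

/* Return the type ID of symbol SYMIDX from an unindexed section of
   NSLOTS uint32_t type IDs, or 0 if it has no slot or its slot is a pad.  */
static uint32_t
sketch_lookup_unindexed (const uint32_t *symtypetab, size_t nslots,
                         const uint32_t *sxlate, size_t nsyms, size_t symidx)
{
  if (symidx >= nsyms || sxlate[symidx] == SKETCH_NO_SLOT)
    return 0;
  assert (sxlate[symidx] < nslots);
  return symtypetab[sxlate[symidx]];
}
```

Because each slot holds nothing but a type ID, two functions with the same prototype can point at one shared CTF_K_FUNCTION type, which is what makes the deduplication described next possible.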
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
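The compatibility rules above amount to a small decision on the header flags. A minimal sketch, assuming illustrative flag values (the authoritative CTF_F_NEWFUNCINFO, CTF_F_IDXSORTED and CTF_F_MAX definitions are in include/ctf.h); sketch_check_flags and the SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values for this sketch only; the authoritative CTF_F_*
   definitions live in include/ctf.h.  */
#define SKETCH_F_COMPRESS    0x1
#define SKETCH_F_NEWFUNCINFO 0x2
#define SKETCH_F_IDXSORTED   0x4

/* Flags understood by an old reader and by a new one.  */
#define SKETCH_OLD_F_MAX SKETCH_F_COMPRESS
#define SKETCH_NEW_F_MAX \
  (SKETCH_F_COMPRESS | SKETCH_F_NEWFUNCINFO | SKETCH_F_IDXSORTED)

enum sketch_open_result
{
  SKETCH_REJECT,                /* unknown flag set: refuse to open */
  SKETCH_IGNORE_FUNCINFO,       /* open, but disregard the func info section */
  SKETCH_USE_FUNCINFO           /* open and read the new func info format */
};

/* Decide how a reader that understands the flags in KNOWN treats a dict
   whose header carries FLAGS.  The rejection rule applies to any reader;
   the func info decision is the new reader's policy.  */
static enum sketch_open_result
sketch_check_flags (uint32_t flags, uint32_t known)
{
  if (flags & ~known)
    return SKETCH_REJECT;
  return (flags & SKETCH_F_NEWFUNCINFO) ? SKETCH_USE_FUNCINFO
                                        : SKETCH_IGNORE_FUNCINFO;
}
```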
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
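A minimal sketch of a lookup in such a name-sorted index, assuming the index has already been loaded as an array of C strings sorted with strcmp, parallel to the array of type IDs (a real index stores string-table offsets). sketch_indexed_tab and sketch_lookup_indexed are hypothetical names; in libctf this work is done by ctf_try_lookup_indexed and ctf_lookup_idx_name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of an indexed symtypetab: NAMES[i] names
   the symbol whose type ID is TYPES[i]; NAMES is sorted with strcmp.  */
typedef struct sketch_indexed_tab
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} sketch_indexed_tab;

/* bsearch comparator: KEY is the name itself, ELT points at an entry.  */
static int
sketch_name_cmp (const void *key, const void *elt)
{
  return strcmp ((const char *) key, *(const char *const *) elt);
}

/* Return the type ID recorded for SYMNAME, or 0 if it is not indexed.  */
static uint32_t
sketch_lookup_indexed (const sketch_indexed_tab *tab, const char *symname)
{
  const char *const *found = bsearch (symname, tab->names, tab->n,
                                      sizeof (tab->names[0]),
                                      sketch_name_cmp);
  if (found == NULL)
    return 0;

  size_t i = (size_t) (found - tab->names);
  assert (i < tab->n);          /* Paranoia only: bsearch returns an element.  */
  return tab->types[i];
}
```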
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
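When CTF_F_IDXSORTED is absent, the names and their type entries have to be sorted together before any binary search can work. A minimal sketch, assuming a parallel-array representation (name and type ID at the same position) and a temporary pair array; sketch_sort_index is hypothetical, and libctf's own sort_symidx_by_name works on string-table offsets and may be organised differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketch_entry
{
  const char *name;
  uint32_t type;
} sketch_entry;

static int
sketch_entry_cmp (const void *a, const void *b)
{
  const sketch_entry *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* Sort NAMES and TYPES (parallel arrays of length N) together by name,
   unless the header says the index is already sorted.  Returns 0 on
   success, -1 on allocation failure.  */
static int
sketch_sort_index (const char **names, uint32_t *types, size_t n,
                   int already_sorted)
{
  if (already_sorted || n < 2)
    return 0;

  sketch_entry *tmp = malloc (n * sizeof (*tmp));
  if (tmp == NULL)
    return -1;

  /* Pair each name with its type so qsort moves them together.  */
  for (size_t i = 0; i < n; i++)
    tmp[i] = (sketch_entry) { names[i], types[i] };
  qsort (tmp, n, sizeof (*tmp), sketch_entry_cmp);

  for (size_t i = 0; i < n; i++)
    {
      names[i] = tmp[i].name;
      types[i] = tmp[i].type;
    }
  free (tmp);
  return 0;
}
```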
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
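ctf_symbol_next itself keeps its state in libctf's opaque ctf_next_t. The sketch below only illustrates why such an iteration needs no sorting: it walks a section in storage order and skips pads. The iterator struct and function are hypothetical, and a type ID of 0 is assumed to mean "no type".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state; zero-initialize before the first call.  */
typedef struct sketch_sym_iter
{
  size_t pos;
} sketch_sym_iter;

/* Yield the next (name, type) pair with a nonzero type from the parallel
   arrays NAMES/TYPES, skipping pads.  Returns 1 while entries remain and
   0 once the section is exhausted.  */
static int
sketch_symbol_next (sketch_sym_iter *it, const char *const *names,
                    const uint32_t *types, size_t n,
                    const char **name, uint32_t *type)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;

      if (types[i] != 0)
        {
          *name = names[i];
          *type = types[i];
          return 1;
        }
    }
  return 0;
}
```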
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
/* Should be impossible, but be paranoid. */
if ((idx - sxlate) > (ptrdiff_t) nidx)
2023-09-13 17:02:36 +08:00
return (ctf_set_typed_errno (fp, ECTF_CORRUPT));
2020-11-20 21:34:04 +08:00
ctf_dprintf ("Symbol %lx (%s) is of type %x\n", symidx, symname,
symtypetab[*idx]);
return symtypetab[*idx];
libctf: creation functions
The CTF creation process looks roughly like (error handling elided):
int err;
ctf_file_t *foo = ctf_create (&err);
ctf_id_t type = ctf_add_THING (foo, ...);
ctf_update (foo);
ctf_*write (...);
Some ctf_add_THING functions accept other type IDs as arguments,
depending on the type: cv-quals, pointers, and structure and union
members all take other types as arguments. So do 'slices', which
let you take an existing integral type and recast it as a type
with a different bitness or offset within a byte, for bitfields.
One class of THING is not a type: "variables", which are mappings
of names (in the internal string table) to types. These are mostly
useful when encoding variables that do not appear in a symbol table
but which some external user has some other way to figure out the
address of at runtime (dynamic symbol lookup or querying a VM
interpreter or something).
You can snapshot the creation process at any point: rolling back to a
snapshot deletes all types and variables added since that point.
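If one assumes that types and variables are only ever appended during creation, a snapshot can be as small as a pair of counters. A minimal sketch under that assumption; the real API is ctf_snapshot() and ctf_rollback() with a ctf_snapshot_id_t, and every sketch_* name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TYPES 64

/* Hypothetical dict under construction: since types and variables are
   only appended, their counts fully describe a point in time.  */
typedef struct sketch_dict
{
  int kinds[SKETCH_MAX_TYPES];  /* kinds[id - 1] is the kind of type ID */
  size_t ntypes;
  size_t nvars;
} sketch_dict;

typedef struct sketch_snapshot
{
  size_t ntypes;
  size_t nvars;
} sketch_snapshot;

/* Append a type; returns its ID (IDs start at 1) or 0 if full.  */
static size_t
sketch_add_type (sketch_dict *d, int kind)
{
  if (d->ntypes == SKETCH_MAX_TYPES)
    return 0;
  d->kinds[d->ntypes] = kind;
  return ++d->ntypes;
}

static void
sketch_add_var (sketch_dict *d)
{
  d->nvars++;
}

static sketch_snapshot
sketch_take_snapshot (const sketch_dict *d)
{
  sketch_snapshot s = { d->ntypes, d->nvars };
  return s;
}

/* Forget every type and variable added after snapshot S.  A snapshot
   newer than the current state leaves the dict unchanged.  */
static void
sketch_rollback (sketch_dict *d, sketch_snapshot s)
{
  if (s.ntypes < d->ntypes)
    d->ntypes = s.ntypes;
  if (s.nvars < d->nvars)
    d->nvars = s.nvars;
}
```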
You can make arbitrary type queries on the CTF container during the
creation process, but you must call ctf_update() first, which
translates the growing dynamic container into a static one (this uses
the CTF opening machinery, added in a later commit), which is quite
expensive. This function must also be called after adding types
and before writing the container out.
Because addition of types involves looking up existing types, we add a
little of the type lookup machinery here, as well: only enough to
look up types in dynamic containers under construction.
libctf/
* ctf-create.c: New file.
* ctf-lookup.c: New file.
include/
* ctf-api.h (zlib.h): New include.
(ctf_sect_t): New.
(ctf_sect_names_t): Likewise.
(ctf_encoding_t): Likewise.
(ctf_membinfo_t): Likewise.
(ctf_arinfo_t): Likewise.
(ctf_funcinfo_t): Likewise.
(ctf_lblinfo_t): Likewise.
(ctf_snapshot_id_t): Likewise.
(CTF_FUNC_VARARG): Likewise.
(ctf_simple_open): Likewise.
(ctf_bufopen): Likewise.
(ctf_create): Likewise.
(ctf_add_array): Likewise.
(ctf_add_const): Likewise.
(ctf_add_enum_encoded): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_float): Likewise.
(ctf_add_forward): Likewise.
(ctf_add_function): Likewise.
(ctf_add_integer): Likewise.
(ctf_add_slice): Likewise.
(ctf_add_pointer): Likewise.
(ctf_add_type): Likewise.
(ctf_add_typedef): Likewise.
(ctf_add_restrict): Likewise.
(ctf_add_struct): Likewise.
(ctf_add_union): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_volatile): Likewise.
(ctf_add_enumerator): Likewise.
(ctf_add_member): Likewise.
(ctf_add_member_offset): Likewise.
(ctf_add_member_encoded): Likewise.
(ctf_add_variable): Likewise.
(ctf_set_array): Likewise.
(ctf_update): Likewise.
(ctf_snapshot): Likewise.
(ctf_rollback): Likewise.
(ctf_discard): Likewise.
(ctf_write): Likewise.
(ctf_gzwrite): Likewise.
(ctf_compress_write): Likewise.
2019-04-24 05:45:46 +08:00
}
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We expect
this lookup to be used many orders of magnitude less heavily than ld.so
symbol lookup, so reading the ELF symbol hashes would probably cost more
time than using them saves, and would add a lot of complexity. Instead,
do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
kind done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
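A minimal sketch of the linear-search-plus-cache strategy, assuming symbol names are already available as C strings in symbol-table order. The sketch_* names, the fixed-size open-addressing table and the FNV-1a hash are choices made for this illustration only; the ChangeLog below records that libctf keeps this cache in a ctf_symhash, alongside a ctf_symhash_latest field that presumably tracks how far the scan has got.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CACHE_SIZE 256   /* power of two, larger than the symtab */

/* Hypothetical name -> symbol index cache plus a resumable scan position.  */
typedef struct sketch_symcache
{
  const char *names[SKETCH_CACHE_SIZE];     /* NULL marks an empty slot */
  size_t idx[SKETCH_CACHE_SIZE];
  size_t latest;                /* next symtab entry not yet scanned */
} sketch_symcache;

static uint32_t
sketch_hash (const char *s)
{
  uint32_t h = 2166136261u;     /* FNV-1a */
  for (; *s; s++)
    h = (h ^ (unsigned char) *s) * 16777619u;
  return h;
}

static void
sketch_cache_put (sketch_symcache *c, const char *name, size_t idx)
{
  size_t slot = sketch_hash (name) & (SKETCH_CACHE_SIZE - 1);

  while (c->names[slot] != NULL)
    {
      if (strcmp (c->names[slot], name) == 0)
        return;                 /* keep the first index seen */
      slot = (slot + 1) & (SKETCH_CACHE_SIZE - 1);
    }
  c->names[slot] = name;
  c->idx[slot] = idx;
}

static int
sketch_cache_get (const sketch_symcache *c, const char *name, size_t *idx)
{
  size_t slot = sketch_hash (name) & (SKETCH_CACHE_SIZE - 1);

  while (c->names[slot] != NULL)
    {
      if (strcmp (c->names[slot], name) == 0)
        {
          *idx = c->idx[slot];
          return 1;
        }
      slot = (slot + 1) & (SKETCH_CACHE_SIZE - 1);
    }
  return 0;
}

/* Find SYMNAME in SYMTAB.  Try the cache first; otherwise resume the
   linear scan where the last one stopped, caching every name passed over,
   so the symtab is scanned at most once in total.  Returns 0 and sets
   *IDX on success, -1 if the name is absent.  */
static int
sketch_lookup_symbol_idx (sketch_symcache *c, const char *const *symtab,
                          size_t nsyms, const char *symname, size_t *idx)
{
  /* At least one slot must stay empty, or probing would never stop.  */
  assert (nsyms < SKETCH_CACHE_SIZE);
  if (sketch_cache_get (c, symname, idx))
    return 0;

  while (c->latest < nsyms)
    {
      size_t i = c->latest++;

      sketch_cache_put (c, symtab[i], i);
      if (strcmp (symtab[i], symname) == 0)
        {
          *idx = i;
          return 0;
        }
    }
  return -1;
}
```

After the first miss on an absent name, every symbol has been cached and later lookups never rescan the symtab.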
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
/* Given a symbol name or (if NULL) symbol index, return the type of the
   function or data object described by the corresponding entry in the symbol
   table.  We can only return symbols in read-only dicts and in dicts for which
   ctf_link_shuffle_syms has been called to assign symbol indexes to symbol
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
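Under the new rules, mutability belongs to a type ID rather than to the whole dict. IDs in the range present at open time refuse modification, and newly added IDs follow them and stay writable. The sketch below assumes type IDs are numbered contiguously from 1, with the static types first. The real libctf tracks the static range with ctf_stypes, and its ID numbering must also account for parent and child dicts, which this model ignores. All names and error codes here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not libctf code: type IDs run from 1 to NTYPES, and
   the first STYPES of them came from the serialized dict.  */
enum { SKETCH_OK = 0, SKETCH_ERR_RDONLY = 1, SKETCH_ERR_BADID = 2 };

typedef struct
{
  unsigned long stypes;		/* Types present when the dict was opened.  */
  unsigned long ntypes;		/* Static types plus newly added ones.  */
} sketch_dict_t;

static bool
sketch_is_static_type (const sketch_dict_t *fp, unsigned long id)
{
  return id >= 1 && id <= fp->stypes;
}

/* Adding a type is always allowed: new IDs land after the static range.  */
unsigned long
sketch_add_type (sketch_dict_t *fp)
{
  assert (fp->ntypes >= fp->stypes);
  return ++fp->ntypes;
}

/* Modifying a type (adding members, changing array bounds and so on) is
   refused for types in the static range.  */
int
sketch_modify_type (const sketch_dict_t *fp, unsigned long id)
{
  if (id == 0 || id > fp->ntypes)
    return SKETCH_ERR_BADID;
  if (sketch_is_static_type (fp, id))
    return SKETCH_ERR_RDONLY;
  return SKETCH_OK;
}
```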
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
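The function/object confusion in the last item above goes away once each symbol kind has its own cache, which is what splitting ctf_symhash into ctf_symhash_func and ctf_symhash_objt does. The sketch below is a hypothetical model with tiny fixed-size caches. It assumes an is_function encoding of 1 for functions, 0 for data objects and -1 for "either". That encoding is an illustration, not a documented guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not libctf code: separate name -> symbol-index caches
   for functions and data objects, in the spirit of ctf_symhash_func and
   ctf_symhash_objt.  */
#define SKETCH_NCACHE 8

typedef struct
{
  const char *names[SKETCH_NCACHE];
  long idx[SKETCH_NCACHE];
  size_t n;
} sketch_cache_t;

typedef struct
{
  sketch_cache_t func;
  sketch_cache_t objt;
} sketch_symcaches_t;

static long
sketch_cache_get (const sketch_cache_t *c, const char *name)
{
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return c->idx[i];
  return -1;
}

/* IS_FUNCTION: 1 = function, 0 = data object (assumed encoding).  */
void
sketch_remember (sketch_symcaches_t *s, int is_function, const char *name,
		 long idx)
{
  assert (is_function == 0 || is_function == 1);
  sketch_cache_t *c = is_function ? &s->func : &s->objt;
  if (c->n < SKETCH_NCACHE)
    {
      c->names[c->n] = name;	/* NAME must outlive the cache.  */
      c->idx[c->n++] = idx;
    }
}

/* IS_FUNCTION: 1 = functions only, 0 = data objects only, -1 = either.
   Returns the cached symbol index, or -1 if not cached.  */
long
sketch_recall (const sketch_symcaches_t *s, int is_function, const char *name)
{
  long idx = -1;
  if (is_function != 0)
    idx = sketch_cache_get (&s->func, name);
  if (idx < 0 && is_function != 1)
    idx = sketch_cache_get (&s->objt, name);
  return idx;
}
```

A data object remembered under one name no longer turns up when functions are requested, but it is still found when either kind is acceptable.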
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
   names.
2019-04-24 18:15:33 +08:00
2023-12-20 00:58:19 +08:00

   If try_parent is false, do not check the parent dict too.

   If is_function is > -1, only look for data objects or functions in
   particular.  */

ctf_id_t
2021-02-17 23:21:12 +08:00
ctf_lookup_by_sym_or_name (ctf_dict_t *fp, unsigned long symidx,
2023-12-20 00:58:19 +08:00
                           const char *symname, int try_parent,
                           int is_function)
2019-04-24 18:15:33 +08:00
{
2024-01-05 20:17:27 +08:00
  const ctf_sect_t *sp = &fp->ctf_ext_symtab;
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
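With this layout, an unindexed function info section is simply a table indexed by symbol number, and several symbols can share one deduplicated prototype. The hypothetical sketch below looks up an entry in such a table. It assumes type ID 0 marks a symbol with no recorded type, and it uses an invented kind enumeration in place of the real CTF_K_* values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds for this sketch; the real values are CTF_K_*.  */
enum sketch_kind { SKETCH_K_UNKNOWN = 0, SKETCH_K_INTEGER, SKETCH_K_FUNCTION };

/* FUNCINFO[i] is the type ID of function symbol i's prototype, or 0 (assumed
   here to mean "no type recorded").  KIND_OF[t] is the kind of type ID t.
   Returns the prototype's type ID, or 0 if there is none.  */
uint32_t
sketch_func_type (const uint32_t *funcinfo, size_t nsyms,
		  const enum sketch_kind *kind_of, size_t ntypes,
		  size_t symidx)
{
  assert (funcinfo != NULL && kind_of != NULL);
  if (symidx >= nsyms)
    return 0;

  uint32_t type = funcinfo[symidx];
  if (type == 0 || type >= ntypes)
    return 0;

  /* Every non-pad entry must refer to a function type.  */
  if (kind_of[type] != SKETCH_K_FUNCTION)
    return 0;
  return type;
}
```

In a table such as { 2, 0, 2, 1 }, symbols 0 and 2 share the single prototype with ID 2, and symbol 1 has no recorded type.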
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
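The flag rules above can be modelled as a small decision function. This is a sketch only: the flag values are hypothetical placeholders rather than the real ctf.h constants, and it assumes that "old libctf will refuse" means an old reader rejects any header flag it does not recognise.

```c
#include <stdint.h>

/* Hypothetical flag values for this sketch; not the real ctf.h values.  */
#define SKETCH_F_COMPRESS    0x1
#define SKETCH_F_NEWFUNCINFO 0x2

enum sketch_open_result
{
  SKETCH_REFUSE,            /* Header carries a flag this reader lacks.  */
  SKETCH_USE_FUNCINFO,      /* New-format function info: read it.  */
  SKETCH_IGNORE_FUNCINFO    /* Flag unset: disregard the section.  */
};

/* Decide how a reader that understands KNOWN_FLAGS treats a header
   carrying FLAGS.  */
static enum sketch_open_result
sketch_check_flags (uint32_t flags, uint32_t known_flags)
{
  if (flags & ~known_flags)
    return SKETCH_REFUSE;
  if (flags & SKETCH_F_NEWFUNCINFO)
    return SKETCH_USE_FUNCINFO;
  return SKETCH_IGNORE_FUNCINFO;
}
```

An "old" reader in this model is simply one whose KNOWN_FLAGS lacks SKETCH_F_NEWFUNCINFO.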
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
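To illustrate the name -> type model, here is a sketch of turning a name-keyed table (standing in for ctf_funchash or ctf_objthash) into a per-symbol array once the linker has reported symbol names in symbol-table order. It assumes a linear-scan table and writes 0 pads for untyped symbols; the names are hypothetical, and the real ctf_link_shuffle_syms machinery is considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ctf_funchash/ctf_objthash: name -> type ID.  */
typedef struct
{
  const char *name;
  uint32_t type;
} sketch_symtype;

/* Look NAME up in the unsorted name -> type table; 0 if absent.  */
static uint32_t
sketch_type_of (const sketch_symtype *tab, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return tab[i].type;
  return 0;
}

/* Given the linker's symbol names in symbol-table order, fill OUT[i] with
   the type of symbol i, or 0 (a pad) if this dict has no type for it.  */
static void
sketch_shuffle (const sketch_symtype *tab, size_t n,
                const char *const *symnames, size_t nsyms, uint32_t *out)
{
  for (size_t i = 0; i < nsyms; i++)
    out[i] = sketch_type_of (tab, n, symnames[i]);
}
```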
CTF symbol type sections which have enough pads, as defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict), are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
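An indexed section can be thought of as two parallel arrays, one of names sorted by strcmp and one of type IDs, searched with bsearch. The sketch below assumes exactly that layout; the real index stores string offsets rather than pointers, and the names here are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical indexed symtypetab: NAMES is sorted by strcmp, and
   TYPES[i] is the type of the symbol named NAMES[i].  */
typedef struct
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} sketch_indexed;

/* bsearch comparator: KEY is a name, ELEM points at a name pointer.  */
static int
sketch_name_cmp (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Binary-search for NAME; return its type, or 0 if it is not present.  */
static uint32_t
sketch_indexed_lookup (const sketch_indexed *idx, const char *name)
{
  const char *const *hit = bsearch (name, idx->names, idx->n,
                                    sizeof (idx->names[0]), sketch_name_cmp);
  return hit ? idx->types[hit - idx->names] : 0;
}
```

A symbol-index lookup would first map the index to a name via the symbol table, then call the function above.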
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
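A sketch of the sort-if-needed step, assuming a hypothetical flag value and an index represented as mutable name/type pairs. Real libctf index sections live in a read-only buffer, so the real code may well sort a separate translation table rather than the entries themselves.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_F_IDXSORTED 0x4   /* Hypothetical value for this sketch.  */

typedef struct
{
  const char *name;
  uint32_t type;
} sketch_idx_entry;

static int
sketch_entry_cmp (const void *a, const void *b)
{
  const sketch_idx_entry *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* Sort the index by name unless the header says it is already sorted
   (as a compiler-generated index may not be).  Returns 1 if a sort was
   performed, 0 otherwise.  */
static int
sketch_ensure_sorted (sketch_idx_entry *entries, size_t n, uint32_t hdr_flags)
{
  if (hdr_flags & SKETCH_F_IDXSORTED)
    return 0;
  qsort (entries, n, sizeof (entries[0]), sketch_entry_cmp);
  return 1;
}
```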
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
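The iteration described above amounts to walking the section and skipping pads, with no sorting required. Here is a sketch over an unindexed array with a hypothetical iterator type; the real ctf_symbol_next also returns symbol names and keeps its state in a ctf_next_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: the next symbol index to examine.  */
typedef struct
{
  size_t pos;
} sketch_sym_iter;

/* Advance IT over TYPES (one entry per symbol, 0 = pad).  On success,
   store the symbol index and type and return 1; return 0 when the
   section is exhausted.  */
static int
sketch_symbol_next (sketch_sym_iter *it, const uint32_t *types, size_t n,
                    size_t *symidx, uint32_t *type)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *symidx = i;
          *type = types[i];
          return 1;
        }
    }
  return 0;
}
```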
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
ctf_id_t type = 0;
int err = 0;
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
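A sketch of the "read-only range" rule, assuming a dict that is not a child (so type IDs run from 1) and a simple count of static types analogous to ctf_stypes. The error value and all names are hypothetical stand-ins, not libctf's.

```c
#include <stdint.h>

#define SKETCH_ERR_RDONLY (-1)   /* Hypothetical stand-in for ECTF_RDONLY.  */

typedef struct
{
  uint32_t nstatic;   /* Types 1..nstatic came from ctf_open(): read-only.  */
  uint32_t ntypes;    /* Total types, static plus newly added.  */
} sketch_dict;

/* Return 0 if TYPE may be modified (say, given new struct members), or
   SKETCH_ERR_RDONLY if it lies in the range read in from the file.
   IDs outside both ranges are not validated in this sketch.  */
static int
sketch_check_modifiable (const sketch_dict *d, uint32_t type)
{
  return (type >= 1 && type <= d->nstatic) ? SKETCH_ERR_RDONLY : 0;
}

/* Adding a type is always allowed: it just extends the dynamic range.  */
static uint32_t
sketch_add_type (sketch_dict *d)
{
  return ++d->ntypes;
}
```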
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
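The type-name collision rule in the list above can be sketched as follows. It assumes a flat list of static type names and ignores C's separate struct, union and enum namespaces, which a real implementation would have to respect; the error code and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ERR_DUPLICATE (-2)   /* Hypothetical error code.  */

/* Return nonzero if NAME is among the names of the static types.  */
static int
sketch_name_is_static (const char *const *static_names, size_t n,
                       const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (static_names[i], name) == 0)
      return 1;
  return 0;
}

/* Return 0 if a new type called NAME may be added, or
   SKETCH_ERR_DUPLICATE if it would shadow a type read in via ctf_open().
   Names of types added since opening are deliberately not checked, so
   repeated additions of the same new name remain allowed.  */
static int
sketch_may_add_named_type (const char *const *static_names, size_t n,
                           const char *name)
{
  if (name == NULL || name[0] == '\0')
    return 0;                   /* Anonymous types never collide.  */
  return sketch_name_is_static (static_names, n, name)
    ? SKETCH_ERR_DUPLICATE : 0;
}
```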
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
/* Shuffled dynsymidx present? Use that. For now, the dynsymidx and
   shuffled-symbol lookup only support dynamically-added symbols, because
   this interface is meant for use by linkers, and linkers are only going
   to report symbols against newly-created, freshly-ctf_link'ed dicts: so
   there will be no static component in any case. */
if (fp->ctf_dynsymidx)
  {
    const ctf_link_sym_t *sym;
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
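The wrapper arrangement can be sketched as a single worker that accepts either a symbol index or a name, converting between them as the section layout requires. Everything here is hypothetical and simplified: linear searches stand in for the bsearch used on indexed sections, and parents, archives and caching are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
  const char *const *symnames;  /* Symbol-table names, by symbol index.  */
  size_t nsyms;
  int indexed;                  /* Nonzero: IDX_NAMES/TYPES form an index.  */
  const char *const *idx_names; /* Indexed layout: names of typed symbols.  */
  const uint32_t *types;        /* Unindexed: by symidx; else parallel.  */
  size_t ntypes;
} sketch_symdict;

/* Worker: look up by NAME if non-NULL, otherwise by SYMIDX.  Returns the
   type ID, or 0 if the symbol has no type.  */
static uint32_t
sketch_lookup_by_sym_or_name (const sketch_symdict *d, size_t symidx,
                              const char *name)
{
  if (d->indexed)
    {
      if (name == NULL)
        {
          if (symidx >= d->nsyms)
            return 0;
          name = d->symnames[symidx];          /* Index -> name.  */
        }
      for (size_t i = 0; i < d->ntypes; i++)
        if (strcmp (d->idx_names[i], name) == 0)
          return d->types[i];
      return 0;
    }

  if (name != NULL)
    {
      size_t i;
      for (i = 0; i < d->nsyms; i++)           /* Name -> index.  */
        if (strcmp (d->symnames[i], name) == 0)
          break;
      if (i == d->nsyms)
        return 0;
      symidx = i;
    }
  return symidx < d->ntypes ? d->types[symidx] : 0;
}

/* Thin public-style wrappers.  */
static uint32_t
sketch_lookup_by_symbol (const sketch_symdict *d, size_t symidx)
{
  return sketch_lookup_by_sym_or_name (d, symidx, NULL);
}

static uint32_t
sketch_lookup_by_symbol_name (const sketch_symdict *d, const char *name)
{
  return sketch_lookup_by_sym_or_name (d, 0, name);
}
```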
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
anticipate this lookup being used many orders of magnitude less heavily
than ld.so symbol lookup, so reading the ELF symbol hashes would probably
take more time than using them saves, and would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
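A sketch of the cached linear search, assuming a fixed-size open-addressing hash table (FNV-1a) in place of the real ctf_symhash dynhash. Each miss resumes the scan where the previous one stopped and enters every name it passes into the cache, so the linear scan examines each symbol at most once. Names and sizes are hypothetical, and the first occurrence of a duplicated name is assumed to win.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_SIZE 64   /* Power of two; must exceed the symbol count.  */

typedef struct
{
  const char *const *symtab;          /* Symbol names, symbol-table order.  */
  size_t nsyms;
  size_t scanned;                     /* Symbols already entered in cache.  */
  const char *keys[SKETCH_HASH_SIZE]; /* Cached names (NULL = empty slot).  */
  size_t vals[SKETCH_HASH_SIZE];      /* Cached symbol indexes.  */
} sketch_symlookup;

static size_t
sketch_hash (const char *s)
{
  uint32_t h = 2166136261u;           /* FNV-1a offset basis.  */
  for (; *s; s++)
    h = (h ^ (unsigned char) *s) * 16777619u;
  return h & (SKETCH_HASH_SIZE - 1);
}

/* The slot holding NAME, or the empty slot where it would go.  */
static size_t
sketch_slot (const sketch_symlookup *l, const char *name)
{
  size_t i = sketch_hash (name);
  while (l->keys[i] != NULL && strcmp (l->keys[i], name) != 0)
    i = (i + 1) & (SKETCH_HASH_SIZE - 1);
  return i;
}

/* Return 1 and set *IDX if NAME is a symbol, 0 otherwise.  */
static int
sketch_lookup_symbol_idx (sketch_symlookup *l, const char *name, size_t *idx)
{
  size_t slot = sketch_slot (l, name);
  if (l->keys[slot] != NULL)          /* Cache hit.  */
    {
      *idx = l->vals[slot];
      return 1;
    }
  while (l->scanned < l->nsyms)       /* Resume the linear scan.  */
    {
      size_t i = l->scanned++;
      const char *sym = l->symtab[i];
      size_t s = sketch_slot (l, sym);
      if (l->keys[s] == NULL)         /* Cache every name we pass.  */
        {
          l->keys[s] = sym;
          l->vals[s] = i;
        }
      if (strcmp (sym, name) == 0)
        {
          *idx = i;
          return 1;
        }
    }
  return 0;
}
```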
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * each
was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
if (symname)
  ctf_dprintf ("Looking up type of object with symname %s in "
               "writable dict symtypetab\n", symname);
else
  ctf_dprintf ("Looking up type of object with symtab idx %lx in "
               "writable dict symtypetab\n", symidx);
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
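As a rough illustration of what the new layout means for a reader, the sketch below models an unindexed function info or data object section as a plain uint32_t array indexed by symbol number. It assumes that type ID 0 marks a pad for a symbol with no recorded type; the names symtypetab_t and symtypetab_lookup are hypothetical and not part of libctf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an unindexed symtypetab section: one uint32_t
   type ID per symbol-table entry.  Type ID 0 is assumed to be a pad.  */
typedef struct
{
  const uint32_t *entries;
  size_t nentries;
} symtypetab_t;

/* Return the type ID recorded for symbol SYMIDX, or 0 if the index is
   out of range or the slot is a pad.  */
static uint32_t
symtypetab_lookup (const symtypetab_t *tab, size_t symidx)
{
  if (symidx >= tab->nentries)
    return 0;
  return tab->entries[symidx];
}
```

Because function and object sections now share this layout, one lookup routine serves both.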
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
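A sketch of the compatibility rule described above, using made-up flag values (SKETCH_F_COMPRESS and SKETCH_F_NEWFUNCINFO are placeholders, not the real ctf.h constants): an old reader refuses any flag it does not know, while a new reader opens the dict but only trusts the function info section when the new-format flag is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for illustration only; see ctf.h for the
   real definitions.  */
#define SKETCH_F_COMPRESS    0x1
#define SKETCH_F_NEWFUNCINFO 0x2

/* An old reader rejects any header flag outside KNOWN_MASK.  */
static bool
old_reader_can_open (uint8_t flags, uint8_t known_mask)
{
  return (flags & ~known_mask) == 0;
}

/* A new reader opens the dict either way, but disregards the function
   info section unless the new-format flag is set.  */
static bool
new_reader_uses_funcinfo (uint8_t flags)
{
  return (flags & SKETCH_F_NEWFUNCINFO) != 0;
}
```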
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
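A minimal sketch of how name -> type mappings might be turned into a symbol-index-ordered section once the linker has reported the symbol order. The name_type_t type and build_symtypetab function are hypothetical; libctf itself keeps these mappings in hash tables (ctf_objthash, ctf_funchash), whereas this sketch uses a linear scan for brevity and again assumes 0 is a pad.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name -> type pair, standing in for a mapping added by a
   call such as ctf_add_objt_sym.  */
typedef struct
{
  const char *name;
  uint32_t type;
} name_type_t;

/* Given symbol names in symbol-table order (as the linker reports them)
   and the added name -> type mappings, set OUT[i] to the type of symbol
   i, or to 0 (a pad) if no type was added under that name.  */
static void
build_symtypetab (const char *const *symnames, size_t nsyms,
                  const name_type_t *added, size_t nadded, uint32_t *out)
{
  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = 0;
      for (size_t j = 0; j < nadded; j++)
        if (strcmp (symnames[i], added[j].name) == 0)
          {
            out[i] = added[j].type;
            break;
          }
    }
}
```

Keying on names rather than indexes is what lets repeated relinks report more symbols without invalidating mappings already added.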
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
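The indexed-versus-unindexed decision can be sketched as counting the pads an unindexed section would need and comparing the count against a threshold. SKETCH_PAD_THRESHOLD and should_emit_indexed are hypothetical stand-ins; the real CTF_INDEX_PAD_THRESHOLD value and the exact comparison libctf makes may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold, chosen small for the example.  */
#define SKETCH_PAD_THRESHOLD 2

/* Count the pads (type ID 0) an unindexed, symidx-ordered section would
   contain and decide whether a name-sorted indexed section is better.  */
static bool
should_emit_indexed (const uint32_t *types_by_symidx, size_t nsyms)
{
  size_t pads = 0;

  for (size_t i = 0; i < nsyms; i++)
    if (types_by_symidx[i] == 0)
      pads++;

  return pads > SKETCH_PAD_THRESHOLD;
}
```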
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
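Sorting an unsorted index and then searching it by name might look like the following. This is a sketch under the assumption that an index is a pair of parallel arrays (names and type IDs); symidx_t, symidx_sort and symidx_lookup are hypothetical names, not libctf internals.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical indexed section: parallel arrays of names and types.  */
typedef struct
{
  const char **names;
  uint32_t *types;
  size_t n;
} symidx_t;

typedef struct
{
  const char *name;
  uint32_t type;
} sym_entry_t;

static int
entry_cmp (const void *a, const void *b)
{
  const sym_entry_t *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort both arrays by name, keeping each name paired with its type.
   Returns 0 on success, -1 on allocation failure.  */
static int
symidx_sort (symidx_t *idx)
{
  sym_entry_t *tmp;

  if (idx->n == 0)
    return 0;
  if ((tmp = malloc (idx->n * sizeof (*tmp))) == NULL)
    return -1;

  for (size_t i = 0; i < idx->n; i++)
    {
      tmp[i].name = idx->names[i];
      tmp[i].type = idx->types[i];
    }
  qsort (tmp, idx->n, sizeof (*tmp), entry_cmp);
  for (size_t i = 0; i < idx->n; i++)
    {
      idx->names[i] = tmp[i].name;
      idx->types[i] = tmp[i].type;
    }
  free (tmp);
  return 0;
}

static int
name_key_cmp (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Binary-search a sorted index for NAME; return its type, or 0.  */
static uint32_t
symidx_lookup (const symidx_t *idx, const char *name)
{
  const char **hit = bsearch (name, idx->names, idx->n,
                              sizeof (idx->names[0]), name_key_cmp);
  return hit ? idx->types[hit - idx->names] : 0;
}
```

Sorting once at open time is what makes the later bsearch-based lookups valid.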
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
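An iterator in the style described here can be sketched with a small caller-owned state struct that remembers its position and skips pads, visiting entries in storage order so that no sorting is needed. sym_iter_t and sym_next are hypothetical; the real ctf_symbol_next has a different signature (see ctf-api.h).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: zero-initialize, then call sym_next
   until it returns false.  */
typedef struct
{
  size_t pos;
} sym_iter_t;

/* Yield the next symbol that has a type, skipping pads (type 0).  */
static bool
sym_next (sym_iter_t *it, const char *const *names, const uint32_t *types,
          size_t n, const char **name, uint32_t *type)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *name = names[i];
          *type = types[i];
          return true;
        }
    }
  return false;
}
```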
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
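The shape of this refactoring can be sketched with a toy dict. One worker accepts either an index or a name and converts a name to an index when it has to. Two thin wrappers then expose the by-index and by-name entry points. All names here (toy_dict_t, lookup_by_sym_or_name and the wrappers) are hypothetical and much simpler than the libctf functions they echo.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dict: symbol names by index plus an unindexed per-symbol
   type array (0 = no type).  */
typedef struct
{
  const char *const *symnames;
  const uint32_t *types;
  size_t nsyms;
} toy_dict_t;

/* Shared worker: if SYMNAME is non-NULL it is turned into an index by a
   linear scan; otherwise SYMIDX is used.  Both paths then share the same
   index-based lookup.  Returns 0 if nothing is found.  */
static uint32_t
lookup_by_sym_or_name (const toy_dict_t *d, size_t symidx,
                       const char *symname)
{
  if (symname)
    {
      size_t i;
      for (i = 0; i < d->nsyms; i++)
        if (strcmp (d->symnames[i], symname) == 0)
          break;
      if (i == d->nsyms)
        return 0;
      symidx = i;
    }

  if (symidx >= d->nsyms)
    return 0;
  return d->types[symidx];
}

/* Thin wrappers mirroring the by-index / by-name API split.  */
static uint32_t
lookup_by_symbol (const toy_dict_t *d, size_t symidx)
{
  return lookup_by_sym_or_name (d, symidx, NULL);
}

static uint32_t
lookup_by_symbol_name (const toy_dict_t *d, const char *symname)
{
  return lookup_by_sym_or_name (d, 0, symname);
}
```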
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
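The "linear search, but cache everything you pass over" strategy can be sketched as follows. The fixed-size open-addressing table stands in for libctf's own hash tables, and sym_cache_t and lookup_symbol_idx are hypothetical names. Each scan resumes where the previous one stopped, so the symbol table is walked at most once unless the cache overflows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name -> symbol index cache (fixed size, no deletion).  */
#define CACHE_SLOTS 64

typedef struct
{
  const char *names[CACHE_SLOTS];
  size_t idx[CACHE_SLOTS];
  size_t scanned;    /* Symtab entries already walked.  */
  bool overflowed;   /* Set if some name could not be cached.  */
} sym_cache_t;

static size_t
hash_name (const char *s)
{
  size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % CACHE_SLOTS;
}

/* Record NAME -> IDX unless NAME is already present (first index wins).
   Returns false only if the table is full.  */
static bool
cache_put (sym_cache_t *c, const char *name, size_t idx)
{
  size_t h = hash_name (name);
  for (size_t probe = 0; probe < CACHE_SLOTS; probe++)
    {
      size_t s = (h + probe) % CACHE_SLOTS;
      if (c->names[s] == NULL)
        {
          c->names[s] = name;
          c->idx[s] = idx;
          return true;
        }
      if (strcmp (c->names[s], name) == 0)
        return true;
    }
  return false;
}

static bool
cache_get (const sym_cache_t *c, const char *name, size_t *idx)
{
  size_t h = hash_name (name);
  for (size_t probe = 0; probe < CACHE_SLOTS; probe++)
    {
      size_t s = (h + probe) % CACHE_SLOTS;
      if (c->names[s] == NULL)
        return false;
      if (strcmp (c->names[s], name) == 0)
        {
          *idx = c->idx[s];
          return true;
        }
    }
  return false;
}

/* Find NAME in SYMTAB: try the cache, else resume the linear scan,
   caching every name walked past.  */
static bool
lookup_symbol_idx (sym_cache_t *c, const char *const *symtab, size_t nsyms,
                   const char *name, size_t *idx)
{
  if (cache_get (c, name, idx))
    return true;

  while (c->scanned < nsyms)
    {
      size_t i = c->scanned++;
      if (!cache_put (c, symtab[i], i))
        c->overflowed = true;
      if (strcmp (symtab[i], name) == 0)
        {
          *idx = i;
          return true;
        }
    }

  /* Names dropped on overflow are not cached: rescan to stay correct.  */
  if (c->overflowed)
    for (size_t i = 0; i < nsyms; i++)
      if (strcmp (symtab[i], name) == 0)
        {
          *idx = i;
          return true;
        }
  return false;
}
```

Sharing one such cache between all dicts in an archive is the point of the crossdict cache: whichever dict triggers the scan, the others reuse its results.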
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
each was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
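A sketch of the archive-level idea: remember which dict a symbol name was found in, but not the resulting type ID, and simply ask that dict again next time. mini_dict_t, arc_memo_t and arc_lookup_symbol_name are hypothetical. The small fixed-size memo stores the caller's string pointer, whereas a real cache would copy the key into a proper hash table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dict: parallel arrays of symbol names and type IDs.  */
typedef struct
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} mini_dict_t;

/* Return NAME's type in D, or 0 if D has no type for it.  */
static uint32_t
dict_lookup (const mini_dict_t *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return 0;
}

/* Hypothetical memo of name -> dict index.  Only the dict is recorded,
   never the type ID.  Keys are the caller's pointers (not copied).  */
#define MEMO_MAX 16

typedef struct
{
  const char *names[MEMO_MAX];
  size_t dicts[MEMO_MAX];
  size_t nmemo;
} arc_memo_t;

/* Look NAME up across the NDICTS dicts in ARC.  Returns the type ID
   (0 if not found) and stores the index of the dict used in *DICTP.  */
static uint32_t
arc_lookup_symbol_name (const mini_dict_t *arc, size_t ndicts,
                        arc_memo_t *memo, const char *name, size_t *dictp)
{
  /* Seen before: go straight to the dict it was found in.  */
  for (size_t m = 0; m < memo->nmemo; m++)
    if (strcmp (memo->names[m], name) == 0)
      {
        *dictp = memo->dicts[m];
        return dict_lookup (&arc[*dictp], name);
      }

  /* Otherwise try every dict in turn and memoize the first hit.  */
  for (size_t i = 0; i < ndicts; i++)
    {
      uint32_t type = dict_lookup (&arc[i], name);
      if (type != 0)
        {
          if (memo->nmemo < MEMO_MAX)
            {
              memo->names[memo->nmemo] = name;
              memo->dicts[memo->nmemo] = i;
              memo->nmemo++;
            }
          *dictp = i;
          return type;
        }
    }
  return 0;
}
```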
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
/* No name? Need to look it up. */
if (!symname)
  {
    err = EINVAL;
    if (symidx > fp->ctf_dynsymmax)
      goto try_parent;
2021-03-18 20:37:52 +08:00
    sym = fp->ctf_dynsymidx[symidx];
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably cost
more time than using them saves, and would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx calls in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
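A minimal sketch of the resumable, caching linear search described above, in plain C. All names here (sym_cache, sym_cache_lookup and so on) are hypothetical and this is not libctf's implementation; it assumes the symbol table has already been reduced to an array of name strings, and it keeps the first index seen for duplicate names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not libctf code: a resumable linear scan over a
   symbol-name table that caches every name -> index mapping it passes.  */
typedef struct sym_cache
{
  const char **names;   /* Symbol names, indexed by symbol number.  */
  size_t nsyms;
  size_t scanned;       /* Resume point for the next linear scan.  */
  size_t cap;           /* Power of two, always > 2 * nsyms.  */
  const char **keys;    /* Open-addressing table: cached names...  */
  size_t *vals;         /* ... and their symbol indexes.  */
} sym_cache;

static size_t
hash_str (const char *s)
{
  size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

static int
sym_cache_init (sym_cache *c, const char **names, size_t nsyms)
{
  assert (names != NULL || nsyms == 0);
  c->names = names;
  c->nsyms = nsyms;
  c->scanned = 0;
  for (c->cap = 16; c->cap <= 2 * nsyms; c->cap *= 2)
    ;
  c->keys = calloc (c->cap, sizeof (const char *));
  c->vals = calloc (c->cap, sizeof (size_t));
  if (!c->keys || !c->vals)
    {
      free (c->keys);
      free (c->vals);
      return -1;
    }
  return 0;
}

/* Return the slot holding NAME, or the empty slot where it belongs.
   The table is never more than half full, so probing terminates.  */
static size_t
slot_for (const sym_cache *c, const char *name)
{
  size_t i = hash_str (name) & (c->cap - 1);
  while (c->keys[i] && strcmp (c->keys[i], name) != 0)
    i = (i + 1) & (c->cap - 1);
  return i;
}

/* Return NAME's symbol index, or -1 if the symtab does not contain it.  */
static long
sym_cache_lookup (sym_cache *c, const char *name)
{
  size_t slot = slot_for (c, name);

  if (c->keys[slot])
    return (long) c->vals[slot];          /* Cache hit.  */

  /* Miss: resume the scan where the last one stopped, caching every
     name we pass, so the symtab is scanned at most once overall.  */
  while (c->scanned < c->nsyms)
    {
      size_t idx = c->scanned++;
      size_t s = slot_for (c, c->names[idx]);

      if (!c->keys[s])                    /* First occurrence wins.  */
        {
          c->keys[s] = c->names[idx];
          c->vals[s] = idx;
        }
      if (strcmp (c->names[idx], name) == 0)
        return (long) c->vals[s];
    }
  return -1;
}

static void
sym_cache_free (sym_cache *c)
{
  free (c->keys);
  free (c->vals);
}
```

Because each scan resumes where the previous one stopped, even a run of failed lookups touches each symtab entry at most once; sharing one such cache between dicts is the role the commit gives to the crossdict cache.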
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * each
was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
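A toy model of the archive-level change, assuming each dict is just a list of (name, type ID) pairs: the archive memoizes which dict a name was found in, not the type ID, and on a hit simply redoes the cheap per-dict lookup. The types and functions are hypothetical; libctf keeps this mapping in the ctfi_symnamedicts hash rather than in the fixed-size array used here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not libctf's data structures.  */
typedef struct toy_dict
{
  const char **syms;
  const unsigned long *types;
  size_t n;
} toy_dict;

typedef struct name_memo
{
  const char *name;     /* Caller's string; must outlive the archive.  */
  size_t dict;          /* Which dict the name was found in.  */
} name_memo;

#define MAX_MEMO 64

typedef struct toy_archive
{
  const toy_dict *dicts;
  size_t ndicts;
  name_memo memo[MAX_MEMO];
  size_t nmemo;
} toy_archive;

/* Per-dict lookup: a stand-in for an already-cheap name lookup.  */
static long
dict_lookup (const toy_dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->syms[i], name) == 0)
      return (long) d->types[i];
  return -1;
}

/* Find NAME anywhere in the archive; set *DICTP to the dict it is in.  */
static long
arc_lookup (toy_archive *a, const char *name, size_t *dictp)
{
  assert (a != NULL && name != NULL && dictp != NULL);

  /* Known name: go straight to its dict and redo the cheap lookup
     instead of caching the type ID itself.  */
  for (size_t i = 0; i < a->nmemo; i++)
    if (strcmp (a->memo[i].name, name) == 0)
      {
        *dictp = a->memo[i].dict;
        return dict_lookup (&a->dicts[*dictp], name);
      }

  /* Unknown name: try every dict in turn, remembering which one hit.  */
  for (size_t d = 0; d < a->ndicts; d++)
    {
      long type = dict_lookup (&a->dicts[d], name);
      if (type < 0)
        continue;
      if (a->nmemo < MAX_MEMO)
        {
          a->memo[a->nmemo].name = name;
          a->memo[a->nmemo].dict = d;
          a->nmemo++;
        }
      *dictp = d;
      return type;
    }
  return -1;
}
```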
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
err = ECTF_NOTYPEDAT;
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
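The name-collision rule from the second bullet above might be sketched as follows, assuming a dict whose first nstatic names came from the opened file. Everything here is hypothetical and illustrative: a new name is refused only when it collides with a type that was already present when the dict was opened, while repeats among newly-added types still succeed, so existing callers see no new errors.

```c
#include <assert.h>
#include <string.h>

#define MAX_TYPES 32

/* Hypothetical toy dict: names[0 .. nstatic-1] came from the opened
   file, the rest were added later.  Not libctf's representation.  */
typedef struct toy_dict
{
  const char *names[MAX_TYPES];
  size_t nstatic;
  size_t ntypes;
} toy_dict;

/* Add a named type.  Return its index, or -1 if the name is already
   used by a type present when the dict was opened (or the table is
   full).  Repeating a name among newly-added types is tolerated.  */
static long
add_named_type (toy_dict *d, const char *name)
{
  assert (d->nstatic <= d->ntypes);
  if (d->ntypes == MAX_TYPES)
    return -1;
  for (size_t i = 0; i < d->nstatic; i++)
    if (strcmp (d->names[i], name) == 0)
      return -1;
  d->names[d->ntypes] = name;
  return (long) d->ntypes++;
}
```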
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
if (!sym || (sym->st_type != STT_OBJECT && sym->st_type != STT_FUNC)
|| (sym->st_type != STT_OBJECT && is_function == 0)
|| (sym->st_type != STT_FUNC && is_function == 1))
2021-02-17 23:21:12 +08:00
goto try_parent;
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
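A sketch of reading the new-format function info section, assuming it is an array of uint32_t type IDs with one slot per symbol of the relevant kind, in symbol-table order, and that untyped symbols are padded with 0. The flag value below is an assumption chosen for illustration (consult ctf.h for CTF_F_NEWFUNCINFO's actual value), and func_type_at is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag value for illustration only; ctf.h is authoritative.  */
#define SKETCH_F_NEWFUNCINFO 0x2

/* Hypothetical reader for an unindexed function info section: one
   uint32_t type ID per slot, 0 marking a pad.  Returns the type ID, or
   0 if there is none or the section must be disregarded.  */
static uint32_t
func_type_at (uint32_t hdr_flags, const uint32_t *funcinfo,
              size_t nentries, size_t slot)
{
  assert (funcinfo != NULL || nentries == 0);

  /* Without the flag, the section is in the never-emitted old format:
     ignore it entirely.  */
  if (!(hdr_flags & SKETCH_F_NEWFUNCINFO))
    return 0;
  if (slot >= nentries)
    return 0;
  return funcinfo[slot];   /* Should name a CTF_K_FUNCTION type.  */
}
```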
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
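One plausible way to make the indexed-versus-unindexed decision, assuming 4-byte type IDs and 4-byte name offsets: compare what an unindexed section spends on pads with what an index would cost. This is a hypothetical heuristic, not libctf's exact rule; PAD_THRESHOLD merely stands in for CTF_INDEX_PAD_THRESHOLD, whose real value and exact use may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CTF_INDEX_PAD_THRESHOLD; the value is an assumption.  */
#define PAD_THRESHOLD 3

/* NSYMS_SPANNED: symbols an unindexed section must cover (one slot
   each, untyped ones become pads).  NTYPED: symbols with a type here.
   An index costs one type slot plus one name offset per typed symbol.  */
static int
should_emit_indexed (size_t nsyms_spanned, size_t ntyped)
{
  assert (ntyped <= nsyms_spanned);
  size_t pads = nsyms_spanned - ntyped;
  size_t unindexed_bytes = nsyms_spanned * 4;
  size_t indexed_bytes = ntyped * 8;

  /* Switch to an index only with "enough" pads and a real saving.  */
  return pads > PAD_THRESHOLD && indexed_bytes < unindexed_bytes;
}
```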
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
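Lookup in a name-sorted index is a binary search, and an index not marked as sorted can simply be sorted first. A sketch with qsort and bsearch over an array of hypothetical (name, type) pairs, assuming names are already resolved to C strings and that 0 means "no type":

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of an indexed symtypetab: each entry pairs
   a symbol name (from the index section) with a type ID.  */
typedef struct idx_entry
{
  const char *name;
  uint32_t type;
} idx_entry;

static int
cmp_by_name (const void *a, const void *b)
{
  return strcmp (((const idx_entry *) a)->name,
                 ((const idx_entry *) b)->name);
}

/* Sort the index unless the producer marked it sorted (the role played
   by the CTF_F_IDXSORTED header flag, which is not modelled here).  */
static void
ensure_sorted (idx_entry *ents, size_t n, int marked_sorted)
{
  if (!marked_sorted)
    qsort (ents, n, sizeof (idx_entry), cmp_by_name);
}

/* Binary-search the sorted index; return the type ID, or 0 if absent.  */
static uint32_t
lookup_indexed (const idx_entry *ents, size_t n, const char *name)
{
  idx_entry key = { name, 0 };
  const idx_entry *hit;

  assert (ents != NULL && name != NULL);
  hit = bsearch (&key, ents, n, sizeof (idx_entry), cmp_by_name);
  return hit ? hit->type : 0;
}
```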
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
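An iterator that needs no sorting just walks the section in order and skips pads. A sketch in the ctf_next_t spirit, assuming an unindexed section with parallel arrays of type IDs and names; sym_iter and sym_next are hypothetical, and where this sketch returns 0 at the end, libctf's ctf_symbol_next instead signals exhaustion through an error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: just a cursor into the section.  */
typedef struct sym_iter
{
  size_t pos;
} sym_iter;

/* Return the next typed entry's type ID and set *NAMEP, or return 0
   once the section is exhausted.  Pads (type 0) are skipped.  */
static uint32_t
sym_next (sym_iter *it, const uint32_t *types, const char *const *names,
          size_t n, const char **namep)
{
  assert (it != NULL && namep != NULL);
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *namep = names[i];
          return types[i];
        }
    }
  return 0;
}
```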
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
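Resolving a linker-supplied symbol name from its offset into the external string table needs only the raw strtab buffer and a bounds check. A hypothetical helper, assuming an ELF-style table of NUL-terminated strings packed end to end:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the string at offset OFF in an external
   strtab of LEN bytes, or NULL if OFF is out of range or the string
   there is not NUL-terminated within the table.  */
static const char *
ext_strtab_ptr (const char *strtab, size_t len, size_t off)
{
  assert (strtab != NULL || len == 0);
  if (off >= len)
    return NULL;
  if (memchr (strtab + off, '\0', len - off) == NULL)
    return NULL;
  return strtab + off;
}
```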
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably cost
more time than using them saves, and would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
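The cache-as-you-scan lookup described above could look roughly like the following. This is a minimal sketch, not libctf's code. It assumes a plain array of names stands in for the ELF symtab and a small fixed-size open-addressing table stands in for ctf_symhash. sym_cache_t, cache_put, cache_get and lookup_symbol_idx are hypothetical names. The "scanned" marker is an assumption about how a resumable scan might be tracked; the ChangeLog's ctf_symhash_latest field may play a similar role.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size table: it must have more slots than the symtab
   has names, or the probe loops below would never terminate.  */
#define CACHE_SLOTS 64

/* Hypothetical stand-in for ctf_symhash, plus a marker recording how far
   the linear scan of the symtab has progressed.  */
typedef struct
{
  const char *names[CACHE_SLOTS];
  size_t idx[CACHE_SLOTS];
  size_t scanned;
} sym_cache_t;

static size_t
hash_name (const char *s)
{
  size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % CACHE_SLOTS;
}

/* Linear probing; the first index recorded for a name wins.  */
static void
cache_put (sym_cache_t *c, const char *name, size_t i)
{
  size_t h = hash_name (name);
  while (c->names[h] != NULL && strcmp (c->names[h], name) != 0)
    h = (h + 1) % CACHE_SLOTS;
  if (c->names[h] == NULL)
    {
      c->names[h] = name;
      c->idx[h] = i;
    }
}

static int
cache_get (const sym_cache_t *c, const char *name, size_t *out)
{
  size_t h = hash_name (name);
  while (c->names[h] != NULL)
    {
      if (strcmp (c->names[h], name) == 0)
        {
          *out = c->idx[h];
          return 1;
        }
      h = (h + 1) % CACHE_SLOTS;
    }
  return 0;
}

/* Return the index of NAME in SYMTAB, or (size_t) -1 if absent.  Each call
   consults the cache first, then resumes the linear scan where the previous
   call stopped, caching every name it passes.  Across all calls the symtab
   is therefore scanned at most once.  */
static size_t
lookup_symbol_idx (sym_cache_t *c, const char **symtab, size_t nsyms,
                   const char *name)
{
  size_t i;

  if (cache_get (c, name, &i))
    return i;

  for (; c->scanned < nsyms; c->scanned++)
    {
      cache_put (c, symtab[c->scanned], c->scanned);
      if (strcmp (symtab[c->scanned], name) == 0)
        return c->scanned++;
    }
  return (size_t) -1;
}
```

Sharing one such cache between dicts (the crossdict cache) is what lets later lookups in sibling dicts skip the scan entirely.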
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
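The archive-level policy above (remember which dict answered for a name, but not the ctf_id_t itself) can be illustrated with a small sketch. This is not libctf's implementation. dict_t, name_dict_memo_t, dict_lookup and arc_lookup_name are hypothetical stand-ins for ctf_dict_t, ctfi_symnamedicts and the per-dict lookup. The probe counter exists only to make the caching observable.

```c
#include <stddef.h>
#include <string.h>

typedef long type_id_t;   /* stands in for ctf_id_t; 0 means "not found" */

typedef struct { const char *name; type_id_t type; } sym_entry_t;

/* Hypothetical stand-in for one dict's symbol -> type mapping.  */
typedef struct { const sym_entry_t *syms; size_t nsyms; } dict_t;

/* Hypothetical stand-in for ctfi_symnamedicts: remembers which dict
   answered for each name.  Type IDs themselves are not cached.  */
#define MAX_MEMO 32
typedef struct
{
  const char *names[MAX_MEMO];   /* caller's strings, not copied */
  size_t dict[MAX_MEMO];
  size_t n;
  size_t probes;                 /* per-dict lookups performed */
} name_dict_memo_t;

static type_id_t
dict_lookup (name_dict_memo_t *m, const dict_t *d, const char *name)
{
  m->probes++;
  for (size_t i = 0; i < d->nsyms; i++)
    if (strcmp (d->syms[i].name, name) == 0)
      return d->syms[i].type;
  return 0;
}

static type_id_t
arc_lookup_name (const dict_t *dicts, size_t ndicts, name_dict_memo_t *m,
                 const char *name)
{
  /* Fast path: we already know which dict holds NAME, so redo only the
     (cheap) per-dict lookup there.  */
  for (size_t i = 0; i < m->n; i++)
    if (strcmp (m->names[i], name) == 0)
      return dict_lookup (m, &dicts[m->dict[i]], name);

  /* Slow path: try each dict in turn and remember the one that answers.  */
  for (size_t d = 0; d < ndicts; d++)
    {
      type_id_t t = dict_lookup (m, &dicts[d], name);
      if (t != 0)
        {
          if (m->n < MAX_MEMO)
            {
              m->names[m->n] = name;
              m->dict[m->n++] = d;
            }
          return t;
        }
    }
  return 0;
}
```

A repeated lookup costs one probe of the remembered dict instead of a walk over the whole archive.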
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
if (!ctf_assert (fp, !sym->st_nameidx_set))
  return CTF_ERR;
symname = sym->st_name;
}
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
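Reading a type from the new-format section then becomes a plain array access once a symbol index has been translated into a position in the section. The sketch below makes several assumptions. It assumes an unindexed section with one uint32_t per function symbol, in symbol-table order. It assumes a boolean per-symbol "is a function" flag; libctf's real rules for which symbols get entries are more involved (see ctf_symtab_skippable). It assumes 0 marks a pad. build_func_sxlate and func_type_by_symidx are hypothetical names, loosely modelled on the idea behind libctf's translation tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a symidx -> section-position table for an unindexed function info
   section: function symbols are numbered consecutively in symbol-table
   order; all other symbols map to -1.  Returns the number of entries.  */
static size_t
build_func_sxlate (const int *is_func, size_t nsyms, long *sxlate)
{
  size_t n = 0;
  for (size_t i = 0; i < nsyms; i++)
    sxlate[i] = is_func[i] ? (long) n++ : -1;
  return n;
}

/* Each section entry is the ID of a CTF_K_FUNCTION type (the prototype),
   or 0 for a pad meaning "type unknown".  Returns 0 if SYMIDX has no
   entry.  */
static uint32_t
func_type_by_symidx (const uint32_t *funcinfo, size_t nentries,
                     const long *sxlate, size_t symidx)
{
  long pos = sxlate[symidx];
  if (pos < 0 || (size_t) pos >= nentries)
    return 0;
  return funcinfo[pos];
}
```

Because each entry is just a type ID, two functions with the same prototype can share a single CTF_K_FUNCTION type, which is the deduplication win described above.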
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
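The compatibility behaviour of CTF_F_NEWFUNCINFO can be expressed as a small decision function. This is a hedged sketch. The SKETCH_F_* values are placeholders chosen for illustration; take the real flag values from include/ctf.h. The idea that an older reader rejects any flag bit it does not know is one simple way to get the "old libctf refuses these dicts" behaviour described above; the ChangeLog's CTF_F_MAX adjustment suggests a check of this general kind, but the details here are assumptions. classify_header_flags is a hypothetical name.

```c
#include <stdint.h>

/* Placeholder flag values for this sketch only; see include/ctf.h.  */
#define SKETCH_F_COMPRESS     0x1
#define SKETCH_F_NEWFUNCINFO  0x2
#define SKETCH_F_IDXSORTED    0x4
#define SKETCH_F_KNOWN (SKETCH_F_COMPRESS | SKETCH_F_NEWFUNCINFO \
                        | SKETCH_F_IDXSORTED)

enum open_result { OPEN_REJECT, OPEN_IGNORE_FUNCINFO, OPEN_USE_FUNCINFO };

/* Decide how a reader that understands only the bits in KNOWN_MASK should
   treat a dict with header flags FLAGS.  */
static enum open_result
classify_header_flags (uint32_t flags, uint32_t known_mask)
{
  if (flags & ~known_mask)
    return OPEN_REJECT;            /* unknown flag: refuse to open */
  if (!(flags & SKETCH_F_NEWFUNCINFO))
    return OPEN_IGNORE_FUNCINFO;   /* old-format section: disregard it */
  return OPEN_USE_FUNCINFO;
}
```

An "old" reader corresponds to a KNOWN_MASK lacking SKETCH_F_NEWFUNCINFO, so it rejects new-format dicts outright rather than misreading their function info section.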
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
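The two-stage model above works like this: symbols are first recorded by name, and the linker later supplies the final symbol order. A minimal sketch follows. symmap_t, add_sym, symmap_get and shuffle_syms are hypothetical names, and the three-value kind enum is a simplification; the real functions are ctf_add_func_sym, ctf_add_objt_sym and ctf_link_shuffle_syms, whose internals differ.

```c
#include <stddef.h>
#include <string.h>

enum sketch_kind { KIND_INT, KIND_POINTER, KIND_FUNCTION };

/* Hypothetical stand-in for ctf_funchash or ctf_objthash: name -> type.  */
typedef struct
{
  const char *names[16];
  long types[16];
  size_t n;
} symmap_t;

/* Function symbols must have a function type; object symbols may have any
   type (including pointers to functions).  Returns 0 on success.  */
static int
add_sym (symmap_t *m, const char *name, long type, enum sketch_kind kind,
         int is_function)
{
  if (is_function && kind != KIND_FUNCTION)
    return -1;
  if (m->n >= 16)
    return -1;
  m->names[m->n] = name;
  m->types[m->n++] = type;
  return 0;
}

static long
symmap_get (const symmap_t *m, const char *name)
{
  for (size_t i = 0; i < m->n; i++)
    if (strcmp (m->names[i], name) == 0)
      return m->types[i];
  return 0;
}

/* Once the linker has fixed the symbol order, map each symbol index to the
   type recorded under its name (0 if none).  */
static void
shuffle_syms (const symmap_t *m, const char **linker_order, size_t nsyms,
              long *types_by_idx)
{
  for (size_t i = 0; i < nsyms; i++)
    types_by_idx[i] = symmap_get (m, linker_order[i]);
}
```

Keeping names as the primary key is what lets repeated relinks add symbols without invalidating earlier additions.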
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
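An indexed section can be pictured as a name array parallel to the type array, searched by bisection and sorted once on load when the producer did not promise sorted order. This is a hedged sketch under those assumptions. idx_section_t, prepare_index and index_lookup are hypothetical names, and the already_sorted argument stands in for the CTF_F_IDXSORTED header flag. Standard qsort has no context argument, so the comparator reads the name array through a file-scope variable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* NAMES[i] is the symbol name for entry i and TYPES[i] its type ID;
   ORDER holds entry positions sorted by name.  */
typedef struct
{
  const char **names;
  const uint32_t *types;
  size_t n;
  size_t *order;
} idx_section_t;

static const char **sort_names;   /* qsort comparator context */

static int
cmp_pos_by_name (const void *a, const void *b)
{
  size_t pa = *(const size_t *) a, pb = *(const size_t *) b;
  return strcmp (sort_names[pa], sort_names[pb]);
}

/* Sort entry positions by name unless the producer promised sorted
   order (CTF_F_IDXSORTED in the real format).  */
static void
prepare_index (idx_section_t *s, int already_sorted)
{
  for (size_t i = 0; i < s->n; i++)
    s->order[i] = i;
  if (!already_sorted)
    {
      sort_names = s->names;
      qsort (s->order, s->n, sizeof (size_t), cmp_pos_by_name);
    }
}

/* Binary search by name; returns the type ID, or 0 if NAME is absent.  */
static uint32_t
index_lookup (const idx_section_t *s, const char *name)
{
  size_t lo = 0, hi = s->n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int c = strcmp (name, s->names[s->order[mid]]);
      if (c == 0)
        return s->types[s->order[mid]];
      if (c < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return 0;
}
```

Sorting a separate position array leaves the on-disk names and types untouched, which suits a read-only mapped section.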
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
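The iterator style described above allocates state on the first call, returns one item per call, and frees the state when exhausted. A minimal sketch over parallel name and type arrays follows. sym_iter_t, symbol_next, SKETCH_END and SKETCH_NOMEM are hypothetical; the real ctf_symbol_next uses ctf_next_t, reports the end via CTF_ERR with ECTF_NEXT_END, and also selects between function and data symbols.

```c
#include <stddef.h>
#include <stdlib.h>

typedef long type_id_t;
#define SKETCH_END   (-1L)   /* stands in for CTF_ERR with ECTF_NEXT_END */
#define SKETCH_NOMEM (-2L)

/* Hypothetical iterator state, in the spirit of ctf_next_t.  */
typedef struct { size_t pos; } sym_iter_t;

/* Return the next symbol's type and set *NAME, skipping pads (type 0).
   No sorting is needed: entries come back in section order.  */
static type_id_t
symbol_next (const char **names, const type_id_t *types, size_t n,
             sym_iter_t **it, const char **name)
{
  if (*it == NULL && (*it = calloc (1, sizeof (sym_iter_t))) == NULL)
    return SKETCH_NOMEM;

  while ((*it)->pos < n)
    {
      size_t i = (*it)->pos++;
      if (types[i] != 0)
        {
          *name = names[i];
          return types[i];
        }
    }

  free (*it);   /* iteration over: release the state */
  *it = NULL;
  return SKETCH_END;
}
```

A caller loops until SKETCH_END, passing the same iterator pointer, initialized to NULL, on every call.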
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
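Resolving an "external" string before any strtab is written out amounts to consulting the recorded offset -> string mapping directly. A hedged sketch follows. It assumes the top bit of a name offset selects the external table, mirroring CTF's two-string-table convention; check ctf.h's CTF_NAME_STID for the real encoding. ext_str_t, EXT_BIT and sketch_strptr are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* In this sketch the top bit of a name offset selects the external
   (linker-supplied) string table.  */
#define EXT_BIT 0x80000000u

/* One external string: its offset in the linker's strtab and its text, as
   recorded when the strtab was handed over.  */
typedef struct { uint32_t off; const char *str; } ext_str_t;

/* Resolve NAME without waiting for any strtab to be written out.  Returns
   NULL if the offset is unknown or out of range.  */
static const char *
sketch_strptr (const char *internal, size_t internal_len,
               const ext_str_t *ext, size_t next, uint32_t name)
{
  if (name & EXT_BIT)
    {
      uint32_t off = name & ~EXT_BIT;
      for (size_t i = 0; i < next; i++)
        if (ext[i].off == off)
          return ext[i].str;
      return NULL;
    }
  return name < internal_len ? internal + name : NULL;
}
```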
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
if (fp->ctf_objthash == NULL
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
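Treating the already-present range of types as read-only reduces to a comparison against a count of static types, which the ChangeLog calls ctf_stypes. A minimal sketch follows. dict_sketch_t, is_static_type, check_modifiable, add_type and SKETCH_ERR_RDONLY are hypothetical. Parent/child ID offsets are ignored, and the exact comparison libctf uses may differ.

```c
typedef unsigned long type_id_t;
#define SKETCH_ERR_RDONLY 1   /* stands in for ECTF_RDONLY */

/* Hypothetical dict: IDs 1..stypes came from ctf_open() and are frozen;
   IDs above that were added since.  */
typedef struct
{
  type_id_t stypes;
  type_id_t ntypes;
} dict_sketch_t;

static int
is_static_type (const dict_sketch_t *d, type_id_t id)
{
  return id >= 1 && id <= d->stypes;
}

/* Gate for operations that modify an existing type (adding struct members,
   ctf_set_array, ...): refused for static types.  */
static int
check_modifiable (const dict_sketch_t *d, type_id_t id)
{
  return is_static_type (d, id) ? SKETCH_ERR_RDONLY : 0;
}

/* Adding a new type is always allowed and yields an ID above stypes.  */
static type_id_t
add_type (dict_sketch_t *d)
{
  return ++d->ntypes;
}
```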
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
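The name-collision rule from the second item above can be sketched as follows. New names that clash with a type from the ctf_open()ed portion are refused, while clashes among newly added types remain allowed. name_tab_t and add_named_type are hypothetical names, and the fixed-size table is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 32

/* Hypothetical name table: names[i] is the name of type i + 1; the first
   STYPES entries came from ctf_open().  */
typedef struct
{
  const char *names[MAX_TYPES];
  size_t ntypes;
  size_t stypes;
} name_tab_t;

/* Returns 0 and records the new type, or -1 if NAME is already used by a
   static type (or the table is full).  */
static int
add_named_type (name_tab_t *t, const char *name)
{
  for (size_t i = 0; i < t->stypes; i++)
    if (t->names[i] != NULL && strcmp (t->names[i], name) == 0)
      return -1;
  if (t->ntypes >= MAX_TYPES)
    return -1;
  t->names[t->ntypes++] = name;
  return 0;
}
```

Checking only the static range preserves the old behaviour for code that repeatedly adds same-named types to a freshly created dict.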
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
    || is_function == 1
    || (type = (ctf_id_t) (uintptr_t)
        ctf_dynhash_lookup (fp->ctf_objthash, symname)) == 0)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
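The indexed lookup listed above (`ctf_lookup_idx_name`, described as a bsearch for symbol names in indexes) can be sketched roughly as follows. The sketch assumes a hypothetical `name_index_t` whose names are already sorted with `strcmp`, plus a parallel array of type IDs. It does not reproduce libctf's on-disk index layout, and 0 stands in for "not found".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index section: symbol names sorted by strcmp, with a
   parallel array giving each symbol's type ID.  */
typedef struct name_index
{
  const char **names;
  const unsigned long *types;
  size_t n;
} name_index_t;

/* bsearch comparator: KEY is the name sought, ELEM points at one
   element of the names array (itself a const char *).  */
static int
cmp_name (const void *key, const void *elem)
{
  const char *k = key;
  const char *const *e = elem;
  return strcmp (k, *e);
}

/* Binary-search the sorted index for NAME; return its type or 0.  */
unsigned long
index_lookup (const name_index_t *ix, const char *name)
{
  const char **hit = bsearch (name, ix->names, ix->n,
                              sizeof (ix->names[0]), cmp_name);
  if (hit == NULL)
    return 0;
  return ix->types[hit - ix->names];
}
```

This only works if the index really is sorted. Per the entries above, libctf sorts an index itself when it finds one without the sorted flag.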
2020-11-20 21:34:04 +08:00
{
if (fp->ctf_funchash == NULL
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
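Here is a minimal model of the "read-only range of types" idea. It assumes type IDs are allocated consecutively from 1, and that a count of static types is fixed at open time, playing the role the ChangeLog gives `ctf_stypes`. All names (`sketch_dict_t`, `sketch_modify_type`, the `SKETCH_*` codes) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only.  */
enum { SKETCH_OK = 0, SKETCH_ERR_RDONLY = 1, SKETCH_ERR_BADID = 2 };

typedef uint32_t type_id_t;

/* IDs 1..nstypes were read in when the dict was opened and are frozen;
   IDs nstypes+1..ntypes were added afterwards.  ID 0 is invalid here.  */
typedef struct sketch_dict
{
  type_id_t nstypes;   /* count of static (deserialized) types */
  type_id_t ntypes;    /* total types, static plus newly added */
} sketch_dict_t;

static int
is_static_type (const sketch_dict_t *d, type_id_t id)
{
  return id >= 1 && id <= d->nstypes;
}

/* Adding a type is always allowed, even on an opened dict.  */
type_id_t
sketch_add_type (sketch_dict_t *d)
{
  return ++d->ntypes;
}

/* Modifying a type (adding a struct member, setting array info...) is
   refused for types in the static range.  */
int
sketch_modify_type (const sketch_dict_t *d, type_id_t id)
{
  if (id == 0 || id > d->ntypes)
    return SKETCH_ERR_BADID;
  if (is_static_type (d, id))
    return SKETCH_ERR_RDONLY;
  return SKETCH_OK;   /* a real implementation would modify it here */
}
```

The design point is that read-only status becomes a property of a type ID range, not of the whole dict. A single integer comparison is enough to tell old types from new ones.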
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
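Here is a rough sketch of the name-collision rule. It simplifies to a single flat namespace, whereas C and libctf keep struct/union/enum tags separate from ordinary names. It treats type IDs 1..nstypes as the static ones. `named_dict_t` and `add_named_type` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 32

/* Hypothetical dict: names indexed by type ID (slot 0 unused).  IDs
   1..nstypes came from the opened file; later IDs were added since.  */
typedef struct named_dict
{
  const char *names[MAX_TYPES + 1];
  size_t nstypes;
  size_t ntypes;
} named_dict_t;

/* Nonzero if NAME is already used by one of the static types.  */
static int
static_name_exists (const named_dict_t *d, const char *name)
{
  for (size_t id = 1; id <= d->nstypes; id++)
    if (d->names[id] != NULL && strcmp (d->names[id], name) == 0)
      return 1;
  return 0;
}

/* Add a (possibly anonymous) type; return its ID, or 0 on failure.
   Names colliding with the opened portion are refused; repeated names
   among newly added types are still tolerated, as before.  */
size_t
add_named_type (named_dict_t *d, const char *name)
{
  if (d->ntypes >= MAX_TYPES)
    return 0;
  if (name != NULL && static_name_exists (d, name))
    return 0;
  d->names[++d->ntypes] = name;
  return d->ntypes;
}
```

Only the static range is checked. That is what keeps existing code that repeatedly adds the same name to a freshly created dict working.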
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
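Keeping one symbol cache per symbol kind is what prevents the cross-talk described above. This is a minimal sketch that uses small fixed-size arrays in place of the hash tables the ChangeLog calls `ctf_symhash_func` and `ctf_symhash_objt`. Every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 16

/* Hypothetical cache of symbol name -> symbol index.  */
typedef struct sym_cache
{
  const char *names[CACHE_MAX];
  long idx[CACHE_MAX];
  size_t n;
} sym_cache_t;

/* One cache per kind, so an object symbol remembered during an object
   lookup can never satisfy a function lookup, and vice versa.  */
typedef struct sym_caches
{
  sym_cache_t func;
  sym_cache_t objt;
} sym_caches_t;

static sym_cache_t *
pick (sym_caches_t *c, int is_function)
{
  return is_function ? &c->func : &c->objt;
}

void
cache_remember (sym_caches_t *c, int is_function, const char *name, long idx)
{
  sym_cache_t *sc = pick (c, is_function);
  if (sc->n < CACHE_MAX)
    {
      sc->names[sc->n] = name;
      sc->idx[sc->n] = idx;
      sc->n++;
    }
}

/* Return the cached index for NAME of this kind, or -1 if absent.  */
long
cache_lookup (sym_caches_t *c, int is_function, const char *name)
{
  sym_cache_t *sc = pick (c, is_function);
  for (size_t i = 0; i < sc->n; i++)
    if (strcmp (sc->names[i], name) == 0)
      return sc->idx[i];
  return -1;
}
```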
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
|| is_function == 0
|| (type = (ctf_id_t) (uintptr_t)
ctf_dynhash_lookup (fp->ctf_funchash, symname)) == 0)
2020-11-20 21:34:04 +08:00
goto try_parent;
}
return type;
}
2019-04-24 18:15:33 +08:00
2023-12-20 00:58:19 +08:00
/* Dict not shuffled: look for a dynamic sym first, and look it up
directly. */
if (symname)
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
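The shape of that refactoring can be sketched as one internal worker that takes either a symbol index or a name, with two thin public wrappers around it. Everything below is hypothetical and heavily simplified: linear searches throughout, and 0 meaning "no type". It only illustrates why indexed sections need a name and unindexed sections need an index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a symtab of names plus a symtypetab that is either
   indexed (keyed by name) or unindexed (keyed by symbol number).  */
typedef struct sketch_syms
{
  const char **symtab;       /* symbol number -> name */
  size_t nsyms;
  int indexed;
  const char **idx_names;    /* indexed: symbol names */
  const long *idx_types;     /* indexed: types parallel to idx_names */
  size_t nidx;
  const long *unidx_types;   /* unindexed: symbol number -> type (0: none) */
} sketch_syms_t;

static long
name_to_symidx (const sketch_syms_t *s, const char *name)
{
  for (size_t i = 0; i < s->nsyms; i++)
    if (strcmp (s->symtab[i], name) == 0)
      return (long) i;
  return -1;
}

/* Shared worker: uses SYMNAME if non-NULL, otherwise SYMIDX.  */
static long
lookup_by_sym_or_name (const sketch_syms_t *s, long symidx,
                       const char *symname)
{
  if (s->indexed)
    {
      /* Indexed sections are keyed by name: derive one from the index.  */
      if (symname == NULL)
        {
          if (symidx < 0 || (size_t) symidx >= s->nsyms)
            return 0;
          symname = s->symtab[symidx];
        }
      for (size_t i = 0; i < s->nidx; i++)
        if (strcmp (s->idx_names[i], symname) == 0)
          return s->idx_types[i];
      return 0;
    }

  /* Unindexed sections are keyed by number: derive one from the name.  */
  if (symname != NULL)
    symidx = name_to_symidx (s, symname);
  if (symidx < 0 || (size_t) symidx >= s->nsyms)
    return 0;
  return s->unidx_types[symidx];
}

long
sketch_lookup_by_symbol (const sketch_syms_t *s, long symidx)
{
  return lookup_by_sym_or_name (s, symidx, NULL);
}

long
sketch_lookup_by_symbol_name (const sketch_syms_t *s, const char *name)
{
  return lookup_by_sym_or_name (s, -1, name);
}
```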
The actual name->index lookup is done by ctf_lookup_symbol_idx. We do
not anticipate this lookup to be as heavily used as ld.so symbol lookup
by many orders of magnitude, so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
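Here is a minimal sketch of the "linear search, caching as we go" strategy. It assumes a plain array of symbol names and a small fixed-size chained hash table. The names (`symcache`, `lookup_symbol_idx`, `latest`) are hypothetical and only loosely echo the ctf_symhash and ctf_symhash_latest fields in the ChangeLog. The real code uses libctf's own dynhash, not this table.

```c
#include <stdlib.h>
#include <string.h>

#define SYMCACHE_BUCKETS 64     /* Hypothetical fixed size for the sketch.  */

struct symcache_entry
{
  const char *name;             /* Points into the symbol table itself.  */
  size_t idx;
  struct symcache_entry *next;
};

struct symcache
{
  const char *const *names;     /* Symbol names, in symbol-index order.  */
  size_t nsyms;
  size_t latest;                /* First symbol not yet entered in the cache.  */
  struct symcache_entry *buckets[SYMCACHE_BUCKETS];
};

static size_t
symcache_hash (const char *s)
{
  size_t h = 5381;              /* djb2 string hash.  */

  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % SYMCACHE_BUCKETS;
}

static struct symcache_entry *
symcache_find (const struct symcache *c, const char *name)
{
  struct symcache_entry *e;

  for (e = c->buckets[symcache_hash (name)]; e != NULL; e = e->next)
    if (strcmp (e->name, name) == 0)
      return e;
  return NULL;
}

/* Return the index of NAME, or (size_t) -1 if there is no such symbol.
   A cache miss resumes the linear scan where the last one stopped and
   caches every name passed, so the table is scanned at most once.  */
size_t
lookup_symbol_idx (struct symcache *c, const char *name)
{
  struct symcache_entry *e = symcache_find (c, name);

  if (e != NULL)
    return e->idx;

  while (c->latest < c->nsyms)
    {
      size_t i = c->latest;
      const char *symname = c->names[i];

      /* Keep the first index for duplicate names, matching a plain scan.  */
      if (symcache_find (c, symname) == NULL)
        {
          struct symcache_entry *n = malloc (sizeof (*n));
          size_t h = symcache_hash (symname);

          if (n == NULL)
            break;              /* Out of memory: fall back to no caching.  */
          n->name = symname;
          n->idx = i;
          n->next = c->buckets[h];
          c->buckets[h] = n;
        }
      c->latest++;
      if (strcmp (symname, name) == 0)
        return i;
    }

  /* Uncached fallback: only does any work after an allocation failure.  */
  for (size_t i = c->latest; i < c->nsyms; i++)
    if (strcmp (c->names[i], name) == 0)
      return i;
  return (size_t) -1;
}

/* Free all cache entries and forget how far the scan got.  */
void
symcache_clear (struct symcache *c)
{
  for (size_t b = 0; b < SYMCACHE_BUCKETS; b++)
    while (c->buckets[b] != NULL)
      {
        struct symcache_entry *next = c->buckets[b]->next;

        free (c->buckets[b]);
        c->buckets[b] = next;
      }
  c->latest = 0;
}

static const char *const demo_syms[] = { "a", "b", "c", "d" };
static struct symcache demo_cache = { demo_syms, 4, 0, { NULL } };
```

Looking up "c" scans and caches "a", "b" and "c". A later lookup of "a" is then answered from the cache without touching the symbol table again.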
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
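The archive-level caching policy can be sketched in the same spirit: remember *which dict* a name was found in, but not the resulting type ID. The structures and names here are hypothetical. A fixed-size array stands in for the ctfi_symnamedicts hash table, and each "dict" is reduced to parallel name and type arrays.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long type_id;          /* Hypothetical; 0 means "none".  */

struct dict                             /* Hypothetical per-dict symbols.  */
{
  const char *const *names;
  const type_id *types;
  size_t nsyms;
};

#define ARC_MAX_CACHED 32               /* Sketch-only cache size limit.  */

struct archive
{
  const struct dict *dicts;
  size_t ndicts;
  /* Name -> dict-number cache.  Type IDs are deliberately not cached:
     asking the right dict again is cheap.  */
  const char *cached_names[ARC_MAX_CACHED];
  size_t cached_dict[ARC_MAX_CACHED];
  size_t ncached;
  size_t dicts_searched;                /* Instrumentation for the demo.  */
};

/* Index of NAME in D, or (size_t) -1.  */
static size_t
dict_find (const struct dict *d, const char *name)
{
  for (size_t i = 0; i < d->nsyms; i++)
    if (strcmp (d->names[i], name) == 0)
      return i;
  return (size_t) -1;
}

type_id
arc_lookup_symbol_name (struct archive *a, const char *name)
{
  /* Fast path: we already know which dict holds this symbol.  */
  for (size_t i = 0; i < a->ncached; i++)
    if (strcmp (a->cached_names[i], name) == 0)
      {
        const struct dict *d = &a->dicts[a->cached_dict[i]];
        return d->types[dict_find (d, name)];
      }

  /* Slow path: try every dict in turn, remembering where we found it.  */
  for (size_t n = 0; n < a->ndicts; n++)
    {
      const struct dict *d = &a->dicts[n];
      size_t idx;

      a->dicts_searched++;
      if ((idx = dict_find (d, name)) == (size_t) -1 || d->types[idx] == 0)
        continue;
      if (a->ncached < ARC_MAX_CACHED)
        {
          /* Store the dict's own copy of the name, not the caller's.  */
          a->cached_names[a->ncached] = d->names[idx];
          a->cached_dict[a->ncached++] = n;
        }
      return d->types[idx];
    }
  return 0;
}

static const char *const d0_names[] = { "x" };
static const type_id d0_types[] = { 1 };
static const char *const d1_names[] = { "y", "z" };
static const type_id d1_types[] = { 5, 6 };
static const struct dict demo_dicts[] = {
  { d0_names, d0_types, 1 },
  { d1_names, d1_types, 2 },
};
static struct archive demo_arc = { demo_dicts, 2 };
```

In libctf the per-dict re-lookup on the fast path is itself cheap, because each dict caches its own name->index mappings. In this sketch it is a short linear scan.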
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
{
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
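The new rule reduces to a range check on type IDs. Below is a minimal sketch. It assumes the statically read types occupy IDs 1 up to a count like ctf_stypes, and it ignores libctf's parent/child ID split. The structure, function names and result codes are hypothetical.

```c
/* Hypothetical result codes; libctf's real ECTF_RDONLY and friends are
   defined in ctf-api.h with different values.  */
enum { ST_OK = 0, ST_ERR_RDONLY = 1, ST_ERR_BADID = 2 };

struct dict_state
{
  unsigned long stypes;   /* IDs 1..stypes were read in by the opener.  */
  unsigned long ntypes;   /* Highest ID so far, static plus added.  */
};

/* Is ID in the range of types present when the dict was opened?  */
static int
type_is_static (const struct dict_state *d, unsigned long id)
{
  return id >= 1 && id <= d->stypes;
}

/* Check applied before modifying an existing type (adding struct members,
   changing array bounds and so on).  */
int
check_modifiable (const struct dict_state *d, unsigned long id)
{
  if (id == 0 || id > d->ntypes)
    return ST_ERR_BADID;
  return type_is_static (d, id) ? ST_ERR_RDONLY : ST_OK;
}

/* Adding brand-new types is always allowed: they get fresh IDs above
   every static one, and may refer to static types freely.  */
unsigned long
add_type (struct dict_state *d)
{
  return ++d->ntypes;
}

static struct dict_state demo_dict = { 3, 3 };
```

The design keeps a single representation for all types. Whether a type is mutable is decided purely by where its ID falls.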
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
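The name-collision rule described above can be sketched as follows. This is a hypothetical model, not libctf's ctf_add_generic. Types are just names indexed by ID, and C's separate struct/union/enum tag namespace is ignored.

```c
#include <stddef.h>
#include <string.h>

#define ND_MAX_TYPES 16   /* Sketch-only capacity.  */

struct named_dict
{
  const char *names[ND_MAX_TYPES + 1];  /* names[id]; NULL = anonymous.  */
  unsigned long stypes;                 /* IDs 1..stypes are static.  */
  unsigned long ntypes;
};

/* Add a type called NAME (or an anonymous one if NAME is NULL).  Returns
   the new ID, or 0 if NAME is already used by a static type or the table
   is full.  Clashes with types added since opening are still allowed,
   as they always were.  */
unsigned long
add_named_type (struct named_dict *d, const char *name)
{
  if (name != NULL)
    for (unsigned long id = 1; id <= d->stypes; id++)
      if (d->names[id] != NULL && strcmp (d->names[id], name) == 0)
        return 0;

  if (d->ntypes >= ND_MAX_TYPES)
    return 0;
  d->names[++d->ntypes] = name;
  return d->ntypes;
}

static struct named_dict demo_nd = { { NULL, "foo", "bar" }, 2, 2 };
```

Adding "foo" is refused because it would shadow a type from the opened dict. Adding "baz" twice succeeds both times, preserving the old behaviour for newly added types.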
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
  if (fp->ctf_objthash != NULL
      && is_function != 1
      && ((type = (ctf_id_t) (uintptr_t)
           ctf_dynhash_lookup (fp->ctf_objthash, symname)) != 0))
    return type;

  if (fp->ctf_funchash != NULL
      && is_function != 0
      && ((type = (ctf_id_t) (uintptr_t)
           ctf_dynhash_lookup (fp->ctf_funchash, symname)) != 0))
    return type;
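The two lookups in this hunk keep function and data-object symbols apart. Judging by the `!= 1` and `!= 0` tests, `is_function` appears to be tri-state. A value of 1 restricts the lookup to functions and 0 restricts it to data objects. Any other value (presumably -1, "unknown") tries objects first and then functions. The standalone sketch below models that reading, with hypothetical name/type arrays in place of libctf's dynhashes.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long type_id;          /* Hypothetical; 0 means "none".  */

struct name_type
{
  const char *name;
  type_id type;
};

/* Hypothetical stand-ins for ctf_objthash and ctf_funchash.  */
struct symtables
{
  const struct name_type *objts;
  size_t nobjts;
  const struct name_type *funcs;
  size_t nfuncs;
};

static type_id
find_name (const struct name_type *tab, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return tab[i].type;
  return 0;
}

/* IS_FUNCTION: 1 = functions only, 0 = data objects only, anything else
   (e.g. -1) = unknown, so try data objects first, then functions.  */
type_id
lookup_sym (const struct symtables *t, const char *name, int is_function)
{
  type_id type;

  if (is_function != 1 && (type = find_name (t->objts, t->nobjts, name)) != 0)
    return type;
  if (is_function != 0 && (type = find_name (t->funcs, t->nfuncs, name)) != 0)
    return type;
  return 0;
}

static const struct name_type demo_objts[] = { { "errno_copy", 4 } };
static const struct name_type demo_funcs[] = { { "main", 8 } };
static const struct symtables demo_st = { demo_objts, 1, demo_funcs, 1 };
```

Keeping the two tables separate is what stops a data object from showing up in a function lookup, the bug mentioned in the commit message above.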
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading in the ELF symbol hashes would probably
take more time than using them saves, and it would add a lot of
complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
}
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
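Reading the new format is just an array index followed by a kind check. The sketch below is hypothetical. It assumes the section has already been mapped as an array of uint32_t and that a separate table gives each type's kind; the enum values and names are illustrative, not libctf's. It also shows why deduplication falls out naturally: two symbols with identical prototypes can share one type ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind codes; the real CTF_K_* values live in ctf.h.  */
enum kind { K_UNKNOWN = 0, K_INT, K_POINTER, K_FUNCTION };

struct funcinfo_view
{
  const uint32_t *funcinfo;   /* One type ID per symbol slot; 0 = pad.  */
  size_t nslots;
  const enum kind *kinds;     /* kinds[id] for each type ID (index 0 unused).  */
  size_t ntypes;              /* Number of entries in KINDS.  */
};

/* Type ID of the prototype of function symbol SYMIDX, or 0 if the slot is
   out of range, a pad, or (in a corrupt dict) not a function type.  */
uint32_t
func_type_of (const struct funcinfo_view *v, size_t symidx)
{
  uint32_t id;

  if (symidx >= v->nslots)
    return 0;
  id = v->funcinfo[symidx];
  if (id == 0 || id >= v->ntypes || v->kinds[id] != K_FUNCTION)
    return 0;
  return id;
}

/* Demo: slots 2 and 3 are two functions with the same prototype, so both
   refer to the single deduplicated type 3.  */
static const enum kind demo_kinds[] = { K_UNKNOWN, K_INT, K_FUNCTION, K_FUNCTION };
static const uint32_t demo_funcinfo[] = { 2, 0, 3, 3, 1 };
static const struct funcinfo_view demo_view = { demo_funcinfo, 5, demo_kinds, 4 };
```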
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
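The compatibility scheme relies on old readers rejecting flag bits they do not recognize, as the text implies. A minimal sketch of both sides of that handshake follows. The flag values and names are hypothetical placeholders, not the real ctf.h definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real CTF_F_* values are in ctf.h.  */
#define HYP_F_OLDFLAG     0x1u  /* Some flag that predates NEWFUNCINFO.  */
#define HYP_F_NEWFUNCINFO 0x2u
#define HYP_F_KNOWN_OLD   HYP_F_OLDFLAG
#define HYP_F_KNOWN_NEW   (HYP_F_OLDFLAG | HYP_F_NEWFUNCINFO)

/* An old reader rejects any flag bit it does not recognize.  */
int
old_reader_accepts (uint32_t flags)
{
  return (flags & ~HYP_F_KNOWN_OLD) == 0;
}

enum funcinfo_action { FI_REJECT, FI_IGNORE, FI_USE };

/* A new reader accepts dicts with or without the flag, but only trusts
   the function info section when the flag says it is in the new format.  */
enum funcinfo_action
new_reader_funcinfo_action (uint32_t flags)
{
  if ((flags & ~HYP_F_KNOWN_NEW) != 0)
    return FI_REJECT;
  return (flags & HYP_F_NEWFUNCINFO) ? FI_USE : FI_IGNORE;
}
```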
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
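A lookup in a name-sorted index section comes down to a binary search. The sketch below is hypothetical: it models the index and the symtypetab as two parallel in-memory arrays, with names already resolved to strings, and it uses the standard C `bsearch`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of an indexed symtypetab: entry i says that the
   symbol called names[i] has type types[i]; entries are sorted by name.  */
struct indexed_section
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

/* bsearch comparator: KEY is the name sought, ELEM points at one slot of
   the NAMES array.  */
static int
compare_name (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Type of the symbol called NAME, or 0 if it has no entry.  A caller
   holding only a symbol index would first map it to a name via the
   symbol table.  */
uint32_t
indexed_lookup (const struct indexed_section *s, const char *name)
{
  const char *const *hit = bsearch (name, s->names, s->n,
                                    sizeof (s->names[0]), compare_name);

  return hit == NULL ? 0 : s->types[hit - s->names];
}

static const char *const demo_idx_names[] = { "alpha", "beta", "gamma" };
static const uint32_t demo_idx_types[] = { 4, 7, 2 };
static const struct indexed_section demo_idx = { demo_idx_names, demo_idx_types, 3 };
```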
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
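An index section whose header lacks the sorted flag can be handled with a sort-once-then-bsearch sketch. The names are hypothetical and the entry layout is simplified: each name sits next to its type instead of living in two parallel sections.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct idx_entry
{
  const char *name;
  uint32_t type;
};

static int
cmp_entry (const void *a, const void *b)
{
  return strcmp (((const struct idx_entry *) a)->name,
                 ((const struct idx_entry *) b)->name);
}

/* Look NAME up in an index section.  *SORTED starts out as the header's
   "index is sorted" flag (CTF_F_IDXSORTED in the real format); if it is
   clear, sort once, remember that, and use binary search from then on.
   This sketch sorts in place; code working on a read-only file buffer
   would sort a private copy or an array of indexes instead.  */
uint32_t
index_lookup (struct idx_entry *entries, size_t n, int *sorted,
              const char *name)
{
  struct idx_entry key = { name, 0 };
  const struct idx_entry *hit;

  if (!*sorted)
    {
      qsort (entries, n, sizeof (entries[0]), cmp_entry);
      *sorted = 1;
    }
  hit = bsearch (&key, entries, n, sizeof (entries[0]), cmp_entry);
  return hit == NULL ? 0 : hit->type;
}

/* Demo: an index as a compiler might emit it, in no particular order.  */
static struct idx_entry demo_unsorted[] = { { "zeta", 5 }, { "alpha", 1 }, { "mid", 3 } };
static int demo_sorted = 0;
```

Sorting lazily means dicts that are never queried by name never pay for the sort.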
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
  err = ECTF_NOSYMTAB;
2024-04-02 23:06:50 +08:00
  if (sp->cts_data == NULL && symname == NULL &&
      ((is_function && !fp->ctf_funcidx_names) ||
       (!is_function && !fp->ctf_objtidx_names)))
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
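A minimal C sketch of what this layout means for a reader. It assumes a hypothetical in-memory view (symtypetab_view and symtypetab_type are illustrative names, not libctf API) and assumes that a zero entry marks a pad with no recorded type: looking up a symbol's type becomes a bounds-checked array index, and the same routine serves both the function info and the data object sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an unindexed function-info or data-object section:
   one uint32_t type ID per symbol, in symbol-table order.  For functions,
   each nonzero ID names a CTF_K_FUNCTION type in the type section.  This
   sketch assumes 0 is a pad meaning "no type recorded".  */
typedef struct
{
  const uint32_t *types;	/* Section contents.  */
  size_t nsyms;			/* Number of entries.  */
} symtypetab_view;

/* Return the type ID recorded for symbol SYMIDX, or 0 if the index is
   out of range or the entry is a pad.  */
uint32_t
symtypetab_type (const symtypetab_view *tab, size_t symidx)
{
  if (symidx >= tab->nsyms)
    return 0;
  return tab->types[symidx];
}
```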
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
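A sketch of the reader-side decision just described. The flag values below are placeholders chosen for illustration (check include/ctf.h for the real CTF_F_* values), and modelling old libctf's refusal as "reject any header flag bit the reader does not understand" is an assumption; an old reader is imitated by passing a narrower KNOWN_FLAGS mask.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for illustration only; see include/ctf.h.  */
#define SKETCH_F_COMPRESS    0x1
#define SKETCH_F_NEWFUNCINFO 0x2
#define SKETCH_F_IDXSORTED   0x4

enum funcinfo_action
{
  FUNCINFO_REJECT_DICT,		/* Unknown header flags: refuse to open.  */
  FUNCINFO_IGNORE,		/* Old-format section: disregard it.  */
  FUNCINFO_USE			/* New format: an array of type IDs.  */
};

/* Decide how a reader that understands only the bits in KNOWN_FLAGS
   treats the function info section of a dict whose header has FLAGS.  */
enum funcinfo_action
classify_funcinfo (uint32_t flags, uint32_t known_flags)
{
  if (flags & ~known_flags)
    return FUNCINFO_REJECT_DICT;
  if (!(flags & SKETCH_F_NEWFUNCINFO))
    return FUNCINFO_IGNORE;
  return FUNCINFO_USE;
}
```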
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
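To make the split between the two tables concrete, here is a much-simplified model in C. sketch_dict, sketch_add_func_sym and sketch_add_objt_sym are hypothetical stand-ins for the real dict and functions, fixed-size arrays replace the dynhashes, and the caller passes the type's kind directly where the real code would derive it from the type ID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_kind { SK_INTEGER, SK_POINTER, SK_FUNCTION };

#define SKETCH_MAX_SYMS 16

typedef struct
{
  const char *names[SKETCH_MAX_SYMS];
  unsigned long types[SKETCH_MAX_SYMS];
  size_t n;
} sym_table;

typedef struct
{
  sym_table funcs;		/* Plays the role of ctf_funchash.  */
  sym_table objts;		/* Plays the role of ctf_objthash.  */
} sketch_dict;

/* Append NAME -> TYPE to T.  This sketch rejects duplicate names and
   returns -1 when the table is full.  */
static int
table_add (sym_table *t, const char *name, unsigned long type)
{
  size_t i;

  for (i = 0; i < t->n; i++)
    if (strcmp (t->names[i], name) == 0)
      return -1;
  if (t->n == SKETCH_MAX_SYMS)
    return -1;
  t->names[t->n] = name;
  t->types[t->n] = type;
  t->n++;
  return 0;
}

/* Function symbols must have a function type, not a function pointer.  */
int
sketch_add_func_sym (sketch_dict *d, const char *name, unsigned long type,
		     enum sketch_kind kind)
{
  if (kind != SK_FUNCTION)
    return -1;
  return table_add (&d->funcs, name, type);
}

/* Object symbols may have a type of any kind.  */
int
sketch_add_objt_sym (sketch_dict *d, const char *name, unsigned long type)
{
  return table_add (&d->objts, name, type);
}
```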
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
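The serialization-time filter can be pictured as below: a hypothetical helper (not the real symtypetab_delete_nonstatic_vars) that compacts an array of variable names in place, dropping any that also appear as data-object symbols. It uses a nested linear scan for brevity where a hash lookup would be the natural choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keep only the variables in VARS whose names do not also appear in
   OBJTS, preserving their order; return the new variable count.  */
size_t
drop_vars_shadowed_by_objts (const char **vars, size_t nvars,
			     const char *const *objts, size_t nobjts)
{
  size_t i, j, kept = 0;

  for (i = 0; i < nvars; i++)
    {
      int shadowed = 0;

      for (j = 0; j < nobjts && !shadowed; j++)
	shadowed = strcmp (vars[i], objts[j]) == 0;
      if (!shadowed)
	vars[kept++] = vars[i];
    }
  return kept;
}
```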
CTF symbol type sections with enough pads, as defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict rather than in this specific dict), are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
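Once an indexed section is sorted by name, finding a symbol's type is a binary search over the name table. A minimal sketch, assuming plain C strings where the real index stores string-table offsets, and using hypothetical names (sym_index, index_lookup) rather than libctf's ctf_lookup_idx_name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* NAMES[i] is the symbol name for TYPES[i]; entries are sorted by name.  */
typedef struct
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} sym_index;

/* bsearch comparator: KEY is the name sought, ELEM points into NAMES.  */
static int
compare_name (const void *key, const void *elem)
{
  const char *name = key;
  const char *const *entry = elem;

  return strcmp (name, *entry);
}

/* Return the type ID recorded for NAME, or 0 if NAME is not indexed.  */
uint32_t
index_lookup (const sym_index *idx, const char *name)
{
  const char *const *hit;

  hit = bsearch (name, idx->names, idx->n, sizeof (idx->names[0]),
		 compare_name);
  if (hit == NULL)
    return 0;
  return idx->types[hit - idx->names];
}
```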
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
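Sorting an unsorted index must keep each type attached to its name. One simple way, sketched here with hypothetical names and assuming the index has already been decoded into parallel arrays of names and type IDs, is to pack the pairs into structs, qsort them, and unpack; a reader would only need to do this when CTF_F_IDXSORTED is absent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
  const char *name;
  uint32_t type;
} idx_entry;

static int
compare_entry (const void *a, const void *b)
{
  const idx_entry *ea = a, *eb = b;

  return strcmp (ea->name, eb->name);
}

/* Sort parallel NAMES/TYPES arrays of length N by name, keeping each type
   with its name.  Returns 0 on success, -1 on allocation failure.  */
int
sort_index_by_name (const char **names, uint32_t *types, size_t n)
{
  idx_entry *tmp;
  size_t i;

  if (n == 0)
    return 0;
  if ((tmp = malloc (n * sizeof (*tmp))) == NULL)
    return -1;
  for (i = 0; i < n; i++)
    {
      tmp[i].name = names[i];
      tmp[i].type = types[i];
    }
  qsort (tmp, n, sizeof (*tmp), compare_entry);
  for (i = 0; i < n; i++)
    {
      names[i] = tmp[i].name;
      types[i] = tmp[i].type;
    }
  free (tmp);
  return 0;
}
```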
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
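Needing no sorting follows from simply walking the section in order and skipping pads. A minimal cursor-style sketch: symtab_cursor and symtab_next are hypothetical, the real ctf_symbol_next uses a ctf_next_t and also returns symbol names (omitted here), and a zero entry is again assumed to be a pad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callers zero the cursor, then call symtab_next until it returns 0.  */
typedef struct
{
  size_t pos;
} symtab_cursor;

/* Return the next non-pad type in TYPES[0..N), storing its symbol index
   in *SYMIDX; return 0 once the section is exhausted.  */
uint32_t
symtab_next (const uint32_t *types, size_t n, symtab_cursor *it,
	     size_t *symidx)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;

      if (types[i] != 0)
	{
	  *symidx = i;
	  return types[i];
	}
    }
  return 0;
}
```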
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
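The key change is when the offset -> string mapping becomes queryable. Below is a hypothetical sketch (ext_strtab, ext_add and ext_strptr are illustrative, with a fixed-size table and linear lookup) that records the mapping as soon as a string is added, so an offset can be resolved long before any strtab is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXT 32

/* Each entry records that STRS[i] will live at OFFSETS[i] in an external
   strtab which has not been written out yet.  */
typedef struct
{
  uint32_t offsets[MAX_EXT];
  const char *strs[MAX_EXT];
  size_t n;
} ext_strtab;

/* Record the mapping at string-addition time.  Returns -1 if full.  */
int
ext_add (ext_strtab *t, const char *str, uint32_t offset)
{
  if (t->n == MAX_EXT)
    return -1;
  t->offsets[t->n] = offset;
  t->strs[t->n] = str;
  t->n++;
  return 0;
}

/* Resolve an external offset back to its string, or NULL if unknown.  */
const char *
ext_strptr (const ext_strtab *t, uint32_t offset)
{
  size_t i;

  for (i = 0; i < t->n; i++)
    if (t->offsets[i] == offset)
      return t->strs[i];
  return NULL;
}
```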
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
goto try_parent;
2019-04-24 18:15:33 +08:00
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the two have been represented identically for a few years, the
only difference being that types read in via ctf_open are stored together
in one big buffer while newly-added types are kept in per-type hashtables.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
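Under this scheme the read-only check is a comparison against the number of static types. A minimal sketch, assuming a single dict with type IDs numbered from 1 and ignoring the parent/child ID split; type_ranges and may_modify_type are hypothetical names, and SKETCH_ERR_RDONLY merely plays the role of ECTF_RDONLY.

```c
#include <assert.h>

/* IDs 1..NSTATIC were read in with the dict and are frozen; IDs above
   that were added afterwards and may still be changed.  */
typedef struct
{
  unsigned long nstatic;	/* Analogous to ctf_stypes.  */
  unsigned long ntypes;		/* Total types, static plus dynamic.  */
} type_ranges;

enum { SKETCH_OK = 0, SKETCH_ERR_RDONLY = -1, SKETCH_ERR_BADID = -2 };

/* May type ID be modified (members added, array info set, and so on)?  */
int
may_modify_type (const type_ranges *r, unsigned long id)
{
  if (id == 0 || id > r->ntypes)
    return SKETCH_ERR_BADID;
  if (id <= r->nstatic)
    return SKETCH_ERR_RDONLY;
  return SKETCH_OK;
}
```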
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
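The name-collision rule in the second point above can be sketched as a check against the statically-read names only, so that repeated names among newly-added types keep working as before. This is a hypothetical helper; it ignores the separate struct/union/enum namespaces that a real dict keeps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if a type named NAME may be added, 0 if NAME is already used
   by a type in the read-in (static) portion of the dict.  Names of types
   added since opening are deliberately not consulted.  */
int
may_add_named_type (const char *const *static_names, size_t nstatic,
		    const char *name)
{
  size_t i;

  if (name == NULL || name[0] == '\0')
    return 1;			/* Anonymous types never collide.  */
  for (i = 0; i < nstatic; i++)
    if (strcmp (static_names[i], name) == 0)
      return 0;
  return 1;
}
```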
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
/* This covers both out-of-range lookups by index and a dynamic dict which
hasn't been shuffled yet. */
2020-11-20 21:34:04 +08:00
err = EINVAL;
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
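The wrapper structure can be sketched as one internal worker that accepts either a symbol index or a name, with two thin wrappers on top. All names below are hypothetical, and the name -> index step is a plain linear search; in the real code the conversion also runs the other way, index -> name, for indexed sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Symbol i is named SYMNAMES[i] and has type SYMTYPES[i] (0: no type).  */
typedef struct
{
  const char *const *symnames;
  const uint32_t *symtypes;
  size_t nsyms;
} sketch_symtab;

/* Shared worker: look up by SYMIDX if NAME is NULL, otherwise by NAME.  */
static uint32_t
lookup_sym_or_name (const sketch_symtab *st, size_t symidx, const char *name)
{
  if (name != NULL)
    {
      /* Turn the name into an index; leaves symidx == nsyms on failure.  */
      for (symidx = 0; symidx < st->nsyms; symidx++)
	if (strcmp (st->symnames[symidx], name) == 0)
	  break;
    }
  if (symidx >= st->nsyms)
    return 0;
  return st->symtypes[symidx];
}

/* Thin wrappers in the spirit of ctf_lookup_by_symbol and
   ctf_lookup_by_symbol_name.  */
uint32_t
sketch_lookup_by_symbol (const sketch_symtab *st, size_t symidx)
{
  return lookup_sym_or_name (st, symidx, NULL);
}

uint32_t
sketch_lookup_by_symbol_name (const sketch_symtab *st, const char *name)
{
  return lookup_sym_or_name (st, 0, name);
}
```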
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
anticipate this lookup being used many orders of magnitude less heavily
than ld.so symbol lookup, so reading the ELF symbol hashes would probably
cost more time than using them saves, and would add a lot of
complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
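The cached linear search can be sketched with a small open-addressed hash table and a resume point: every name passed during a scan is cached, and the next cache miss continues from where the previous scan stopped, so the symbol table is walked at most once. symidx_cache and lookup_symbol_idx are hypothetical names; the resume point is an assumption loosely modelled on the ctf_symhash_latest field named in the ChangeLog, and the fixed table size (which must exceed the symbol count) stands in for a growable dynhash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 64		/* Must exceed the number of symbols.  */

typedef struct
{
  const char *const *symnames;	/* Symbol table being searched.  */
  size_t nsyms;
  size_t latest;		/* Next symbol not yet scanned.  */
  const char *keys[CACHE_SLOTS];	/* Open-addressed name -> index cache.  */
  size_t vals[CACHE_SLOTS];
} symidx_cache;

static size_t
slot_for (const char *s)
{
  unsigned long h = 5381;	/* djb2 string hash.  */

  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % CACHE_SLOTS;
}

/* Cache NAME -> IDX; if NAME is already cached, keep the earlier index.  */
static void
cache_put (symidx_cache *c, const char *name, size_t idx)
{
  size_t s = slot_for (name);

  while (c->keys[s] != NULL && strcmp (c->keys[s], name) != 0)
    s = (s + 1) % CACHE_SLOTS;
  if (c->keys[s] == NULL)
    {
      c->keys[s] = name;
      c->vals[s] = idx;
    }
}

static int
cache_get (const symidx_cache *c, const char *name, size_t *idx)
{
  size_t s = slot_for (name);

  for (; c->keys[s] != NULL; s = (s + 1) % CACHE_SLOTS)
    if (strcmp (c->keys[s], name) == 0)
      {
	*idx = c->vals[s];
	return 1;
      }
  return 0;
}

/* Find NAME's symbol index: try the cache, else resume the linear scan,
   caching every name passed.  Returns -1 if NAME is not present.  */
long
lookup_symbol_idx (symidx_cache *c, const char *name)
{
  size_t idx;

  if (cache_get (c, name, &idx))
    return (long) idx;
  while (c->latest < c->nsyms)
    {
      size_t i = c->latest++;

      cache_put (c, c->symnames[i], i);
      if (strcmp (c->symnames[i], name) == 0)
	return (long) i;
    }
  return -1;
}
```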
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t * it
was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
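The archive-level memo can be sketched as follows: it remembers which dict a name was found in, not the type itself, so a repeat lookup searches only that one dict. mini_archive and arc_lookup_symbol_name are hypothetical, fixed-size arrays stand in for the ctfi_symnamedicts hash, 0 stands for "not found", and dict_searches exists only so that the caching can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MEMO 16

typedef struct
{
  const char *const *names;
  const unsigned long *types;
  size_t n;
} mini_dict;

typedef struct
{
  const mini_dict *dicts;
  size_t ndicts;
  const char *memo_names[MAX_MEMO];	/* Symbol name ...  */
  size_t memo_dict[MAX_MEMO];		/* ... -> index of the owning dict.  */
  size_t nmemo;
  size_t dict_searches;			/* Per-dict searches performed.  */
} mini_archive;

static long
dict_find (const mini_dict *d, const char *name)
{
  size_t i;

  for (i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return (long) i;
  return -1;
}

unsigned long
arc_lookup_symbol_name (mini_archive *a, const char *name)
{
  size_t i;

  /* Memo hit: search only the dict that answered last time.  */
  for (i = 0; i < a->nmemo; i++)
    if (strcmp (a->memo_names[i], name) == 0)
      {
	const mini_dict *d = &a->dicts[a->memo_dict[i]];

	a->dict_searches++;
	return d->types[dict_find (d, name)];
      }

  /* Miss: try each dict in turn, memoizing the first that has NAME.  */
  for (i = 0; i < a->ndicts; i++)
    {
      const mini_dict *d = &a->dicts[i];
      long j;

      a->dict_searches++;
      if ((j = dict_find (d, name)) < 0)
	continue;
      if (a->nmemo < MAX_MEMO)
	{
	  a->memo_names[a->nmemo] = d->names[j];	/* Dict-owned string.  */
	  a->memo_dict[a->nmemo++] = i;
	}
      return d->types[j];
    }
  return 0;
}
```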
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
if (symname == NULL && symidx >= fp->ctf_nsyms)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new; the read-side code was
significantly changed, and support was added for indexed tables
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
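To make the new layout concrete, here is a hedged sketch of reading an unindexed function-info (or data-object) section as the flat uint32_t array described above. The translation table from symbol-table index to section slot is a hypothetical stand-in for the sxlate tables libctf builds by scanning the symtab, and treating type ID 0 as "no type recorded" (a pad) is an assumption of this sketch. The requirement that function entries be of kind CTF_K_FUNCTION is not checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* SECTION: NENTRIES type IDs, one per symbol slot.
   SXLATE: NSYMS entries mapping symtab index -> section slot, or -1 for
   symbols of the wrong kind.  Returns 0 when no type is recorded.  */
static uint32_t
ex_symtypetab_type (const uint32_t *section, size_t nentries,
                    const long *sxlate, size_t nsyms, size_t symidx)
{
  if (symidx >= nsyms || sxlate[symidx] < 0
      || (size_t) sxlate[symidx] >= nentries)
    return 0;
  return section[sxlate[symidx]];
}
```

Because function and object sections now share this shape, one reader can serve both.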
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
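The compatibility story for CTF_F_NEWFUNCINFO can be modelled as two checks: a reader rejects dicts carrying flags it does not know, and a new reader ignores the function info section unless the flag is set. This is a sketch only. The flag values below are assumptions for illustration (the authoritative CTF_F_* definitions live in include/ctf.h), and expressing "unknown flag" as a mask test is a simplification of however libctf actually validates the header.

```c
#include <stdint.h>

/* Assumed values for this sketch; see CTF_F_* in include/ctf.h.  */
#define EX_F_COMPRESS     0x1
#define EX_F_NEWFUNCINFO  0x2

/* Flags understood by a hypothetical old and new reader.  */
#define EX_OLD_KNOWN  (EX_F_COMPRESS)
#define EX_NEW_KNOWN  (EX_F_COMPRESS | EX_F_NEWFUNCINFO)

/* A reader refuses any dict carrying flags it does not know about.  */
static int
ex_can_open (uint8_t flags, uint8_t known)
{
  return (flags & ~known) == 0;
}

/* A new reader only trusts the function info section when the
   new-format flag is present; otherwise the section is disregarded.  */
static int
ex_use_funcinfo (uint8_t flags)
{
  return (flags & EX_F_NEWFUNCINFO) != 0;
}
```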
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
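The split described above, where symbols are added by name and only later tied to symbol indexes by the linker, can be sketched like this. It is a simplified model, not ctf_link_shuffle_syms itself: the types and function are hypothetical, the matching is a nested loop rather than a hash lookup, and the output array and returned maximum only loosely mirror ctf_dynsymidx and ctf_dynsymmax.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
  const char *name;
  unsigned long type;           /* CTF type ID added by name */
} ex_sym_t;

typedef struct
{
  const char *name;
  size_t symidx;                /* index reported by the linker */
} ex_linker_sym_t;

/* Fill DYNSYMIDX (NSLOTS entries, zeroed by the caller) so that
   dynsymidx[symidx] points at the named symbol.  Linker symbols with no
   CTF type are skipped.  Returns the highest symidx mapped, or -1.  */
static long
ex_shuffle_syms (const ex_sym_t *syms, size_t nsyms,
                 const ex_linker_sym_t *lsyms, size_t nlsyms,
                 const ex_sym_t **dynsymidx, size_t nslots)
{
  long max = -1;
  for (size_t i = 0; i < nlsyms; i++)
    for (size_t j = 0; j < nsyms; j++)
      if (strcmp (lsyms[i].name, syms[j].name) == 0
          && lsyms[i].symidx < nslots)
        {
          dynsymidx[lsyms[i].symidx] = &syms[j];
          if ((long) lsyms[i].symidx > max)
            max = (long) lsyms[i].symidx;
          break;
        }
  return max;
}
```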
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
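The serialization-time rule above is essentially an in-place filter. Here is a hedged sketch in the spirit of symtypetab_delete_nonstatic_vars, with variables and data-object symbols reduced to plain name arrays (a simplification; the function name is hypothetical).

```c
#include <stddef.h>
#include <string.h>

/* Drop every variable whose name also appears among the data-object
   symbols, compacting VARS in place; return the number kept.  */
static size_t
ex_drop_vars_with_symbols (const char **vars, size_t nvars,
                           const char **objt_syms, size_t nobjt)
{
  size_t kept = 0;
  for (size_t i = 0; i < nvars; i++)
    {
      int is_sym = 0;
      for (size_t j = 0; j < nobjt && !is_sym; j++)
        is_sym = strcmp (vars[i], objt_syms[j]) == 0;
      if (!is_sym)
        vars[kept++] = vars[i];
    }
  return kept;
}
```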
CTF symbol type sections which have enough pads, as defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in an archive where most types are
defined in some child or parent dict, not in this specific dict), are
sorted by name rather than symidx and accompanied by an index which
associates each symbol type entry with a name: the existing
ctf_lookup_by_symbol will map symbol indexes to symbol names and look the
names up in the index automatically. (This is currently
ELF-symbol-table-dependent, but there is almost nothing specific to ELF
in here, so support for other symbol table formats can be added easily.)
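A sketch of the indexed lookup just described: once a symbol index has been mapped to its name, the name is binary-searched in an index sorted by name, and the type is read from the parallel type array. In the file format the index holds string-table offsets. Plain string pointers are an assumption here to keep the example self-contained, and the function names are hypothetical (libctf's own helpers are ctf_lookup_idx_name and ctf_try_lookup_indexed).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: KEY is the symbol name, ELT points at an index
   entry (a const char *).  */
static int
ex_cmp_name (const void *key, const void *elt)
{
  return strcmp ((const char *) key, *(const char *const *) elt);
}

/* NAMES is sorted; TYPES[i] is the type of NAMES[i].  Returns 0 if
   SYMNAME is absent.  */
static uint32_t
ex_lookup_indexed (const char *const *names, const uint32_t *types,
                   size_t n, const char *symname)
{
  const char *const *hit = bsearch (symname, names, n, sizeof (names[0]),
                                    ex_cmp_name);
  return hit ? types[hit - names] : 0;
}
```

Pads cost nothing in this layout, since symbols without a type simply have no index entry.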
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
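Sorting an unsorted compiler-generated index can be sketched as a one-time qsort at open time, skipped when the header claims the section is already sorted. The flag value is an assumption for illustration (see CTF_F_IDXSORTED in include/ctf.h). Pairing each name with its type in one struct is also a simplification: libctf sorts a separate symidx array instead (sort_symidx_by_name).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_F_IDXSORTED 0x4      /* assumed value; see ctf.h */

typedef struct
{
  const char *name;
  uint32_t type;
} ex_idx_entry_t;

static int
ex_cmp_entry (const void *a, const void *b)
{
  return strcmp (((const ex_idx_entry_t *) a)->name,
                 ((const ex_idx_entry_t *) b)->name);
}

/* Sort ENTRIES by name unless HDR_FLAGS says they already are, so later
   lookups can binary-search.  Returns 1 if a sort was performed.  */
static int
ex_ensure_sorted (ex_idx_entry_t *entries, size_t n, uint32_t hdr_flags)
{
  if (hdr_flags & EX_F_IDXSORTED)
    return 0;
  qsort (entries, n, sizeof (entries[0]), ex_cmp_entry);
  return 1;
}
```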
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
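Why iteration needs no sorting can be seen in a minimal iterator over an unindexed section: it just walks the entries in section order and skips pads. The state struct and function are hypothetical and only loosely echo the ctf_next_t pattern. The real ctf_symbol_next has a different signature and signals the end of iteration with an error code rather than a 0 return.

```c
#include <stddef.h>
#include <stdint.h>

/* Caller zeroes this, then calls ex_symbol_next until it returns 0.  */
typedef struct
{
  size_t pos;
} ex_sym_iter_t;

/* Return the next non-pad type (pads are 0) and store its symbol name
   in *NAME; return 0 when the section is exhausted.  */
static uint32_t
ex_symbol_next (ex_sym_iter_t *it, const uint32_t *types,
                const char *const *names, size_t n, const char **name)
{
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *name = names[i];
          return types[i];
        }
    }
  return 0;
}
```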
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
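The ChangeLog notes that symbol lookups now check the parent if a child lookup fails, which is what the `goto try_parent` line in the code below implements. A hedged sketch of that control flow, with a hypothetical dict holding symbol types in a flat array indexed by symbol number (0 meaning "no type here") and an optional parent pointer. The real function also takes a flag controlling whether the parent is consulted at all; that flag is omitted here.

```c
#include <stddef.h>

typedef struct ex_pdict
{
  const unsigned long *symtypes;
  size_t nsyms;
  const struct ex_pdict *parent;
} ex_pdict_t;

/* On any failure (out of range, or no type recorded) retry in the
   parent; return 0 if no dict in the chain has a type.  */
static unsigned long
ex_lookup_by_symbol (const ex_pdict_t *fp, size_t symidx)
{
  if (symidx >= fp->nsyms || fp->symtypes[symidx] == 0)
    goto try_parent;
  return fp->symtypes[symidx];

 try_parent:
  if (fp->parent != NULL)
    return ex_lookup_by_symbol (fp->parent, symidx);
  return 0;
}
```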
2020-11-20 21:34:04 +08:00
goto try_parent;
2019-04-24 18:15:33 +08:00
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the two representations have been identical for a few years; the
only difference is that types read in via ctf_open are stored together
in one big buffer, while newly-added types are stored in per-type
hashtables.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
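The "read-only range of types" rule can be sketched in a few lines, assuming (as the ctf_stypes field suggests) that type IDs are numbered from 1 and the first ctf_stypes of them came from the opened file. The struct, helpers and error constant are hypothetical stand-ins; EX_ERR_RDONLY plays the role of ECTF_RDONLY.

```c
#define EX_ERR_RDONLY (-1)      /* stand-in for ECTF_RDONLY */

typedef struct
{
  unsigned long stypes;         /* types present when the dict was opened */
  unsigned long ntypes;         /* static plus newly-added types */
} ex_dict_t;

static int
ex_is_static_type (const ex_dict_t *fp, unsigned long type)
{
  return type >= 1 && type <= fp->stypes;
}

/* Operations that mutate an existing type (adding a struct member,
   setting array info, ...) check the range first.  */
static int
ex_check_modifiable (const ex_dict_t *fp, unsigned long type)
{
  return ex_is_static_type (fp, type) ? EX_ERR_RDONLY : 0;
}

/* Adding a brand-new type is allowed and lands after the static range.  */
static unsigned long
ex_add_type (ex_dict_t *fp)
{
  return ++fp->ntypes;
}
```

New types may still point at static ones. Only modifying the static types themselves is refused.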
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
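The last fix in the list above, keeping function and object symbols apart, amounts to having one cache per symbol kind, mirroring the split of ctf_symhash into ctf_symhash_func and ctf_symhash_objt. A hedged sketch with small linear arrays standing in for the real hash tables (all names hypothetical):

```c
#include <stddef.h>
#include <string.h>

#define EX_CACHE_MAX 16

typedef struct
{
  const char *name[EX_CACHE_MAX];
  long idx[EX_CACHE_MAX];
  size_t n;
} ex_kind_cache_t;

typedef struct
{
  ex_kind_cache_t func, objt;
} ex_sym_caches_t;

static void
ex_kind_cache_add (ex_sym_caches_t *c, int is_function,
                   const char *name, long idx)
{
  ex_kind_cache_t *k = is_function ? &c->func : &c->objt;
  if (k->n < EX_CACHE_MAX)
    {
      k->name[k->n] = name;
      k->idx[k->n++] = idx;
    }
}

/* Only the cache of the requested kind is consulted, so a remembered
   object symbol can never satisfy a function lookup.  */
static long
ex_kind_cache_find (const ex_sym_caches_t *c, int is_function,
                    const char *name)
{
  const ex_kind_cache_t *k = is_function ? &c->func : &c->objt;
  for (size_t i = 0; i < k->n; i++)
    if (strcmp (k->name[i], name) == 0)
      return k->idx[i];
  return -1;
}
```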
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
/* Try an indexed lookup. */
if (fp->ctf_objtidx_names && is_function != 1)
2019-04-24 18:15:33 +08:00
{
2021-02-17 23:21:12 +08:00
if ((type = ctf_try_lookup_indexed (fp, symidx, symname, 0)) == CTF_ERR)
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
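Here is a sketch of the indexed lookup. It assumes the index has been loaded as (name, type) pairs. An unsorted index is sorted with qsort when the sorted flag is absent, and lookups then bsearch by name, in the spirit of ctf_lookup_idx_name. The struct and functions are hypothetical. The commit describes the index as a separate table associating each type entry with a name; pairing them here just keeps qsort simple.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of an indexed symtypetab section: each type
   entry paired with the name of the symbol it describes.  */
struct sketch_idx_entry
{
  const char *name;
  uint32_t type;
};

static int
sketch_idx_cmp (const void *a, const void *b)
{
  const struct sketch_idx_entry *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort the index by name unless SORTED_FLAG says it already is (a
   compiler-generated index need not be).  */
static void
sketch_idx_prepare (struct sketch_idx_entry *idx, size_t n, int sorted_flag)
{
  if (!sorted_flag)
    qsort (idx, n, sizeof (idx[0]), sketch_idx_cmp);
}

/* Binary-search the sorted index for NAME: return its type, or 0 if
   absent.  */
static uint32_t
sketch_idx_lookup (const struct sketch_idx_entry *idx, size_t n,
                   const char *name)
{
  struct sketch_idx_entry key = { name, 0 };
  const struct sketch_idx_entry *hit;

  hit = bsearch (&key, idx, n, sizeof (idx[0]), sketch_idx_cmp);
  return hit ? hit->type : 0;
}
```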
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
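The iteration pattern can be sketched in C with a hypothetical iterator. It allocates its own state on first use and frees it at the end, loosely mirroring the ctf_next_t convention. sketch_symbol_next and its types are illustrative only, and they signal the end by returning 0 rather than through an error code as libctf's iterators do. The iterator visits entries in table order, which is why no sorting is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical symbol-type table: parallel arrays of names and types, with
   a type of 0 marking a pad (a symbol with no recorded type).  */
struct sketch_symtab
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

/* Hypothetical opaque iterator state, allocated on first use.  */
struct sketch_next
{
  size_t pos;
};

/* Return the next typed symbol's type and set *NAME, skipping pads.  At
   the end, free the iterator, reset *IT to NULL and return 0.  (In this
   sketch, allocation failure also returns 0.)  */
static uint32_t
sketch_symbol_next (const struct sketch_symtab *t, struct sketch_next **it,
                    const char **name)
{
  if (*it == NULL && (*it = calloc (1, sizeof (**it))) == NULL)
    return 0;

  while ((*it)->pos < t->n)
    {
      size_t i = (*it)->pos++;
      if (t->types[i] != 0)
        {
          *name = t->names[i];
          return t->types[i];
        }
    }

  free (*it);
  *it = NULL;
  return 0;
}
```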
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
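This sketch shows why recording external strings at addition time helps. If each external string's future strtab offset is remembered as soon as the string is added, a ctf_strptr-like lookup can resolve it immediately, before any strtab has been written. sketch_strptr and its types are hypothetical. In particular, the real format encodes which table an offset refers to in the name reference itself; here that is simplified to an explicit flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: string STR will live at OFFSET in the linker's
   external strtab, which has not been written yet.  */
struct sketch_ext_str
{
  uint32_t offset;
  const char *str;
};

/* Hypothetical string-table view: the dict's own strings plus the external
   mappings recorded so far.  */
struct sketch_strtab
{
  const char *internal;
  size_t internal_len;
  const struct sketch_ext_str *ext;
  size_t n_ext;
};

/* Resolve a string offset.  External offsets are answered from the
   recorded mappings, so names are usable before the external strtab is
   written out.  Returns NULL if the offset cannot be resolved.  */
static const char *
sketch_strptr (const struct sketch_strtab *s, uint32_t offset, int external)
{
  if (!external)
    return offset < s->internal_len ? s->internal + offset : NULL;

  for (size_t i = 0; i < s->n_ext; i++)
    if (s->ext[i].offset == offset)
      return s->ext[i].str;
  return NULL;
}
```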
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
return CTF_ERR; /* errno is set for us. */
|
2019-04-24 18:15:33 +08:00
|
|
|
}
|
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
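The new rule can be sketched as a simple range check. Types with IDs up to the count of static types (compare the ctf_stypes counter mentioned in the ChangeLog) may not be modified. Newly-added types are numbered after them and remain modifiable, and they may freely point at static types. The names and SKETCH_ERR_RDONLY are hypothetical stand-ins for libctf's internals and ECTF_RDONLY.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error code standing in for ECTF_RDONLY.  */
#define SKETCH_ERR_RDONLY (-1)

/* Hypothetical dict: types 1..nstatic were present at ctf_open() time.  */
struct sketch_dict
{
  uint32_t nstatic;
  uint32_t ntypes;   /* Highest type ID allocated so far.  */
};

/* A type is static (read-only) if it was present when the dict was
   opened.  */
static int
sketch_is_static (const struct sketch_dict *d, uint32_t type)
{
  return type >= 1 && type <= d->nstatic;
}

/* Modelled on modifying operations such as adding a struct member: refuse
   static types, allow newly-added ones.  */
static int
sketch_modify_type (const struct sketch_dict *d, uint32_t type)
{
  return sketch_is_static (d, type) ? SKETCH_ERR_RDONLY : 0;
}

/* Adding a type is always allowed: it gets the next ID after everything
   allocated so far, which is necessarily outside the static range.  */
static uint32_t
sketch_add_type (struct sketch_dict *d)
{
  return ++d->ntypes;
}
```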
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
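The name-collision rule in the second item can be sketched as follows. A new type's name is checked only against the types present at ctf_open() time, so repeated additions under one name among newly-added types still succeed, as before. sketch_name_addable is a hypothetical simplification. It ignores C's separate struct/union/enum and ordinary-identifier namespaces, which a real check would have to respect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: names of the types present when the dict was opened.  */
struct sketch_static_names
{
  const char *const *names;
  size_t n;
};

/* Return nonzero if a new type called NAME may be added.  Anonymous types
   are always fine.  Names reused among newly-added types are not checked
   at all (tolerated).  A name colliding with a type in the opened (static)
   portion is refused, since it could disturb existing type
   relationships.  */
static int
sketch_name_addable (const struct sketch_static_names *s, const char *name)
{
  if (name == NULL || name[0] == '\0')
    return 1;
  for (size_t i = 0; i < s->n; i++)
    if (strcmp (s->names[i], name) == 0)
      return 0;
  return 1;
}
```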
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
|
|
|
if (type == 0 && fp->ctf_funcidx_names && is_function != 0)
|
2019-04-24 18:15:33 +08:00
|
|
|
{
|
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
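The refactoring amounts to one worker that accepts either an index or a name, plus two thin wrappers. The names below are hypothetical. They mirror only the shape of ctf_lookup_by_sym_or_name, ctf_lookup_by_symbol and ctf_lookup_by_symbol_name, not their signatures or their handling of indexed sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table with one type per symbol (0 = unknown).  */
struct sketch_syms
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

/* Shared worker: look up by SYMIDX if NAME is NULL, otherwise turn NAME
   into an index first.  Returns 0 if the symbol is not found.  */
static uint32_t
sketch_lookup_sym_or_name (const struct sketch_syms *s, size_t symidx,
                           const char *name)
{
  if (name != NULL)
    for (symidx = 0; symidx < s->n; symidx++)
      if (strcmp (s->names[symidx], name) == 0)
        break;

  if (symidx >= s->n)
    return 0;
  return s->types[symidx];
}

/* Thin wrappers, one per public entry point.  */
static uint32_t
sketch_lookup_by_symbol (const struct sketch_syms *s, size_t symidx)
{
  return sketch_lookup_sym_or_name (s, symidx, NULL);
}

static uint32_t
sketch_lookup_by_symbol_name (const struct sketch_syms *s, const char *name)
{
  return sketch_lookup_sym_or_name (s, 0, name);
}
```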
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so using the ELF symbol hashes would probably cost
more time in reading them than it saves, and would add a lot of
complexity. Instead, do a linear search for the symbol name, caching
all the name -> index mappings as we go, so that future searches are
likely to hit in the cache. To avoid having to repeat this search over
and over in a CTF archive when ctf_arc_lookup_symbol_name is used, have
cached archive lookups (the sort done by ctf_arc_lookup_symbol* and the
ctf_archive_next iterator) pick out the first dict they cache in a given
archive and store it in a new ctf_archive field, ctfi_crossdict_cache.
This can be used to store cross-dictionary cached state that depends on
things like the ELF symbol table rather than the contents of any one
dict. ctf_lookup_symbol_idx then caches its name->index mappings in the
dictionary named in the crossdict cache, if any, so that calls to
ctf_lookup_symbol_idx in other dicts in the same archive benefit from
the previous linear search, and the symtab only needs to be scanned at
most once.
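The cached linear search can be sketched like this. Each call resumes the scan where the previous one stopped and caches every name it passes, so repeated lookups are served from the cache and the symtab is traversed at most once overall. The structure and names are hypothetical, and a flat array replaces the real hashtable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SYMS 64

/* Hypothetical name -> index cache, filled in lazily during scans.  */
struct sketch_symcache
{
  const char *const *symtab;   /* Symbol names, in symbol-index order.  */
  size_t nsyms;
  const char *cached_name[SKETCH_MAX_SYMS];
  size_t cached_idx[SKETCH_MAX_SYMS];
  size_t ncached;
  size_t latest;               /* First symbol not yet scanned.  */
  size_t scanned;              /* Symbols examined in total (illustration).  */
};

/* Return the index of NAME, or (size_t) -1 if absent.  Consult the cache
   first; otherwise continue the linear scan from where the last one
   stopped, caching every name passed on the way.  */
static size_t
sketch_lookup_symbol_idx (struct sketch_symcache *c, const char *name)
{
  /* The flat cache must be able to hold every symbol.  */
  assert (c->nsyms <= SKETCH_MAX_SYMS);

  for (size_t i = 0; i < c->ncached; i++)
    if (strcmp (c->cached_name[i], name) == 0)
      return c->cached_idx[i];

  while (c->latest < c->nsyms)
    {
      size_t idx = c->latest++;
      const char *sym = c->symtab[idx];

      c->scanned++;
      c->cached_name[c->ncached] = sym;
      c->cached_idx[c->ncached] = idx;
      c->ncached++;
      if (strcmp (sym, name) == 0)
        return idx;
    }
  return (size_t) -1;
}
```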
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
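The archive-level cache can be sketched in the same way. It remembers which dict each symbol name was found in, but not the resulting type ID, since asking the right dict again is cheap. All names here are hypothetical. The cache stores the caller's name pointer, so the sketch assumes names outlive the archive; a real implementation would copy them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CACHE 32

/* Hypothetical dict: a tiny name -> type table (0 = not found).  */
struct sketch_adict
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

static uint32_t
sketch_dict_lookup (const struct sketch_adict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return 0;
}

/* Hypothetical archive with a name -> dict cache (cf. ctfi_symnamedicts).  */
struct sketch_archive
{
  const struct sketch_adict *dicts;
  size_t ndicts;
  const char *cache_name[SKETCH_MAX_CACHE];
  size_t cache_dict[SKETCH_MAX_CACHE];
  size_t ncache;
  size_t dicts_searched;   /* For illustration only.  */
};

/* Look NAME up across the archive: on a cache hit, ask only the cached
   dict; otherwise search dicts in order and remember where it was found.  */
static uint32_t
sketch_arc_lookup_symbol_name (struct sketch_archive *a, const char *name)
{
  for (size_t i = 0; i < a->ncache; i++)
    if (strcmp (a->cache_name[i], name) == 0)
      return sketch_dict_lookup (&a->dicts[a->cache_dict[i]], name);

  for (size_t d = 0; d < a->ndicts; d++)
    {
      uint32_t type;

      a->dicts_searched++;
      if ((type = sketch_dict_lookup (&a->dicts[d], name)) != 0)
        {
          if (a->ncache < SKETCH_MAX_CACHE)
            {
              a->cache_name[a->ncache] = name;   /* Not copied: see above.  */
              a->cache_dict[a->ncache] = d;
              a->ncache++;
            }
          return type;
        }
    }
  return 0;
}
```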
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
|
|
|
if ((type = ctf_try_lookup_indexed (fp, symidx, symname, 1)) == CTF_ERR)
|
2020-11-20 21:34:04 +08:00
return CTF_ERR; /* errno is set for us. */
2019-04-24 18:15:33 +08:00
}
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
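A minimal sketch of how a reader might consult an unindexed section in this format, offered as an illustration rather than libctf's code: the section is treated as an array of uint32_t type IDs, and type ID 0 marks a pad (a symbol with no type). For simplicity it assumes slot N belongs to symbol N; libctf itself first translates symbol-table indexes into section slots, since only symbols of the matching kind occupy slots. The helper name symtypetab_lookup_unindexed is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the type of symbol SYMIDX in an unindexed
   symtypetab, modelled here as one uint32_t type ID per slot, with slot
   N assumed to belong to symbol N.  Returns 0 (no type) if SYMIDX is out
   of range or its slot is a pad.  */
static uint32_t
symtypetab_lookup_unindexed (const uint32_t *tab, size_t nentries,
                             size_t symidx)
{
  assert (tab != NULL || nentries == 0);

  if (symidx >= nentries)
    return 0;			/* Past the end: treat as untyped.  */

  return tab[symidx];		/* A 0 here is a pad: no type recorded.  */
}
```

A caller that gets 0 back would presumably fall back to some other strategy, such as asking the parent dict.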
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
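The choice between the two layouts can be sketched as below, assuming the rule is simply "emit an index once the number of pads exceeds a threshold". In libctf this decision involves symtypetab_density and CTF_INDEX_PAD_THRESHOLD; the function names and the pad_threshold parameter here are hypothetical. The two size helpers assume one 32-bit slot per symbol for the unindexed layout, and a 32-bit type ID plus a 32-bit name reference per typed symbol for the indexed one, to show why sparse tables favour an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: decide whether to emit a symtypetab indexed, given the
   number of symbols of the relevant kind (NSYMS), how many of them have
   a type in this dict (NTYPED), and a threshold standing in for
   CTF_INDEX_PAD_THRESHOLD.  */
static bool
symtypetab_should_index (size_t nsyms, size_t ntyped, size_t pad_threshold)
{
  assert (ntyped <= nsyms);
  size_t npads = nsyms - ntyped;	/* Slots that would hold type ID 0.  */
  return npads > pad_threshold;
}

/* Unindexed: one uint32_t slot per symbol, typed or not.  */
static size_t
unindexed_size (size_t nsyms)
{
  return nsyms * sizeof (uint32_t);
}

/* Indexed: a type ID and a name reference per typed symbol only.  */
static size_t
indexed_size (size_t ntyped)
{
  return ntyped * 2 * sizeof (uint32_t);
}
```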
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
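The indexed lookup path described above (sort once if the sorted flag is absent, then binary-search by name) might look roughly like this. It is a sketch under simplifying assumptions: names are plain C strings rather than strtab offsets, the name index and the type array are merged into one array of pairs, and EX_F_IDXSORTED is a made-up flag value standing in for CTF_F_IDXSORTED.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_F_IDXSORTED 0x4	/* Hypothetical flag bit for this sketch.  */

/* One entry of an indexed symtypetab: a symbol name and its type ID.  */
typedef struct
{
  const char *name;
  uint32_t type;
} ex_idx_entry;

static int
ex_entry_cmp (const void *a, const void *b)
{
  const ex_idx_entry *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort the index by name unless FLAGS says it is already sorted (a
   compiler-generated index may not be), then record that it is.  */
static void
ex_idx_ensure_sorted (ex_idx_entry *idx, size_t n, uint32_t *flags)
{
  assert (idx != NULL || n == 0);
  if (!(*flags & EX_F_IDXSORTED))
    {
      qsort (idx, n, sizeof (ex_idx_entry), ex_entry_cmp);
      *flags |= EX_F_IDXSORTED;
    }
}

/* Binary-search a sorted index for NAME; return its type, or 0 if the
   name is absent (the caller may then try the parent dict).  */
static uint32_t
ex_idx_lookup (const ex_idx_entry *idx, size_t n, const char *name)
{
  ex_idx_entry key = { name, 0 };
  const ex_idx_entry *hit = bsearch (&key, idx, n, sizeof (ex_idx_entry),
                                     ex_entry_cmp);
  return hit ? hit->type : 0;
}
```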
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
if (type != 0)
return type;
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
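A minimal sketch of the read-only range, assuming (as the ctf_stypes entry in the ChangeLog below suggests) that a dict records how many types existed when it was opened, and that type IDs are numbered from 1. All names are hypothetical; EX_RDONLY stands in for ECTF_RDONLY, and parent/child type ID encoding is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for success, ECTF_RDONLY and a bad-ID error.  */
enum { EX_OK = 0, EX_RDONLY = -1, EX_BADID = -2 };

/* Hypothetical dict: only the bookkeeping this sketch needs.  */
typedef struct
{
  uint32_t nstypes;	/* Types present when the dict was opened.  */
  uint32_t ntypes;	/* All types, including ones added since.  */
} ex_dict;

/* A type is static (read-only) if it was part of the opened dict.  */
static bool
ex_static_type (const ex_dict *d, uint32_t id)
{
  return id >= 1 && id <= d->nstypes;
}

/* Modifying a type (say, adding a struct member) is refused for static
   types but allowed for types added after opening.  */
static int
ex_modify_type (const ex_dict *d, uint32_t id)
{
  if (id == 0 || id > d->ntypes)
    return EX_BADID;
  if (ex_static_type (d, id))
    return EX_RDONLY;
  return EX_OK;
}

/* New types are always appended after the static range.  */
static uint32_t
ex_add_type (ex_dict *d)
{
  assert (d->ntypes < UINT32_MAX);
  return ++d->ntypes;
}
```

New types may still refer to static ones (for example, a new pointer to an existing struct); only changes to the static types themselves are refused.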
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
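The name-collision rule in the second item above can be sketched with a linear scan, assuming a plain array of type names ordered by type ID with the static types first. libctf itself uses hashtables and keeps separate namespaces for struct, union and enum tags; the function and constants below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { EX_NAME_OK = 0, EX_NAME_COLLISION = -1 };

/* NAMES[i] is the name of type i + 1 (NULL if unnamed); the first
   NSTATIC entries are the types read in from the opened dict.  Reject
   NAME if a static type already uses it; collisions with types added
   since opening are tolerated, as they always were for dicts built from
   scratch.  */
static int
ex_check_new_name (const char *const *names, size_t nstatic,
                   const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < nstatic; i++)
    if (names[i] != NULL && strcmp (names[i], name) == 0)
      return EX_NAME_COLLISION;
  return EX_NAME_OK;
}
```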
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
/* Indexed but no symbol found -> not present, try the parent. */
2020-11-20 21:34:04 +08:00
err = ECTF_NOTYPEDAT;
if (fp->ctf_objtidx_names && fp->ctf_funcidx_names)
goto try_parent;
/* Table must be nonindexed. */
ctf_dprintf ("Looking up object type %lx in 1:1 dict symtypetab\n", symidx);
2019-04-24 18:15:33 +08:00
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily
than ld.so symbol lookup, so reading the ELF symbol hashes would
probably take more time than using them would save, and it would add a
lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
kind done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
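The following is a minimal sketch of the scan-and-cache idea, under several simplifying assumptions. The symtab is reduced to an array of name strings. A small fixed-size open-addressing table stands in for libctf's ctf_dynhash_t. A resume cursor plays the role we assume ctf_symhash_latest has. The names sym_cache and lookup_symbol_idx are hypothetical; this is not libctf's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: a symtab is an array of names, and the
   cache is a tiny open-addressing hash rather than a ctf_dynhash_t.  */
#define CACHE_SLOTS 64           /* power of two; must exceed symtab size */

typedef struct sym_cache
{
  const char *names[CACHE_SLOTS];  /* keys; NULL marks an empty slot */
  size_t idx[CACHE_SLOTS];         /* symbol index for each key */
  size_t latest;                   /* first symtab slot not yet scanned */
  size_t scanned;                  /* symbols examined so far (for tests) */
} sym_cache;

static size_t
hash_str (const char *s)
{
  size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

static void
cache_insert (sym_cache *c, const char *name, size_t symidx)
{
  size_t h = hash_str (name) & (CACHE_SLOTS - 1);
  while (c->names[h] != NULL && strcmp (c->names[h], name) != 0)
    h = (h + 1) & (CACHE_SLOTS - 1);
  if (c->names[h] == NULL)       /* keep the first index seen for a name */
    {
      c->names[h] = name;
      c->idx[h] = symidx;
    }
}

static int
cache_find (const sym_cache *c, const char *name, size_t *symidx)
{
  size_t h = hash_str (name) & (CACHE_SLOTS - 1);
  while (c->names[h] != NULL)
    {
      if (strcmp (c->names[h], name) == 0)
        {
          *symidx = c->idx[h];
          return 1;
        }
      h = (h + 1) & (CACHE_SLOTS - 1);
    }
  return 0;
}

/* Return the index of NAME in SYMTAB, or (size_t) -1.  Every name passed
   over is cached and the scan resumes where the previous one stopped, so
   the symtab is walked at most once in total.  */
static size_t
lookup_symbol_idx (sym_cache *c, const char *const *symtab, size_t nsyms,
                   const char *name)
{
  size_t symidx;

  assert (nsyms < CACHE_SLOTS);  /* guarantees probing terminates */
  if (cache_find (c, name, &symidx))
    return symidx;

  while (c->latest < nsyms)
    {
      size_t i = c->latest++;
      c->scanned++;
      cache_insert (c, symtab[i], i);
      if (strcmp (symtab[i], name) == 0)
        return i;
    }
  return (size_t) -1;
}
```

The cursor never moves backwards, so a miss costs only the part of the symtab not yet seen. Every later lookup of a name already passed over becomes a hash hit.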
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * that
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
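The archive-level cache can be sketched the same way. This assumes each dict is a plain array of name/type pairs standing in for ctf_lookup_by_symbol_name on a real ctf_dict_t. All struct and function names here are hypothetical. Only the dict a name was found in is memoized, in the spirit of ctfi_symnamedicts. The type itself is re-fetched from that dict each time, as the text argues is cheap enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each "dict" is an array of name/type pairs.  */
typedef struct entry { const char *name; long type; } entry;
typedef struct dict { const entry *syms; size_t nsyms; } dict;

#define MEMO_MAX 32

typedef struct archive
{
  const dict *dicts;
  size_t ndicts;
  const char *memo_name[MEMO_MAX];  /* name -> owning dict; assumes names */
  size_t memo_dict[MEMO_MAX];       /* outlive the archive (no copying) */
  size_t nmemo;
  size_t probes;                    /* per-dict lookups performed */
} archive;

static long
dict_lookup (archive *a, size_t d, const char *name)
{
  const dict *dp = &a->dicts[d];
  a->probes++;
  for (size_t i = 0; i < dp->nsyms; i++)
    if (strcmp (dp->syms[i].name, name) == 0)
      return dp->syms[i].type;
  return 0;                         /* 0 means "no type in this dict" */
}

/* Find NAME's type across all dicts and store the owning dict in *DICTP.
   Only the dict is memoized: the type is re-read from it every time.  */
static long
arc_lookup_symbol_name (archive *a, const char *name, size_t *dictp)
{
  for (size_t m = 0; m < a->nmemo; m++)
    if (strcmp (a->memo_name[m], name) == 0)
      {
        *dictp = a->memo_dict[m];
        return dict_lookup (a, *dictp, name);
      }

  for (size_t d = 0; d < a->ndicts; d++)
    {
      long type = dict_lookup (a, d, name);
      if (type != 0)
        {
          assert (a->nmemo < MEMO_MAX);
          a->memo_name[a->nmemo] = name;
          a->memo_dict[a->nmemo++] = d;
          *dictp = d;
          return type;
        }
    }
  return 0;
}
```

After the first successful lookup, a repeat query touches only the owning dict rather than rescanning every dict in the archive.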
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
if (symname != NULL)
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both kinds of type has been identical for a
few years, the only difference being that types read in via ctf_open are
stored together in one big buffer, while newly-added types are stored in
per-type hashtables.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
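The following is a rough model of the new read-only rule. It assumes type IDs 1..stypes are the types that were present when the dict was opened (the count the ChangeLog calls ctf_stypes). It also ignores the parent/child type ID split. The struct, helpers and result codes are hypothetical stand-ins, with MOD_ERR_RDONLY playing the part of ECTF_RDONLY.

```c
#include <assert.h>

/* Hypothetical result codes: success, "read-only" (cf. ECTF_RDONLY), and
   "no such type".  */
enum { MOD_OK = 0, MOD_ERR_RDONLY, MOD_ERR_BADID };

typedef struct dict_model
{
  unsigned long stypes;  /* IDs 1..stypes were present at open time */
  unsigned long ntypes;  /* highest type ID currently in the dict */
} dict_model;

/* Adding is always allowed: new types get IDs above everything so far.  */
static unsigned long
add_type (dict_model *d)
{
  assert (d->ntypes >= d->stypes);
  return ++d->ntypes;
}

/* May TYPE be modified (members added, array info changed, ...)?  Only
   types added after the dict was opened qualify.  */
static int
check_modifiable (const dict_model *d, unsigned long type)
{
  if (type == 0 || type > d->ntypes)
    return MOD_ERR_BADID;
  if (type <= d->stypes)
    return MOD_ERR_RDONLY;
  return MOD_OK;
}
```

New types may still refer to the read-only ones. Only mutation of the opened range is refused.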
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
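The naming rule in the second item can be sketched as follows. The sketch assumes a single flat namespace, whereas C and libctf keep struct/union/enum tags separate from other names. All names are hypothetical. A name already used by an opened type is refused, while repeats among newly added types are still accepted, so existing callers see no new errors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 16

/* Hypothetical model: names[i] is the name of type ID i + 1, and the
   first nstatic types came from ctf_open().  */
typedef struct named_types
{
  const char *names[MAX_TYPES];
  size_t ntypes;
  size_t nstatic;
} named_types;

/* Return the new type ID, or 0 if NAME would shadow a type that was
   already present when the dict was opened.  */
static size_t
add_named_type (named_types *t, const char *name)
{
  for (size_t i = 0; i < t->nstatic; i++)
    if (strcmp (t->names[i], name) == 0)
      return 0;

  assert (t->ntypes < MAX_TYPES);
  t->names[t->ntypes++] = name;
  return t->ntypes;
}
```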
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
if ((symidx = ctf_lookup_symbol_idx (fp, symname, try_parent, is_function))
== (unsigned long) -1)
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
2021-02-17 23:21:12 +08:00
goto try_parent;
2019-04-24 18:15:33 +08:00
if (fp->ctf_sxlate[symidx] == -1u)
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
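To illustrate the new function info layout: in an unindexed section, each function symbol, in symtab order, owns one uint32_t entry holding its type ID. Pads are assumed here to be 0, meaning "no type known". The sketch reduces the symtab to an array of symbol kinds. It also uses element indexes where the real ctf_sxlate stores byte offsets into ctf_buf. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symtab model: just the kind of each symbol.  */
enum sym_kind { SYM_SKIP, SYM_FUNC, SYM_OBJT };

/* Build a symidx -> entry translation table: each function symbol owns
   the next uint32_t slot; other symbols map to (uint32_t) -1.  */
static void
build_func_sxlate (const enum sym_kind *kinds, size_t nsyms, uint32_t *sxlate)
{
  uint32_t next = 0;
  for (size_t i = 0; i < nsyms; i++)
    sxlate[i] = kinds[i] == SYM_FUNC ? next++ : (uint32_t) -1;
}

/* Each entry is a type ID (of kind CTF_K_FUNCTION in a real dict), or 0
   as a pad.  Returns 0 when there is nothing to find.  */
static uint32_t
func_symbol_type (const uint32_t *funcinfo, const uint32_t *sxlate,
                  size_t nsyms, size_t symidx)
{
  assert (symidx < nsyms);
  if (sxlate[symidx] == (uint32_t) -1)
    return 0;
  return funcinfo[sxlate[symidx]];
}
```

Because an entry is just a type ID, two functions with the same prototype can share one deduplicated CTF_K_FUNCTION type.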
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
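The following sketches the two-step mapping described above, using hypothetical names throughout. Symbols are added as name -> type pairs. A separate symidx -> name table is filled in later, which is the role the text gives to ctf_link_shuffle_syms. With that table in place, a lookup by index can reuse the lookup by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a name -> type table (cf. ctf_funchash) plus a
   linker-supplied symidx -> name table (cf. ctf_dynsymidx).  */
typedef struct name_type { const char *name; unsigned long type; } name_type;

typedef struct sym_dict
{
  const name_type *funcs;     /* added by name, before indexes are known */
  size_t nfuncs;
  const char **symidx_names;  /* symidx -> symbol name, NULL if unknown */
  size_t nsymidx;             /* number of entries in symidx_names */
} sym_dict;

static unsigned long
lookup_by_name (const sym_dict *d, const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < d->nfuncs; i++)
    if (strcmp (d->funcs[i].name, name) == 0)
      return d->funcs[i].type;
  return 0;
}

/* Index lookups go through the index -> name table, then the name ->
   type table.  */
static unsigned long
lookup_by_symidx (const sym_dict *d, size_t symidx)
{
  if (symidx >= d->nsymidx || d->symidx_names[symidx] == NULL)
    return 0;
  return lookup_by_name (d, d->symidx_names[symidx]);
}
```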
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
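Looking a name up in a name-sorted index is a plain binary search. This sketch assumes the index is reduced to a sorted array of name strings with a parallel array of type IDs; the real section stores string offsets. It uses the standard C bsearch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* bsearch passes the key first and an array element second.  */
static int
cmp_name (const void *key, const void *elt)
{
  return strcmp ((const char *) key, *(const char *const *) elt);
}

/* NAMES is sorted by strcmp; TYPES[i] is the type of symbol NAMES[i].
   Returns 0 if SYMNAME is not in the index.  */
static uint32_t
indexed_lookup (const char *const *names, const uint32_t *types, size_t n,
                const char *symname)
{
  const char *const *hit;

  assert (symname != NULL);
  hit = bsearch (symname, names, n, sizeof names[0], cmp_name);
  if (hit == NULL)
    return 0;
  return types[hit - names];
}
```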
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
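Sorting an unsorted index must keep each type attached to its name. One minimal approach, assuming the same parallel-array model, is to zip the arrays into pairs, sort, and unzip. The names are hypothetical, and the sketch uses standard qsort rather than libctf's own sorting helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct idx_pair { const char *name; uint32_t type; } idx_pair;

static int
cmp_pair (const void *a, const void *b)
{
  return strcmp (((const idx_pair *) a)->name, ((const idx_pair *) b)->name);
}

/* Sort parallel NAMES/TYPES arrays by name, keeping each type with its
   symbol.  Returns 0 on success, -1 on allocation failure.  */
static int
sort_index_by_name (const char **names, uint32_t *types, size_t n)
{
  idx_pair *pairs;

  if (n == 0)
    return 0;
  assert (names != NULL && types != NULL);
  if ((pairs = malloc (n * sizeof *pairs)) == NULL)
    return -1;

  for (size_t i = 0; i < n; i++)
    pairs[i] = (idx_pair) { names[i], types[i] };
  qsort (pairs, n, sizeof *pairs, cmp_pair);
  for (size_t i = 0; i < n; i++)
    {
      names[i] = pairs[i].name;
      types[i] = pairs[i].type;
    }
  free (pairs);
  return 0;
}
```

Once sorted, the index can be searched by name with bsearch.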
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
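The iteration needs no sorting because it walks a section in order and skips pads. Below is a much reduced, hypothetical sketch loosely in the spirit of ctf_next_t. It returns symbol indexes and types, whereas the real ctf_symbol_next also hands back names. Pads are assumed to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state.  */
typedef struct sym_iter { size_t pos; } sym_iter;

/* Return the next symbol index whose entry in TYPES is nonzero, storing
   its type in *TYPEP, or (size_t) -1 when the section is exhausted.
   Entries are visited in section order, so nothing is sorted.  */
static size_t
symbol_next (sym_iter *it, const uint32_t *types, size_t n, uint32_t *typep)
{
  assert (typep != NULL);
  while (it->pos < n)
    {
      size_t i = it->pos++;
      if (types[i] != 0)
        {
          *typep = types[i];
          return i;
        }
    }
  return (size_t) -1;
}
```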
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
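The external-string change amounts to recording each offset -> string mapping as soon as the string is added, rather than when the strtab is written. Lookups by offset can then succeed before writeout. Below is a hypothetical linear-table sketch, not libctf's ctf-string.c data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXT 16

/* Hypothetical model: external strings are remembered as offset -> string
   pairs at addition time.  */
typedef struct ext_strtab
{
  uint32_t offs[MAX_EXT];
  const char *strs[MAX_EXT];
  size_t n;
} ext_strtab;

static void
str_add_external (ext_strtab *t, const char *str, uint32_t offset)
{
  assert (t->n < MAX_EXT);
  t->offs[t->n] = offset;
  t->strs[t->n++] = str;
}

/* Resolve an external offset; works before any strtab is written.  */
static const char *
strptr_external (const ext_strtab *t, uint32_t offset)
{
  for (size_t i = 0; i < t->n; i++)
    if (t->offs[i] == offset)
      return t->strs[i];
  return NULL;
}
```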
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
goto try_parent;
type = *(uint32_t *) ((uintptr_t) fp->ctf_buf + fp->ctf_sxlate[symidx]);
2019-04-24 18:15:33 +08:00
|
|
|
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
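The compatibility rules for CTF_F_NEWFUNCINFO reduce to a small decision on the header flags. The sketch below is hedged. The `EX_F_*` bit values are illustrative, not copied from ctf.h. Modelling an "old" reader as one with a smaller known-flags mask is an assumption, since real libctf compares against CTF_F_MAX.

```c
#include <stdint.h>

/* Illustrative flag bits; the real CTF_F_* values live in ctf.h.  */
#define EX_F_COMPRESS     0x1u
#define EX_F_NEWFUNCINFO  0x2u
#define EX_F_IDXSORTED    0x4u
#define EX_F_ALL          (EX_F_COMPRESS | EX_F_NEWFUNCINFO | EX_F_IDXSORTED)

typedef enum
{
  EX_OPEN_REJECT,            /* refuse to open the dict at all */
  EX_OPEN_IGNORE_FUNCINFO,   /* open, but disregard the func info section */
  EX_OPEN_USE_FUNCINFO       /* open and read the new-format section */
} ex_open_action_t;

/* known_mask is every flag this reader understands.  An older reader has
   a smaller mask, so it rejects dicts carrying EX_F_NEWFUNCINFO.  */
ex_open_action_t
ex_funcinfo_action (uint32_t flags, uint32_t known_mask)
{
  if (flags & ~known_mask)
    return EX_OPEN_REJECT;
  if (!(flags & EX_F_NEWFUNCINFO))
    return EX_OPEN_IGNORE_FUNCINFO;
  return EX_OPEN_USE_FUNCINFO;
}
```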
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
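Treating symbols as name -> type mappings can be sketched with two small tables that stand in for ctf_funchash and ctf_objthash. The sketch assumes a toy dict with a kind-per-type array, and all `ex_*` names are hypothetical. Rejecting duplicate names is this sketch's own choice. Only the function/object kind rule comes from the text above.

```c
#include <stddef.h>
#include <string.h>

enum ex_kind { EX_K_UNKNOWN = 0, EX_K_INTEGER = 1, EX_K_POINTER = 3,
               EX_K_FUNCTION = 5 };

#define EX_MAXSYMS 16

/* Tiny name -> type ID table; names are stored by pointer, not copied.  */
typedef struct ex_symtab
{
  const char *names[EX_MAXSYMS];
  unsigned long types[EX_MAXSYMS];
  size_t n;
} ex_symtab_t;

typedef struct ex_dict
{
  const enum ex_kind *kind_of;   /* kind of each type ID */
  size_t ntypes;
  ex_symtab_t funcs;             /* analogue of ctf_funchash */
  ex_symtab_t objts;             /* analogue of ctf_objthash */
} ex_dict_t;

static int
ex_symtab_add (ex_symtab_t *tab, const char *name, unsigned long type)
{
  for (size_t i = 0; i < tab->n; i++)
    if (strcmp (tab->names[i], name) == 0)
      return -1;                 /* name already mapped */
  if (tab->n == EX_MAXSYMS)
    return -1;                   /* table full */
  tab->names[tab->n] = name;
  tab->types[tab->n++] = type;
  return 0;
}

/* Function symbols must map to a function type.  */
int
ex_add_func_sym (ex_dict_t *d, const char *name, unsigned long type)
{
  if (type >= d->ntypes || d->kind_of[type] != EX_K_FUNCTION)
    return -1;
  return ex_symtab_add (&d->funcs, name, type);
}

/* Object symbols may have any kind, including pointers to functions.  */
int
ex_add_objt_sym (ex_dict_t *d, const char *name, unsigned long type)
{
  if (type >= d->ntypes)
    return -1;
  return ex_symtab_add (&d->objts, name, type);
}
```

Symbol indexes do not appear anywhere in these tables. Associating names with indexes is left to the linker, as the next paragraph describes.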
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
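The serialization-time pruning of variables can be pictured as an in-place filter. This is a sketch over plain string arrays and is not libctf's symtypetab_delete_nonstatic_vars. A variable is dropped when its name also appears as a data-object symbol, because its type is then reachable through the data object section. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if name is one of the data-object symbol names.  */
static int
ex_is_objt_sym (const char *name, const char *const *objt_syms, size_t nsyms)
{
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp (name, objt_syms[i]) == 0)
      return 1;
  return 0;
}

/* Compact vars[] in place, dropping variables shadowed by data-object
   symbols.  Returns the number of variables kept.  */
size_t
ex_drop_vars_shadowed_by_syms (const char **vars, size_t nvars,
                               const char *const *objt_syms, size_t nsyms)
{
  size_t kept = 0;

  for (size_t i = 0; i < nvars; i++)
    if (!ex_is_objt_sym (vars[i], objt_syms, nsyms))
      vars[kept++] = vars[i];
  return kept;
}
```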
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
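The choice between a padded (unindexed) and an indexed section can be sketched as a count of pads compared with a threshold. The sketch assumes a hypothetical EX_INDEX_PAD_THRESHOLD, and the real CTF_INDEX_PAD_THRESHOLD value and the exact test symtypetab_density applies may differ. The size helpers are rough and assume that each index entry is one 32-bit name offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CTF_INDEX_PAD_THRESHOLD.  */
#define EX_INDEX_PAD_THRESHOLD 50

/* types_by_symidx[i] is the type of symbol i, or 0 if this dict has no
   type for it (it would be a pad in an unindexed section).  */
int
ex_should_index (const uint32_t *types_by_symidx, size_t nsyms)
{
  size_t pads = 0;

  for (size_t i = 0; i < nsyms; i++)
    if (types_by_symidx[i] == 0)
      pads++;
  return pads > EX_INDEX_PAD_THRESHOLD;
}

/* Unindexed: one uint32_t slot per symbol, pads included.  */
size_t
ex_unindexed_size (size_t nsyms)
{
  return nsyms * sizeof (uint32_t);
}

/* Indexed (roughly): one type slot plus one name-index slot per typed
   symbol.  */
size_t
ex_indexed_size (size_t ntyped)
{
  return 2 * ntyped * sizeof (uint32_t);
}
```

In an archive child that types only a handful of a large symbol table's entries, the indexed form is much smaller. A dict that types nearly every symbol is better served by the direct, padded layout.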
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
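Here is a sketch of sorting an index that may be unsorted and then binary-searching it by name. libctf sorts a permutation through a callback (ctf_symidx_sort_arg_cb in the ChangeLog). This sketch instead pairs each name with its type and uses standard `qsort` and `bsearch`. That is a simplification, and the `ex_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One indexed entry: the name (from the index section) paired with its
   type ID (from the symtypetab section).  */
typedef struct ex_idx_entry
{
  const char *name;
  uint32_t type;
} ex_idx_entry_t;

static int
ex_entry_cmp (const void *a, const void *b)
{
  const ex_idx_entry_t *ea = a, *eb = b;
  return strcmp (ea->name, eb->name);
}

/* Sort by name unless the producer promised a sorted index (the role the
   CTF_F_IDXSORTED flag plays).  Sorting pairs keeps names and types
   together.  */
void
ex_sort_index (ex_idx_entry_t *entries, size_t n, int already_sorted)
{
  if (!already_sorted)
    qsort (entries, n, sizeof (*entries), ex_entry_cmp);
}

/* Binary-search a sorted index; return the type, or 0 if absent.  */
uint32_t
ex_lookup_idx_name (const ex_idx_entry_t *entries, size_t n, const char *name)
{
  ex_idx_entry_t key = { name, 0 };
  const ex_idx_entry_t *hit = bsearch (&key, entries, n, sizeof (*entries),
                                       ex_entry_cmp);
  return hit ? hit->type : 0;
}
```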
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
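The iteration pattern can be shown with a small cursor over a flattened symtypetab. Entries are visited in storage order and pads are skipped, so no sorting is needed. This is a toy model, loosely analogous to a ctf_next_t cursor, and not the real ctf_symbol_next signature. The struct layout and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened symtypetab: names and types in section order,
   with type 0 marking a pad.  */
typedef struct ex_symtypetab
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} ex_symtypetab_t;

/* Iterator state.  */
typedef struct ex_sym_iter
{
  const ex_symtypetab_t *tab;
  size_t pos;
} ex_sym_iter_t;

/* Return the next typed symbol's type and store its name in *name;
   return 0 once the section is exhausted.  */
uint32_t
ex_symbol_next (ex_sym_iter_t *it, const char **name)
{
  while (it->pos < it->tab->n)
    {
      size_t i = it->pos++;

      if (it->tab->types[i] != 0)      /* skip pads */
        {
          *name = it->tab->names[i];
          return it->tab->types[i];
        }
    }
  return 0;
}
```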
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
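The lookup now has to pick the function or data-object table depending on the symbol. The ChangeLog below also says it should "check the parent if a child lookup fails", which is the `goto try_parent` pattern. The sketch below models both. Symbols here are keyed by name, whereas the real ctf_lookup_by_symbol takes a symbol index. The dict layout and `ex_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct ex_named_type
{
  const char *name;
  uint32_t type;
} ex_named_type_t;

/* Hypothetical dict: separate function and object tables, optional parent.  */
typedef struct ex_dict
{
  const ex_named_type_t *funcs;
  size_t nfuncs;
  const ex_named_type_t *objts;
  size_t nobjts;
  const struct ex_dict *parent;
} ex_dict_t;

static uint32_t
ex_find (const ex_named_type_t *tab, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return tab[i].type;
  return 0;
}

/* Search the table matching the symbol's kind; on a miss (type 0), retry
   in the parent dict.  Returns 0 if no dict in the chain knows it.  */
uint32_t
ex_lookup_sym_type (const ex_dict_t *d, const char *name, int is_function)
{
  for (; d != NULL; d = d->parent)
    {
      uint32_t type = is_function ? ex_find (d->funcs, d->nfuncs, name)
                                  : ex_find (d->objts, d->nobjts, name);
      if (type != 0)
        return type;
    }
  return 0;
}
```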
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
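The external-string change means a name reference can be resolved as soon as its string has been added, without waiting for write-out. The sketch assumes that the top bit of a reference selects the external table and that the low 31 bits are the offset. That mirrors CTF's string-table ID bit but is labelled illustrative here. The fixed-size map and all `ex_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative encoding: top bit selects the external strtab.  */
#define EX_EXTERNAL_BIT 0x80000000u
#define EX_MAX_EXT 8

typedef struct ex_strtab
{
  const char *internal;              /* the dict's own string table */
  size_t internal_len;
  uint32_t ext_offs[EX_MAX_EXT];     /* external offset -> string map, */
  const char *ext_strs[EX_MAX_EXT];  /* populated at string-addition time */
  size_t next;
} ex_strtab_t;

/* Record that s lives at offset off in the external strtab.  */
int
ex_add_external (ex_strtab_t *t, const char *s, uint32_t off)
{
  if (t->next == EX_MAX_EXT)
    return -1;
  t->ext_offs[t->next] = off;
  t->ext_strs[t->next++] = s;
  return 0;
}

/* Resolve a name reference to a string, or NULL if it is unknown.  */
const char *
ex_strptr (const ex_strtab_t *t, uint32_t ref)
{
  uint32_t off = ref & ~EX_EXTERNAL_BIT;

  if (!(ref & EX_EXTERNAL_BIT))
    return off < t->internal_len ? t->internal + off : NULL;
  for (size_t i = 0; i < t->next; i++)
    if (t->ext_offs[i] == off)
      return t->ext_strs[i];
  return NULL;
}
```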
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
if (type == 0)
goto try_parent;
2019-04-24 18:15:33 +08:00
2020-11-20 21:34:04 +08:00
return type;
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
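Declaring a range of types read-only can be modelled with a single count of static types. IDs 1..nstatic came from the deserialized buffer and are frozen, while later IDs may still be modified. This is a sketch only. It ignores parent/child ID offsets. The error codes and `ex_*` names are hypothetical, with a field analogous in spirit to ctf_stypes.

```c
#include <stddef.h>

/* Hypothetical error codes for this sketch.  */
enum { EX_OK = 0, EX_ERR_RDONLY = -1, EX_ERR_BADID = -2 };

typedef struct ex_dict
{
  unsigned long nstatic;   /* types 1..nstatic were read in */
  unsigned long ntypes;    /* highest type ID allocated so far */
} ex_dict_t;

static int
ex_is_static_type (const ex_dict_t *d, unsigned long id)
{
  return id >= 1 && id <= d->nstatic;
}

/* Mutating an existing type (adding a member, setting array info) is only
   allowed for types added after the dict was opened.  */
int
ex_check_modifiable (const ex_dict_t *d, unsigned long id)
{
  if (id == 0 || id > d->ntypes)
    return EX_ERR_BADID;
  return ex_is_static_type (d, id) ? EX_ERR_RDONLY : EX_OK;
}

/* Adding a brand-new type is always allowed: it takes the next free ID.  */
unsigned long
ex_add_type (ex_dict_t *d)
{
  return ++d->ntypes;
}
```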
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
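The second point in the list above bans new names that collide with names from the ctf_open()ed portion, but it still allows repeats among newly added types. That rule can be sketched as follows. The fixed-size name array and `ex_*` names are hypothetical, and the sketch returns 0 for both "collides" and "table full".

```c
#include <stddef.h>
#include <string.h>

/* names[i] is the name of type ID i+1 (NULL if anonymous); the first
   nstatic types were read in via the deserializer.  */
typedef struct ex_dict
{
  const char *names[32];
  size_t nstatic;
  size_t ntypes;
} ex_dict_t;

/* Return the new type's ID, or 0 if its name would shadow a static type.
   Collisions with newly added types are still permitted.  */
size_t
ex_add_named_type (ex_dict_t *d, const char *name)
{
  if (d->ntypes == sizeof d->names / sizeof d->names[0])
    return 0;                          /* sketch's table is full */
  for (size_t i = 0; name != NULL && i < d->nstatic; i++)
    if (d->names[i] != NULL && strcmp (d->names[i], name) == 0)
      return 0;                        /* would override a static name */
  d->names[d->ntypes] = name;
  return ++d->ntypes;
}
```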
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not keep
any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
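The point of the string change is that an offset into the not-yet-written external strtab can already be turned back into a string, because the offset-to-string mapping is recorded when the string is added. The following is a sketch under stated assumptions. The names are hypothetical, a fixed-size table replaces libctf's hash tables, and the top bit of a name offset is assumed to select the external table. That encoding is modelled on CTF's separate string-table IDs, but treat it as illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_BIT 0x80000000u  /* assumed "external strtab" marker */

/* Hypothetical synthetic external strtab: offset -> string pairs,
   recorded at string-addition time rather than at writeout.  */
struct ext_table
{
  struct { uint32_t offset; const char *str; } ents[16];
  size_t n;
};

int
ext_add (struct ext_table *t, uint32_t offset, const char *str)
{
  if (t->n == sizeof (t->ents) / sizeof (t->ents[0]))
    return -1;
  t->ents[t->n].offset = offset;
  t->ents[t->n].str = str;
  t->n++;
  return 0;
}

/* Resolve a name offset: internal offsets index INTERNAL directly,
   external ones are looked up in the synthetic table.  */
const char *
str_lookup (const char *internal, size_t internal_len,
            const struct ext_table *t, uint32_t name)
{
  if (name & EXT_BIT)
    {
      uint32_t off = name & ~EXT_BIT;
      for (size_t i = 0; i < t->n; i++)
        if (t->ents[i].offset == off)
          return t->ents[i].str;
      return NULL;
    }
  return name < internal_len ? internal + name : NULL;
}
```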
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
try_parent:
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
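The new rule reduces to a range check on type IDs. In this hypothetical model, IDs up to `nstypes` were read in and are frozen, and new IDs are appended after them. `dict_model` and its helpers are not libctf names, and the parent/child ID split is ignored for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes; SKETCH_RDONLY plays the part of ECTF_RDONLY. */
enum { SKETCH_OK = 0, SKETCH_RDONLY = -1 };

/* A dict opened with ctf_open() and then extended: IDs 1..nstypes came
   from the file, later IDs were added in memory.  */
struct dict_model
{
  uint32_t nstypes;  /* analogous to ctf_dict.ctf_stypes */
  uint32_t ntypes;   /* static plus dynamic types */
};

/* Adding a type always succeeds and lands past the static range.  */
uint32_t
model_add_type (struct dict_model *d)
{
  return ++d->ntypes;
}

/* Modifying a type (adding members, setting array info, ...) is only
   permitted outside the static range.  */
int
model_modify_type (const struct dict_model *d, uint32_t type)
{
  return type <= d->nstypes ? SKETCH_RDONLY : SKETCH_OK;
}
```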
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
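The naming rule from the second point above can be sketched as follows: names already used by the opened portion are refused, while repeats among freshly added types are still tolerated. The names and result codes are hypothetical, and the model ignores CTF's separate namespaces for structs, unions and enums.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch.  */
enum { ADD_OK = 0, ADD_CONFLICT = -1, ADD_FULL = -2 };

struct named_types
{
  const char *names[32];  /* names[i] is the name of type ID i + 1 */
  size_t n;               /* types in use */
  size_t nstatic;         /* the first nstatic were read via ctf_open() */
};

int
add_named_type (struct named_types *t, const char *name)
{
  /* Only the static (opened) portion is checked for collisions.  */
  for (size_t i = 0; i < t->nstatic; i++)
    if (strcmp (t->names[i], name) == 0)
      return ADD_CONFLICT;

  if (t->n == sizeof (t->names) / sizeof (t->names[0]))
    return ADD_FULL;      /* a real dict would grow instead */
  t->names[t->n++] = name;
  return ADD_OK;
}
```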
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
if (!try_parent)
return ctf_set_errno (fp, err);
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
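Reading the new function info format is then a plain array index followed by a kind check. This is a hypothetical sketch: `enum kind` stands in for the CTF_K_* constants, and a flat array maps type IDs to kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind tags; the real ones are the CTF_K_* values.  */
enum kind { K_UNKNOWN, K_INT, K_POINTER, K_FUNCTION };

/* Type of function symbol SYMIDX in an unindexed function info
   section: one uint32_t type ID per symbol, exactly like the data
   object section, with 0 for "no type".  Returns 0 for no type or for
   an entry that does not name a function type.  */
uint32_t
funcinfo_type (const uint32_t *sect, size_t nsyms, const enum kind *kinds,
               size_t ntypes, size_t symidx)
{
  if (symidx >= nsyms)
    return 0;
  uint32_t type = sect[symidx];
  if (type == 0 || type >= ntypes || kinds[type] != K_FUNCTION)
    return 0;
  return type;
}
```

Because each entry is just a type ID, two functions with the same prototype can share one CTF_K_FUNCTION type, which is the deduplication the old inline format ruled out.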
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
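The compatibility rule for a current reader can be written as a small decision function. The flag values here are placeholders chosen for the sketch (the real CTF_F_* constants are defined in include/ctf.h), and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for this sketch only; see include/ctf.h.  */
#define SK_F_COMPRESS    0x1u
#define SK_F_NEWFUNCINFO 0x2u
#define SK_F_IDXSORTED   0x4u
#define SK_F_KNOWN (SK_F_COMPRESS | SK_F_NEWFUNCINFO | SK_F_IDXSORTED)

enum funcinfo_use { FUNCINFO_REJECT_DICT, FUNCINFO_IGNORE, FUNCINFO_USE };

/* Unknown flag bits make the whole dict unreadable; otherwise the
   function info section is only trusted when SK_F_NEWFUNCINFO is set.  */
enum funcinfo_use
classify_funcinfo (uint32_t flags)
{
  if (flags & ~SK_F_KNOWN)
    return FUNCINFO_REJECT_DICT;
  return (flags & SK_F_NEWFUNCINFO) ? FUNCINFO_USE : FUNCINFO_IGNORE;
}
```

An older reader whose set of known flags lacks the new bit lands in the first branch for any dict that sets it, which matches the "refuse to open" behaviour described above.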
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a function pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
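Treating symbols as name -> type mappings, with the extra kind restriction on function symbols, can be sketched as follows. `sym_map` is a hypothetical fixed-size stand-in for ctf_funchash and ctf_objthash. Rejecting duplicate names is a choice of this model, not a statement about libctf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum kind { K_INT, K_POINTER, K_FUNCTION };  /* hypothetical subset */

struct sym_map            /* stand-in for ctf_funchash / ctf_objthash */
{
  const char *name[16];
  uint32_t type[16];
  size_t n;
};

static int
map_put (struct sym_map *m, const char *name, uint32_t type)
{
  for (size_t i = 0; i < m->n; i++)
    if (strcmp (m->name[i], name) == 0)
      return -1;                        /* duplicate name */
  if (m->n == sizeof (m->type) / sizeof (m->type[0]))
    return -1;                          /* model is full */
  m->name[m->n] = name;
  m->type[m->n++] = type;
  return 0;
}

/* Function symbols need a function type; object symbols take any type,
   function pointers included.  */
int
add_func_sym (struct sym_map *funcs, const char *name, uint32_t type,
              enum kind k)
{
  return k == K_FUNCTION ? map_put (funcs, name, type) : -1;
}

int
add_objt_sym (struct sym_map *objts, const char *name, uint32_t type)
{
  return map_put (objts, name, type);
}
```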
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
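Removing variables that are also data-object symbols is a filter over the variable list. This is a loose sketch of what the ChangeLog calls symtypetab_delete_nonstatic_vars, using plain string arrays instead of libctf's hash tables. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if NAME is one of the NSYMS data-object symbol names.  */
static int
is_data_symbol (const char *name, const char *const *syms, size_t nsyms)
{
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp (name, syms[i]) == 0)
      return 1;
  return 0;
}

/* Compact VARS in place, dropping variables that will be reachable
   through the data object section instead.  Returns the new count.  */
size_t
drop_vars_that_are_symbols (const char **vars, size_t nvars,
                            const char *const *syms, size_t nsyms)
{
  size_t kept = 0;
  for (size_t i = 0; i < nvars; i++)
    if (!is_data_symbol (vars[i], syms, nsyms))
      vars[kept++] = vars[i];
  return kept;
}
```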
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
if (fp->ctf_parent)
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
The actual name->index lookup is done by ctf_lookup_symbol_idx. We do
not expect this lookup to be used anywhere near as heavily as ld.so
symbol lookup (by many orders of magnitude), so using the ELF symbol hashes would
probably take more time to read them than is saved by using the hashes,
and it adds a lot of complexity. Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
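The linear-search-with-caching strategy can be modelled in a few lines. Everything here is hypothetical. A real implementation would keep the cache in a hash table (the commit uses ctf_symhash) rather than probing a small array, and the `overflowed` fallback exists only because this sketch caps the cache size. Sharing one such cache between all dicts of an archive is what ctfi_crossdict_cache makes possible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 64

struct symidx_cache       /* hypothetical analogue of ctf_symhash */
{
  const char *name[CACHE_MAX];
  size_t idx[CACHE_MAX];
  size_t n;               /* cached entries */
  size_t scanned;         /* symtab entries examined so far */
  int overflowed;         /* some scanned names were not cached */
};

/* Find NAME in SYMTAB.  Previously seen names come from the cache;
   otherwise the scan resumes where the last one stopped, caching every
   name it passes, so (absent overflow) the symtab is walked at most
   once.  Returns 0 and sets *IDX on success, -1 if absent.  */
int
lookup_symbol_idx (struct symidx_cache *c, const char *const *symtab,
                   size_t nsyms, const char *name, size_t *idx)
{
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->name[i], name) == 0)
      {
        *idx = c->idx[i];
        return 0;
      }

  while (c->scanned < nsyms)
    {
      size_t i = c->scanned++;
      if (c->n < CACHE_MAX)
        {
          c->name[c->n] = symtab[i];
          c->idx[c->n++] = i;
        }
      else
        c->overflowed = 1;
      if (strcmp (symtab[i], name) == 0)
        {
          *idx = i;
          return 0;
        }
    }

  /* Only needed because this sketch's cache is bounded.  */
  if (c->overflowed)
    for (size_t i = 0; i < nsyms; i++)
      if (strcmp (symtab[i], name) == 0)
        {
          *idx = i;
          return 0;
        }
  return -1;
}
```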
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, and that's basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
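The archive-level cache remembers which dict a name was found in, not the resulting type ID, because asking that dict again is cheap. This is a sketch in the spirit of ctfi_symnamedicts. The types and names are hypothetical, and type ID 0 stands for "not found".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct dict_syms          /* hypothetical per-dict name -> type table */
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

struct name_dict_cache    /* hypothetical analogue of ctfi_symnamedicts */
{
  const char *name[16];   /* points into the owning dict's names */
  size_t dict[16];
  size_t n;
};

/* Position of NAME in D, or D->n if absent.  */
static size_t
dict_find (const struct dict_syms *d, const char *name)
{
  size_t i;
  for (i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      break;
  return i;
}

uint32_t
arc_lookup_symbol_name (const struct dict_syms *dicts, size_t ndicts,
                        struct name_dict_cache *c, const char *name)
{
  /* Known name: go straight to the dict that answered last time.  */
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->name[i], name) == 0)
      {
        const struct dict_syms *d = &dicts[c->dict[i]];
        return d->types[dict_find (d, name)];
      }

  /* Otherwise try every dict and remember which one answered.  */
  for (size_t d = 0; d < ndicts; d++)
    {
      size_t pos = dict_find (&dicts[d], name);
      if (pos < dicts[d].n)
        {
          if (c->n < 16)
            {
              c->name[c->n] = dicts[d].names[pos];
              c->dict[c->n++] = d;
            }
          return dicts[d].types[pos];
        }
    }
  return 0;
}
```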
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
{
ctf_id_t ret = ctf_lookup_by_sym_or_name (fp->ctf_parent, symidx,
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
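The last bug in the list comes down to keeping one cache per symbol kind. Here is a minimal sketch of that idea, assuming a toy fixed-size array in place of libctf's real hash tables. Everything named `toy_*` is hypothetical and is not libctf API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not libctf code: separate caches for function and
   data-object symbols, so a hit for one kind never answers the other.  */
#define TOY_SYMCACHE_MAX 32

typedef struct toy_sym_cache
{
  const char *names[TOY_SYMCACHE_MAX]; /* Not copied: caller keeps them alive.  */
  long symidx[TOY_SYMCACHE_MAX];
  size_t n;
} toy_sym_cache;

typedef struct toy_symhashes
{
  toy_sym_cache func;   /* Stands in for a function-symbol cache.  */
  toy_sym_cache objt;   /* Stands in for a data-object-symbol cache.  */
} toy_symhashes;

/* Pick the cache for the kind of symbol being looked up.  */
static toy_sym_cache *
toy_pick_cache (toy_symhashes *h, int is_function)
{
  return is_function ? &h->func : &h->objt;
}

/* Return the cached symbol index for NAME of this kind, or -1 on a miss.  */
static long
toy_cache_lookup (toy_symhashes *h, const char *name, int is_function)
{
  const toy_sym_cache *c = toy_pick_cache (h, is_function);

  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return c->symidx[i];
  return -1;
}

/* Remember NAME -> SYMIDX in the cache for its kind.  Returns 0, or -1 if
   that cache is full (a cache is always free to forget).  */
static int
toy_cache_insert (toy_symhashes *h, const char *name, long symidx,
                  int is_function)
{
  toy_sym_cache *c = toy_pick_cache (h, is_function);

  if (c->n >= TOY_SYMCACHE_MAX)
    return -1;
  c->names[c->n] = name;
  c->symidx[c->n] = symidx;
  c->n++;
  return 0;
}
```

Because the kind selects the cache before any name comparison happens, a data-object lookup can no longer return a function symbol's cached index.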
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or another deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
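The rule described above can be sketched as follows: a static range of type IDs (1 to stypes) that may not be modified, plus a ban on new names shadowing static ones. This is a minimal sketch assuming a toy dict. The names `toy_dict`, `toy_add_type`, `toy_add_member` and the error codes are hypothetical. Real libctf also distinguishes name namespaces (struct, union, enum, ordinary), which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not libctf code.  */
#define TOY_MAX_TYPES   32
#define TOY_ERR_RDONLY  (-1)  /* Stands in for ECTF_RDONLY.  */
#define TOY_ERR_DUPNAME (-2)  /* Hypothetical: name shadows a static type.  */
#define TOY_ERR_NOSPACE (-3)
#define TOY_ERR_BADID   (-4)

typedef struct toy_dict
{
  const char *names[TOY_MAX_TYPES + 1]; /* Indexed by type ID; ID 0 unused.  */
  int nmembers[TOY_MAX_TYPES + 1];      /* A mutable per-type property.  */
  unsigned long ntypes;                 /* Highest type ID in use.  */
  unsigned long stypes;                 /* IDs 1..stypes came from the file.  */
} toy_dict;

/* Pretend everything added so far was read in from a file.  */
static void
toy_mark_opened (toy_dict *d)
{
  d->stypes = d->ntypes;
}

static int
toy_static_type (const toy_dict *d, unsigned long type)
{
  return type >= 1 && type <= d->stypes;
}

/* Add a named type.  A new name may not collide with a static type's name,
   but newly added types may still share names with each other.  */
static long
toy_add_type (toy_dict *d, const char *name)
{
  for (unsigned long i = 1; i <= d->stypes; i++)
    if (name != NULL && d->names[i] != NULL && strcmp (d->names[i], name) == 0)
      return TOY_ERR_DUPNAME;
  if (d->ntypes >= TOY_MAX_TYPES)
    return TOY_ERR_NOSPACE;
  d->ntypes++;
  d->names[d->ntypes] = name;
  d->nmembers[d->ntypes] = 0;
  return (long) d->ntypes;
}

/* Modify a type (think: add a struct member).  Only types added since the
   dict was opened may change.  */
static int
toy_add_member (toy_dict *d, unsigned long type)
{
  if (type < 1 || type > d->ntypes)
    return TOY_ERR_BADID;
  if (toy_static_type (d, type))
    return TOY_ERR_RDONLY;
  d->nmembers[type]++;
  return 0;
}
```

A single integer boundary is enough here because, as the message says, static and dynamic types now share one representation. Only the ID range decides mutability.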
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
|
|
|
symname, try_parent,
|
|
|
|
is_function);
|
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
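The shape of this refactoring can be sketched as one worker that accepts either an index or a name, wrapped by two thin public entry points. In this sketch, indexed layouts turn an index into a name and unindexed layouts turn a name into an index. All `toy_*` names and the section layout are assumptions for illustration only, and type ID 0 means "no type".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not libctf code.  Type ID 0 means "no type".  */
typedef struct toy_section
{
  const char *const *symnames;   /* Symbol table: name of each symbol.  */
  size_t nsyms;
  int indexed;                   /* Which layout this section uses.  */
  const char *const *idxnames;   /* Indexed: entry I names a symbol...  */
  const unsigned long *idxtypes; /* ...whose type is idxtypes[I].  */
  size_t nidx;
  const unsigned long *types;    /* Unindexed: one type per symbol.  */
} toy_section;

/* Worker: look up by SYMIDX if SYMNAME is NULL, else by name.  */
static unsigned long
toy_lookup_by_sym_or_name (const toy_section *s, long symidx,
                           const char *symname)
{
  if (s->indexed)
    {
      /* Indexed layouts are keyed by name: turn an index into a name.  */
      if (symname == NULL)
        {
          if (symidx < 0 || (size_t) symidx >= s->nsyms)
            return 0;
          symname = s->symnames[symidx];
        }
      for (size_t i = 0; i < s->nidx; i++)
        if (strcmp (s->idxnames[i], symname) == 0)
          return s->idxtypes[i];
      return 0;
    }

  /* Unindexed layouts are keyed by symbol index: turn a name into one.  */
  if (symname != NULL)
    {
      symidx = -1;
      for (size_t i = 0; i < s->nsyms; i++)
        if (strcmp (s->symnames[i], symname) == 0)
          {
            symidx = (long) i;
            break;
          }
    }
  if (symidx < 0 || (size_t) symidx >= s->nsyms)
    return 0;
  return s->types[symidx];
}

/* The public entry points are thin wrappers.  */
static unsigned long
toy_lookup_by_symbol (const toy_section *s, long symidx)
{
  return toy_lookup_by_sym_or_name (s, symidx, NULL);
}

static unsigned long
toy_lookup_by_symbol_name (const toy_section *s, const char *name)
{
  return toy_lookup_by_sym_or_name (s, -1, name);
}
```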
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
anticipate this lookup being used many orders of magnitude less heavily
than ld.so symbol lookup, so reading the ELF symbol hashes would probably
cost more time than using them saves, and would add a lot of complexity.
Instead, do a linear search for the symbol name, caching all the name ->
index mappings as we go, so that future searches are likely to hit in the
cache. To avoid having to repeat this search over and over in a CTF
archive when ctf_arc_lookup_symbol_name is used, have cached archive
lookups (the sort done by ctf_arc_lookup_symbol* and the ctf_archive_next
iterator) pick out the first dict they cache in a given archive and store
it in a new ctf_archive field, ctfi_crossdict_cache. This can be used to
store cross-dictionary cached state that depends on things like the ELF
symbol table rather than the contents of any one dict.
ctf_lookup_symbol_idx then caches its name->index mappings in the
dictionary named in the crossdict cache, if any, so that
ctf_lookup_symbol_idx in other dicts in the same archive benefits from the
previous linear search, and the symtab only needs to be scanned at most
once.
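A minimal sketch of the "linear search, caching as we go" strategy follows. It assumes a plain array in place of a hash table, and a resume point playing a role similar to the ctf_symhash_latest field named in the ChangeLog below. Passing the same `toy_symcache` for every dict in an archive approximates the crossdict-cache idea. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not libctf code.  A plain array stands in for the
   hash table a real implementation would use.  */
#define TOY_SYMCACHE_MAX 128

typedef struct toy_symcache
{
  const char *names[TOY_SYMCACHE_MAX];
  size_t idx[TOY_SYMCACHE_MAX];
  size_t n;       /* Entries cached so far.  */
  size_t latest;  /* First symtab slot not yet scanned (and cached).  */
} toy_symcache;

/* Return the index of NAME in SYMTAB, or -1.  Every slot below C->latest
   is cached, so while the cache has room each slot is scanned only once.  */
static long
toy_lookup_symbol_idx (toy_symcache *c, const char *const *symtab,
                       size_t nsyms, const char *name)
{
  /* Found by an earlier scan?  */
  for (size_t i = 0; i < c->n; i++)
    if (strcmp (c->names[i], name) == 0)
      return (long) c->idx[i];

  /* Resume the linear scan where the last one stopped, caching every name
     we pass on the way.  */
  while (c->latest < nsyms && c->n < TOY_SYMCACHE_MAX)
    {
      size_t i = c->latest++;

      c->names[c->n] = symtab[i];
      c->idx[c->n] = i;
      c->n++;
      if (strcmp (symtab[i], name) == 0)
        return (long) i;
    }

  /* Cache full: fall back to an uncached scan of the remaining slots.  */
  for (size_t i = c->latest; i < nsyms; i++)
    if (strcmp (symtab[i], name) == 0)
      return (long) i;
  return -1;
}
```

A miss stops the scan at the first match, so later lookups only pay for the slots beyond that point.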
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t * it
was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
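The archive-level policy described above can be sketched as follows. The archive caches which dict a symbol name was found in, and on later lookups it re-queries that dict instead of storing the type ID. This is a minimal sketch with hypothetical `toy_*` types standing in for ctf_archive_t and ctf_dict_t. Type ID 0 means "not found".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not libctf code.  Type ID 0 means "not found".  */
typedef struct toy_dict
{
  const char *const *names;     /* Symbols this dict has types for...  */
  const unsigned long *types;   /* ...and their type IDs.  */
  size_t n;
} toy_dict;

static unsigned long
toy_dict_lookup (const toy_dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return 0;
}

#define TOY_NAMEDICTS_MAX 64

typedef struct toy_archive
{
  const toy_dict *dicts;
  size_t ndicts;
  /* Symbol name -> dict it was found in.  Names are not copied.  */
  const char *cached_names[TOY_NAMEDICTS_MAX];
  const toy_dict *cached_dicts[TOY_NAMEDICTS_MAX];
  size_t ncached;
} toy_archive;

/* Find NAME in any dict of A.  Only the dict is cached, never the type ID:
   re-querying the remembered dict is cheap.  */
static unsigned long
toy_arc_lookup_symbol_name (toy_archive *a, const char *name,
                            const toy_dict **foundp)
{
  *foundp = NULL;
  for (size_t i = 0; i < a->ncached; i++)
    if (strcmp (a->cached_names[i], name) == 0)
      {
        *foundp = a->cached_dicts[i];
        return toy_dict_lookup (*foundp, name);
      }

  for (size_t i = 0; i < a->ndicts; i++)
    {
      unsigned long type = toy_dict_lookup (&a->dicts[i], name);

      if (type != 0)
        {
          if (a->ncached < TOY_NAMEDICTS_MAX)
            {
              a->cached_names[a->ncached] = name;
              a->cached_dicts[a->ncached] = &a->dicts[i];
              a->ncached++;
            }
          *foundp = &a->dicts[i];
          return type;
        }
    }
  return 0;
}
```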
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
|
|
|
if (ret == CTF_ERR)
|
|
|
|
ctf_set_errno (fp, ctf_errno (fp->ctf_parent));
|
|
|
|
return ret;
|
|
|
|
}
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it was never emitted by any
code that generated any older file format version either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
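As described above, each slot in the new function info section is a uint32_t type ID of a CTF_K_FUNCTION type, and the section is ignored unless the header flag is set. The sketch below shows what reading one slot amounts to. It assumes the symbol number has already been translated into a slot number. The flag and kind constants are hypothetical stand-ins for CTF_F_NEWFUNCINFO and CTF_K_FUNCTION, with values chosen for illustration, not copied from ctf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not libctf code.  Values chosen for illustration.  */
#define TOY_F_NEWFUNCINFO 0x2   /* Stands in for CTF_F_NEWFUNCINFO.  */
#define TOY_K_FUNCTION    5     /* Stands in for CTF_K_FUNCTION.  */

/* Return the prototype type ID in function-info slot SLOT, or 0 if the
   section must be ignored, the slot is a pad, or the entry is malformed.
   KINDS maps each type ID (below NTYPES) to its kind.  */
static uint32_t
toy_funcinfo_type (uint32_t hdr_flags, const uint32_t *funcinfo,
                   size_t nslots, size_t slot,
                   const int *kinds, size_t ntypes)
{
  uint32_t type;

  /* Without the flag, the section is in the old, never-emitted layout.  */
  if (!(hdr_flags & TOY_F_NEWFUNCINFO))
    return 0;
  if (slot >= nslots)
    return 0;

  type = funcinfo[slot];        /* 0 is a pad: the symbol has no type.  */
  if (type == 0 || type >= ntypes || kinds[type] != TOY_K_FUNCTION)
    return 0;
  return type;
}
```

Two symbols with the same prototype can point at one shared type ID. That sharing is the deduplication the old inline layout could not express.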
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, i.e. the prototype). Internally this adds a
name -> type mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections with enough pads (as defined by
CTF_INDEX_PAD_THRESHOLD), whether because they are in dicts with symbols
where most types are unknown or in archives where most types are defined
in some child or parent dict rather than in this specific dict, are sorted
by name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here, and we can easily add
support for other symbol table formats.)
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
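Here is a sketch of the sorted-index lookup just described: sort the index by name if the header does not claim it is sorted, then binary-search it. For brevity it pairs each name with its type in one struct. The on-disk format keeps names in a separate index section parallel to the type entries, and libctf's own sorting (ctf_symidx_sort in the ChangeLog below) may work differently. The flag value and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not libctf code.  Stands in for CTF_F_IDXSORTED;
   the value is chosen for illustration.  */
#define TOY_F_IDXSORTED 0x4

typedef struct toy_idx_entry
{
  const char *name;   /* Symbol name from the index section.  */
  uint32_t type;      /* Type ID from the parallel symtypetab entry.  */
} toy_idx_entry;

static int
toy_idx_cmp (const void *a, const void *b)
{
  const toy_idx_entry *x = a;
  const toy_idx_entry *y = b;

  return strcmp (x->name, y->name);
}

/* Compilers may emit the index unsorted: sort it by name unless the
   header flag says that was already done.  */
static void
toy_idx_prepare (toy_idx_entry *idx, size_t n, uint32_t hdr_flags)
{
  if (!(hdr_flags & TOY_F_IDXSORTED) && n > 1)
    qsort (idx, n, sizeof idx[0], toy_idx_cmp);
}

/* Binary-search a sorted index for NAME.  Returns its type, or 0.  */
static uint32_t
toy_idx_lookup (const toy_idx_entry *idx, size_t n, const char *name)
{
  toy_idx_entry key = { name, 0 };
  const toy_idx_entry *hit;

  if (n == 0)
    return 0;
  hit = bsearch (&key, idx, n, sizeof idx[0], toy_idx_cmp);
  return hit != NULL ? hit->type : 0;
}
```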
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols; never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
else
|
2023-09-13 17:02:36 +08:00
|
|
|
return (ctf_set_typed_errno (fp, err));
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it was never emitted by any
code that generated any older file format version either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, i.e. the prototype). Internally this adds a
name -> type mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections with enough pads (as defined by
CTF_INDEX_PAD_THRESHOLD), whether because they are in dicts with symbols
where most types are unknown or in archives where most types are defined
in some child or parent dict rather than in this specific dict, are sorted
by name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here, and we can easily add
support for other symbol table formats.)
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols; never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
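The ChangeLog above describes `ctf_lookup_idx_name` as a bsearch for symbol names in index sections. Here is a minimal sketch of that idea. It assumes each index entry is a 32-bit string-table offset, that the entries are sorted by the name they point to, and that a parallel array holds the type of each indexed symbol. `struct idx_key`, `idx_cmp` and `lookup_indexed_type` are hypothetical names, not libctf API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical search key: the wanted name plus the string table that
   index entries (string offsets) point into.  */
struct idx_key
{
  const char *name;
  const char *strtab;
};

/* bsearch comparator: KEY is a struct idx_key, ELEM an index entry.  */
static int
idx_cmp (const void *key, const void *elem)
{
  const struct idx_key *k = key;
  const uint32_t *off = elem;

  return strcmp (k->name, k->strtab + *off);
}

/* Return the type of NAME in an indexed symtypetab, or -1 if absent.
   IDX[i] is a strtab offset (entries sorted by the name they point to)
   and TYPES[i] is the type of that symbol.  */
long
lookup_indexed_type (const uint32_t *idx, const uint32_t *types, size_t n,
                     const char *strtab, const char *name)
{
  struct idx_key key = { name, strtab };
  const uint32_t *hit = bsearch (&key, idx, n, sizeof (uint32_t), idx_cmp);

  if (hit == NULL)
    return -1;
  return (long) types[hit - idx];
}
```

Because `bsearch` requires a sorted array, an index that arrives unsorted must first be sorted by name. The `ctf_symidx_sort` entry above covers that case.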
2020-11-20 21:34:04 +08:00
}
2019-04-24 18:15:33 +08:00
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
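The refactoring described above reduces to one internal function that takes either a symbol index or a name and converts between the two depending on the section layout. The sketch below shows only that dispatch, under simplified assumptions. The symbol table is a toy array of names. The symtypetab is either indexed (keyed by name) or unindexed (one slot per symbol). A linear scan stands in for the binary search over indexed entries. Every type and function name here is hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct toy_symtab
{
  const char *const *names;       /* names[symidx]: the symbol's name.  */
  size_t n;
};

struct toy_symtypetab
{
  int indexed;                    /* Nonzero: keyed by name, not symidx.  */
  const char *const *idx_names;   /* Indexed only: names, parallel to TYPES.  */
  const long *types;              /* Type per entry; -1 means "no type".  */
  size_t n;
};

static long
name_to_symidx (const struct toy_symtab *st, const char *name)
{
  for (size_t i = 0; i < st->n; i++)
    if (strcmp (st->names[i], name) == 0)
      return (long) i;
  return -1;
}

/* Look up by SYMIDX when NAME is NULL, otherwise by NAME.  Returns the
   type, or -1 if there is none.  */
long
toy_lookup_sym_or_name (const struct toy_symtab *st,
                        const struct toy_symtypetab *tab,
                        unsigned long symidx, const char *name)
{
  if (tab->indexed)
    {
      /* Indexed sections are keyed by name: turn an index into a name.  */
      if (name == NULL)
        {
          if (symidx >= st->n)
            return -1;
          name = st->names[symidx];
        }
      for (size_t i = 0; i < tab->n; i++)   /* Stands in for a bsearch.  */
        if (strcmp (tab->idx_names[i], name) == 0)
          return tab->types[i];
      return -1;
    }

  /* Unindexed sections have one slot per symbol: turn a name into an index.  */
  if (name != NULL)
    {
      long i = name_to_symidx (st, name);
      if (i < 0)
        return -1;
      symidx = (unsigned long) i;
    }
  return symidx < tab->n ? tab->types[symidx] : -1;
}

/* The two public-style entry points then become thin wrappers.  */
long
toy_lookup_by_symbol (const struct toy_symtab *st,
                      const struct toy_symtypetab *tab, unsigned long symidx)
{
  return toy_lookup_sym_or_name (st, tab, symidx, NULL);
}

long
toy_lookup_by_symbol_name (const struct toy_symtab *st,
                           const struct toy_symtypetab *tab, const char *name)
{
  return toy_lookup_sym_or_name (st, tab, 0, name);
}
```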
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably cost
more time than using them saves, and it would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that ctf_lookup_symbol_idx calls in other dicts
in the same archive benefit from the previous linear search, and the
symtab only needs to be scanned at most once.
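A linear search that caches every name -> index mapping it passes can be sketched as below. Treat it as an illustration, not libctf's implementation. The fixed-size open-addressing table, the djb2-style hash, the resume cursor (loosely suggested by the `ctf_symhash_latest` field in the ChangeLog) and all names such as `find_symbol_idx` are assumptions. The sketch also ignores the crossdict sharing and any function/object distinction.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 64  /* Hypothetical fixed capacity for the sketch.  */

/* Name -> symbol-index cache plus a cursor recording how far the linear
   scan of the symbol table has got.  */
struct sym_cache
{
  const char *names[CACHE_SLOTS];  /* NULL marks an empty slot.  */
  long idx[CACHE_SLOTS];
  size_t latest;                   /* First symtab entry not yet scanned.  */
  int overflowed;                  /* Set if some name could not be cached.  */
};

static size_t
hash_str (const char *s)
{
  size_t h = 5381;                 /* djb2-style string hash.  */
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

static long
cache_get (const struct sym_cache *c, const char *name)
{
  size_t h = hash_str (name);
  for (size_t i = 0; i < CACHE_SLOTS; i++)
    {
      size_t slot = (h + i) % CACHE_SLOTS;  /* Linear probing.  */
      if (c->names[slot] == NULL)
        return -1;
      if (strcmp (c->names[slot], name) == 0)
        return c->idx[slot];
    }
  return -1;
}

static void
cache_put (struct sym_cache *c, const char *name, long idx)
{
  size_t h = hash_str (name);
  for (size_t i = 0; i < CACHE_SLOTS; i++)
    {
      size_t slot = (h + i) % CACHE_SLOTS;
      if (c->names[slot] == NULL)
        {
          c->names[slot] = name;
          c->idx[slot] = idx;
          return;
        }
      if (strcmp (c->names[slot], name) == 0)
        return;                    /* Keep the first index seen.  */
    }
  c->overflowed = 1;               /* Table full: it is only a cache.  */
}

/* Return the symbol-table index of NAME, or -1.  SYMTAB[i] is the name
   of symbol i.  */
long
find_symbol_idx (struct sym_cache *c, const char *const *symtab,
                 size_t nsyms, const char *name)
{
  long found = cache_get (c, name);
  if (found >= 0)
    return found;

  /* Resume the scan where the previous one stopped, caching every name
     passed on the way so that later lookups hit the cache.  */
  while (c->latest < nsyms)
    {
      size_t i = c->latest++;
      cache_put (c, symtab[i], (long) i);
      if (strcmp (symtab[i], name) == 0)
        return (long) i;
    }

  /* If anything was dropped from the cache, fall back to a full scan.  */
  if (c->overflowed)
    for (size_t i = 0; i < nsyms; i++)
      if (strcmp (symtab[i], name) == 0)
        return (long) i;
  return -1;
}
```

With this design, each symtab entry is scanned at most once across all lookups, provided the cache never overflows. A miss after the scan has reached the end costs only a hash probe.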
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps each symbol name to the ctf_dict_t *
it was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
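The archive-level policy described above has two parts. It remembers which dict a symbol name was found in, but never the type ID. This sketch illustrates that policy with hypothetical toy types rather than libctf's `ctf_archive_internal`. A small linear array stands in for the `ctfi_symnamedicts` hash table. Stored name pointers are not copied, so the caller's strings must outlive the archive.

```c
#include <stddef.h>
#include <string.h>

struct toy_dict
{
  const char *const *names;   /* Symbol names with types in this dict.  */
  const long *types;          /* Parallel to NAMES.  */
  size_t n;
};

static long
toy_dict_lookup (const struct toy_dict *d, const char *name)
{
  for (size_t i = 0; i < d->n; i++)
    if (strcmp (d->names[i], name) == 0)
      return d->types[i];
  return -1;
}

#define MAX_CACHED 32   /* Hypothetical fixed capacity.  */

struct toy_archive
{
  const struct toy_dict *dicts;
  size_t ndicts;
  /* Name -> dict memo.  Only the dict is remembered, never the type ID.
     NAME pointers are stored as-is (a real cache would copy them).  */
  const char *cached_names[MAX_CACHED];
  const struct toy_dict *cached_dicts[MAX_CACHED];
  size_t ncached;
  size_t dict_probes;         /* Number of per-dict lookups performed.  */
};

/* Return the dict NAME was found in, storing its type in *TYPEP, or NULL.  */
const struct toy_dict *
toy_arc_lookup_name (struct toy_archive *arc, const char *name, long *typep)
{
  /* Fast path: we already know which dict has this symbol, so just redo
     the cheap per-dict lookup there.  */
  for (size_t i = 0; i < arc->ncached; i++)
    if (strcmp (arc->cached_names[i], name) == 0)
      {
        arc->dict_probes++;
        *typep = toy_dict_lookup (arc->cached_dicts[i], name);
        return arc->cached_dicts[i];
      }

  /* Slow path: try every dict in turn and memoize the first hit.  */
  for (size_t d = 0; d < arc->ndicts; d++)
    {
      long type;

      arc->dict_probes++;
      type = toy_dict_lookup (&arc->dicts[d], name);
      if (type < 0)
        continue;
      if (arc->ncached < MAX_CACHED)
        {
          arc->cached_names[arc->ncached] = name;
          arc->cached_dicts[arc->ncached] = &arc->dicts[d];
          arc->ncached++;
        }
      *typep = type;
      return &arc->dicts[d];
    }
  return NULL;
}
```

A repeated lookup costs one per-dict probe instead of a walk over every dict. That matches the argument above: caching the type ID itself would save almost nothing more.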
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
/* Given a symbol table index, return the type of the function or data object
   described by the corresponding entry in the symbol table. */
ctf_id_t
ctf_lookup_by_symbol (ctf_dict_t *fp, unsigned long symidx)
{
libctf: support addition of types to dicts read via ctf_open()
libctf has long declared deserialized dictionaries (out of files or ELF
sections or memory buffers or whatever) to be read-only: back in the
furthest prehistory this was not the case, in that you could add a
few sorts of type to such dicts, but attempting to do so often caused
horrible memory corruption, so I banned the lot.
But it turns out real consumers want it (notably DTrace, which
synthesises pointers to types that don't have them and adds them to the
ctf_open()ed dicts if it needs them). Let's bring it back again, but
without the memory corruption and without the massive code duplication
required in days of yore to distinguish between static and dynamic
types: the representation of both types has been identical for a few
years, with the only difference being that types as a whole are stored in
a big buffer for types read in via ctf_open and per-type hashtables for
newly-added types.
So we discard the internally-visible concept of "readonly dictionaries"
in favour of declaring the *range of types* that were already present
when the dict was read in to be read-only: you can't modify them (say,
by adding members to them if they're structs, or calling ctf_set_array
on them), but you can add more types and point to them. (The API
remains the same, with calls sometimes returning ECTF_RDONLY, but now
they do so less often.)
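Declaring a *range* of types read-only, instead of the whole dict, can be sketched as a single comparison against a count of static types. The ChangeLog calls that count `ctf_stypes`. This sketch makes several assumptions. Type IDs start at 1, the types read in by ctf_open occupy the lowest IDs, and parent/child ID ranges are ignored. The result codes and function names are hypothetical stand-ins for libctf's own, such as ECTF_RDONLY.

```c
/* Hypothetical result codes standing in for libctf's error numbers.  */
enum { TOY_OK = 0, TOY_RDONLY = 1 };

struct toy_dict
{
  unsigned long stypes;  /* IDs 1..stypes were read in, so are read-only.  */
  unsigned long ntypes;  /* Highest type ID in use (static + added).  */
};

static int
toy_type_is_static (const struct toy_dict *fp, unsigned long type)
{
  return type <= fp->stypes;
}

/* Stand-in for a modifying call such as ctf_set_array.  */
int
toy_modify_type (struct toy_dict *fp, unsigned long type)
{
  if (toy_type_is_static (fp, type))
    return TOY_RDONLY;
  /* ... a real implementation would alter the dynamic type here ...  */
  return TOY_OK;
}

/* Adding is always allowed: new IDs land above the static range.  */
unsigned long
toy_add_type (struct toy_dict *fp)
{
  return ++fp->ntypes;
}
```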
This is a fairly invasive change, mostly because code written since the
ban was introduced didn't take the possibility of a static/dynamic split
into account. Some of these irregularities were hard to define as
anything but bugs.
Notably:
- The symbol handling was assuming that symbols only needed to be
looked for in dynamic hashtabs or static linker-laid-out indexed/
nonindexed layouts, but now we want to check both in case people
added more symbols to a dict they opened.
- The code that handles type additions wasn't checking to see if types
with the same name existed *at all* (so you could do
ctf_add_typedef (fp, "foo", bar) repeatedly without error). This
seems reasonable for types you just added, but we probably *do* want
to ban addition of types with names that override names we already
used in the ctf_open()ed portion, since that would probably corrupt
existing type relationships. (Doing things this way also avoids
causing new errors for any existing code that was doing this sort of
thing.)
- ctf_lookup_variable entirely failed to work for variables just added
by ctf_add_variable: you had to write the dict out and read it back
in again before they appeared.
- The symbol handling remembered what symbols you looked up but didn't
remember their types, so you could look up an object symbol and then
find it popping up when you asked for function symbols, which seems
less than ideal. Since we had to rejig things enough to be able to
distinguish function and object symbols internally anyway (in order
to give suitable errors if you try to add a symbol with a name that
already existed in the ctf_open()ed dict), this bug suddenly became
more visible and was easily fixed.
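The second point above bans new names that collide with types from the ctf_open()ed portion. Repeats among newly added types are still tolerated. A minimal sketch of that rule follows, with some simplifications. It uses a linear scan over a toy name array instead of libctf's name tables, and it ignores the separate struct/union/enum namespaces. `toy_add_named_type` and its return-0-on-failure convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 16   /* Hypothetical fixed capacity.  */

struct toy_dict
{
  const char *names[MAX_TYPES + 1];  /* names[id]; IDs start at 1.  */
  unsigned long stypes;              /* IDs 1..stypes came from ctf_open.  */
  unsigned long ntypes;              /* Highest ID in use.  */
};

/* Add a type called NAME.  Returns its new ID, or 0 if NAME would shadow a
   type from the opened (static) portion or the dict is full.  */
unsigned long
toy_add_named_type (struct toy_dict *fp, const char *name)
{
  for (unsigned long id = 1; id <= fp->stypes; id++)
    if (fp->names[id] != NULL && strcmp (fp->names[id], name) == 0)
      return 0;

  /* Repeats among newly-added types are tolerated, as before.  */
  if (fp->ntypes >= MAX_TYPES)
    return 0;
  fp->names[++fp->ntypes] = name;
  return fp->ntypes;
}
```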
We do not (yet) support writing out dicts that have been previously read
in via ctf_open() or other deserializer (you can look things up in them,
but not write them out a second time). This never worked, so there is
no incompatibility; if it is needed at a later date, the serializer is a
little bit closer to having it work now (the only table we don't deal
with is the types table, and that's because the upcoming CTFv4 changes
are likely to make major changes to the way that table is represented
internally, so adding more code that depends on its current form seems
like a bad idea).
There is a new testcase that tests much of this, in particular that
modification of existing types is still banned and that you can add new
ones and chase them without error.
libctf/
* ctf-impl.h (struct ctf_dict.ctf_symhash): Split into...
(ctf_dict.ctf_symhash_func): ... this and...
(ctf_dict.ctf_symhash_objt): ... this.
(ctf_dict.ctf_stypes): New, counts static types.
(LCTF_INDEX_TO_TYPEPTR): Use it instead of LCTF_RDWR.
(LCTF_RDWR): Deleted.
(LCTF_DIRTY): Renumbered.
(LCTF_LINKING): Likewise.
(ctf_lookup_variable_here): New.
(ctf_lookup_by_sym_or_name): Likewise.
(ctf_symbol_next_static): Likewise.
(ctf_add_variable_forced): Likewise.
(ctf_add_funcobjt_sym_forced): Likewise.
(ctf_simple_open_internal): Adjust.
(ctf_bufopen_internal): Likewise.
* ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with.
(ctf_create): Migrate a bunch of initializations into bufopen.
Force recreation of name tables. Do not forcibly override the
model, let ctf_bufopen do it.
(ctf_static_type): New.
(ctf_update): Drop LCTF_RDWR check.
(ctf_dynamic_type): Likewise.
(ctf_add_function): Likewise.
(ctf_add_type_internal): Likewise.
(ctf_rollback): Check ctf_stypes, not LCTF_RDWR.
(ctf_set_array): Likewise.
(ctf_add_struct_sized): Likewise.
(ctf_add_union_sized): Likewise.
(ctf_add_enum): Likewise.
(ctf_add_enumerator): Likewise (only on the target dict).
(ctf_add_member_offset): Likewise.
(ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types
with colliding names.
(ctf_add_forward): Note safety under the new rules.
(ctf_add_variable): Split all but the existence check into...
(ctf_add_variable_forced): ... this new function.
(ctf_add_funcobjt_sym): Likewise...
(ctf_add_funcobjt_sym_forced): ... for this new function.
* ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts
with any stypes.
(ctf_link_add_strtab): Likewise.
(ctf_link_shuffle_syms): Likewise.
(ctf_link_intern_extern_string): Note pre-existing prohibition.
* ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check.
(ctf_lookup_variable): Split out looking in a dict but not
its parent into...
(ctf_lookup_variable_here): ... this new function.
(ctf_lookup_symbol_idx): Track whether looking up a function or
object: cache them separately.
(ctf_symbol_next): Split out looking in non-dynamic symtypetab
entries to...
(ctf_symbol_next_static): ... this new function. Don't get confused
by the simultaneous presence of static and dynamic symtypetab entries.
(ctf_try_lookup_indexed): Don't waste time looking up symbols by
index before there can be any idea how symbols are numbered.
(ctf_lookup_by_sym_or_name): Distinguish between function and
data object lookups. Drop LCTF_RDWR.
(ctf_lookup_by_symbol): Adjust.
(ctf_lookup_by_symbol_name): Likewise.
* ctf-open.c (init_types): Rename to...
(init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes.
(ctf_simple_open): Drop writable arg.
(ctf_simple_open_internal): Likewise.
(ctf_bufopen): Likewise.
(ctf_bufopen_internal): Populate fields only used for writable dicts.
Drop LCTF_RDWR.
(ctf_dict_close): Cater for symhash cache split.
* ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR.
* ctf-types.c (ctf_variable_next): Drop LCTF_RDWR.
* testsuite/libctf-lookup/add-to-opened*: New test.
2023-12-20 00:58:19 +08:00
  return ctf_lookup_by_sym_or_name (fp, symidx, NULL, 1, -1);
}

/* Given a symbol name, return the type of the function or data object described
   by the corresponding entry in the symbol table. */
ctf_id_t
ctf_lookup_by_symbol_name (ctf_dict_t *fp, const char *symname)
{
  return ctf_lookup_by_sym_or_name (fp, 0, symname, 1, -1);
libctf, include: find types of symbols by name
The existing ctf_lookup_by_symbol and ctf_arc_lookup_symbol functions
suffice to look up the types of symbols if the caller already has a
symbol number. But the caller often doesn't have one of those and only
knows the name of the symbol: also, in object files, the caller might
not have a useful symbol number in any sense (and neither does libctf:
the 'symbol number' we use in that case literally starts at 0 for the
lexicographically first-sorted symbol in the symtypetab and counts those
symbols, so it corresponds to nothing useful).
This means that even though object files have a symtypetab (generated by
the compiler or by ld -r), the only way we can look up anything in it is
to iterate over all symbols in turn with ctf_symbol_next until we find
the one we want.
This is unhelpful and pointlessly inefficient.
So add a pair of functions to look up symbols by name in a dict and in a
whole archive: ctf_lookup_by_symbol_name and ctf_arc_lookup_symbol_name.
These are identical to the existing functions except that they take
symbol names rather than symbol numbers.
To avoid insane repetition, we do some refactoring in the process, so
that both ctf_lookup_by_symbol and ctf_arc_lookup_symbol turn into thin
wrappers around internal functions that do both lookup by symbol index
and lookup by name. This massively reduces code duplication because
even the existing lookup-by-index stuff wants to use a name sometimes
(when looking up in indexed sections), and the new lookup-by-name stuff
has to turn it into an index sometimes (when looking up in non-indexed
sections): doing it this way lets us share most of that.
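The wrapper arrangement can be pictured with a toy sketch. It assumes a tiny in-memory symbol table. The names sym_or_name, lookup_by_index and lookup_by_name are hypothetical and do not reproduce libctf's internal signatures; the real worker takes further arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy symbol table: parallel arrays of names and type IDs.  */
static const char *const sym_names[] = { "main", "errno", "printf" };
static const long sym_types[] = { 3, 7, 5 };
#define NSYMS (sizeof sym_names / sizeof sym_names[0])

/* One internal worker handles both cases: if NAME is non-NULL it is
   resolved to an index first, otherwise SYMIDX is used directly.
   Returns the type ID, or -1 if the symbol is unknown.  */
static long
sym_or_name (size_t symidx, const char *name)
{
  if (name != NULL)
    {
      for (symidx = 0; symidx < NSYMS; symidx++)
        if (strcmp (sym_names[symidx], name) == 0)
          break;
    }
  if (symidx >= NSYMS)
    return -1;
  return sym_types[symidx];
}

/* Thin wrappers, mirroring only the shape of ctf_lookup_by_symbol and
   ctf_lookup_by_symbol_name.  */
static long
lookup_by_index (size_t symidx)
{
  return sym_or_name (symidx, NULL);
}

static long
lookup_by_name (const char *name)
{
  return sym_or_name (0, name);
}
```

All the index-versus-name logic lives in one place. Each public entry point just chooses which half of the key to supply.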
The actual name->index lookup is done by ctf_lookup_symbol_idx. We
expect this lookup to be used many orders of magnitude less heavily than
ld.so symbol lookup, so reading the ELF symbol hashes would probably cost
more time than using them saves, and it would add a lot of complexity.
Instead, do a linear search for the
symbol name, caching all the name -> index mappings as we go, so that
future searches are likely to hit in the cache. To avoid having to
repeat this search over and over in a CTF archive when
ctf_arc_lookup_symbol_name is used, have cached archive lookups (the
sort done by ctf_arc_lookup_symbol* and the ctf_archive_next iterator)
pick out the first dict they cache in a given archive and store it in a
new ctf_archive field, ctfi_crossdict_cache. This can be used to store
cross-dictionary cached state that depends on things like the ELF symbol
table rather than the contents of any one dict. ctf_lookup_symbol_idx
then caches its name->index mappings in the dictionary named in the
crossdict cache, if any, so that calls to ctf_lookup_symbol_idx in other
dicts in the same archive benefit from the previous linear search, and the
symtab needs to be scanned at most once.
(Note that if you call ctf_lookup_by_symbol_name in one specific dict,
and then follow it with a ctf_arc_lookup_symbol_name, the former will
not use the crossdict cache because it's only populated by the dict
opens in ctf_arc_lookup_symbol_name. This is harmless except for a small
one-off waste of memory and time: it's only a cache, after all. We can
fix this later by using the archive caching machinery more
aggressively.)
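Below is a minimal sketch of a resumable linear search with a name -> index cache. It assumes the symbol table is an array of names and uses a small fixed-size open-addressing hash table as the cache. The names name_cache and lookup_symbol_idx are hypothetical, and this is not libctf's ctf_symhash code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 64   /* hypothetical; power of two, larger than the symtab */

struct name_cache
{
  const char *names[CACHE_SIZE];   /* NULL marks an empty slot */
  size_t idx[CACHE_SIZE];
  size_t latest;                   /* symtab entries scanned so far */
  size_t probes;                   /* symtab entries examined (for testing) */
};

static size_t
hash_name (const char *s)
{
  size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h & (CACHE_SIZE - 1);
}

/* Return the slot holding NAME, or the empty slot where it belongs.  */
static size_t
cache_slot (const struct name_cache *c, const char *name)
{
  size_t h = hash_name (name);
  while (c->names[h] != NULL && strcmp (c->names[h], name) != 0)
    h = (h + 1) & (CACHE_SIZE - 1);
  return h;
}

/* Look up NAME in SYMTAB (NSYMS entries).  A cache hit costs no symtab
   scan; a miss resumes the linear scan where the last one stopped,
   caching every name passed on the way.  Returns the index or -1.  */
static long
lookup_symbol_idx (struct name_cache *c, const char *const *symtab,
                   size_t nsyms, const char *name)
{
  size_t slot = cache_slot (c, name);

  if (c->names[slot] != NULL)
    return (long) c->idx[slot];

  assert (nsyms < CACHE_SIZE);     /* keeps at least one slot empty */
  for (; c->latest < nsyms; c->latest++)
    {
      const char *sym = symtab[c->latest];
      size_t s = cache_slot (c, sym);

      c->probes++;
      if (c->names[s] == NULL)     /* keep the first index for duplicates */
        {
          c->names[s] = sym;
          c->idx[s] = c->latest;
        }
      if (strcmp (sym, name) == 0)
        return (long) c->latest++;
    }
  return -1;
}
```

A hit costs one hash probe. A miss picks up the scan where the previous one stopped, so no symtab entry is ever examined twice.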
In ctf-archive, we do similar things, turning ctf_arc_lookup_symbol into
a wrapper around a new function that does both index -> ID and name ->
ID lookups across all dicts in an archive. We add a new
ctfi_symnamedicts cache that maps symbol names to the ctf_dict_t * each
was found in (so that linear searches for symbols don't need to be
repeated): but we also *remove* a cache, the ctfi_syms cache that was
memoizing the actual ctf_id_t returned from every call to
ctf_arc_lookup_symbol. This is pointless: all it saves is one call to
ctf_lookup_by_symbol, which is basically an array lookup and nothing
more, so it isn't worth caching. (Equally, given that symbol -> index
mappings are cached by ctf_lookup_by_symbol_name, those calls are nearly
free after the first call, so there's no point caching the ctf_id_t in
that case either.)
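The archive-level cache can be sketched the same way, with each dict reduced to a list of symbol names. The names toy_dict, toy_archive and arc_lookup_name are hypothetical. A linear array stands in for the hash table behind ctfi_symnamedicts. As argued above, the sketch caches only which dict answered, not the type ID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-dict data: the names of symbols it has types for.  */
struct toy_dict
{
  const char *const *syms;
  size_t nsyms;
};

#define MAX_CACHED 32

/* Hypothetical archive with a name -> dict cache.  */
struct toy_archive
{
  const struct toy_dict *dicts;
  size_t ndicts;
  const char *cached_name[MAX_CACHED];
  size_t cached_dict[MAX_CACHED];
  size_t ncached;
  size_t dict_searches;   /* per-dict searches performed (for testing) */
};

/* Return the dict's own copy of NAME if it has a type for it, else NULL.  */
static const char *
dict_find_sym (const struct toy_dict *d, const char *name)
{
  for (size_t i = 0; i < d->nsyms; i++)
    if (strcmp (d->syms[i], name) == 0)
      return d->syms[i];
  return NULL;
}

/* Return the index of the first dict with a type for NAME, or -1.
   Successful lookups record which dict answered, so repeats skip the
   cross-dict search.  Failures are not cached in this sketch.  */
static long
arc_lookup_name (struct toy_archive *arc, const char *name)
{
  for (size_t i = 0; i < arc->ncached; i++)
    if (strcmp (arc->cached_name[i], name) == 0)
      return (long) arc->cached_dict[i];

  for (size_t d = 0; d < arc->ndicts; d++)
    {
      const char *found;

      arc->dict_searches++;
      if ((found = dict_find_sym (&arc->dicts[d], name)) != NULL)
        {
          if (arc->ncached < MAX_CACHED)   /* if full, just don't cache */
            {
              arc->cached_name[arc->ncached] = found;
              arc->cached_dict[arc->ncached++] = d;
            }
          return (long) d;
        }
    }
  return -1;
}
```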
We fix up one test that was doing manual symbol lookup to use
ctf_arc_lookup_symbol instead, and enhance it to check that the caching
layer is not totally broken: we also add a new test to do lookups in a
.o file, and another to do lookups in an archive with conflicted types
and make sure that sort of multi-dict lookup is actually working.
include/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_arc_lookup_symbol_name): New.
(ctf_lookup_by_symbol_name): Likewise.
libctf/ChangeLog
2021-02-17 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (ctf_dict_t) <ctf_symhash>: New.
<ctf_symhash_latest>: Likewise.
(struct ctf_archive_internal) <ctfi_crossdict_cache>: New.
<ctfi_symnamedicts>: New.
<ctfi_syms>: Remove.
(ctf_lookup_symbol_name): Remove.
* ctf-lookup.c (ctf_lookup_symbol_name): Propagate errors from
parent properly. Make static.
(ctf_lookup_symbol_idx): New, linear search for the symbol name,
cached in the crossdict cache's ctf_symhash (if available), or
this dict's (otherwise).
(ctf_try_lookup_indexed): Allow the symname to be passed in.
(ctf_lookup_by_symbol): Turn into a wrapper around...
(ctf_lookup_by_sym_or_name): ... this, supporting name lookup too,
using ctf_lookup_symbol_idx in non-writable dicts. Special-case
name lookup in dynamic dicts without reported symbols, which have
no symtab or dynsymidx but where name lookup should still work.
(ctf_lookup_by_symbol_name): New, another wrapper.
* ctf-archive.c (enosym): Note that this is present in
ctfi_symnamedicts too.
(ctf_arc_close): Adjust for removal of ctfi_syms. Free the
ctfi_symnamedicts.
(ctf_arc_flush_caches): Likewise.
(ctf_dict_open_cached): Memoize the first cached dict in the
crossdict cache.
(ctf_arc_lookup_symbol): Turn into a wrapper around...
(ctf_arc_lookup_sym_or_name): ... this. No longer cache
ctf_id_t lookups: just call ctf_lookup_by_symbol as needed (but
still cache the dicts those lookups succeed in). Add
lookup-by-name support, with dicts of successful lookups cached in
ctfi_symnamedicts. Refactor the caching code a bit.
(ctf_arc_lookup_symbol_name): New, another wrapper.
* ctf-open.c (ctf_dict_close): Free the ctf_symhash.
* libctf.ver (LIBCTF_1.2): New version. Add
ctf_lookup_by_symbol_name, ctf_arc_lookup_symbol_name.
* testsuite/libctf-lookup/enum-symbol.c (main): Use
ctf_arc_lookup_symbol rather than looking up the name ourselves.
Fish it out repeatedly, to make sure that symbol caching isn't
broken.
(symidx_64): Remove.
(symidx_32): Remove.
* testsuite/libctf-lookup/enum-symbol-obj.lk: Test symbol lookup
in an unlinked object file (indexed symtypetab sections only).
* testsuite/libctf-writable/symtypetab-nonlinker-writeout.c
(try_maybe_reporting): Check symbol types via
ctf_lookup_by_symbol_name as well as ctf_symbol_next.
* testsuite/libctf-lookup/conflicting-type-syms.*: New test of
lookups in a multi-dict archive.
2021-02-17 23:21:12 +08:00
}
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
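Here is a sketch of what a reader might check in a new-format function info section. It assumes a toy table mapping type IDs to kinds. The names toy_kind and check_funcinfo are hypothetical, and treating ID 0 as a pad is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind codes; the real CTF_K_* values live in ctf.h.  */
enum toy_kind { TOY_K_UNKNOWN, TOY_K_INTEGER, TOY_K_POINTER, TOY_K_FUNCTION };

/* SECT is an array of uint32_t type IDs, one per symbol.  Every nonzero
   entry must name a type of function kind; zero is treated as a pad.
   KINDS[id] is the kind of type ID, and IDs at or beyond NTYPES are
   invalid.  Returns the index of the first bad entry, or -1.  */
static long
check_funcinfo (const uint32_t *sect, size_t nents,
                const enum toy_kind *kinds, size_t ntypes)
{
  for (size_t i = 0; i < nents; i++)
    {
      uint32_t id = sect[i];

      if (id == 0)
        continue;
      if (id >= ntypes || kinds[id] != TOY_K_FUNCTION)
        return (long) i;
    }
  return -1;
}
```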
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
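The acceptance rules for the flag can be sketched as a small decision function. The flag values and names below are placeholders, not the definitions in ctf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for illustration; real code uses the CTF_F_*
   macros from ctf.h.  */
#define TOY_F_NEWFUNCINFO 0x2u
#define TOY_F_KNOWN 0x7u        /* all flags this reader understands */

enum funcinfo_policy { REJECT_DICT, IGNORE_FUNCINFO, USE_FUNCINFO };

/* Policy of a reader that knows the new format.  Unknown flags mean the
   dict is refused, which is also how an older reader reacts to the new
   flag.  Without the new-format flag, the function info section is
   disregarded.  */
static enum funcinfo_policy
funcinfo_policy (uint32_t hdr_flags)
{
  if (hdr_flags & ~TOY_F_KNOWN)
    return REJECT_DICT;
  if (hdr_flags & TOY_F_NEWFUNCINFO)
    return USE_FUNCINFO;
  return IGNORE_FUNCINFO;
}
```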
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
pointer). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
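The following simplified sketch shows how a name -> type mapping becomes an unindexed section ordered by symbol number, once the linker has reported the symbol order. The names name_type and emit_unindexed are hypothetical. The sketch ignores symbol kinds and skippable symbols, which the real emitter has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name -> type mapping, as built up by symbol additions.  */
struct name_type
{
  const char *name;
  uint32_t type;
};

static uint32_t
type_for_name (const struct name_type *map, size_t nmap, const char *name)
{
  for (size_t i = 0; i < nmap; i++)
    if (strcmp (map[i].name, name) == 0)
      return map[i].type;
  return 0;             /* no type known: becomes a pad */
}

/* Given the final symbol order (SYMTAB, NSYMS names indexed by symbol
   number), fill OUT so entry N holds the type of symbol N, or 0 as a
   pad.  Returns the number of pads, a measure of how sparse it is.  */
static size_t
emit_unindexed (const struct name_type *map, size_t nmap,
                const char *const *symtab, size_t nsyms, uint32_t *out)
{
  size_t pads = 0;

  for (size_t i = 0; i < nsyms; i++)
    {
      out[i] = type_for_name (map, nmap, symtab[i]);
      if (out[i] == 0)
        pads++;
    }
  return pads;
}
```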
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
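This sketch covers lookup in an indexed section and the indexed-versus-unindexed layout decision. It assumes the index is an array of names sorted with strcmp, alongside a parallel array of types. The names indexed_sect, indexed_lookup and want_index are hypothetical. The threshold comparison is a guess, not the exact rule libctf applies with CTF_INDEX_PAD_THRESHOLD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical indexed section: NAMES is sorted by strcmp and TYPES[i]
   is the type of symbol NAMES[i].  Untyped symbols are simply absent,
   so no pads are needed.  */
struct indexed_sect
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

static int
cmp_name (const void *key, const void *elem)
{
  return strcmp ((const char *) key, *(const char *const *) elem);
}

/* Binary-search the name index; returns 0 if NAME has no type here.
   A lookup by symbol number would first map the number to a name.  */
static uint32_t
indexed_lookup (const struct indexed_sect *s, const char *name)
{
  const char *const *hit = bsearch (name, s->names, s->n,
                                    sizeof s->names[0], cmp_name);
  return hit ? s->types[hit - s->names] : 0;
}

/* Hypothetical layout policy: index the section once its unindexed
   form would contain at least THRESHOLD pads.  */
static int
want_index (size_t pads, size_t threshold)
{
  return pads >= threshold;
}
```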
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
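Sorting a compiler-provided index only when the header lacks the sorted flag might look like the sketch below. The flag value is a placeholder. Keeping names and types together in one struct, rather than in two parallel sections, is a simplification made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_F_IDXSORTED 0x4u   /* placeholder value; see ctf.h */

struct idx_entry
{
  const char *name;
  uint32_t type;
};

static int
cmp_entry (const void *a, const void *b)
{
  return strcmp (((const struct idx_entry *) a)->name,
                 ((const struct idx_entry *) b)->name);
}

/* Index entries may arrive in any order.  Sort by name unless the
   header says they are already sorted.  Returns 1 if a sort was done.  */
static int
prepare_index (struct idx_entry *ents, size_t n, uint32_t hdr_flags)
{
  if (hdr_flags & TOY_F_IDXSORTED)
    return 0;
  qsort (ents, n, sizeof ents[0], cmp_entry);
  return 1;
}
```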
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
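Here is a sketch of pad-skipping iteration over an unindexed section. The names sym_iter and symbol_next are hypothetical. They are much simpler than ctf_symbol_next, which also returns symbol names and works over libctf's own internal structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state, loosely in the spirit of ctf_next_t.  */
struct sym_iter
{
  size_t pos;
};

/* Return the next symbol number in SECT (NENTS entries) that has a
   type, storing the type in *TYPEP, or -1 once the section is
   exhausted.  Pads (zero entries) are skipped; no sorting is needed.  */
static long
symbol_next (struct sym_iter *it, const uint32_t *sect, size_t nents,
             uint32_t *typep)
{
  while (it->pos < nents)
    {
      size_t i = it->pos++;

      if (sect[i] != 0)
        {
          *typep = sect[i];
          return (long) i;
        }
    }
  return -1;
}
```

A caller would start with a zeroed sym_iter and loop until symbol_next returns -1.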
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
now not thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external symtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
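This sketch shows how string references might be resolved before the external strtab has been written. It assumes, for illustration only, that the top bit of a reference marks an external string. The names toy_strtab and toy_strptr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: the top bit of a name reference marks a
   string in the external strtab rather than in this dict's own table.  */
#define EXT_BIT 0x80000000u

struct ext_str
{
  uint32_t offset;            /* offset in the external strtab */
  const char *str;            /* the string at that offset */
};

struct toy_strtab
{
  const char *internal;       /* this dict's own string table */
  size_t internal_len;
  const struct ext_str *ext;  /* offset -> string mappings from the linker */
  size_t n_ext;
};

/* Internal references index the dict's own table.  External ones are
   resolved through the recorded offset -> string mappings, so they work
   before the external strtab exists.  Returns NULL if unresolvable.  */
static const char *
toy_strptr (const struct toy_strtab *st, uint32_t ref)
{
  if (ref & EXT_BIT)
    {
      uint32_t off = ref & ~EXT_BIT;

      for (size_t i = 0; i < st->n_ext; i++)
        if (st->ext[i].offset == off)
          return st->ext[i].str;
      return NULL;
    }
  return ref < st->internal_len ? st->internal + ref : NULL;
}
```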
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
/* Given a symbol table index, return the info for the function described
   by the corresponding entry in the symbol table, which may be a function
   symbol or may be a data symbol that happens to be a function pointer.  */
2019-04-24 18:15:33 +08:00
2020-11-20 21:34:04 +08:00
int
ctf_func_info (ctf_dict_t *fp, unsigned long symidx, ctf_funcinfo_t *fip)
{
  ctf_id_t type;
2019-04-24 18:15:33 +08:00
|
|
|
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need keep
no code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
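The compatibility rules for CTF_F_NEWFUNCINFO can be sketched as two small predicates. The flag values below are hypothetical stand-ins (check ctf.h for the real ones), and the assumption that an old reader knows only a compression flag and rejects any other bit is a simplification of "old libctf will refuse to open any CTF dicts that have this flag set".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only.  */
#define SKETCH_F_COMPRESS    0x1u
#define SKETCH_F_NEWFUNCINFO 0x2u

/* Old reader (assumed): knows only the compression flag, so any other
   bit set in the header means it refuses the dict.  */
int
sketch_old_reader_accepts (uint32_t hdr_flags)
{
  return (hdr_flags & ~SKETCH_F_COMPRESS) == 0;
}

/* New reader: uses the function info section only when the header says
   it is in the new format; otherwise the section is disregarded.  */
int
sketch_new_reader_uses_funcinfo (uint32_t hdr_flags)
{
  return (hdr_flags & SKETCH_F_NEWFUNCINFO) != 0;
}
```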
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
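A toy model of the two addition calls, treating ctf_funchash and ctf_objthash as simple name -> type tables. Everything here (`sketch_symhash_t`, the fixed capacity, passing the kind explicitly, and replacing the type when a name is re-added) is a hypothetical simplification, not libctf's implementation; it only shows the asymmetry between function and object symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kind codes for this sketch only.  */
enum sketch_sym_kind { SK_INTEGER = 1, SK_POINTER, SK_FUNCTION };

#define SKETCH_MAXSYMS 16

/* Toy stand-in for ctf_funchash / ctf_objthash: name -> type ID.  */
typedef struct
{
  const char *names[SKETCH_MAXSYMS];
  unsigned long types[SKETCH_MAXSYMS];
  size_t n;
} sketch_symhash_t;

static int
sketch_symhash_put (sketch_symhash_t *h, const char *name, unsigned long type)
{
  for (size_t i = 0; i < h->n; i++)
    if (strcmp (h->names[i], name) == 0)
      {
        h->types[i] = type;   /* Sketch choice: re-adding replaces the type.  */
        return 0;
      }
  if (h->n == SKETCH_MAXSYMS)
    return -1;
  h->names[h->n] = name;
  h->types[h->n++] = type;
  return 0;
}

/* Function symbols: the type must be of function kind.  */
int
sketch_add_func_sym (sketch_symhash_t *funcs, const char *name,
                     unsigned long type, enum sketch_sym_kind kind)
{
  if (kind != SK_FUNCTION)
    return -1;
  return sketch_symhash_put (funcs, name, type);
}

/* Object symbols: any kind is accepted, including function pointers.  */
int
sketch_add_objt_sym (sketch_symhash_t *objts, const char *name,
                     unsigned long type, enum sketch_sym_kind kind)
{
  (void) kind;
  return sketch_symhash_put (objts, name, type);
}
```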
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
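The serialization-time removal (symtypetab_delete_nonstatic_vars in the ChangeLog below) amounts to filtering the variable list against the data-object symbol names. A minimal sketch with plain string arrays; the function names and the in-place compaction are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is NAME among the data-object symbol names?  */
static int
sketch_is_objt_sym (const char *name, const char *const *objt_names,
                    size_t nobjts)
{
  for (size_t i = 0; i < nobjts; i++)
    if (strcmp (name, objt_names[i]) == 0)
      return 1;
  return 0;
}

/* Compact VARS in place, dropping every variable that is also exposed as
   a data-object symbol.  Returns the number of variables kept.  */
size_t
sketch_delete_vars_with_syms (const char **vars, size_t nvars,
                              const char *const *objt_names, size_t nobjts)
{
  size_t kept = 0;
  for (size_t i = 0; i < nvars; i++)
    if (!sketch_is_objt_sym (vars[i], objt_names, nobjts))
      vars[kept++] = vars[i];
  return kept;
}
```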
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
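An indexed lookup has two steps: map the symbol index to a name, then find that name in the name-sorted index (ctf_lookup_idx_name in the ChangeLog below does the second step with bsearch). The sketch below assumes a hypothetical `symnames` array in place of the symbol table, and pairs each name with its type ID for brevity, whereas the real index and section are separate parallel arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a toy name-sorted symtypetab plus its index.  */
typedef struct
{
  const char *name;
  uint32_t type;
} sketch_idx_entry_t;

static int
sketch_cmp_name (const void *key, const void *elt)
{
  const char *name = key;
  const sketch_idx_entry_t *e = elt;
  return strcmp (name, e->name);
}

/* Type of symbol SYMIDX, or 0 (no type) if absent.  IDX must be sorted
   by name, as bsearch requires.  */
uint32_t
sketch_lookup_indexed (const char *const *symnames, size_t nsyms,
                       unsigned long symidx,
                       const sketch_idx_entry_t *idx, size_t nidx)
{
  if (symidx >= nsyms)
    return 0;
  const sketch_idx_entry_t *hit
    = bsearch (symnames[symidx], idx, nidx, sizeof idx[0], sketch_cmp_name);
  return hit ? hit->type : 0;
}
```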
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
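The "sort only if the producer did not" rule can be sketched with qsort. In libctf this is handled by ctf_symidx_sort and sort_symidx_by_name (see the ChangeLog below); the flag value and the name/type pair layout here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag value for this sketch only.  */
#define SKETCH_F_IDXSORTED 0x4u

typedef struct
{
  const char *name;   /* Name from the index section.  */
  uint32_t type;      /* Type ID from the matching symtypetab slot.  */
} sketch_pair_t;

static int
sketch_pair_cmp (const void *a, const void *b)
{
  const sketch_pair_t *pa = a, *pb = b;
  return strcmp (pa->name, pb->name);
}

/* Make PAIRS name-sorted, sorting only when the header does not already
   promise sorted order.  Returns 1 if a sort was performed.  */
int
sketch_ensure_sorted (sketch_pair_t *pairs, size_t n, uint32_t hdr_flags)
{
  if (hdr_flags & SKETCH_F_IDXSORTED)
    return 0;
  qsort (pairs, n, sizeof pairs[0], sketch_pair_cmp);
  return 1;
}
```

Sorting name and type together keeps each type ID attached to its symbol name, which is what a later bsearch by name relies on.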
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
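Why iteration needs no sorting: an iterator can walk the section in whatever order it was written, skipping pads. The toy iterator below only mimics the general shape of a ctf_next_t-style iterator; its types, names and end-of-iteration convention (returning 0) are hypothetical, not ctf_symbol_next's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy symtypetab: parallel name/type arrays in producer order.  */
typedef struct
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
} sketch_symtab_t;

/* Hypothetical iterator state.  */
typedef struct
{
  size_t pos;
} sketch_iter_t;

/* Return the next typed symbol's type and set *NAME, skipping pads
   (type 0).  Return 0 once every entry has been visited.  */
uint32_t
sketch_symbol_next (const sketch_symtab_t *tab, sketch_iter_t *it,
                    const char **name)
{
  while (it->pos < tab->n)
    {
      size_t i = it->pos++;
      if (tab->types[i] != 0)
        {
          *name = tab->names[i];
          return tab->types[i];
        }
    }
  return 0;
}
```

libctf's own iterators report exhaustion through an error code (ECTF_NEXT_END) rather than a zero return; consult ctf-api.h for the exact convention.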
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
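A sketch of the widened ctf_lookup_by_symbol behaviour: dispatch on the symbol's kind to the function info or data object section instead of rejecting function symbols. For simplicity every symbol gets a slot in both toy arrays; the real unindexed sections go through a per-symbol translation table, and all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol classification, standing in for ELF symbol types.  */
enum sketch_symtype { SKETCH_SYM_OTHER, SKETCH_SYM_FUNC, SKETCH_SYM_OBJECT };

typedef struct
{
  const enum sketch_symtype *symtypes;  /* Indexed by symbol index.  */
  size_t nsyms;
  const uint32_t *funcinfo;             /* Toy: one slot per symbol.  */
  const uint32_t *objtinfo;
} sketch_dict_t;

/* Type of SYMIDX from the section matching its kind; 0 if untyped,
   out of range, or neither a function nor an object.  */
uint32_t
sketch_lookup_by_symbol (const sketch_dict_t *d, size_t symidx)
{
  if (symidx >= d->nsyms)
    return 0;
  switch (d->symtypes[symidx])
    {
    case SKETCH_SYM_FUNC:
      return d->funcinfo[symidx];
    case SKETCH_SYM_OBJECT:
      return d->objtinfo[symidx];
    default:
      return 0;
    }
}
```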
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
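Resolving a symbol named only by a strtab offset boils down to a bounds-checked pointer into the external string table. This is a minimal sketch assuming the table arrives as one buffer of NUL-terminated strings; it does not use libctf's ctf_strptr, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy external string table: NUL-terminated strings addressed by offset.  */
typedef struct
{
  const char *buf;
  size_t len;
} sketch_strtab_t;

/* Name at OFFSET, or NULL if OFFSET is out of range or no terminating
   NUL follows it inside the buffer.  */
const char *
sketch_strtab_name (const sketch_strtab_t *st, size_t offset)
{
  if (offset >= st->len)
    return NULL;
  if (memchr (st->buf + offset, '\0', st->len - offset) == NULL)
    return NULL;
  return st->buf + offset;
}
```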
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
if ((type = ctf_lookup_by_symbol (fp, symidx)) == CTF_ERR)
|
|
|
|
return -1; /* errno is set for us. */
|
2019-04-24 18:15:33 +08:00
|
|
|
|
2020-11-20 21:34:04 +08:00
|
|
|
if (ctf_type_kind (fp, type) != CTF_K_FUNCTION)
|
|
|
|
return (ctf_set_errno (fp, ECTF_NOTFUNC));
|
2019-04-24 18:15:33 +08:00
|
|
|
|
2020-11-20 21:34:04 +08:00
|
|
|
return ctf_func_type_info (fp, type, fip);
|
2019-04-24 18:15:33 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Given a symbol table index, return the arguments for the function described
|
|
|
|
by the corresponding entry in the symbol table. */
|
|
|
|
|
|
|
|
int
|
libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_t
The naming of the ctf_file_t type in libctf is a historical curiosity.
Back in the Solaris days, CTF dictionaries were originally generated as
a separate file and then (sometimes) merged into objects: hence the
datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw
CTF is essentially never written to a file on its own, and the datatype
changed name to a "CTF dictionary" years ago. So the term "CTF file"
refers to something that is never a file! This is at best confusing.
The type has also historically been known as a "CTF container", which is
even more confusing now that we have CTF archives which are *also* a
sort of container (they contain CTF dictionaries), but which are never
referred to as containers in the source code.
So fix this by completing the renaming, renaming ctf_file_t to
ctf_dict_t throughout, and renaming those few functions that refer to
CTF files by name (keeping compatibility aliases) to refer to dicts
instead. Old users who still refer to ctf_file_t will see (harmless)
pointer-compatibility warnings at compile time, but the ABI is unchanged
(since C doesn't mangle names, and ctf_file_t was always an opaque type)
and things will still compile fine as long as -Werror is not specified.
All references to CTF containers and CTF files in the source code are
fixed to refer to CTF dicts instead.
Further (smaller) renamings of annoyingly-named functions to come, as
part of the process of souping up queries across whole archives at once
(needed for the function info and data object sections).
binutils/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close.
* readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t.
(dump_ctf_archive_member): Likewise.
(dump_section_as_ctf): Likewise. Use ctf_dict_close, not
ctf_file_close.
gdb/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctfread.c: Change uses of ctf_file_t to ctf_dict_t.
(ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_file_t): Rename to...
(ctf_dict_t): ... this. Keep ctf_file_t around for compatibility.
(struct ctf_file): Likewise rename to...
(struct ctf_dict): ... this.
(ctf_file_close): Rename to...
(ctf_dict_close): ... this, keeping compatibility function.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this, keeping compatibility function.
All callers adjusted.
* ctf.h: Rename references to ctf_file_t to ctf_dict_t.
(struct ctf_archive) <ctfa_nfiles>: Rename to...
<ctfa_ndicts>: ... this.
ld/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ldlang.c (ctf_output): This is a ctf_dict_t now.
(lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t.
(ldlang_open_ctf): Adjust comment.
(lang_merge_ctf): Use ctf_dict_close, not ctf_file_close.
* ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to
ctf_dict_t. Change opaque declaration accordingly.
* ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust.
* ldemul.h (examine_strtab_for_ctf): Likewise.
(ldemul_examine_strtab_for_ctf): Likewise.
* ldemul.c (ldemul_examine_strtab_for_ctf): Likewise.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations
adjusted.
(ctf_fileops): Rename to...
(ctf_dictops): ... this.
(ctf_dedup_t) <cd_id_to_file_t>: Rename to...
<cd_id_to_dict_t>: ... this.
(ctf_file_t): Fix outdated comment.
<ctf_fileops>: Rename to...
<ctf_dictops>: ... this.
(struct ctf_archive_internal) <ctfi_file>: Rename to...
<ctfi_dict>: ... this.
* ctf-archive.c: Rename ctf_file_t to ctf_dict_t.
Rename ctf_archive.ctfa_nfiles to ctfa_ndicts.
Rename ctf_file_close to ctf_dict_close. All users adjusted.
* ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers.
(ctf_bundle_t) <ctb_file>: Rename to...
<ctb_dict>: ... this.
* ctf-decl.c: Rename ctf_file_t to ctf_dict_t.
* ctf-dedup.c: Likewise. Rename ctf_file_close to
ctf_dict_close. Refer to CTF dicts, not CTF containers.
* ctf-dump.c: Likewise.
* ctf-error.c: Likewise.
* ctf-hash.c: Likewise.
* ctf-inlines.h: Likewise.
* ctf-labels.c: Likewise.
* ctf-link.c: Likewise.
* ctf-lookup.c: Likewise.
* ctf-open-bfd.c: Likewise.
* ctf-string.c: Likewise.
* ctf-subr.c: Likewise.
* ctf-types.c: Likewise.
* ctf-util.c: Likewise.
* ctf-open.c: Likewise.
(ctf_file_close): Rename to...
(ctf_dict_close): ...this.
(ctf_file_close): New trivial wrapper around ctf_dict_close, for
compatibility.
(ctf_parent_file): Rename to...
(ctf_parent_dict): ... this.
(ctf_parent_file): New trivial wrapper around ctf_parent_dict, for
compatibility.
* libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_func_args (ctf_dict_t *fp, unsigned long symidx, uint32_t argc,
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
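Here is a minimal sketch of reading the new layout. It makes two assumptions: the symbol index has already been translated into a slot in the section (libctf does this with its sxlate tables), and a zero entry is a pad meaning "no type". Under those assumptions, a reader of an unindexed function info section needs only an array index and a kind check. All names here (demo_functab_lookup, enum demo_kind) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy kind codes; only "function" matters here.  */
enum demo_kind { DEMO_K_UNKNOWN = 0, DEMO_K_INTEGER, DEMO_K_FUNCTION };

/* Look up the type of the symbol in slot SLOT of an unindexed function
   info section SECT (NENTS uint32_t type IDs, 0 = pad).  KINDS maps type
   IDs to kinds.  Returns the function type ID, or 0 if the slot is out
   of range, is a pad, or names something other than a function.  */
uint32_t
demo_functab_lookup (const uint32_t *sect, size_t nents,
                     const enum demo_kind *kinds, size_t ntypes,
                     size_t slot)
{
  uint32_t type;

  assert (sect != NULL || nents == 0);
  if (slot >= nents)
    return 0;
  type = sect[slot];
  if (type == 0 || type >= ntypes)
    return 0;                   /* Pad, or an out-of-range entry.  */
  if (kinds[type] != DEMO_K_FUNCTION)
    return 0;                   /* Entries must be function types.  */
  return type;
}
```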
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
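The flag handling just described can be modelled as below. The flag values are illustrative, not copied from ctf.h. The rejection of unknown flags is an assumption, suggested by the CTF_F_MAX adjustment in the ChangeLog that follows: an old reader whose known-flag mask lacks NEWFUNCINFO refuses the dict outright.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; the authoritative ones live in ctf.h.  */
#define DEMO_F_COMPRESS    0x1
#define DEMO_F_NEWFUNCINFO 0x2
#define DEMO_F_IDXSORTED   0x4
#define DEMO_F_MAX (DEMO_F_COMPRESS | DEMO_F_NEWFUNCINFO | DEMO_F_IDXSORTED)

enum demo_open_result
{
  DEMO_OPEN_FAIL,               /* Unknown flag: refuse the dict.  */
  DEMO_OPEN_IGNORE_FUNCINFO,    /* Old-format func info: disregard it.  */
  DEMO_OPEN_USE_FUNCINFO        /* New-format func info: use it.  */
};

/* FLAGS comes from the dict header; KNOWN is the set of flags this
   reader understands.  */
enum demo_open_result
demo_check_flags (uint32_t flags, uint32_t known)
{
  if (flags & ~known)
    return DEMO_OPEN_FAIL;
  if (!(flags & DEMO_F_NEWFUNCINFO))
    return DEMO_OPEN_IGNORE_FUNCINFO;
  return DEMO_OPEN_USE_FUNCINFO;
}
```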
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
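Here is a hypothetical sketch of the step from a name -> type mapping to symbol indexes. Once the linker has supplied the final symbol order, an unindexed section is just the name map replayed in that order, with 0 pads for symbols that have no type. The sketch deliberately ignores details the real code handles, such as skipping symbols that ctf_symtab_skippable rejects and separating function symbols from data symbols. demo_emit_unindexed and struct demo_symtype are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A name -> type mapping, standing in for ctf_funchash/ctf_objthash.  */
struct demo_symtype
{
  const char *name;
  uint32_t type;
};

/* Linear search of the map; 0 means "no type known".  */
static uint32_t
demo_type_for (const struct demo_symtype *map, size_t nmap, const char *name)
{
  size_t i;

  for (i = 0; i < nmap; i++)
    if (strcmp (map[i].name, name) == 0)
      return map[i].type;
  return 0;
}

/* SYMNAMES gives the final symbol order (by symbol index).  Fill OUT
   with one type ID per symbol, using 0 as a pad.  */
void
demo_emit_unindexed (const char *const *symnames, size_t nsyms,
                     const struct demo_symtype *map, size_t nmap,
                     uint32_t *out)
{
  size_t i;

  assert (out != NULL || nsyms == 0);
  for (i = 0; i < nsyms; i++)
    out[i] = demo_type_for (map, nmap, symnames[i]);
}
```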
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
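Removing variables that are also data-object symbols amounts to a filter over the variable list. In this illustrative sketch, a linear name search stands in for the ctf_objthash lookup. demo_delete_symbol_vars is a hypothetical name, not the libctf function that does this (symtypetab_delete_nonstatic_vars).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if NAME is one of the data-object symbol names.  */
static int
demo_is_objt_sym (const char *name, const char *const *objt_names,
                  size_t nobjt)
{
  size_t i;

  for (i = 0; i < nobjt; i++)
    if (strcmp (name, objt_names[i]) == 0)
      return 1;
  return 0;
}

/* Compact VARS in place, dropping every variable that is also a
   data-object symbol.  Returns the number of variables kept.  */
size_t
demo_delete_symbol_vars (const char **vars, size_t nvars,
                         const char *const *objt_names, size_t nobjt)
{
  size_t i, kept = 0;

  assert (vars != NULL || nvars == 0);
  for (i = 0; i < nvars; i++)
    if (!demo_is_objt_sym (vars[i], objt_names, nobjt))
      vars[kept++] = vars[i];
  return kept;
}
```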
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archive where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
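The trade-off behind the pad threshold can be sketched as a density check. Assume each index entry is a 4-byte name offset stored alongside the 4-byte type ID. Then an indexed section costs 8 bytes per typed symbol, while an unindexed one costs 4 bytes per symbol, pads included. The threshold value and the exact comparison below are assumptions for illustration; the real rule lives in symtypetab_density.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only: the real CTF_INDEX_PAD_THRESHOLD is defined
   in ctf-impl.h and may differ.  */
#define DEMO_INDEX_PAD_THRESHOLD 3

/* TYPES holds the entries an unindexed section would contain, one per
   symbol, with 0 as a pad.  Return nonzero if the section should be
   emitted indexed instead.  */
int
demo_should_index (const uint32_t *types, size_t nsyms)
{
  size_t i, pads = 0, typed;

  assert (types != NULL || nsyms == 0);
  for (i = 0; i < nsyms; i++)
    if (types[i] == 0)
      pads++;
  typed = nsyms - pads;

  /* Unindexed: 4 * nsyms bytes.  Indexed: 8 * typed bytes.  Indexing
     is smaller once pads > typed; the threshold adds a margin.  */
  return pads > typed * DEMO_INDEX_PAD_THRESHOLD;
}
```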
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
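The following minimal sketch detects and fixes an unsorted index, then looks names up by binary search. For simplicity it keeps each name next to its type in one struct, so that plain qsort and bsearch can be used. The real index stores names and types in separate arrays. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_idx_ent
{
  const char *name;             /* Symbol name (a string offset in CTF).  */
  uint32_t type;                /* Type ID of that symbol.  */
};

static int
demo_ent_cmp (const void *a, const void *b)
{
  const struct demo_idx_ent *x = a, *y = b;
  return strcmp (x->name, y->name);
}

/* Sort an index by name unless the header says it already is sorted.  */
void
demo_idx_sort (struct demo_idx_ent *ents, size_t n, int already_sorted)
{
  if (!already_sorted && n > 1)
    qsort (ents, n, sizeof (ents[0]), demo_ent_cmp);
}

/* Binary-search a sorted index for NAME; 0 means "not found".  */
uint32_t
demo_idx_lookup (const struct demo_idx_ent *ents, size_t n, const char *name)
{
  struct demo_idx_ent key = { name, 0 };
  const struct demo_idx_ent *hit;

  assert (ents != NULL || n == 0);
  hit = bsearch (&key, ents, n, sizeof (ents[0]), demo_ent_cmp);
  return hit ? hit->type : 0;
}
```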
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
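The iteration pattern can be modelled without libctf. The caller starts with a NULL cursor, which the iterator allocates on first use and frees when it runs out. demo_symbol_next and demo_next_t are hypothetical stand-ins for ctf_symbol_next and ctf_next_t. Returning 0 both at the end and on allocation failure is a simplification that a real API would avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A toy symbol-type table: parallel arrays, 0 = no type (pad).  */
struct demo_symtab
{
  const char *const *names;
  const uint32_t *types;
  size_t n;
};

/* Hypothetical cursor, mimicking the role of ctf_next_t.  */
typedef struct demo_next
{
  size_t pos;
} demo_next_t;

/* Return the next typed symbol's type and set *NAME, skipping pads.
   *IT must start as NULL; it is freed and reset to NULL when the
   iteration ends, at which point 0 is returned.  */
uint32_t
demo_symbol_next (const struct demo_symtab *st, demo_next_t **it,
                  const char **name)
{
  assert (st != NULL && it != NULL && name != NULL);

  /* First call: allocate the cursor.  */
  if (*it == NULL && (*it = calloc (1, sizeof (demo_next_t))) == NULL)
    return 0;

  /* Advance past pads to the next typed symbol.  */
  while ((*it)->pos < st->n)
    {
      size_t i = (*it)->pos++;

      if (st->types[i] != 0)
        {
          *name = st->names[i];
          return st->types[i];
        }
    }

  /* Exhausted: free the cursor so the caller need not.  */
  free (*it);
  *it = NULL;
  return 0;
}
```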
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
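The ChangeLog below says ctf_func_info and ctf_func_args were reimplemented in terms of ctf_lookup_by_symbol, and ctf_func_args appears in this file. The toy model below assumes that reimplementation works in three steps: look up the symbol's type, reject anything that is not a function, then copy out at most argc argument types. The dict, type table and error codes (ENOENT and EINVAL in place of libctf's own ECTF_ codes) are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum demo_kind { DEMO_K_INTEGER = 1, DEMO_K_FUNCTION };

struct demo_type
{
  enum demo_kind kind;
  uint32_t argc;                /* Only meaningful for functions.  */
  const uint32_t *args;         /* Argument type IDs.  */
};

struct demo_dict
{
  const struct demo_type *types;  /* Indexed by type ID; [0] unused.  */
  size_t ntypes;
  const uint32_t *symtypes;     /* Symbol index -> type ID, 0 = none.  */
  size_t nsyms;
  int err;                      /* Last error, like ctf_errno ().  */
};

/* Stand-in for ctf_lookup_by_symbol: returns 0 and sets fp->err on
   failure.  */
static uint32_t
demo_lookup_by_symbol (struct demo_dict *fp, size_t symidx)
{
  if (symidx >= fp->nsyms || fp->symtypes[symidx] == 0)
    {
      fp->err = ENOENT;
      return 0;
    }
  return fp->symtypes[symidx];
}

/* Look up the symbol's type, insist it is a function, then copy out at
   most ARGC argument types into ARGV.  */
int
demo_func_args (struct demo_dict *fp, size_t symidx, uint32_t argc,
                uint32_t *argv)
{
  uint32_t type, i;
  const struct demo_type *t;

  if ((type = demo_lookup_by_symbol (fp, symidx)) == 0)
    return -1;                  /* fp->err is already set.  */
  assert (type < fp->ntypes);
  t = &fp->types[type];
  if (t->kind != DEMO_K_FUNCTION)
    {
      fp->err = EINVAL;         /* libctf would report ECTF_NOTFUNC.  */
      return -1;
    }
  for (i = 0; i < argc && i < t->argc; i++)
    argv[i] = t->args[i];
  return 0;
}
```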
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
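The change from recording external strings at write-out time to recording them at addition time can be sketched with a tiny offset -> string table. This hypothetical model shows why populating the table when a string is added lets lookups succeed before the strtab is written. demo_str_add_external, demo_strptr_external and the fixed-size table are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_EXT 16

/* Offsets into a strtab owned by someone else (here, the linker),
   mapped back to their text.  */
struct demo_ext_strtab
{
  uint32_t offs[DEMO_MAX_EXT];
  const char *strs[DEMO_MAX_EXT];
  size_t n;
};

/* Record the mapping as soon as the string is added, not when the
   strtab is finally written, so lookups work in the meantime.  */
int
demo_str_add_external (struct demo_ext_strtab *t, const char *str,
                       uint32_t offset)
{
  assert (t != NULL && str != NULL);
  if (t->n == DEMO_MAX_EXT)
    return -1;                  /* Table full.  */
  t->offs[t->n] = offset;
  t->strs[t->n] = str;
  t->n++;
  return 0;
}

/* Resolve an external offset to its string, or NULL if unknown.  */
const char *
demo_strptr_external (const struct demo_ext_strtab *t, uint32_t offset)
{
  size_t i;

  for (i = 0; i < t->n; i++)
    if (t->offs[i] == offset)
      return t->strs[i];
  return NULL;
}
```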
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
2020-11-20 21:34:04 +08:00
|
|
|
ctf_id_t *argv)
|
2019-04-24 18:15:33 +08:00
|
|
|
{
|
2020-11-20 21:34:04 +08:00
|
|
|
ctf_id_t type;
|
2019-04-24 18:15:33 +08:00
|
|
|
|
libctf: symbol type linking support
This adds facilities to write out the function info and data object
sections, which efficiently map from entries in the symbol table to
types. The write-side code is entirely new: the read-side code was
merely significantly changed and support for indexed tables added
(pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff
header fields).
With this in place, you can use ctf_lookup_by_symbol to look up the
types of symbols of function and object type (and, as before, you can
use ctf_lookup_variable to look up types of file-scope variables not
present in the symbol table, as long as you know their name: but
variables that are also data objects are now found in the data object
section instead.)
(Compatible) file format change:
The CTF spec has always said that the function info section looks much
like the CTF_K_FUNCTIONs in the type section: an info word (including an
argument count) followed by a return type and N argument types. This
format is suboptimal: it means function symbols cannot be deduplicated
and it causes a lot of ugly code duplication in libctf. But
conveniently the compiler has never emitted this! Because it has always
emitted a rather different format that libctf has never accepted, we can
be sure that there are no instances of this function info section in the
wild, and can freely change its format without compatibility concerns or
a file format version bump. (And since it has never been emitted in any
code that generated any older file format version, either, we need not
keep any code to read the format as specified at all!)
So the function info section is now specified as an array of uint32_t,
exactly like the object data section: each entry is a type ID in the
type section which must be of kind CTF_K_FUNCTION, the prototype of
this function.
This allows function types to be deduplicated and also correctly encodes
the fact that all functions declared in C really are types available to
the program: so they should be stored in the type section like all other
types. (In format v4, we will be able to represent the types of static
functions as well, but that really does require a file format change.)
We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the
new function info format is in use. A sufficiently new compiler will
always set this flag. New libctf will always set this flag: old libctf
will refuse to open any CTF dicts that have this flag set. If the flag
is not set on a dict being read in, new libctf will disregard the
function info section. Format v4 will remove this flag (or, rather, the
flag has no meaning there and the bit position may be recycled for some
other purpose).
New API:
Symbol addition:
ctf_add_func_sym: Add a symbol with a given name and type. The
type must be of kind CTF_K_FUNCTION (a function
type, not a pointer to one). Internally this adds a name -> type
mapping to the ctf_funchash in the ctf_dict.
ctf_add_objt_sym: Add a symbol with a given name and type. The type
kind can be anything, including function pointers.
This adds to ctf_objthash.
These both treat symbols as name -> type mappings: the linker associates
symbol names with symbol indexes via the ctf_link_shuffle_syms callback,
which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the
ctf_dict. Repeated relinks can add more symbols.
Variables that are also exposed as symbols are removed from the variable
section at serialization time.
CTF symbol type sections which have enough pads, defined by
CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols
where most types are unknown, or in archives where most types are defined
in some child or parent dict, not in this specific dict) are sorted by
name rather than symidx and accompanied by an index which associates
each symbol type entry with a name: the existing ctf_lookup_by_symbol
will map symbol indexes to symbol names and look the names up in the
index automatically. (This is currently ELF-symbol-table-dependent, but
there is almost nothing specific to ELF in here and we can add support
for other symbol table formats easily).
The compiler also uses index sections to communicate the contents of
object file symbol tables without relying on any specific ordering of
symbols: it doesn't need to sort them, and libctf will detect an
unsorted index section via the absence of the new CTF_F_IDXSORTED header
flag, and sort it if needed.
Iteration:
ctf_symbol_next: Iterator which returns the types and names of symbols
one by one, either for function or data symbols.
This does not require any sorting: the ctf_link machinery uses it to
pull in all the compiler-provided symbols cheaply, but it is not
restricted to that use.
(Compatible) changes in API:
ctf_lookup_by_symbol: can now be called for object and function
symbols: never returns ECTF_NOTDATA (which is
no longer thrown by anything, but is kept for
compatibility and because it is a plausible
error that we might start throwing again at some
later date).
Internally we also have changes to the ctf-string functionality so that
"external" strings (those where we track a string -> offset mapping, but
only write out an offset) can be consulted via the usual means
(ctf_strptr) before the strtab is written out. This is important
because ctf_link_add_linker_symbol can now be handed symbols named via
strtab offsets, and ctf_link_shuffle_syms must figure out their actual
names by looking in the external strtab we have just been fed by the
ctf_link_add_strtab callback, long before that strtab is written out.
include/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-api.h (ctf_symbol_next): New.
(ctf_add_objt_sym): Likewise.
(ctf_add_func_sym): Likewise.
* ctf.h: Document new function info section format.
(CTF_F_NEWFUNCINFO): New.
(CTF_F_IDXSORTED): New.
(CTF_F_MAX): Adjust accordingly.
libctf/ChangeLog
2020-11-20 Nick Alcock <nick.alcock@oracle.com>
* ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New.
(_libctf_nonnull_): Likewise.
(ctf_in_flight_dynsym_t): New.
(ctf_dict_t) <ctf_funcidx_names>: Likewise.
<ctf_objtidx_names>: Likewise.
<ctf_nfuncidx>: Likewise.
<ctf_nobjtidx>: Likewise.
<ctf_funcidx_sxlate>: Likewise.
<ctf_objtidx_sxlate>: Likewise.
<ctf_objthash>: Likewise.
<ctf_funchash>: Likewise.
<ctf_dynsyms>: Likewise.
<ctf_dynsymidx>: Likewise.
<ctf_dynsymmax>: Likewise.
<ctf_in_flight_dynsym>: Likewise.
(struct ctf_next) <u.ctn_next>: Likewise.
(ctf_symtab_skippable): New prototype.
(ctf_add_funcobjt_sym): Likewise.
(ctf_dynhash_sort_by_name): Likewise.
(ctf_sym_to_elf64): Rename to...
(ctf_elf32_to_link_sym): ... this, and...
(ctf_elf64_to_link_sym): ... this.
* ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO
flag, and presence of index sections. Refactor out
ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use
ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func
sxlate sections if corresponding index section is present. Adjust
for new func info section format.
(ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error
handling. Report incorrect-length index sections. Always do an
init_symtab, even if there is no symtab section (there may be index
sections still).
(flip_objts): Adjust comment: func and objt sections are actually
identical in structure now, no need to caveat.
(ctf_dict_close): Free newly-added data structures.
* ctf-create.c (ctf_create): Initialize them.
(ctf_symtab_skippable): New, refactored out of
init_symtab, with st_nameidx_set check added.
(ctf_add_funcobjt_sym): New, add a function or object symbol to the
ctf_objthash or ctf_funchash, by name.
(ctf_add_objt_sym): Call it.
(ctf_add_func_sym): Likewise.
(symtypetab_delete_nonstatic_vars): New, delete vars also present as
data objects.
(CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters:
this is a function emission, not a data object emission.
(CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit
pads for symbols with no type (only set for unindexed sections).
(CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters:
always emit indexed.
(symtypetab_density): New, figure out section sizes.
(emit_symtypetab): New, emit a symtypetab.
(emit_symtypetab_index): New, emit a symtypetab index.
(ctf_serialize): Call them, emitting suitably sorted symtypetab
sections and indexes. Set suitable header flags. Copy over new
fields.
* ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an
order on symtypetab index sections.
* ctf-link.c (ctf_add_type_mapping): Delete erroneous comment
relating to code that was never committed.
(ctf_link_one_variable): Improve variable name.
(check_sym): New, symtypetab analogue of check_variable.
(ctf_link_deduplicating_one_symtypetab): New.
(ctf_link_deduplicating_syms): Likewise.
(ctf_link_deduplicating): Call them.
(ctf_link_deduplicating_per_cu): Note that we don't call them in
this case (yet).
(ctf_link_add_strtab): Set the error on the fp correctly.
(ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add
a linker symbol to the in-flight list.
(ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the
in-flight list into a mapping we can use, now its names are
resolvable in the external strtab.
* ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with
external strtab offsets.
(ctf_str_rollback): Adjust comment.
(ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from
writeout time...
(ctf_str_add_external): ... to string addition time.
* ctf-lookup.c (ctf_lookup_var_key_t): Rename to...
(ctf_lookup_idx_key_t): ... this, now we use it for syms too.
<clik_names>: New member, a name table.
(ctf_lookup_var): Adjust accordingly.
(ctf_lookup_variable): Likewise.
(ctf_lookup_by_id): Shuffle further up in the file.
(ctf_symidx_sort_arg_cb): New, callback for...
(sort_symidx_by_name): ... this new function to sort a symidx
found to be unsorted (likely originating from the compiler).
(ctf_symidx_sort): New, sort a symidx.
(ctf_lookup_symbol_name): Support dynamic symbols with indexes
provided by the linker. Use ctf_link_sym_t, not Elf64_Sym.
Check the parent if a child lookup fails.
(ctf_lookup_by_symbol): Likewise. Work for function symbols too.
(ctf_symbol_next): New, iterate over symbols with types (without
sorting).
(ctf_lookup_idx_name): New, bsearch for symbol names in indexes.
(ctf_try_lookup_indexed): New, attempt an indexed lookup.
(ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol.
(ctf_func_args): Likewise.
(ctf_get_dict): Move...
* ctf-types.c (ctf_get_dict): ... here.
* ctf-util.c (ctf_sym_to_elf64): Re-express as...
(ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and
st_nameidx_set (always 0, so st_nameidx can be ignored). Look in
the ELF strtab for names.
(ctf_elf32_to_link_sym): Likewise, for Elf32_Sym.
(ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be.
* libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and
ctf_add_func_sym.
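The ChangeLog says only two things about this choice: symtypetab_density figures out section sizes, and CTF_INDEX_PAD_THRESHOLD governs when a section becomes indexed. The sketch below shows one plausible way such sizes could drive the choice, under these assumptions:
- A plain byte comparison replaces libctf's threshold rule, whose exact form is not given here.
- An unindexed section spends one 32-bit word per symbol, pads included.
- An indexed section spends a type word plus a name word per typed symbol.

`symtypetab_plan` and its result struct are hypothetical. `force_indexed` plays the role of CTF_SYMTYPETAB_FORCE_INDEXED.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size estimate for one symtypetab section.  */
typedef struct symtypetab_sizes
{
  size_t unindexed;   /* One type word per symbol, pads included.  */
  size_t indexed;     /* Type word plus name word per typed symbol.  */
  int use_index;      /* Nonzero: emit name-sorted section plus index.  */
} symtypetab_sizes_t;

/* TYPES[i] is the type of symbol I, or 0 if unknown (a pad).  */
symtypetab_sizes_t
symtypetab_plan (const uint32_t *types, size_t nsyms, int force_indexed)
{
  symtypetab_sizes_t s = { 0, 0, 0 };
  size_t typed = 0;

  for (size_t i = 0; i < nsyms; i++)
    if (types[i] != 0)
      typed++;

  s.unindexed = nsyms * sizeof (uint32_t);
  s.indexed = typed * 2 * sizeof (uint32_t);
  /* Stand-in for the real threshold: index when it is strictly smaller.  */
  s.use_index = force_indexed || s.indexed < s.unindexed;
  return s;
}
```

With three of four symbols typed, the padded layout wins (16 bytes against 24). With one typed symbol out of six, the index wins (8 bytes against 24).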
2020-11-20 21:34:04 +08:00
|
|
|
if ((type = ctf_lookup_by_symbol (fp, symidx)) == CTF_ERR)
|
|
|
|
return -1; /* errno is set for us. */
|
2019-04-24 18:15:33 +08:00
|
|
|
|
2020-11-20 21:34:04 +08:00
|
|
|
if (ctf_type_kind (fp, type) != CTF_K_FUNCTION)
|
|
|
|
return (ctf_set_errno (fp, ECTF_NOTFUNC));
|
2019-04-24 18:15:33 +08:00
|
|
|
|
2020-11-20 21:34:04 +08:00
|
|
|
return ctf_func_type_args (fp, type, argc, argv);
|
2019-04-24 18:15:33 +08:00
|
|
|
}
|
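The code fragments above belong to one short function with three steps:
1. Look up the symbol's type.
2. Require that it is a function type, failing with ECTF_NOTFUNC otherwise.
3. Delegate to ctf_func_type_args.

The self-contained model below reproduces that control flow without libctf. `mini_dict_t`, `type_rec_t` and `func_args` are hypothetical. ENOENT and EINVAL stand in for libctf's own error codes. Copying at most ARGC argument types is an assumption about what the delegated call does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a dict: symbol -> type, and type records.  */
enum kind { K_INT, K_FUNCTION };

typedef struct type_rec
{
  enum kind kind;
  uint32_t nargs;
  const long *args;        /* Argument type IDs, for K_FUNCTION.  */
} type_rec_t;

typedef struct mini_dict
{
  const long *sym_types;   /* Indexed by symbol; -1 means no type.  */
  size_t nsyms;
  const type_rec_t *types; /* Indexed by type ID.  */
  int err;                 /* Last error, like a per-dict errno.  */
} mini_dict_t;

/* Symbol -> type, require a function type, then copy up to ARGC
   argument types into ARGV.  Returns 0 on success, -1 on error.  */
int
func_args (mini_dict_t *d, size_t symidx, uint32_t argc, long *argv)
{
  long type;

  if (symidx >= d->nsyms || (type = d->sym_types[symidx]) < 0)
    {
      d->err = ENOENT;     /* Stands in for the lookup's own error.  */
      return -1;
    }
  if (d->types[type].kind != K_FUNCTION)
    {
      d->err = EINVAL;     /* Stands in for ECTF_NOTFUNC.  */
      return -1;
    }
  for (uint32_t i = 0; i < argc && i < d->types[type].nargs; i++)
    argv[i] = d->types[type].args[i];
  return 0;
}
```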