bpf-next-bpf-next-6.13

-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmc7hIQACgkQ6rmadz2v
 bTrcRA/+MsUOzJPnjokonHwk8X4KQM21gOua/sUcGArLVGF/JoW5/b1W8UBQ0y5+
 +okYaRNGpwF0/2S8M5FAYpM7VSPLl1U7Rihr55I63D9kbAo0pDQwpn4afQFuZhaC
 l7MzkhBHS7XXx5/70APOzy3kz1GDYvz39jiWuAAhRqVejFO+fa4pDz4W+Ht7jYTQ
 jJOLn4vJna9fSfVf/U/bbdz5lL0lncIiEnRIEbF7EszbF2CA7sa+/KFENGM7ChEo
 UlxK2Xz5fpzgT6htZRjMr6jmupfg7gzdT4moOysQQcjkllvv6/4MD0s/GLShtG9H
 SmpaptpYCEGXLuApGzkSddwiT6iUMTqQr7zs6LPp0gPh+4Z0sSPNoBtBp2v0aVDl
 w0zhVhMfoF66rMG+IZY684CsMGg5h8UsOS46KLjSU0fW2HpGM7+zZLpXOaGkU3OH
 UV0womPT/C2kS2fpOn9F91O8qMjOZ4EXd+zuRtIRv9CeuVIpCT9R13lEYn+wfr6d
 aUci8wybha1UOAvkRiXiqWOPS+0Z/arrSbCSDMQF6DevLpQl0noVbTVssWXcRdUE
 9Ve6J0yS29WxNWFtuuw4xP5NcG1AnRXVGh215TuVBX7xK9X/hnDDhfalltsjXfnd
 m1f64FxU2SGp2D7X8BX/6Aeyo6mITE6I3SNMUrcvk1Zid36zhy8=
 =TXGS
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Add BPF uprobe session support (Jiri Olsa)

 - Optimize uprobe performance (Andrii Nakryiko)

 - Add bpf_fastcall support to helpers and kfuncs (Eduard Zingerman)

 - Avoid calling free_htab_elem() under hash map bucket lock (Hou Tao)

 - Prevent tailcall infinite loop caused by freplace (Leon Hwang)

 - Mark raw_tracepoint arguments as nullable (Kumar Kartikeya Dwivedi)

 - Introduce uptr support in the task local storage map (Martin KaFai
   Lau)

 - Stringify errno log messages in libbpf (Mykyta Yatsenko)

 - Add kmem_cache BPF iterator for perf's lock profiling (Namhyung Kim)

 - Support BPF objects of either endianness in libbpf (Tony Ambardar)

 - Add ksym to struct_ops trampoline to fix stack trace (Xu Kuohai)

 - Introduce private stack for eligible BPF programs (Yonghong Song)

 - Migrate samples/bpf tests to selftests/bpf test_progs (Daniel T. Lee)

 - Migrate test_sock to selftests/bpf test_progs (Jordan Rife)

* tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (152 commits)
  libbpf: Change hash_combine parameters from long to unsigned long
  selftests/bpf: Fix build error with llvm 19
  libbpf: Fix memory leak in bpf_program__attach_uprobe_multi
  bpf: use common instruction history across all states
  bpf: Add necessary migrate_disable to range_tree.
  bpf: Do not alloc arena on unsupported arches
  selftests/bpf: Set test path for token/obj_priv_implicit_token_envvar
  selftests/bpf: Add a test for arena range tree algorithm
  bpf: Introduce range_tree data structure and use it in bpf arena
  samples/bpf: Remove unused variable in xdp2skb_meta_kern.c
  samples/bpf: Remove unused variables in tc_l2_redirect_kern.c
  bpftool: Cast variable `var` to long long
  bpf, x86: Propagate tailcall info only for subprogs
  bpf: Add kernel symbol for struct_ops trampoline
  bpf: Use function pointers count as struct_ops links count
  bpf: Remove unused member rcu from bpf_struct_ops_map
  selftests/bpf: Add struct_ops prog private stack tests
  bpf: Support private stack for struct_ops progs
  selftests/bpf: Add tracing prog private stack tests
  bpf, x86: Support private stack in jit
  ...
This commit is contained in:
Linus Torvalds 2024-11-21 08:11:04 -08:00
commit 6e95ef0258
211 changed files with 6963 additions and 3475 deletions

View File

@ -835,7 +835,7 @@ section named by ``btf_ext_info_sec->sec_name_off``.
See :ref:`Documentation/bpf/llvm_reloc.rst <btf-co-re-relocations>`
for more information on CO-RE relocations.
4.2 .BTF_ids section
4.3 .BTF_ids section
--------------------
The .BTF_ids section encodes BTF ID values that are used within the kernel.
@ -896,6 +896,81 @@ and is used as a filter when resolving the BTF ID value.
All the BTF ID lists and sets are compiled in the .BTF_ids section and
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
4.4 .BTF.base section
---------------------
Split BTF - where the .BTF section only contains types not in the associated
base .BTF section - is an extremely efficient way to encode type information
for kernel modules, since they generally consist of a few module-specific
types along with a large set of shared kernel types. The former are encoded
in split BTF, while the latter are encoded in base BTF, resulting in more
compact representations. A type in split BTF that refers to a type in
base BTF refers to it using its base BTF ID, and split BTF IDs start
at last_base_BTF_ID + 1.
The downside of this approach however is that this makes the split BTF
somewhat brittle - when the base BTF changes, base BTF ID references are
no longer valid and the split BTF itself becomes useless. The role of the
.BTF.base section is to make split BTF more resilient for cases where
the base BTF may change, as is the case for kernel modules not built every
time the kernel is for example. .BTF.base contains named base types; INTs,
FLOATs, STRUCTs, UNIONs, ENUM[64]s and FWDs. INTs and FLOATs are fully
described in .BTF.base sections, while composite types like structs
and unions are not fully defined - the .BTF.base type simply serves as
a description of the type the split BTF referred to, so structs/unions
have 0 members in the .BTF.base section. ENUM[64]s are similarly recorded
with 0 members. Any other types are added to the split BTF. This
distillation process then leaves us with a .BTF.base section with
such minimal descriptions of base types and .BTF split section which refers
to those base types. Later, we can relocate the split BTF using both the
information stored in the .BTF.base section and the new .BTF base; the type
information in the .BTF.base section allows us to update the split BTF
references to point at the corresponding new base BTF IDs.
BTF relocation happens on kernel module load when a kernel module has a
.BTF.base section, and libbpf also provides a btf__relocate() API to
accomplish this.
As an example consider the following base BTF::
[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] STRUCT 'foo' size=8 vlen=2
'f1' type_id=1 bits_offset=0
'f2' type_id=1 bits_offset=32
...and associated split BTF::
[3] PTR '(anon)' type_id=2
i.e. split BTF describes a pointer to struct foo { int f1; int f2 };
.BTF.base will consist of::
[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] STRUCT 'foo' size=8 vlen=0
If we relocate the split BTF later using the following new base BTF::
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] STRUCT 'foo' size=8 vlen=2
'f1' type_id=2 bits_offset=0
'f2' type_id=2 bits_offset=32
...we can use our .BTF.base description to know that the split BTF reference
is to struct foo, and relocation results in new split BTF::
[4] PTR '(anon)' type_id=3
Note that we had to update BTF ID and start BTF ID for the split BTF.
So we see how .BTF.base plays the role of facilitating later relocation,
leading to more resilient split BTF.
.BTF.base sections will be generated automatically for out-of-tree kernel module
builds - i.e. where KBUILD_EXTMOD is set (as it would be for "make M=path/2/mod"
cases). .BTF.base generation requires pahole support for the "distilled_base"
BTF feature; this is available in pahole v1.28 and later.
5. Using BTF
============

View File

@ -2094,6 +2094,12 @@ static void restore_args(struct jit_ctx *ctx, int args_off, int nregs)
}
}
static bool is_struct_ops_tramp(const struct bpf_tramp_links *fentry_links)
{
return fentry_links->nr_links == 1 &&
fentry_links->links[0]->link.type == BPF_LINK_TYPE_STRUCT_OPS;
}
/* Based on the x86's implementation of arch_prepare_bpf_trampoline().
*
* bpf prog and function entry before bpf trampoline hooked:
@ -2123,6 +2129,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
bool save_ret;
__le32 **branches = NULL;
bool is_struct_ops = is_struct_ops_tramp(fentry);
/* trampoline stack layout:
* [ parent ip ]
@ -2191,11 +2198,14 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
*/
emit_bti(A64_BTI_JC, ctx);
/* frame for parent function */
emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
/* x9 is not set for struct_ops */
if (!is_struct_ops) {
/* frame for parent function */
emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
}
/* frame for patched function */
/* frame for patched function for tracing, or caller for struct_ops */
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
@ -2289,19 +2299,24 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
/* reset SP */
emit(A64_MOV(1, A64_SP, A64_FP), ctx);
/* pop frames */
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
if (flags & BPF_TRAMP_F_SKIP_FRAME) {
/* skip patched function, return to parent */
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
emit(A64_RET(A64_R(9)), ctx);
if (is_struct_ops) {
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_RET(A64_LR), ctx);
} else {
/* return to patched function */
emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
emit(A64_RET(A64_R(10)), ctx);
/* pop frames */
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
if (flags & BPF_TRAMP_F_SKIP_FRAME) {
/* skip patched function, return to parent */
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
emit(A64_RET(A64_R(9)), ctx);
} else {
/* return to patched function */
emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
emit(A64_RET(A64_R(10)), ctx);
}
}
kfree(branches);

View File

@ -325,6 +325,22 @@ struct jit_context {
/* Number of bytes that will be skipped on tailcall */
#define X86_TAIL_CALL_OFFSET (12 + ENDBR_INSN_SIZE)
static void push_r9(u8 **pprog)
{
u8 *prog = *pprog;
EMIT2(0x41, 0x51); /* push r9 */
*pprog = prog;
}
static void pop_r9(u8 **pprog)
{
u8 *prog = *pprog;
EMIT2(0x41, 0x59); /* pop r9 */
*pprog = prog;
}
static void push_r12(u8 **pprog)
{
u8 *prog = *pprog;
@ -1404,6 +1420,24 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
*pprog = prog;
}
static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
{
u8 *prog = *pprog;
/* movabs r9, priv_frame_ptr */
emit_mov_imm64(&prog, X86_REG_R9, (__force long) priv_frame_ptr >> 32,
(u32) (__force long) priv_frame_ptr);
#ifdef CONFIG_SMP
/* add <r9>, gs:[<off>] */
EMIT2(0x65, 0x4c);
EMIT3(0x03, 0x0c, 0x25);
EMIT((u32)(unsigned long)&this_cpu_off, 4);
#endif
*pprog = prog;
}
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
#define __LOAD_TCC_PTR(off) \
@ -1412,6 +1446,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
#define LOAD_TAIL_CALL_CNT_PTR(stack) \
__LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
/* Memory size/value to protect private stack overflow/underflow */
#define PRIV_STACK_GUARD_SZ 8
#define PRIV_STACK_GUARD_VAL 0xEB9F12345678eb9fULL
static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
int oldproglen, struct jit_context *ctx, bool jmp_padding)
{
@ -1421,18 +1459,28 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
int insn_cnt = bpf_prog->len;
bool seen_exit = false;
u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
void __percpu *priv_frame_ptr = NULL;
u64 arena_vm_start, user_vm_start;
void __percpu *priv_stack_ptr;
int i, excnt = 0;
int ilen, proglen = 0;
u8 *prog = temp;
u32 stack_depth;
int err;
stack_depth = bpf_prog->aux->stack_depth;
priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
if (priv_stack_ptr) {
priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
stack_depth = 0;
}
arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
detect_reg_usage(insn, insn_cnt, callee_regs_used);
emit_prologue(&prog, bpf_prog->aux->stack_depth,
emit_prologue(&prog, stack_depth,
bpf_prog_was_classic(bpf_prog), tail_call_reachable,
bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
/* Exception callback will clobber callee regs for its own use, and
@ -1454,6 +1502,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
emit_mov_imm64(&prog, X86_REG_R12,
arena_vm_start >> 32, (u32) arena_vm_start);
if (priv_frame_ptr)
emit_priv_frame_ptr(&prog, priv_frame_ptr);
ilen = prog - temp;
if (rw_image)
memcpy(rw_image + proglen, temp, ilen);
@ -1473,6 +1524,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
u8 *func;
int nops;
if (priv_frame_ptr) {
if (src_reg == BPF_REG_FP)
src_reg = X86_REG_R9;
if (dst_reg == BPF_REG_FP)
dst_reg = X86_REG_R9;
}
switch (insn->code) {
/* ALU */
case BPF_ALU | BPF_ADD | BPF_X:
@ -2127,15 +2186,21 @@ populate_extable:
u8 *ip = image + addrs[i - 1];
func = (u8 *) __bpf_call_base + imm32;
if (tail_call_reachable) {
LOAD_TAIL_CALL_CNT_PTR(bpf_prog->aux->stack_depth);
if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
LOAD_TAIL_CALL_CNT_PTR(stack_depth);
ip += 7;
}
if (!imm32)
return -EINVAL;
if (priv_frame_ptr) {
push_r9(&prog);
ip += 2;
}
ip += x86_call_depth_emit_accounting(&prog, func, ip);
if (emit_call(&prog, func, ip))
return -EINVAL;
if (priv_frame_ptr)
pop_r9(&prog);
break;
}
@ -2145,13 +2210,13 @@ populate_extable:
&bpf_prog->aux->poke_tab[imm32 - 1],
&prog, image + addrs[i - 1],
callee_regs_used,
bpf_prog->aux->stack_depth,
stack_depth,
ctx);
else
emit_bpf_tail_call_indirect(bpf_prog,
&prog,
callee_regs_used,
bpf_prog->aux->stack_depth,
stack_depth,
image + addrs[i - 1],
ctx);
break;
@ -3303,6 +3368,42 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func
return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs, image, buf);
}
static const char *bpf_get_prog_name(struct bpf_prog *prog)
{
if (prog->aux->ksym.prog)
return prog->aux->ksym.name;
return prog->aux->name;
}
static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
{
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
u64 *stack_ptr;
for_each_possible_cpu(cpu) {
stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
stack_ptr[0] = PRIV_STACK_GUARD_VAL;
stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL;
}
}
static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
struct bpf_prog *prog)
{
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
u64 *stack_ptr;
for_each_possible_cpu(cpu) {
stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL) {
pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
bpf_get_prog_name(prog));
break;
}
}
}
struct x64_jit_data {
struct bpf_binary_header *rw_header;
struct bpf_binary_header *header;
@ -3320,7 +3421,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
struct bpf_binary_header *rw_header = NULL;
struct bpf_binary_header *header = NULL;
struct bpf_prog *tmp, *orig_prog = prog;
void __percpu *priv_stack_ptr = NULL;
struct x64_jit_data *jit_data;
int priv_stack_alloc_sz;
int proglen, oldproglen = 0;
struct jit_context ctx = {};
bool tmp_blinded = false;
@ -3356,6 +3459,23 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
}
prog->aux->jit_data = jit_data;
}
priv_stack_ptr = prog->aux->priv_stack_ptr;
if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
/* Allocate actual private stack size with verifier-calculated
* stack size plus two memory guards to protect overflow and
* underflow.
*/
priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 8) +
2 * PRIV_STACK_GUARD_SZ;
priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_sz, 8, GFP_KERNEL);
if (!priv_stack_ptr) {
prog = orig_prog;
goto out_priv_stack;
}
priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_sz);
prog->aux->priv_stack_ptr = priv_stack_ptr;
}
addrs = jit_data->addrs;
if (addrs) {
ctx = jit_data->ctx;
@ -3491,6 +3611,11 @@ out_image:
bpf_prog_fill_jited_linfo(prog, addrs + 1);
out_addrs:
kvfree(addrs);
if (!image && priv_stack_ptr) {
free_percpu(priv_stack_ptr);
prog->aux->priv_stack_ptr = NULL;
}
out_priv_stack:
kfree(jit_data);
prog->aux->jit_data = NULL;
}
@ -3529,6 +3654,8 @@ void bpf_jit_free(struct bpf_prog *prog)
if (prog->jited) {
struct x64_jit_data *jit_data = prog->aux->jit_data;
struct bpf_binary_header *hdr;
void __percpu *priv_stack_ptr;
int priv_stack_alloc_sz;
/*
* If we fail the final pass of JIT (from jit_subprogs),
@ -3544,6 +3671,13 @@ void bpf_jit_free(struct bpf_prog *prog)
prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset();
hdr = bpf_jit_binary_pack_hdr(prog);
bpf_jit_binary_pack_free(hdr, NULL);
priv_stack_ptr = prog->aux->priv_stack_ptr;
if (priv_stack_ptr) {
priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 8) +
2 * PRIV_STACK_GUARD_SZ;
priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_sz, prog);
free_percpu(prog->aux->priv_stack_ptr);
}
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
}
@ -3559,6 +3693,11 @@ bool bpf_jit_supports_exceptions(void)
return IS_ENABLED(CONFIG_UNWINDER_ORC);
}
bool bpf_jit_supports_private_stack(void)
{
return true;
}
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
{
#if defined(CONFIG_UNWINDER_ORC)

View File

@ -203,6 +203,7 @@ enum btf_field_type {
BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
BPF_REFCOUNT = (1 << 9),
BPF_WORKQUEUE = (1 << 10),
BPF_UPTR = (1 << 11),
};
typedef void (*btf_dtor_kfunc_t)(void *);
@ -322,6 +323,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
return "kptr";
case BPF_KPTR_PERCPU:
return "percpu_kptr";
case BPF_UPTR:
return "uptr";
case BPF_LIST_HEAD:
return "bpf_list_head";
case BPF_LIST_NODE:
@ -350,6 +353,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
return sizeof(u64);
case BPF_LIST_HEAD:
return sizeof(struct bpf_list_head);
@ -379,6 +383,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
return __alignof__(u64);
case BPF_LIST_HEAD:
return __alignof__(struct bpf_list_head);
@ -419,6 +424,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
break;
default:
WARN_ON_ONCE(1);
@ -507,6 +513,25 @@ static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src
bpf_obj_memcpy(map->record, dst, src, map->value_size, true);
}
static inline void bpf_obj_swap_uptrs(const struct btf_record *rec, void *dst, void *src)
{
unsigned long *src_uptr, *dst_uptr;
const struct btf_field *field;
int i;
if (!btf_record_has_field(rec, BPF_UPTR))
return;
for (i = 0, field = rec->fields; i < rec->cnt; i++, field++) {
if (field->type != BPF_UPTR)
continue;
src_uptr = src + field->offset;
dst_uptr = dst + field->offset;
swap(*src_uptr, *dst_uptr);
}
}
static inline void bpf_obj_memzero(struct btf_record *rec, void *dst, u32 size)
{
u32 curr_off = 0;
@ -907,10 +932,6 @@ enum bpf_reg_type {
* additional context, assume the value is non-null.
*/
PTR_TO_BTF_ID,
/* PTR_TO_BTF_ID_OR_NULL points to a kernel struct that has not
* been checked for null. Used primarily to inform the verifier
* an explicit null check is required for this struct.
*/
PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_ARENA,
PTR_TO_BUF, /* reg points to a read/write buffer */
@ -923,6 +944,10 @@ enum bpf_reg_type {
PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | PTR_TO_SOCKET,
PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | PTR_TO_SOCK_COMMON,
PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | PTR_TO_TCP_SOCK,
/* PTR_TO_BTF_ID_OR_NULL points to a kernel struct that has not
* been checked for null. Used primarily to inform the verifier
* an explicit null check is required for this struct.
*/
PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | PTR_TO_BTF_ID,
/* This must be the last entry. Its purpose is to ensure the enum is
@ -1300,8 +1325,12 @@ void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);
bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr);
#ifdef CONFIG_BPF_JIT
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog);
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog);
struct bpf_trampoline *bpf_trampoline_get(u64 key,
struct bpf_attach_target_info *tgt_info);
void bpf_trampoline_put(struct bpf_trampoline *tr);
@ -1373,7 +1402,8 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func
void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
struct bpf_prog *to);
/* Called only from JIT-enabled code, so there's no need for stubs. */
void bpf_image_ksym_add(void *data, unsigned int size, struct bpf_ksym *ksym);
void bpf_image_ksym_init(void *data, unsigned int size, struct bpf_ksym *ksym);
void bpf_image_ksym_add(struct bpf_ksym *ksym);
void bpf_image_ksym_del(struct bpf_ksym *ksym);
void bpf_ksym_add(struct bpf_ksym *ksym);
void bpf_ksym_del(struct bpf_ksym *ksym);
@ -1382,12 +1412,14 @@ void bpf_jit_uncharge_modmem(u32 size);
bool bpf_prog_has_trampoline(const struct bpf_prog *prog);
#else
static inline int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr)
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
return -ENOTSUPP;
}
static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr)
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
return -ENOTSUPP;
}
@ -1476,6 +1508,7 @@ struct bpf_prog_aux {
u32 max_rdwr_access;
struct btf *attach_btf;
const struct bpf_ctx_arg_aux *ctx_arg_info;
void __percpu *priv_stack_ptr;
struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
struct bpf_prog *dst_prog;
struct bpf_trampoline *dst_trampoline;
@ -1491,7 +1524,13 @@ struct bpf_prog_aux {
bool xdp_has_frags;
bool exception_cb;
bool exception_boundary;
bool is_extended; /* true if extended by freplace program */
bool jits_use_priv_stack;
bool priv_stack_requested;
u64 prog_array_member_cnt; /* counts how many times as member of prog_array */
struct mutex ext_mutex; /* mutex for is_extended and prog_array_member_cnt */
struct bpf_arena *arena;
void (*recursion_detected)(struct bpf_prog *prog); /* callback if recursion is detected */
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */
@ -3461,4 +3500,10 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
return prog->aux->func_idx != 0;
}
static inline bool bpf_prog_is_raw_tp(const struct bpf_prog *prog)
{
return prog->type == BPF_PROG_TYPE_TRACING &&
prog->expected_attach_type == BPF_TRACE_RAW_TP;
}
#endif /* _LINUX_BPF_H */

View File

@ -77,7 +77,13 @@ struct bpf_local_storage_elem {
struct hlist_node map_node; /* Linked to bpf_local_storage_map */
struct hlist_node snode; /* Linked to bpf_local_storage */
struct bpf_local_storage __rcu *local_storage;
struct rcu_head rcu;
union {
struct rcu_head rcu;
struct hlist_node free_node; /* used to postpone
* bpf_selem_free
* after raw_spin_unlock
*/
};
/* 8 bytes hole */
/* The data is stored in another cacheline to minimize
* the number of cachelines access during a cache hit.
@ -181,7 +187,7 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
bool charge_mem, gfp_t gfp_flags);
bool charge_mem, bool swap_uptrs, gfp_t gfp_flags);
void bpf_selem_free(struct bpf_local_storage_elem *selem,
struct bpf_local_storage_map *smap,
@ -195,7 +201,7 @@ bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags, gfp_t gfp_flags);
void *value, u64 map_flags, bool swap_uptrs, gfp_t gfp_flags);
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map);

View File

@ -48,22 +48,6 @@ enum bpf_reg_liveness {
REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
};
/* For every reg representing a map value or allocated object pointer,
* we consider the tuple of (ptr, id) for them to be unique in verifier
* context and conside them to not alias each other for the purposes of
* tracking lock state.
*/
struct bpf_active_lock {
/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
* there's no active lock held, and other fields have no
* meaning. If non-NULL, it indicates that a lock is held and
* id member has the reg->id of the register which can be >= 0.
*/
void *ptr;
/* This will be reg->id */
u32 id;
};
#define ITER_PREFIX "bpf_iter_"
enum bpf_iter_state {
@ -266,6 +250,13 @@ struct bpf_stack_state {
};
struct bpf_reference_state {
/* Each reference object has a type. Ensure REF_TYPE_PTR is zero to
* default to pointer reference on zero initialization of a state.
*/
enum ref_state_type {
REF_TYPE_PTR = 0,
REF_TYPE_LOCK,
} type;
/* Track each reference created with a unique id, even if the same
* instruction creates the reference multiple times (eg, via CALL).
*/
@ -274,17 +265,10 @@ struct bpf_reference_state {
* is used purely to inform the user of a reference leak.
*/
int insn_idx;
/* There can be a case like:
* main (frame 0)
* cb (frame 1)
* func (frame 3)
* cb (frame 4)
* Hence for frame 4, if callback_ref just stored boolean, it would be
* impossible to distinguish nested callback refs. Hence store the
* frameno and compare that to callback_ref in check_reference_leak when
* exiting a callback function.
/* Use to keep track of the source object of a lock, to ensure
* it matches on unlock.
*/
int callback_ref;
void *ptr;
};
struct bpf_retval_range {
@ -332,6 +316,7 @@ struct bpf_func_state {
/* The following fields should be last. See copy_func_state() */
int acquired_refs;
int active_locks;
struct bpf_reference_state *refs;
/* The state of the stack. Each element of the array describes BPF_REG_SIZE
* (i.e. 8) bytes worth of stack memory.
@ -349,7 +334,7 @@ struct bpf_func_state {
#define MAX_CALL_FRAMES 8
/* instruction history flags, used in bpf_jmp_history_entry.flags field */
/* instruction history flags, used in bpf_insn_hist_entry.flags field */
enum {
/* instruction references stack slot through PTR_TO_STACK register;
* we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
@ -367,7 +352,7 @@ enum {
static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
struct bpf_jmp_history_entry {
struct bpf_insn_hist_entry {
u32 idx;
/* insn idx can't be bigger than 1 million */
u32 prev_idx : 22;
@ -434,7 +419,6 @@ struct bpf_verifier_state {
u32 insn_idx;
u32 curframe;
struct bpf_active_lock active_lock;
bool speculative;
bool active_rcu_lock;
u32 active_preempt_lock;
@ -458,13 +442,14 @@ struct bpf_verifier_state {
* See get_loop_entry() for more information.
*/
struct bpf_verifier_state *loop_entry;
/* jmp history recorded from first to last.
* backtracking is using it to go from last to first.
* For most states jmp_history_cnt is [0-3].
/* Sub-range of env->insn_hist[] corresponding to this state's
* instruction history.
* Backtracking is using it to go from last to first.
* For most states instruction history is short, 0-3 instructions.
* For loops can go up to ~40.
*/
struct bpf_jmp_history_entry *jmp_history;
u32 jmp_history_cnt;
u32 insn_hist_start;
u32 insn_hist_end;
u32 dfs_depth;
u32 callback_unroll_depth;
u32 may_goto_depth;
@ -649,6 +634,12 @@ struct bpf_subprog_arg_info {
};
};
enum priv_stack_mode {
PRIV_STACK_UNKNOWN,
NO_PRIV_STACK,
PRIV_STACK_ADAPTIVE,
};
struct bpf_subprog_info {
/* 'start' has to be the first field otherwise find_subprog() won't work */
u32 start; /* insn idx of function entry point */
@ -669,6 +660,7 @@ struct bpf_subprog_info {
/* true if bpf_fastcall stack region is used by functions that can't be inlined */
bool keep_fastcall_stack: 1;
enum priv_stack_mode priv_stack_mode;
u8 arg_cnt;
struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
};
@ -747,7 +739,9 @@ struct bpf_verifier_env {
int cur_stack;
} cfg;
struct backtrack_state bt;
struct bpf_jmp_history_entry *cur_hist_ent;
struct bpf_insn_hist_entry *insn_hist;
struct bpf_insn_hist_entry *cur_hist_ent;
u32 insn_hist_cap;
u32 pass_cnt; /* number of times do_check() was called */
u32 subprog_cnt;
/* number of instructions analyzed by the verifier */
@ -888,6 +882,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
case BPF_PROG_TYPE_TRACING:
return prog->expected_attach_type != BPF_TRACE_ITER;
case BPF_PROG_TYPE_STRUCT_OPS:
return prog->aux->jits_use_priv_stack;
case BPF_PROG_TYPE_LSM:
return false;
default:

View File

@ -75,6 +75,7 @@
#define KF_ITER_NEXT (1 << 9) /* kfunc implements BPF iter next method */
#define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */
#define KF_FASTCALL (1 << 12) /* kfunc supports bpf_fastcall protocol */
/*
* Tag marking a kernel function as a kfunc. This is meant to minimize the
@ -581,6 +582,16 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
bool btf_types_are_same(const struct btf *btf1, u32 id1,
const struct btf *btf2, u32 id2);
int btf_check_iter_arg(struct btf *btf, const struct btf_type *func, int arg_idx);
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
{
if (!btf_type_is_ptr(t))
return false;
t = btf_type_skip_modifiers(btf, t->type, NULL);
return btf_type_is_struct(t);
}
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)
@ -660,15 +671,4 @@ static inline int btf_check_iter_arg(struct btf *btf, const struct btf_type *fun
return -EOPNOTSUPP;
}
#endif
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
{
if (!btf_type_is_ptr(t))
return false;
t = btf_type_skip_modifiers(btf, t->type, NULL);
return btf_type_is_struct(t);
}
#endif

View File

@ -283,5 +283,6 @@ extern u32 btf_tracing_ids[];
extern u32 bpf_cgroup_btf_id[];
extern u32 bpf_local_storage_map_btf_id[];
extern u32 btf_bpf_map_id[];
extern u32 bpf_kmem_cache_btf_id[];
#endif

View File

@ -1119,6 +1119,7 @@ bool bpf_jit_supports_exceptions(void);
bool bpf_jit_supports_ptr_xchg(void);
bool bpf_jit_supports_arena(void);
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena);
bool bpf_jit_supports_private_stack(void);
u64 bpf_arch_uaddress_limit(void);
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
bool bpf_helper_changes_pkt_data(void *func);

View File

@ -1116,6 +1116,7 @@ enum bpf_attach_type {
BPF_NETKIT_PRIMARY,
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
__MAX_BPF_ATTACH_TYPE
};
@ -1973,6 +1974,8 @@ union bpf_attr {
* program.
* Return
* The SMP id of the processor running the program.
* Attributes
* __bpf_fastcall
*
* long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
* Description
@ -3104,10 +3107,6 @@ union bpf_attr {
* with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration
* option, and in this case it only works on functions tagged with
* **ALLOW_ERROR_INJECTION** in the kernel code.
*
* Also, the helper is only available for the architectures having
* the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
* x86 architecture is the only one to support this feature.
* Return
* 0
*
@ -5372,7 +5371,7 @@ union bpf_attr {
* Currently, the **flags** must be 0. Currently, nr_loops is
* limited to 1 << 23 (~8 million) loops.
*
* long (\*callback_fn)(u32 index, void \*ctx);
* long (\*callback_fn)(u64 index, void \*ctx);
*
* where **index** is the current index in the loop. The index
* is zero-indexed.

View File

@ -16,7 +16,7 @@ obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
obj-$(CONFIG_BPF_SYSCALL) += arena.o
obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
endif
obj-$(CONFIG_BPF_JIT) += dispatcher.o
ifeq ($(CONFIG_NET),y)
@ -52,3 +52,4 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o

View File

@ -3,9 +3,11 @@
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/err.h>
#include "linux/filter.h"
#include <linux/btf_ids.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include "range_tree.h"
/*
* bpf_arena is a sparsely populated shared memory region between bpf program and
@ -45,7 +47,7 @@ struct bpf_arena {
u64 user_vm_start;
u64 user_vm_end;
struct vm_struct *kern_vm;
struct maple_tree mt;
struct range_tree rt;
struct list_head vma_list;
struct mutex lock;
};
@ -98,6 +100,9 @@ static struct bpf_map *arena_map_alloc(union bpf_attr *attr)
u64 vm_range;
int err = -ENOMEM;
if (!bpf_jit_supports_arena())
return ERR_PTR(-EOPNOTSUPP);
if (attr->key_size || attr->value_size || attr->max_entries == 0 ||
/* BPF_F_MMAPABLE must be set */
!(attr->map_flags & BPF_F_MMAPABLE) ||
@ -132,7 +137,8 @@ static struct bpf_map *arena_map_alloc(union bpf_attr *attr)
INIT_LIST_HEAD(&arena->vma_list);
bpf_map_init_from_attr(&arena->map, attr);
mt_init_flags(&arena->mt, MT_FLAGS_ALLOC_RANGE);
range_tree_init(&arena->rt);
range_tree_set(&arena->rt, 0, attr->max_entries);
mutex_init(&arena->lock);
return &arena->map;
@ -183,7 +189,7 @@ static void arena_map_free(struct bpf_map *map)
apply_to_existing_page_range(&init_mm, bpf_arena_get_kern_vm_start(arena),
KERN_VM_SZ - GUARD_SZ, existing_page_cb, NULL);
free_vm_area(arena->kern_vm);
mtree_destroy(&arena->mt);
range_tree_destroy(&arena->rt);
bpf_map_area_free(arena);
}
@ -274,20 +280,20 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
/* User space requested to segfault when page is not allocated by bpf prog */
return VM_FAULT_SIGSEGV;
ret = mtree_insert(&arena->mt, vmf->pgoff, MT_ENTRY, GFP_KERNEL);
ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
if (ret)
return VM_FAULT_SIGSEGV;
/* Account into memcg of the process that created bpf_arena */
ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page);
if (ret) {
mtree_erase(&arena->mt, vmf->pgoff);
range_tree_set(&arena->rt, vmf->pgoff, 1);
return VM_FAULT_SIGSEGV;
}
ret = vm_area_map_pages(arena->kern_vm, kaddr, kaddr + PAGE_SIZE, &page);
if (ret) {
mtree_erase(&arena->mt, vmf->pgoff);
range_tree_set(&arena->rt, vmf->pgoff, 1);
__free_page(page);
return VM_FAULT_SIGSEGV;
}
@ -444,12 +450,16 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
guard(mutex)(&arena->lock);
if (uaddr)
ret = mtree_insert_range(&arena->mt, pgoff, pgoff + page_cnt - 1,
MT_ENTRY, GFP_KERNEL);
else
ret = mtree_alloc_range(&arena->mt, &pgoff, MT_ENTRY,
page_cnt, 0, page_cnt_max - 1, GFP_KERNEL);
if (uaddr) {
ret = is_range_tree_set(&arena->rt, pgoff, page_cnt);
if (ret)
goto out_free_pages;
ret = range_tree_clear(&arena->rt, pgoff, page_cnt);
} else {
ret = pgoff = range_tree_find(&arena->rt, page_cnt);
if (pgoff >= 0)
ret = range_tree_clear(&arena->rt, pgoff, page_cnt);
}
if (ret)
goto out_free_pages;
@ -476,7 +486,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
kvfree(pages);
return clear_lo32(arena->user_vm_start) + uaddr32;
out:
mtree_erase(&arena->mt, pgoff);
range_tree_set(&arena->rt, pgoff, page_cnt);
out_free_pages:
kvfree(pages);
return 0;
@ -516,7 +526,7 @@ static void arena_free_pages(struct bpf_arena *arena, long uaddr, long page_cnt)
pgoff = compute_pgoff(arena, uaddr);
/* clear range */
mtree_store_range(&arena->mt, pgoff, pgoff + page_cnt - 1, NULL, GFP_KERNEL);
range_tree_set(&arena->rt, pgoff, page_cnt);
if (page_cnt > 1)
/* bulk zap if multiple pages being freed */

View File

@ -947,22 +947,44 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
struct file *map_file, int fd)
{
struct bpf_prog *prog = bpf_prog_get(fd);
bool is_extended;
if (IS_ERR(prog))
return prog;
if (!bpf_prog_map_compatible(map, prog)) {
if (prog->type == BPF_PROG_TYPE_EXT ||
!bpf_prog_map_compatible(map, prog)) {
bpf_prog_put(prog);
return ERR_PTR(-EINVAL);
}
mutex_lock(&prog->aux->ext_mutex);
is_extended = prog->aux->is_extended;
if (!is_extended)
prog->aux->prog_array_member_cnt++;
mutex_unlock(&prog->aux->ext_mutex);
if (is_extended) {
/* Extended prog can not be tail callee. It's to prevent a
* potential infinite loop like:
* tail callee prog entry -> tail callee prog subprog ->
* freplace prog entry --tailcall-> tail callee prog entry.
*/
bpf_prog_put(prog);
return ERR_PTR(-EBUSY);
}
return prog;
}
static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
{
struct bpf_prog *prog = ptr;
mutex_lock(&prog->aux->ext_mutex);
prog->aux->prog_array_member_cnt--;
mutex_unlock(&prog->aux->ext_mutex);
/* bpf_prog is freed after one RCU or tasks trace grace period */
bpf_prog_put(ptr);
bpf_prog_put(prog);
}
static u32 prog_fd_array_sys_lookup_elem(void *ptr)

View File

@ -107,7 +107,7 @@ static long bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
bpf_cgrp_storage_lock();
sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
value, map_flags, GFP_ATOMIC);
value, map_flags, false, GFP_ATOMIC);
bpf_cgrp_storage_unlock();
cgroup_put(cgroup);
return PTR_ERR_OR_ZERO(sdata);
@ -181,7 +181,7 @@ BPF_CALL_5(bpf_cgrp_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
if (!percpu_ref_is_dying(&cgroup->self.refcnt) &&
(flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
value, BPF_NOEXIST, gfp_flags);
value, BPF_NOEXIST, false, gfp_flags);
unlock:
bpf_cgrp_storage_unlock();

View File

@ -99,7 +99,7 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
sdata = bpf_local_storage_update(file_inode(fd_file(f)),
(struct bpf_local_storage_map *)map,
value, map_flags, GFP_ATOMIC);
value, map_flags, false, GFP_ATOMIC);
return PTR_ERR_OR_ZERO(sdata);
}
@ -153,7 +153,7 @@ BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
sdata = bpf_local_storage_update(
inode, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST, gfp_flags);
BPF_NOEXIST, false, gfp_flags);
return IS_ERR(sdata) ? (unsigned long)NULL :
(unsigned long)sdata->data;
}

View File

@ -73,7 +73,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
void *value, bool charge_mem, gfp_t gfp_flags)
void *value, bool charge_mem, bool swap_uptrs, gfp_t gfp_flags)
{
struct bpf_local_storage_elem *selem;
@ -99,9 +99,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
}
if (selem) {
if (value)
if (value) {
/* No need to call check_and_init_map_value as memory is zero init */
copy_map_value(&smap->map, SDATA(selem)->data, value);
/* No need to call check_and_init_map_value as memory is zero init */
if (swap_uptrs)
bpf_obj_swap_uptrs(smap->map.record, SDATA(selem)->data, value);
}
return selem;
}
@ -209,8 +212,12 @@ static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
static void bpf_selem_free_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage_elem *selem;
struct bpf_local_storage_map *smap;
selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
/* The bpf_local_storage_map_free will wait for rcu_barrier */
smap = rcu_dereference_check(SDATA(selem)->smap, 1);
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
bpf_mem_cache_raw_free(selem);
}
@ -226,16 +233,25 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
struct bpf_local_storage_map *smap,
bool reuse_now)
{
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
if (!smap->bpf_ma) {
/* Only task storage has uptrs and task storage
* has moved to bpf_mem_alloc. Meaning smap->bpf_ma == true
* for task storage, so this bpf_obj_free_fields() won't unpin
* any uptr.
*/
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
__bpf_selem_free(selem, reuse_now);
return;
}
if (!reuse_now) {
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
} else {
if (reuse_now) {
/* reuse_now == true only happens when the storage owner
* (e.g. task_struct) is being destructed or the map itself
* is being destructed (ie map_free). In both cases,
* no bpf prog can have a hold on the selem. It is
* safe to unpin the uptrs and free the selem now.
*/
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
/* Instead of using the vanilla call_rcu(),
* bpf_mem_cache_free will be able to reuse selem
* immediately.
@ -243,6 +259,26 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
migrate_disable();
bpf_mem_cache_free(&smap->selem_ma, selem);
migrate_enable();
return;
}
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
}
static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
{
struct bpf_local_storage_elem *selem;
struct bpf_local_storage_map *smap;
struct hlist_node *n;
/* The "_safe" iteration is needed.
* The loop is not removing the selem from the list
* but bpf_selem_free will use the selem->rcu_head
* which is union-ized with the selem->free_node.
*/
hlist_for_each_entry_safe(selem, n, list, free_node) {
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
bpf_selem_free(selem, smap, reuse_now);
}
}
@ -252,7 +288,7 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
*/
static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem,
bool uncharge_mem, bool reuse_now)
bool uncharge_mem, struct hlist_head *free_selem_list)
{
struct bpf_local_storage_map *smap;
bool free_local_storage;
@ -296,7 +332,7 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
SDATA(selem))
RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
bpf_selem_free(selem, smap, reuse_now);
hlist_add_head(&selem->free_node, free_selem_list);
if (rcu_access_pointer(local_storage->smap) == smap)
RCU_INIT_POINTER(local_storage->smap, NULL);
@ -345,6 +381,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage *local_storage;
bool bpf_ma, free_local_storage = false;
HLIST_HEAD(selem_free_list);
unsigned long flags;
if (unlikely(!selem_linked_to_storage_lockless(selem)))
@ -360,9 +397,11 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
free_local_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, true, reuse_now);
local_storage, selem, true, &selem_free_list);
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&selem_free_list, reuse_now);
if (free_local_storage)
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
}
@ -524,11 +563,12 @@ uncharge:
*/
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags, gfp_t gfp_flags)
void *value, u64 map_flags, bool swap_uptrs, gfp_t gfp_flags)
{
struct bpf_local_storage_data *old_sdata = NULL;
struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
struct bpf_local_storage *local_storage;
HLIST_HEAD(old_selem_free_list);
unsigned long flags;
int err;
@ -550,7 +590,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (err)
return ERR_PTR(err);
selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
if (!selem)
return ERR_PTR(-ENOMEM);
@ -584,7 +624,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
/* A lookup has just been done before and concluded a new selem is
* needed. The chance of an unnecessary alloc is unlikely.
*/
alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
if (!alloc_selem)
return ERR_PTR(-ENOMEM);
@ -624,11 +664,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (old_sdata) {
bpf_selem_unlink_map(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
true, false);
true, &old_selem_free_list);
}
unlock:
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&old_selem_free_list, false);
if (alloc_selem) {
mem_uncharge(smap, owner, smap->elem_size);
bpf_selem_free(alloc_selem, smap, true);
@ -706,6 +747,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage_elem *selem;
bool bpf_ma, free_storage = false;
HLIST_HEAD(free_selem_list);
struct hlist_node *n;
unsigned long flags;
@ -734,10 +776,12 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
* of the loop will set the free_cgroup_storage to true.
*/
free_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, true, true);
local_storage, selem, true, &free_selem_list);
}
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&free_selem_list, true);
if (free_storage)
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, true);
}
@ -883,6 +927,9 @@ void bpf_local_storage_map_free(struct bpf_map *map,
synchronize_rcu();
if (smap->bpf_ma) {
rcu_barrier_tasks_trace();
if (!rcu_trace_implies_rcu_gp())
rcu_barrier();
bpf_mem_alloc_destroy(&smap->selem_ma);
bpf_mem_alloc_destroy(&smap->storage_ma);
}

View File

@ -23,7 +23,6 @@ struct bpf_struct_ops_value {
struct bpf_struct_ops_map {
struct bpf_map map;
struct rcu_head rcu;
const struct bpf_struct_ops_desc *st_ops_desc;
/* protect map_update */
struct mutex lock;
@ -32,7 +31,9 @@ struct bpf_struct_ops_map {
* (in kvalue.data).
*/
struct bpf_link **links;
u32 links_cnt;
/* ksyms for bpf trampolines */
struct bpf_ksym **ksyms;
u32 funcs_cnt;
u32 image_pages_cnt;
/* image_pages is an array of pages that has all the trampolines
* that stores the func args before calling the bpf_prog.
@ -481,11 +482,11 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
{
u32 i;
for (i = 0; i < st_map->links_cnt; i++) {
if (st_map->links[i]) {
bpf_link_put(st_map->links[i]);
st_map->links[i] = NULL;
}
for (i = 0; i < st_map->funcs_cnt; i++) {
if (!st_map->links[i])
break;
bpf_link_put(st_map->links[i]);
st_map->links[i] = NULL;
}
}
@ -586,6 +587,49 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
return 0;
}
static void bpf_struct_ops_ksym_init(const char *tname, const char *mname,
void *image, unsigned int size,
struct bpf_ksym *ksym)
{
snprintf(ksym->name, KSYM_NAME_LEN, "bpf__%s_%s", tname, mname);
INIT_LIST_HEAD_RCU(&ksym->lnode);
bpf_image_ksym_init(image, size, ksym);
}
static void bpf_struct_ops_map_add_ksyms(struct bpf_struct_ops_map *st_map)
{
u32 i;
for (i = 0; i < st_map->funcs_cnt; i++) {
if (!st_map->ksyms[i])
break;
bpf_image_ksym_add(st_map->ksyms[i]);
}
}
static void bpf_struct_ops_map_del_ksyms(struct bpf_struct_ops_map *st_map)
{
u32 i;
for (i = 0; i < st_map->funcs_cnt; i++) {
if (!st_map->ksyms[i])
break;
bpf_image_ksym_del(st_map->ksyms[i]);
}
}
static void bpf_struct_ops_map_free_ksyms(struct bpf_struct_ops_map *st_map)
{
u32 i;
for (i = 0; i < st_map->funcs_cnt; i++) {
if (!st_map->ksyms[i])
break;
kfree(st_map->ksyms[i]);
st_map->ksyms[i] = NULL;
}
}
static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
{
@ -601,6 +645,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
int prog_fd, err;
u32 i, trampoline_start, image_off = 0;
void *cur_image = NULL, *image = NULL;
struct bpf_link **plink;
struct bpf_ksym **pksym;
const char *tname, *mname;
if (flags)
return -EINVAL;
@ -639,14 +686,19 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
udata = &uvalue->data;
kdata = &kvalue->data;
plink = st_map->links;
pksym = st_map->ksyms;
tname = btf_name_by_offset(st_map->btf, t->name_off);
module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]);
for_each_member(i, t, member) {
const struct btf_type *mtype, *ptype;
struct bpf_prog *prog;
struct bpf_tramp_link *link;
struct bpf_ksym *ksym;
u32 moff;
moff = __btf_member_bit_offset(t, member) / 8;
mname = btf_name_by_offset(st_map->btf, member->name_off);
ptype = btf_type_resolve_ptr(st_map->btf, member->type, NULL);
if (ptype == module_type) {
if (*(void **)(udata + moff))
@ -714,7 +766,14 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
}
bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS,
&bpf_struct_ops_link_lops, prog);
st_map->links[i] = &link->link;
*plink++ = &link->link;
ksym = kzalloc(sizeof(*ksym), GFP_USER);
if (!ksym) {
err = -ENOMEM;
goto reset_unlock;
}
*pksym++ = ksym;
trampoline_start = image_off;
err = bpf_struct_ops_prepare_trampoline(tlinks, link,
@ -735,6 +794,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
/* put prog_id to udata */
*(unsigned long *)(udata + moff) = prog->aux->id;
/* init ksym for this trampoline */
bpf_struct_ops_ksym_init(tname, mname,
image + trampoline_start,
image_off - trampoline_start,
ksym);
}
if (st_ops->validate) {
@ -783,6 +848,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
*/
reset_unlock:
bpf_struct_ops_map_free_ksyms(st_map);
bpf_struct_ops_map_free_image(st_map);
bpf_struct_ops_map_put_progs(st_map);
memset(uvalue, 0, map->value_size);
@ -790,6 +856,8 @@ reset_unlock:
unlock:
kfree(tlinks);
mutex_unlock(&st_map->lock);
if (!err)
bpf_struct_ops_map_add_ksyms(st_map);
return err;
}
@ -849,7 +917,10 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map)
if (st_map->links)
bpf_struct_ops_map_put_progs(st_map);
if (st_map->ksyms)
bpf_struct_ops_map_free_ksyms(st_map);
bpf_map_area_free(st_map->links);
bpf_map_area_free(st_map->ksyms);
bpf_struct_ops_map_free_image(st_map);
bpf_map_area_free(st_map->uvalue);
bpf_map_area_free(st_map);
@ -866,6 +937,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
if (btf_is_module(st_map->btf))
module_put(st_map->st_ops_desc->st_ops->owner);
bpf_struct_ops_map_del_ksyms(st_map);
/* The struct_ops's function may switch to another struct_ops.
*
* For example, bpf_tcp_cc_x->init() may switch to
@ -895,6 +968,19 @@ static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
return 0;
}
static u32 count_func_ptrs(const struct btf *btf, const struct btf_type *t)
{
int i;
u32 count;
const struct btf_member *member;
count = 0;
for_each_member(i, t, member)
if (btf_type_resolve_func_ptr(btf, member->type, NULL))
count++;
return count;
}
static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
{
const struct bpf_struct_ops_desc *st_ops_desc;
@ -961,11 +1047,15 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
map = &st_map->map;
st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
st_map->links_cnt = btf_type_vlen(t);
st_map->funcs_cnt = count_func_ptrs(btf, t);
st_map->links =
bpf_map_area_alloc(st_map->links_cnt * sizeof(struct bpf_links *),
bpf_map_area_alloc(st_map->funcs_cnt * sizeof(struct bpf_link *),
NUMA_NO_NODE);
if (!st_map->uvalue || !st_map->links) {
st_map->ksyms =
bpf_map_area_alloc(st_map->funcs_cnt * sizeof(struct bpf_ksym *),
NUMA_NO_NODE);
if (!st_map->uvalue || !st_map->links || !st_map->ksyms) {
ret = -ENOMEM;
goto errout_free;
}
@ -994,7 +1084,8 @@ static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map)
usage = sizeof(*st_map) +
vt->size - sizeof(struct bpf_struct_ops_value);
usage += vt->size;
usage += btf_type_vlen(vt) * sizeof(struct bpf_links *);
usage += st_map->funcs_cnt * sizeof(struct bpf_link *);
usage += st_map->funcs_cnt * sizeof(struct bpf_ksym *);
usage += PAGE_SIZE;
return usage;
}

View File

@ -128,6 +128,9 @@ static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
struct pid *pid;
int fd, err;
if ((map_flags & BPF_F_LOCK) && btf_record_has_field(map->record, BPF_UPTR))
return -EOPNOTSUPP;
fd = *(int *)key;
pid = pidfd_get_pid(fd, &f_flags);
if (IS_ERR(pid))
@ -146,7 +149,7 @@ static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
bpf_task_storage_lock();
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value, map_flags,
GFP_ATOMIC);
true, GFP_ATOMIC);
bpf_task_storage_unlock();
err = PTR_ERR_OR_ZERO(sdata);
@ -218,7 +221,7 @@ static void *__bpf_task_storage_get(struct bpf_map *map,
(flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy) {
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST, gfp_flags);
BPF_NOEXIST, false, gfp_flags);
return IS_ERR(sdata) ? NULL : sdata->data;
}

View File

@ -2808,7 +2808,7 @@ static void btf_ref_type_log(struct btf_verifier_env *env,
btf_verifier_log(env, "type_id=%u", t->type);
}
static struct btf_kind_operations modifier_ops = {
static const struct btf_kind_operations modifier_ops = {
.check_meta = btf_ref_type_check_meta,
.resolve = btf_modifier_resolve,
.check_member = btf_modifier_check_member,
@ -2817,7 +2817,7 @@ static struct btf_kind_operations modifier_ops = {
.show = btf_modifier_show,
};
static struct btf_kind_operations ptr_ops = {
static const struct btf_kind_operations ptr_ops = {
.check_meta = btf_ref_type_check_meta,
.resolve = btf_ptr_resolve,
.check_member = btf_ptr_check_member,
@ -2858,7 +2858,7 @@ static void btf_fwd_type_log(struct btf_verifier_env *env,
btf_verifier_log(env, "%s", btf_type_kflag(t) ? "union" : "struct");
}
static struct btf_kind_operations fwd_ops = {
static const struct btf_kind_operations fwd_ops = {
.check_meta = btf_fwd_check_meta,
.resolve = btf_df_resolve,
.check_member = btf_df_check_member,
@ -3109,7 +3109,7 @@ static void btf_array_show(const struct btf *btf, const struct btf_type *t,
__btf_array_show(btf, t, type_id, data, bits_offset, show);
}
static struct btf_kind_operations array_ops = {
static const struct btf_kind_operations array_ops = {
.check_meta = btf_array_check_meta,
.resolve = btf_array_resolve,
.check_member = btf_array_check_member,
@ -3334,7 +3334,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
}
static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
u32 off, int sz, struct btf_field_info *info)
u32 off, int sz, struct btf_field_info *info, u32 field_mask)
{
enum btf_field_type type;
u32 res_id;
@ -3358,9 +3358,14 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
type = BPF_KPTR_REF;
else if (!strcmp("percpu_kptr", __btf_name_by_offset(btf, t->name_off)))
type = BPF_KPTR_PERCPU;
else if (!strcmp("uptr", __btf_name_by_offset(btf, t->name_off)))
type = BPF_UPTR;
else
return -EINVAL;
if (!(type & field_mask))
return BTF_FIELD_IGNORE;
/* Get the base type */
t = btf_type_skip_modifiers(btf, t->type, &res_id);
/* Only pointer to struct is allowed */
@ -3502,7 +3507,7 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_
field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");
/* Only return BPF_KPTR when all other types with matchable names fail */
if (field_mask & BPF_KPTR && !__btf_type_is_struct(var_type)) {
if (field_mask & (BPF_KPTR | BPF_UPTR) && !__btf_type_is_struct(var_type)) {
type = BPF_KPTR_REF;
goto end;
}
@ -3535,6 +3540,7 @@ static int btf_repeat_fields(struct btf_field_info *info, int info_cnt,
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
case BPF_LIST_HEAD:
case BPF_RB_ROOT:
break;
@ -3667,8 +3673,9 @@ static int btf_find_field_one(const struct btf *btf,
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
ret = btf_find_kptr(btf, var_type, off, sz,
info_cnt ? &info[0] : &tmp);
info_cnt ? &info[0] : &tmp, field_mask);
if (ret < 0)
return ret;
break;
@ -3991,6 +3998,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
if (ret < 0)
goto end;
@ -4050,12 +4058,28 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
* Hence we only need to ensure that bpf_{list_head,rb_root} ownership
* does not form cycles.
*/
if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_GRAPH_ROOT))
if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & (BPF_GRAPH_ROOT | BPF_UPTR)))
return 0;
for (i = 0; i < rec->cnt; i++) {
struct btf_struct_meta *meta;
const struct btf_type *t;
u32 btf_id;
if (rec->fields[i].type == BPF_UPTR) {
/* The uptr only supports pinning one page and cannot
* point to a kernel struct
*/
if (btf_is_kernel(rec->fields[i].kptr.btf))
return -EINVAL;
t = btf_type_by_id(rec->fields[i].kptr.btf,
rec->fields[i].kptr.btf_id);
if (!t->size)
return -EINVAL;
if (t->size > PAGE_SIZE)
return -E2BIG;
continue;
}
if (!(rec->fields[i].type & BPF_GRAPH_ROOT))
continue;
btf_id = rec->fields[i].graph_root.value_btf_id;
@ -4191,7 +4215,7 @@ static void btf_struct_show(const struct btf *btf, const struct btf_type *t,
__btf_struct_show(btf, t, type_id, data, bits_offset, show);
}
static struct btf_kind_operations struct_ops = {
static const struct btf_kind_operations struct_ops = {
.check_meta = btf_struct_check_meta,
.resolve = btf_struct_resolve,
.check_member = btf_struct_check_member,
@ -4359,7 +4383,7 @@ static void btf_enum_show(const struct btf *btf, const struct btf_type *t,
btf_show_end_type(show);
}
static struct btf_kind_operations enum_ops = {
static const struct btf_kind_operations enum_ops = {
.check_meta = btf_enum_check_meta,
.resolve = btf_df_resolve,
.check_member = btf_enum_check_member,
@ -4462,7 +4486,7 @@ static void btf_enum64_show(const struct btf *btf, const struct btf_type *t,
btf_show_end_type(show);
}
static struct btf_kind_operations enum64_ops = {
static const struct btf_kind_operations enum64_ops = {
.check_meta = btf_enum64_check_meta,
.resolve = btf_df_resolve,
.check_member = btf_enum_check_member,
@ -4540,7 +4564,7 @@ done:
btf_verifier_log(env, ")");
}
static struct btf_kind_operations func_proto_ops = {
static const struct btf_kind_operations func_proto_ops = {
.check_meta = btf_func_proto_check_meta,
.resolve = btf_df_resolve,
/*
@ -4598,7 +4622,7 @@ static int btf_func_resolve(struct btf_verifier_env *env,
return 0;
}
static struct btf_kind_operations func_ops = {
static const struct btf_kind_operations func_ops = {
.check_meta = btf_func_check_meta,
.resolve = btf_func_resolve,
.check_member = btf_df_check_member,
@ -5566,7 +5590,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
goto free_aof;
}
ret = btf_find_kptr(btf, t, 0, 0, &tmp);
ret = btf_find_kptr(btf, t, 0, 0, &tmp, BPF_KPTR);
if (ret != BTF_FIELD_FOUND)
continue;
@ -6564,7 +6588,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
if (prog_args_trusted(prog))
info->reg_type |= PTR_TRUSTED;
if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
/* Raw tracepoint arguments always get marked as maybe NULL */
if (bpf_prog_is_raw_tp(prog))
info->reg_type |= PTR_MAYBE_NULL;
else if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
info->reg_type |= PTR_MAYBE_NULL;
if (tgt_prog) {

View File

@ -131,6 +131,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
INIT_LIST_HEAD_RCU(&fp->aux->ksym_prefix.lnode);
#endif
mutex_init(&fp->aux->used_maps_mutex);
mutex_init(&fp->aux->ext_mutex);
mutex_init(&fp->aux->dst_mutex);
return fp;
@ -3044,6 +3045,11 @@ bool __weak bpf_jit_supports_exceptions(void)
return false;
}
bool __weak bpf_jit_supports_private_stack(void)
{
return false;
}
void __weak arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
{
}

View File

@ -154,7 +154,8 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
d->image = NULL;
goto out;
}
bpf_image_ksym_add(d->image, PAGE_SIZE, &d->ksym);
bpf_image_ksym_init(d->image, PAGE_SIZE, &d->ksym);
bpf_image_ksym_add(&d->ksym);
}
prev_num_progs = d->num_progs;

View File

@ -896,9 +896,12 @@ find_first_elem:
static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
{
check_and_free_fields(htab, l);
migrate_disable();
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
bpf_mem_cache_free(&htab->ma, l);
migrate_enable();
}
static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
@ -948,7 +951,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
if (htab_is_prealloc(htab)) {
bpf_map_dec_elem_count(&htab->map);
check_and_free_fields(htab, l);
__pcpu_freelist_push(&htab->freelist, &l->fnode);
pcpu_freelist_push(&htab->freelist, &l->fnode);
} else {
dec_elem_count(htab);
htab_elem_free(htab, l);
@ -1018,7 +1021,6 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
*/
pl_new = this_cpu_ptr(htab->extra_elems);
l_new = *pl_new;
htab_put_fd_value(htab, old_elem);
*pl_new = old_elem;
} else {
struct pcpu_freelist_node *l;
@ -1105,6 +1107,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
struct htab_elem *l_new = NULL, *l_old;
struct hlist_nulls_head *head;
unsigned long flags;
void *old_map_ptr;
struct bucket *b;
u32 key_size, hash;
int ret;
@ -1183,12 +1186,27 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
hlist_nulls_add_head_rcu(&l_new->hash_node, head);
if (l_old) {
hlist_nulls_del_rcu(&l_old->hash_node);
/* l_old has already been stashed in htab->extra_elems, free
* its special fields before it is available for reuse. Also
* save the old map pointer in htab of maps before unlock
* and release it after unlock.
*/
old_map_ptr = NULL;
if (htab_is_prealloc(htab)) {
if (map->ops->map_fd_put_ptr)
old_map_ptr = fd_htab_map_get_ptr(map, l_old);
check_and_free_fields(htab, l_old);
}
}
htab_unlock_bucket(htab, b, hash, flags);
if (l_old) {
if (old_map_ptr)
map->ops->map_fd_put_ptr(map, old_map_ptr, true);
if (!htab_is_prealloc(htab))
free_htab_elem(htab, l_old);
else
check_and_free_fields(htab, l_old);
}
ret = 0;
return 0;
err:
htab_unlock_bucket(htab, b, hash, flags);
return ret;
@ -1432,15 +1450,15 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
return ret;
l = lookup_elem_raw(head, hash, key, key_size);
if (l) {
if (l)
hlist_nulls_del_rcu(&l->hash_node);
free_htab_elem(htab, l);
} else {
else
ret = -ENOENT;
}
htab_unlock_bucket(htab, b, hash, flags);
if (l)
free_htab_elem(htab, l);
return ret;
}
@ -1853,13 +1871,14 @@ again_nocopy:
* may cause deadlock. See comments in function
* prealloc_lru_pop(). Let us do bpf_lru_push_free()
* after releasing the bucket lock.
*
* For htab of maps, htab_put_fd_value() in
* free_htab_elem() may acquire a spinlock with bucket
* lock being held and it violates the lock rule, so
* invoke free_htab_elem() after unlock as well.
*/
if (is_lru_map) {
l->batch_flink = node_to_free;
node_to_free = l;
} else {
free_htab_elem(htab, l);
}
l->batch_flink = node_to_free;
node_to_free = l;
}
dst_key += key_size;
dst_val += value_size;
@ -1871,7 +1890,10 @@ again_nocopy:
while (node_to_free) {
l = node_to_free;
node_to_free = node_to_free->batch_flink;
htab_lru_push_free(htab, l);
if (is_lru_map)
htab_lru_push_free(htab, l);
else
free_htab_elem(htab, l);
}
next_batch:

View File

@ -2521,6 +2521,25 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
return p;
}
/**
* bpf_task_from_vpid - Find a struct task_struct from its vpid by looking it up
* in the pid namespace of the current task. If a task is returned, it must
* either be stored in a map, or released with bpf_task_release().
* @vpid: The vpid of the task being looked up.
*/
__bpf_kfunc struct task_struct *bpf_task_from_vpid(s32 vpid)
{
struct task_struct *p;
rcu_read_lock();
p = find_task_by_vpid(vpid);
if (p)
p = bpf_task_acquire(p);
rcu_read_unlock();
return p;
}
/**
* bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
* @p: The dynptr whose data slice to retrieve
@ -3068,7 +3087,9 @@ BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_from_vpid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
BTF_ID_FLAGS(func, bpf_send_signal_task, KF_TRUSTED_ARGS)
BTF_KFUNCS_END(generic_btf_ids)
static const struct btf_kfunc_id_set generic_kfunc_set = {
@ -3086,8 +3107,8 @@ BTF_ID(func, bpf_cgroup_release_dtor)
#endif
BTF_KFUNCS_START(common_btf_ids)
BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx)
BTF_ID_FLAGS(func, bpf_rdonly_cast)
BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx, KF_FASTCALL)
BTF_ID_FLAGS(func, bpf_rdonly_cast, KF_FASTCALL)
BTF_ID_FLAGS(func, bpf_rcu_read_lock)
BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
BTF_ID_FLAGS(func, bpf_dynptr_slice, KF_RET_NULL)
@ -3124,6 +3145,10 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_get_kmem_cache)
BTF_ID_FLAGS(func, bpf_iter_kmem_cache_new, KF_ITER_NEW | KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_iter_kmem_cache_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {

View File

@ -0,0 +1,238 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2024 Google */
#include <linux/bpf.h>
#include <linux/btf_ids.h>
#include <linux/slab.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include "../../mm/slab.h" /* kmem_cache, slab_caches and slab_mutex */
/* open-coded version */
struct bpf_iter_kmem_cache {
__u64 __opaque[1];
} __attribute__((aligned(8)));
struct bpf_iter_kmem_cache_kern {
struct kmem_cache *pos;
} __attribute__((aligned(8)));
#define KMEM_CACHE_POS_START ((void *)1L)
__bpf_kfunc_start_defs();
__bpf_kfunc int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it)
{
struct bpf_iter_kmem_cache_kern *kit = (void *)it;
BUILD_BUG_ON(sizeof(*kit) > sizeof(*it));
BUILD_BUG_ON(__alignof__(*kit) != __alignof__(*it));
kit->pos = KMEM_CACHE_POS_START;
return 0;
}
__bpf_kfunc struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it)
{
struct bpf_iter_kmem_cache_kern *kit = (void *)it;
struct kmem_cache *prev = kit->pos;
struct kmem_cache *next;
bool destroy = false;
if (!prev)
return NULL;
mutex_lock(&slab_mutex);
if (list_empty(&slab_caches)) {
mutex_unlock(&slab_mutex);
return NULL;
}
if (prev == KMEM_CACHE_POS_START)
next = list_first_entry(&slab_caches, struct kmem_cache, list);
else if (list_last_entry(&slab_caches, struct kmem_cache, list) == prev)
next = NULL;
else
next = list_next_entry(prev, list);
/* boot_caches have negative refcount, don't touch them */
if (next && next->refcount > 0)
next->refcount++;
/* Skip kmem_cache_destroy() for active entries */
if (prev && prev != KMEM_CACHE_POS_START) {
if (prev->refcount > 1)
prev->refcount--;
else if (prev->refcount == 1)
destroy = true;
}
mutex_unlock(&slab_mutex);
if (destroy)
kmem_cache_destroy(prev);
kit->pos = next;
return next;
}
__bpf_kfunc void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it)
{
struct bpf_iter_kmem_cache_kern *kit = (void *)it;
struct kmem_cache *s = kit->pos;
bool destroy = false;
if (s == NULL || s == KMEM_CACHE_POS_START)
return;
mutex_lock(&slab_mutex);
/* Skip kmem_cache_destroy() for active entries */
if (s->refcount > 1)
s->refcount--;
else if (s->refcount == 1)
destroy = true;
mutex_unlock(&slab_mutex);
if (destroy)
kmem_cache_destroy(s);
}
__bpf_kfunc_end_defs();
struct bpf_iter__kmem_cache {
__bpf_md_ptr(struct bpf_iter_meta *, meta);
__bpf_md_ptr(struct kmem_cache *, s);
};
union kmem_cache_iter_priv {
struct bpf_iter_kmem_cache it;
struct bpf_iter_kmem_cache_kern kit;
};
static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
{
loff_t cnt = 0;
bool found = false;
struct kmem_cache *s;
union kmem_cache_iter_priv *p = seq->private;
mutex_lock(&slab_mutex);
/* Find an entry at the given position in the slab_caches list instead
* of keeping a reference (of the last visited entry, if any) out of
* slab_mutex. It might miss something if one is deleted in the middle
* while it releases the lock. But it should be rare and there's not
* much we can do about it.
*/
list_for_each_entry(s, &slab_caches, list) {
if (cnt == *pos) {
/* Make sure this entry remains in the list by getting
* a new reference count. Note that boot_cache entries
* have a negative refcount, so don't touch them.
*/
if (s->refcount > 0)
s->refcount++;
found = true;
break;
}
cnt++;
}
mutex_unlock(&slab_mutex);
if (!found)
s = NULL;
p->kit.pos = s;
return s;
}
static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
{
struct bpf_iter_meta meta;
struct bpf_iter__kmem_cache ctx = {
.meta = &meta,
.s = v,
};
union kmem_cache_iter_priv *p = seq->private;
struct bpf_prog *prog;
meta.seq = seq;
prog = bpf_iter_get_info(&meta, true);
if (prog && !ctx.s)
bpf_iter_run_prog(prog, &ctx);
bpf_iter_kmem_cache_destroy(&p->it);
}
static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
union kmem_cache_iter_priv *p = seq->private;
++*pos;
return bpf_iter_kmem_cache_next(&p->it);
}
static int kmem_cache_iter_seq_show(struct seq_file *seq, void *v)
{
struct bpf_iter_meta meta;
struct bpf_iter__kmem_cache ctx = {
.meta = &meta,
.s = v,
};
struct bpf_prog *prog;
int ret = 0;
meta.seq = seq;
prog = bpf_iter_get_info(&meta, false);
if (prog)
ret = bpf_iter_run_prog(prog, &ctx);
return ret;
}
static const struct seq_operations kmem_cache_iter_seq_ops = {
.start = kmem_cache_iter_seq_start,
.next = kmem_cache_iter_seq_next,
.stop = kmem_cache_iter_seq_stop,
.show = kmem_cache_iter_seq_show,
};
BTF_ID_LIST_GLOBAL_SINGLE(bpf_kmem_cache_btf_id, struct, kmem_cache)
static const struct bpf_iter_seq_info kmem_cache_iter_seq_info = {
.seq_ops = &kmem_cache_iter_seq_ops,
.seq_priv_size = sizeof(union kmem_cache_iter_priv),
};
static void bpf_iter_kmem_cache_show_fdinfo(const struct bpf_iter_aux_info *aux,
struct seq_file *seq)
{
seq_puts(seq, "kmem_cache iter\n");
}
DEFINE_BPF_ITER_FUNC(kmem_cache, struct bpf_iter_meta *meta,
struct kmem_cache *s)
static struct bpf_iter_reg bpf_kmem_cache_reg_info = {
.target = "kmem_cache",
.feature = BPF_ITER_RESCHED,
.show_fdinfo = bpf_iter_kmem_cache_show_fdinfo,
.ctx_arg_info_size = 1,
.ctx_arg_info = {
{ offsetof(struct bpf_iter__kmem_cache, s),
PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
},
.seq_info = &kmem_cache_iter_seq_info,
};
static int __init bpf_kmem_cache_iter_init(void)
{
bpf_kmem_cache_reg_info.ctx_arg_info[0].btf_id = bpf_kmem_cache_btf_id[0];
return bpf_iter_reg_target(&bpf_kmem_cache_reg_info);
}
late_initcall(bpf_kmem_cache_iter_init);

View File

@ -254,11 +254,8 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node, bool atomic)
static void free_one(void *obj, bool percpu)
{
if (percpu) {
if (percpu)
free_percpu(((void __percpu **)obj)[1]);
kfree(obj);
return;
}
kfree(obj);
}

272
kernel/bpf/range_tree.c Normal file
View File

@ -0,0 +1,272 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <linux/interval_tree_generic.h>
#include <linux/slab.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/bpf.h>
#include "range_tree.h"
/*
* struct range_tree is a data structure used to allocate contiguous memory
* ranges in bpf arena. It's a large bitmap. The contiguous sequence of bits is
* represented by struct range_node or 'rn' for short.
* rn->rn_rbnode links it into an interval tree while
* rn->rb_range_size links it into a second rbtree sorted by size of the range.
* __find_range() performs binary search and best fit algorithm to find the
* range less or equal requested size.
* range_tree_clear/set() clears or sets a range of bits in this bitmap. The
* adjacent ranges are merged or split at the same time.
*
* The split/merge logic is based/borrowed from XFS's xbitmap32 added
* in commit 6772fcc8890a ("xfs: convert xbitmap to interval tree").
*
* The implementation relies on external lock to protect rbtree-s.
* The alloc/free of range_node-s is done via bpf_mem_alloc.
*
* bpf arena is using range_tree to represent unallocated slots.
* At init time:
* range_tree_set(rt, 0, max);
* Then:
* start = range_tree_find(rt, len);
* if (start >= 0)
* range_tree_clear(rt, start, len);
* to find free range and mark slots as allocated and later:
* range_tree_set(rt, start, len);
* to mark as unallocated after use.
*/
struct range_node {
struct rb_node rn_rbnode;
struct rb_node rb_range_size;
u32 rn_start;
u32 rn_last; /* inclusive */
u32 __rn_subtree_last;
};
static struct range_node *rb_to_range_node(struct rb_node *rb)
{
return rb_entry(rb, struct range_node, rb_range_size);
}
static u32 rn_size(struct range_node *rn)
{
return rn->rn_last - rn->rn_start + 1;
}
/* Find range that fits best to requested size */
static inline struct range_node *__find_range(struct range_tree *rt, u32 len)
{
struct rb_node *rb = rt->range_size_root.rb_root.rb_node;
struct range_node *best = NULL;
while (rb) {
struct range_node *rn = rb_to_range_node(rb);
if (len <= rn_size(rn)) {
best = rn;
rb = rb->rb_right;
} else {
rb = rb->rb_left;
}
}
return best;
}
s64 range_tree_find(struct range_tree *rt, u32 len)
{
struct range_node *rn;
rn = __find_range(rt, len);
if (!rn)
return -ENOENT;
return rn->rn_start;
}
/* Insert the range into rbtree sorted by the range size */
static inline void __range_size_insert(struct range_node *rn,
struct rb_root_cached *root)
{
struct rb_node **link = &root->rb_root.rb_node, *rb = NULL;
u64 size = rn_size(rn);
bool leftmost = true;
while (*link) {
rb = *link;
if (size > rn_size(rb_to_range_node(rb))) {
link = &rb->rb_left;
} else {
link = &rb->rb_right;
leftmost = false;
}
}
rb_link_node(&rn->rb_range_size, rb, link);
rb_insert_color_cached(&rn->rb_range_size, root, leftmost);
}
#define START(node) ((node)->rn_start)
#define LAST(node) ((node)->rn_last)
INTERVAL_TREE_DEFINE(struct range_node, rn_rbnode, u32,
__rn_subtree_last, START, LAST,
static inline __maybe_unused,
__range_it)
static inline __maybe_unused void
range_it_insert(struct range_node *rn, struct range_tree *rt)
{
__range_size_insert(rn, &rt->range_size_root);
__range_it_insert(rn, &rt->it_root);
}
static inline __maybe_unused void
range_it_remove(struct range_node *rn, struct range_tree *rt)
{
rb_erase_cached(&rn->rb_range_size, &rt->range_size_root);
RB_CLEAR_NODE(&rn->rb_range_size);
__range_it_remove(rn, &rt->it_root);
}
static inline __maybe_unused struct range_node *
range_it_iter_first(struct range_tree *rt, u32 start, u32 last)
{
return __range_it_iter_first(&rt->it_root, start, last);
}
/* Clear the range in this range tree */
int range_tree_clear(struct range_tree *rt, u32 start, u32 len)
{
u32 last = start + len - 1;
struct range_node *new_rn;
struct range_node *rn;
while ((rn = range_it_iter_first(rt, start, last))) {
if (rn->rn_start < start && rn->rn_last > last) {
u32 old_last = rn->rn_last;
/* Overlaps with the entire clearing range */
range_it_remove(rn, rt);
rn->rn_last = start - 1;
range_it_insert(rn, rt);
/* Add a range */
migrate_disable();
new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
migrate_enable();
if (!new_rn)
return -ENOMEM;
new_rn->rn_start = last + 1;
new_rn->rn_last = old_last;
range_it_insert(new_rn, rt);
} else if (rn->rn_start < start) {
/* Overlaps with the left side of the clearing range */
range_it_remove(rn, rt);
rn->rn_last = start - 1;
range_it_insert(rn, rt);
} else if (rn->rn_last > last) {
/* Overlaps with the right side of the clearing range */
range_it_remove(rn, rt);
rn->rn_start = last + 1;
range_it_insert(rn, rt);
break;
} else {
/* in the middle of the clearing range */
range_it_remove(rn, rt);
migrate_disable();
bpf_mem_free(&bpf_global_ma, rn);
migrate_enable();
}
}
return 0;
}
/* Is the whole range set ? */
int is_range_tree_set(struct range_tree *rt, u32 start, u32 len)
{
u32 last = start + len - 1;
struct range_node *left;
/* Is this whole range set ? */
left = range_it_iter_first(rt, start, last);
if (left && left->rn_start <= start && left->rn_last >= last)
return 0;
return -ESRCH;
}
/* Set the range in this range tree */
int range_tree_set(struct range_tree *rt, u32 start, u32 len)
{
u32 last = start + len - 1;
struct range_node *right;
struct range_node *left;
int err;
/* Is this whole range already set ? */
left = range_it_iter_first(rt, start, last);
if (left && left->rn_start <= start && left->rn_last >= last)
return 0;
/* Clear out everything in the range we want to set. */
err = range_tree_clear(rt, start, len);
if (err)
return err;
/* Do we have a left-adjacent range ? */
left = range_it_iter_first(rt, start - 1, start - 1);
if (left && left->rn_last + 1 != start)
return -EFAULT;
/* Do we have a right-adjacent range ? */
right = range_it_iter_first(rt, last + 1, last + 1);
if (right && right->rn_start != last + 1)
return -EFAULT;
if (left && right) {
/* Combine left and right adjacent ranges */
range_it_remove(left, rt);
range_it_remove(right, rt);
left->rn_last = right->rn_last;
range_it_insert(left, rt);
migrate_disable();
bpf_mem_free(&bpf_global_ma, right);
migrate_enable();
} else if (left) {
/* Combine with the left range */
range_it_remove(left, rt);
left->rn_last = last;
range_it_insert(left, rt);
} else if (right) {
/* Combine with the right range */
range_it_remove(right, rt);
right->rn_start = start;
range_it_insert(right, rt);
} else {
migrate_disable();
left = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
migrate_enable();
if (!left)
return -ENOMEM;
left->rn_start = start;
left->rn_last = last;
range_it_insert(left, rt);
}
return 0;
}
void range_tree_destroy(struct range_tree *rt)
{
struct range_node *rn;
while ((rn = range_it_iter_first(rt, 0, -1U))) {
range_it_remove(rn, rt);
migrate_disable();
bpf_mem_free(&bpf_global_ma, rn);
migrate_enable();
}
}
void range_tree_init(struct range_tree *rt)
{
rt->it_root = RB_ROOT_CACHED;
rt->range_size_root = RB_ROOT_CACHED;
}

21
kernel/bpf/range_tree.h Normal file
View File

@ -0,0 +1,21 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#ifndef _RANGE_TREE_H
#define _RANGE_TREE_H 1
struct range_tree {
/* root of interval tree */
struct rb_root_cached it_root;
/* root of rbtree of interval sizes */
struct rb_root_cached range_size_root;
};
void range_tree_init(struct range_tree *rt);
void range_tree_destroy(struct range_tree *rt);
int range_tree_clear(struct range_tree *rt, u32 start, u32 len);
int range_tree_set(struct range_tree *rt, u32 start, u32 len);
int is_range_tree_set(struct range_tree *rt, u32 start, u32 len);
s64 range_tree_find(struct range_tree *rt, u32 len);
#endif

View File

@ -155,6 +155,89 @@ static void maybe_wait_bpf_programs(struct bpf_map *map)
synchronize_rcu();
}
static void unpin_uptr_kaddr(void *kaddr)
{
if (kaddr)
unpin_user_page(virt_to_page(kaddr));
}
static void __bpf_obj_unpin_uptrs(struct btf_record *rec, u32 cnt, void *obj)
{
const struct btf_field *field;
void **uptr_addr;
int i;
for (i = 0, field = rec->fields; i < cnt; i++, field++) {
if (field->type != BPF_UPTR)
continue;
uptr_addr = obj + field->offset;
unpin_uptr_kaddr(*uptr_addr);
}
}
static void bpf_obj_unpin_uptrs(struct btf_record *rec, void *obj)
{
if (!btf_record_has_field(rec, BPF_UPTR))
return;
__bpf_obj_unpin_uptrs(rec, rec->cnt, obj);
}
static int bpf_obj_pin_uptrs(struct btf_record *rec, void *obj)
{
const struct btf_field *field;
const struct btf_type *t;
unsigned long start, end;
struct page *page;
void **uptr_addr;
int i, err;
if (!btf_record_has_field(rec, BPF_UPTR))
return 0;
for (i = 0, field = rec->fields; i < rec->cnt; i++, field++) {
if (field->type != BPF_UPTR)
continue;
uptr_addr = obj + field->offset;
start = *(unsigned long *)uptr_addr;
if (!start)
continue;
t = btf_type_by_id(field->kptr.btf, field->kptr.btf_id);
/* t->size was checked for zero before */
if (check_add_overflow(start, t->size - 1, &end)) {
err = -EFAULT;
goto unpin_all;
}
/* The uptr's struct cannot span across two pages */
if ((start & PAGE_MASK) != (end & PAGE_MASK)) {
err = -EOPNOTSUPP;
goto unpin_all;
}
err = pin_user_pages_fast(start, 1, FOLL_LONGTERM | FOLL_WRITE, &page);
if (err != 1)
goto unpin_all;
if (PageHighMem(page)) {
err = -EOPNOTSUPP;
unpin_user_page(page);
goto unpin_all;
}
*uptr_addr = page_address(page) + offset_in_page(start);
}
return 0;
unpin_all:
__bpf_obj_unpin_uptrs(rec, i, obj);
return err;
}
static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
void *key, void *value, __u64 flags)
{
@ -199,9 +282,14 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
err = map->ops->map_push_elem(map, value, flags);
} else {
rcu_read_lock();
err = map->ops->map_update_elem(map, key, value, flags);
rcu_read_unlock();
err = bpf_obj_pin_uptrs(map->record, value);
if (!err) {
rcu_read_lock();
err = map->ops->map_update_elem(map, key, value, flags);
rcu_read_unlock();
if (err)
bpf_obj_unpin_uptrs(map->record, value);
}
}
bpf_enable_instrumentation();
@ -548,6 +636,7 @@ void btf_record_free(struct btf_record *rec)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
if (rec->fields[i].kptr.module)
module_put(rec->fields[i].kptr.module);
if (btf_is_kernel(rec->fields[i].kptr.btf))
@ -597,6 +686,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
case BPF_UPTR:
if (btf_is_kernel(fields[i].kptr.btf))
btf_get(fields[i].kptr.btf);
if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
@ -714,6 +804,10 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
field->kptr.dtor(xchgd_field);
}
break;
case BPF_UPTR:
/* The caller ensured that no one is using the uptr */
unpin_uptr_kaddr(*(void **)field_ptr);
break;
case BPF_LIST_HEAD:
if (WARN_ON_ONCE(rec->spin_lock_off < 0))
continue;
@ -1105,7 +1199,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE,
BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR,
map->value_size);
if (!IS_ERR_OR_NULL(map->record)) {
int i;
@ -1161,6 +1255,12 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
goto free_map_tab;
}
break;
case BPF_UPTR:
if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE) {
ret = -EOPNOTSUPP;
goto free_map_tab;
}
break;
case BPF_LIST_HEAD:
case BPF_RB_ROOT:
if (map->map_type != BPF_MAP_TYPE_HASH &&
@ -3218,7 +3318,8 @@ static void bpf_tracing_link_release(struct bpf_link *link)
container_of(link, struct bpf_tracing_link, link.link);
WARN_ON_ONCE(bpf_trampoline_unlink_prog(&tr_link->link,
tr_link->trampoline));
tr_link->trampoline,
tr_link->tgt_prog));
bpf_trampoline_put(tr_link->trampoline);
@ -3358,7 +3459,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
* in prog->aux
*
* - if prog->aux->dst_trampoline is NULL, the program has already been
* attached to a target and its initial target was cleared (below)
* attached to a target and its initial target was cleared (below)
*
* - if tgt_prog != NULL, the caller specified tgt_prog_fd +
* target_btf_id using the link_create API.
@ -3433,7 +3534,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
if (err)
goto out_unlock;
err = bpf_trampoline_link_prog(&link->link, tr);
err = bpf_trampoline_link_prog(&link->link, tr, tgt_prog);
if (err) {
bpf_link_cleanup(&link_primer);
link = NULL;
@ -4002,10 +4103,14 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI &&
attach_type != BPF_TRACE_UPROBE_MULTI)
return -EINVAL;
if (prog->expected_attach_type == BPF_TRACE_UPROBE_SESSION &&
attach_type != BPF_TRACE_UPROBE_SESSION)
return -EINVAL;
if (attach_type != BPF_PERF_EVENT &&
attach_type != BPF_TRACE_KPROBE_MULTI &&
attach_type != BPF_TRACE_KPROBE_SESSION &&
attach_type != BPF_TRACE_UPROBE_MULTI)
attach_type != BPF_TRACE_UPROBE_MULTI &&
attach_type != BPF_TRACE_UPROBE_SESSION)
return -EINVAL;
return 0;
case BPF_PROG_TYPE_SCHED_CLS:
@ -5258,7 +5363,8 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
else if (attr->link_create.attach_type == BPF_TRACE_KPROBE_MULTI ||
attr->link_create.attach_type == BPF_TRACE_KPROBE_SESSION)
ret = bpf_kprobe_multi_link_attach(attr, prog);
else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI)
else if (attr->link_create.attach_type == BPF_TRACE_UPROBE_MULTI ||
attr->link_create.attach_type == BPF_TRACE_UPROBE_SESSION)
ret = bpf_uprobe_multi_link_attach(attr, prog);
break;
default:

View File

@ -115,10 +115,14 @@ bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
(ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC);
}
void bpf_image_ksym_add(void *data, unsigned int size, struct bpf_ksym *ksym)
void bpf_image_ksym_init(void *data, unsigned int size, struct bpf_ksym *ksym)
{
ksym->start = (unsigned long) data;
ksym->end = ksym->start + size;
}
void bpf_image_ksym_add(struct bpf_ksym *ksym)
{
bpf_ksym_add(ksym);
perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_BPF, ksym->start,
PAGE_SIZE, false, ksym->name);
@ -377,7 +381,8 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, int size)
ksym = &im->ksym;
INIT_LIST_HEAD_RCU(&ksym->lnode);
snprintf(ksym->name, KSYM_NAME_LEN, "bpf_trampoline_%llu", key);
bpf_image_ksym_add(image, size, ksym);
bpf_image_ksym_init(image, size, ksym);
bpf_image_ksym_add(ksym);
return im;
out_free_image:
@ -523,7 +528,27 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
}
}
static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
static int bpf_freplace_check_tgt_prog(struct bpf_prog *tgt_prog)
{
struct bpf_prog_aux *aux = tgt_prog->aux;
guard(mutex)(&aux->ext_mutex);
if (aux->prog_array_member_cnt)
/* Program extensions can not extend target prog when the target
* prog has been updated to any prog_array map as tail callee.
* It's to prevent a potential infinite loop like:
* tgt prog entry -> tgt prog subprog -> freplace prog entry
* --tailcall-> tgt prog entry.
*/
return -EBUSY;
aux->is_extended = true;
return 0;
}
static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
enum bpf_tramp_prog_type kind;
struct bpf_tramp_link *link_exiting;
@ -544,6 +569,9 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_tr
/* Cannot attach extension if fentry/fexit are in use. */
if (cnt)
return -EBUSY;
err = bpf_freplace_check_tgt_prog(tgt_prog);
if (err)
return err;
tr->extension_prog = link->link.prog;
return bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP, NULL,
link->link.prog->bpf_func);
@ -570,17 +598,21 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_tr
return err;
}
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
int err;
mutex_lock(&tr->mutex);
err = __bpf_trampoline_link_prog(link, tr);
err = __bpf_trampoline_link_prog(link, tr, tgt_prog);
mutex_unlock(&tr->mutex);
return err;
}
static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
enum bpf_tramp_prog_type kind;
int err;
@ -591,6 +623,8 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_
err = bpf_arch_text_poke(tr->func.addr, BPF_MOD_JUMP,
tr->extension_prog->bpf_func, NULL);
tr->extension_prog = NULL;
guard(mutex)(&tgt_prog->aux->ext_mutex);
tgt_prog->aux->is_extended = false;
return err;
}
hlist_del_init(&link->tramp_hlist);
@ -599,12 +633,14 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_
}
/* bpf_trampoline_unlink_prog() should never fail. */
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr)
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
struct bpf_trampoline *tr,
struct bpf_prog *tgt_prog)
{
int err;
mutex_lock(&tr->mutex);
err = __bpf_trampoline_unlink_prog(link, tr);
err = __bpf_trampoline_unlink_prog(link, tr, tgt_prog);
mutex_unlock(&tr->mutex);
return err;
}
@ -619,7 +655,7 @@ static void bpf_shim_tramp_link_release(struct bpf_link *link)
if (!shim_link->trampoline)
return;
WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link, shim_link->trampoline));
WARN_ON_ONCE(bpf_trampoline_unlink_prog(&shim_link->link, shim_link->trampoline, NULL));
bpf_trampoline_put(shim_link->trampoline);
}
@ -733,7 +769,7 @@ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
goto err;
}
err = __bpf_trampoline_link_prog(&shim_link->link, tr);
err = __bpf_trampoline_link_prog(&shim_link->link, tr, NULL);
if (err)
goto err;
@ -868,6 +904,8 @@ static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tram
if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
bpf_prog_inc_misses_counter(prog);
if (prog->aux->recursion_detected)
prog->aux->recursion_detected(prog);
return 0;
}
return bpf_prog_start_time();
@ -944,6 +982,8 @@ u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
bpf_prog_inc_misses_counter(prog);
if (prog->aux->recursion_detected)
prog->aux->recursion_detected(prog);
return 0;
}
return bpf_prog_start_time();

File diff suppressed because it is too large Load Diff

View File

@ -802,6 +802,8 @@ struct send_signal_irq_work {
struct task_struct *task;
u32 sig;
enum pid_type type;
bool has_siginfo;
struct kernel_siginfo info;
};
static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
@ -809,27 +811,46 @@ static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
static void do_bpf_send_signal(struct irq_work *entry)
{
struct send_signal_irq_work *work;
struct kernel_siginfo *siginfo;
work = container_of(entry, struct send_signal_irq_work, irq_work);
group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, work->type);
siginfo = work->has_siginfo ? &work->info : SEND_SIG_PRIV;
group_send_sig_info(work->sig, siginfo, work->task, work->type);
put_task_struct(work->task);
}
static int bpf_send_signal_common(u32 sig, enum pid_type type)
static int bpf_send_signal_common(u32 sig, enum pid_type type, struct task_struct *task, u64 value)
{
struct send_signal_irq_work *work = NULL;
struct kernel_siginfo info;
struct kernel_siginfo *siginfo;
if (!task) {
task = current;
siginfo = SEND_SIG_PRIV;
} else {
clear_siginfo(&info);
info.si_signo = sig;
info.si_errno = 0;
info.si_code = SI_KERNEL;
info.si_pid = 0;
info.si_uid = 0;
info.si_value.sival_ptr = (void *)(unsigned long)value;
siginfo = &info;
}
/* Similar to bpf_probe_write_user, task needs to be
* in a sound condition and kernel memory access be
* permitted in order to send signal to the current
* task.
*/
if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
if (unlikely(task->flags & (PF_KTHREAD | PF_EXITING)))
return -EPERM;
if (unlikely(!nmi_uaccess_okay()))
return -EPERM;
/* Task should not be pid=1 to avoid kernel panic. */
if (unlikely(is_global_init(current)))
if (unlikely(is_global_init(task)))
return -EPERM;
if (irqs_disabled()) {
@ -847,19 +868,22 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type)
* to the irq_work. The current task may change when queued
* irq works get executed.
*/
work->task = get_task_struct(current);
work->task = get_task_struct(task);
work->has_siginfo = siginfo == &info;
if (work->has_siginfo)
copy_siginfo(&work->info, &info);
work->sig = sig;
work->type = type;
irq_work_queue(&work->irq_work);
return 0;
}
return group_send_sig_info(sig, SEND_SIG_PRIV, current, type);
return group_send_sig_info(sig, siginfo, task, type);
}
BPF_CALL_1(bpf_send_signal, u32, sig)
{
return bpf_send_signal_common(sig, PIDTYPE_TGID);
return bpf_send_signal_common(sig, PIDTYPE_TGID, NULL, 0);
}
static const struct bpf_func_proto bpf_send_signal_proto = {
@ -871,7 +895,7 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
BPF_CALL_1(bpf_send_signal_thread, u32, sig)
{
return bpf_send_signal_common(sig, PIDTYPE_PID);
return bpf_send_signal_common(sig, PIDTYPE_PID, NULL, 0);
}
static const struct bpf_func_proto bpf_send_signal_thread_proto = {
@ -1557,6 +1581,17 @@ static inline bool is_kprobe_session(const struct bpf_prog *prog)
return prog->expected_attach_type == BPF_TRACE_KPROBE_SESSION;
}
static inline bool is_uprobe_multi(const struct bpf_prog *prog)
{
return prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI ||
prog->expected_attach_type == BPF_TRACE_UPROBE_SESSION;
}
static inline bool is_uprobe_session(const struct bpf_prog *prog)
{
return prog->expected_attach_type == BPF_TRACE_UPROBE_SESSION;
}
static const struct bpf_func_proto *
kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@ -1574,13 +1609,13 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_func_ip:
if (is_kprobe_multi(prog))
return &bpf_get_func_ip_proto_kprobe_multi;
if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI)
if (is_uprobe_multi(prog))
return &bpf_get_func_ip_proto_uprobe_multi;
return &bpf_get_func_ip_proto_kprobe;
case BPF_FUNC_get_attach_cookie:
if (is_kprobe_multi(prog))
return &bpf_get_attach_cookie_proto_kmulti;
if (prog->expected_attach_type == BPF_TRACE_UPROBE_MULTI)
if (is_uprobe_multi(prog))
return &bpf_get_attach_cookie_proto_umulti;
return &bpf_get_attach_cookie_proto_trace;
default:
@ -3072,6 +3107,7 @@ struct bpf_uprobe {
u64 cookie;
struct uprobe *uprobe;
struct uprobe_consumer consumer;
bool session;
};
struct bpf_uprobe_multi_link {
@ -3084,7 +3120,7 @@ struct bpf_uprobe_multi_link {
};
struct bpf_uprobe_multi_run_ctx {
struct bpf_run_ctx run_ctx;
struct bpf_session_run_ctx session_ctx;
unsigned long entry_ip;
struct bpf_uprobe *uprobe;
};
@ -3195,17 +3231,22 @@ static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
static int uprobe_prog_run(struct bpf_uprobe *uprobe,
unsigned long entry_ip,
struct pt_regs *regs)
struct pt_regs *regs,
bool is_return, void *data)
{
struct bpf_uprobe_multi_link *link = uprobe->link;
struct bpf_uprobe_multi_run_ctx run_ctx = {
.session_ctx = {
.is_return = is_return,
.data = data,
},
.entry_ip = entry_ip,
.uprobe = uprobe,
};
struct bpf_prog *prog = link->link.prog;
bool sleepable = prog->sleepable;
struct bpf_run_ctx *old_run_ctx;
int err = 0;
int err;
if (link->task && !same_thread_group(current, link->task))
return 0;
@ -3217,7 +3258,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
migrate_disable();
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
old_run_ctx = bpf_set_run_ctx(&run_ctx.session_ctx.run_ctx);
err = bpf_prog_run(link->link.prog, regs);
bpf_reset_run_ctx(old_run_ctx);
@ -3244,9 +3285,13 @@ uprobe_multi_link_handler(struct uprobe_consumer *con, struct pt_regs *regs,
__u64 *data)
{
struct bpf_uprobe *uprobe;
int ret;
uprobe = container_of(con, struct bpf_uprobe, consumer);
return uprobe_prog_run(uprobe, instruction_pointer(regs), regs);
ret = uprobe_prog_run(uprobe, instruction_pointer(regs), regs, false, data);
if (uprobe->session)
return ret ? UPROBE_HANDLER_IGNORE : 0;
return 0;
}
static int
@ -3256,14 +3301,16 @@ uprobe_multi_link_ret_handler(struct uprobe_consumer *con, unsigned long func, s
struct bpf_uprobe *uprobe;
uprobe = container_of(con, struct bpf_uprobe, consumer);
return uprobe_prog_run(uprobe, func, regs);
uprobe_prog_run(uprobe, func, regs, true, data);
return 0;
}
static u64 bpf_uprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
{
struct bpf_uprobe_multi_run_ctx *run_ctx;
run_ctx = container_of(current->bpf_ctx, struct bpf_uprobe_multi_run_ctx, run_ctx);
run_ctx = container_of(current->bpf_ctx, struct bpf_uprobe_multi_run_ctx,
session_ctx.run_ctx);
return run_ctx->entry_ip;
}
@ -3271,7 +3318,8 @@ static u64 bpf_uprobe_multi_cookie(struct bpf_run_ctx *ctx)
{
struct bpf_uprobe_multi_run_ctx *run_ctx;
run_ctx = container_of(current->bpf_ctx, struct bpf_uprobe_multi_run_ctx, run_ctx);
run_ctx = container_of(current->bpf_ctx, struct bpf_uprobe_multi_run_ctx,
session_ctx.run_ctx);
return run_ctx->uprobe->cookie;
}
@ -3295,7 +3343,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
if (sizeof(u64) != sizeof(void *))
return -EOPNOTSUPP;
if (prog->expected_attach_type != BPF_TRACE_UPROBE_MULTI)
if (!is_uprobe_multi(prog))
return -EINVAL;
flags = attr->link_create.uprobe_multi.flags;
@ -3371,11 +3419,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
uprobes[i].link = link;
if (flags & BPF_F_UPROBE_MULTI_RETURN)
uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
else
if (!(flags & BPF_F_UPROBE_MULTI_RETURN))
uprobes[i].consumer.handler = uprobe_multi_link_handler;
if (flags & BPF_F_UPROBE_MULTI_RETURN || is_uprobe_session(prog))
uprobes[i].consumer.ret_handler = uprobe_multi_link_ret_handler;
if (is_uprobe_session(prog))
uprobes[i].session = true;
if (pid)
uprobes[i].consumer.filter = uprobe_multi_link_filter;
}
@ -3464,7 +3513,7 @@ static int bpf_kprobe_multi_filter(const struct bpf_prog *prog, u32 kfunc_id)
if (!btf_id_set8_contains(&kprobe_multi_kfunc_set_ids, kfunc_id))
return 0;
if (!is_kprobe_session(prog))
if (!is_kprobe_session(prog) && !is_uprobe_session(prog))
return -EACCES;
return 0;
@ -3482,3 +3531,16 @@ static int __init bpf_kprobe_multi_kfuncs_init(void)
}
late_initcall(bpf_kprobe_multi_kfuncs_init);
__bpf_kfunc_start_defs();
__bpf_kfunc int bpf_send_signal_task(struct task_struct *task, int sig, enum pid_type type,
u64 value)
{
if (type != PIDTYPE_PID && type != PIDTYPE_TGID)
return -EINVAL;
return bpf_send_signal_common(sig, type, task, value);
}
__bpf_kfunc_end_defs();

View File

@ -1332,6 +1332,25 @@ size_t ksize(const void *objp)
}
EXPORT_SYMBOL(ksize);
#ifdef CONFIG_BPF_SYSCALL
#include <linux/btf.h>
__bpf_kfunc_start_defs();
__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
{
struct slab *slab;
if (!virt_addr_valid((void *)(long)addr))
return NULL;
slab = virt_to_slab((void *)(long)addr);
return slab ? slab->slab_cache : NULL;
}
__bpf_kfunc_end_defs();
#endif /* CONFIG_BPF_SYSCALL */
/* Tracepoints definitions. */
EXPORT_TRACEPOINT_SYMBOL(kmalloc);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);

View File

@ -106,7 +106,7 @@ static long bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
if (sock) {
sdata = bpf_local_storage_update(
sock->sk, (struct bpf_local_storage_map *)map, value,
map_flags, GFP_ATOMIC);
map_flags, false, GFP_ATOMIC);
sockfd_put(sock);
return PTR_ERR_OR_ZERO(sdata);
}
@ -137,7 +137,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
{
struct bpf_local_storage_elem *copy_selem;
copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC);
copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, false, GFP_ATOMIC);
if (!copy_selem)
return NULL;
@ -243,7 +243,7 @@ BPF_CALL_5(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
refcount_inc_not_zero(&sk->sk_refcnt)) {
sdata = bpf_local_storage_update(
sk, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST, gfp_flags);
BPF_NOEXIST, false, gfp_flags);
/* sk must be a fullsock (guaranteed by verifier),
* so sock_gen_put() is unnecessary.
*/

View File

@ -17,20 +17,12 @@ tprogs-y += tracex3
tprogs-y += tracex4
tprogs-y += tracex5
tprogs-y += tracex6
tprogs-y += tracex7
tprogs-y += test_probe_write_user
tprogs-y += trace_output
tprogs-y += lathist
tprogs-y += offwaketime
tprogs-y += spintest
tprogs-y += map_perf_test
tprogs-y += test_overhead
tprogs-y += test_cgrp2_array_pin
tprogs-y += test_cgrp2_attach
tprogs-y += test_cgrp2_sock
tprogs-y += test_cgrp2_sock2
tprogs-y += xdp_router_ipv4
tprogs-y += test_current_task_under_cgroup
tprogs-y += trace_event
tprogs-y += sampleip
tprogs-y += tc_l2_redirect
@ -66,20 +58,12 @@ tracex3-objs := tracex3_user.o
tracex4-objs := tracex4_user.o
tracex5-objs := tracex5_user.o $(TRACE_HELPERS)
tracex6-objs := tracex6_user.o
tracex7-objs := tracex7_user.o
test_probe_write_user-objs := test_probe_write_user_user.o
trace_output-objs := trace_output_user.o
lathist-objs := lathist_user.o
offwaketime-objs := offwaketime_user.o $(TRACE_HELPERS)
spintest-objs := spintest_user.o $(TRACE_HELPERS)
map_perf_test-objs := map_perf_test_user.o
test_overhead-objs := test_overhead_user.o
test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
test_cgrp2_attach-objs := test_cgrp2_attach.o
test_cgrp2_sock-objs := test_cgrp2_sock.o
test_cgrp2_sock2-objs := test_cgrp2_sock2.o
test_current_task_under_cgroup-objs := $(CGROUP_HELPERS) \
test_current_task_under_cgroup_user.o
trace_event-objs := trace_event_user.o $(TRACE_HELPERS)
sampleip-objs := sampleip_user.o $(TRACE_HELPERS)
tc_l2_redirect-objs := tc_l2_redirect_user.o
@ -107,9 +91,6 @@ always-y += tracex3.bpf.o
always-y += tracex4.bpf.o
always-y += tracex5.bpf.o
always-y += tracex6.bpf.o
always-y += tracex7.bpf.o
always-y += sock_flags.bpf.o
always-y += test_probe_write_user.bpf.o
always-y += trace_output.bpf.o
always-y += tcbpf1_kern.o
always-y += tc_l2_redirect_kern.o
@ -117,12 +98,7 @@ always-y += lathist_kern.o
always-y += offwaketime.bpf.o
always-y += spintest.bpf.o
always-y += map_perf_test.bpf.o
always-y += test_overhead_tp.bpf.o
always-y += test_overhead_raw_tp.bpf.o
always-y += test_overhead_kprobe.bpf.o
always-y += parse_varlen.o parse_simple.o parse_ldabs.o
always-y += test_cgrp2_tc.bpf.o
always-y += test_current_task_under_cgroup.bpf.o
always-y += trace_event_kern.o
always-y += sampleip_kern.o
always-y += lwt_len_hist.bpf.o
@ -195,7 +171,6 @@ TPROGLDLIBS_xdp_router_ipv4 += -lm -pthread
TPROGLDLIBS_tracex4 += -lrt
TPROGLDLIBS_trace_output += -lrt
TPROGLDLIBS_map_perf_test += -lrt
TPROGLDLIBS_test_overhead += -lrt
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang

View File

@ -1,47 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include "net_shared.h"
#include <bpf/bpf_helpers.h>
SEC("cgroup/sock")
int bpf_prog1(struct bpf_sock *sk)
{
char fmt[] = "socket: family %d type %d protocol %d\n";
char fmt2[] = "socket: uid %u gid %u\n";
__u64 gid_uid = bpf_get_current_uid_gid();
__u32 uid = gid_uid & 0xffffffff;
__u32 gid = gid_uid >> 32;
bpf_trace_printk(fmt, sizeof(fmt), sk->family, sk->type, sk->protocol);
bpf_trace_printk(fmt2, sizeof(fmt2), uid, gid);
/* block AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6 sockets
* ie., make ping6 fail
*/
if (sk->family == AF_INET6 &&
sk->type == SOCK_DGRAM &&
sk->protocol == IPPROTO_ICMPV6)
return 0;
return 1;
}
SEC("cgroup/sock")
int bpf_prog2(struct bpf_sock *sk)
{
char fmt[] = "socket: family %d type %d protocol %d\n";
bpf_trace_printk(fmt, sizeof(fmt), sk->family, sk->type, sk->protocol);
/* block AF_INET, SOCK_DGRAM, IPPROTO_ICMP sockets
* ie., make ping fail
*/
if (sk->family == AF_INET &&
sk->type == SOCK_DGRAM &&
sk->protocol == IPPROTO_ICMP)
return 0;
return 1;
}
char _license[] SEC("license") = "GPL";

View File

@ -2,6 +2,9 @@
#include <uapi/linux/unistd.h>
#include <linux/kbuild.h>
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-prototypes"
#define SYSNR(_NR) DEFINE(SYS ## _NR, _NR)
void syscall_defines(void)
@ -17,3 +20,5 @@ void syscall_defines(void)
#endif
}
#pragma GCC diagnostic pop

View File

@ -58,14 +58,11 @@ static __always_inline bool is_vip_addr(__be16 eth_proto, __be32 daddr)
SEC("l2_to_iptun_ingress_forward")
int _l2_to_iptun_ingress_forward(struct __sk_buff *skb)
{
struct bpf_tunnel_key tkey = {};
void *data = (void *)(long)skb->data;
struct eth_hdr *eth = data;
void *data_end = (void *)(long)skb->data_end;
int key = 0, *ifindex;
int ret;
if (data + sizeof(*eth) > data_end)
return TC_ACT_OK;
@ -115,8 +112,6 @@ int _l2_to_iptun_ingress_redirect(struct __sk_buff *skb)
void *data_end = (void *)(long)skb->data_end;
int key = 0, *ifindex;
int ret;
if (data + sizeof(*eth) > data_end)
return TC_ACT_OK;
@ -205,7 +200,6 @@ int _l2_to_ip6tun_ingress_redirect(struct __sk_buff *skb)
SEC("drop_non_tun_vip")
int _drop_non_tun_vip(struct __sk_buff *skb)
{
struct bpf_tunnel_key tkey = {};
void *data = (void *)(long)skb->data;
struct eth_hdr *eth = data;
void *data_end = (void *)(long)skb->data_end;

View File

@ -1,106 +0,0 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2016 Facebook
*/
#include <linux/unistd.h>
#include <linux/bpf.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <bpf/bpf.h>
static void usage(void)
{
printf("Usage: test_cgrp2_array_pin [...]\n");
printf(" -F <file> File to pin an BPF cgroup array\n");
printf(" -U <file> Update an already pinned BPF cgroup array\n");
printf(" -v <value> Full path of the cgroup2\n");
printf(" -h Display this help\n");
}
int main(int argc, char **argv)
{
const char *pinned_file = NULL, *cg2 = NULL;
int create_array = 1;
int array_key = 0;
int array_fd = -1;
int cg2_fd = -1;
int ret = -1;
int opt;
while ((opt = getopt(argc, argv, "F:U:v:")) != -1) {
switch (opt) {
/* General args */
case 'F':
pinned_file = optarg;
break;
case 'U':
pinned_file = optarg;
create_array = 0;
break;
case 'v':
cg2 = optarg;
break;
default:
usage();
goto out;
}
}
if (!cg2 || !pinned_file) {
usage();
goto out;
}
cg2_fd = open(cg2, O_RDONLY);
if (cg2_fd < 0) {
fprintf(stderr, "open(%s,...): %s(%d)\n",
cg2, strerror(errno), errno);
goto out;
}
if (create_array) {
array_fd = bpf_map_create(BPF_MAP_TYPE_CGROUP_ARRAY, NULL,
sizeof(uint32_t), sizeof(uint32_t),
1, NULL);
if (array_fd < 0) {
fprintf(stderr,
"bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY,...): %s(%d)\n",
strerror(errno), errno);
goto out;
}
} else {
array_fd = bpf_obj_get(pinned_file);
if (array_fd < 0) {
fprintf(stderr, "bpf_obj_get(%s): %s(%d)\n",
pinned_file, strerror(errno), errno);
goto out;
}
}
ret = bpf_map_update_elem(array_fd, &array_key, &cg2_fd, 0);
if (ret) {
perror("bpf_map_update_elem");
goto out;
}
if (create_array) {
ret = bpf_obj_pin(array_fd, pinned_file);
if (ret) {
fprintf(stderr, "bpf_obj_pin(..., %s): %s(%d)\n",
pinned_file, strerror(errno), errno);
goto out;
}
}
out:
if (array_fd != -1)
close(array_fd);
if (cg2_fd != -1)
close(cg2_fd);
return ret;
}

View File

@ -1,177 +0,0 @@
/* eBPF example program:
*
* - Creates arraymap in kernel with 4 bytes keys and 8 byte values
*
* - Loads eBPF program
*
* The eBPF program accesses the map passed in to store two pieces of
* information. The number of invocations of the program, which maps
* to the number of packets received, is stored to key 0. Key 1 is
* incremented on each iteration by the number of bytes stored in
* the skb.
*
* - Attaches the new program to a cgroup using BPF_PROG_ATTACH
*
* - Every second, reads map[0] and map[1] to see how many bytes and
* packets were seen on any socket of tasks in the given cgroup.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>
#include "bpf_insn.h"
#include "bpf_util.h"
enum {
MAP_KEY_PACKETS,
MAP_KEY_BYTES,
};
char bpf_log_buf[BPF_LOG_BUF_SIZE];
static int prog_load(int map_fd, int verdict)
{
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* save r6 so it's not clobbered by BPF_CALL */
/* Count packets */
BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_PACKETS), /* r0 = 0 */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* load map fd to r1 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
/* Count bytes */
BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_BYTES), /* r0 = 1 */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_6, offsetof(struct __sk_buff, len)), /* r1 = skb->len */
BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
BPF_EXIT_INSN(),
};
size_t insns_cnt = ARRAY_SIZE(prog);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,
);
return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB, NULL, "GPL",
prog, insns_cnt, &opts);
}
static int usage(const char *argv0)
{
printf("Usage: %s [-d] [-D] <cg-path> <egress|ingress>\n", argv0);
printf(" -d Drop Traffic\n");
printf(" -D Detach filter, and exit\n");
return EXIT_FAILURE;
}
static int attach_filter(int cg_fd, int type, int verdict)
{
int prog_fd, map_fd, ret, key;
long long pkt_cnt, byte_cnt;
map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL,
sizeof(key), sizeof(byte_cnt),
256, NULL);
if (map_fd < 0) {
printf("Failed to create map: '%s'\n", strerror(errno));
return EXIT_FAILURE;
}
prog_fd = prog_load(map_fd, verdict);
printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
if (prog_fd < 0) {
printf("Failed to load prog: '%s'\n", strerror(errno));
return EXIT_FAILURE;
}
ret = bpf_prog_attach(prog_fd, cg_fd, type, 0);
if (ret < 0) {
printf("Failed to attach prog to cgroup: '%s'\n",
strerror(errno));
return EXIT_FAILURE;
}
while (1) {
key = MAP_KEY_PACKETS;
assert(bpf_map_lookup_elem(map_fd, &key, &pkt_cnt) == 0);
key = MAP_KEY_BYTES;
assert(bpf_map_lookup_elem(map_fd, &key, &byte_cnt) == 0);
printf("cgroup received %lld packets, %lld bytes\n",
pkt_cnt, byte_cnt);
sleep(1);
}
return EXIT_SUCCESS;
}
int main(int argc, char **argv)
{
int detach_only = 0, verdict = 1;
enum bpf_attach_type type;
int opt, cg_fd, ret;
while ((opt = getopt(argc, argv, "Dd")) != -1) {
switch (opt) {
case 'd':
verdict = 0;
break;
case 'D':
detach_only = 1;
break;
default:
return usage(argv[0]);
}
}
if (argc - optind < 2)
return usage(argv[0]);
if (strcmp(argv[optind + 1], "ingress") == 0)
type = BPF_CGROUP_INET_INGRESS;
else if (strcmp(argv[optind + 1], "egress") == 0)
type = BPF_CGROUP_INET_EGRESS;
else
return usage(argv[0]);
cg_fd = open(argv[optind], O_DIRECTORY | O_RDONLY);
if (cg_fd < 0) {
printf("Failed to open cgroup path: '%s'\n", strerror(errno));
return EXIT_FAILURE;
}
if (detach_only) {
ret = bpf_prog_detach(cg_fd, type);
printf("bpf_prog_detach() returned '%s' (%d)\n",
strerror(errno), errno);
} else
ret = attach_filter(cg_fd, type, verdict);
return ret;
}

View File

@ -1,294 +0,0 @@
/* eBPF example program:
*
* - Loads eBPF program
*
* The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
* sockets opened by processes in the cgroup.
*
* - Attaches the new program to a cgroup using BPF_PROG_ATTACH
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <inttypes.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>
#include "bpf_insn.h"
char bpf_log_buf[BPF_LOG_BUF_SIZE];
static int prog_load(__u32 idx, __u32 mark, __u32 prio)
{
/* save pointer to context */
struct bpf_insn prog_start[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
};
struct bpf_insn prog_end[] = {
BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
BPF_EXIT_INSN(),
};
/* set sk_bound_dev_if on socket */
struct bpf_insn prog_dev[] = {
BPF_MOV64_IMM(BPF_REG_3, idx),
BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
};
/* set mark on socket */
struct bpf_insn prog_mark[] = {
/* get uid of process */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_current_uid_gid),
BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xffffffff),
/* if uid is 0, use given mark, else use the uid as the mark */
BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_MOV64_IMM(BPF_REG_3, mark),
/* set the mark on the new socket */
BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, mark)),
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, mark)),
};
/* set priority on socket */
struct bpf_insn prog_prio[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
BPF_MOV64_IMM(BPF_REG_3, prio),
BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, priority)),
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, priority)),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,
);
struct bpf_insn *prog;
size_t insns_cnt;
void *p;
int ret;
insns_cnt = sizeof(prog_start) + sizeof(prog_end);
if (idx)
insns_cnt += sizeof(prog_dev);
if (mark)
insns_cnt += sizeof(prog_mark);
if (prio)
insns_cnt += sizeof(prog_prio);
p = prog = malloc(insns_cnt);
if (!prog) {
fprintf(stderr, "Failed to allocate memory for instructions\n");
return EXIT_FAILURE;
}
memcpy(p, prog_start, sizeof(prog_start));
p += sizeof(prog_start);
if (idx) {
memcpy(p, prog_dev, sizeof(prog_dev));
p += sizeof(prog_dev);
}
if (mark) {
memcpy(p, prog_mark, sizeof(prog_mark));
p += sizeof(prog_mark);
}
if (prio) {
memcpy(p, prog_prio, sizeof(prog_prio));
p += sizeof(prog_prio);
}
memcpy(p, prog_end, sizeof(prog_end));
p += sizeof(prog_end);
insns_cnt /= sizeof(struct bpf_insn);
ret = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL",
prog, insns_cnt, &opts);
free(prog);
return ret;
}
static int get_bind_to_device(int sd, char *name, size_t len)
{
socklen_t optlen = len;
int rc;
name[0] = '\0';
rc = getsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, name, &optlen);
if (rc < 0)
perror("setsockopt(SO_BINDTODEVICE)");
return rc;
}
static unsigned int get_somark(int sd)
{
unsigned int mark = 0;
socklen_t optlen = sizeof(mark);
int rc;
rc = getsockopt(sd, SOL_SOCKET, SO_MARK, &mark, &optlen);
if (rc < 0)
perror("getsockopt(SO_MARK)");
return mark;
}
static unsigned int get_priority(int sd)
{
unsigned int prio = 0;
socklen_t optlen = sizeof(prio);
int rc;
rc = getsockopt(sd, SOL_SOCKET, SO_PRIORITY, &prio, &optlen);
if (rc < 0)
perror("getsockopt(SO_PRIORITY)");
return prio;
}
static int show_sockopts(int family)
{
unsigned int mark, prio;
char name[16];
int sd;
sd = socket(family, SOCK_DGRAM, 17);
if (sd < 0) {
perror("socket");
return 1;
}
if (get_bind_to_device(sd, name, sizeof(name)) < 0)
return 1;
mark = get_somark(sd);
prio = get_priority(sd);
close(sd);
printf("sd %d: dev %s, mark %u, priority %u\n", sd, name, mark, prio);
return 0;
}
static int usage(const char *argv0)
{
printf("Usage:\n");
printf(" Attach a program\n");
printf(" %s -b bind-to-dev -m mark -p prio cg-path\n", argv0);
printf("\n");
printf(" Detach a program\n");
printf(" %s -d cg-path\n", argv0);
printf("\n");
printf(" Show inherited socket settings (mark, priority, and device)\n");
printf(" %s [-6]\n", argv0);
return EXIT_FAILURE;
}
int main(int argc, char **argv)
{
__u32 idx = 0, mark = 0, prio = 0;
const char *cgrp_path = NULL;
int cg_fd, prog_fd, ret;
int family = PF_INET;
int do_attach = 1;
int rc;
while ((rc = getopt(argc, argv, "db:m:p:6")) != -1) {
switch (rc) {
case 'd':
do_attach = 0;
break;
case 'b':
idx = if_nametoindex(optarg);
if (!idx) {
idx = strtoumax(optarg, NULL, 0);
if (!idx) {
printf("Invalid device name\n");
return EXIT_FAILURE;
}
}
break;
case 'm':
mark = strtoumax(optarg, NULL, 0);
break;
case 'p':
prio = strtoumax(optarg, NULL, 0);
break;
case '6':
family = PF_INET6;
break;
default:
return usage(argv[0]);
}
}
if (optind == argc)
return show_sockopts(family);
cgrp_path = argv[optind];
if (!cgrp_path) {
fprintf(stderr, "cgroup path not given\n");
return EXIT_FAILURE;
}
if (do_attach && !idx && !mark && !prio) {
fprintf(stderr,
"One of device, mark or priority must be given\n");
return EXIT_FAILURE;
}
cg_fd = open(cgrp_path, O_DIRECTORY | O_RDONLY);
if (cg_fd < 0) {
printf("Failed to open cgroup path: '%s'\n", strerror(errno));
return EXIT_FAILURE;
}
if (do_attach) {
prog_fd = prog_load(idx, mark, prio);
if (prog_fd < 0) {
printf("Failed to load prog: '%s'\n", strerror(errno));
printf("Output from kernel verifier:\n%s\n-------\n",
bpf_log_buf);
return EXIT_FAILURE;
}
ret = bpf_prog_attach(prog_fd, cg_fd,
BPF_CGROUP_INET_SOCK_CREATE, 0);
if (ret < 0) {
printf("Failed to attach prog to cgroup: '%s'\n",
strerror(errno));
return EXIT_FAILURE;
}
} else {
ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
if (ret < 0) {
printf("Failed to detach prog from cgroup: '%s'\n",
strerror(errno));
return EXIT_FAILURE;
}
}
close(cg_fd);
return EXIT_SUCCESS;
}

View File

@ -1,137 +0,0 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# Test various socket options that can be set by attaching programs to cgroups.
MY_DIR=$(dirname $0)
TEST=$MY_DIR/test_cgrp2_sock
CGRP_MNT="/tmp/cgroupv2-test_cgrp2_sock"
################################################################################
#
print_result()
{
local rc=$1
local status=" OK "
[ $rc -ne 0 ] && status="FAIL"
printf "%-50s [%4s]\n" "$2" "$status"
}
check_sock()
{
out=$($TEST)
echo $out | grep -q "$1"
if [ $? -ne 0 ]; then
print_result 1 "IPv4: $2"
echo " expected: $1"
echo " have: $out"
rc=1
else
print_result 0 "IPv4: $2"
fi
}
check_sock6()
{
out=$($TEST -6)
echo $out | grep -q "$1"
if [ $? -ne 0 ]; then
print_result 1 "IPv6: $2"
echo " expected: $1"
echo " have: $out"
rc=1
else
print_result 0 "IPv6: $2"
fi
}
################################################################################
#
cleanup()
{
echo $$ >> ${CGRP_MNT}/cgroup.procs
rmdir ${CGRP_MNT}/sockopts
}
cleanup_and_exit()
{
local rc=$1
local msg="$2"
[ -n "$msg" ] && echo "ERROR: $msg"
$TEST -d ${CGRP_MNT}/sockopts
ip li del cgrp2_sock
umount ${CGRP_MNT}
exit $rc
}
################################################################################
# main
rc=0
ip li add cgrp2_sock type dummy 2>/dev/null
set -e
mkdir -p ${CGRP_MNT}
mount -t cgroup2 none ${CGRP_MNT}
set +e
# make sure we have a known start point
cleanup 2>/dev/null
mkdir -p ${CGRP_MNT}/sockopts
[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to create cgroup hierarchy"
# set pid into cgroup
echo $$ > ${CGRP_MNT}/sockopts/cgroup.procs
# no bpf program attached, so socket should show no settings
check_sock "dev , mark 0, priority 0" "No programs attached"
check_sock6 "dev , mark 0, priority 0" "No programs attached"
# verify device is set
#
$TEST -b cgrp2_sock ${CGRP_MNT}/sockopts
if [ $? -ne 0 ]; then
cleanup_and_exit 1 "Failed to install program to set device"
fi
check_sock "dev cgrp2_sock, mark 0, priority 0" "Device set"
check_sock6 "dev cgrp2_sock, mark 0, priority 0" "Device set"
# verify mark is set
#
$TEST -m 666 ${CGRP_MNT}/sockopts
if [ $? -ne 0 ]; then
cleanup_and_exit 1 "Failed to install program to set mark"
fi
check_sock "dev , mark 666, priority 0" "Mark set"
check_sock6 "dev , mark 666, priority 0" "Mark set"
# verify priority is set
#
$TEST -p 123 ${CGRP_MNT}/sockopts
if [ $? -ne 0 ]; then
cleanup_and_exit 1 "Failed to install program to set priority"
fi
check_sock "dev , mark 0, priority 123" "Priority set"
check_sock6 "dev , mark 0, priority 123" "Priority set"
# all 3 at once
#
$TEST -b cgrp2_sock -m 666 -p 123 ${CGRP_MNT}/sockopts
if [ $? -ne 0 ]; then
cleanup_and_exit 1 "Failed to install program to set device, mark and priority"
fi
check_sock "dev cgrp2_sock, mark 666, priority 123" "Priority set"
check_sock6 "dev cgrp2_sock, mark 666, priority 123" "Priority set"
cleanup_and_exit $rc

View File

@ -1,95 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/* eBPF example program:
*
* - Loads eBPF program
*
* The eBPF program loads a filter from file and attaches the
* program to a cgroup using BPF_PROG_ATTACH
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "bpf_insn.h"
static int usage(const char *argv0)
{
printf("Usage: %s cg-path filter-path [filter-id]\n", argv0);
return EXIT_FAILURE;
}
int main(int argc, char **argv)
{
int cg_fd, err, ret = EXIT_FAILURE, filter_id = 0, prog_cnt = 0;
const char *link_pin_path = "/sys/fs/bpf/test_cgrp2_sock2";
struct bpf_link *link = NULL;
struct bpf_program *progs[2];
struct bpf_program *prog;
struct bpf_object *obj;
if (argc < 3)
return usage(argv[0]);
if (argc > 3)
filter_id = atoi(argv[3]);
cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
if (cg_fd < 0) {
printf("Failed to open cgroup path: '%s'\n", strerror(errno));
return ret;
}
obj = bpf_object__open_file(argv[2], NULL);
if (libbpf_get_error(obj)) {
printf("ERROR: opening BPF object file failed\n");
return ret;
}
bpf_object__for_each_program(prog, obj) {
progs[prog_cnt] = prog;
prog_cnt++;
}
if (filter_id >= prog_cnt) {
printf("Invalid program id; program not found in file\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
printf("ERROR: loading BPF object file failed\n");
goto cleanup;
}
link = bpf_program__attach_cgroup(progs[filter_id], cg_fd);
if (libbpf_get_error(link)) {
printf("ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
err = bpf_link__pin(link, link_pin_path);
if (err < 0) {
printf("ERROR: bpf_link__pin failed: %d\n", err);
goto cleanup;
}
ret = EXIT_SUCCESS;
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return ret;
}

View File

@ -1,103 +0,0 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
BPFFS=/sys/fs/bpf
MY_DIR=$(dirname $0)
TEST=$MY_DIR/test_cgrp2_sock2
LINK_PIN=$BPFFS/test_cgrp2_sock2
BPF_PROG=$MY_DIR/sock_flags.bpf.o
function config_device {
ip netns add at_ns0
ip link add veth0 type veth peer name veth0b
ip link set veth0 netns at_ns0
ip netns exec at_ns0 sysctl -q net.ipv6.conf.veth0.disable_ipv6=0
ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
ip netns exec at_ns0 ip addr add 2401:db00::1/64 dev veth0 nodad
ip netns exec at_ns0 ip link set dev veth0 up
sysctl -q net.ipv6.conf.veth0b.disable_ipv6=0
ip addr add 172.16.1.101/24 dev veth0b
ip addr add 2401:db00::2/64 dev veth0b nodad
ip link set veth0b up
}
function config_cgroup {
rm -rf /tmp/cgroupv2
mkdir -p /tmp/cgroupv2
mount -t cgroup2 none /tmp/cgroupv2
mkdir -p /tmp/cgroupv2/foo
echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
}
function config_bpffs {
if mount | grep $BPFFS > /dev/null; then
echo "bpffs already mounted"
else
echo "bpffs not mounted. Mounting..."
mount -t bpf none $BPFFS
fi
}
function attach_bpf {
$TEST /tmp/cgroupv2/foo $BPF_PROG $1
[ $? -ne 0 ] && exit 1
}
function cleanup {
rm -rf $LINK_PIN
ip link del veth0b
ip netns delete at_ns0
umount /tmp/cgroupv2
rm -rf /tmp/cgroupv2
}
cleanup 2>/dev/null
set -e
config_device
config_cgroup
config_bpffs
set +e
#
# Test 1 - fail ping6
#
attach_bpf 0
ping -c1 -w1 172.16.1.100
if [ $? -ne 0 ]; then
echo "ping failed when it should succeed"
cleanup
exit 1
fi
ping6 -c1 -w1 2401:db00::1
if [ $? -eq 0 ]; then
echo "ping6 succeeded when it should not"
cleanup
exit 1
fi
rm -rf $LINK_PIN
sleep 1 # Wait for link detach
#
# Test 2 - fail ping
#
attach_bpf 1
ping6 -c1 -w1 2401:db00::1
if [ $? -ne 0 ]; then
echo "ping6 failed when it should succeed"
cleanup
exit 1
fi
ping -c1 -w1 172.16.1.100
if [ $? -eq 0 ]; then
echo "ping succeeded when it should not"
cleanup
exit 1
fi
cleanup
echo
echo "*** PASS ***"

View File

@ -1,56 +0,0 @@
/* Copyright (c) 2016 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#define KBUILD_MODNAME "foo"
#include "vmlinux.h"
#include "net_shared.h"
#include <bpf/bpf_helpers.h>
/* copy of 'struct ethhdr' without __packed */
struct eth_hdr {
unsigned char h_dest[ETH_ALEN];
unsigned char h_source[ETH_ALEN];
unsigned short h_proto;
};
struct {
__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(pinning, LIBBPF_PIN_BY_NAME);
__uint(max_entries, 1);
} test_cgrp2_array_pin SEC(".maps");
SEC("filter")
int handle_egress(struct __sk_buff *skb)
{
void *data = (void *)(long)skb->data;
struct eth_hdr *eth = data;
struct ipv6hdr *ip6h = data + sizeof(*eth);
void *data_end = (void *)(long)skb->data_end;
char dont_care_msg[] = "dont care %04x %d\n";
char pass_msg[] = "pass\n";
char reject_msg[] = "reject\n";
/* single length check */
if (data + sizeof(*eth) + sizeof(*ip6h) > data_end)
return TC_ACT_OK;
if (eth->h_proto != bpf_htons(ETH_P_IPV6) ||
ip6h->nexthdr != IPPROTO_ICMPV6) {
bpf_trace_printk(dont_care_msg, sizeof(dont_care_msg),
eth->h_proto, ip6h->nexthdr);
return TC_ACT_OK;
} else if (bpf_skb_under_cgroup(skb, &test_cgrp2_array_pin, 0) != 1) {
bpf_trace_printk(pass_msg, sizeof(pass_msg));
return TC_ACT_OK;
} else {
bpf_trace_printk(reject_msg, sizeof(reject_msg));
return TC_ACT_SHOT;
}
}
char _license[] SEC("license") = "GPL";

View File

@ -1,187 +0,0 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
MY_DIR=$(dirname $0)
# Details on the bpf prog
BPF_CGRP2_ARRAY_NAME='test_cgrp2_array_pin'
BPF_PROG="$MY_DIR/test_cgrp2_tc.bpf.o"
BPF_SECTION='filter'
[ -z "$TC" ] && TC='tc'
[ -z "$IP" ] && IP='ip'
# Names of the veth interface, net namespace...etc.
HOST_IFC='ve'
NS_IFC='vens'
NS='ns'
find_mnt() {
cat /proc/mounts | \
awk '{ if ($3 == "'$1'" && mnt == "") { mnt = $2 }} END { print mnt }'
}
# Init cgroup2 vars
init_cgrp2_vars() {
CGRP2_ROOT=$(find_mnt cgroup2)
if [ -z "$CGRP2_ROOT" ]
then
CGRP2_ROOT='/mnt/cgroup2'
MOUNT_CGRP2="yes"
fi
CGRP2_TC="$CGRP2_ROOT/tc"
CGRP2_TC_LEAF="$CGRP2_TC/leaf"
}
# Init bpf fs vars
init_bpf_fs_vars() {
local bpf_fs_root=$(find_mnt bpf)
[ -n "$bpf_fs_root" ] || return -1
BPF_FS_TC_SHARE="$bpf_fs_root/tc/globals"
}
setup_cgrp2() {
case $1 in
start)
if [ "$MOUNT_CGRP2" == 'yes' ]
then
[ -d $CGRP2_ROOT ] || mkdir -p $CGRP2_ROOT
mount -t cgroup2 none $CGRP2_ROOT || return $?
fi
mkdir -p $CGRP2_TC_LEAF
;;
*)
rmdir $CGRP2_TC_LEAF && rmdir $CGRP2_TC
[ "$MOUNT_CGRP2" == 'yes' ] && umount $CGRP2_ROOT
;;
esac
}
setup_bpf_cgrp2_array() {
local bpf_cgrp2_array="$BPF_FS_TC_SHARE/$BPF_CGRP2_ARRAY_NAME"
case $1 in
start)
$MY_DIR/test_cgrp2_array_pin -U $bpf_cgrp2_array -v $CGRP2_TC
;;
*)
[ -d "$BPF_FS_TC_SHARE" ] && rm -f $bpf_cgrp2_array
;;
esac
}
setup_net() {
case $1 in
start)
$IP link add $HOST_IFC type veth peer name $NS_IFC || return $?
$IP link set dev $HOST_IFC up || return $?
sysctl -q net.ipv6.conf.$HOST_IFC.disable_ipv6=0
sysctl -q net.ipv6.conf.$HOST_IFC.accept_dad=0
$IP netns add $NS || return $?
$IP link set dev $NS_IFC netns $NS || return $?
$IP -n $NS link set dev $NS_IFC up || return $?
$IP netns exec $NS sysctl -q net.ipv6.conf.$NS_IFC.disable_ipv6=0
$IP netns exec $NS sysctl -q net.ipv6.conf.$NS_IFC.accept_dad=0
$TC qdisc add dev $HOST_IFC clsact || return $?
$TC filter add dev $HOST_IFC egress bpf da obj $BPF_PROG sec $BPF_SECTION || return $?
;;
*)
$IP netns del $NS
$IP link del $HOST_IFC
;;
esac
}
run_in_cgrp() {
# Fork another bash and move it under the specified cgroup.
# It makes the cgroup cleanup easier at the end of the test.
cmd='echo $$ > '
cmd="$cmd $1/cgroup.procs; exec $2"
bash -c "$cmd"
}
do_test() {
run_in_cgrp $CGRP2_TC_LEAF "ping -6 -c3 ff02::1%$HOST_IFC >& /dev/null"
local dropped=$($TC -s qdisc show dev $HOST_IFC | tail -3 | \
awk '/drop/{print substr($7, 0, index($7, ",")-1)}')
if [[ $dropped -eq 0 ]]
then
echo "FAIL"
return 1
else
echo "Successfully filtered $dropped packets"
return 0
fi
}
do_exit() {
if [ "$DEBUG" == "yes" ] && [ "$MODE" != 'cleanuponly' ]
then
echo "------ DEBUG ------"
echo "mount: "; mount | grep -E '(cgroup2|bpf)'; echo
echo "$CGRP2_TC_LEAF: "; ls -l $CGRP2_TC_LEAF; echo
if [ -d "$BPF_FS_TC_SHARE" ]
then
echo "$BPF_FS_TC_SHARE: "; ls -l $BPF_FS_TC_SHARE; echo
fi
echo "Host net:"
$IP netns
$IP link show dev $HOST_IFC
$IP -6 a show dev $HOST_IFC
$TC -s qdisc show dev $HOST_IFC
echo
echo "$NS net:"
$IP -n $NS link show dev $NS_IFC
$IP -n $NS -6 link show dev $NS_IFC
echo "------ DEBUG ------"
echo
fi
if [ "$MODE" != 'nocleanup' ]
then
setup_net stop
setup_bpf_cgrp2_array stop
setup_cgrp2 stop
fi
}
init_cgrp2_vars
init_bpf_fs_vars
while [[ $# -ge 1 ]]
do
a="$1"
case $a in
debug)
DEBUG='yes'
shift 1
;;
cleanup-only)
MODE='cleanuponly'
shift 1
;;
no-cleanup)
MODE='nocleanup'
shift 1
;;
*)
echo "test_cgrp2_tc [debug] [cleanup-only | no-cleanup]"
echo " debug: Print cgrp and network setup details at the end of the test"
echo " cleanup-only: Try to cleanup things from last test. No test will be run"
echo " no-cleanup: Run the test but don't do cleanup at the end"
echo "[Note: If no arg is given, it will run the test and do cleanup at the end]"
echo
exit -1
;;
esac
done
trap do_exit 0
[ "$MODE" == 'cleanuponly' ] && exit
setup_cgrp2 start || exit $?
setup_net start || exit $?
init_bpf_fs_vars || exit $?
setup_bpf_cgrp2_array start || exit $?
do_test
echo

View File

@ -1,43 +0,0 @@
/* Copyright (c) 2016 Sargun Dhillon <sargun@sargun.me>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include "vmlinux.h"
#include <linux/version.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
struct {
__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
__uint(key_size, sizeof(u32));
__uint(value_size, sizeof(u32));
__uint(max_entries, 1);
} cgroup_map SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u64);
__uint(max_entries, 1);
} perf_map SEC(".maps");
/* Writes the last PID that called sync to a map at index 0 */
SEC("ksyscall/sync")
int BPF_KSYSCALL(bpf_prog1)
{
u64 pid = bpf_get_current_pid_tgid();
int idx = 0;
if (!bpf_current_task_under_cgroup(&cgroup_map, 0))
return 0;
bpf_map_update_elem(&perf_map, &idx, &pid, BPF_ANY);
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View File

@ -1,115 +0,0 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2016 Sargun Dhillon <sargun@sargun.me>
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "cgroup_helpers.h"
#define CGROUP_PATH "/my-cgroup"
int main(int argc, char **argv)
{
pid_t remote_pid, local_pid = getpid();
int cg2 = -1, idx = 0, rc = 1;
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct bpf_object *obj;
char filename[256];
int map_fd[2];
snprintf(filename, sizeof(filename), "%s.bpf.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (!prog) {
printf("finding a prog in obj file failed\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd[0] = bpf_object__find_map_fd_by_name(obj, "cgroup_map");
map_fd[1] = bpf_object__find_map_fd_by_name(obj, "perf_map");
if (map_fd[0] < 0 || map_fd[1] < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
if (setup_cgroup_environment())
goto err;
cg2 = create_and_get_cgroup(CGROUP_PATH);
if (cg2 < 0)
goto err;
if (bpf_map_update_elem(map_fd[0], &idx, &cg2, BPF_ANY)) {
log_err("Adding target cgroup to map");
goto err;
}
if (join_cgroup(CGROUP_PATH))
goto err;
/*
* The installed helper program catched the sync call, and should
* write it to the map.
*/
sync();
bpf_map_lookup_elem(map_fd[1], &idx, &remote_pid);
if (local_pid != remote_pid) {
fprintf(stderr,
"BPF Helper didn't write correct PID to map, but: %d\n",
remote_pid);
goto err;
}
/* Verify the negative scenario; leave the cgroup */
if (join_cgroup("/"))
goto err;
remote_pid = 0;
bpf_map_update_elem(map_fd[1], &idx, &remote_pid, BPF_ANY);
sync();
bpf_map_lookup_elem(map_fd[1], &idx, &remote_pid);
if (local_pid == remote_pid) {
fprintf(stderr, "BPF cgroup negative test did not work\n");
goto err;
}
rc = 0;
err:
if (cg2 != -1)
close(cg2);
cleanup_cgroup_environment();
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return rc;
}

View File

@ -1,41 +0,0 @@
/* Copyright (c) 2016 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include "vmlinux.h"
#include <linux/version.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
SEC("kprobe/__set_task_comm")
int prog(struct pt_regs *ctx)
{
struct signal_struct *signal;
struct task_struct *tsk;
char oldcomm[TASK_COMM_LEN] = {};
char newcomm[TASK_COMM_LEN] = {};
u16 oom_score_adj;
u32 pid;
tsk = (void *)PT_REGS_PARM1_CORE(ctx);
pid = BPF_CORE_READ(tsk, pid);
bpf_core_read_str(oldcomm, sizeof(oldcomm), &tsk->comm);
bpf_core_read_str(newcomm, sizeof(newcomm),
(void *)PT_REGS_PARM2(ctx));
signal = BPF_CORE_READ(tsk, signal);
oom_score_adj = BPF_CORE_READ(signal, oom_score_adj);
return 0;
}
SEC("kprobe/fib_table_lookup")
int prog2(struct pt_regs *ctx)
{
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View File

@ -1,17 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2018 Facebook */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
SEC("raw_tracepoint/task_rename")
int prog(struct bpf_raw_tracepoint_args *ctx)
{
return 0;
}
SEC("raw_tracepoint/fib_table_lookup")
int prog2(struct bpf_raw_tracepoint_args *ctx)
{
return 0;
}
char _license[] SEC("license") = "GPL";

View File

@ -1,23 +0,0 @@
/* Copyright (c) 2016 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
/* from /sys/kernel/tracing/events/task/task_rename/format */
SEC("tracepoint/task/task_rename")
int prog(struct trace_event_raw_task_rename *ctx)
{
return 0;
}
/* from /sys/kernel/tracing/events/fib/fib_table_lookup/format */
SEC("tracepoint/fib/fib_table_lookup")
int prog2(struct trace_event_raw_fib_table_lookup *ctx)
{
return 0;
}
char _license[] SEC("license") = "GPL";

View File

@ -1,225 +0,0 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2016 Facebook
*/
#define _GNU_SOURCE
#include <sched.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <asm/unistd.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <signal.h>
#include <linux/bpf.h>
#include <string.h>
#include <time.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#define MAX_CNT 1000000
#define DUMMY_IP "127.0.0.1"
#define DUMMY_PORT 80
static struct bpf_link *links[2];
static struct bpf_object *obj;
static int cnt;
static __u64 time_get_ns(void)
{
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec * 1000000000ull + ts.tv_nsec;
}
static void test_task_rename(int cpu)
{
char buf[] = "test\n";
__u64 start_time;
int i, fd;
fd = open("/proc/self/comm", O_WRONLY|O_TRUNC);
if (fd < 0) {
printf("couldn't open /proc\n");
exit(1);
}
start_time = time_get_ns();
for (i = 0; i < MAX_CNT; i++) {
if (write(fd, buf, sizeof(buf)) < 0) {
printf("task rename failed: %s\n", strerror(errno));
close(fd);
return;
}
}
printf("task_rename:%d: %lld events per sec\n",
cpu, MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
close(fd);
}
static void test_fib_table_lookup(int cpu)
{
struct sockaddr_in addr;
char buf[] = "test\n";
__u64 start_time;
int i, fd;
fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
if (fd < 0) {
printf("couldn't open socket\n");
exit(1);
}
memset((char *)&addr, 0, sizeof(addr));
addr.sin_addr.s_addr = inet_addr(DUMMY_IP);
addr.sin_port = htons(DUMMY_PORT);
addr.sin_family = AF_INET;
start_time = time_get_ns();
for (i = 0; i < MAX_CNT; i++) {
if (sendto(fd, buf, strlen(buf), 0,
(struct sockaddr *)&addr, sizeof(addr)) < 0) {
printf("failed to start ping: %s\n", strerror(errno));
close(fd);
return;
}
}
printf("fib_table_lookup:%d: %lld events per sec\n",
cpu, MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
close(fd);
}
static void loop(int cpu, int flags)
{
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(cpu, &cpuset);
sched_setaffinity(0, sizeof(cpuset), &cpuset);
if (flags & 1)
test_task_rename(cpu);
if (flags & 2)
test_fib_table_lookup(cpu);
}
static void run_perf_test(int tasks, int flags)
{
pid_t pid[tasks];
int i;
for (i = 0; i < tasks; i++) {
pid[i] = fork();
if (pid[i] == 0) {
loop(i, flags);
exit(0);
} else if (pid[i] == -1) {
printf("couldn't spawn #%d process\n", i);
exit(1);
}
}
for (i = 0; i < tasks; i++) {
int status;
assert(waitpid(pid[i], &status, 0) == pid[i]);
assert(status == 0);
}
}
static int load_progs(char *filename)
{
struct bpf_program *prog;
int err = 0;
obj = bpf_object__open_file(filename, NULL);
err = libbpf_get_error(obj);
if (err < 0) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return err;
}
/* load BPF program */
err = bpf_object__load(obj);
if (err < 0) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
return err;
}
bpf_object__for_each_program(prog, obj) {
links[cnt] = bpf_program__attach(prog);
err = libbpf_get_error(links[cnt]);
if (err < 0) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
links[cnt] = NULL;
return err;
}
cnt++;
}
return err;
}
static void unload_progs(void)
{
while (cnt)
bpf_link__destroy(links[--cnt]);
bpf_object__close(obj);
}
int main(int argc, char **argv)
{
int num_cpu = sysconf(_SC_NPROCESSORS_ONLN);
int test_flags = ~0;
char filename[256];
int err = 0;
if (argc > 1)
test_flags = atoi(argv[1]) ? : test_flags;
if (argc > 2)
num_cpu = atoi(argv[2]) ? : num_cpu;
if (test_flags & 0x3) {
printf("BASE\n");
run_perf_test(num_cpu, test_flags);
}
if (test_flags & 0xC) {
snprintf(filename, sizeof(filename),
"%s_kprobe.bpf.o", argv[0]);
printf("w/KPROBE\n");
err = load_progs(filename);
if (!err)
run_perf_test(num_cpu, test_flags >> 2);
unload_progs();
}
if (test_flags & 0x30) {
snprintf(filename, sizeof(filename),
"%s_tp.bpf.o", argv[0]);
printf("w/TRACEPOINT\n");
err = load_progs(filename);
if (!err)
run_perf_test(num_cpu, test_flags >> 4);
unload_progs();
}
if (test_flags & 0xC0) {
snprintf(filename, sizeof(filename),
"%s_raw_tp.bpf.o", argv[0]);
printf("w/RAW_TRACEPOINT\n");
err = load_progs(filename);
if (!err)
run_perf_test(num_cpu, test_flags >> 6);
unload_progs();
}
return err;
}

View File

@ -1,16 +0,0 @@
#!/bin/bash
rm -r tmpmnt
rm -f testfile.img
dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
DEVICE=$(losetup --show -f testfile.img)
mkfs.btrfs -f $DEVICE
mkdir tmpmnt
./tracex7 $DEVICE
if [ $? -eq 0 ]
then
echo "SUCCESS!"
else
echo "FAILED!"
fi
losetup -d $DEVICE

View File

@ -1,52 +0,0 @@
/* Copyright (c) 2016 Sargun Dhillon <sargun@sargun.me>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
#include "vmlinux.h"
#include <string.h>
#include <linux/version.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, struct sockaddr_in);
__type(value, struct sockaddr_in);
__uint(max_entries, 256);
} dnat_map SEC(".maps");
/* kprobe is NOT a stable ABI
* kernel functions can be removed, renamed or completely change semantics.
* Number of arguments and their positions can change, etc.
* In such case this bpf+kprobe example will no longer be meaningful
*
* This example sits on a syscall, and the syscall ABI is relatively stable
* of course, across platforms, and over time, the ABI may change.
*/
SEC("ksyscall/connect")
int BPF_KSYSCALL(bpf_prog1, int fd, struct sockaddr_in *uservaddr,
int addrlen)
{
struct sockaddr_in new_addr, orig_addr = {};
struct sockaddr_in *mapped_addr;
if (addrlen > sizeof(orig_addr))
return 0;
if (bpf_probe_read_user(&orig_addr, sizeof(orig_addr), uservaddr) != 0)
return 0;
mapped_addr = bpf_map_lookup_elem(&dnat_map, &orig_addr);
if (mapped_addr != NULL) {
memcpy(&new_addr, mapped_addr, sizeof(new_addr));
bpf_probe_write_user(uservaddr, &new_addr,
sizeof(new_addr));
}
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View File

@ -1,108 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(int ac, char **argv)
{
struct sockaddr_in *serv_addr_in, *mapped_addr_in, *tmp_addr_in;
struct sockaddr serv_addr, mapped_addr, tmp_addr;
int serverfd, serverconnfd, clientfd, map_fd;
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct bpf_object *obj;
socklen_t sockaddr_len;
char filename[256];
char *ip;
serv_addr_in = (struct sockaddr_in *)&serv_addr;
mapped_addr_in = (struct sockaddr_in *)&mapped_addr;
tmp_addr_in = (struct sockaddr_in *)&tmp_addr;
snprintf(filename, sizeof(filename), "%s.bpf.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (libbpf_get_error(prog)) {
fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
map_fd = bpf_object__find_map_fd_by_name(obj, "dnat_map");
if (map_fd < 0) {
fprintf(stderr, "ERROR: finding a map in obj file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
assert((serverfd = socket(AF_INET, SOCK_STREAM, 0)) > 0);
assert((clientfd = socket(AF_INET, SOCK_STREAM, 0)) > 0);
/* Bind server to ephemeral port on lo */
memset(&serv_addr, 0, sizeof(serv_addr));
serv_addr_in->sin_family = AF_INET;
serv_addr_in->sin_port = 0;
serv_addr_in->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
assert(bind(serverfd, &serv_addr, sizeof(serv_addr)) == 0);
sockaddr_len = sizeof(serv_addr);
assert(getsockname(serverfd, &serv_addr, &sockaddr_len) == 0);
ip = inet_ntoa(serv_addr_in->sin_addr);
printf("Server bound to: %s:%d\n", ip, ntohs(serv_addr_in->sin_port));
memset(&mapped_addr, 0, sizeof(mapped_addr));
mapped_addr_in->sin_family = AF_INET;
mapped_addr_in->sin_port = htons(5555);
mapped_addr_in->sin_addr.s_addr = inet_addr("255.255.255.255");
assert(!bpf_map_update_elem(map_fd, &mapped_addr, &serv_addr, BPF_ANY));
assert(listen(serverfd, 5) == 0);
ip = inet_ntoa(mapped_addr_in->sin_addr);
printf("Client connecting to: %s:%d\n",
ip, ntohs(mapped_addr_in->sin_port));
assert(connect(clientfd, &mapped_addr, sizeof(mapped_addr)) == 0);
sockaddr_len = sizeof(tmp_addr);
ip = inet_ntoa(tmp_addr_in->sin_addr);
assert((serverconnfd = accept(serverfd, &tmp_addr, &sockaddr_len)) > 0);
printf("Server received connection from: %s:%d\n",
ip, ntohs(tmp_addr_in->sin_port));
sockaddr_len = sizeof(tmp_addr);
assert(getpeername(clientfd, &tmp_addr, &sockaddr_len) == 0);
ip = inet_ntoa(tmp_addr_in->sin_addr);
printf("Client's peer address: %s:%d\n",
ip, ntohs(tmp_addr_in->sin_port));
/* Is the server's getsockname = the socket getpeername */
assert(memcmp(&serv_addr, &tmp_addr, sizeof(struct sockaddr_in)) == 0);
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return 0;
}

View File

@ -1,15 +0,0 @@
#include "vmlinux.h"
#include <linux/version.h>
#include <bpf/bpf_helpers.h>
SEC("kprobe/open_ctree")
int bpf_prog1(struct pt_regs *ctx)
{
unsigned long rc = -12;
bpf_override_return(ctx, rc);
return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

View File

@ -1,56 +0,0 @@
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>
int main(int argc, char **argv)
{
struct bpf_link *link = NULL;
struct bpf_program *prog;
struct bpf_object *obj;
char filename[256];
char command[256];
int ret = 0;
FILE *f;
if (!argv[1]) {
fprintf(stderr, "ERROR: Run with the btrfs device argument!\n");
return 0;
}
snprintf(filename, sizeof(filename), "%s.bpf.o", argv[0]);
obj = bpf_object__open_file(filename, NULL);
if (libbpf_get_error(obj)) {
fprintf(stderr, "ERROR: opening BPF object file failed\n");
return 0;
}
prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
if (!prog) {
fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
goto cleanup;
}
/* load BPF program */
if (bpf_object__load(obj)) {
fprintf(stderr, "ERROR: loading BPF object file failed\n");
goto cleanup;
}
link = bpf_program__attach(prog);
if (libbpf_get_error(link)) {
fprintf(stderr, "ERROR: bpf_program__attach failed\n");
link = NULL;
goto cleanup;
}
snprintf(command, 256, "mount %s tmpmnt/", argv[1]);
f = popen(command, "r");
ret = pclose(f);
cleanup:
bpf_link__destroy(link);
bpf_object__close(obj);
return ret ? 0 : 1;
}

View File

@ -32,7 +32,7 @@ SEC("xdp_mark")
int _xdp_mark(struct xdp_md *ctx)
{
struct meta_info *meta;
void *data, *data_end;
void *data;
int ret;
/* Reserve space in-front of data pointer for our meta info.

View File

@ -57,6 +57,7 @@ static __always_inline void swap_mac(void *data, struct ethhdr *orig_eth)
static __always_inline __u16 csum_fold_helper(__u32 csum)
{
csum = (csum & 0xffff) + (csum >> 16);
return ~((csum & 0xffff) + (csum >> 16));
}

View File

@ -3,6 +3,8 @@
pahole-ver := $(CONFIG_PAHOLE_VERSION)
pahole-flags-y :=
JOBS := $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
ifeq ($(call test-le, $(pahole-ver), 125),y)
# pahole 1.18 through 1.21 can't handle zero-sized per-CPU vars
@ -12,14 +14,14 @@ endif
pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats
pahole-flags-$(call test-ge, $(pahole-ver), 122) += -j
pahole-flags-$(call test-ge, $(pahole-ver), 122) += -j$(JOBS)
pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized
else
# Switch to using --btf_features for v1.26 and later.
pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs
pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j$(JOBS) --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs
ifneq ($(KBUILD_EXTMOD),)
module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base

View File

@ -37,10 +37,11 @@ class APIElement(object):
@desc: textual description of the symbol
@ret: (optional) description of any associated return value
"""
def __init__(self, proto='', desc='', ret=''):
def __init__(self, proto='', desc='', ret='', attrs=[]):
self.proto = proto
self.desc = desc
self.ret = ret
self.attrs = attrs
class Helper(APIElement):
@ -81,6 +82,11 @@ class Helper(APIElement):
return res
ATTRS = {
'__bpf_fastcall': 'bpf_fastcall'
}
class HeaderParser(object):
"""
An object used to parse a file in order to extract the documentation of a
@ -111,7 +117,8 @@ class HeaderParser(object):
proto = self.parse_proto()
desc = self.parse_desc(proto)
ret = self.parse_ret(proto)
return Helper(proto=proto, desc=desc, ret=ret)
attrs = self.parse_attrs(proto)
return Helper(proto=proto, desc=desc, ret=ret, attrs=attrs)
def parse_symbol(self):
p = re.compile(r' \* ?(BPF\w+)$')
@ -192,6 +199,28 @@ class HeaderParser(object):
raise Exception("No return found for " + proto)
return ret
def parse_attrs(self, proto):
p = re.compile(r' \* ?(?:\t| {5,8})Attributes$')
capture = p.match(self.line)
if not capture:
return []
# Expect a single line with mnemonics for attributes separated by spaces
self.line = self.reader.readline()
p = re.compile(r' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
capture = p.match(self.line)
if not capture:
raise Exception("Incomplete 'Attributes' section for " + proto)
attrs = capture.group(1).split(' ')
for attr in attrs:
if attr not in ATTRS:
raise Exception("Unexpected attribute '" + attr + "' specified for " + proto)
self.line = self.reader.readline()
if self.line != ' *\n':
raise Exception("Expecting empty line after 'Attributes' section for " + proto)
# Prepare a line for next self.parse_* to consume
self.line = self.reader.readline()
return attrs
def seek_to(self, target, help_message, discard_lines = 1):
self.reader.seek(0)
offset = self.reader.read().find(target)
@ -789,6 +818,21 @@ class PrinterHelpers(Printer):
print('%s;' % fwd)
print('')
used_attrs = set()
for helper in self.elements:
for attr in helper.attrs:
used_attrs.add(attr)
for attr in sorted(used_attrs):
print('#ifndef %s' % attr)
print('#if __has_attribute(%s)' % ATTRS[attr])
print('#define %s __attribute__((%s))' % (attr, ATTRS[attr]))
print('#else')
print('#define %s' % attr)
print('#endif')
print('#endif')
if used_attrs:
print('')
def print_footer(self):
footer = ''
print(footer)
@ -827,7 +871,10 @@ class PrinterHelpers(Printer):
print(' *{}{}'.format(' \t' if line else '', line))
print(' */')
print('static %s %s(* const %s)(' % (self.map_type(proto['ret_type']),
print('static ', end='')
if helper.attrs:
print('%s ' % (" ".join(helper.attrs)), end='')
print('%s %s(* const %s)(' % (self.map_type(proto['ret_type']),
proto['ret_star'], proto['name']), end='')
comma = ''
for i, a in enumerate(proto['args']):

View File

@ -210,7 +210,7 @@ static uint8_t *get_last_jit_image(char *haystack, size_t hlen,
return NULL;
}
if (proglen > 1000000) {
printf("proglen of %d too big, stopping\n", proglen);
printf("proglen of %u too big, stopping\n", proglen);
return NULL;
}

View File

@ -147,7 +147,11 @@ ifeq ($(feature-llvm),1)
# If LLVM is available, use it for JIT disassembly
CFLAGS += -DHAVE_LLVM_SUPPORT
LLVM_CONFIG_LIB_COMPONENTS := mcdisassembler all-targets
CFLAGS += $(shell $(LLVM_CONFIG) --cflags)
# llvm-config always adds -D_GNU_SOURCE, however, it may already be in CFLAGS
# (e.g. when bpftool build is called from selftests build as selftests
# Makefile includes lib.mk which sets -D_GNU_SOURCE) which would cause
# compilation error due to redefinition. Let's filter it out here.
CFLAGS += $(filter-out -D_GNU_SOURCE,$(shell $(LLVM_CONFIG) --cflags))
LIBS += $(shell $(LLVM_CONFIG) --libs $(LLVM_CONFIG_LIB_COMPONENTS))
ifeq ($(shell $(LLVM_CONFIG) --shared-mode),static)
LIBS += $(shell $(LLVM_CONFIG) --system-libs $(LLVM_CONFIG_LIB_COMPONENTS))

View File

@ -1,11 +1,15 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
/* Copyright (C) 2019 Facebook */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/err.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <linux/btf.h>
@ -21,6 +25,7 @@
#include "main.h"
#define KFUNC_DECL_TAG "bpf_kfunc"
#define FASTCALL_DECL_TAG "bpf_fastcall"
static const char * const btf_kind_str[NR_BTF_KINDS] = {
[BTF_KIND_UNKN] = "UNKNOWN",
@ -284,7 +289,7 @@ static int dump_btf_type(const struct btf *btf, __u32 id,
} else {
if (btf_kflag(t))
printf("\n\t'%s' val=%lldLL", name,
(unsigned long long)val);
(long long)val);
else
printf("\n\t'%s' val=%lluULL", name,
(unsigned long long)val);
@ -464,19 +469,59 @@ static int dump_btf_raw(const struct btf *btf,
return 0;
}
struct ptr_array {
__u32 cnt;
__u32 cap;
const void **elems;
};
static int ptr_array_push(const void *ptr, struct ptr_array *arr)
{
__u32 new_cap;
void *tmp;
if (arr->cnt == arr->cap) {
new_cap = (arr->cap ?: 16) * 2;
tmp = realloc(arr->elems, sizeof(*arr->elems) * new_cap);
if (!tmp)
return -ENOMEM;
arr->elems = tmp;
arr->cap = new_cap;
}
arr->elems[arr->cnt++] = ptr;
return 0;
}
static void ptr_array_free(struct ptr_array *arr)
{
free(arr->elems);
}
static int cmp_kfuncs(const void *pa, const void *pb, void *ctx)
{
struct btf *btf = ctx;
const struct btf_type *a = *(void **)pa;
const struct btf_type *b = *(void **)pb;
return strcmp(btf__str_by_offset(btf, a->name_off),
btf__str_by_offset(btf, b->name_off));
}
static int dump_btf_kfuncs(struct btf_dump *d, const struct btf *btf)
{
LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts);
int cnt = btf__type_cnt(btf);
int i;
__u32 cnt = btf__type_cnt(btf), i, j;
struct ptr_array fastcalls = {};
struct ptr_array kfuncs = {};
int err = 0;
printf("\n/* BPF kfuncs */\n");
printf("#ifndef BPF_NO_KFUNC_PROTOTYPES\n");
for (i = 1; i < cnt; i++) {
const struct btf_type *t = btf__type_by_id(btf, i);
const struct btf_type *ft;
const char *name;
int err;
if (!btf_is_decl_tag(t))
continue;
@ -484,27 +529,53 @@ static int dump_btf_kfuncs(struct btf_dump *d, const struct btf *btf)
if (btf_decl_tag(t)->component_idx != -1)
continue;
name = btf__name_by_offset(btf, t->name_off);
if (strncmp(name, KFUNC_DECL_TAG, sizeof(KFUNC_DECL_TAG)))
ft = btf__type_by_id(btf, t->type);
if (!btf_is_func(ft))
continue;
t = btf__type_by_id(btf, t->type);
if (!btf_is_func(t))
continue;
name = btf__name_by_offset(btf, t->name_off);
if (strncmp(name, KFUNC_DECL_TAG, sizeof(KFUNC_DECL_TAG)) == 0) {
err = ptr_array_push(ft, &kfuncs);
if (err)
goto out;
}
if (strncmp(name, FASTCALL_DECL_TAG, sizeof(FASTCALL_DECL_TAG)) == 0) {
err = ptr_array_push(ft, &fastcalls);
if (err)
goto out;
}
}
/* Sort kfuncs by name for improved vmlinux.h stability */
qsort_r(kfuncs.elems, kfuncs.cnt, sizeof(*kfuncs.elems), cmp_kfuncs, (void *)btf);
for (i = 0; i < kfuncs.cnt; i++) {
const struct btf_type *t = kfuncs.elems[i];
printf("extern ");
/* Assume small amount of fastcall kfuncs */
for (j = 0; j < fastcalls.cnt; j++) {
if (fastcalls.elems[j] == t) {
printf("__bpf_fastcall ");
break;
}
}
opts.field_name = btf__name_by_offset(btf, t->name_off);
err = btf_dump__emit_type_decl(d, t->type, &opts);
if (err)
return err;
goto out;
printf(" __weak __ksym;\n");
}
printf("#endif\n\n");
return 0;
out:
ptr_array_free(&fastcalls);
ptr_array_free(&kfuncs);
return err;
}
static void __printf(2, 0) btf_dump_printf(void *ctx,
@ -718,6 +789,13 @@ static int dump_btf_c(const struct btf *btf,
printf("#ifndef __weak\n");
printf("#define __weak __attribute__((weak))\n");
printf("#endif\n\n");
printf("#ifndef __bpf_fastcall\n");
printf("#if __has_attribute(bpf_fastcall)\n");
printf("#define __bpf_fastcall __attribute__((bpf_fastcall))\n");
printf("#else\n");
printf("#define __bpf_fastcall\n");
printf("#endif\n");
printf("#endif\n\n");
if (root_type_cnt) {
for (i = 0; i < root_type_cnt; i++) {

View File

@ -80,7 +80,8 @@ symbol_lookup_callback(__maybe_unused void *disasm_info,
static int
init_context(disasm_ctx_t *ctx, const char *arch,
__maybe_unused const char *disassembler_options,
__maybe_unused unsigned char *image, __maybe_unused ssize_t len)
__maybe_unused unsigned char *image, __maybe_unused ssize_t len,
__maybe_unused __u64 func_ksym)
{
char *triple;
@ -109,12 +110,13 @@ static void destroy_context(disasm_ctx_t *ctx)
}
static int
disassemble_insn(disasm_ctx_t *ctx, unsigned char *image, ssize_t len, int pc)
disassemble_insn(disasm_ctx_t *ctx, unsigned char *image, ssize_t len, int pc,
__u64 func_ksym)
{
char buf[256];
int count;
count = LLVMDisasmInstruction(*ctx, image + pc, len - pc, pc,
count = LLVMDisasmInstruction(*ctx, image + pc, len - pc, func_ksym + pc,
buf, sizeof(buf));
if (json_output)
printf_json(buf);
@ -136,8 +138,21 @@ int disasm_init(void)
#ifdef HAVE_LIBBFD_SUPPORT
#define DISASM_SPACER "\t"
struct disasm_info {
struct disassemble_info info;
__u64 func_ksym;
};
static void disasm_print_addr(bfd_vma addr, struct disassemble_info *info)
{
struct disasm_info *dinfo = container_of(info, struct disasm_info, info);
addr += dinfo->func_ksym;
generic_print_address(addr, info);
}
typedef struct {
struct disassemble_info *info;
struct disasm_info *info;
disassembler_ftype disassemble;
bfd *bfdf;
} disasm_ctx_t;
@ -215,7 +230,7 @@ static int fprintf_json_styled(void *out,
static int init_context(disasm_ctx_t *ctx, const char *arch,
const char *disassembler_options,
unsigned char *image, ssize_t len)
unsigned char *image, ssize_t len, __u64 func_ksym)
{
struct disassemble_info *info;
char tpath[PATH_MAX];
@ -238,12 +253,13 @@ static int init_context(disasm_ctx_t *ctx, const char *arch,
}
bfdf = ctx->bfdf;
ctx->info = malloc(sizeof(struct disassemble_info));
ctx->info = malloc(sizeof(struct disasm_info));
if (!ctx->info) {
p_err("mem alloc failed");
goto err_close;
}
info = ctx->info;
ctx->info->func_ksym = func_ksym;
info = &ctx->info->info;
if (json_output)
init_disassemble_info_compat(info, stdout,
@ -272,6 +288,7 @@ static int init_context(disasm_ctx_t *ctx, const char *arch,
info->disassembler_options = disassembler_options;
info->buffer = image;
info->buffer_length = len;
info->print_address_func = disasm_print_addr;
disassemble_init_for_target(info);
@ -304,9 +321,10 @@ static void destroy_context(disasm_ctx_t *ctx)
static int
disassemble_insn(disasm_ctx_t *ctx, __maybe_unused unsigned char *image,
__maybe_unused ssize_t len, int pc)
__maybe_unused ssize_t len, int pc,
__maybe_unused __u64 func_ksym)
{
return ctx->disassemble(pc, ctx->info);
return ctx->disassemble(pc, &ctx->info->info);
}
int disasm_init(void)
@ -331,7 +349,7 @@ int disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
if (!len)
return -1;
if (init_context(&ctx, arch, disassembler_options, image, len))
if (init_context(&ctx, arch, disassembler_options, image, len, func_ksym))
return -1;
if (json_output)
@ -360,7 +378,7 @@ int disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
printf("%4x:" DISASM_SPACER, pc);
}
count = disassemble_insn(&ctx, image, len, pc);
count = disassemble_insn(&ctx, image, len, pc, func_ksym);
if (json_output) {
/* Operand array, was started in fprintf_json. Before

View File

@ -679,8 +679,8 @@ static int sets_patch(struct object *obj)
next = rb_first(&obj->sets);
while (next) {
struct btf_id_set8 *set8;
struct btf_id_set *set;
struct btf_id_set8 *set8 = NULL;
struct btf_id_set *set = NULL;
unsigned long addr, off;
struct btf_id *id;

View File

@ -70,7 +70,6 @@ int handle__sched_switch(u64 *ctx)
struct task_struct *next = (struct task_struct *)ctx[2];
struct runq_event event = {};
u64 *tsp, delta_us;
long state;
u32 pid;
/* ivcsw: treat like an enqueue event and store timestamp */

View File

@ -1116,6 +1116,7 @@ enum bpf_attach_type {
BPF_NETKIT_PRIMARY,
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
__MAX_BPF_ATTACH_TYPE
};
@ -1973,6 +1974,8 @@ union bpf_attr {
* program.
* Return
* The SMP id of the processor running the program.
* Attributes
* __bpf_fastcall
*
* long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
* Description
@ -3104,10 +3107,6 @@ union bpf_attr {
* with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration
* option, and in this case it only works on functions tagged with
* **ALLOW_ERROR_INJECTION** in the kernel code.
*
* Also, the helper is only available for the architectures having
* the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
* x86 architecture is the only one to support this feature.
* Return
* 0
*
@ -5372,7 +5371,7 @@ union bpf_attr {
* Currently, the **flags** must be 0. Currently, nr_loops is
* limited to 1 << 23 (~8 million) loops.
*
* long (\*callback_fn)(u32 index, void \*ctx);
* long (\*callback_fn)(u64 index, void \*ctx);
*
* where **index** is the current index in the loop. The index
* is zero-indexed.

View File

@ -61,7 +61,8 @@ ifndef VERBOSE
endif
INCLUDES = -I$(or $(OUTPUT),.) \
-I$(srctree)/tools/include -I$(srctree)/tools/include/uapi
-I$(srctree)/tools/include -I$(srctree)/tools/include/uapi \
-I$(srctree)/tools/arch/$(SRCARCH)/include
export prefix libdir src obj

View File

@ -776,6 +776,7 @@ int bpf_link_create(int prog_fd, int target_fd,
return libbpf_err(-EINVAL);
break;
case BPF_TRACE_UPROBE_MULTI:
case BPF_TRACE_UPROBE_SESSION:
attr.link_create.uprobe_multi.flags = OPTS_GET(opts, uprobe_multi.flags, 0);
attr.link_create.uprobe_multi.cnt = OPTS_GET(opts, uprobe_multi.cnt, 0);
attr.link_create.uprobe_multi.path = ptr_to_u64(OPTS_GET(opts, uprobe_multi.path, 0));

View File

@ -34,6 +34,7 @@ struct bpf_gen {
void *data_cur;
void *insn_start;
void *insn_cur;
bool swapped_endian;
ssize_t cleanup_label;
__u32 nr_progs;
__u32 nr_maps;

View File

@ -185,6 +185,7 @@ enum libbpf_tristate {
#define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
#define __kptr __attribute__((btf_type_tag("kptr")))
#define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
#define __uptr __attribute__((btf_type_tag("uptr")))
#if defined (__clang__)
#define bpf_ksym_exists(sym) ({ \

View File

@ -22,6 +22,7 @@
#include "libbpf_internal.h"
#include "hashmap.h"
#include "strset.h"
#include "str_error.h"
#define BTF_MAX_NR_TYPES 0x7fffffffU
#define BTF_MAX_STR_OFFSET 0x7fffffffU
@ -1179,7 +1180,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
fd = open(path, O_RDONLY | O_CLOEXEC);
if (fd < 0) {
err = -errno;
pr_warn("failed to open %s: %s\n", path, strerror(errno));
pr_warn("failed to open %s: %s\n", path, errstr(err));
return ERR_PTR(err);
}
@ -1445,7 +1446,7 @@ retry_load:
goto retry_load;
err = -errno;
pr_warn("BTF loading error: %d\n", err);
pr_warn("BTF loading error: %s\n", errstr(err));
/* don't print out contents of custom log_buf */
if (!log_buf && buf[0])
pr_warn("-- BEGIN BTF LOAD LOG ---\n%s\n-- END BTF LOAD LOG --\n", buf);
@ -2885,7 +2886,7 @@ int btf__add_decl_tag(struct btf *btf, const char *value, int ref_type_id,
return btf_commit_type(btf, sz);
}
struct btf_ext_sec_setup_param {
struct btf_ext_sec_info_param {
__u32 off;
__u32 len;
__u32 min_rec_size;
@ -2893,14 +2894,20 @@ struct btf_ext_sec_setup_param {
const char *desc;
};
static int btf_ext_setup_info(struct btf_ext *btf_ext,
struct btf_ext_sec_setup_param *ext_sec)
/*
* Parse a single info subsection of the BTF.ext info data:
* - validate subsection structure and elements
* - save info subsection start and sizing details in struct btf_ext
* - endian-independent operation, for calling before byte-swapping
*/
static int btf_ext_parse_sec_info(struct btf_ext *btf_ext,
struct btf_ext_sec_info_param *ext_sec,
bool is_native)
{
const struct btf_ext_info_sec *sinfo;
struct btf_ext_info *ext_info;
__u32 info_left, record_size;
size_t sec_cnt = 0;
/* The start of the info sec (including the __u32 record_size). */
void *info;
if (ext_sec->len == 0)
@ -2912,6 +2919,7 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
return -EINVAL;
}
/* The start of the info sec (including the __u32 record_size). */
info = btf_ext->data + btf_ext->hdr->hdr_len + ext_sec->off;
info_left = ext_sec->len;
@ -2927,9 +2935,13 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
return -EINVAL;
}
/* The record size needs to meet the minimum standard */
record_size = *(__u32 *)info;
/* The record size needs to meet either the minimum standard or, when
* handling non-native endianness data, the exact standard so as
* to allow safe byte-swapping.
*/
record_size = is_native ? *(__u32 *)info : bswap_32(*(__u32 *)info);
if (record_size < ext_sec->min_rec_size ||
(!is_native && record_size != ext_sec->min_rec_size) ||
record_size & 0x03) {
pr_debug("%s section in .BTF.ext has invalid record size %u\n",
ext_sec->desc, record_size);
@ -2941,7 +2953,7 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
/* If no records, return failure now so .BTF.ext won't be used. */
if (!info_left) {
pr_debug("%s section in .BTF.ext has no records", ext_sec->desc);
pr_debug("%s section in .BTF.ext has no records\n", ext_sec->desc);
return -EINVAL;
}
@ -2956,7 +2968,7 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
return -EINVAL;
}
num_records = sinfo->num_info;
num_records = is_native ? sinfo->num_info : bswap_32(sinfo->num_info);
if (num_records == 0) {
pr_debug("%s section has incorrect num_records in .BTF.ext\n",
ext_sec->desc);
@ -2984,64 +2996,157 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
return 0;
}
static int btf_ext_setup_func_info(struct btf_ext *btf_ext)
/* Parse all info secs in the BTF.ext info data */
static int btf_ext_parse_info(struct btf_ext *btf_ext, bool is_native)
{
struct btf_ext_sec_setup_param param = {
struct btf_ext_sec_info_param func_info = {
.off = btf_ext->hdr->func_info_off,
.len = btf_ext->hdr->func_info_len,
.min_rec_size = sizeof(struct bpf_func_info_min),
.ext_info = &btf_ext->func_info,
.desc = "func_info"
};
return btf_ext_setup_info(btf_ext, &param);
}
static int btf_ext_setup_line_info(struct btf_ext *btf_ext)
{
struct btf_ext_sec_setup_param param = {
struct btf_ext_sec_info_param line_info = {
.off = btf_ext->hdr->line_info_off,
.len = btf_ext->hdr->line_info_len,
.min_rec_size = sizeof(struct bpf_line_info_min),
.ext_info = &btf_ext->line_info,
.desc = "line_info",
};
return btf_ext_setup_info(btf_ext, &param);
}
static int btf_ext_setup_core_relos(struct btf_ext *btf_ext)
{
struct btf_ext_sec_setup_param param = {
struct btf_ext_sec_info_param core_relo = {
.off = btf_ext->hdr->core_relo_off,
.len = btf_ext->hdr->core_relo_len,
.min_rec_size = sizeof(struct bpf_core_relo),
.ext_info = &btf_ext->core_relo_info,
.desc = "core_relo",
};
int err;
return btf_ext_setup_info(btf_ext, &param);
err = btf_ext_parse_sec_info(btf_ext, &func_info, is_native);
if (err)
return err;
err = btf_ext_parse_sec_info(btf_ext, &line_info, is_native);
if (err)
return err;
if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, core_relo_len))
return 0; /* skip core relos parsing */
err = btf_ext_parse_sec_info(btf_ext, &core_relo, is_native);
if (err)
return err;
return 0;
}
static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
/* Swap byte-order of BTF.ext header with any endianness */
static void btf_ext_bswap_hdr(struct btf_ext_header *h)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
bool is_native = h->magic == BTF_MAGIC;
__u32 hdr_len;
if (data_size < offsetofend(struct btf_ext_header, hdr_len) ||
data_size < hdr->hdr_len) {
pr_debug("BTF.ext header not found");
hdr_len = is_native ? h->hdr_len : bswap_32(h->hdr_len);
h->magic = bswap_16(h->magic);
h->hdr_len = bswap_32(h->hdr_len);
h->func_info_off = bswap_32(h->func_info_off);
h->func_info_len = bswap_32(h->func_info_len);
h->line_info_off = bswap_32(h->line_info_off);
h->line_info_len = bswap_32(h->line_info_len);
if (hdr_len < offsetofend(struct btf_ext_header, core_relo_len))
return;
h->core_relo_off = bswap_32(h->core_relo_off);
h->core_relo_len = bswap_32(h->core_relo_len);
}
/* Swap byte-order of generic info subsection */
static void btf_ext_bswap_info_sec(void *info, __u32 len, bool is_native,
info_rec_bswap_fn bswap_fn)
{
struct btf_ext_info_sec *sec;
__u32 info_left, rec_size, *rs;
if (len == 0)
return;
rs = info; /* info record size */
rec_size = is_native ? *rs : bswap_32(*rs);
*rs = bswap_32(*rs);
sec = info + sizeof(__u32); /* info sec #1 */
info_left = len - sizeof(__u32);
while (info_left) {
unsigned int sec_hdrlen = sizeof(struct btf_ext_info_sec);
__u32 i, num_recs;
void *p;
num_recs = is_native ? sec->num_info : bswap_32(sec->num_info);
sec->sec_name_off = bswap_32(sec->sec_name_off);
sec->num_info = bswap_32(sec->num_info);
p = sec->data; /* info rec #1 */
for (i = 0; i < num_recs; i++, p += rec_size)
bswap_fn(p);
sec = p;
info_left -= sec_hdrlen + (__u64)rec_size * num_recs;
}
}
/*
* Swap byte-order of all info data in a BTF.ext section
* - requires BTF.ext hdr in native endianness
*/
static void btf_ext_bswap_info(struct btf_ext *btf_ext, void *data)
{
const bool is_native = btf_ext->swapped_endian;
const struct btf_ext_header *h = data;
void *info;
/* Swap func_info subsection byte-order */
info = data + h->hdr_len + h->func_info_off;
btf_ext_bswap_info_sec(info, h->func_info_len, is_native,
(info_rec_bswap_fn)bpf_func_info_bswap);
/* Swap line_info subsection byte-order */
info = data + h->hdr_len + h->line_info_off;
btf_ext_bswap_info_sec(info, h->line_info_len, is_native,
(info_rec_bswap_fn)bpf_line_info_bswap);
/* Swap core_relo subsection byte-order (if present) */
if (h->hdr_len < offsetofend(struct btf_ext_header, core_relo_len))
return;
info = data + h->hdr_len + h->core_relo_off;
btf_ext_bswap_info_sec(info, h->core_relo_len, is_native,
(info_rec_bswap_fn)bpf_core_relo_bswap);
}
/* Parse hdr data and info sections: check and convert to native endianness */
static int btf_ext_parse(struct btf_ext *btf_ext)
{
__u32 hdr_len, data_size = btf_ext->data_size;
struct btf_ext_header *hdr = btf_ext->hdr;
bool swapped_endian = false;
int err;
if (data_size < offsetofend(struct btf_ext_header, hdr_len)) {
pr_debug("BTF.ext header too short\n");
return -EINVAL;
}
hdr_len = hdr->hdr_len;
if (hdr->magic == bswap_16(BTF_MAGIC)) {
pr_warn("BTF.ext in non-native endianness is not supported\n");
return -ENOTSUP;
swapped_endian = true;
hdr_len = bswap_32(hdr_len);
} else if (hdr->magic != BTF_MAGIC) {
pr_debug("Invalid BTF.ext magic:%x\n", hdr->magic);
return -EINVAL;
}
if (hdr->version != BTF_VERSION) {
/* Ensure known version of structs, current BTF_VERSION == 1 */
if (hdr->version != 1) {
pr_debug("Unsupported BTF.ext version:%u\n", hdr->version);
return -ENOTSUP;
}
@ -3051,11 +3156,39 @@ static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
return -ENOTSUP;
}
if (data_size == hdr->hdr_len) {
if (data_size < hdr_len) {
pr_debug("BTF.ext header not found\n");
return -EINVAL;
} else if (data_size == hdr_len) {
pr_debug("BTF.ext has no data\n");
return -EINVAL;
}
/* Verify mandatory hdr info details present */
if (hdr_len < offsetofend(struct btf_ext_header, line_info_len)) {
pr_warn("BTF.ext header missing func_info, line_info\n");
return -EINVAL;
}
/* Keep hdr native byte-order in memory for introspection */
if (swapped_endian)
btf_ext_bswap_hdr(btf_ext->hdr);
/* Validate info subsections and cache key metadata */
err = btf_ext_parse_info(btf_ext, !swapped_endian);
if (err)
return err;
/* Keep infos native byte-order in memory for introspection */
if (swapped_endian)
btf_ext_bswap_info(btf_ext, btf_ext->data);
/*
* Set btf_ext->swapped_endian only after all header and info data has
* been swapped, helping bswap functions determine if their data are
* in native byte-order when called.
*/
btf_ext->swapped_endian = swapped_endian;
return 0;
}
@ -3067,6 +3200,7 @@ void btf_ext__free(struct btf_ext *btf_ext)
free(btf_ext->line_info.sec_idxs);
free(btf_ext->core_relo_info.sec_idxs);
free(btf_ext->data);
free(btf_ext->data_swapped);
free(btf_ext);
}
@ -3087,29 +3221,7 @@ struct btf_ext *btf_ext__new(const __u8 *data, __u32 size)
}
memcpy(btf_ext->data, data, size);
err = btf_ext_parse_hdr(btf_ext->data, size);
if (err)
goto done;
if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, line_info_len)) {
err = -EINVAL;
goto done;
}
err = btf_ext_setup_func_info(btf_ext);
if (err)
goto done;
err = btf_ext_setup_line_info(btf_ext);
if (err)
goto done;
if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, core_relo_len))
goto done; /* skip core relos parsing */
err = btf_ext_setup_core_relos(btf_ext);
if (err)
goto done;
err = btf_ext_parse(btf_ext);
done:
if (err) {
@ -3120,15 +3232,66 @@ done:
return btf_ext;
}
static void *btf_ext_raw_data(const struct btf_ext *btf_ext_ro, bool swap_endian)
{
struct btf_ext *btf_ext = (struct btf_ext *)btf_ext_ro;
const __u32 data_sz = btf_ext->data_size;
void *data;
/* Return native data (always present) or swapped data if present */
if (!swap_endian)
return btf_ext->data;
else if (btf_ext->data_swapped)
return btf_ext->data_swapped;
/* Recreate missing swapped data, then cache and return */
data = calloc(1, data_sz);
if (!data)
return NULL;
memcpy(data, btf_ext->data, data_sz);
btf_ext_bswap_info(btf_ext, data);
btf_ext_bswap_hdr(data);
btf_ext->data_swapped = data;
return data;
}
const void *btf_ext__raw_data(const struct btf_ext *btf_ext, __u32 *size)
{
void *data;
data = btf_ext_raw_data(btf_ext, btf_ext->swapped_endian);
if (!data)
return errno = ENOMEM, NULL;
*size = btf_ext->data_size;
return btf_ext->data;
return data;
}
__attribute__((alias("btf_ext__raw_data")))
const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext, __u32 *size);
enum btf_endianness btf_ext__endianness(const struct btf_ext *btf_ext)
{
if (is_host_big_endian())
return btf_ext->swapped_endian ? BTF_LITTLE_ENDIAN : BTF_BIG_ENDIAN;
else
return btf_ext->swapped_endian ? BTF_BIG_ENDIAN : BTF_LITTLE_ENDIAN;
}
int btf_ext__set_endianness(struct btf_ext *btf_ext, enum btf_endianness endian)
{
if (endian != BTF_LITTLE_ENDIAN && endian != BTF_BIG_ENDIAN)
return libbpf_err(-EINVAL);
btf_ext->swapped_endian = is_host_big_endian() != (endian == BTF_BIG_ENDIAN);
if (!btf_ext->swapped_endian) {
free(btf_ext->data_swapped);
btf_ext->data_swapped = NULL;
}
return 0;
}
struct btf_dedup;
@ -3291,7 +3454,7 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
d = btf_dedup_new(btf, opts);
if (IS_ERR(d)) {
pr_debug("btf_dedup_new failed: %ld", PTR_ERR(d));
pr_debug("btf_dedup_new failed: %ld\n", PTR_ERR(d));
return libbpf_err(-EINVAL);
}
@ -3302,42 +3465,42 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
err = btf_dedup_prep(d);
if (err) {
pr_debug("btf_dedup_prep failed:%d\n", err);
pr_debug("btf_dedup_prep failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_strings(d);
if (err < 0) {
pr_debug("btf_dedup_strings failed:%d\n", err);
pr_debug("btf_dedup_strings failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_prim_types(d);
if (err < 0) {
pr_debug("btf_dedup_prim_types failed:%d\n", err);
pr_debug("btf_dedup_prim_types failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_struct_types(d);
if (err < 0) {
pr_debug("btf_dedup_struct_types failed:%d\n", err);
pr_debug("btf_dedup_struct_types failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_resolve_fwds(d);
if (err < 0) {
pr_debug("btf_dedup_resolve_fwds failed:%d\n", err);
pr_debug("btf_dedup_resolve_fwds failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_ref_types(d);
if (err < 0) {
pr_debug("btf_dedup_ref_types failed:%d\n", err);
pr_debug("btf_dedup_ref_types failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_compact_types(d);
if (err < 0) {
pr_debug("btf_dedup_compact_types failed:%d\n", err);
pr_debug("btf_dedup_compact_types failed: %s\n", errstr(err));
goto done;
}
err = btf_dedup_remap_types(d);
if (err < 0) {
pr_debug("btf_dedup_remap_types failed:%d\n", err);
pr_debug("btf_dedup_remap_types failed: %s\n", errstr(err));
goto done;
}
@ -3385,7 +3548,7 @@ struct btf_dedup {
struct strset *strs_set;
};
static long hash_combine(long h, long value)
static unsigned long hash_combine(unsigned long h, unsigned long value)
{
return h * 31 + value;
}
@ -5056,7 +5219,8 @@ struct btf *btf__load_vmlinux_btf(void)
btf = btf__parse(sysfs_btf_path, NULL);
if (!btf) {
err = -errno;
pr_warn("failed to read kernel BTF from '%s': %d\n", sysfs_btf_path, err);
pr_warn("failed to read kernel BTF from '%s': %s\n",
sysfs_btf_path, errstr(err));
return libbpf_err_ptr(err);
}
pr_debug("loaded kernel BTF from '%s'\n", sysfs_btf_path);
@ -5073,7 +5237,7 @@ struct btf *btf__load_vmlinux_btf(void)
btf = btf__parse(path, NULL);
err = libbpf_get_error(btf);
pr_debug("loading kernel BTF '%s': %d\n", path, err);
pr_debug("loading kernel BTF '%s': %s\n", path, errstr(err));
if (err)
continue;

View File

@ -167,6 +167,9 @@ LIBBPF_API const char *btf__str_by_offset(const struct btf *btf, __u32 offset);
LIBBPF_API struct btf_ext *btf_ext__new(const __u8 *data, __u32 size);
LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
LIBBPF_API const void *btf_ext__raw_data(const struct btf_ext *btf_ext, __u32 *size);
LIBBPF_API enum btf_endianness btf_ext__endianness(const struct btf_ext *btf_ext);
LIBBPF_API int btf_ext__set_endianness(struct btf_ext *btf_ext,
enum btf_endianness endian);
LIBBPF_API int btf__find_str(struct btf *btf, const char *s);
LIBBPF_API int btf__add_str(struct btf *btf, const char *s);

View File

@ -21,6 +21,7 @@
#include "hashmap.h"
#include "libbpf.h"
#include "libbpf_internal.h"
#include "str_error.h"
static const char PREFIXES[] = "\t\t\t\t\t\t\t\t\t\t\t\t\t";
static const size_t PREFIX_CNT = sizeof(PREFIXES) - 1;
@ -867,8 +868,8 @@ static void btf_dump_emit_bit_padding(const struct btf_dump *d,
} pads[] = {
{"long", d->ptr_sz * 8}, {"int", 32}, {"short", 16}, {"char", 8}
};
int new_off, pad_bits, bits, i;
const char *pad_type;
int new_off = 0, pad_bits = 0, bits, i;
const char *pad_type = NULL;
if (cur_off >= next_off)
return; /* no gap */
@ -1304,7 +1305,7 @@ static void btf_dump_emit_type_decl(struct btf_dump *d, __u32 id,
* chain, restore stack, emit warning, and try to
* proceed nevertheless
*/
pr_warn("not enough memory for decl stack:%d", err);
pr_warn("not enough memory for decl stack: %s\n", errstr(err));
d->decl_stack_cnt = stack_start;
return;
}

View File

@ -428,7 +428,7 @@ static int btf_relocate_rewrite_strs(struct btf_relocate *r, __u32 i)
} else {
off = r->str_map[*str_off];
if (!off) {
pr_warn("string '%s' [offset %u] is not mapped to base BTF",
pr_warn("string '%s' [offset %u] is not mapped to base BTF\n",
btf__str_by_offset(r->btf, off), *str_off);
return -ENOENT;
}

View File

@ -24,7 +24,6 @@
int elf_open(const char *binary_path, struct elf_fd *elf_fd)
{
char errmsg[STRERR_BUFSIZE];
int fd, ret;
Elf *elf;
@ -38,8 +37,7 @@ int elf_open(const char *binary_path, struct elf_fd *elf_fd)
fd = open(binary_path, O_RDONLY | O_CLOEXEC);
if (fd < 0) {
ret = -errno;
pr_warn("elf: failed to open %s: %s\n", binary_path,
libbpf_strerror_r(ret, errmsg, sizeof(errmsg)));
pr_warn("elf: failed to open %s: %s\n", binary_path, errstr(ret));
return ret;
}
elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);

View File

@ -47,7 +47,6 @@ static int probe_kern_prog_name(int token_fd)
static int probe_kern_global_data(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
@ -67,9 +66,8 @@ static int probe_kern_global_data(int token_fd)
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
pr_warn("Error in %s(): %s. Couldn't create simple array map.\n",
__func__, errstr(ret));
return ret;
}
@ -267,7 +265,6 @@ static int probe_kern_probe_read_kernel(int token_fd)
static int probe_prog_bind_map(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
@ -285,9 +282,8 @@ static int probe_prog_bind_map(int token_fd)
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
pr_warn("Error in %s(): %s. Couldn't create simple array map.\n",
__func__, errstr(ret));
return ret;
}
@ -604,7 +600,8 @@ bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_
} else if (ret == 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
} else {
pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
pr_warn("Detection of kernel %s support failed: %s\n",
feat->desc, errstr(ret));
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
}
}

View File

@ -14,6 +14,7 @@
#include "bpf_gen_internal.h"
#include "skel_internal.h"
#include <asm/byteorder.h>
#include "str_error.h"
#define MAX_USED_MAPS 64
#define MAX_USED_PROGS 32
@ -393,7 +394,7 @@ int bpf_gen__finish(struct bpf_gen *gen, int nr_progs, int nr_maps)
blob_fd_array_off(gen, i));
emit(gen, BPF_MOV64_IMM(BPF_REG_0, 0));
emit(gen, BPF_EXIT_INSN());
pr_debug("gen: finish %d\n", gen->error);
pr_debug("gen: finish %s\n", errstr(gen->error));
if (!gen->error) {
struct gen_loader_opts *opts = gen->opts;
@ -401,6 +402,15 @@ int bpf_gen__finish(struct bpf_gen *gen, int nr_progs, int nr_maps)
opts->insns_sz = gen->insn_cur - gen->insn_start;
opts->data = gen->data_start;
opts->data_sz = gen->data_cur - gen->data_start;
/* use target endianness for embedded loader */
if (gen->swapped_endian) {
struct bpf_insn *insn = (struct bpf_insn *)opts->insns;
int insn_cnt = opts->insns_sz / sizeof(struct bpf_insn);
for (i = 0; i < insn_cnt; i++)
bpf_insn_bswap(insn++);
}
}
return gen->error;
}
@ -414,6 +424,28 @@ void bpf_gen__free(struct bpf_gen *gen)
free(gen);
}
/*
* Fields of bpf_attr are set to values in native byte-order before being
* written to the target-bound data blob, and may need endian conversion.
* This macro allows providing the correct value in situ more simply than
* writing a separate converter for *all fields* of *all records* included
* in union bpf_attr. Note that sizeof(rval) should match the assignment
* target to avoid runtime problems.
*/
#define tgt_endian(rval) ({ \
typeof(rval) _val = (rval); \
if (gen->swapped_endian) { \
switch (sizeof(_val)) { \
case 1: break; \
case 2: _val = bswap_16(_val); break; \
case 4: _val = bswap_32(_val); break; \
case 8: _val = bswap_64(_val); break; \
default: pr_warn("unsupported bswap size!\n"); \
} \
} \
_val; \
})
void bpf_gen__load_btf(struct bpf_gen *gen, const void *btf_raw_data,
__u32 btf_raw_size)
{
@ -422,11 +454,12 @@ void bpf_gen__load_btf(struct bpf_gen *gen, const void *btf_raw_data,
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: load_btf: size %d\n", btf_raw_size);
btf_data = add_data(gen, btf_raw_data, btf_raw_size);
attr.btf_size = btf_raw_size;
attr.btf_size = tgt_endian(btf_raw_size);
btf_load_attr = add_data(gen, &attr, attr_size);
pr_debug("gen: load_btf: off %d size %d, attr: off %d size %d\n",
btf_data, btf_raw_size, btf_load_attr, attr_size);
/* populate union bpf_attr with user provided log details */
move_ctx2blob(gen, attr_field(btf_load_attr, btf_log_level), 4,
@ -457,28 +490,29 @@ void bpf_gen__map_create(struct bpf_gen *gen,
union bpf_attr attr;
memset(&attr, 0, attr_size);
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.map_flags = map_attr->map_flags;
attr.map_extra = map_attr->map_extra;
attr.map_type = tgt_endian(map_type);
attr.key_size = tgt_endian(key_size);
attr.value_size = tgt_endian(value_size);
attr.map_flags = tgt_endian(map_attr->map_flags);
attr.map_extra = tgt_endian(map_attr->map_extra);
if (map_name)
libbpf_strlcpy(attr.map_name, map_name, sizeof(attr.map_name));
attr.numa_node = map_attr->numa_node;
attr.map_ifindex = map_attr->map_ifindex;
attr.max_entries = max_entries;
attr.btf_key_type_id = map_attr->btf_key_type_id;
attr.btf_value_type_id = map_attr->btf_value_type_id;
pr_debug("gen: map_create: %s idx %d type %d value_type_id %d\n",
attr.map_name, map_idx, map_type, attr.btf_value_type_id);
attr.numa_node = tgt_endian(map_attr->numa_node);
attr.map_ifindex = tgt_endian(map_attr->map_ifindex);
attr.max_entries = tgt_endian(max_entries);
attr.btf_key_type_id = tgt_endian(map_attr->btf_key_type_id);
attr.btf_value_type_id = tgt_endian(map_attr->btf_value_type_id);
map_create_attr = add_data(gen, &attr, attr_size);
if (attr.btf_value_type_id)
pr_debug("gen: map_create: %s idx %d type %d value_type_id %d, attr: off %d size %d\n",
map_name, map_idx, map_type, map_attr->btf_value_type_id,
map_create_attr, attr_size);
if (map_attr->btf_value_type_id)
/* populate union bpf_attr with btf_fd saved in the stack earlier */
move_stack2blob(gen, attr_field(map_create_attr, btf_fd), 4,
stack_off(btf_fd));
switch (attr.map_type) {
switch (map_type) {
case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH_OF_MAPS:
move_stack2blob(gen, attr_field(map_create_attr, inner_map_fd), 4,
@ -498,8 +532,8 @@ void bpf_gen__map_create(struct bpf_gen *gen,
/* emit MAP_CREATE command */
emit_sys_bpf(gen, BPF_MAP_CREATE, map_create_attr, attr_size);
debug_ret(gen, "map_create %s idx %d type %d value_size %d value_btf_id %d",
attr.map_name, map_idx, map_type, value_size,
attr.btf_value_type_id);
map_name, map_idx, map_type, value_size,
map_attr->btf_value_type_id);
emit_check_err(gen);
/* remember map_fd in the stack, if successful */
if (map_idx < 0) {
@ -784,12 +818,12 @@ log:
emit_ksym_relo_log(gen, relo, kdesc->ref);
}
static __u32 src_reg_mask(void)
static __u32 src_reg_mask(struct bpf_gen *gen)
{
#if defined(__LITTLE_ENDIAN_BITFIELD)
return 0x0f; /* src_reg,dst_reg,... */
#elif defined(__BIG_ENDIAN_BITFIELD)
return 0xf0; /* dst_reg,src_reg,... */
#if defined(__LITTLE_ENDIAN_BITFIELD) /* src_reg,dst_reg,... */
return gen->swapped_endian ? 0xf0 : 0x0f;
#elif defined(__BIG_ENDIAN_BITFIELD) /* dst_reg,src_reg,... */
return gen->swapped_endian ? 0x0f : 0xf0;
#else
#error "Unsupported bit endianness, cannot proceed"
#endif
@ -840,7 +874,7 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 3));
clear_src_reg:
/* clear bpf_object__relocate_data's src_reg assignment, otherwise we get a verifier failure */
reg_mask = src_reg_mask();
reg_mask = src_reg_mask(gen);
emit(gen, BPF_LDX_MEM(BPF_B, BPF_REG_9, BPF_REG_8, offsetofend(struct bpf_insn, code)));
emit(gen, BPF_ALU32_IMM(BPF_AND, BPF_REG_9, reg_mask));
emit(gen, BPF_STX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, offsetofend(struct bpf_insn, code)));
@ -931,48 +965,94 @@ static void cleanup_relos(struct bpf_gen *gen, int insns)
cleanup_core_relo(gen);
}
/* Convert func, line, and core relo info blobs to target endianness */
static void info_blob_bswap(struct bpf_gen *gen, int func_info, int line_info,
int core_relos, struct bpf_prog_load_opts *load_attr)
{
struct bpf_func_info *fi = gen->data_start + func_info;
struct bpf_line_info *li = gen->data_start + line_info;
struct bpf_core_relo *cr = gen->data_start + core_relos;
int i;
for (i = 0; i < load_attr->func_info_cnt; i++)
bpf_func_info_bswap(fi++);
for (i = 0; i < load_attr->line_info_cnt; i++)
bpf_line_info_bswap(li++);
for (i = 0; i < gen->core_relo_cnt; i++)
bpf_core_relo_bswap(cr++);
}
void bpf_gen__prog_load(struct bpf_gen *gen,
enum bpf_prog_type prog_type, const char *prog_name,
const char *license, struct bpf_insn *insns, size_t insn_cnt,
struct bpf_prog_load_opts *load_attr, int prog_idx)
{
int func_info_tot_sz = load_attr->func_info_cnt *
load_attr->func_info_rec_size;
int line_info_tot_sz = load_attr->line_info_cnt *
load_attr->line_info_rec_size;
int core_relo_tot_sz = gen->core_relo_cnt *
sizeof(struct bpf_core_relo);
int prog_load_attr, license_off, insns_off, func_info, line_info, core_relos;
int attr_size = offsetofend(union bpf_attr, core_relo_rec_size);
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: prog_load: type %d insns_cnt %zd progi_idx %d\n",
prog_type, insn_cnt, prog_idx);
/* add license string to blob of bytes */
license_off = add_data(gen, license, strlen(license) + 1);
/* add insns to blob of bytes */
insns_off = add_data(gen, insns, insn_cnt * sizeof(struct bpf_insn));
pr_debug("gen: prog_load: prog_idx %d type %d insn off %d insns_cnt %zd license off %d\n",
prog_idx, prog_type, insns_off, insn_cnt, license_off);
attr.prog_type = prog_type;
attr.expected_attach_type = load_attr->expected_attach_type;
attr.attach_btf_id = load_attr->attach_btf_id;
attr.prog_ifindex = load_attr->prog_ifindex;
/* convert blob insns to target endianness */
if (gen->swapped_endian) {
struct bpf_insn *insn = gen->data_start + insns_off;
int i;
for (i = 0; i < insn_cnt; i++, insn++)
bpf_insn_bswap(insn);
}
attr.prog_type = tgt_endian(prog_type);
attr.expected_attach_type = tgt_endian(load_attr->expected_attach_type);
attr.attach_btf_id = tgt_endian(load_attr->attach_btf_id);
attr.prog_ifindex = tgt_endian(load_attr->prog_ifindex);
attr.kern_version = 0;
attr.insn_cnt = (__u32)insn_cnt;
attr.prog_flags = load_attr->prog_flags;
attr.insn_cnt = tgt_endian((__u32)insn_cnt);
attr.prog_flags = tgt_endian(load_attr->prog_flags);
attr.func_info_rec_size = load_attr->func_info_rec_size;
attr.func_info_cnt = load_attr->func_info_cnt;
func_info = add_data(gen, load_attr->func_info,
attr.func_info_cnt * attr.func_info_rec_size);
attr.func_info_rec_size = tgt_endian(load_attr->func_info_rec_size);
attr.func_info_cnt = tgt_endian(load_attr->func_info_cnt);
func_info = add_data(gen, load_attr->func_info, func_info_tot_sz);
pr_debug("gen: prog_load: func_info: off %d cnt %d rec size %d\n",
func_info, load_attr->func_info_cnt,
load_attr->func_info_rec_size);
attr.line_info_rec_size = load_attr->line_info_rec_size;
attr.line_info_cnt = load_attr->line_info_cnt;
line_info = add_data(gen, load_attr->line_info,
attr.line_info_cnt * attr.line_info_rec_size);
attr.line_info_rec_size = tgt_endian(load_attr->line_info_rec_size);
attr.line_info_cnt = tgt_endian(load_attr->line_info_cnt);
line_info = add_data(gen, load_attr->line_info, line_info_tot_sz);
pr_debug("gen: prog_load: line_info: off %d cnt %d rec size %d\n",
line_info, load_attr->line_info_cnt,
load_attr->line_info_rec_size);
attr.core_relo_rec_size = sizeof(struct bpf_core_relo);
attr.core_relo_cnt = gen->core_relo_cnt;
core_relos = add_data(gen, gen->core_relos,
attr.core_relo_cnt * attr.core_relo_rec_size);
attr.core_relo_rec_size = tgt_endian((__u32)sizeof(struct bpf_core_relo));
attr.core_relo_cnt = tgt_endian(gen->core_relo_cnt);
core_relos = add_data(gen, gen->core_relos, core_relo_tot_sz);
pr_debug("gen: prog_load: core_relos: off %d cnt %d rec size %zd\n",
core_relos, gen->core_relo_cnt,
sizeof(struct bpf_core_relo));
/* convert all info blobs to target endianness */
if (gen->swapped_endian)
info_blob_bswap(gen, func_info, line_info, core_relos, load_attr);
libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
prog_load_attr = add_data(gen, &attr, attr_size);
pr_debug("gen: prog_load: attr: off %d size %d\n",
prog_load_attr, attr_size);
/* populate union bpf_attr with a pointer to license */
emit_rel_store(gen, attr_field(prog_load_attr, license), license_off);
@ -1040,7 +1120,6 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *pvalue,
int zero = 0;
memset(&attr, 0, attr_size);
pr_debug("gen: map_update_elem: idx %d\n", map_idx);
value = add_data(gen, pvalue, value_size);
key = add_data(gen, &zero, sizeof(zero));
@ -1068,6 +1147,8 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *pvalue,
emit(gen, BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel));
map_update_attr = add_data(gen, &attr, attr_size);
pr_debug("gen: map_update_elem: idx %d, value: off %d size %d, attr: off %d size %d\n",
map_idx, value, value_size, map_update_attr, attr_size);
move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4,
blob_fd_array_off(gen, map_idx));
emit_rel_store(gen, attr_field(map_update_attr, key), key);
@ -1084,14 +1165,16 @@ void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int slo
int attr_size = offsetofend(union bpf_attr, flags);
int map_update_attr, key;
union bpf_attr attr;
int tgt_slot;
memset(&attr, 0, attr_size);
pr_debug("gen: populate_outer_map: outer %d key %d inner %d\n",
outer_map_idx, slot, inner_map_idx);
key = add_data(gen, &slot, sizeof(slot));
tgt_slot = tgt_endian(slot);
key = add_data(gen, &tgt_slot, sizeof(tgt_slot));
map_update_attr = add_data(gen, &attr, attr_size);
pr_debug("gen: populate_outer_map: outer %d key %d inner %d, attr: off %d size %d\n",
outer_map_idx, slot, inner_map_idx, map_update_attr, attr_size);
move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4,
blob_fd_array_off(gen, outer_map_idx));
emit_rel_store(gen, attr_field(map_update_attr, key), key);
@ -1112,8 +1195,9 @@ void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx)
union bpf_attr attr;
memset(&attr, 0, attr_size);
pr_debug("gen: map_freeze: idx %d\n", map_idx);
map_freeze_attr = add_data(gen, &attr, attr_size);
pr_debug("gen: map_freeze: idx %d, attr: off %d size %d\n",
map_idx, map_freeze_attr, attr_size);
move_blob2blob(gen, attr_field(map_freeze_attr, map_fd), 4,
blob_fd_array_off(gen, map_idx));
/* emit MAP_FREEZE command */

View File

@ -166,8 +166,8 @@ bool hashmap_find(const struct hashmap *map, long key, long *value);
* @bkt: integer used as a bucket loop cursor
*/
#define hashmap__for_each_entry(map, cur, bkt) \
for (bkt = 0; bkt < map->cap; bkt++) \
for (cur = map->buckets[bkt]; cur; cur = cur->next)
for (bkt = 0; bkt < (map)->cap; bkt++) \
for (cur = (map)->buckets[bkt]; cur; cur = cur->next)
/*
* hashmap__for_each_entry_safe - iterate over all entries in hashmap, safe
@ -178,8 +178,8 @@ bool hashmap_find(const struct hashmap *map, long key, long *value);
* @bkt: integer used as a bucket loop cursor
*/
#define hashmap__for_each_entry_safe(map, cur, tmp, bkt) \
for (bkt = 0; bkt < map->cap; bkt++) \
for (cur = map->buckets[bkt]; \
for (bkt = 0; bkt < (map)->cap; bkt++) \
for (cur = (map)->buckets[bkt]; \
cur && ({tmp = cur->next; true; }); \
cur = tmp)
@ -190,19 +190,19 @@ bool hashmap_find(const struct hashmap *map, long key, long *value);
* @key: key to iterate entries for
*/
#define hashmap__for_each_key_entry(map, cur, _key) \
for (cur = map->buckets \
? map->buckets[hash_bits(map->hash_fn((_key), map->ctx), map->cap_bits)] \
for (cur = (map)->buckets \
? (map)->buckets[hash_bits((map)->hash_fn((_key), (map)->ctx), (map)->cap_bits)] \
: NULL; \
cur; \
cur = cur->next) \
if (map->equal_fn(cur->key, (_key), map->ctx))
if ((map)->equal_fn(cur->key, (_key), (map)->ctx))
#define hashmap__for_each_key_entry_safe(map, cur, tmp, _key) \
for (cur = map->buckets \
? map->buckets[hash_bits(map->hash_fn((_key), map->ctx), map->cap_bits)] \
for (cur = (map)->buckets \
? (map)->buckets[hash_bits((map)->hash_fn((_key), (map)->ctx), (map)->cap_bits)] \
: NULL; \
cur && ({ tmp = cur->next; true; }); \
cur = tmp) \
if (map->equal_fn(cur->key, (_key), map->ctx))
if ((map)->equal_fn(cur->key, (_key), (map)->ctx))
#endif /* __LIBBPF_HASHMAP_H */

File diff suppressed because it is too large Load Diff

View File

@ -577,10 +577,12 @@ struct bpf_uprobe_multi_opts {
size_t cnt;
/* create return uprobes */
bool retprobe;
/* create session kprobes */
bool session;
size_t :0;
};
#define bpf_uprobe_multi_opts__last_field retprobe
#define bpf_uprobe_multi_opts__last_field session
/**
* @brief **bpf_program__attach_uprobe_multi()** attaches a BPF program

View File

@ -421,6 +421,8 @@ LIBBPF_1.5.0 {
global:
btf__distill_base;
btf__relocate;
btf_ext__endianness;
btf_ext__set_endianness;
bpf_map__autoattach;
bpf_map__set_autoattach;
bpf_object__token_fd;
@ -428,3 +430,6 @@ LIBBPF_1.5.0 {
ring__consume_n;
ring_buffer__consume_n;
} LIBBPF_1.4.0;
LIBBPF_1.6.0 {
} LIBBPF_1.5.0;

View File

@ -10,6 +10,7 @@
#define __LIBBPF_LIBBPF_INTERNAL_H
#include <stdlib.h>
#include <byteswap.h>
#include <limits.h>
#include <errno.h>
#include <linux/err.h>
@ -448,11 +449,11 @@ struct btf_ext_info {
*
* The func_info subsection layout:
* record size for struct bpf_func_info in the func_info subsection
* struct btf_sec_func_info for section #1
* struct btf_ext_info_sec for section #1
* a list of bpf_func_info records for section #1
* where struct bpf_func_info mimics one in include/uapi/linux/bpf.h
* but may not be identical
* struct btf_sec_func_info for section #2
* struct btf_ext_info_sec for section #2
* a list of bpf_func_info records for section #2
* ......
*
@ -484,6 +485,8 @@ struct btf_ext {
struct btf_ext_header *hdr;
void *data;
};
void *data_swapped;
bool swapped_endian;
struct btf_ext_info func_info;
struct btf_ext_info line_info;
struct btf_ext_info core_relo_info;
@ -511,6 +514,32 @@ struct bpf_line_info_min {
__u32 line_col;
};
/* Functions to byte-swap info records */
typedef void (*info_rec_bswap_fn)(void *);
static inline void bpf_func_info_bswap(struct bpf_func_info *i)
{
i->insn_off = bswap_32(i->insn_off);
i->type_id = bswap_32(i->type_id);
}
static inline void bpf_line_info_bswap(struct bpf_line_info *i)
{
i->insn_off = bswap_32(i->insn_off);
i->file_name_off = bswap_32(i->file_name_off);
i->line_off = bswap_32(i->line_off);
i->line_col = bswap_32(i->line_col);
}
static inline void bpf_core_relo_bswap(struct bpf_core_relo *i)
{
i->insn_off = bswap_32(i->insn_off);
i->type_id = bswap_32(i->type_id);
i->access_str_off = bswap_32(i->access_str_off);
i->kind = bswap_32(i->kind);
}
enum btf_field_iter_kind {
BTF_FIELD_ITER_IDS,
BTF_FIELD_ITER_STRS,
@ -588,6 +617,16 @@ static inline bool is_ldimm64_insn(struct bpf_insn *insn)
return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
}
static inline void bpf_insn_bswap(struct bpf_insn *insn)
{
__u8 tmp_reg = insn->dst_reg;
insn->dst_reg = insn->src_reg;
insn->src_reg = tmp_reg;
insn->off = bswap_16(insn->off);
insn->imm = bswap_32(insn->imm);
}
/* Unconditionally dup FD, ensuring it doesn't use [0, 2] range.
* Original FD is not closed or altered in any other way.
* Preserves original FD value, if it's invalid (negative).

View File

@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 5
#define LIBBPF_MINOR_VERSION 6
#endif /* __LIBBPF_VERSION_H */

View File

@ -20,6 +20,7 @@
#include "btf.h"
#include "libbpf_internal.h"
#include "strset.h"
#include "str_error.h"
#define BTF_EXTERN_SEC ".extern"
@ -135,6 +136,7 @@ struct bpf_linker {
int fd;
Elf *elf;
Elf64_Ehdr *elf_hdr;
bool swapped_endian;
/* Output sections metadata */
struct dst_sec *secs;
@ -305,7 +307,7 @@ static int init_output_elf(struct bpf_linker *linker, const char *file)
linker->fd = open(file, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
if (linker->fd < 0) {
err = -errno;
pr_warn("failed to create '%s': %d\n", file, err);
pr_warn("failed to create '%s': %s\n", file, errstr(err));
return err;
}
@ -324,13 +326,8 @@ static int init_output_elf(struct bpf_linker *linker, const char *file)
linker->elf_hdr->e_machine = EM_BPF;
linker->elf_hdr->e_type = ET_REL;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
linker->elf_hdr->e_ident[EI_DATA] = ELFDATA2LSB;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
linker->elf_hdr->e_ident[EI_DATA] = ELFDATA2MSB;
#else
#error "Unknown __BYTE_ORDER__"
#endif
/* Set unknown ELF endianness, assign later from input files */
linker->elf_hdr->e_ident[EI_DATA] = ELFDATANONE;
/* STRTAB */
/* initialize strset with an empty string to conform to ELF */
@ -396,6 +393,8 @@ static int init_output_elf(struct bpf_linker *linker, const char *file)
pr_warn_elf("failed to create SYMTAB data");
return -EINVAL;
}
/* Ensure libelf translates byte-order of symbol records */
sec->data->d_type = ELF_T_SYM;
str_off = strset__add_str(linker->strtab_strs, sec->sec_name);
if (str_off < 0)
@ -539,19 +538,21 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
const struct bpf_linker_file_opts *opts,
struct src_obj *obj)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
const int host_endianness = ELFDATA2LSB;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
const int host_endianness = ELFDATA2MSB;
#else
#error "Unknown __BYTE_ORDER__"
#endif
int err = 0;
Elf_Scn *scn;
Elf_Data *data;
Elf64_Ehdr *ehdr;
Elf64_Shdr *shdr;
struct src_sec *sec;
unsigned char obj_byteorder;
unsigned char link_byteorder = linker->elf_hdr->e_ident[EI_DATA];
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
const unsigned char host_byteorder = ELFDATA2LSB;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
const unsigned char host_byteorder = ELFDATA2MSB;
#else
#error "Unknown __BYTE_ORDER__"
#endif
pr_debug("linker: adding object file '%s'...\n", filename);
@ -560,7 +561,7 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
obj->fd = open(filename, O_RDONLY | O_CLOEXEC);
if (obj->fd < 0) {
err = -errno;
pr_warn("failed to open file '%s': %d\n", filename, err);
pr_warn("failed to open file '%s': %s\n", filename, errstr(err));
return err;
}
obj->elf = elf_begin(obj->fd, ELF_C_READ_MMAP, NULL);
@ -577,11 +578,25 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
pr_warn_elf("failed to get ELF header for %s", filename);
return err;
}
if (ehdr->e_ident[EI_DATA] != host_endianness) {
/* Linker output endianness set by first input object */
obj_byteorder = ehdr->e_ident[EI_DATA];
if (obj_byteorder != ELFDATA2LSB && obj_byteorder != ELFDATA2MSB) {
err = -EOPNOTSUPP;
pr_warn_elf("unsupported byte order of ELF file %s", filename);
pr_warn("unknown byte order of ELF file %s\n", filename);
return err;
}
if (link_byteorder == ELFDATANONE) {
linker->elf_hdr->e_ident[EI_DATA] = obj_byteorder;
linker->swapped_endian = obj_byteorder != host_byteorder;
pr_debug("linker: set %s-endian output byte order\n",
obj_byteorder == ELFDATA2MSB ? "big" : "little");
} else if (link_byteorder != obj_byteorder) {
err = -EOPNOTSUPP;
pr_warn("byte order mismatch with ELF file %s\n", filename);
return err;
}
if (ehdr->e_type != ET_REL
|| ehdr->e_machine != EM_BPF
|| ehdr->e_ident[EI_CLASS] != ELFCLASS64) {
@ -656,7 +671,8 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
obj->btf = btf__new(data->d_buf, shdr->sh_size);
err = libbpf_get_error(obj->btf);
if (err) {
pr_warn("failed to parse .BTF from %s: %d\n", filename, err);
pr_warn("failed to parse .BTF from %s: %s\n",
filename, errstr(err));
return err;
}
sec->skipped = true;
@ -666,7 +682,8 @@ static int linker_load_obj_file(struct bpf_linker *linker, const char *filename,
obj->btf_ext = btf_ext__new(data->d_buf, shdr->sh_size);
err = libbpf_get_error(obj->btf_ext);
if (err) {
pr_warn("failed to parse .BTF.ext from '%s': %d\n", filename, err);
pr_warn("failed to parse .BTF.ext from '%s': %s\n",
filename, errstr(err));
return err;
}
sec->skipped = true;
@ -1109,6 +1126,24 @@ static bool sec_content_is_same(struct dst_sec *dst_sec, struct src_sec *src_sec
return true;
}
static bool is_exec_sec(struct dst_sec *sec)
{
if (!sec || sec->ephemeral)
return false;
return (sec->shdr->sh_type == SHT_PROGBITS) &&
(sec->shdr->sh_flags & SHF_EXECINSTR);
}
static void exec_sec_bswap(void *raw_data, int size)
{
const int insn_cnt = size / sizeof(struct bpf_insn);
struct bpf_insn *insn = raw_data;
int i;
for (i = 0; i < insn_cnt; i++, insn++)
bpf_insn_bswap(insn);
}
static int extend_sec(struct bpf_linker *linker, struct dst_sec *dst, struct src_sec *src)
{
void *tmp;
@ -1168,6 +1203,10 @@ static int extend_sec(struct bpf_linker *linker, struct dst_sec *dst, struct src
memset(dst->raw_data + dst->sec_sz, 0, dst_align_sz - dst->sec_sz);
/* now copy src data at a properly aligned offset */
memcpy(dst->raw_data + dst_align_sz, src->data->d_buf, src->shdr->sh_size);
/* convert added bpf insns to native byte-order */
if (linker->swapped_endian && is_exec_sec(dst))
exec_sec_bswap(dst->raw_data + dst_align_sz, src->shdr->sh_size);
}
dst->sec_sz = dst_final_sz;
@ -2415,6 +2454,10 @@ static int linker_append_btf(struct bpf_linker *linker, struct src_obj *obj)
if (glob_sym && glob_sym->var_idx >= 0) {
__s64 sz;
/* FUNCs don't have size, nothing to update */
if (btf_is_func(t))
continue;
dst_var = &dst_sec->sec_vars[glob_sym->var_idx];
/* Because underlying BTF type might have
* changed, so might its size have changed, so
@ -2628,6 +2671,10 @@ int bpf_linker__finalize(struct bpf_linker *linker)
if (!sec->scn)
continue;
/* restore sections with bpf insns to target byte-order */
if (linker->swapped_endian && is_exec_sec(sec))
exec_sec_bswap(sec->raw_data, sec->sec_sz);
sec->data->d_buf = sec->raw_data;
}
@ -2696,6 +2743,7 @@ static int emit_elf_data_sec(struct bpf_linker *linker, const char *sec_name,
static int finalize_btf(struct bpf_linker *linker)
{
enum btf_endianness link_endianness;
LIBBPF_OPTS(btf_dedup_opts, opts);
struct btf *btf = linker->btf;
const void *raw_data;
@ -2729,17 +2777,24 @@ static int finalize_btf(struct bpf_linker *linker)
err = finalize_btf_ext(linker);
if (err) {
pr_warn(".BTF.ext generation failed: %d\n", err);
pr_warn(".BTF.ext generation failed: %s\n", errstr(err));
return err;
}
opts.btf_ext = linker->btf_ext;
err = btf__dedup(linker->btf, &opts);
if (err) {
pr_warn("BTF dedup failed: %d\n", err);
pr_warn("BTF dedup failed: %s\n", errstr(err));
return err;
}
/* Set .BTF and .BTF.ext output byte order */
link_endianness = linker->elf_hdr->e_ident[EI_DATA] == ELFDATA2MSB ?
BTF_BIG_ENDIAN : BTF_LITTLE_ENDIAN;
btf__set_endianness(linker->btf, link_endianness);
if (linker->btf_ext)
btf_ext__set_endianness(linker->btf_ext, link_endianness);
/* Emit .BTF section */
raw_data = btf__raw_data(linker->btf, &raw_sz);
if (!raw_data)
@ -2747,7 +2802,7 @@ static int finalize_btf(struct bpf_linker *linker)
err = emit_elf_data_sec(linker, BTF_ELF_SEC, 8, raw_data, raw_sz);
if (err) {
pr_warn("failed to write out .BTF ELF section: %d\n", err);
pr_warn("failed to write out .BTF ELF section: %s\n", errstr(err));
return err;
}
@ -2759,7 +2814,7 @@ static int finalize_btf(struct bpf_linker *linker)
err = emit_elf_data_sec(linker, BTF_EXT_ELF_SEC, 8, raw_data, raw_sz);
if (err) {
pr_warn("failed to write out .BTF.ext ELF section: %d\n", err);
pr_warn("failed to write out .BTF.ext ELF section: %s\n", errstr(err));
return err;
}
}
@ -2935,7 +2990,7 @@ static int finalize_btf_ext(struct bpf_linker *linker)
err = libbpf_get_error(linker->btf_ext);
if (err) {
linker->btf_ext = NULL;
pr_warn("failed to parse final .BTF.ext data: %d\n", err);
pr_warn("failed to parse final .BTF.ext data: %s\n", errstr(err));
goto out;
}

View File

@ -1339,7 +1339,7 @@ int bpf_core_calc_relo_insn(const char *prog_name,
cands->cands[i].id, cand_spec);
if (err < 0) {
bpf_core_format_spec(spec_buf, sizeof(spec_buf), cand_spec);
pr_warn("prog '%s': relo #%d: error matching candidate #%d %s: %d\n ",
pr_warn("prog '%s': relo #%d: error matching candidate #%d %s: %d\n",
prog_name, relo_idx, i, spec_buf, err);
return err;
}

View File

@ -21,6 +21,7 @@
#include "libbpf.h"
#include "libbpf_internal.h"
#include "bpf.h"
#include "str_error.h"
struct ring {
ring_buffer_sample_fn sample_cb;
@ -88,8 +89,8 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
err = bpf_map_get_info_by_fd(map_fd, &info, &len);
if (err) {
err = -errno;
pr_warn("ringbuf: failed to get map info for fd=%d: %d\n",
map_fd, err);
pr_warn("ringbuf: failed to get map info for fd=%d: %s\n",
map_fd, errstr(err));
return libbpf_err(err);
}
@ -123,8 +124,8 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
tmp = mmap(NULL, rb->page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("ringbuf: failed to mmap consumer page for map fd=%d: %d\n",
map_fd, err);
pr_warn("ringbuf: failed to mmap consumer page for map fd=%d: %s\n",
map_fd, errstr(err));
goto err_out;
}
r->consumer_pos = tmp;
@ -142,8 +143,8 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
tmp = mmap(NULL, (size_t)mmap_sz, PROT_READ, MAP_SHARED, map_fd, rb->page_size);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("ringbuf: failed to mmap data pages for map fd=%d: %d\n",
map_fd, err);
pr_warn("ringbuf: failed to mmap data pages for map fd=%d: %s\n",
map_fd, errstr(err));
goto err_out;
}
r->producer_pos = tmp;
@ -156,8 +157,8 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
e->data.fd = rb->ring_cnt;
if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, e) < 0) {
err = -errno;
pr_warn("ringbuf: failed to epoll add map fd=%d: %d\n",
map_fd, err);
pr_warn("ringbuf: failed to epoll add map fd=%d: %s\n",
map_fd, errstr(err));
goto err_out;
}
@ -205,7 +206,7 @@ ring_buffer__new(int map_fd, ring_buffer_sample_fn sample_cb, void *ctx,
rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (rb->epoll_fd < 0) {
err = -errno;
pr_warn("ringbuf: failed to create epoll instance: %d\n", err);
pr_warn("ringbuf: failed to create epoll instance: %s\n", errstr(err));
goto err_out;
}
@ -458,7 +459,8 @@ static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd)
err = bpf_map_get_info_by_fd(map_fd, &info, &len);
if (err) {
err = -errno;
pr_warn("user ringbuf: failed to get map info for fd=%d: %d\n", map_fd, err);
pr_warn("user ringbuf: failed to get map info for fd=%d: %s\n",
map_fd, errstr(err));
return err;
}
@ -474,8 +476,8 @@ static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd)
tmp = mmap(NULL, rb->page_size, PROT_READ, MAP_SHARED, map_fd, 0);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("user ringbuf: failed to mmap consumer page for map fd=%d: %d\n",
map_fd, err);
pr_warn("user ringbuf: failed to mmap consumer page for map fd=%d: %s\n",
map_fd, errstr(err));
return err;
}
rb->consumer_pos = tmp;
@ -494,8 +496,8 @@ static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd)
map_fd, rb->page_size);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("user ringbuf: failed to mmap data pages for map fd=%d: %d\n",
map_fd, err);
pr_warn("user ringbuf: failed to mmap data pages for map fd=%d: %s\n",
map_fd, errstr(err));
return err;
}
@ -506,7 +508,7 @@ static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd)
rb_epoll->events = EPOLLOUT;
if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, rb_epoll) < 0) {
err = -errno;
pr_warn("user ringbuf: failed to epoll add map fd=%d: %d\n", map_fd, err);
pr_warn("user ringbuf: failed to epoll add map fd=%d: %s\n", map_fd, errstr(err));
return err;
}
@ -531,7 +533,7 @@ user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts)
rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (rb->epoll_fd < 0) {
err = -errno;
pr_warn("user ringbuf: failed to create epoll instance: %d\n", err);
pr_warn("user ringbuf: failed to create epoll instance: %s\n", errstr(err));
goto err_out;
}

View File

@ -351,10 +351,11 @@ static inline int bpf_load_and_run(struct bpf_load_and_run_opts *opts)
attr.test.ctx_size_in = opts->ctx->sz;
err = skel_sys_bpf(BPF_PROG_RUN, &attr, test_run_attr_sz);
if (err < 0 || (int)attr.test.retval < 0) {
opts->errstr = "failed to execute loader prog";
if (err < 0) {
opts->errstr = "failed to execute loader prog";
set_err;
} else {
opts->errstr = "error returned by loader prog";
err = (int)attr.test.retval;
#ifndef __KERNEL__
errno = -err;

View File

@ -5,6 +5,10 @@
#include <errno.h>
#include "str_error.h"
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif
/* make sure libbpf doesn't use kernel-only integer typedefs */
#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
@ -31,3 +35,70 @@ char *libbpf_strerror_r(int err, char *dst, int len)
}
return dst;
}
const char *errstr(int err)
{
static __thread char buf[12];
if (err > 0)
err = -err;
switch (err) {
case -E2BIG: return "-E2BIG";
case -EACCES: return "-EACCES";
case -EADDRINUSE: return "-EADDRINUSE";
case -EADDRNOTAVAIL: return "-EADDRNOTAVAIL";
case -EAGAIN: return "-EAGAIN";
case -EALREADY: return "-EALREADY";
case -EBADF: return "-EBADF";
case -EBADFD: return "-EBADFD";
case -EBUSY: return "-EBUSY";
case -ECANCELED: return "-ECANCELED";
case -ECHILD: return "-ECHILD";
case -EDEADLK: return "-EDEADLK";
case -EDOM: return "-EDOM";
case -EEXIST: return "-EEXIST";
case -EFAULT: return "-EFAULT";
case -EFBIG: return "-EFBIG";
case -EILSEQ: return "-EILSEQ";
case -EINPROGRESS: return "-EINPROGRESS";
case -EINTR: return "-EINTR";
case -EINVAL: return "-EINVAL";
case -EIO: return "-EIO";
case -EISDIR: return "-EISDIR";
case -ELOOP: return "-ELOOP";
case -EMFILE: return "-EMFILE";
case -EMLINK: return "-EMLINK";
case -EMSGSIZE: return "-EMSGSIZE";
case -ENAMETOOLONG: return "-ENAMETOOLONG";
case -ENFILE: return "-ENFILE";
case -ENODATA: return "-ENODATA";
case -ENODEV: return "-ENODEV";
case -ENOENT: return "-ENOENT";
case -ENOEXEC: return "-ENOEXEC";
case -ENOLINK: return "-ENOLINK";
case -ENOMEM: return "-ENOMEM";
case -ENOSPC: return "-ENOSPC";
case -ENOTBLK: return "-ENOTBLK";
case -ENOTDIR: return "-ENOTDIR";
case -ENOTSUPP: return "-ENOTSUPP";
case -ENOTTY: return "-ENOTTY";
case -ENXIO: return "-ENXIO";
case -EOPNOTSUPP: return "-EOPNOTSUPP";
case -EOVERFLOW: return "-EOVERFLOW";
case -EPERM: return "-EPERM";
case -EPIPE: return "-EPIPE";
case -EPROTO: return "-EPROTO";
case -EPROTONOSUPPORT: return "-EPROTONOSUPPORT";
case -ERANGE: return "-ERANGE";
case -EROFS: return "-EROFS";
case -ESPIPE: return "-ESPIPE";
case -ESRCH: return "-ESRCH";
case -ETXTBSY: return "-ETXTBSY";
case -EUCLEAN: return "-EUCLEAN";
case -EXDEV: return "-EXDEV";
default:
snprintf(buf, sizeof(buf), "%d", err);
return buf;
}
}

View File

@ -6,4 +6,11 @@
char *libbpf_strerror_r(int err, char *dst, int len);
/**
* @brief **errstr()** returns string corresponding to numeric errno
* @param err negative numeric errno
* @return pointer to string representation of the errno, that is invalidated
* upon the next call.
*/
const char *errstr(int err);
#endif /* __LIBBPF_STR_ERROR_H */

View File

@ -20,6 +20,7 @@
#include "libbpf_common.h"
#include "libbpf_internal.h"
#include "hashmap.h"
#include "str_error.h"
/* libbpf's USDT support consists of BPF-side state/code and user-space
* state/code working together in concert. BPF-side parts are defined in
@ -465,8 +466,8 @@ static int parse_vma_segs(int pid, const char *lib_path, struct elf_seg **segs,
goto proceed;
if (!realpath(lib_path, path)) {
pr_warn("usdt: failed to get absolute path of '%s' (err %d), using path as is...\n",
lib_path, -errno);
pr_warn("usdt: failed to get absolute path of '%s' (err %s), using path as is...\n",
lib_path, errstr(-errno));
libbpf_strlcpy(path, lib_path, sizeof(path));
}
@ -475,8 +476,8 @@ proceed:
f = fopen(line, "re");
if (!f) {
err = -errno;
pr_warn("usdt: failed to open '%s' to get base addr of '%s': %d\n",
line, lib_path, err);
pr_warn("usdt: failed to open '%s' to get base addr of '%s': %s\n",
line, lib_path, errstr(err));
return err;
}
@ -606,7 +607,8 @@ static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *
err = parse_elf_segs(elf, path, &segs, &seg_cnt);
if (err) {
pr_warn("usdt: failed to process ELF program segments for '%s': %d\n", path, err);
pr_warn("usdt: failed to process ELF program segments for '%s': %s\n",
path, errstr(err));
goto err_out;
}
@ -708,8 +710,8 @@ static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *
if (vma_seg_cnt == 0) {
err = parse_vma_segs(pid, path, &vma_segs, &vma_seg_cnt);
if (err) {
pr_warn("usdt: failed to get memory segments in PID %d for shared library '%s': %d\n",
pid, path, err);
pr_warn("usdt: failed to get memory segments in PID %d for shared library '%s': %s\n",
pid, path, errstr(err));
goto err_out;
}
}
@ -1047,8 +1049,8 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
if (is_new && bpf_map_update_elem(spec_map_fd, &spec_id, &target->spec, BPF_ANY)) {
err = -errno;
pr_warn("usdt: failed to set USDT spec #%d for '%s:%s' in '%s': %d\n",
spec_id, usdt_provider, usdt_name, path, err);
pr_warn("usdt: failed to set USDT spec #%d for '%s:%s' in '%s': %s\n",
spec_id, usdt_provider, usdt_name, path, errstr(err));
goto err_out;
}
if (!man->has_bpf_cookie &&
@ -1058,9 +1060,9 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
pr_warn("usdt: IP collision detected for spec #%d for '%s:%s' in '%s'\n",
spec_id, usdt_provider, usdt_name, path);
} else {
pr_warn("usdt: failed to map IP 0x%lx to spec #%d for '%s:%s' in '%s': %d\n",
pr_warn("usdt: failed to map IP 0x%lx to spec #%d for '%s:%s' in '%s': %s\n",
target->abs_ip, spec_id, usdt_provider, usdt_name,
path, err);
path, errstr(err));
}
goto err_out;
}
@ -1076,8 +1078,8 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
target->rel_ip, &opts);
err = libbpf_get_error(uprobe_link);
if (err) {
pr_warn("usdt: failed to attach uprobe #%d for '%s:%s' in '%s': %d\n",
i, usdt_provider, usdt_name, path, err);
pr_warn("usdt: failed to attach uprobe #%d for '%s:%s' in '%s': %s\n",
i, usdt_provider, usdt_name, path, errstr(err));
goto err_out;
}
@ -1099,8 +1101,8 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
NULL, &opts_multi);
if (!link->multi_link) {
err = -errno;
pr_warn("usdt: failed to attach uprobe multi for '%s:%s' in '%s': %d\n",
usdt_provider, usdt_name, path, err);
pr_warn("usdt: failed to attach uprobe multi for '%s:%s' in '%s': %s\n",
usdt_provider, usdt_name, path, errstr(err));
goto err_out;
}

View File

@ -223,7 +223,7 @@ struct zip_archive *zip_archive_open(const char *path)
if (!archive) {
munmap(data, size);
return ERR_PTR(-ENOMEM);
};
}
archive->data = data;
archive->size = size;

View File

@ -807,7 +807,7 @@ static int option__cmp(const void *va, const void *vb)
static struct option *options__order(const struct option *opts)
{
int nr_opts = 0, nr_group = 0, nr_parent = 0, len;
const struct option *o, *p = opts;
const struct option *o = NULL, *p = opts;
struct option *opt, *ordered = NULL, *group;
/* flatten the options that have parents */

View File

@ -16,7 +16,6 @@ fixdep
/test_progs-cpuv4
test_verifier_log
feature
test_sock
urandom_read
test_sockmap
test_lirc_mode2_user

View File

@ -10,6 +10,7 @@ TOOLSDIR := $(abspath ../../..)
LIBDIR := $(TOOLSDIR)/lib
BPFDIR := $(LIBDIR)/bpf
TOOLSINCDIR := $(TOOLSDIR)/include
TOOLSARCHINCDIR := $(TOOLSDIR)/arch/$(SRCARCH)/include
BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
APIDIR := $(TOOLSINCDIR)/uapi
ifneq ($(O),)
@ -44,7 +45,7 @@ CFLAGS += -g $(OPT_FLAGS) -rdynamic \
-Wall -Werror -fno-omit-frame-pointer \
$(GENFLAGS) $(SAN_CFLAGS) $(LIBELF_CFLAGS) \
-I$(CURDIR) -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR) \
-I$(TOOLSINCDIR) -I$(APIDIR) -I$(OUTPUT)
-I$(TOOLSINCDIR) -I$(TOOLSARCHINCDIR) -I$(APIDIR) -I$(OUTPUT)
LDFLAGS += $(SAN_LDFLAGS)
LDLIBS += $(LIBELF_LIBS) -lz -lrt -lpthread
@ -83,7 +84,7 @@ endif
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_sock test_sockmap \
test_sockmap \
test_tcpnotify_user test_sysctl \
test_progs-no_alu32
TEST_INST_SUBDIRS := no_alu32
@ -132,7 +133,6 @@ TEST_PROGS := test_kmod.sh \
test_tunnel.sh \
test_lwt_seg6local.sh \
test_lirc_mode2.sh \
test_skb_cgroup_id.sh \
test_flow_dissector.sh \
test_xdp_vlan_mode_generic.sh \
test_xdp_vlan_mode_native.sh \
@ -274,6 +274,7 @@ $(OUTPUT)/liburandom_read.so: urandom_read_lib1.c urandom_read_lib2.c liburandom
$(Q)$(CLANG) $(CLANG_TARGET_ARCH) \
$(filter-out -static,$(CFLAGS) $(LDFLAGS)) \
$(filter %.c,$^) $(filter-out -static,$(LDLIBS)) \
-Wno-unused-command-line-argument \
-fuse-ld=$(LLD) -Wl,-znoseparate-code -Wl,--build-id=sha1 \
-Wl,--version-script=liburandom_read.map \
-fPIC -shared -o $@
@ -282,6 +283,7 @@ $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_r
$(call msg,BINARY,,$@)
$(Q)$(CLANG) $(CLANG_TARGET_ARCH) \
$(filter-out -static,$(CFLAGS) $(LDFLAGS)) $(filter %.c,$^) \
-Wno-unused-command-line-argument \
-lurandom_read $(filter-out -static,$(LDLIBS)) -L$(OUTPUT) \
-fuse-ld=$(LLD) -Wl,-znoseparate-code -Wl,--build-id=sha1 \
-Wl,-rpath=. -o $@
@ -295,25 +297,33 @@ $(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
$(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(RESOLVE_BTFIDS) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
$(call msg,MOD,,$@)
$(Q)$(RM) bpf_testmod/bpf_testmod.ko # force re-compilation
$(Q)$(MAKE) $(submake_extras) RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) -C bpf_testmod
$(Q)$(MAKE) $(submake_extras) -C bpf_testmod \
RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) \
EXTRA_CFLAGS='' EXTRA_LDFLAGS=''
$(Q)cp bpf_testmod/bpf_testmod.ko $@
$(OUTPUT)/bpf_test_no_cfi.ko: $(VMLINUX_BTF) $(RESOLVE_BTFIDS) $(wildcard bpf_test_no_cfi/Makefile bpf_test_no_cfi/*.[ch])
$(call msg,MOD,,$@)
$(Q)$(RM) bpf_test_no_cfi/bpf_test_no_cfi.ko # force re-compilation
$(Q)$(MAKE) $(submake_extras) RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) -C bpf_test_no_cfi
$(Q)$(MAKE) $(submake_extras) -C bpf_test_no_cfi \
RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) \
EXTRA_CFLAGS='' EXTRA_LDFLAGS=''
$(Q)cp bpf_test_no_cfi/bpf_test_no_cfi.ko $@
$(OUTPUT)/bpf_test_modorder_x.ko: $(VMLINUX_BTF) $(RESOLVE_BTFIDS) $(wildcard bpf_test_modorder_x/Makefile bpf_test_modorder_x/*.[ch])
$(call msg,MOD,,$@)
$(Q)$(RM) bpf_test_modorder_x/bpf_test_modorder_x.ko # force re-compilation
$(Q)$(MAKE) $(submake_extras) RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) -C bpf_test_modorder_x
$(Q)$(MAKE) $(submake_extras) -C bpf_test_modorder_x \
RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) \
EXTRA_CFLAGS='' EXTRA_LDFLAGS=''
$(Q)cp bpf_test_modorder_x/bpf_test_modorder_x.ko $@
$(OUTPUT)/bpf_test_modorder_y.ko: $(VMLINUX_BTF) $(RESOLVE_BTFIDS) $(wildcard bpf_test_modorder_y/Makefile bpf_test_modorder_y/*.[ch])
$(call msg,MOD,,$@)
$(Q)$(RM) bpf_test_modorder_y/bpf_test_modorder_y.ko # force re-compilation
$(Q)$(MAKE) $(submake_extras) RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) -C bpf_test_modorder_y
$(Q)$(MAKE) $(submake_extras) -C bpf_test_modorder_y \
RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) \
EXTRA_CFLAGS='' EXTRA_LDFLAGS=''
$(Q)cp bpf_test_modorder_y/bpf_test_modorder_y.ko $@
@ -333,8 +343,8 @@ $(OUTPUT)/runqslower: $(BPFOBJ) | $(DEFAULT_BPFTOOL) $(RUNQSLOWER_OUTPUT)
BPFTOOL_OUTPUT=$(HOST_BUILD_DIR)/bpftool/ \
BPFOBJ_OUTPUT=$(BUILD_DIR)/libbpf/ \
BPFOBJ=$(BPFOBJ) BPF_INCLUDE=$(INCLUDE_DIR) \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(SAN_CFLAGS)' \
EXTRA_LDFLAGS='$(SAN_LDFLAGS)' && \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(SAN_CFLAGS) $(EXTRA_CFLAGS)' \
EXTRA_LDFLAGS='$(SAN_LDFLAGS) $(EXTRA_LDFLAGS)' && \
cp $(RUNQSLOWER_OUTPUT)runqslower $@
TEST_GEN_PROGS_EXTENDED += $(TRUNNER_BPFTOOL)
@ -349,7 +359,6 @@ JSON_WRITER := $(OUTPUT)/json_writer.o
CAP_HELPERS := $(OUTPUT)/cap_helpers.o
NETWORK_HELPERS := $(OUTPUT)/network_helpers.o
$(OUTPUT)/test_sock: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_sockmap: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELPERS)
$(OUTPUT)/test_sock_fields: $(CGROUP_HELPERS) $(TESTING_HELPERS)
@ -368,7 +377,8 @@ $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \
$(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/bpftool
$(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) \
ARCH= CROSS_COMPILE= CC="$(HOSTCC)" LD="$(HOSTLD)" \
EXTRA_CFLAGS='-g $(OPT_FLAGS)' \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(EXTRA_CFLAGS)' \
EXTRA_LDFLAGS='$(EXTRA_LDFLAGS)' \
OUTPUT=$(HOST_BUILD_DIR)/bpftool/ \
LIBBPF_OUTPUT=$(HOST_BUILD_DIR)/libbpf/ \
LIBBPF_DESTDIR=$(HOST_SCRATCH_DIR)/ \
@ -379,7 +389,8 @@ $(CROSS_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \
$(BPFOBJ) | $(BUILD_DIR)/bpftool
$(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) \
ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) \
EXTRA_CFLAGS='-g $(OPT_FLAGS)' \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(EXTRA_CFLAGS)' \
EXTRA_LDFLAGS='$(EXTRA_LDFLAGS)' \
OUTPUT=$(BUILD_DIR)/bpftool/ \
LIBBPF_OUTPUT=$(BUILD_DIR)/libbpf/ \
LIBBPF_DESTDIR=$(SCRATCH_DIR)/ \
@ -402,8 +413,8 @@ $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \
$(APIDIR)/linux/bpf.h \
| $(BUILD_DIR)/libbpf
$(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) OUTPUT=$(BUILD_DIR)/libbpf/ \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(SAN_CFLAGS)' \
EXTRA_LDFLAGS='$(SAN_LDFLAGS)' \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(SAN_CFLAGS) $(EXTRA_CFLAGS)' \
EXTRA_LDFLAGS='$(SAN_LDFLAGS) $(EXTRA_LDFLAGS)' \
DESTDIR=$(SCRATCH_DIR) prefix= all install_headers
ifneq ($(BPFOBJ),$(HOST_BPFOBJ))
@ -411,7 +422,9 @@ $(HOST_BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \
$(APIDIR)/linux/bpf.h \
| $(HOST_BUILD_DIR)/libbpf
$(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) \
EXTRA_CFLAGS='-g $(OPT_FLAGS)' ARCH= CROSS_COMPILE= \
ARCH= CROSS_COMPILE= \
EXTRA_CFLAGS='-g $(OPT_FLAGS) $(EXTRA_CFLAGS)' \
EXTRA_LDFLAGS='$(EXTRA_LDFLAGS)' \
OUTPUT=$(HOST_BUILD_DIR)/libbpf/ \
CC="$(HOSTCC)" LD="$(HOSTLD)" \
DESTDIR=$(HOST_SCRATCH_DIR)/ prefix= all install_headers
@ -460,6 +473,7 @@ endef
IS_LITTLE_ENDIAN = $(shell $(CC) -dM -E - </dev/null | \
grep 'define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__')
MENDIAN=$(if $(IS_LITTLE_ENDIAN),-mlittle-endian,-mbig-endian)
BPF_TARGET_ENDIAN=$(if $(IS_LITTLE_ENDIAN),--target=bpfel,--target=bpfeb)
ifneq ($(CROSS_COMPILE),)
CLANG_TARGET_ARCH = --target=$(notdir $(CROSS_COMPILE:%-=%))
@ -487,17 +501,17 @@ $(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
# $4 - binary name
define CLANG_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$4,$2)
$(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v3 -o $2
$(Q)$(CLANG) $3 -O2 $(BPF_TARGET_ENDIAN) -c $1 -mcpu=v3 -o $2
endef
# Similar to CLANG_BPF_BUILD_RULE, but with disabled alu32
define CLANG_NOALU32_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$4,$2)
$(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2
$(Q)$(CLANG) $3 -O2 $(BPF_TARGET_ENDIAN) -c $1 -mcpu=v2 -o $2
endef
# Similar to CLANG_BPF_BUILD_RULE, but with cpu-v4
define CLANG_CPUV4_BPF_BUILD_RULE
$(call msg,CLNG-BPF,$4,$2)
$(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v4 -o $2
$(Q)$(CLANG) $3 -O2 $(BPF_TARGET_ENDIAN) -c $1 -mcpu=v4 -o $2
endef
# Build BPF object using GCC
define GCC_BPF_BUILD_RULE
@ -637,10 +651,11 @@ $(TRUNNER_BPF_SKELS_LINKED): $(TRUNNER_OUTPUT)/%: $$$$(%-deps) $(BPFTOOL) | $(TR
# When the compiler generates a %.d file, only skel basenames (not
# full paths) are specified as prerequisites for corresponding %.o
# file. This target makes %.skel.h basename dependent on full paths,
# linking generated %.d dependency with actual %.skel.h files.
$(notdir %.skel.h): $(TRUNNER_OUTPUT)/%.skel.h
@true
# file. vpath directives below instruct make to search for skel files
# in TRUNNER_OUTPUT, if they are not present in the working directory.
vpath %.skel.h $(TRUNNER_OUTPUT)
vpath %.lskel.h $(TRUNNER_OUTPUT)
vpath %.subskel.h $(TRUNNER_OUTPUT)
endif
@ -727,6 +742,7 @@ TRUNNER_EXTRA_SOURCES := test_progs.c \
unpriv_helpers.c \
netlink_helpers.c \
jit_disasm_helpers.c \
io_helpers.c \
test_loader.c \
xsk.c \
disasm.c \

View File

@ -4,6 +4,7 @@
#include <argp.h>
#include <unistd.h>
#include <stdint.h>
#include "bpf_util.h"
#include "bench.h"
#include "trigger_bench.skel.h"
#include "trace_helpers.h"
@ -72,7 +73,7 @@ static __always_inline void inc_counter(struct counter *counters)
unsigned slot;
if (unlikely(tid == 0))
tid = syscall(SYS_gettid);
tid = sys_gettid();
/* multiplicative hashing, it's fast */
slot = 2654435769U * tid;

View File

@ -582,4 +582,10 @@ extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
unsigned int flags__k, void *aux__ign) __ksym;
#define bpf_wq_set_callback(timer, cb, flags) \
bpf_wq_set_callback_impl(timer, cb, flags, NULL)
struct bpf_iter_kmem_cache;
extern int bpf_iter_kmem_cache_new(struct bpf_iter_kmem_cache *it) __weak __ksym;
extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
#endif

View File

@ -40,6 +40,14 @@ DECLARE_TRACE(bpf_testmod_test_nullable_bare,
TP_ARGS(ctx__nullable)
);
struct sk_buff;
DECLARE_TRACE(bpf_testmod_test_raw_tp_null,
TP_PROTO(struct sk_buff *skb),
TP_ARGS(skb)
);
#undef BPF_TESTMOD_DECLARE_TRACE
#ifdef DECLARE_TRACE_WRITABLE
#define BPF_TESTMOD_DECLARE_TRACE(call, proto, args, size) \

View File

@ -245,6 +245,39 @@ __bpf_kfunc void bpf_testmod_ctx_release(struct bpf_testmod_ctx *ctx)
call_rcu(&ctx->rcu, testmod_free_cb);
}
static struct bpf_testmod_ops3 *st_ops3;
static int bpf_testmod_test_3(void)
{
return 0;
}
static int bpf_testmod_test_4(void)
{
return 0;
}
static struct bpf_testmod_ops3 __bpf_testmod_ops3 = {
.test_1 = bpf_testmod_test_3,
.test_2 = bpf_testmod_test_4,
};
static void bpf_testmod_test_struct_ops3(void)
{
if (st_ops3)
st_ops3->test_1();
}
__bpf_kfunc void bpf_testmod_ops3_call_test_1(void)
{
st_ops3->test_1();
}
__bpf_kfunc void bpf_testmod_ops3_call_test_2(void)
{
st_ops3->test_2();
}
struct bpf_testmod_btf_type_tag_1 {
int a;
};
@ -380,6 +413,10 @@ bpf_testmod_test_read(struct file *file, struct kobject *kobj,
(void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
(void)trace_bpf_testmod_test_raw_tp_null(NULL);
bpf_testmod_test_struct_ops3();
struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
sizeof(int)), GFP_KERNEL);
if (struct_arg3 != NULL) {
@ -584,6 +621,8 @@ BTF_ID_FLAGS(func, bpf_kfunc_trusted_num_test, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_kfunc_rcu_task_test, KF_RCU)
BTF_ID_FLAGS(func, bpf_testmod_ctx_create, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_testmod_ctx_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_testmod_ops3_call_test_1)
BTF_ID_FLAGS(func, bpf_testmod_ops3_call_test_2)
BTF_KFUNCS_END(bpf_testmod_common_kfunc_ids)
BTF_ID_LIST(bpf_testmod_dtor_ids)
@ -1094,6 +1133,10 @@ static const struct bpf_verifier_ops bpf_testmod_verifier_ops = {
.is_valid_access = bpf_testmod_ops_is_valid_access,
};
static const struct bpf_verifier_ops bpf_testmod_verifier_ops3 = {
.is_valid_access = bpf_testmod_ops_is_valid_access,
};
static int bpf_dummy_reg(void *kdata, struct bpf_link *link)
{
struct bpf_testmod_ops *ops = kdata;
@ -1173,6 +1216,68 @@ struct bpf_struct_ops bpf_testmod_ops2 = {
.owner = THIS_MODULE,
};
static int st_ops3_reg(void *kdata, struct bpf_link *link)
{
int err = 0;
mutex_lock(&st_ops_mutex);
if (st_ops3) {
pr_err("st_ops has already been registered\n");
err = -EEXIST;
goto unlock;
}
st_ops3 = kdata;
unlock:
mutex_unlock(&st_ops_mutex);
return err;
}
static void st_ops3_unreg(void *kdata, struct bpf_link *link)
{
mutex_lock(&st_ops_mutex);
st_ops3 = NULL;
mutex_unlock(&st_ops_mutex);
}
static void test_1_recursion_detected(struct bpf_prog *prog)
{
struct bpf_prog_stats *stats;
stats = this_cpu_ptr(prog->stats);
printk("bpf_testmod: oh no, recursing into test_1, recursion_misses %llu",
u64_stats_read(&stats->misses));
}
static int st_ops3_check_member(const struct btf_type *t,
const struct btf_member *member,
const struct bpf_prog *prog)
{
u32 moff = __btf_member_bit_offset(t, member) / 8;
switch (moff) {
case offsetof(struct bpf_testmod_ops3, test_1):
prog->aux->priv_stack_requested = true;
prog->aux->recursion_detected = test_1_recursion_detected;
fallthrough;
default:
break;
}
return 0;
}
struct bpf_struct_ops bpf_testmod_ops3 = {
.verifier_ops = &bpf_testmod_verifier_ops3,
.init = bpf_testmod_ops_init,
.init_member = bpf_testmod_ops_init_member,
.reg = st_ops3_reg,
.unreg = st_ops3_unreg,
.check_member = st_ops3_check_member,
.cfi_stubs = &__bpf_testmod_ops3,
.name = "bpf_testmod_ops3",
.owner = THIS_MODULE,
};
static int bpf_test_mod_st_ops__test_prologue(struct st_ops_args *args)
{
return 0;
@ -1331,6 +1436,7 @@ static int bpf_testmod_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_testmod_kfunc_set);
ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops);
ret = ret ?: register_bpf_struct_ops(&bpf_testmod_ops2, bpf_testmod_ops2);
ret = ret ?: register_bpf_struct_ops(&bpf_testmod_ops3, bpf_testmod_ops3);
ret = ret ?: register_bpf_struct_ops(&testmod_st_ops, bpf_testmod_st_ops);
ret = ret ?: register_btf_id_dtor_kfuncs(bpf_testmod_dtors,
ARRAY_SIZE(bpf_testmod_dtors),

View File

@ -94,6 +94,11 @@ struct bpf_testmod_ops2 {
int (*test_1)(void);
};
struct bpf_testmod_ops3 {
int (*test_1)(void);
int (*test_2)(void);
};
struct st_ops_args {
u64 a;
};

Some files were not shown because too many files have changed in this diff Show More