bpf-next-for-netdev

-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZi9+AAAKCRDbK58LschI
 g0nEAP487m7L0nLVriC2oIOWsi29tklW3etm6DO7gmGRGIHgrgEAnMyV1xBj3bGj
 v6jJwDcybCym1hLx+1x1JCZ4eoAFswE=
 =xbna
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-04-29

We've added 147 non-merge commits during the last 32 day(s) which contain
a total of 158 files changed, 9400 insertions(+), 2213 deletions(-).

The main changes are:

1) Add an internal-only BPF per-CPU instruction for resolving per-CPU
   memory addresses and implement support for it in the x86 BPF JIT.
   This allows inlining per-CPU array and hashmap lookups as well as
   the bpf_get_smp_processor_id() helper, from Andrii Nakryiko.

2) Add BPF link support for sk_msg and sk_skb programs (attach sketch
   after this list), from Yonghong Song.

3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
   atomics in bpf_arena which can be JITed as a single x86 instruction,
   from Alexei Starovoitov.

4) Add support for passing a mark with the bpf_fib_lookup helper (usage
   sketch after this list), from Anton Protopopov.

5) Add a new bpf_wq API for deferring events and refactor sleepable
   bpf_timer code to keep common code where possible (sketch after this
   list), from Benjamin Tissoires.

6) Fix BPF_PROG_TEST_RUN infra with regard to bpf_dummy_struct_ops programs
   to check when NULL is passed for non-nullable parameters,
   from Eduard Zingerman.

7) Harden the BPF verifier's and/or/xor value tracking,
   from Harishankar Vishwanathan.

8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel
   crypto subsystem (sketch after this list), from Vadim Fedorenko.

9) Various improvements to the BPF instruction set standardization doc,
   from Dave Thaler.

10) Extend libbpf APIs to partially consume items from the BPF ringbuffer
    (usage sketch after this list), from Andrea Righi.

11) Bigger batch of BPF selftests refactoring to use common network helpers
    and to drop duplicate code, from Geliang Tang.

12) Support the bpf_tail_call_static() helper for BPF programs built with
    GCC 13 (example after this list), from Jose E. Marchesi.

13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
    program to have code sections where preemption is disabled (sketch
    after this list), from Kumar Kartikeya Dwivedi.

14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs (example
    after this list), from David Vernet.

15) Extend the BPF verifier to allow different input maps for a given
    bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu.

16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions
    for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan.

17) Shut up a false-positive KMSAN splat in interpreter mode by unpoisoning
    the stack memory, from Martin KaFai Lau.

18) Improve xsk selftest coverage with new tests on maximum and minimum
    hardware ring size configurations, from Tushar Vyavahare.

19) Various ReST man page fixes as well as documentation and bash completion
    improvements for bpftool, from Rameez Rehman & Quentin Monnet.

20) Fix libbpf with regard to dumping subsequent char arrays,
    from Quentin Deslandes.
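
For the sk_msg/sk_skb BPF links in 2), a minimal user-space sketch. It
assumes the bpf_program__attach_sockmap() API that the series adds to
libbpf; program and map handles are placeholders:

  #include <stdio.h>
  #include <errno.h>
  #include <bpf/libbpf.h>

  /* Attach a verdict program to a sockmap via a BPF link instead of the
   * legacy bpf_prog_attach() path; detaching happens automatically when
   * the link is destroyed (or it can be pinned in bpffs).
   */
  static struct bpf_link *attach_verdict(struct bpf_program *prog, int sock_map_fd)
  {
          struct bpf_link *link = bpf_program__attach_sockmap(prog, sock_map_fd);

          if (!link)
                  fprintf(stderr, "sockmap link failed: %d\n", -errno);
          return link;
  }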
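
For 4), a rough BPF-side sketch of a TC program that lets the FIB lookup
honor skb->mark. The BPF_FIB_LOOKUP_MARK flag and the params.mark member
are taken from the series and should be treated as assumptions; the usual
address/protocol setup is elided:

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int route_with_mark(struct __sk_buff *skb)
  {
          struct bpf_fib_lookup params = {};
          long rc;

          params.family  = 2 /* AF_INET */;
          params.ifindex = skb->ifindex;
          params.mark    = skb->mark;   /* only consulted with the new flag */
          /* ... fill l4_protocol and src/dst addresses as usual ... */

          rc = bpf_fib_lookup(skb, &params, sizeof(params), BPF_FIB_LOOKUP_MARK);
          return rc == BPF_FIB_LKUP_RET_SUCCESS ? TC_ACT_OK : TC_ACT_SHOT;
  }

  char LICENSE[] SEC("license") = "GPL";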
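
For the bpf_wq API in 5), a minimal sketch of deferring work from a
non-sleepable program. The kfunc names (bpf_wq_init(), bpf_wq_set_callback(),
bpf_wq_start()) and the callback shape follow the series and its selftests
and should be read as assumptions; their declarations live in the selftests'
bpf_experimental.h:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include "bpf_experimental.h"   /* bpf_wq kfunc declarations (selftests) */

  struct elem {
          struct bpf_wq wq;
          __u64 payload;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, int);
          __type(value, struct elem);
  } wq_map SEC(".maps");

  static int wq_cb(void *map, int *key, void *value)
  {
          /* runs later from a sleepable (workqueue) context */
          return 0;
  }

  SEC("tc")
  int defer_work(struct __sk_buff *skb)
  {
          int key = 0;
          struct elem *e = bpf_map_lookup_elem(&wq_map, &key);

          if (!e || bpf_wq_init(&e->wq, &wq_map, 0))
                  return 0;
          bpf_wq_set_callback(&e->wq, wq_cb, 0);
          bpf_wq_start(&e->wq, 0);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";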
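
For the crypto kfuncs in 8), a condensed sketch. The kfunc names and rough
signatures are taken from the series and should be treated as assumptions;
in practice the context would usually be created once in a sleepable program
and stashed in a map kptr for later TC/XDP use:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  struct bpf_crypto_ctx *bpf_crypto_ctx_create(const struct bpf_crypto_params *params,
                                               __u32 params__sz, int *err) __ksym;
  void bpf_crypto_ctx_release(struct bpf_crypto_ctx *ctx) __ksym;
  int bpf_crypto_encrypt(struct bpf_crypto_ctx *ctx, const struct bpf_dynptr *src,
                         const struct bpf_dynptr *dst, const struct bpf_dynptr *siv) __ksym;

  __u8 key[16], plain[64], cipher[64];

  SEC("syscall")   /* context creation needs a sleepable program */
  int encrypt_once(void *ctx)
  {
          struct bpf_crypto_params params = {
                  .type    = "skcipher",
                  .algo    = "ecb(aes)",
                  .key_len = sizeof(key),
          };
          struct bpf_crypto_ctx *cctx;
          struct bpf_dynptr src, dst, siv;
          int err = 0;

          __builtin_memcpy(params.key, key, sizeof(key));
          cctx = bpf_crypto_ctx_create(&params, sizeof(params), &err);
          if (!cctx)
                  return err;

          bpf_dynptr_from_mem(plain, sizeof(plain), 0, &src);
          bpf_dynptr_from_mem(cipher, sizeof(cipher), 0, &dst);
          bpf_dynptr_from_mem(cipher, 0, 0, &siv);   /* ECB: empty IV/state */
          err = bpf_crypto_encrypt(cctx, &src, &dst, &siv);
          bpf_crypto_ctx_release(cctx);
          return err;
  }

  char LICENSE[] SEC("license") = "GPL";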
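
For 10), a user-space sketch of bounded ringbuffer draining with the new
libbpf API (assumed signature: ring_buffer__consume_n(rb, n), returning the
number of records consumed):

  #include <stddef.h>
  #include <bpf/libbpf.h>

  static int handle_event(void *ctx, void *data, size_t len)
  {
          /* process one record */
          return 0;
  }

  /* Drain at most 64 records so one busy ring cannot monopolize the
   * event loop; left-over records stay queued for the next round.
   */
  static int drain_some(int ringbuf_map_fd)
  {
          struct ring_buffer *rb;
          int n;

          rb = ring_buffer__new(ringbuf_map_fd, handle_event, NULL, NULL);
          if (!rb)
                  return -1;

          n = ring_buffer__consume_n(rb, 64);
          ring_buffer__free(rb);
          return n;
  }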
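
For 12), the helper itself is unchanged; the point is that the macro in the
sketch below now also compiles with GCC 13's BPF backend, not only clang:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
          __uint(max_entries, 2);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(__u32));
  } jmp_table SEC(".maps");

  SEC("tc")
  int entry(struct __sk_buff *skb)
  {
          /* the index must be a compile-time constant for the _static variant */
          bpf_tail_call_static(skb, &jmp_table, 0);
          return 0;       /* only reached if the tail call fails */
  }

  char LICENSE[] SEC("license") = "GPL";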
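
For 13), a sketch of a non-preemptible section. The kfunc declarations are
assumed to be exposed the way the selftests do it; every disable must be
balanced by an enable before the program exits:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  void bpf_preempt_disable(void) __ksym;
  void bpf_preempt_enable(void) __ksym;

  __u64 scratch;

  SEC("tc")
  int nonpreempt_section(struct __sk_buff *skb)
  {
          bpf_preempt_disable();
          /* critical section: e.g. touch per-CPU scratch data without
           * being preempted or migrated in between
           */
          scratch++;
          bpf_preempt_enable();
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";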
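
For 14), a sketch of a BPF_PROG_TYPE_SYSCALL program calling existing task
kfuncs, which this change now permits; the program would be driven via
BPF_PROG_TEST_RUN:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
  void bpf_task_release(struct task_struct *p) __ksym;

  int found;

  SEC("syscall")
  int probe_init_task(void *ctx)
  {
          struct task_struct *t = bpf_task_from_pid(1);

          if (t) {
                  found = 1;
                  bpf_task_release(t);   /* acquired reference must be released */
          }
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";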

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits)
  bpf, docs: Clarify PC use in instruction-set.rst
  bpf_helpers.h: Define bpf_tail_call_static when building with GCC
  bpf, docs: Add introduction for use in the ISA Internet Draft
  selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us
  bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args
  selftests/bpf: dummy_st_ops should reject 0 for non-nullable params
  bpf: check bpf_dummy_struct_ops program params for test runs
  selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops
  selftests/bpf: adjust dummy_st_ops_success to detect additional error
  bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable
  selftests/bpf: Add ring_buffer__consume_n test.
  bpf: Add bpf_guard_preempt() convenience macro
  selftests: bpf: crypto: add benchmark for crypto functions
  selftests: bpf: crypto skcipher algo selftests
  bpf: crypto: add skcipher to bpf crypto
  bpf: make common crypto API for TC/XDP programs
  bpf: update the comment for BTF_FIELDS_MAX
  selftests/bpf: Fix wq test.
  selftests/bpf: Use make_sockaddr in test_sock_addr
  selftests/bpf: Use connect_to_addr in test_sock_addr
  ...
====================

Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

@ -5,7 +5,11 @@
BPF Instruction Set Architecture (ISA) BPF Instruction Set Architecture (ISA)
====================================== ======================================
This document specifies the BPF instruction set architecture (ISA). eBPF (which is no longer an acronym for anything), also commonly
referred to as BPF, is a technology with origins in the Linux kernel
that can run untrusted programs in a privileged context such as an
operating system kernel. This document specifies the BPF instruction
set architecture (ISA).
Documentation conventions Documentation conventions
========================= =========================
@ -43,7 +47,7 @@ a type's signedness (`S`) and bit width (`N`), respectively.
===== ========= ===== =========
For example, `u32` is a type whose valid values are all the 32-bit unsigned For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a types whose valid values are all the 16-bit signed numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers. numbers.
Functions Functions
@ -108,7 +112,7 @@ conformance group means it must support all instructions in that conformance
group. group.
The use of named conformance groups enables interoperability between a runtime The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools as such compilers that generate that executes instructions, and tools such as compilers that generate
instructions for the runtime. Thus, capability discovery in terms of instructions for the runtime. Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools. conformance groups might be done manually by users or automatically by tools.
@ -181,10 +185,13 @@ A basic instruction is encoded as follows::
(`64-bit immediate instructions`_ reuse this field for other purposes) (`64-bit immediate instructions`_ reuse this field for other purposes)
**dst_reg** **dst_reg**
destination register number (0-10) destination register number (0-10), unless otherwise specified
(future instructions might reuse this field for other purposes)
**offset** **offset**
signed integer offset used with pointer arithmetic signed integer offset used with pointer arithmetic, except where
otherwise specified (some arithmetic instructions reuse this field
for other purposes)
**imm** **imm**
signed integer immediate value signed integer immediate value
@ -228,10 +235,12 @@ This is depicted in the following figure::
operation to perform, encoded as explained above operation to perform, encoded as explained above
**regs** **regs**
The source and destination register numbers, encoded as explained above The source and destination register numbers (unless otherwise
specified), encoded as explained above
**offset** **offset**
signed integer offset used with pointer arithmetic signed integer offset used with pointer arithmetic, unless
otherwise specified
**imm** **imm**
signed integer immediate value signed integer immediate value
@ -342,8 +351,8 @@ where '(u32)' indicates that the upper 32 bits are zeroed.
dst = dst ^ imm dst = dst ^ imm
Note that most instructions have instruction offset of 0. Only three instructions Note that most arithmetic instructions have 'offset' set to 0. Only three instructions
(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero offset. (``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'.
Division, multiplication, and modulo operations for ``ALU`` are part Division, multiplication, and modulo operations for ``ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and of the "divmul32" conformance group, and division, multiplication, and
@ -365,15 +374,15 @@ Note that there are varying definitions of the signed modulo operation
when the dividend or divisor are negative, where implementations often when the dividend or divisor are negative, where implementations often
vary by language such that Python, Ruby, etc. differ from C, Go, Java, vary by language such that Python, Ruby, etc. differ from C, Go, Java,
etc. This specification requires that signed modulo use truncated division etc. This specification requires that signed modulo use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.: (where -13 % 3 == -1) as implemented in C, Go, etc.::
a % n = a - n * trunc(a / n) a % n = a - n * trunc(a / n)
The ``MOVSX`` instruction does a move operation with sign extension. The ``MOVSX`` instruction does a move operation with sign extension.
``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32 ``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
bit operands, and zeroes the remaining upper 32 bits. 32-bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit ``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64 bit operands. Unlike other arithmetic instructions, operands into 64-bit operands. Unlike other arithmetic instructions,
``MOVSX`` is only defined for register source operands (``X``). ``MOVSX`` is only defined for register source operands (``X``).
The ``NEG`` instruction is only defined when the source bit is clear The ``NEG`` instruction is only defined when the source bit is clear
@ -411,19 +420,19 @@ conformance group.
Examples: Examples:
``{END, TO_LE, ALU}`` with imm = 16/32/64 means:: ``{END, TO_LE, ALU}`` with 'imm' = 16/32/64 means::
dst = htole16(dst) dst = htole16(dst)
dst = htole32(dst) dst = htole32(dst)
dst = htole64(dst) dst = htole64(dst)
``{END, TO_BE, ALU}`` with imm = 16/32/64 means:: ``{END, TO_BE, ALU}`` with 'imm' = 16/32/64 means::
dst = htobe16(dst) dst = htobe16(dst)
dst = htobe32(dst) dst = htobe32(dst)
dst = htobe64(dst) dst = htobe64(dst)
``{END, TO_LE, ALU64}`` with imm = 16/32/64 means:: ``{END, TO_LE, ALU64}`` with 'imm' = 16/32/64 means::
dst = bswap16(dst) dst = bswap16(dst)
dst = bswap32(dst) dst = bswap32(dst)
@ -438,27 +447,33 @@ otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified. group unless otherwise specified.
The 'code' field encodes the operation as below: The 'code' field encodes the operation as below:
======== ===== ======= =============================== =================================================== ======== ===== ======= ================================= ===================================================
code value src_reg description notes code value src_reg description notes
======== ===== ======= =============================== =================================================== ======== ===== ======= ================================= ===================================================
JA 0x0 0x0 PC += offset {JA, K, JMP} only JA 0x0 0x0 PC += offset {JA, K, JMP} only
JA 0x0 0x0 PC += imm {JA, K, JMP32} only JA 0x0 0x0 PC += imm {JA, K, JMP32} only
JEQ 0x1 any PC += offset if dst == src JEQ 0x1 any PC += offset if dst == src
JGT 0x2 any PC += offset if dst > src unsigned JGT 0x2 any PC += offset if dst > src unsigned
JGE 0x3 any PC += offset if dst >= src unsigned JGE 0x3 any PC += offset if dst >= src unsigned
JSET 0x4 any PC += offset if dst & src JSET 0x4 any PC += offset if dst & src
JNE 0x5 any PC += offset if dst != src JNE 0x5 any PC += offset if dst != src
JSGT 0x6 any PC += offset if dst > src signed JSGT 0x6 any PC += offset if dst > src signed
JSGE 0x7 any PC += offset if dst >= src signed JSGE 0x7 any PC += offset if dst >= src signed
CALL 0x8 0x0 call helper function by address {CALL, K, JMP} only, see `Helper functions`_ CALL 0x8 0x0 call helper function by static ID {CALL, K, JMP} only, see `Helper functions`_
CALL 0x8 0x1 call PC += imm {CALL, K, JMP} only, see `Program-local functions`_ CALL 0x8 0x1 call PC += imm {CALL, K, JMP} only, see `Program-local functions`_
CALL 0x8 0x2 call helper function by BTF ID {CALL, K, JMP} only, see `Helper functions`_ CALL 0x8 0x2 call helper function by BTF ID {CALL, K, JMP} only, see `Helper functions`_
EXIT 0x9 0x0 return {CALL, K, JMP} only EXIT 0x9 0x0 return {CALL, K, JMP} only
JLT 0xa any PC += offset if dst < src unsigned JLT 0xa any PC += offset if dst < src unsigned
JLE 0xb any PC += offset if dst <= src unsigned JLE 0xb any PC += offset if dst <= src unsigned
JSLT 0xc any PC += offset if dst < src signed JSLT 0xc any PC += offset if dst < src signed
JSLE 0xd any PC += offset if dst <= src signed JSLE 0xd any PC += offset if dst <= src signed
======== ===== ======= =============================== =================================================== ======== ===== ======= ================================= ===================================================
where 'PC' denotes the program counter, and the offset to increment by
is in units of 64-bit instructions relative to the instruction following
the jump instruction. Thus 'PC += 1' skips execution of the next
instruction if it's a basic instruction or results in undefined behavior
if the next instruction is a 128-bit wide instruction.
The BPF program needs to store the return value into register R0 before doing an The BPF program needs to store the return value into register R0 before doing an
``EXIT``. ``EXIT``.
@ -475,7 +490,7 @@ where 's>=' indicates a signed '>=' comparison.
gotol +imm gotol +imm
where 'imm' means the branch offset comes from insn 'imm' field. where 'imm' means the branch offset comes from the 'imm' field.
Note that there are two flavors of ``JA`` instructions. The Note that there are two flavors of ``JA`` instructions. The
``JMP`` class permits a 16-bit jump offset specified by the 'offset' ``JMP`` class permits a 16-bit jump offset specified by the 'offset'
@ -493,26 +508,26 @@ Helper functions
Helper functions are a concept whereby BPF programs can call into a Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform. set of function calls exposed by the underlying platform.
Historically, each helper function was identified by an address Historically, each helper function was identified by a static ID
encoded in the imm field. The available helper functions may differ encoded in the 'imm' field. The available helper functions may differ
for each program type, but address values are unique across all program types. for each program type, but static IDs are unique across all program types.
Platforms that support the BPF Type Format (BTF) support identifying Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type. identifies the helper name and type.
Program-local functions Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to caller, and are referenced by offset from the call instruction, similar to
``JA``. The offset is encoded in the imm field of the call instruction. ``JA``. The offset is encoded in the 'imm' field of the call instruction.
A ``EXIT`` within the program-local function will return to the caller. An ``EXIT`` within the program-local function will return to the caller.
Load and store instructions Load and store instructions
=========================== ===========================
For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
8-bit 'opcode' field is divided as:: 8-bit 'opcode' field is divided as follows::
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|mode |sz |class| |mode |sz |class|
@ -580,7 +595,7 @@ instructions that transfer data between a register and memory.
dst = *(signed size *) (src + offset) dst = *(signed size *) (src + offset)
Where size is one of: ``B``, ``H``, or ``W``, and Where '<size>' is one of: ``B``, ``H``, or ``W``, and
'signed size' is one of: s8, s16, or s32. 'signed size' is one of: s8, s16, or s32.
Atomic operations Atomic operations
@ -662,11 +677,11 @@ src_reg pseudocode imm type dst type
======= ========================================= =========== ============== ======= ========================================= =========== ==============
0x0 dst = (next_imm << 32) | imm integer integer 0x0 dst = (next_imm << 32) | imm integer integer
0x1 dst = map_by_fd(imm) map fd map 0x1 dst = map_by_fd(imm) map fd map
0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data pointer 0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data address
0x3 dst = var_addr(imm) variable id data pointer 0x3 dst = var_addr(imm) variable id data address
0x4 dst = code_addr(imm) integer code pointer 0x4 dst = code_addr(imm) integer code address
0x5 dst = map_by_idx(imm) map index map 0x5 dst = map_by_idx(imm) map index map
0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data pointer 0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data address
======= ========================================= =========== ============== ======= ========================================= =========== ==============
where where


@ -3822,6 +3822,14 @@ F: kernel/bpf/tnum.c
F: kernel/bpf/trampoline.c F: kernel/bpf/trampoline.c
F: kernel/bpf/verifier.c F: kernel/bpf/verifier.c
BPF [CRYPTO]
M: Vadim Fedorenko <vadim.fedorenko@linux.dev>
L: bpf@vger.kernel.org
S: Maintained
F: crypto/bpf_crypto_skcipher.c
F: include/linux/bpf_crypto.h
F: kernel/bpf/crypto.c
BPF [DOCUMENTATION] (Related to Standardization) BPF [DOCUMENTATION] (Related to Standardization)
R: David Vernet <void@manifault.com> R: David Vernet <void@manifault.com>
L: bpf@vger.kernel.org L: bpf@vger.kernel.org


@ -29,6 +29,7 @@
#define TCALL_CNT (MAX_BPF_JIT_REG + 2) #define TCALL_CNT (MAX_BPF_JIT_REG + 2)
#define TMP_REG_3 (MAX_BPF_JIT_REG + 3) #define TMP_REG_3 (MAX_BPF_JIT_REG + 3)
#define FP_BOTTOM (MAX_BPF_JIT_REG + 4) #define FP_BOTTOM (MAX_BPF_JIT_REG + 4)
#define ARENA_VM_START (MAX_BPF_JIT_REG + 5)
#define check_imm(bits, imm) do { \ #define check_imm(bits, imm) do { \
if ((((imm) > 0) && ((imm) >> (bits))) || \ if ((((imm) > 0) && ((imm) >> (bits))) || \
@ -67,6 +68,8 @@ static const int bpf2a64[] = {
/* temporary register for blinding constants */ /* temporary register for blinding constants */
[BPF_REG_AX] = A64_R(9), [BPF_REG_AX] = A64_R(9),
[FP_BOTTOM] = A64_R(27), [FP_BOTTOM] = A64_R(27),
/* callee saved register for kern_vm_start address */
[ARENA_VM_START] = A64_R(28),
}; };
struct jit_ctx { struct jit_ctx {
@ -79,6 +82,7 @@ struct jit_ctx {
__le32 *ro_image; __le32 *ro_image;
u32 stack_size; u32 stack_size;
int fpb_offset; int fpb_offset;
u64 user_vm_start;
}; };
struct bpf_plt { struct bpf_plt {
@ -295,7 +299,7 @@ static bool is_lsi_offset(int offset, int scale)
#define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8) #define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8)
static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf, static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
bool is_exception_cb) bool is_exception_cb, u64 arena_vm_start)
{ {
const struct bpf_prog *prog = ctx->prog; const struct bpf_prog *prog = ctx->prog;
const bool is_main_prog = !bpf_is_subprog(prog); const bool is_main_prog = !bpf_is_subprog(prog);
@ -306,6 +310,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
const u8 fp = bpf2a64[BPF_REG_FP]; const u8 fp = bpf2a64[BPF_REG_FP];
const u8 tcc = bpf2a64[TCALL_CNT]; const u8 tcc = bpf2a64[TCALL_CNT];
const u8 fpb = bpf2a64[FP_BOTTOM]; const u8 fpb = bpf2a64[FP_BOTTOM];
const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
const int idx0 = ctx->idx; const int idx0 = ctx->idx;
int cur_offset; int cur_offset;
@ -411,6 +416,10 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf,
/* Set up function call stack */ /* Set up function call stack */
emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
if (arena_vm_start)
emit_a64_mov_i64(arena_vm_base, arena_vm_start, ctx);
return 0; return 0;
} }
@ -738,6 +747,7 @@ static void build_epilogue(struct jit_ctx *ctx, bool is_exception_cb)
#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0) #define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27) #define BPF_FIXUP_REG_MASK GENMASK(31, 27)
#define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
bool ex_handler_bpf(const struct exception_table_entry *ex, bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs) struct pt_regs *regs)
@ -745,7 +755,8 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup); off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup); int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
regs->regs[dst_reg] = 0; if (dst_reg != DONT_CLEAR)
regs->regs[dst_reg] = 0;
regs->pc = (unsigned long)&ex->fixup - offset; regs->pc = (unsigned long)&ex->fixup - offset;
return true; return true;
} }
@ -765,7 +776,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
return 0; return 0;
if (BPF_MODE(insn->code) != BPF_PROBE_MEM && if (BPF_MODE(insn->code) != BPF_PROBE_MEM &&
BPF_MODE(insn->code) != BPF_PROBE_MEMSX) BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
BPF_MODE(insn->code) != BPF_PROBE_MEM32)
return 0; return 0;
if (!ctx->prog->aux->extable || if (!ctx->prog->aux->extable ||
@ -810,6 +822,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
ex->insn = ins_offset; ex->insn = ins_offset;
if (BPF_CLASS(insn->code) != BPF_LDX)
dst_reg = DONT_CLEAR;
ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) | ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg); FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
@ -829,12 +844,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
bool extra_pass) bool extra_pass)
{ {
const u8 code = insn->code; const u8 code = insn->code;
const u8 dst = bpf2a64[insn->dst_reg]; u8 dst = bpf2a64[insn->dst_reg];
const u8 src = bpf2a64[insn->src_reg]; u8 src = bpf2a64[insn->src_reg];
const u8 tmp = bpf2a64[TMP_REG_1]; const u8 tmp = bpf2a64[TMP_REG_1];
const u8 tmp2 = bpf2a64[TMP_REG_2]; const u8 tmp2 = bpf2a64[TMP_REG_2];
const u8 fp = bpf2a64[BPF_REG_FP]; const u8 fp = bpf2a64[BPF_REG_FP];
const u8 fpb = bpf2a64[FP_BOTTOM]; const u8 fpb = bpf2a64[FP_BOTTOM];
const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
const s16 off = insn->off; const s16 off = insn->off;
const s32 imm = insn->imm; const s32 imm = insn->imm;
const int i = insn - ctx->prog->insnsi; const int i = insn - ctx->prog->insnsi;
@ -853,6 +869,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
/* dst = src */ /* dst = src */
case BPF_ALU | BPF_MOV | BPF_X: case BPF_ALU | BPF_MOV | BPF_X:
case BPF_ALU64 | BPF_MOV | BPF_X: case BPF_ALU64 | BPF_MOV | BPF_X:
if (insn_is_cast_user(insn)) {
emit(A64_MOV(0, tmp, src), ctx); // 32-bit mov clears the upper 32 bits
emit_a64_mov_i(0, dst, ctx->user_vm_start >> 32, ctx);
emit(A64_LSL(1, dst, dst, 32), ctx);
emit(A64_CBZ(1, tmp, 2), ctx);
emit(A64_ORR(1, tmp, dst, tmp), ctx);
emit(A64_MOV(1, dst, tmp), ctx);
break;
}
switch (insn->off) { switch (insn->off) {
case 0: case 0:
emit(A64_MOV(is64, dst, src), ctx); emit(A64_MOV(is64, dst, src), ctx);
@ -1237,7 +1262,15 @@ emit_cond_jmp:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_B: case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_H: case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_W: case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
if (ctx->fpb_offset > 0 && src == fp) { case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit(A64_ADD(1, tmp2, src, arena_vm_base), ctx);
src = tmp2;
}
if (ctx->fpb_offset > 0 && src == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
src_adj = fpb; src_adj = fpb;
off_adj = off + ctx->fpb_offset; off_adj = off + ctx->fpb_offset;
} else { } else {
@ -1322,7 +1355,15 @@ emit_cond_jmp:
case BPF_ST | BPF_MEM | BPF_H: case BPF_ST | BPF_MEM | BPF_H:
case BPF_ST | BPF_MEM | BPF_B: case BPF_ST | BPF_MEM | BPF_B:
case BPF_ST | BPF_MEM | BPF_DW: case BPF_ST | BPF_MEM | BPF_DW:
if (ctx->fpb_offset > 0 && dst == fp) { case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit(A64_ADD(1, tmp2, dst, arena_vm_base), ctx);
dst = tmp2;
}
if (ctx->fpb_offset > 0 && dst == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
dst_adj = fpb; dst_adj = fpb;
off_adj = off + ctx->fpb_offset; off_adj = off + ctx->fpb_offset;
} else { } else {
@ -1365,6 +1406,10 @@ emit_cond_jmp:
} }
break; break;
} }
ret = add_exception_handler(insn, ctx, dst);
if (ret)
return ret;
break; break;
/* STX: *(size *)(dst + off) = src */ /* STX: *(size *)(dst + off) = src */
@ -1372,7 +1417,15 @@ emit_cond_jmp:
case BPF_STX | BPF_MEM | BPF_H: case BPF_STX | BPF_MEM | BPF_H:
case BPF_STX | BPF_MEM | BPF_B: case BPF_STX | BPF_MEM | BPF_B:
case BPF_STX | BPF_MEM | BPF_DW: case BPF_STX | BPF_MEM | BPF_DW:
if (ctx->fpb_offset > 0 && dst == fp) { case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit(A64_ADD(1, tmp2, dst, arena_vm_base), ctx);
dst = tmp2;
}
if (ctx->fpb_offset > 0 && dst == fp && BPF_MODE(insn->code) != BPF_PROBE_MEM32) {
dst_adj = fpb; dst_adj = fpb;
off_adj = off + ctx->fpb_offset; off_adj = off + ctx->fpb_offset;
} else { } else {
@ -1413,6 +1466,10 @@ emit_cond_jmp:
} }
break; break;
} }
ret = add_exception_handler(insn, ctx, dst);
if (ret)
return ret;
break; break;
case BPF_STX | BPF_ATOMIC | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_W:
@ -1594,6 +1651,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
bool tmp_blinded = false; bool tmp_blinded = false;
bool extra_pass = false; bool extra_pass = false;
struct jit_ctx ctx; struct jit_ctx ctx;
u64 arena_vm_start;
u8 *image_ptr; u8 *image_ptr;
u8 *ro_image_ptr; u8 *ro_image_ptr;
@ -1611,6 +1669,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
prog = tmp; prog = tmp;
} }
arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
jit_data = prog->aux->jit_data; jit_data = prog->aux->jit_data;
if (!jit_data) { if (!jit_data) {
jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
@ -1641,6 +1700,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
} }
ctx.fpb_offset = find_fpb_offset(prog); ctx.fpb_offset = find_fpb_offset(prog);
ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
/* /*
* 1. Initial fake pass to compute ctx->idx and ctx->offset. * 1. Initial fake pass to compute ctx->idx and ctx->offset.
@ -1648,7 +1708,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
* BPF line info needs ctx->offset[i] to be the offset of * BPF line info needs ctx->offset[i] to be the offset of
* instruction[i] in jited image, so build prologue first. * instruction[i] in jited image, so build prologue first.
*/ */
if (build_prologue(&ctx, was_classic, prog->aux->exception_cb)) { if (build_prologue(&ctx, was_classic, prog->aux->exception_cb,
arena_vm_start)) {
prog = orig_prog; prog = orig_prog;
goto out_off; goto out_off;
} }
@ -1696,7 +1757,7 @@ skip_init_ctx:
ctx.idx = 0; ctx.idx = 0;
ctx.exentry_idx = 0; ctx.exentry_idx = 0;
build_prologue(&ctx, was_classic, prog->aux->exception_cb); build_prologue(&ctx, was_classic, prog->aux->exception_cb, arena_vm_start);
if (build_body(&ctx, extra_pass)) { if (build_body(&ctx, extra_pass)) {
prog = orig_prog; prog = orig_prog;
@ -2461,6 +2522,11 @@ bool bpf_jit_supports_exceptions(void)
return true; return true;
} }
bool bpf_jit_supports_arena(void)
{
return true;
}
void bpf_jit_free(struct bpf_prog *prog) void bpf_jit_free(struct bpf_prog *prog)
{ {
if (prog->jited) { if (prog->jited) {


@ -81,6 +81,8 @@ struct rv_jit_context {
int nexentries; int nexentries;
unsigned long flags; unsigned long flags;
int stack_size; int stack_size;
u64 arena_vm_start;
u64 user_vm_start;
}; };
/* Convert from ninsns to bytes. */ /* Convert from ninsns to bytes. */


@ -18,6 +18,7 @@
#define RV_REG_TCC RV_REG_A6 #define RV_REG_TCC RV_REG_A6
#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */ #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
#define RV_REG_ARENA RV_REG_S7 /* For storing arena_vm_start */
static const int regmap[] = { static const int regmap[] = {
[BPF_REG_0] = RV_REG_A5, [BPF_REG_0] = RV_REG_A5,
@ -255,6 +256,10 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx); emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
store_offset -= 8; store_offset -= 8;
} }
if (ctx->arena_vm_start) {
emit_ld(RV_REG_ARENA, store_offset, RV_REG_SP, ctx);
store_offset -= 8;
}
emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx); emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
/* Set return value. */ /* Set return value. */
@ -548,6 +553,7 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64,
#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0) #define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27) #define BPF_FIXUP_REG_MASK GENMASK(31, 27)
#define REG_DONT_CLEAR_MARKER 0 /* RV_REG_ZERO unused in pt_regmap */
bool ex_handler_bpf(const struct exception_table_entry *ex, bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs) struct pt_regs *regs)
@ -555,7 +561,8 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup); off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
int regs_offset = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup); int regs_offset = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
*(unsigned long *)((void *)regs + pt_regmap[regs_offset]) = 0; if (regs_offset != REG_DONT_CLEAR_MARKER)
*(unsigned long *)((void *)regs + pt_regmap[regs_offset]) = 0;
regs->epc = (unsigned long)&ex->fixup - offset; regs->epc = (unsigned long)&ex->fixup - offset;
return true; return true;
@ -572,7 +579,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
off_t fixup_offset; off_t fixup_offset;
if (!ctx->insns || !ctx->ro_insns || !ctx->prog->aux->extable || if (!ctx->insns || !ctx->ro_insns || !ctx->prog->aux->extable ||
(BPF_MODE(insn->code) != BPF_PROBE_MEM && BPF_MODE(insn->code) != BPF_PROBE_MEMSX)) (BPF_MODE(insn->code) != BPF_PROBE_MEM && BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
BPF_MODE(insn->code) != BPF_PROBE_MEM32))
return 0; return 0;
if (WARN_ON_ONCE(ctx->nexentries >= ctx->prog->aux->num_exentries)) if (WARN_ON_ONCE(ctx->nexentries >= ctx->prog->aux->num_exentries))
@ -1073,6 +1081,15 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
/* dst = src */ /* dst = src */
case BPF_ALU | BPF_MOV | BPF_X: case BPF_ALU | BPF_MOV | BPF_X:
case BPF_ALU64 | BPF_MOV | BPF_X: case BPF_ALU64 | BPF_MOV | BPF_X:
if (insn_is_cast_user(insn)) {
emit_mv(RV_REG_T1, rs, ctx);
emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
emit_imm(rd, (ctx->user_vm_start >> 32) << 32, ctx);
emit(rv_beq(RV_REG_T1, RV_REG_ZERO, 4), ctx);
emit_or(RV_REG_T1, rd, RV_REG_T1, ctx);
emit_mv(rd, RV_REG_T1, ctx);
break;
}
if (imm == 1) { if (imm == 1) {
/* Special mov32 for zext */ /* Special mov32 for zext */
emit_zextw(rd, rd, ctx); emit_zextw(rd, rd, ctx);
@ -1539,6 +1556,11 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
case BPF_LDX | BPF_PROBE_MEMSX | BPF_B: case BPF_LDX | BPF_PROBE_MEMSX | BPF_B:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_H: case BPF_LDX | BPF_PROBE_MEMSX | BPF_H:
case BPF_LDX | BPF_PROBE_MEMSX | BPF_W: case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
/* LDX | PROBE_MEM32: dst = *(unsigned size *)(src + RV_REG_ARENA + off) */
case BPF_LDX | BPF_PROBE_MEM32 | BPF_B:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
{ {
int insn_len, insns_start; int insn_len, insns_start;
bool sign_ext; bool sign_ext;
@ -1546,6 +1568,11 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
sign_ext = BPF_MODE(insn->code) == BPF_MEMSX || sign_ext = BPF_MODE(insn->code) == BPF_MEMSX ||
BPF_MODE(insn->code) == BPF_PROBE_MEMSX; BPF_MODE(insn->code) == BPF_PROBE_MEMSX;
if (BPF_MODE(insn->code) == BPF_PROBE_MEM32) {
emit_add(RV_REG_T2, rs, RV_REG_ARENA, ctx);
rs = RV_REG_T2;
}
switch (BPF_SIZE(code)) { switch (BPF_SIZE(code)) {
case BPF_B: case BPF_B:
if (is_12b_int(off)) { if (is_12b_int(off)) {
@ -1682,6 +1709,86 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx); emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
break; break;
case BPF_ST | BPF_PROBE_MEM32 | BPF_B:
case BPF_ST | BPF_PROBE_MEM32 | BPF_H:
case BPF_ST | BPF_PROBE_MEM32 | BPF_W:
case BPF_ST | BPF_PROBE_MEM32 | BPF_DW:
{
int insn_len, insns_start;
emit_add(RV_REG_T3, rd, RV_REG_ARENA, ctx);
rd = RV_REG_T3;
/* Load imm to a register then store it */
emit_imm(RV_REG_T1, imm, ctx);
switch (BPF_SIZE(code)) {
case BPF_B:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sb(rd, off, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sb(RV_REG_T2, 0, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_H:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sh(rd, off, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sh(RV_REG_T2, 0, RV_REG_T1), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_W:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sw(rd, off, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit_sw(RV_REG_T2, 0, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_DW:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sd(rd, off, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T2, off, ctx);
emit_add(RV_REG_T2, RV_REG_T2, rd, ctx);
insns_start = ctx->ninsns;
emit_sd(RV_REG_T2, 0, RV_REG_T1, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
insn_len);
if (ret)
return ret;
break;
}
/* STX: *(size *)(dst + off) = src */ /* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_B: case BPF_STX | BPF_MEM | BPF_B:
if (is_12b_int(off)) { if (is_12b_int(off)) {
@ -1728,6 +1835,84 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
emit_atomic(rd, rs, off, imm, emit_atomic(rd, rs, off, imm,
BPF_SIZE(code) == BPF_DW, ctx); BPF_SIZE(code) == BPF_DW, ctx);
break; break;
case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
case BPF_STX | BPF_PROBE_MEM32 | BPF_H:
case BPF_STX | BPF_PROBE_MEM32 | BPF_W:
case BPF_STX | BPF_PROBE_MEM32 | BPF_DW:
{
int insn_len, insns_start;
emit_add(RV_REG_T2, rd, RV_REG_ARENA, ctx);
rd = RV_REG_T2;
switch (BPF_SIZE(code)) {
case BPF_B:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sb(rd, off, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sb(RV_REG_T1, 0, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_H:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit(rv_sh(rd, off, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit(rv_sh(RV_REG_T1, 0, rs), ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_W:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sw(rd, off, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit_sw(RV_REG_T1, 0, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
case BPF_DW:
if (is_12b_int(off)) {
insns_start = ctx->ninsns;
emit_sd(rd, off, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
emit_imm(RV_REG_T1, off, ctx);
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
insns_start = ctx->ninsns;
emit_sd(RV_REG_T1, 0, rs, ctx);
insn_len = ctx->ninsns - insns_start;
break;
}
ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
insn_len);
if (ret)
return ret;
break;
}
default: default:
pr_err("bpf-jit: unknown opcode %02x\n", code); pr_err("bpf-jit: unknown opcode %02x\n", code);
return -EINVAL; return -EINVAL;
@ -1759,6 +1944,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
stack_adjust += 8; stack_adjust += 8;
if (seen_reg(RV_REG_S6, ctx)) if (seen_reg(RV_REG_S6, ctx))
stack_adjust += 8; stack_adjust += 8;
if (ctx->arena_vm_start)
stack_adjust += 8;
stack_adjust = round_up(stack_adjust, 16); stack_adjust = round_up(stack_adjust, 16);
stack_adjust += bpf_stack_adjust; stack_adjust += bpf_stack_adjust;
@ -1810,6 +1997,10 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx); emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
store_offset -= 8; store_offset -= 8;
} }
if (ctx->arena_vm_start) {
emit_sd(RV_REG_SP, store_offset, RV_REG_ARENA, ctx);
store_offset -= 8;
}
emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx); emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
@ -1823,6 +2014,9 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx); emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
ctx->stack_size = stack_adjust; ctx->stack_size = stack_adjust;
if (ctx->arena_vm_start)
emit_imm(RV_REG_ARENA, ctx->arena_vm_start, ctx);
} }
void bpf_jit_build_epilogue(struct rv_jit_context *ctx) void bpf_jit_build_epilogue(struct rv_jit_context *ctx)
@ -1839,3 +2033,8 @@ bool bpf_jit_supports_ptr_xchg(void)
{ {
return true; return true;
} }
bool bpf_jit_supports_arena(void)
{
return true;
}


@ -80,6 +80,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
goto skip_init_ctx; goto skip_init_ctx;
} }
ctx->arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
ctx->user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
ctx->prog = prog; ctx->prog = prog;
ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL); ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL);
if (!ctx->offset) { if (!ctx->offset) {


@ -816,9 +816,10 @@ done:
static void emit_mov_imm64(u8 **pprog, u32 dst_reg, static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
const u32 imm32_hi, const u32 imm32_lo) const u32 imm32_hi, const u32 imm32_lo)
{ {
u64 imm64 = ((u64)imm32_hi << 32) | (u32)imm32_lo;
u8 *prog = *pprog; u8 *prog = *pprog;
if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) { if (is_uimm32(imm64)) {
/* /*
* For emitting plain u32, where sign bit must not be * For emitting plain u32, where sign bit must not be
* propagated LLVM tends to load imm64 over mov32 * propagated LLVM tends to load imm64 over mov32
@ -826,6 +827,8 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
* 'mov %eax, imm32' instead. * 'mov %eax, imm32' instead.
*/ */
emit_mov_imm32(&prog, false, dst_reg, imm32_lo); emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
} else if (is_simm32(imm64)) {
emit_mov_imm32(&prog, true, dst_reg, imm32_lo);
} else { } else {
/* movabsq rax, imm64 */ /* movabsq rax, imm64 */
EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg)); EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
@ -1169,6 +1172,54 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
return 0; return 0;
} }
static int emit_atomic_index(u8 **pprog, u8 atomic_op, u32 size,
u32 dst_reg, u32 src_reg, u32 index_reg, int off)
{
u8 *prog = *pprog;
EMIT1(0xF0); /* lock prefix */
switch (size) {
case BPF_W:
EMIT1(add_3mod(0x40, dst_reg, src_reg, index_reg));
break;
case BPF_DW:
EMIT1(add_3mod(0x48, dst_reg, src_reg, index_reg));
break;
default:
pr_err("bpf_jit: 1 and 2 byte atomics are not supported\n");
return -EFAULT;
}
/* emit opcode */
switch (atomic_op) {
case BPF_ADD:
case BPF_AND:
case BPF_OR:
case BPF_XOR:
/* lock *(u32/u64*)(dst_reg + idx_reg + off) <op>= src_reg */
EMIT1(simple_alu_opcodes[atomic_op]);
break;
case BPF_ADD | BPF_FETCH:
/* src_reg = atomic_fetch_add(dst_reg + idx_reg + off, src_reg); */
EMIT2(0x0F, 0xC1);
break;
case BPF_XCHG:
/* src_reg = atomic_xchg(dst_reg + idx_reg + off, src_reg); */
EMIT1(0x87);
break;
case BPF_CMPXCHG:
/* r0 = atomic_cmpxchg(dst_reg + idx_reg + off, r0, src_reg); */
EMIT2(0x0F, 0xB1);
break;
default:
pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op);
return -EFAULT;
}
emit_insn_suffix_SIB(&prog, dst_reg, src_reg, index_reg, off);
*pprog = prog;
return 0;
}
#define DONT_CLEAR 1 #define DONT_CLEAR 1
bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs) bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
@ -1382,6 +1433,16 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
maybe_emit_mod(&prog, AUX_REG, dst_reg, true); maybe_emit_mod(&prog, AUX_REG, dst_reg, true);
EMIT3(0x0F, 0x44, add_2reg(0xC0, AUX_REG, dst_reg)); EMIT3(0x0F, 0x44, add_2reg(0xC0, AUX_REG, dst_reg));
break; break;
} else if (insn_is_mov_percpu_addr(insn)) {
/* mov <dst>, <src> (if necessary) */
EMIT_mov(dst_reg, src_reg);
#ifdef CONFIG_SMP
/* add <dst>, gs:[<off>] */
EMIT2(0x65, add_1mod(0x48, dst_reg));
EMIT3(0x03, add_2reg(0x04, 0, dst_reg), 0x25);
EMIT((u32)(unsigned long)&this_cpu_off, 4);
#endif
break;
} }
fallthrough; fallthrough;
case BPF_ALU | BPF_MOV | BPF_X: case BPF_ALU | BPF_MOV | BPF_X:
@ -1969,6 +2030,15 @@ populate_extable:
return err; return err;
break; break;
case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
start_of_ldx = prog;
err = emit_atomic_index(&prog, insn->imm, BPF_SIZE(insn->code),
dst_reg, src_reg, X86_REG_R12, insn->off);
if (err)
return err;
goto populate_extable;
/* call */ /* call */
case BPF_JMP | BPF_CALL: { case BPF_JMP | BPF_CALL: {
u8 *ip = image + addrs[i - 1]; u8 *ip = image + addrs[i - 1];
@ -3362,6 +3432,11 @@ bool bpf_jit_supports_subprog_tailcalls(void)
return true; return true;
} }
bool bpf_jit_supports_percpu_insn(void)
{
return true;
}
void bpf_jit_free(struct bpf_prog *prog) void bpf_jit_free(struct bpf_prog *prog)
{ {
if (prog->jited) { if (prog->jited) {
@ -3465,6 +3540,21 @@ bool bpf_jit_supports_arena(void)
return true; return true;
} }
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
{
if (!in_arena)
return true;
switch (insn->code) {
case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_ATOMIC | BPF_DW:
if (insn->imm == (BPF_AND | BPF_FETCH) ||
insn->imm == (BPF_OR | BPF_FETCH) ||
insn->imm == (BPF_XOR | BPF_FETCH))
return false;
}
return true;
}
bool bpf_jit_supports_ptr_xchg(void) bool bpf_jit_supports_ptr_xchg(void)
{ {
return true; return true;


@ -20,6 +20,9 @@ crypto_skcipher-y += lskcipher.o
crypto_skcipher-y += skcipher.o crypto_skcipher-y += skcipher.o
obj-$(CONFIG_CRYPTO_SKCIPHER2) += crypto_skcipher.o obj-$(CONFIG_CRYPTO_SKCIPHER2) += crypto_skcipher.o
ifeq ($(CONFIG_BPF_SYSCALL),y)
obj-$(CONFIG_CRYPTO_SKCIPHER2) += bpf_crypto_skcipher.o
endif
obj-$(CONFIG_CRYPTO_SEQIV) += seqiv.o obj-$(CONFIG_CRYPTO_SEQIV) += seqiv.o
obj-$(CONFIG_CRYPTO_ECHAINIV) += echainiv.o obj-$(CONFIG_CRYPTO_ECHAINIV) += echainiv.o


@ -0,0 +1,82 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2024 Meta, Inc */
#include <linux/types.h>
#include <linux/module.h>
#include <linux/bpf_crypto.h>
#include <crypto/skcipher.h>
static void *bpf_crypto_lskcipher_alloc_tfm(const char *algo)
{
return crypto_alloc_lskcipher(algo, 0, 0);
}
static void bpf_crypto_lskcipher_free_tfm(void *tfm)
{
crypto_free_lskcipher(tfm);
}
static int bpf_crypto_lskcipher_has_algo(const char *algo)
{
return crypto_has_skcipher(algo, CRYPTO_ALG_TYPE_LSKCIPHER, CRYPTO_ALG_TYPE_MASK);
}
static int bpf_crypto_lskcipher_setkey(void *tfm, const u8 *key, unsigned int keylen)
{
return crypto_lskcipher_setkey(tfm, key, keylen);
}
static u32 bpf_crypto_lskcipher_get_flags(void *tfm)
{
return crypto_lskcipher_get_flags(tfm);
}
static unsigned int bpf_crypto_lskcipher_ivsize(void *tfm)
{
return crypto_lskcipher_ivsize(tfm);
}
static unsigned int bpf_crypto_lskcipher_statesize(void *tfm)
{
return crypto_lskcipher_statesize(tfm);
}
static int bpf_crypto_lskcipher_encrypt(void *tfm, const u8 *src, u8 *dst,
unsigned int len, u8 *siv)
{
return crypto_lskcipher_encrypt(tfm, src, dst, len, siv);
}
static int bpf_crypto_lskcipher_decrypt(void *tfm, const u8 *src, u8 *dst,
unsigned int len, u8 *siv)
{
return crypto_lskcipher_decrypt(tfm, src, dst, len, siv);
}
static const struct bpf_crypto_type bpf_crypto_lskcipher_type = {
.alloc_tfm = bpf_crypto_lskcipher_alloc_tfm,
.free_tfm = bpf_crypto_lskcipher_free_tfm,
.has_algo = bpf_crypto_lskcipher_has_algo,
.setkey = bpf_crypto_lskcipher_setkey,
.encrypt = bpf_crypto_lskcipher_encrypt,
.decrypt = bpf_crypto_lskcipher_decrypt,
.ivsize = bpf_crypto_lskcipher_ivsize,
.statesize = bpf_crypto_lskcipher_statesize,
.get_flags = bpf_crypto_lskcipher_get_flags,
.owner = THIS_MODULE,
.name = "skcipher",
};
static int __init bpf_crypto_skcipher_init(void)
{
return bpf_crypto_register_type(&bpf_crypto_lskcipher_type);
}
static void __exit bpf_crypto_skcipher_exit(void)
{
int err = bpf_crypto_unregister_type(&bpf_crypto_lskcipher_type);
WARN_ON_ONCE(err);
}
module_init(bpf_crypto_skcipher_init);
module_exit(bpf_crypto_skcipher_exit);
MODULE_LICENSE("GPL");


@ -184,8 +184,8 @@ struct bpf_map_ops {
}; };
enum { enum {
/* Support at most 10 fields in a BTF type */ /* Support at most 11 fields in a BTF type */
BTF_FIELDS_MAX = 10, BTF_FIELDS_MAX = 11,
}; };
enum btf_field_type { enum btf_field_type {
@ -202,6 +202,7 @@ enum btf_field_type {
BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE, BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD, BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
BPF_REFCOUNT = (1 << 9), BPF_REFCOUNT = (1 << 9),
BPF_WORKQUEUE = (1 << 10),
}; };
typedef void (*btf_dtor_kfunc_t)(void *); typedef void (*btf_dtor_kfunc_t)(void *);
@ -238,6 +239,7 @@ struct btf_record {
u32 field_mask; u32 field_mask;
int spin_lock_off; int spin_lock_off;
int timer_off; int timer_off;
int wq_off;
int refcount_off; int refcount_off;
struct btf_field fields[]; struct btf_field fields[];
}; };
@ -312,6 +314,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
return "bpf_spin_lock"; return "bpf_spin_lock";
case BPF_TIMER: case BPF_TIMER:
return "bpf_timer"; return "bpf_timer";
case BPF_WORKQUEUE:
return "bpf_wq";
case BPF_KPTR_UNREF: case BPF_KPTR_UNREF:
case BPF_KPTR_REF: case BPF_KPTR_REF:
return "kptr"; return "kptr";
@ -340,6 +344,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
return sizeof(struct bpf_spin_lock); return sizeof(struct bpf_spin_lock);
case BPF_TIMER: case BPF_TIMER:
return sizeof(struct bpf_timer); return sizeof(struct bpf_timer);
case BPF_WORKQUEUE:
return sizeof(struct bpf_wq);
case BPF_KPTR_UNREF: case BPF_KPTR_UNREF:
case BPF_KPTR_REF: case BPF_KPTR_REF:
case BPF_KPTR_PERCPU: case BPF_KPTR_PERCPU:
@ -367,6 +373,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
return __alignof__(struct bpf_spin_lock); return __alignof__(struct bpf_spin_lock);
case BPF_TIMER: case BPF_TIMER:
return __alignof__(struct bpf_timer); return __alignof__(struct bpf_timer);
case BPF_WORKQUEUE:
return __alignof__(struct bpf_wq);
case BPF_KPTR_UNREF: case BPF_KPTR_UNREF:
case BPF_KPTR_REF: case BPF_KPTR_REF:
case BPF_KPTR_PERCPU: case BPF_KPTR_PERCPU:
@ -406,6 +414,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
/* RB_ROOT_CACHED 0-inits, no need to do anything after memset */ /* RB_ROOT_CACHED 0-inits, no need to do anything after memset */
case BPF_SPIN_LOCK: case BPF_SPIN_LOCK:
case BPF_TIMER: case BPF_TIMER:
case BPF_WORKQUEUE:
case BPF_KPTR_UNREF: case BPF_KPTR_UNREF:
case BPF_KPTR_REF: case BPF_KPTR_REF:
case BPF_KPTR_PERCPU: case BPF_KPTR_PERCPU:
@ -525,6 +534,7 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src); bool lock_src);
void bpf_timer_cancel_and_free(void *timer); void bpf_timer_cancel_and_free(void *timer);
void bpf_wq_cancel_and_free(void *timer);
void bpf_list_head_free(const struct btf_field *field, void *list_head, void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock); struct bpf_spin_lock *spin_lock);
void bpf_rb_root_free(const struct btf_field *field, void *rb_root, void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
@ -1265,6 +1275,7 @@ int bpf_dynptr_check_size(u32 size);
u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr); u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len); const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len); void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);
bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr);
#ifdef CONFIG_BPF_JIT #ifdef CONFIG_BPF_JIT
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr); int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
@ -2209,6 +2220,7 @@ void bpf_map_free_record(struct bpf_map *map);
struct btf_record *btf_record_dup(const struct btf_record *rec); struct btf_record *btf_record_dup(const struct btf_record *rec);
bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *rec_b); bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *rec_b);
void bpf_obj_free_timer(const struct btf_record *rec, void *obj); void bpf_obj_free_timer(const struct btf_record *rec, void *obj);
void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj);
void bpf_obj_free_fields(const struct btf_record *rec, void *obj); void bpf_obj_free_fields(const struct btf_record *rec, void *obj);
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu); void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);
@ -3010,6 +3022,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags); int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
int sock_map_bpf_prog_query(const union bpf_attr *attr, int sock_map_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr); union bpf_attr __user *uattr);
int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog);
void sock_map_unhash(struct sock *sk); void sock_map_unhash(struct sock *sk);
void sock_map_destroy(struct sock *sk); void sock_map_destroy(struct sock *sk);
@ -3108,6 +3121,11 @@ static inline int sock_map_bpf_prog_query(const union bpf_attr *attr,
{ {
return -EINVAL; return -EINVAL;
} }
static inline int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;
}
#endif /* CONFIG_BPF_SYSCALL */ #endif /* CONFIG_BPF_SYSCALL */
#endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */ #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */


@ -0,0 +1,24 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#ifndef _BPF_CRYPTO_H
#define _BPF_CRYPTO_H
struct bpf_crypto_type {
void *(*alloc_tfm)(const char *algo);
void (*free_tfm)(void *tfm);
int (*has_algo)(const char *algo);
int (*setkey)(void *tfm, const u8 *key, unsigned int keylen);
int (*setauthsize)(void *tfm, unsigned int authsize);
int (*encrypt)(void *tfm, const u8 *src, u8 *dst, unsigned int len, u8 *iv);
int (*decrypt)(void *tfm, const u8 *src, u8 *dst, unsigned int len, u8 *iv);
unsigned int (*ivsize)(void *tfm);
unsigned int (*statesize)(void *tfm);
u32 (*get_flags)(void *tfm);
struct module *owner;
char name[14];
};
int bpf_crypto_register_type(const struct bpf_crypto_type *type);
int bpf_crypto_unregister_type(const struct bpf_crypto_type *type);
#endif /* _BPF_CRYPTO_H */


@ -421,11 +421,13 @@ struct bpf_verifier_state {
struct bpf_active_lock active_lock; struct bpf_active_lock active_lock;
bool speculative; bool speculative;
bool active_rcu_lock; bool active_rcu_lock;
u32 active_preempt_lock;
/* If this state was ever pointed-to by other state's loop_entry field /* If this state was ever pointed-to by other state's loop_entry field
* this flag would be set to true. Used to avoid freeing such states * this flag would be set to true. Used to avoid freeing such states
* while they are still in use. * while they are still in use.
*/ */
bool used_as_loop_entry; bool used_as_loop_entry;
bool in_sleepable;
/* first and last insn idx of this verifier state */ /* first and last insn idx of this verifier state */
u32 first_insn_idx; u32 first_insn_idx;
@ -502,6 +504,13 @@ struct bpf_loop_inline_state {
u32 callback_subprogno; /* valid when fit_for_inline is true */ u32 callback_subprogno; /* valid when fit_for_inline is true */
}; };
/* pointer and state for maps */
struct bpf_map_ptr_state {
struct bpf_map *map_ptr;
bool poison;
bool unpriv;
};
/* Possible states for alu_state member. */ /* Possible states for alu_state member. */
#define BPF_ALU_SANITIZE_SRC (1U << 0) #define BPF_ALU_SANITIZE_SRC (1U << 0)
#define BPF_ALU_SANITIZE_DST (1U << 1) #define BPF_ALU_SANITIZE_DST (1U << 1)
@ -514,7 +523,7 @@ struct bpf_loop_inline_state {
struct bpf_insn_aux_data { struct bpf_insn_aux_data {
union { union {
enum bpf_reg_type ptr_type; /* pointer type for load/store insns */ enum bpf_reg_type ptr_type; /* pointer type for load/store insns */
unsigned long map_ptr_state; /* pointer/poison value for maps */ struct bpf_map_ptr_state map_ptr_state;
s32 call_imm; /* saved imm field of call insn */ s32 call_imm; /* saved imm field of call insn */
u32 alu_limit; /* limit for add/sub register with pointer */ u32 alu_limit; /* limit for add/sub register with pointer */
struct { struct {


@ -75,6 +75,9 @@ struct ctl_table_header;
/* unused opcode to mark special load instruction. Same as BPF_MSH */ /* unused opcode to mark special load instruction. Same as BPF_MSH */
#define BPF_PROBE_MEM32 0xa0 #define BPF_PROBE_MEM32 0xa0
/* unused opcode to mark special atomic instruction */
#define BPF_PROBE_ATOMIC 0xe0
/* unused opcode to mark call to interpreter with arguments */ /* unused opcode to mark call to interpreter with arguments */
#define BPF_CALL_ARGS 0xe0 #define BPF_CALL_ARGS 0xe0
@ -178,6 +181,25 @@ struct ctl_table_header;
.off = 0, \ .off = 0, \
.imm = 0 }) .imm = 0 })
/* Special (internal-only) form of mov, used to resolve per-CPU addrs:
* dst_reg = src_reg + <percpu_base_off>
* BPF_ADDR_PERCPU is used as a special insn->off value.
*/
#define BPF_ADDR_PERCPU (-1)
#define BPF_MOV64_PERCPU_REG(DST, SRC) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = BPF_ADDR_PERCPU, \
.imm = 0 })
static inline bool insn_is_mov_percpu_addr(const struct bpf_insn *insn)
{
return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
}
/* Short form of mov, dst_reg = imm32 */ /* Short form of mov, dst_reg = imm32 */
#define BPF_MOV64_IMM(DST, IMM) \ #define BPF_MOV64_IMM(DST, IMM) \
@ -654,14 +676,16 @@ static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
cant_migrate(); cant_migrate();
if (static_branch_unlikely(&bpf_stats_enabled_key)) { if (static_branch_unlikely(&bpf_stats_enabled_key)) {
struct bpf_prog_stats *stats; struct bpf_prog_stats *stats;
u64 start = sched_clock(); u64 duration, start = sched_clock();
unsigned long flags; unsigned long flags;
ret = dfunc(ctx, prog->insnsi, prog->bpf_func); ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
duration = sched_clock() - start;
stats = this_cpu_ptr(prog->stats); stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp); flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->cnt); u64_stats_inc(&stats->cnt);
u64_stats_add(&stats->nsecs, sched_clock() - start); u64_stats_add(&stats->nsecs, duration);
u64_stats_update_end_irqrestore(&stats->syncp, flags); u64_stats_update_end_irqrestore(&stats->syncp, flags);
} else { } else {
ret = dfunc(ctx, prog->insnsi, prog->bpf_func); ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
@ -970,11 +994,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
void bpf_jit_compile(struct bpf_prog *prog); void bpf_jit_compile(struct bpf_prog *prog);
bool bpf_jit_needs_zext(void); bool bpf_jit_needs_zext(void);
bool bpf_jit_supports_subprog_tailcalls(void); bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_percpu_insn(void);
bool bpf_jit_supports_kfunc_call(void); bool bpf_jit_supports_kfunc_call(void);
bool bpf_jit_supports_far_kfunc_call(void); bool bpf_jit_supports_far_kfunc_call(void);
bool bpf_jit_supports_exceptions(void); bool bpf_jit_supports_exceptions(void);
bool bpf_jit_supports_ptr_xchg(void); bool bpf_jit_supports_ptr_xchg(void);
bool bpf_jit_supports_arena(void); bool bpf_jit_supports_arena(void);
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena);
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie); void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
bool bpf_helper_changes_pkt_data(void *func); bool bpf_helper_changes_pkt_data(void *func);

View File

@ -58,6 +58,10 @@ struct sk_psock_progs {
struct bpf_prog *stream_parser; struct bpf_prog *stream_parser;
struct bpf_prog *stream_verdict; struct bpf_prog *stream_verdict;
struct bpf_prog *skb_verdict; struct bpf_prog *skb_verdict;
struct bpf_link *msg_parser_link;
struct bpf_link *stream_parser_link;
struct bpf_link *stream_verdict_link;
struct bpf_link *skb_verdict_link;
}; };
enum sk_psock_state_bits { enum sk_psock_state_bits {

View File

@ -2711,10 +2711,10 @@ static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1); return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN, 0, NULL) == 1);
} }
static inline void tcp_bpf_rtt(struct sock *sk) static inline void tcp_bpf_rtt(struct sock *sk, long mrtt, u32 srtt)
{ {
if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_RTT_CB_FLAG)) if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_RTT_CB_FLAG))
tcp_call_bpf(sk, BPF_SOCK_OPS_RTT_CB, 0, NULL); tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_RTT_CB, mrtt, srtt);
} }
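On the BPF side, the two new arguments show up in the sockops args array. A minimal sketch, assuming the program has previously enabled BPF_SOCK_OPS_RTT_CB_FLAG via bpf_sock_ops_cb_flags_set():

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("sockops")
int rtt_probe(struct bpf_sock_ops *skops)
{
	if (skops->op == BPF_SOCK_OPS_RTT_CB) {
		__u32 mrtt = skops->args[0];	/* measured RTT of the ACKed packet */
		__u32 srtt = skops->args[1];	/* updated smoothed RTT */

		bpf_printk("mrtt=%u srtt=%u", mrtt, srtt);
	}
	return 1;
}

char _license[] SEC("license") = "GPL";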
#if IS_ENABLED(CONFIG_SMC) #if IS_ENABLED(CONFIG_SMC)

View File

@ -7,6 +7,23 @@
#include <linux/tracepoint.h> #include <linux/tracepoint.h>
TRACE_EVENT(bpf_trigger_tp,
TP_PROTO(int nonce),
TP_ARGS(nonce),
TP_STRUCT__entry(
__field(int, nonce)
),
TP_fast_assign(
__entry->nonce = nonce;
),
TP_printk("nonce %d", __entry->nonce)
);
DECLARE_EVENT_CLASS(bpf_test_finish, DECLARE_EVENT_CLASS(bpf_test_finish,
TP_PROTO(int *err), TP_PROTO(int *err),

View File

@ -1135,6 +1135,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_TCX = 11, BPF_LINK_TYPE_TCX = 11,
BPF_LINK_TYPE_UPROBE_MULTI = 12, BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13, BPF_LINK_TYPE_NETKIT = 13,
BPF_LINK_TYPE_SOCKMAP = 14,
__MAX_BPF_LINK_TYPE, __MAX_BPF_LINK_TYPE,
}; };
@ -3394,6 +3395,10 @@ union bpf_attr {
* for the nexthop. If the src addr cannot be derived, * for the nexthop. If the src addr cannot be derived,
* **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this
* case, *params*->dmac and *params*->smac are not set either. * case, *params*->dmac and *params*->smac are not set either.
* **BPF_FIB_LOOKUP_MARK**
* Use the mark present in *params*->mark for the fib lookup.
* This option should not be used with BPF_FIB_LOOKUP_DIRECT,
* as it only has meaning for full lookups.
* *
* *ctx* is either **struct xdp_md** for XDP programs or * *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs. * **struct sk_buff** tc cls_act programs.
@ -5022,7 +5027,7 @@ union bpf_attr {
* bytes will be copied to *dst* * bytes will be copied to *dst*
* Return * Return
* The **hash_algo** is returned on success, * The **hash_algo** is returned on success,
* **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if * **-EOPNOTSUPP** if IMA is disabled or **-EINVAL** if
* invalid arguments are passed. * invalid arguments are passed.
* *
* struct socket *bpf_sock_from_file(struct file *file) * struct socket *bpf_sock_from_file(struct file *file)
@ -5508,7 +5513,7 @@ union bpf_attr {
* bytes will be copied to *dst* * bytes will be copied to *dst*
* Return * Return
* The **hash_algo** is returned on success, * The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if * **-EOPNOTSUPP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed. * invalid arguments are passed.
* *
* void *bpf_kptr_xchg(void *map_value, void *ptr) * void *bpf_kptr_xchg(void *map_value, void *ptr)
@ -6720,6 +6725,10 @@ struct bpf_link_info {
__u32 ifindex; __u32 ifindex;
__u32 attach_type; __u32 attach_type;
} netkit; } netkit;
struct {
__u32 map_id;
__u32 attach_type;
} sockmap;
}; };
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
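From user space, such a link is requested like other link types, with the sockmap/sockhash fd as the attach target. A rough sketch using libbpf's low-level bpf_link_create() wrapper; any higher-level convenience API added to libbpf for this is not shown here:

#include <bpf/bpf.h>

/* Attach an sk_skb stream verdict program to a sockmap via a BPF link. */
static int attach_sockmap_link(int prog_fd, int map_fd)
{
	/* returns a link fd on success; closing it detaches the program */
	return bpf_link_create(prog_fd, map_fd, BPF_SK_SKB_STREAM_VERDICT, NULL);
}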
@ -6938,6 +6947,8 @@ enum {
* socket transition to LISTEN state. * socket transition to LISTEN state.
*/ */
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT. BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
* Arg1: measured RTT input (mrtt)
* Arg2: updated srtt
*/ */
BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option. BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option.
* It will be called to handle * It will be called to handle
@ -7120,6 +7131,7 @@ enum {
BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
BPF_FIB_LOOKUP_TBID = (1U << 3), BPF_FIB_LOOKUP_TBID = (1U << 3),
BPF_FIB_LOOKUP_SRC = (1U << 4), BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
}; };
enum { enum {
@ -7152,7 +7164,7 @@ struct bpf_fib_lookup {
/* output: MTU value */ /* output: MTU value */
__u16 mtu_result; __u16 mtu_result;
}; } __attribute__((packed, aligned(2)));
/* input: L3 device index for lookup /* input: L3 device index for lookup
* output: device index from FIB lookup * output: device index from FIB lookup
*/ */
@ -7197,8 +7209,19 @@ struct bpf_fib_lookup {
__u32 tbid; __u32 tbid;
}; };
__u8 smac[6]; /* ETH_ALEN */ union {
__u8 dmac[6]; /* ETH_ALEN */ /* input */
struct {
__u32 mark; /* policy routing */
/* 2 4-byte holes for input */
};
/* output: source and dest mac */
struct {
__u8 smac[6]; /* ETH_ALEN */
__u8 dmac[6]; /* ETH_ALEN */
};
};
}; };
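A rough XDP-side sketch of passing a mark into the lookup with the new flag. The addressing fields are elided and the mark value is arbitrary; note that on a successful lookup smac/dmac overlay the mark, since they share the union above.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int fib_lookup_with_mark(struct xdp_md *ctx)
{
	struct bpf_fib_lookup params = {};
	long rc;

	params.family  = 2;	/* AF_INET */
	params.ifindex = ctx->ingress_ifindex;
	/* ... fill ipv4_src/ipv4_dst (and l4 fields) from the packet ... */
	params.mark    = 42;	/* matched against policy routing rules */

	rc = bpf_fib_lookup(ctx, &params, sizeof(params), BPF_FIB_LOOKUP_MARK);
	if (rc == BPF_FIB_LKUP_RET_SUCCESS)
		return XDP_TX;	/* smac/dmac now hold the rewrite addresses */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";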
struct bpf_redir_neigh { struct bpf_redir_neigh {
@ -7285,6 +7308,10 @@ struct bpf_timer {
__u64 __opaque[2]; __u64 __opaque[2];
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
struct bpf_wq {
__u64 __opaque[2];
} __attribute__((aligned(8)));
struct bpf_dynptr { struct bpf_dynptr {
__u64 __opaque[2]; __u64 __opaque[2];
} __attribute__((aligned(8))); } __attribute__((aligned(8)));

View File

@ -44,6 +44,9 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
obj-$(CONFIG_BPF_SYSCALL) += cpumask.o obj-$(CONFIG_BPF_SYSCALL) += cpumask.o
obj-${CONFIG_BPF_LSM} += bpf_lsm.o obj-${CONFIG_BPF_LSM} += bpf_lsm.o
endif endif
ifeq ($(CONFIG_CRYPTO),y)
obj-$(CONFIG_BPF_SYSCALL) += crypto.o
endif
obj-$(CONFIG_BPF_PRELOAD) += preload/ obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o obj-$(CONFIG_BPF_SYSCALL) += relo_core.o

View File

@ -37,7 +37,7 @@
*/ */
/* number of bytes addressable by LDX/STX insn with 16-bit 'off' field */ /* number of bytes addressable by LDX/STX insn with 16-bit 'off' field */
#define GUARD_SZ (1ull << sizeof(((struct bpf_insn *)0)->off) * 8) #define GUARD_SZ (1ull << sizeof_field(struct bpf_insn, off) * 8)
#define KERN_VM_SZ (SZ_4G + GUARD_SZ) #define KERN_VM_SZ (SZ_4G + GUARD_SZ)
struct bpf_arena { struct bpf_arena {

View File

@ -246,6 +246,38 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
return this_cpu_ptr(array->pptrs[index & array->index_mask]); return this_cpu_ptr(array->pptrs[index & array->index_mask]);
} }
/* emit BPF instructions equivalent to C code of percpu_array_map_lookup_elem() */
static int percpu_array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
struct bpf_insn *insn = insn_buf;
if (!bpf_jit_supports_percpu_insn())
return -EOPNOTSUPP;
if (map->map_flags & BPF_F_INNER_MAP)
return -EOPNOTSUPP;
BUILD_BUG_ON(offsetof(struct bpf_array, map) != 0);
*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, offsetof(struct bpf_array, pptrs));
*insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0);
if (!map->bypass_spec_v1) {
*insn++ = BPF_JMP_IMM(BPF_JGE, BPF_REG_0, map->max_entries, 6);
*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_0, array->index_mask);
} else {
*insn++ = BPF_JMP_IMM(BPF_JGE, BPF_REG_0, map->max_entries, 5);
}
*insn++ = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
*insn++ = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
*insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
*insn++ = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
*insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 1);
*insn++ = BPF_MOV64_IMM(BPF_REG_0, 0);
return insn - insn_buf;
}
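For readability, this is roughly what the emitted sequence above computes, using the usual map lookup calling convention (r1 = map, r2 = key, r0 = return value):

/*
 *	r1 += offsetof(struct bpf_array, pptrs);
 *	idx = *(u32 *)key;			// r0 = *(u32 *)(r2 + 0)
 *	if (idx >= map->max_entries)
 *		return NULL;			// r0 = 0
 *	idx &= array->index_mask;		// emitted only when !map->bypass_spec_v1
 *	ptr = *(void **)(r1 + idx * 8);		// pptrs[idx]
 *	return ptr + <this CPU's base>;		// BPF_MOV64_PERCPU_REG
 */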
static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu) static void *percpu_array_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu)
{ {
struct bpf_array *array = container_of(map, struct bpf_array, map); struct bpf_array *array = container_of(map, struct bpf_array, map);
@ -396,17 +428,21 @@ static void *array_map_vmalloc_addr(struct bpf_array *array)
return (void *)round_down((unsigned long)array, PAGE_SIZE); return (void *)round_down((unsigned long)array, PAGE_SIZE);
} }
static void array_map_free_timers(struct bpf_map *map) static void array_map_free_timers_wq(struct bpf_map *map)
{ {
struct bpf_array *array = container_of(map, struct bpf_array, map); struct bpf_array *array = container_of(map, struct bpf_array, map);
int i; int i;
/* We don't reset or free fields other than timer on uref dropping to zero. */ /* We don't reset or free fields other than timer and workqueue
if (!btf_record_has_field(map->record, BPF_TIMER)) * on uref dropping to zero.
return; */
if (btf_record_has_field(map->record, BPF_TIMER))
for (i = 0; i < array->map.max_entries; i++)
bpf_obj_free_timer(map->record, array_map_elem_ptr(array, i));
for (i = 0; i < array->map.max_entries; i++) if (btf_record_has_field(map->record, BPF_WORKQUEUE))
bpf_obj_free_timer(map->record, array_map_elem_ptr(array, i)); for (i = 0; i < array->map.max_entries; i++)
bpf_obj_free_workqueue(map->record, array_map_elem_ptr(array, i));
} }
/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@ -750,7 +786,7 @@ const struct bpf_map_ops array_map_ops = {
.map_alloc = array_map_alloc, .map_alloc = array_map_alloc,
.map_free = array_map_free, .map_free = array_map_free,
.map_get_next_key = array_map_get_next_key, .map_get_next_key = array_map_get_next_key,
.map_release_uref = array_map_free_timers, .map_release_uref = array_map_free_timers_wq,
.map_lookup_elem = array_map_lookup_elem, .map_lookup_elem = array_map_lookup_elem,
.map_update_elem = array_map_update_elem, .map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem, .map_delete_elem = array_map_delete_elem,
@ -776,6 +812,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
.map_free = array_map_free, .map_free = array_map_free,
.map_get_next_key = array_map_get_next_key, .map_get_next_key = array_map_get_next_key,
.map_lookup_elem = percpu_array_map_lookup_elem, .map_lookup_elem = percpu_array_map_lookup_elem,
.map_gen_lookup = percpu_array_map_gen_lookup,
.map_update_elem = array_map_update_elem, .map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem, .map_delete_elem = array_map_delete_elem,
.map_lookup_percpu_elem = percpu_array_map_lookup_percpu_elem, .map_lookup_percpu_elem = percpu_array_map_lookup_percpu_elem,

View File

@ -318,7 +318,7 @@ static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
* *
* If the local_storage->list is already empty, the caller will not * If the local_storage->list is already empty, the caller will not
* care about the bpf_ma value also because the caller is not * care about the bpf_ma value also because the caller is not
* responsibile to free the local_storage. * responsible to free the local_storage.
*/ */
if (storage_smap) if (storage_smap)

View File

@ -3464,6 +3464,15 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
goto end; goto end;
} }
} }
if (field_mask & BPF_WORKQUEUE) {
if (!strcmp(name, "bpf_wq")) {
if (*seen_mask & BPF_WORKQUEUE)
return -E2BIG;
*seen_mask |= BPF_WORKQUEUE;
type = BPF_WORKQUEUE;
goto end;
}
}
field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head"); field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head");
field_mask_test_name(BPF_LIST_NODE, "bpf_list_node"); field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root"); field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
@ -3515,6 +3524,7 @@ static int btf_find_struct_field(const struct btf *btf,
switch (field_type) { switch (field_type) {
case BPF_SPIN_LOCK: case BPF_SPIN_LOCK:
case BPF_TIMER: case BPF_TIMER:
case BPF_WORKQUEUE:
case BPF_LIST_NODE: case BPF_LIST_NODE:
case BPF_RB_NODE: case BPF_RB_NODE:
case BPF_REFCOUNT: case BPF_REFCOUNT:
@ -3582,6 +3592,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
switch (field_type) { switch (field_type) {
case BPF_SPIN_LOCK: case BPF_SPIN_LOCK:
case BPF_TIMER: case BPF_TIMER:
case BPF_WORKQUEUE:
case BPF_LIST_NODE: case BPF_LIST_NODE:
case BPF_RB_NODE: case BPF_RB_NODE:
case BPF_REFCOUNT: case BPF_REFCOUNT:
@ -3816,6 +3827,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
rec->spin_lock_off = -EINVAL; rec->spin_lock_off = -EINVAL;
rec->timer_off = -EINVAL; rec->timer_off = -EINVAL;
rec->wq_off = -EINVAL;
rec->refcount_off = -EINVAL; rec->refcount_off = -EINVAL;
for (i = 0; i < cnt; i++) { for (i = 0; i < cnt; i++) {
field_type_size = btf_field_type_size(info_arr[i].type); field_type_size = btf_field_type_size(info_arr[i].type);
@ -3846,6 +3858,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
/* Cache offset for faster lookup at runtime */ /* Cache offset for faster lookup at runtime */
rec->timer_off = rec->fields[i].offset; rec->timer_off = rec->fields[i].offset;
break; break;
case BPF_WORKQUEUE:
WARN_ON_ONCE(rec->wq_off >= 0);
/* Cache offset for faster lookup at runtime */
rec->wq_off = rec->fields[i].offset;
break;
case BPF_REFCOUNT: case BPF_REFCOUNT:
WARN_ON_ONCE(rec->refcount_off >= 0); WARN_ON_ONCE(rec->refcount_off >= 0);
/* Cache offset for faster lookup at runtime */ /* Cache offset for faster lookup at runtime */
@ -5642,8 +5659,8 @@ errout_free:
return ERR_PTR(err); return ERR_PTR(err);
} }
extern char __weak __start_BTF[]; extern char __start_BTF[];
extern char __weak __stop_BTF[]; extern char __stop_BTF[];
extern struct btf *btf_vmlinux; extern struct btf *btf_vmlinux;
#define BPF_MAP_TYPE(_id, _ops) #define BPF_MAP_TYPE(_id, _ops)
@ -5971,6 +5988,9 @@ struct btf *btf_parse_vmlinux(void)
struct btf *btf = NULL; struct btf *btf = NULL;
int err; int err;
if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF))
return ERR_PTR(-ENOENT);
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
if (!env) if (!env)
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);

View File

@ -747,7 +747,7 @@ const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
unsigned long symbol_start = ksym->start; unsigned long symbol_start = ksym->start;
unsigned long symbol_end = ksym->end; unsigned long symbol_end = ksym->end;
strncpy(sym, ksym->name, KSYM_NAME_LEN); strscpy(sym, ksym->name, KSYM_NAME_LEN);
ret = sym; ret = sym;
if (size) if (size)
@ -813,7 +813,7 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
if (it++ != symnum) if (it++ != symnum)
continue; continue;
strncpy(sym, ksym->name, KSYM_NAME_LEN); strscpy(sym, ksym->name, KSYM_NAME_LEN);
*value = ksym->start; *value = ksym->start;
*type = BPF_SYM_ELF_TYPE; *type = BPF_SYM_ELF_TYPE;
@ -2218,6 +2218,7 @@ static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn
u64 stack[stack_size / sizeof(u64)]; \ u64 stack[stack_size / sizeof(u64)]; \
u64 regs[MAX_BPF_EXT_REG] = {}; \ u64 regs[MAX_BPF_EXT_REG] = {}; \
\ \
kmsan_unpoison_memory(stack, sizeof(stack)); \
FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \ FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
ARG1 = (u64) (unsigned long) ctx; \ ARG1 = (u64) (unsigned long) ctx; \
return ___bpf_prog_run(regs, insn); \ return ___bpf_prog_run(regs, insn); \
@ -2231,6 +2232,7 @@ static u64 PROG_NAME_ARGS(stack_size)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5, \
u64 stack[stack_size / sizeof(u64)]; \ u64 stack[stack_size / sizeof(u64)]; \
u64 regs[MAX_BPF_EXT_REG]; \ u64 regs[MAX_BPF_EXT_REG]; \
\ \
kmsan_unpoison_memory(stack, sizeof(stack)); \
FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \ FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
BPF_R1 = r1; \ BPF_R1 = r1; \
BPF_R2 = r2; \ BPF_R2 = r2; \
@ -2812,7 +2814,7 @@ void bpf_prog_free(struct bpf_prog *fp)
} }
EXPORT_SYMBOL_GPL(bpf_prog_free); EXPORT_SYMBOL_GPL(bpf_prog_free);
/* RNG for unpriviledged user space with separated state from prandom_u32(). */ /* RNG for unprivileged user space with separated state from prandom_u32(). */
static DEFINE_PER_CPU(struct rnd_state, bpf_user_rnd_state); static DEFINE_PER_CPU(struct rnd_state, bpf_user_rnd_state);
void bpf_user_rnd_init_once(void) void bpf_user_rnd_init_once(void)
@ -2943,6 +2945,11 @@ bool __weak bpf_jit_supports_subprog_tailcalls(void)
return false; return false;
} }
bool __weak bpf_jit_supports_percpu_insn(void)
{
return false;
}
bool __weak bpf_jit_supports_kfunc_call(void) bool __weak bpf_jit_supports_kfunc_call(void)
{ {
return false; return false;
@ -2958,6 +2965,11 @@ bool __weak bpf_jit_supports_arena(void)
return false; return false;
} }
bool __weak bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
{
return false;
}
/* Return TRUE if the JIT backend satisfies the following two conditions: /* Return TRUE if the JIT backend satisfies the following two conditions:
* 1) JIT backend supports atomic_xchg() on pointer-sized words. * 1) JIT backend supports atomic_xchg() on pointer-sized words.
* 2) Under the specific arch, the implementation of xchg() is the same * 2) Under the specific arch, the implementation of xchg() is the same

View File

@ -474,6 +474,7 @@ static int __init cpumask_kfunc_init(void)
ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false); ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &cpumask_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &cpumask_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &cpumask_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &cpumask_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &cpumask_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(cpumask_dtors, return ret ?: register_btf_id_dtor_kfuncs(cpumask_dtors,
ARRAY_SIZE(cpumask_dtors), ARRAY_SIZE(cpumask_dtors),
THIS_MODULE); THIS_MODULE);

kernel/bpf/crypto.c (new file, 385 lines)
View File

@ -0,0 +1,385 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2024 Meta, Inc */
#include <linux/bpf.h>
#include <linux/bpf_crypto.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/filter.h>
#include <linux/scatterlist.h>
#include <linux/skbuff.h>
#include <crypto/skcipher.h>
struct bpf_crypto_type_list {
const struct bpf_crypto_type *type;
struct list_head list;
};
/* BPF crypto initialization parameters struct */
/**
* struct bpf_crypto_params - BPF crypto initialization parameters structure
* @type: The string of crypto operation type.
* @reserved: Reserved member, will be reused for more options in future
* Values:
* 0
* @algo: The string of algorithm to initialize.
* @key: The cipher key used to init crypto algorithm.
* @key_len: The length of cipher key.
* @authsize: The length of authentication tag used by algorithm.
*/
struct bpf_crypto_params {
char type[14];
u8 reserved[2];
char algo[128];
u8 key[256];
u32 key_len;
u32 authsize;
};
static LIST_HEAD(bpf_crypto_types);
static DECLARE_RWSEM(bpf_crypto_types_sem);
/**
* struct bpf_crypto_ctx - refcounted BPF crypto context structure
* @type: The pointer to bpf crypto type
* @tfm: The pointer to instance of crypto API struct.
* @siv_len: Size of IV and state storage for cipher
* @rcu: The RCU head used to free the crypto context with RCU safety.
* @usage: Object reference counter. When the refcount goes to 0, the
* memory is released back to the BPF allocator, which provides
* RCU safety.
*/
struct bpf_crypto_ctx {
const struct bpf_crypto_type *type;
void *tfm;
u32 siv_len;
struct rcu_head rcu;
refcount_t usage;
};
int bpf_crypto_register_type(const struct bpf_crypto_type *type)
{
struct bpf_crypto_type_list *node;
int err = -EEXIST;
down_write(&bpf_crypto_types_sem);
list_for_each_entry(node, &bpf_crypto_types, list) {
if (!strcmp(node->type->name, type->name))
goto unlock;
}
node = kmalloc(sizeof(*node), GFP_KERNEL);
err = -ENOMEM;
if (!node)
goto unlock;
node->type = type;
list_add(&node->list, &bpf_crypto_types);
err = 0;
unlock:
up_write(&bpf_crypto_types_sem);
return err;
}
EXPORT_SYMBOL_GPL(bpf_crypto_register_type);
int bpf_crypto_unregister_type(const struct bpf_crypto_type *type)
{
struct bpf_crypto_type_list *node;
int err = -ENOENT;
down_write(&bpf_crypto_types_sem);
list_for_each_entry(node, &bpf_crypto_types, list) {
if (strcmp(node->type->name, type->name))
continue;
list_del(&node->list);
kfree(node);
err = 0;
break;
}
up_write(&bpf_crypto_types_sem);
return err;
}
EXPORT_SYMBOL_GPL(bpf_crypto_unregister_type);
static const struct bpf_crypto_type *bpf_crypto_get_type(const char *name)
{
const struct bpf_crypto_type *type = ERR_PTR(-ENOENT);
struct bpf_crypto_type_list *node;
down_read(&bpf_crypto_types_sem);
list_for_each_entry(node, &bpf_crypto_types, list) {
if (strcmp(node->type->name, name))
continue;
if (try_module_get(node->type->owner))
type = node->type;
break;
}
up_read(&bpf_crypto_types_sem);
return type;
}
__bpf_kfunc_start_defs();
/**
* bpf_crypto_ctx_create() - Create a mutable BPF crypto context.
*
* Allocates a crypto context that can be used, acquired, and released by
* a BPF program. The crypto context returned by this function must either
* be embedded in a map as a kptr, or freed with bpf_crypto_ctx_release().
* As crypto API functions use GFP_KERNEL allocations, this function can
* only be used in sleepable BPF programs.
*
* bpf_crypto_ctx_create() allocates memory for crypto context.
* It may return NULL if no memory is available.
* @params: pointer to struct bpf_crypto_params which contains all the
* details needed to initialise crypto context.
* @params__sz: size of struct bpf_crypto_params used by the BPF program
* @err: integer to store error code when NULL is returned.
*/
__bpf_kfunc struct bpf_crypto_ctx *
bpf_crypto_ctx_create(const struct bpf_crypto_params *params, u32 params__sz,
int *err)
{
const struct bpf_crypto_type *type;
struct bpf_crypto_ctx *ctx;
if (!params || params->reserved[0] || params->reserved[1] ||
params__sz != sizeof(struct bpf_crypto_params)) {
*err = -EINVAL;
return NULL;
}
type = bpf_crypto_get_type(params->type);
if (IS_ERR(type)) {
*err = PTR_ERR(type);
return NULL;
}
if (!type->has_algo(params->algo)) {
*err = -EOPNOTSUPP;
goto err_module_put;
}
if (!!params->authsize ^ !!type->setauthsize) {
*err = -EOPNOTSUPP;
goto err_module_put;
}
if (!params->key_len || params->key_len > sizeof(params->key)) {
*err = -EINVAL;
goto err_module_put;
}
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx) {
*err = -ENOMEM;
goto err_module_put;
}
ctx->type = type;
ctx->tfm = type->alloc_tfm(params->algo);
if (IS_ERR(ctx->tfm)) {
*err = PTR_ERR(ctx->tfm);
goto err_free_ctx;
}
if (params->authsize) {
*err = type->setauthsize(ctx->tfm, params->authsize);
if (*err)
goto err_free_tfm;
}
*err = type->setkey(ctx->tfm, params->key, params->key_len);
if (*err)
goto err_free_tfm;
if (type->get_flags(ctx->tfm) & CRYPTO_TFM_NEED_KEY) {
*err = -EINVAL;
goto err_free_tfm;
}
ctx->siv_len = type->ivsize(ctx->tfm) + type->statesize(ctx->tfm);
refcount_set(&ctx->usage, 1);
return ctx;
err_free_tfm:
type->free_tfm(ctx->tfm);
err_free_ctx:
kfree(ctx);
err_module_put:
module_put(type->owner);
return NULL;
}
static void crypto_free_cb(struct rcu_head *head)
{
struct bpf_crypto_ctx *ctx;
ctx = container_of(head, struct bpf_crypto_ctx, rcu);
ctx->type->free_tfm(ctx->tfm);
module_put(ctx->type->owner);
kfree(ctx);
}
/**
* bpf_crypto_ctx_acquire() - Acquire a reference to a BPF crypto context.
* @ctx: The BPF crypto context being acquired. The ctx must be a trusted
* pointer.
*
* Acquires a reference to a BPF crypto context. The context returned by this function
* must either be embedded in a map as a kptr, or freed with
* bpf_crypto_ctx_release().
*/
__bpf_kfunc struct bpf_crypto_ctx *
bpf_crypto_ctx_acquire(struct bpf_crypto_ctx *ctx)
{
if (!refcount_inc_not_zero(&ctx->usage))
return NULL;
return ctx;
}
/**
* bpf_crypto_ctx_release() - Release a previously acquired BPF crypto context.
* @ctx: The crypto context being released.
*
* Releases a previously acquired reference to a BPF crypto context. When the final
* reference of the BPF crypto context has been released, its memory
* will be released.
*/
__bpf_kfunc void bpf_crypto_ctx_release(struct bpf_crypto_ctx *ctx)
{
if (refcount_dec_and_test(&ctx->usage))
call_rcu(&ctx->rcu, crypto_free_cb);
}
static int bpf_crypto_crypt(const struct bpf_crypto_ctx *ctx,
const struct bpf_dynptr_kern *src,
const struct bpf_dynptr_kern *dst,
const struct bpf_dynptr_kern *siv,
bool decrypt)
{
u32 src_len, dst_len, siv_len;
const u8 *psrc;
u8 *pdst, *piv;
int err;
if (__bpf_dynptr_is_rdonly(dst))
return -EINVAL;
siv_len = __bpf_dynptr_size(siv);
src_len = __bpf_dynptr_size(src);
dst_len = __bpf_dynptr_size(dst);
if (!src_len || !dst_len)
return -EINVAL;
if (siv_len != ctx->siv_len)
return -EINVAL;
psrc = __bpf_dynptr_data(src, src_len);
if (!psrc)
return -EINVAL;
pdst = __bpf_dynptr_data_rw(dst, dst_len);
if (!pdst)
return -EINVAL;
piv = siv_len ? __bpf_dynptr_data_rw(siv, siv_len) : NULL;
if (siv_len && !piv)
return -EINVAL;
err = decrypt ? ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv)
: ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);
return err;
}
/**
* bpf_crypto_decrypt() - Decrypt buffer using configured context and IV provided.
* @ctx: The crypto context being used. The ctx must be a trusted pointer.
* @src: bpf_dynptr to the encrypted data. Must be a trusted pointer.
* @dst: bpf_dynptr to the buffer where to store the result. Must be a trusted pointer.
* @siv: bpf_dynptr to IV data and state data to be used by decryptor.
*
* Decrypts provided buffer using IV data and the crypto context. Crypto context must be configured.
*/
__bpf_kfunc int bpf_crypto_decrypt(struct bpf_crypto_ctx *ctx,
const struct bpf_dynptr_kern *src,
const struct bpf_dynptr_kern *dst,
const struct bpf_dynptr_kern *siv)
{
return bpf_crypto_crypt(ctx, src, dst, siv, true);
}
/**
* bpf_crypto_encrypt() - Encrypt buffer using configured context and IV provided.
* @ctx: The crypto context being used. The ctx must be a trusted pointer.
* @src: bpf_dynptr to the plain data. Must be a trusted pointer.
* @dst: bpf_dynptr to buffer where to store the result. Must be a trusted pointer.
* @siv: bpf_dynptr to IV data and state data to be used by encryptor.
*
* Encrypts provided buffer using IV data and the crypto context. Crypto context must be configured.
*/
__bpf_kfunc int bpf_crypto_encrypt(struct bpf_crypto_ctx *ctx,
const struct bpf_dynptr_kern *src,
const struct bpf_dynptr_kern *dst,
const struct bpf_dynptr_kern *siv)
{
return bpf_crypto_crypt(ctx, src, dst, siv, false);
}
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(crypt_init_kfunc_btf_ids)
BTF_ID_FLAGS(func, bpf_crypto_ctx_create, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
BTF_ID_FLAGS(func, bpf_crypto_ctx_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_crypto_ctx_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_KFUNCS_END(crypt_init_kfunc_btf_ids)
static const struct btf_kfunc_id_set crypt_init_kfunc_set = {
.owner = THIS_MODULE,
.set = &crypt_init_kfunc_btf_ids,
};
BTF_KFUNCS_START(crypt_kfunc_btf_ids)
BTF_ID_FLAGS(func, bpf_crypto_decrypt, KF_RCU)
BTF_ID_FLAGS(func, bpf_crypto_encrypt, KF_RCU)
BTF_KFUNCS_END(crypt_kfunc_btf_ids)
static const struct btf_kfunc_id_set crypt_kfunc_set = {
.owner = THIS_MODULE,
.set = &crypt_kfunc_btf_ids,
};
BTF_ID_LIST(bpf_crypto_dtor_ids)
BTF_ID(struct, bpf_crypto_ctx)
BTF_ID(func, bpf_crypto_ctx_release)
static int __init crypto_kfunc_init(void)
{
int ret;
const struct btf_id_dtor_kfunc bpf_crypto_dtors[] = {
{
.btf_id = bpf_crypto_dtor_ids[0],
.kfunc_btf_id = bpf_crypto_dtor_ids[1]
},
};
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &crypt_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &crypt_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &crypt_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL,
&crypt_init_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(bpf_crypto_dtors,
ARRAY_SIZE(bpf_crypto_dtors),
THIS_MODULE);
}
late_initcall(crypto_kfunc_init);
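A rough BPF-side sketch of the create path. The kfunc prototypes mirror the definitions above; the "skcipher"/"ecb(aes)" strings and the key are purely illustrative, and a real program would typically stash the context into a map as a kptr rather than releasing it immediately:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct bpf_crypto_ctx *bpf_crypto_ctx_create(const struct bpf_crypto_params *params,
					     u32 params__sz, int *err) __ksym;
void bpf_crypto_ctx_release(struct bpf_crypto_ctx *ctx) __ksym;

SEC("syscall")	/* syscall programs run sleepable, as the KF_SLEEPABLE kfunc requires */
int create_crypto_ctx(void *args)
{
	struct bpf_crypto_params params = {
		.type	 = "skcipher",
		.algo	 = "ecb(aes)",
		.key_len = 16,
	};
	struct bpf_crypto_ctx *ctx;
	int err = 0;

	__builtin_memcpy(params.key, "0123456789abcdef", 16);

	ctx = bpf_crypto_ctx_create(&params, sizeof(params), &err);
	if (!ctx)
		return err;

	bpf_crypto_ctx_release(ctx);
	return 0;
}

char _license[] SEC("license") = "GPL";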

View File

@ -172,6 +172,17 @@ static bool is_addr_space_cast(const struct bpf_insn *insn)
insn->off == BPF_ADDR_SPACE_CAST; insn->off == BPF_ADDR_SPACE_CAST;
} }
/* Special (internal-only) form of mov, used to resolve per-CPU addrs:
* dst_reg = src_reg + <percpu_base_off>
* BPF_ADDR_PERCPU is used as a special insn->off value.
*/
#define BPF_ADDR_PERCPU (-1)
static inline bool is_mov_percpu_addr(const struct bpf_insn *insn)
{
return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->off == BPF_ADDR_PERCPU;
}
void print_bpf_insn(const struct bpf_insn_cbs *cbs, void print_bpf_insn(const struct bpf_insn_cbs *cbs,
const struct bpf_insn *insn, const struct bpf_insn *insn,
bool allow_ptr_leaks) bool allow_ptr_leaks)
@ -194,6 +205,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
verbose(cbs->private_data, "(%02x) r%d = addr_space_cast(r%d, %d, %d)\n", verbose(cbs->private_data, "(%02x) r%d = addr_space_cast(r%d, %d, %d)\n",
insn->code, insn->dst_reg, insn->code, insn->dst_reg,
insn->src_reg, ((u32)insn->imm) >> 16, (u16)insn->imm); insn->src_reg, ((u32)insn->imm) >> 16, (u16)insn->imm);
} else if (is_mov_percpu_addr(insn)) {
verbose(cbs->private_data, "(%02x) r%d = &(void __percpu *)(r%d)\n",
insn->code, insn->dst_reg, insn->src_reg);
} else if (BPF_SRC(insn->code) == BPF_X) { } else if (BPF_SRC(insn->code) == BPF_X) {
verbose(cbs->private_data, "(%02x) %c%d %s %s%c%d\n", verbose(cbs->private_data, "(%02x) %c%d %s %s%c%d\n",
insn->code, class == BPF_ALU ? 'w' : 'r', insn->code, class == BPF_ALU ? 'w' : 'r',

View File

@ -240,6 +240,26 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
} }
} }
static void htab_free_prealloced_wq(struct bpf_htab *htab)
{
u32 num_entries = htab->map.max_entries;
int i;
if (!btf_record_has_field(htab->map.record, BPF_WORKQUEUE))
return;
if (htab_has_extra_elems(htab))
num_entries += num_possible_cpus();
for (i = 0; i < num_entries; i++) {
struct htab_elem *elem;
elem = get_htab_elem(htab, i);
bpf_obj_free_workqueue(htab->map.record,
elem->key + round_up(htab->map.key_size, 8));
cond_resched();
}
}
static void htab_free_prealloced_fields(struct bpf_htab *htab) static void htab_free_prealloced_fields(struct bpf_htab *htab)
{ {
u32 num_entries = htab->map.max_entries; u32 num_entries = htab->map.max_entries;
@ -1490,11 +1510,12 @@ static void delete_all_elements(struct bpf_htab *htab)
hlist_nulls_del_rcu(&l->hash_node); hlist_nulls_del_rcu(&l->hash_node);
htab_elem_free(htab, l); htab_elem_free(htab, l);
} }
cond_resched();
} }
migrate_enable(); migrate_enable();
} }
static void htab_free_malloced_timers(struct bpf_htab *htab) static void htab_free_malloced_timers_or_wq(struct bpf_htab *htab, bool is_timer)
{ {
int i; int i;
@ -1506,24 +1527,35 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
hlist_nulls_for_each_entry(l, n, head, hash_node) { hlist_nulls_for_each_entry(l, n, head, hash_node) {
/* We only free timer on uref dropping to zero */ /* We only free timer on uref dropping to zero */
bpf_obj_free_timer(htab->map.record, l->key + round_up(htab->map.key_size, 8)); if (is_timer)
bpf_obj_free_timer(htab->map.record,
l->key + round_up(htab->map.key_size, 8));
else
bpf_obj_free_workqueue(htab->map.record,
l->key + round_up(htab->map.key_size, 8));
} }
cond_resched_rcu(); cond_resched_rcu();
} }
rcu_read_unlock(); rcu_read_unlock();
} }
static void htab_map_free_timers(struct bpf_map *map) static void htab_map_free_timers_and_wq(struct bpf_map *map)
{ {
struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
/* We only free timer on uref dropping to zero */ /* We only free timer and workqueue on uref dropping to zero */
if (!btf_record_has_field(htab->map.record, BPF_TIMER)) if (btf_record_has_field(htab->map.record, BPF_TIMER)) {
return; if (!htab_is_prealloc(htab))
if (!htab_is_prealloc(htab)) htab_free_malloced_timers_or_wq(htab, true);
htab_free_malloced_timers(htab); else
else htab_free_prealloced_timers(htab);
htab_free_prealloced_timers(htab); }
if (btf_record_has_field(htab->map.record, BPF_WORKQUEUE)) {
if (!htab_is_prealloc(htab))
htab_free_malloced_timers_or_wq(htab, false);
else
htab_free_prealloced_wq(htab);
}
} }
/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@ -1538,7 +1570,7 @@ static void htab_map_free(struct bpf_map *map)
*/ */
/* htab no longer uses call_rcu() directly. bpf_mem_alloc does it /* htab no longer uses call_rcu() directly. bpf_mem_alloc does it
* underneath and is reponsible for waiting for callbacks to finish * underneath and is responsible for waiting for callbacks to finish
* during bpf_mem_alloc_destroy(). * during bpf_mem_alloc_destroy().
*/ */
if (!htab_is_prealloc(htab)) { if (!htab_is_prealloc(htab)) {
@ -2259,7 +2291,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_alloc = htab_map_alloc, .map_alloc = htab_map_alloc,
.map_free = htab_map_free, .map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key, .map_get_next_key = htab_map_get_next_key,
.map_release_uref = htab_map_free_timers, .map_release_uref = htab_map_free_timers_and_wq,
.map_lookup_elem = htab_map_lookup_elem, .map_lookup_elem = htab_map_lookup_elem,
.map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem, .map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem,
.map_update_elem = htab_map_update_elem, .map_update_elem = htab_map_update_elem,
@ -2280,7 +2312,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_alloc = htab_map_alloc, .map_alloc = htab_map_alloc,
.map_free = htab_map_free, .map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key, .map_get_next_key = htab_map_get_next_key,
.map_release_uref = htab_map_free_timers, .map_release_uref = htab_map_free_timers_and_wq,
.map_lookup_elem = htab_lru_map_lookup_elem, .map_lookup_elem = htab_lru_map_lookup_elem,
.map_lookup_and_delete_elem = htab_lru_map_lookup_and_delete_elem, .map_lookup_and_delete_elem = htab_lru_map_lookup_and_delete_elem,
.map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys, .map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys,
@ -2307,6 +2339,26 @@ static void *htab_percpu_map_lookup_elem(struct bpf_map *map, void *key)
return NULL; return NULL;
} }
/* inline bpf_map_lookup_elem() call for per-CPU hashmap */
static int htab_percpu_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
struct bpf_insn *insn = insn_buf;
if (!bpf_jit_supports_percpu_insn())
return -EOPNOTSUPP;
BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem,
(void *(*)(struct bpf_map *map, void *key))NULL));
*insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem);
*insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3);
*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0,
offsetof(struct htab_elem, key) + map->key_size);
*insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
*insn++ = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
return insn - insn_buf;
}
static void *htab_percpu_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu) static void *htab_percpu_map_lookup_percpu_elem(struct bpf_map *map, void *key, u32 cpu)
{ {
struct htab_elem *l; struct htab_elem *l;
@ -2435,6 +2487,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_free = htab_map_free, .map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key, .map_get_next_key = htab_map_get_next_key,
.map_lookup_elem = htab_percpu_map_lookup_elem, .map_lookup_elem = htab_percpu_map_lookup_elem,
.map_gen_lookup = htab_percpu_map_gen_lookup,
.map_lookup_and_delete_elem = htab_percpu_map_lookup_and_delete_elem, .map_lookup_and_delete_elem = htab_percpu_map_lookup_and_delete_elem,
.map_update_elem = htab_percpu_map_update_elem, .map_update_elem = htab_percpu_map_update_elem,
.map_delete_elem = htab_map_delete_elem, .map_delete_elem = htab_map_delete_elem,

View File

@ -1079,11 +1079,20 @@ const struct bpf_func_proto bpf_snprintf_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO, .arg5_type = ARG_CONST_SIZE_OR_ZERO,
}; };
struct bpf_async_cb {
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
struct rcu_head rcu;
u64 flags;
};
/* BPF map elements can contain 'struct bpf_timer'. /* BPF map elements can contain 'struct bpf_timer'.
* Such map owns all of its BPF timers. * Such map owns all of its BPF timers.
* 'struct bpf_timer' is allocated as part of map element allocation * 'struct bpf_timer' is allocated as part of map element allocation
* and it's zero initialized. * and it's zero initialized.
* That space is used to keep 'struct bpf_timer_kern'. * That space is used to keep 'struct bpf_async_kern'.
* bpf_timer_init() allocates 'struct bpf_hrtimer', inits hrtimer, and * bpf_timer_init() allocates 'struct bpf_hrtimer', inits hrtimer, and
* remembers 'struct bpf_map *' pointer it's part of. * remembers 'struct bpf_map *' pointer it's part of.
* bpf_timer_set_callback() increments prog refcnt and assign bpf callback_fn. * bpf_timer_set_callback() increments prog refcnt and assign bpf callback_fn.
@ -1096,17 +1105,23 @@ const struct bpf_func_proto bpf_snprintf_proto = {
* freeing the timers when inner map is replaced or deleted by user space. * freeing the timers when inner map is replaced or deleted by user space.
*/ */
struct bpf_hrtimer { struct bpf_hrtimer {
struct bpf_async_cb cb;
struct hrtimer timer; struct hrtimer timer;
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
struct rcu_head rcu;
}; };
/* the actual struct hidden inside uapi struct bpf_timer */ struct bpf_work {
struct bpf_timer_kern { struct bpf_async_cb cb;
struct bpf_hrtimer *timer; struct work_struct work;
struct work_struct delete_work;
};
/* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
struct bpf_async_kern {
union {
struct bpf_async_cb *cb;
struct bpf_hrtimer *timer;
struct bpf_work *work;
};
/* bpf_spin_lock is used here instead of spinlock_t to make /* bpf_spin_lock is used here instead of spinlock_t to make
* sure that it always fits into space reserved by struct bpf_timer * sure that it always fits into space reserved by struct bpf_timer
* regardless of LOCKDEP and spinlock debug flags. * regardless of LOCKDEP and spinlock debug flags.
@ -1114,19 +1129,24 @@ struct bpf_timer_kern {
struct bpf_spin_lock lock; struct bpf_spin_lock lock;
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
enum bpf_async_type {
BPF_ASYNC_TYPE_TIMER = 0,
BPF_ASYNC_TYPE_WQ,
};
static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running); static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer) static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
{ {
struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer); struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer);
struct bpf_map *map = t->map; struct bpf_map *map = t->cb.map;
void *value = t->value; void *value = t->cb.value;
bpf_callback_t callback_fn; bpf_callback_t callback_fn;
void *key; void *key;
u32 idx; u32 idx;
BTF_TYPE_EMIT(struct bpf_timer); BTF_TYPE_EMIT(struct bpf_timer);
callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held()); callback_fn = rcu_dereference_check(t->cb.callback_fn, rcu_read_lock_bh_held());
if (!callback_fn) if (!callback_fn)
goto out; goto out;
@ -1155,46 +1175,112 @@ out:
return HRTIMER_NORESTART; return HRTIMER_NORESTART;
} }
BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map, static void bpf_wq_work(struct work_struct *work)
u64, flags)
{ {
clockid_t clockid = flags & (MAX_CLOCKS - 1); struct bpf_work *w = container_of(work, struct bpf_work, work);
struct bpf_hrtimer *t; struct bpf_async_cb *cb = &w->cb;
int ret = 0; struct bpf_map *map = cb->map;
bpf_callback_t callback_fn;
void *value = cb->value;
void *key;
u32 idx;
BUILD_BUG_ON(MAX_CLOCKS != 16); BTF_TYPE_EMIT(struct bpf_wq);
BUILD_BUG_ON(sizeof(struct bpf_timer_kern) > sizeof(struct bpf_timer));
BUILD_BUG_ON(__alignof__(struct bpf_timer_kern) != __alignof__(struct bpf_timer)); callback_fn = READ_ONCE(cb->callback_fn);
if (!callback_fn)
return;
if (map->map_type == BPF_MAP_TYPE_ARRAY) {
struct bpf_array *array = container_of(map, struct bpf_array, map);
/* compute the key */
idx = ((char *)value - array->value) / array->elem_size;
key = &idx;
} else { /* hash or lru */
key = value - round_up(map->key_size, 8);
}
rcu_read_lock_trace();
migrate_disable();
callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
migrate_enable();
rcu_read_unlock_trace();
}
static void bpf_wq_delete_work(struct work_struct *work)
{
struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
cancel_work_sync(&w->work);
kfree_rcu(w, cb.rcu);
}
static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
enum bpf_async_type type)
{
struct bpf_async_cb *cb;
struct bpf_hrtimer *t;
struct bpf_work *w;
clockid_t clockid;
size_t size;
int ret = 0;
if (in_nmi()) if (in_nmi())
return -EOPNOTSUPP; return -EOPNOTSUPP;
if (flags >= MAX_CLOCKS || switch (type) {
/* similar to timerfd except _ALARM variants are not supported */ case BPF_ASYNC_TYPE_TIMER:
(clockid != CLOCK_MONOTONIC && size = sizeof(struct bpf_hrtimer);
clockid != CLOCK_REALTIME && break;
clockid != CLOCK_BOOTTIME)) case BPF_ASYNC_TYPE_WQ:
size = sizeof(struct bpf_work);
break;
default:
return -EINVAL; return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock); }
t = timer->timer;
__bpf_spin_lock_irqsave(&async->lock);
t = async->timer;
if (t) { if (t) {
ret = -EBUSY; ret = -EBUSY;
goto out; goto out;
} }
/* allocate hrtimer via map_kmalloc to use memcg accounting */ /* allocate hrtimer via map_kmalloc to use memcg accounting */
t = bpf_map_kmalloc_node(map, sizeof(*t), GFP_ATOMIC, map->numa_node); cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node);
if (!t) { if (!cb) {
ret = -ENOMEM; ret = -ENOMEM;
goto out; goto out;
} }
t->value = (void *)timer - map->record->timer_off;
t->map = map; switch (type) {
t->prog = NULL; case BPF_ASYNC_TYPE_TIMER:
rcu_assign_pointer(t->callback_fn, NULL); clockid = flags & (MAX_CLOCKS - 1);
hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT); t = (struct bpf_hrtimer *)cb;
t->timer.function = bpf_timer_cb;
WRITE_ONCE(timer->timer, t); hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
/* Guarantee the order between timer->timer and map->usercnt. So t->timer.function = bpf_timer_cb;
cb->value = (void *)async - map->record->timer_off;
break;
case BPF_ASYNC_TYPE_WQ:
w = (struct bpf_work *)cb;
INIT_WORK(&w->work, bpf_wq_work);
INIT_WORK(&w->delete_work, bpf_wq_delete_work);
cb->value = (void *)async - map->record->wq_off;
break;
}
cb->map = map;
cb->prog = NULL;
cb->flags = flags;
rcu_assign_pointer(cb->callback_fn, NULL);
WRITE_ONCE(async->cb, cb);
/* Guarantee the order between async->cb and map->usercnt. So
* when there are concurrent uref release and bpf timer init, either * when there are concurrent uref release and bpf timer init, either
* bpf_timer_cancel_and_free() called by uref release reads a no-NULL * bpf_timer_cancel_and_free() called by uref release reads a no-NULL
* timer or atomic64_read() below returns a zero usercnt. * timer or atomic64_read() below returns a zero usercnt.
@ -1204,15 +1290,34 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
/* maps with timers must be either held by user space /* maps with timers must be either held by user space
* or pinned in bpffs. * or pinned in bpffs.
*/ */
WRITE_ONCE(timer->timer, NULL); WRITE_ONCE(async->cb, NULL);
kfree(t); kfree(cb);
ret = -EPERM; ret = -EPERM;
} }
out: out:
__bpf_spin_unlock_irqrestore(&timer->lock); __bpf_spin_unlock_irqrestore(&async->lock);
return ret; return ret;
} }
BPF_CALL_3(bpf_timer_init, struct bpf_async_kern *, timer, struct bpf_map *, map,
u64, flags)
{
clock_t clockid = flags & (MAX_CLOCKS - 1);
BUILD_BUG_ON(MAX_CLOCKS != 16);
BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_timer));
BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_timer));
if (flags >= MAX_CLOCKS ||
/* similar to timerfd except _ALARM variants are not supported */
(clockid != CLOCK_MONOTONIC &&
clockid != CLOCK_REALTIME &&
clockid != CLOCK_BOOTTIME))
return -EINVAL;
return __bpf_async_init(timer, map, flags, BPF_ASYNC_TYPE_TIMER);
}
static const struct bpf_func_proto bpf_timer_init_proto = { static const struct bpf_func_proto bpf_timer_init_proto = {
.func = bpf_timer_init, .func = bpf_timer_init,
.gpl_only = true, .gpl_only = true,
@ -1222,22 +1327,23 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING, .arg3_type = ARG_ANYTHING,
}; };
BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn, static int __bpf_async_set_callback(struct bpf_async_kern *async, void *callback_fn,
struct bpf_prog_aux *, aux) struct bpf_prog_aux *aux, unsigned int flags,
enum bpf_async_type type)
{ {
struct bpf_prog *prev, *prog = aux->prog; struct bpf_prog *prev, *prog = aux->prog;
struct bpf_hrtimer *t; struct bpf_async_cb *cb;
int ret = 0; int ret = 0;
if (in_nmi()) if (in_nmi())
return -EOPNOTSUPP; return -EOPNOTSUPP;
__bpf_spin_lock_irqsave(&timer->lock); __bpf_spin_lock_irqsave(&async->lock);
t = timer->timer; cb = async->cb;
if (!t) { if (!cb) {
ret = -EINVAL; ret = -EINVAL;
goto out; goto out;
} }
if (!atomic64_read(&t->map->usercnt)) { if (!atomic64_read(&cb->map->usercnt)) {
/* maps with timers must be either held by user space /* maps with timers must be either held by user space
* or pinned in bpffs. Otherwise timer might still be * or pinned in bpffs. Otherwise timer might still be
* running even when bpf prog is detached and user space * running even when bpf prog is detached and user space
@ -1246,7 +1352,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
ret = -EPERM; ret = -EPERM;
goto out; goto out;
} }
prev = t->prog; prev = cb->prog;
if (prev != prog) { if (prev != prog) {
/* Bump prog refcnt once. Every bpf_timer_set_callback() /* Bump prog refcnt once. Every bpf_timer_set_callback()
* can pick different callback_fn-s within the same prog. * can pick different callback_fn-s within the same prog.
@ -1259,14 +1365,20 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
if (prev) if (prev)
/* Drop prev prog refcnt when swapping with new prog */ /* Drop prev prog refcnt when swapping with new prog */
bpf_prog_put(prev); bpf_prog_put(prev);
t->prog = prog; cb->prog = prog;
} }
rcu_assign_pointer(t->callback_fn, callback_fn); rcu_assign_pointer(cb->callback_fn, callback_fn);
out: out:
__bpf_spin_unlock_irqrestore(&timer->lock); __bpf_spin_unlock_irqrestore(&async->lock);
return ret; return ret;
} }
BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
struct bpf_prog_aux *, aux)
{
return __bpf_async_set_callback(timer, callback_fn, aux, 0, BPF_ASYNC_TYPE_TIMER);
}
static const struct bpf_func_proto bpf_timer_set_callback_proto = { static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.func = bpf_timer_set_callback, .func = bpf_timer_set_callback,
.gpl_only = true, .gpl_only = true,
@ -1275,7 +1387,7 @@ static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.arg2_type = ARG_PTR_TO_FUNC, .arg2_type = ARG_PTR_TO_FUNC,
}; };
BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, flags) BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, flags)
{ {
struct bpf_hrtimer *t; struct bpf_hrtimer *t;
int ret = 0; int ret = 0;
@ -1287,7 +1399,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
return -EINVAL; return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock); __bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer; t = timer->timer;
if (!t || !t->prog) { if (!t || !t->cb.prog) {
ret = -EINVAL; ret = -EINVAL;
goto out; goto out;
} }
@ -1315,18 +1427,18 @@ static const struct bpf_func_proto bpf_timer_start_proto = {
.arg3_type = ARG_ANYTHING, .arg3_type = ARG_ANYTHING,
}; };
static void drop_prog_refcnt(struct bpf_hrtimer *t) static void drop_prog_refcnt(struct bpf_async_cb *async)
{ {
struct bpf_prog *prog = t->prog; struct bpf_prog *prog = async->prog;
if (prog) { if (prog) {
bpf_prog_put(prog); bpf_prog_put(prog);
t->prog = NULL; async->prog = NULL;
rcu_assign_pointer(t->callback_fn, NULL); rcu_assign_pointer(async->callback_fn, NULL);
} }
} }
BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer) BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
{ {
struct bpf_hrtimer *t; struct bpf_hrtimer *t;
int ret = 0; int ret = 0;
@ -1348,7 +1460,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
ret = -EDEADLK; ret = -EDEADLK;
goto out; goto out;
} }
drop_prog_refcnt(t); drop_prog_refcnt(&t->cb);
out: out:
__bpf_spin_unlock_irqrestore(&timer->lock); __bpf_spin_unlock_irqrestore(&timer->lock);
/* Cancel the timer and wait for associated callback to finish /* Cancel the timer and wait for associated callback to finish
@ -1366,36 +1478,44 @@ static const struct bpf_func_proto bpf_timer_cancel_proto = {
.arg1_type = ARG_PTR_TO_TIMER, .arg1_type = ARG_PTR_TO_TIMER,
}; };
static struct bpf_async_cb *__bpf_async_cancel_and_free(struct bpf_async_kern *async)
{
struct bpf_async_cb *cb;
/* Performance optimization: read async->cb without lock first. */
if (!READ_ONCE(async->cb))
return NULL;
__bpf_spin_lock_irqsave(&async->lock);
/* re-read it under lock */
cb = async->cb;
if (!cb)
goto out;
drop_prog_refcnt(cb);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
*/
WRITE_ONCE(async->cb, NULL);
out:
__bpf_spin_unlock_irqrestore(&async->lock);
return cb;
}
/* This function is called by map_delete/update_elem for individual element and /* This function is called by map_delete/update_elem for individual element and
* by ops->map_release_uref when the user space reference to a map reaches zero. * by ops->map_release_uref when the user space reference to a map reaches zero.
*/ */
void bpf_timer_cancel_and_free(void *val) void bpf_timer_cancel_and_free(void *val)
{ {
struct bpf_timer_kern *timer = val;
struct bpf_hrtimer *t; struct bpf_hrtimer *t;
/* Performance optimization: read timer->timer without lock first. */ t = (struct bpf_hrtimer *)__bpf_async_cancel_and_free(val);
if (!READ_ONCE(timer->timer))
return;
__bpf_spin_lock_irqsave(&timer->lock);
/* re-read it under lock */
t = timer->timer;
if (!t)
goto out;
drop_prog_refcnt(t);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
*/
WRITE_ONCE(timer->timer, NULL);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
if (!t) if (!t)
return; return;
/* Cancel the timer and wait for callback to complete if it was running. /* Cancel the timer and wait for callback to complete if it was running.
* If hrtimer_cancel() can be safely called it's safe to call kfree(t) * If hrtimer_cancel() can be safely called it's safe to call kfree(t)
* right after for both preallocated and non-preallocated maps. * right after for both preallocated and non-preallocated maps.
* The timer->timer = NULL was already done and no code path can * The async->cb = NULL was already done and no code path can
* see address 't' anymore. * see address 't' anymore.
* *
* Check that bpf_map_delete/update_elem() wasn't called from timer * Check that bpf_map_delete/update_elem() wasn't called from timer
@ -1404,13 +1524,33 @@ out:
* return -1). Though callback_fn is still running on this cpu it's * return -1). Though callback_fn is still running on this cpu it's
* safe to do kfree(t) because bpf_timer_cb() read everything it needed * safe to do kfree(t) because bpf_timer_cb() read everything it needed
* from 't'. The bpf subprog callback_fn won't be able to access 't', * from 't'. The bpf subprog callback_fn won't be able to access 't',
* since timer->timer = NULL was already done. The timer will be * since async->cb = NULL was already done. The timer will be
* effectively cancelled because bpf_timer_cb() will return * effectively cancelled because bpf_timer_cb() will return
* HRTIMER_NORESTART. * HRTIMER_NORESTART.
*/ */
if (this_cpu_read(hrtimer_running) != t) if (this_cpu_read(hrtimer_running) != t)
hrtimer_cancel(&t->timer); hrtimer_cancel(&t->timer);
kfree_rcu(t, rcu); kfree_rcu(t, cb.rcu);
}
/* This function is called by map_delete/update_elem for individual element and
* by ops->map_release_uref when the user space reference to a map reaches zero.
*/
void bpf_wq_cancel_and_free(void *val)
{
struct bpf_work *work;
BTF_TYPE_EMIT(struct bpf_wq);
work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
if (!work)
return;
/* Trigger cancel of the sleepable work, but *do not* wait for
* it to finish if it was running as we might not be in a
* sleepable context.
* kfree will be called once the work has finished.
*/
schedule_work(&work->delete_work);
} }
BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr) BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
@ -1443,7 +1583,7 @@ static const struct bpf_func_proto bpf_kptr_xchg_proto = {
#define DYNPTR_SIZE_MASK 0xFFFFFF #define DYNPTR_SIZE_MASK 0xFFFFFF
#define DYNPTR_RDONLY_BIT BIT(31) #define DYNPTR_RDONLY_BIT BIT(31)
static bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr) bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr)
{ {
return ptr->size & DYNPTR_RDONLY_BIT; return ptr->size & DYNPTR_RDONLY_BIT;
} }
@ -2412,7 +2552,7 @@ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o
/* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice. /* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice.
* *
* For skb-type dynptrs, it is safe to write into the returned pointer * For skb-type dynptrs, it is safe to write into the returned pointer
* if the bpf program allows skb data writes. There are two possiblities * if the bpf program allows skb data writes. There are two possibilities
* that may occur when calling bpf_dynptr_slice_rdwr: * that may occur when calling bpf_dynptr_slice_rdwr:
* *
* 1) The requested slice is in the head of the skb. In this case, the * 1) The requested slice is in the head of the skb. In this case, the
@ -2549,6 +2689,61 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n"); WARN(1, "A call to BPF exception callback should never return\n");
} }
__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags)
{
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
struct bpf_map *map = p__map;
BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_wq));
BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_wq));
if (flags)
return -EINVAL;
return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
}
__bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
{
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
struct bpf_work *w;
if (in_nmi())
return -EOPNOTSUPP;
if (flags)
return -EINVAL;
w = READ_ONCE(async->work);
if (!w || !READ_ONCE(w->cb.prog))
return -EINVAL;
schedule_work(&w->work);
return 0;
}
__bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
unsigned int flags,
void *aux__ign)
{
struct bpf_prog_aux *aux = (struct bpf_prog_aux *)aux__ign;
struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
if (flags)
return -EINVAL;
return __bpf_async_set_callback(async, callback_fn, aux, flags, BPF_ASYNC_TYPE_WQ);
}
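Together, bpf_wq_init(), bpf_wq_set_callback_impl() and bpf_wq_start() give BPF programs a deferred-work primitive modelled on bpf_timer. A rough BPF-side usage sketch, assuming vmlinux.h exposes struct bpf_wq and that a bpf_wq_set_callback() wrapper around the _impl kfunc (as the selftests define one) is available; map and function names are illustrative only:

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	struct elem {
		struct bpf_wq wq;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct elem);
	} wq_map SEC(".maps");

	static int wq_cb(void *map, int *key, struct bpf_wq *wq)
	{
		/* Runs later from (sleepable) workqueue context. */
		return 0;
	}

	SEC("tc")
	int schedule_deferred_work(struct __sk_buff *skb)
	{
		int key = 0;
		struct elem *val = bpf_map_lookup_elem(&wq_map, &key);

		if (!val)
			return 0;
		if (bpf_wq_init(&val->wq, &wq_map, 0))
			return 0;
		if (bpf_wq_set_callback(&val->wq, wq_cb, 0))
			return 0;
		bpf_wq_start(&val->wq, 0);
		return 0;
	}

Note the zero flags everywhere: all three kfuncs above currently reject non-zero flags.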
__bpf_kfunc void bpf_preempt_disable(void)
{
preempt_disable();
}
__bpf_kfunc void bpf_preempt_enable(void)
{
preempt_enable();
}
__bpf_kfunc_end_defs(); __bpf_kfunc_end_defs();
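The preempt pair lets a program mark a non-preemptible region; as the verifier changes further down enforce, the calls must be balanced before the program exits, may nest, and sleepable helpers/kfuncs, global subprog calls and BPF_LD_[ABS|IND] are rejected while the region is active. A short BPF-side sketch (kfunc declarations as in the selftests' bpf_experimental.h are assumed):

	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>

	extern void bpf_preempt_disable(void) __ksym;
	extern void bpf_preempt_enable(void) __ksym;

	SEC("tc")
	int non_preemptible_section(struct __sk_buff *skb)
	{
		bpf_preempt_disable();
		/* only non-sleepable work in here; nesting is allowed */
		bpf_preempt_disable();
		bpf_preempt_enable();
		bpf_preempt_enable();	/* must be balanced before returning */
		return 0;
	}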
BTF_KFUNCS_START(generic_btf_ids) BTF_KFUNCS_START(generic_btf_ids)
@ -2625,6 +2820,12 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
BTF_ID_FLAGS(func, bpf_dynptr_size) BTF_ID_FLAGS(func, bpf_dynptr_size)
BTF_ID_FLAGS(func, bpf_dynptr_clone) BTF_ID_FLAGS(func, bpf_dynptr_clone)
BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
BTF_ID_FLAGS(func, bpf_wq_init)
BTF_ID_FLAGS(func, bpf_wq_set_callback_impl)
BTF_ID_FLAGS(func, bpf_wq_start)
BTF_ID_FLAGS(func, bpf_preempt_disable)
BTF_ID_FLAGS(func, bpf_preempt_enable)
BTF_KFUNCS_END(common_btf_ids) BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = { static const struct btf_kfunc_id_set common_kfunc_set = {
@ -2652,6 +2853,7 @@ static int __init kfunc_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &generic_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &generic_kfunc_set);
ret = ret ?: register_btf_id_dtor_kfuncs(generic_dtors, ret = ret ?: register_btf_id_dtor_kfuncs(generic_dtors,
ARRAY_SIZE(generic_dtors), ARRAY_SIZE(generic_dtors),
THIS_MODULE); THIS_MODULE);


@ -467,9 +467,9 @@ const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type)
if (type & PTR_MAYBE_NULL) { if (type & PTR_MAYBE_NULL) {
if (base_type(type) == PTR_TO_BTF_ID) if (base_type(type) == PTR_TO_BTF_ID)
strncpy(postfix, "or_null_", 16); strscpy(postfix, "or_null_");
else else
strncpy(postfix, "_or_null", 16); strscpy(postfix, "_or_null");
} }
snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s%s", snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s%s",
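The conversion drops the hard-coded 16: the two-argument strscpy() form takes the destination size from the array type itself and, unlike strncpy(), always NUL-terminates. A tiny sketch of the equivalence, assuming postfix is the 16-byte local buffer this function uses (as the old strncpy() length suggests):

	char postfix[16] = "";

	strscpy(postfix, "_or_null");			/* size inferred from the array */
	strscpy(postfix, "_or_null", sizeof(postfix));	/* equivalent explicit form */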


@ -316,6 +316,7 @@ static long trie_update_elem(struct bpf_map *map,
{ {
struct lpm_trie *trie = container_of(map, struct lpm_trie, map); struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL; struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL;
struct lpm_trie_node *free_node = NULL;
struct lpm_trie_node __rcu **slot; struct lpm_trie_node __rcu **slot;
struct bpf_lpm_trie_key_u8 *key = _key; struct bpf_lpm_trie_key_u8 *key = _key;
unsigned long irq_flags; unsigned long irq_flags;
@ -390,7 +391,7 @@ static long trie_update_elem(struct bpf_map *map,
trie->n_entries--; trie->n_entries--;
rcu_assign_pointer(*slot, new_node); rcu_assign_pointer(*slot, new_node);
kfree_rcu(node, rcu); free_node = node;
goto out; goto out;
} }
@ -437,6 +438,7 @@ out:
} }
spin_unlock_irqrestore(&trie->lock, irq_flags); spin_unlock_irqrestore(&trie->lock, irq_flags);
kfree_rcu(free_node, rcu);
return ret; return ret;
} }
@ -445,6 +447,7 @@ out:
static long trie_delete_elem(struct bpf_map *map, void *_key) static long trie_delete_elem(struct bpf_map *map, void *_key)
{ {
struct lpm_trie *trie = container_of(map, struct lpm_trie, map); struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
struct lpm_trie_node *free_node = NULL, *free_parent = NULL;
struct bpf_lpm_trie_key_u8 *key = _key; struct bpf_lpm_trie_key_u8 *key = _key;
struct lpm_trie_node __rcu **trim, **trim2; struct lpm_trie_node __rcu **trim, **trim2;
struct lpm_trie_node *node, *parent; struct lpm_trie_node *node, *parent;
@ -514,8 +517,8 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
else else
rcu_assign_pointer( rcu_assign_pointer(
*trim2, rcu_access_pointer(parent->child[0])); *trim2, rcu_access_pointer(parent->child[0]));
kfree_rcu(parent, rcu); free_parent = parent;
kfree_rcu(node, rcu); free_node = node;
goto out; goto out;
} }
@ -529,10 +532,12 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
rcu_assign_pointer(*trim, rcu_access_pointer(node->child[1])); rcu_assign_pointer(*trim, rcu_access_pointer(node->child[1]));
else else
RCU_INIT_POINTER(*trim, NULL); RCU_INIT_POINTER(*trim, NULL);
kfree_rcu(node, rcu); free_node = node;
out: out:
spin_unlock_irqrestore(&trie->lock, irq_flags); spin_unlock_irqrestore(&trie->lock, irq_flags);
kfree_rcu(free_parent, rcu);
kfree_rcu(free_node, rcu);
return ret; return ret;
} }
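Both the update and delete paths now follow the same pattern: unlink nodes and remember them while the IRQs-disabled spinlock is held, and only call kfree_rcu() after the lock is dropped, so the free never happens inside the critical section (kfree_rcu(NULL, ...) is a no-op, so slots that stay unset need no special casing). A generic sketch of that pattern, with illustrative names unrelated to this file:

	static void remove_and_free(struct my_struct *obj)
	{
		struct my_node *free_node = NULL;
		unsigned long flags;

		spin_lock_irqsave(&obj->lock, flags);
		free_node = unlink_one_node(obj);	/* may return NULL */
		spin_unlock_irqrestore(&obj->lock, flags);

		kfree_rcu(free_node, rcu);		/* outside the lock; NULL is fine */
	}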


@ -559,6 +559,7 @@ void btf_record_free(struct btf_record *rec)
case BPF_SPIN_LOCK: case BPF_SPIN_LOCK:
case BPF_TIMER: case BPF_TIMER:
case BPF_REFCOUNT: case BPF_REFCOUNT:
case BPF_WORKQUEUE:
/* Nothing to release */ /* Nothing to release */
break; break;
default: default:
@ -608,6 +609,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
case BPF_SPIN_LOCK: case BPF_SPIN_LOCK:
case BPF_TIMER: case BPF_TIMER:
case BPF_REFCOUNT: case BPF_REFCOUNT:
case BPF_WORKQUEUE:
/* Nothing to acquire */ /* Nothing to acquire */
break; break;
default: default:
@ -659,6 +661,13 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
bpf_timer_cancel_and_free(obj + rec->timer_off); bpf_timer_cancel_and_free(obj + rec->timer_off);
} }
void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj)
{
if (WARN_ON_ONCE(!btf_record_has_field(rec, BPF_WORKQUEUE)))
return;
bpf_wq_cancel_and_free(obj + rec->wq_off);
}
void bpf_obj_free_fields(const struct btf_record *rec, void *obj) void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{ {
const struct btf_field *fields; const struct btf_field *fields;
@ -679,6 +688,9 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
case BPF_TIMER: case BPF_TIMER:
bpf_timer_cancel_and_free(field_ptr); bpf_timer_cancel_and_free(field_ptr);
break; break;
case BPF_WORKQUEUE:
bpf_wq_cancel_and_free(field_ptr);
break;
case BPF_KPTR_UNREF: case BPF_KPTR_UNREF:
WRITE_ONCE(*(u64 *)field_ptr, 0); WRITE_ONCE(*(u64 *)field_ptr, 0);
break; break;
@ -1085,7 +1097,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
map->record = btf_parse_fields(btf, value_type, map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD | BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
BPF_RB_ROOT | BPF_REFCOUNT, BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE,
map->value_size); map->value_size);
if (!IS_ERR_OR_NULL(map->record)) { if (!IS_ERR_OR_NULL(map->record)) {
int i; int i;
@ -1115,6 +1127,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
} }
break; break;
case BPF_TIMER: case BPF_TIMER:
case BPF_WORKQUEUE:
if (map->map_type != BPF_MAP_TYPE_HASH && if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH && map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) { map->map_type != BPF_MAP_TYPE_ARRAY) {
@ -5242,6 +5255,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_PROG_TYPE_SK_LOOKUP: case BPF_PROG_TYPE_SK_LOOKUP:
ret = netns_bpf_link_create(attr, prog); ret = netns_bpf_link_create(attr, prog);
break; break;
case BPF_PROG_TYPE_SK_MSG:
case BPF_PROG_TYPE_SK_SKB:
ret = sock_map_link_create(attr, prog);
break;
#ifdef CONFIG_NET #ifdef CONFIG_NET
case BPF_PROG_TYPE_XDP: case BPF_PROG_TYPE_XDP:
ret = bpf_xdp_link_attach(attr, prog); ret = bpf_xdp_link_attach(attr, prog);
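These two new cases let sk_msg and sk_skb verdict programs be attached to a sockmap/sockhash through a BPF link instead of only via BPF_PROG_ATTACH. A hedged userspace sketch using libbpf's low-level bpf_link_create(), where the target fd is the sockmap, mirroring the existing prog-attach interface (error handling omitted; fds assumed valid):

	#include <bpf/bpf.h>

	static int attach_msg_verdict_link(int prog_fd, int sockmap_fd)
	{
		/* Returns a link fd on success, negative errno on failure. */
		return bpf_link_create(prog_fd, sockmap_fd,
				       BPF_SK_MSG_VERDICT, /* opts */ NULL);
	}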


@ -9,8 +9,8 @@
#include <linux/sysfs.h> #include <linux/sysfs.h>
/* See scripts/link-vmlinux.sh, gen_btf() func for details */ /* See scripts/link-vmlinux.sh, gen_btf() func for details */
extern char __weak __start_BTF[]; extern char __start_BTF[];
extern char __weak __stop_BTF[]; extern char __stop_BTF[];
static ssize_t static ssize_t
btf_vmlinux_read(struct file *file, struct kobject *kobj, btf_vmlinux_read(struct file *file, struct kobject *kobj,
@ -32,7 +32,7 @@ static int __init btf_vmlinux_init(void)
{ {
bin_attr_btf_vmlinux.size = __stop_BTF - __start_BTF; bin_attr_btf_vmlinux.size = __stop_BTF - __start_BTF;
if (!__start_BTF || bin_attr_btf_vmlinux.size == 0) if (bin_attr_btf_vmlinux.size == 0)
return 0; return 0;
btf_kobj = kobject_create_and_add("btf", kernel_kobj); btf_kobj = kobject_create_and_add("btf", kernel_kobj);


@ -885,12 +885,13 @@ static void notrace update_prog_stats(struct bpf_prog *prog,
* Hence check that 'start' is valid. * Hence check that 'start' is valid.
*/ */
start > NO_START_TIME) { start > NO_START_TIME) {
u64 duration = sched_clock() - start;
unsigned long flags; unsigned long flags;
stats = this_cpu_ptr(prog->stats); stats = this_cpu_ptr(prog->stats);
flags = u64_stats_update_begin_irqsave(&stats->syncp); flags = u64_stats_update_begin_irqsave(&stats->syncp);
u64_stats_inc(&stats->cnt); u64_stats_inc(&stats->cnt);
u64_stats_add(&stats->nsecs, sched_clock() - start); u64_stats_add(&stats->nsecs, duration);
u64_stats_update_end_irqrestore(&stats->syncp, flags); u64_stats_update_end_irqrestore(&stats->syncp, flags);
} }
} }


@ -172,7 +172,7 @@ static bool bpf_global_percpu_ma_set;
/* verifier_state + insn_idx are pushed to stack when branch is encountered */ /* verifier_state + insn_idx are pushed to stack when branch is encountered */
struct bpf_verifier_stack_elem { struct bpf_verifier_stack_elem {
/* verifer state is 'st' /* verifier state is 'st'
* before processing instruction 'insn_idx' * before processing instruction 'insn_idx'
* and after processing instruction 'prev_insn_idx' * and after processing instruction 'prev_insn_idx'
*/ */
@ -190,11 +190,6 @@ struct bpf_verifier_stack_elem {
#define BPF_MAP_KEY_POISON (1ULL << 63) #define BPF_MAP_KEY_POISON (1ULL << 63)
#define BPF_MAP_KEY_SEEN (1ULL << 62) #define BPF_MAP_KEY_SEEN (1ULL << 62)
#define BPF_MAP_PTR_UNPRIV 1UL
#define BPF_MAP_PTR_POISON ((void *)((0xeB9FUL << 1) + \
POISON_POINTER_DELTA))
#define BPF_MAP_PTR(X) ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
#define BPF_GLOBAL_PERCPU_MA_MAX_SIZE 512 #define BPF_GLOBAL_PERCPU_MA_MAX_SIZE 512
static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx); static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
@ -209,21 +204,22 @@ static bool is_trusted_reg(const struct bpf_reg_state *reg);
static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux) static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
{ {
return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON; return aux->map_ptr_state.poison;
} }
static bool bpf_map_ptr_unpriv(const struct bpf_insn_aux_data *aux) static bool bpf_map_ptr_unpriv(const struct bpf_insn_aux_data *aux)
{ {
return aux->map_ptr_state & BPF_MAP_PTR_UNPRIV; return aux->map_ptr_state.unpriv;
} }
static void bpf_map_ptr_store(struct bpf_insn_aux_data *aux, static void bpf_map_ptr_store(struct bpf_insn_aux_data *aux,
const struct bpf_map *map, bool unpriv) struct bpf_map *map,
bool unpriv, bool poison)
{ {
BUILD_BUG_ON((unsigned long)BPF_MAP_PTR_POISON & BPF_MAP_PTR_UNPRIV);
unpriv |= bpf_map_ptr_unpriv(aux); unpriv |= bpf_map_ptr_unpriv(aux);
aux->map_ptr_state = (unsigned long)map | aux->map_ptr_state.unpriv = unpriv;
(unpriv ? BPF_MAP_PTR_UNPRIV : 0UL); aux->map_ptr_state.poison = poison;
aux->map_ptr_state.map_ptr = map;
} }
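aux->map_ptr_state stops being an unsigned long that smuggles an unpriv bit and a poison sentinel inside the pointer; it becomes a small struct, so the poisoned case can still carry the real map pointer (which the simplified set_map_elem_callback_state() below depends on). A sketch of the presumed definition, which lives in bpf_verifier.h rather than in this hunk:

	struct bpf_map_ptr_state {
		struct bpf_map *map_ptr;
		bool poison;
		bool unpriv;
	};

	/* struct bpf_insn_aux_data then embeds:
	 *	struct bpf_map_ptr_state map_ptr_state;
	 * in place of the old tagged unsigned long.
	 */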
static bool bpf_map_key_poisoned(const struct bpf_insn_aux_data *aux) static bool bpf_map_key_poisoned(const struct bpf_insn_aux_data *aux)
@ -336,6 +332,10 @@ struct bpf_kfunc_call_arg_meta {
u8 spi; u8 spi;
u8 frameno; u8 frameno;
} iter; } iter;
struct {
struct bpf_map *ptr;
int uid;
} map;
u64 mem_size; u64 mem_size;
}; };
@ -501,8 +501,12 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
} }
static bool is_sync_callback_calling_kfunc(u32 btf_id); static bool is_sync_callback_calling_kfunc(u32 btf_id);
static bool is_async_callback_calling_kfunc(u32 btf_id);
static bool is_callback_calling_kfunc(u32 btf_id);
static bool is_bpf_throw_kfunc(struct bpf_insn *insn); static bool is_bpf_throw_kfunc(struct bpf_insn *insn);
static bool is_bpf_wq_set_callback_impl_kfunc(u32 btf_id);
static bool is_sync_callback_calling_function(enum bpf_func_id func_id) static bool is_sync_callback_calling_function(enum bpf_func_id func_id)
{ {
return func_id == BPF_FUNC_for_each_map_elem || return func_id == BPF_FUNC_for_each_map_elem ||
@ -530,7 +534,8 @@ static bool is_sync_callback_calling_insn(struct bpf_insn *insn)
static bool is_async_callback_calling_insn(struct bpf_insn *insn) static bool is_async_callback_calling_insn(struct bpf_insn *insn)
{ {
return bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm); return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm)) ||
(bpf_pseudo_kfunc_call(insn) && is_async_callback_calling_kfunc(insn->imm));
} }
static bool is_may_goto_insn(struct bpf_insn *insn) static bool is_may_goto_insn(struct bpf_insn *insn)
@ -1429,6 +1434,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
} }
dst_state->speculative = src->speculative; dst_state->speculative = src->speculative;
dst_state->active_rcu_lock = src->active_rcu_lock; dst_state->active_rcu_lock = src->active_rcu_lock;
dst_state->active_preempt_lock = src->active_preempt_lock;
dst_state->in_sleepable = src->in_sleepable;
dst_state->curframe = src->curframe; dst_state->curframe = src->curframe;
dst_state->active_lock.ptr = src->active_lock.ptr; dst_state->active_lock.ptr = src->active_lock.ptr;
dst_state->active_lock.id = src->active_lock.id; dst_state->active_lock.id = src->active_lock.id;
@ -1842,6 +1849,8 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
*/ */
if (btf_record_has_field(map->inner_map_meta->record, BPF_TIMER)) if (btf_record_has_field(map->inner_map_meta->record, BPF_TIMER))
reg->map_uid = reg->id; reg->map_uid = reg->id;
if (btf_record_has_field(map->inner_map_meta->record, BPF_WORKQUEUE))
reg->map_uid = reg->id;
} else if (map->map_type == BPF_MAP_TYPE_XSKMAP) { } else if (map->map_type == BPF_MAP_TYPE_XSKMAP) {
reg->type = PTR_TO_XDP_SOCK; reg->type = PTR_TO_XDP_SOCK;
} else if (map->map_type == BPF_MAP_TYPE_SOCKMAP || } else if (map->map_type == BPF_MAP_TYPE_SOCKMAP ||
@ -2135,7 +2144,7 @@ static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg) static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
{ {
/* Try to tighten 64-bit bounds from 32-bit knowledge, using 32-bit /* Try to tighten 64-bit bounds from 32-bit knowledge, using 32-bit
* values on both sides of 64-bit range in hope to have tigher range. * values on both sides of 64-bit range in hope to have tighter range.
* E.g., if r1 is [0x1'00000000, 0x3'80000000], and we learn from * E.g., if r1 is [0x1'00000000, 0x3'80000000], and we learn from
* 32-bit signed > 0 operation that s32 bounds are now [1; 0x7fffffff]. * 32-bit signed > 0 operation that s32 bounds are now [1; 0x7fffffff].
* With this, we can substitute 1 as low 32-bits of _low_ 64-bit bound * With this, we can substitute 1 as low 32-bits of _low_ 64-bit bound
@ -2143,7 +2152,7 @@ static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
* _high_ 64-bit bound (0x380000000 -> 0x37fffffff) and arrive at a * _high_ 64-bit bound (0x380000000 -> 0x37fffffff) and arrive at a
* better overall bounds for r1 as [0x1'000000001; 0x3'7fffffff]. * better overall bounds for r1 as [0x1'000000001; 0x3'7fffffff].
* We just need to make sure that derived bounds we are intersecting * We just need to make sure that derived bounds we are intersecting
* with are well-formed ranges in respecitve s64 or u64 domain, just * with are well-formed ranges in respective s64 or u64 domain, just
* like we do with similar kinds of 32-to-64 or 64-to-32 adjustments. * like we do with similar kinds of 32-to-64 or 64-to-32 adjustments.
*/ */
__u64 new_umin, new_umax; __u64 new_umin, new_umax;
@ -2402,7 +2411,7 @@ static void init_func_state(struct bpf_verifier_env *env,
/* Similar to push_stack(), but for async callbacks */ /* Similar to push_stack(), but for async callbacks */
static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env, static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx, int insn_idx, int prev_insn_idx,
int subprog) int subprog, bool is_sleepable)
{ {
struct bpf_verifier_stack_elem *elem; struct bpf_verifier_stack_elem *elem;
struct bpf_func_state *frame; struct bpf_func_state *frame;
@ -2429,6 +2438,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
* Initialize it similar to do_check_common(). * Initialize it similar to do_check_common().
*/ */
elem->st.branches = 1; elem->st.branches = 1;
elem->st.in_sleepable = is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL); frame = kzalloc(sizeof(*frame), GFP_KERNEL);
if (!frame) if (!frame)
goto err; goto err;
@ -3615,7 +3625,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
* sreg needs precision before this insn * sreg needs precision before this insn
*/ */
bt_clear_reg(bt, dreg); bt_clear_reg(bt, dreg);
bt_set_reg(bt, sreg); if (sreg != BPF_REG_FP)
bt_set_reg(bt, sreg);
} else { } else {
/* dreg = K /* dreg = K
* dreg needs precision after this insn. * dreg needs precision after this insn.
@ -3631,7 +3642,8 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
* both dreg and sreg need precision * both dreg and sreg need precision
* before this insn * before this insn
*/ */
bt_set_reg(bt, sreg); if (sreg != BPF_REG_FP)
bt_set_reg(bt, sreg);
} /* else dreg += K } /* else dreg += K
* dreg still needs precision before this insn * dreg still needs precision before this insn
*/ */
@ -5274,7 +5286,8 @@ bad_type:
static bool in_sleepable(struct bpf_verifier_env *env) static bool in_sleepable(struct bpf_verifier_env *env)
{ {
return env->prog->sleepable; return env->prog->sleepable ||
(env->cur_state && env->cur_state->in_sleepable);
} }
/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock() /* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
@ -5297,6 +5310,7 @@ BTF_ID(struct, cgroup)
BTF_ID(struct, bpf_cpumask) BTF_ID(struct, bpf_cpumask)
#endif #endif
BTF_ID(struct, task_struct) BTF_ID(struct, task_struct)
BTF_ID(struct, bpf_crypto_ctx)
BTF_SET_END(rcu_protected_types) BTF_SET_END(rcu_protected_types)
static bool rcu_protected_object(const struct btf *btf, u32 btf_id) static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
@ -6972,6 +6986,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
return err; return err;
} }
static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
bool allow_trust_mismatch);
static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn) static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
{ {
int load_reg; int load_reg;
@ -7032,7 +7049,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
is_pkt_reg(env, insn->dst_reg) || is_pkt_reg(env, insn->dst_reg) ||
is_flow_key_reg(env, insn->dst_reg) || is_flow_key_reg(env, insn->dst_reg) ||
is_sk_reg(env, insn->dst_reg) || is_sk_reg(env, insn->dst_reg) ||
is_arena_reg(env, insn->dst_reg)) { (is_arena_reg(env, insn->dst_reg) && !bpf_jit_supports_insn(insn, true))) {
verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n", verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
insn->dst_reg, insn->dst_reg,
reg_type_str(env, reg_state(env, insn->dst_reg)->type)); reg_type_str(env, reg_state(env, insn->dst_reg)->type));
@ -7068,6 +7085,11 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
if (err) if (err)
return err; return err;
if (is_arena_reg(env, insn->dst_reg)) {
err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
if (err)
return err;
}
/* Check whether we can write into the same memory. */ /* Check whether we can write into the same memory. */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
BPF_SIZE(insn->code), BPF_WRITE, -1, true, false); BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
@ -7590,6 +7612,23 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
return 0; return 0;
} }
static int process_wq_func(struct bpf_verifier_env *env, int regno,
struct bpf_kfunc_call_arg_meta *meta)
{
struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
struct bpf_map *map = reg->map_ptr;
u64 val = reg->var_off.value;
if (map->record->wq_off != val + reg->off) {
verbose(env, "off %lld doesn't point to 'struct bpf_wq' that is at %d\n",
val + reg->off, map->record->wq_off);
return -EINVAL;
}
meta->map.uid = reg->map_uid;
meta->map.ptr = map;
return 0;
}
static int process_kptr_func(struct bpf_verifier_env *env, int regno, static int process_kptr_func(struct bpf_verifier_env *env, int regno,
struct bpf_call_arg_meta *meta) struct bpf_call_arg_meta *meta)
{ {
@ -9484,7 +9523,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
*/ */
env->subprog_info[subprog].is_cb = true; env->subprog_info[subprog].is_cb = true;
if (bpf_pseudo_kfunc_call(insn) && if (bpf_pseudo_kfunc_call(insn) &&
!is_sync_callback_calling_kfunc(insn->imm)) { !is_callback_calling_kfunc(insn->imm)) {
verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n", verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
func_id_name(insn->imm), insn->imm); func_id_name(insn->imm), insn->imm);
return -EFAULT; return -EFAULT;
@ -9498,10 +9537,11 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
if (is_async_callback_calling_insn(insn)) { if (is_async_callback_calling_insn(insn)) {
struct bpf_verifier_state *async_cb; struct bpf_verifier_state *async_cb;
/* there is no real recursion here. timer callbacks are async */ /* there is no real recursion here. timer and workqueue callbacks are async */
env->subprog_info[subprog].is_async_cb = true; env->subprog_info[subprog].is_async_cb = true;
async_cb = push_async_cb(env, env->subprog_info[subprog].start, async_cb = push_async_cb(env, env->subprog_info[subprog].start,
insn_idx, subprog); insn_idx, subprog,
is_bpf_wq_set_callback_impl_kfunc(insn->imm));
if (!async_cb) if (!async_cb)
return -EFAULT; return -EFAULT;
callee = async_cb->frame[0]; callee = async_cb->frame[0];
@ -9561,6 +9601,13 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL; return -EINVAL;
} }
/* Only global subprogs cannot be called with preemption disabled. */
if (env->cur_state->active_preempt_lock) {
verbose(env, "global function calls are not allowed with preemption disabled,\n"
"use static function instead\n");
return -EINVAL;
}
if (err) { if (err) {
verbose(env, "Caller passes invalid args into func#%d ('%s')\n", verbose(env, "Caller passes invalid args into func#%d ('%s')\n",
subprog, sub_name); subprog, sub_name);
@ -9653,12 +9700,8 @@ static int set_map_elem_callback_state(struct bpf_verifier_env *env,
struct bpf_map *map; struct bpf_map *map;
int err; int err;
if (bpf_map_ptr_poisoned(insn_aux)) { /* valid map_ptr and poison value does not matter */
verbose(env, "tail_call abusing map_ptr\n"); map = insn_aux->map_ptr_state.map_ptr;
return -EINVAL;
}
map = BPF_MAP_PTR(insn_aux->map_ptr_state);
if (!map->ops->map_set_for_each_callback_args || if (!map->ops->map_set_for_each_callback_args ||
!map->ops->map_for_each_callback) { !map->ops->map_for_each_callback) {
verbose(env, "callback function not allowed for map\n"); verbose(env, "callback function not allowed for map\n");
@ -10017,12 +10060,12 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
return -EACCES; return -EACCES;
} }
if (!BPF_MAP_PTR(aux->map_ptr_state)) if (!aux->map_ptr_state.map_ptr)
bpf_map_ptr_store(aux, meta->map_ptr, bpf_map_ptr_store(aux, meta->map_ptr,
!meta->map_ptr->bypass_spec_v1); !meta->map_ptr->bypass_spec_v1, false);
else if (BPF_MAP_PTR(aux->map_ptr_state) != meta->map_ptr) else if (aux->map_ptr_state.map_ptr != meta->map_ptr)
bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON, bpf_map_ptr_store(aux, meta->map_ptr,
!meta->map_ptr->bypass_spec_v1); !meta->map_ptr->bypass_spec_v1, true);
return 0; return 0;
} }
@ -10201,8 +10244,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (env->ops->get_func_proto) if (env->ops->get_func_proto)
fn = env->ops->get_func_proto(func_id, env->prog); fn = env->ops->get_func_proto(func_id, env->prog);
if (!fn) { if (!fn) {
verbose(env, "unknown func %s#%d\n", func_id_name(func_id), verbose(env, "program of this type cannot use helper %s#%d\n",
func_id); func_id_name(func_id), func_id);
return -EINVAL; return -EINVAL;
} }
@ -10251,6 +10294,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
env->insn_aux_data[insn_idx].storage_get_func_atomic = true; env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
} }
if (env->cur_state->active_preempt_lock) {
if (fn->might_sleep) {
verbose(env, "sleepable helper %s#%d in non-preemptible region\n",
func_id_name(func_id), func_id);
return -EINVAL;
}
if (in_sleepable(env) && is_storage_get_function(func_id))
env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
}
meta.func_id = func_id; meta.func_id = func_id;
/* check args */ /* check args */
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) { for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@ -10839,6 +10893,7 @@ enum {
KF_ARG_LIST_NODE_ID, KF_ARG_LIST_NODE_ID,
KF_ARG_RB_ROOT_ID, KF_ARG_RB_ROOT_ID,
KF_ARG_RB_NODE_ID, KF_ARG_RB_NODE_ID,
KF_ARG_WORKQUEUE_ID,
}; };
BTF_ID_LIST(kf_arg_btf_ids) BTF_ID_LIST(kf_arg_btf_ids)
@ -10847,6 +10902,7 @@ BTF_ID(struct, bpf_list_head)
BTF_ID(struct, bpf_list_node) BTF_ID(struct, bpf_list_node)
BTF_ID(struct, bpf_rb_root) BTF_ID(struct, bpf_rb_root)
BTF_ID(struct, bpf_rb_node) BTF_ID(struct, bpf_rb_node)
BTF_ID(struct, bpf_wq)
static bool __is_kfunc_ptr_arg_type(const struct btf *btf, static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
const struct btf_param *arg, int type) const struct btf_param *arg, int type)
@ -10890,6 +10946,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID); return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
} }
static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg)
{
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID);
}
static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf, static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
const struct btf_param *arg) const struct btf_param *arg)
{ {
@ -10959,6 +11020,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_NULL, KF_ARG_PTR_TO_NULL,
KF_ARG_PTR_TO_CONST_STR, KF_ARG_PTR_TO_CONST_STR,
KF_ARG_PTR_TO_MAP, KF_ARG_PTR_TO_MAP,
KF_ARG_PTR_TO_WORKQUEUE,
}; };
enum special_kfunc_type { enum special_kfunc_type {
@ -10984,6 +11046,9 @@ enum special_kfunc_type {
KF_bpf_percpu_obj_new_impl, KF_bpf_percpu_obj_new_impl,
KF_bpf_percpu_obj_drop_impl, KF_bpf_percpu_obj_drop_impl,
KF_bpf_throw, KF_bpf_throw,
KF_bpf_wq_set_callback_impl,
KF_bpf_preempt_disable,
KF_bpf_preempt_enable,
KF_bpf_iter_css_task_new, KF_bpf_iter_css_task_new,
}; };
@ -11008,6 +11073,7 @@ BTF_ID(func, bpf_dynptr_clone)
BTF_ID(func, bpf_percpu_obj_new_impl) BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl) BTF_ID(func, bpf_percpu_obj_drop_impl)
BTF_ID(func, bpf_throw) BTF_ID(func, bpf_throw)
BTF_ID(func, bpf_wq_set_callback_impl)
#ifdef CONFIG_CGROUPS #ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new) BTF_ID(func, bpf_iter_css_task_new)
#endif #endif
@ -11036,6 +11102,9 @@ BTF_ID(func, bpf_dynptr_clone)
BTF_ID(func, bpf_percpu_obj_new_impl) BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl) BTF_ID(func, bpf_percpu_obj_drop_impl)
BTF_ID(func, bpf_throw) BTF_ID(func, bpf_throw)
BTF_ID(func, bpf_wq_set_callback_impl)
BTF_ID(func, bpf_preempt_disable)
BTF_ID(func, bpf_preempt_enable)
#ifdef CONFIG_CGROUPS #ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new) BTF_ID(func, bpf_iter_css_task_new)
#else #else
@ -11062,6 +11131,16 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock]; return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
} }
static bool is_kfunc_bpf_preempt_disable(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->func_id == special_kfunc_list[KF_bpf_preempt_disable];
}
static bool is_kfunc_bpf_preempt_enable(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->func_id == special_kfunc_list[KF_bpf_preempt_enable];
}
static enum kfunc_ptr_arg_type static enum kfunc_ptr_arg_type
get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
struct bpf_kfunc_call_arg_meta *meta, struct bpf_kfunc_call_arg_meta *meta,
@ -11115,6 +11194,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_map(meta->btf, &args[argno])) if (is_kfunc_arg_map(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_MAP; return KF_ARG_PTR_TO_MAP;
if (is_kfunc_arg_wq(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_WORKQUEUE;
if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
if (!btf_type_is_struct(ref_t)) { if (!btf_type_is_struct(ref_t)) {
verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@ -11366,12 +11448,28 @@ static bool is_sync_callback_calling_kfunc(u32 btf_id)
return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl]; return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl];
} }
static bool is_async_callback_calling_kfunc(u32 btf_id)
{
return btf_id == special_kfunc_list[KF_bpf_wq_set_callback_impl];
}
static bool is_bpf_throw_kfunc(struct bpf_insn *insn) static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
{ {
return bpf_pseudo_kfunc_call(insn) && insn->off == 0 && return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
insn->imm == special_kfunc_list[KF_bpf_throw]; insn->imm == special_kfunc_list[KF_bpf_throw];
} }
static bool is_bpf_wq_set_callback_impl_kfunc(u32 btf_id)
{
return btf_id == special_kfunc_list[KF_bpf_wq_set_callback_impl];
}
static bool is_callback_calling_kfunc(u32 btf_id)
{
return is_sync_callback_calling_kfunc(btf_id) ||
is_async_callback_calling_kfunc(btf_id);
}
static bool is_rbtree_lock_required_kfunc(u32 btf_id) static bool is_rbtree_lock_required_kfunc(u32 btf_id)
{ {
return is_bpf_rbtree_api_kfunc(btf_id); return is_bpf_rbtree_api_kfunc(btf_id);
@ -11716,6 +11814,34 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_NULL: case KF_ARG_PTR_TO_NULL:
continue; continue;
case KF_ARG_PTR_TO_MAP: case KF_ARG_PTR_TO_MAP:
if (!reg->map_ptr) {
verbose(env, "pointer in R%d isn't map pointer\n", regno);
return -EINVAL;
}
if (meta->map.ptr && reg->map_ptr->record->wq_off >= 0) {
/* Use map_uid (which is unique id of inner map) to reject:
* inner_map1 = bpf_map_lookup_elem(outer_map, key1)
* inner_map2 = bpf_map_lookup_elem(outer_map, key2)
* if (inner_map1 && inner_map2) {
* wq = bpf_map_lookup_elem(inner_map1);
* if (wq)
* // mismatch would have been allowed
* bpf_wq_init(wq, inner_map2);
* }
*
* Comparing map_ptr is enough to distinguish normal and outer maps.
*/
if (meta->map.ptr != reg->map_ptr ||
meta->map.uid != reg->map_uid) {
verbose(env,
"workqueue pointer in R1 map_uid=%d doesn't match map pointer in R2 map_uid=%d\n",
meta->map.uid, reg->map_uid);
return -EINVAL;
}
}
meta->map.ptr = reg->map_ptr;
meta->map.uid = reg->map_uid;
fallthrough;
case KF_ARG_PTR_TO_ALLOC_BTF_ID: case KF_ARG_PTR_TO_ALLOC_BTF_ID:
case KF_ARG_PTR_TO_BTF_ID: case KF_ARG_PTR_TO_BTF_ID:
if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta)) if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
@ -11748,6 +11874,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_CALLBACK: case KF_ARG_PTR_TO_CALLBACK:
case KF_ARG_PTR_TO_REFCOUNTED_KPTR: case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
case KF_ARG_PTR_TO_CONST_STR: case KF_ARG_PTR_TO_CONST_STR:
case KF_ARG_PTR_TO_WORKQUEUE:
/* Trusted by default */ /* Trusted by default */
break; break;
default: default:
@ -12034,6 +12161,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (ret) if (ret)
return ret; return ret;
break; break;
case KF_ARG_PTR_TO_WORKQUEUE:
if (reg->type != PTR_TO_MAP_VALUE) {
verbose(env, "arg#%d doesn't point to a map value\n", i);
return -EINVAL;
}
ret = process_wq_func(env, regno, meta);
if (ret < 0)
return ret;
break;
} }
} }
@ -12093,11 +12229,11 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p) int *insn_idx_p)
{ {
const struct btf_type *t, *ptr_type; bool sleepable, rcu_lock, rcu_unlock, preempt_disable, preempt_enable;
u32 i, nargs, ptr_type_id, release_ref_obj_id; u32 i, nargs, ptr_type_id, release_ref_obj_id;
struct bpf_reg_state *regs = cur_regs(env); struct bpf_reg_state *regs = cur_regs(env);
const char *func_name, *ptr_type_name; const char *func_name, *ptr_type_name;
bool sleepable, rcu_lock, rcu_unlock; const struct btf_type *t, *ptr_type;
struct bpf_kfunc_call_arg_meta meta; struct bpf_kfunc_call_arg_meta meta;
struct bpf_insn_aux_data *insn_aux; struct bpf_insn_aux_data *insn_aux;
int err, insn_idx = *insn_idx_p; int err, insn_idx = *insn_idx_p;
@ -12145,9 +12281,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
} }
} }
if (is_bpf_wq_set_callback_impl_kfunc(meta.func_id)) {
err = push_callback_call(env, insn, insn_idx, meta.subprogno,
set_timer_callback_state);
if (err) {
verbose(env, "kfunc %s#%d failed callback verification\n",
func_name, meta.func_id);
return err;
}
}
rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta); rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta); rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
preempt_disable = is_kfunc_bpf_preempt_disable(&meta);
preempt_enable = is_kfunc_bpf_preempt_enable(&meta);
if (env->cur_state->active_rcu_lock) { if (env->cur_state->active_rcu_lock) {
struct bpf_func_state *state; struct bpf_func_state *state;
struct bpf_reg_state *reg; struct bpf_reg_state *reg;
@ -12180,6 +12329,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL; return -EINVAL;
} }
if (env->cur_state->active_preempt_lock) {
if (preempt_disable) {
env->cur_state->active_preempt_lock++;
} else if (preempt_enable) {
env->cur_state->active_preempt_lock--;
} else if (sleepable) {
verbose(env, "kernel func %s is sleepable within non-preemptible region\n", func_name);
return -EACCES;
}
} else if (preempt_disable) {
env->cur_state->active_preempt_lock++;
} else if (preempt_enable) {
verbose(env, "unmatched attempt to enable preemption (kernel function %s)\n", func_name);
return -EINVAL;
}
/* In case of release function, we get register number of refcounted /* In case of release function, we get register number of refcounted
* PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
*/ */
@ -13318,7 +13483,6 @@ static void scalar32_min_max_and(struct bpf_reg_state *dst_reg,
bool src_known = tnum_subreg_is_const(src_reg->var_off); bool src_known = tnum_subreg_is_const(src_reg->var_off);
bool dst_known = tnum_subreg_is_const(dst_reg->var_off); bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
struct tnum var32_off = tnum_subreg(dst_reg->var_off); struct tnum var32_off = tnum_subreg(dst_reg->var_off);
s32 smin_val = src_reg->s32_min_value;
u32 umax_val = src_reg->u32_max_value; u32 umax_val = src_reg->u32_max_value;
if (src_known && dst_known) { if (src_known && dst_known) {
@ -13331,18 +13495,16 @@ static void scalar32_min_max_and(struct bpf_reg_state *dst_reg,
*/ */
dst_reg->u32_min_value = var32_off.value; dst_reg->u32_min_value = var32_off.value;
dst_reg->u32_max_value = min(dst_reg->u32_max_value, umax_val); dst_reg->u32_max_value = min(dst_reg->u32_max_value, umax_val);
if (dst_reg->s32_min_value < 0 || smin_val < 0) {
/* Lose signed bounds when ANDing negative numbers, /* Safe to set s32 bounds by casting u32 result into s32 when u32
* ain't nobody got time for that. * doesn't cross sign boundary. Otherwise set s32 bounds to unbounded.
*/ */
dst_reg->s32_min_value = S32_MIN; if ((s32)dst_reg->u32_min_value <= (s32)dst_reg->u32_max_value) {
dst_reg->s32_max_value = S32_MAX;
} else {
/* ANDing two positives gives a positive, so safe to
* cast result into s64.
*/
dst_reg->s32_min_value = dst_reg->u32_min_value; dst_reg->s32_min_value = dst_reg->u32_min_value;
dst_reg->s32_max_value = dst_reg->u32_max_value; dst_reg->s32_max_value = dst_reg->u32_max_value;
} else {
dst_reg->s32_min_value = S32_MIN;
dst_reg->s32_max_value = S32_MAX;
} }
} }
@ -13351,7 +13513,6 @@ static void scalar_min_max_and(struct bpf_reg_state *dst_reg,
{ {
bool src_known = tnum_is_const(src_reg->var_off); bool src_known = tnum_is_const(src_reg->var_off);
bool dst_known = tnum_is_const(dst_reg->var_off); bool dst_known = tnum_is_const(dst_reg->var_off);
s64 smin_val = src_reg->smin_value;
u64 umax_val = src_reg->umax_value; u64 umax_val = src_reg->umax_value;
if (src_known && dst_known) { if (src_known && dst_known) {
@ -13364,18 +13525,16 @@ static void scalar_min_max_and(struct bpf_reg_state *dst_reg,
*/ */
dst_reg->umin_value = dst_reg->var_off.value; dst_reg->umin_value = dst_reg->var_off.value;
dst_reg->umax_value = min(dst_reg->umax_value, umax_val); dst_reg->umax_value = min(dst_reg->umax_value, umax_val);
if (dst_reg->smin_value < 0 || smin_val < 0) {
/* Lose signed bounds when ANDing negative numbers, /* Safe to set s64 bounds by casting u64 result into s64 when u64
* ain't nobody got time for that. * doesn't cross sign boundary. Otherwise set s64 bounds to unbounded.
*/ */
dst_reg->smin_value = S64_MIN; if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
dst_reg->smax_value = S64_MAX;
} else {
/* ANDing two positives gives a positive, so safe to
* cast result into s64.
*/
dst_reg->smin_value = dst_reg->umin_value; dst_reg->smin_value = dst_reg->umin_value;
dst_reg->smax_value = dst_reg->umax_value; dst_reg->smax_value = dst_reg->umax_value;
} else {
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
} }
/* We may learn something more from the var_off */ /* We may learn something more from the var_off */
__update_reg_bounds(dst_reg); __update_reg_bounds(dst_reg);
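The old logic simply gave up on signed bounds whenever either operand might be negative; the new logic derives them from the freshly computed unsigned bounds whenever the unsigned interval stays on one side of the sign boundary, i.e. (s64)umin <= (s64)umax after the cast. A small standalone check of that condition (a userspace sketch, not verifier code):

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		/* [1, 0x7fffffffffffffff] does not cross the sign boundary,
		 * so casting the unsigned bounds yields valid signed bounds. */
		uint64_t umin = 1, umax = 0x7fffffffffffffffULL;
		assert((int64_t)umin <= (int64_t)umax);

		/* [0x7ffffffffffffff0, 0x8000000000000010] wraps past the
		 * boundary: the casts give smin > smax, so the verifier keeps
		 * the conservative [S64_MIN, S64_MAX]. */
		umin = 0x7ffffffffffffff0ULL;
		umax = 0x8000000000000010ULL;
		assert((int64_t)umin > (int64_t)umax);
		return 0;
	}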
@ -13387,7 +13546,6 @@ static void scalar32_min_max_or(struct bpf_reg_state *dst_reg,
bool src_known = tnum_subreg_is_const(src_reg->var_off); bool src_known = tnum_subreg_is_const(src_reg->var_off);
bool dst_known = tnum_subreg_is_const(dst_reg->var_off); bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
struct tnum var32_off = tnum_subreg(dst_reg->var_off); struct tnum var32_off = tnum_subreg(dst_reg->var_off);
s32 smin_val = src_reg->s32_min_value;
u32 umin_val = src_reg->u32_min_value; u32 umin_val = src_reg->u32_min_value;
if (src_known && dst_known) { if (src_known && dst_known) {
@ -13400,18 +13558,16 @@ static void scalar32_min_max_or(struct bpf_reg_state *dst_reg,
*/ */
dst_reg->u32_min_value = max(dst_reg->u32_min_value, umin_val); dst_reg->u32_min_value = max(dst_reg->u32_min_value, umin_val);
dst_reg->u32_max_value = var32_off.value | var32_off.mask; dst_reg->u32_max_value = var32_off.value | var32_off.mask;
if (dst_reg->s32_min_value < 0 || smin_val < 0) {
/* Lose signed bounds when ORing negative numbers, /* Safe to set s32 bounds by casting u32 result into s32 when u32
* ain't nobody got time for that. * doesn't cross sign boundary. Otherwise set s32 bounds to unbounded.
*/ */
dst_reg->s32_min_value = S32_MIN; if ((s32)dst_reg->u32_min_value <= (s32)dst_reg->u32_max_value) {
dst_reg->s32_max_value = S32_MAX;
} else {
/* ORing two positives gives a positive, so safe to
* cast result into s64.
*/
dst_reg->s32_min_value = dst_reg->u32_min_value; dst_reg->s32_min_value = dst_reg->u32_min_value;
dst_reg->s32_max_value = dst_reg->u32_max_value; dst_reg->s32_max_value = dst_reg->u32_max_value;
} else {
dst_reg->s32_min_value = S32_MIN;
dst_reg->s32_max_value = S32_MAX;
} }
} }
@ -13420,7 +13576,6 @@ static void scalar_min_max_or(struct bpf_reg_state *dst_reg,
{ {
bool src_known = tnum_is_const(src_reg->var_off); bool src_known = tnum_is_const(src_reg->var_off);
bool dst_known = tnum_is_const(dst_reg->var_off); bool dst_known = tnum_is_const(dst_reg->var_off);
s64 smin_val = src_reg->smin_value;
u64 umin_val = src_reg->umin_value; u64 umin_val = src_reg->umin_value;
if (src_known && dst_known) { if (src_known && dst_known) {
@ -13433,18 +13588,16 @@ static void scalar_min_max_or(struct bpf_reg_state *dst_reg,
*/ */
dst_reg->umin_value = max(dst_reg->umin_value, umin_val); dst_reg->umin_value = max(dst_reg->umin_value, umin_val);
dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask; dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask;
if (dst_reg->smin_value < 0 || smin_val < 0) {
/* Lose signed bounds when ORing negative numbers, /* Safe to set s64 bounds by casting u64 result into s64 when u64
* ain't nobody got time for that. * doesn't cross sign boundary. Otherwise set s64 bounds to unbounded.
*/ */
dst_reg->smin_value = S64_MIN; if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
dst_reg->smax_value = S64_MAX;
} else {
/* ORing two positives gives a positive, so safe to
* cast result into s64.
*/
dst_reg->smin_value = dst_reg->umin_value; dst_reg->smin_value = dst_reg->umin_value;
dst_reg->smax_value = dst_reg->umax_value; dst_reg->smax_value = dst_reg->umax_value;
} else {
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
} }
/* We may learn something more from the var_off */ /* We may learn something more from the var_off */
__update_reg_bounds(dst_reg); __update_reg_bounds(dst_reg);
@ -13456,7 +13609,6 @@ static void scalar32_min_max_xor(struct bpf_reg_state *dst_reg,
bool src_known = tnum_subreg_is_const(src_reg->var_off); bool src_known = tnum_subreg_is_const(src_reg->var_off);
bool dst_known = tnum_subreg_is_const(dst_reg->var_off); bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
struct tnum var32_off = tnum_subreg(dst_reg->var_off); struct tnum var32_off = tnum_subreg(dst_reg->var_off);
s32 smin_val = src_reg->s32_min_value;
if (src_known && dst_known) { if (src_known && dst_known) {
__mark_reg32_known(dst_reg, var32_off.value); __mark_reg32_known(dst_reg, var32_off.value);
@ -13467,10 +13619,10 @@ static void scalar32_min_max_xor(struct bpf_reg_state *dst_reg,
dst_reg->u32_min_value = var32_off.value; dst_reg->u32_min_value = var32_off.value;
dst_reg->u32_max_value = var32_off.value | var32_off.mask; dst_reg->u32_max_value = var32_off.value | var32_off.mask;
if (dst_reg->s32_min_value >= 0 && smin_val >= 0) { /* Safe to set s32 bounds by casting u32 result into s32 when u32
/* XORing two positive sign numbers gives a positive, * doesn't cross sign boundary. Otherwise set s32 bounds to unbounded.
* so safe to cast u32 result into s32. */
*/ if ((s32)dst_reg->u32_min_value <= (s32)dst_reg->u32_max_value) {
dst_reg->s32_min_value = dst_reg->u32_min_value; dst_reg->s32_min_value = dst_reg->u32_min_value;
dst_reg->s32_max_value = dst_reg->u32_max_value; dst_reg->s32_max_value = dst_reg->u32_max_value;
} else { } else {
@ -13484,7 +13636,6 @@ static void scalar_min_max_xor(struct bpf_reg_state *dst_reg,
{ {
bool src_known = tnum_is_const(src_reg->var_off); bool src_known = tnum_is_const(src_reg->var_off);
bool dst_known = tnum_is_const(dst_reg->var_off); bool dst_known = tnum_is_const(dst_reg->var_off);
s64 smin_val = src_reg->smin_value;
if (src_known && dst_known) { if (src_known && dst_known) {
/* dst_reg->var_off.value has been updated earlier */ /* dst_reg->var_off.value has been updated earlier */
@ -13496,10 +13647,10 @@ static void scalar_min_max_xor(struct bpf_reg_state *dst_reg,
dst_reg->umin_value = dst_reg->var_off.value; dst_reg->umin_value = dst_reg->var_off.value;
dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask; dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask;
if (dst_reg->smin_value >= 0 && smin_val >= 0) { /* Safe to set s64 bounds by casting u64 result into s64 when u64
/* XORing two positive sign numbers gives a positive, * doesn't cross sign boundary. Otherwise set s64 bounds to unbounded.
* so safe to cast u64 result into s64. */
*/ if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) {
dst_reg->smin_value = dst_reg->umin_value; dst_reg->smin_value = dst_reg->umin_value;
dst_reg->smax_value = dst_reg->umax_value; dst_reg->smax_value = dst_reg->umax_value;
} else { } else {
@ -14726,7 +14877,7 @@ static void regs_refine_cond_op(struct bpf_reg_state *reg1, struct bpf_reg_state
/* Adjusts the register min/max values in the case that the dst_reg and /* Adjusts the register min/max values in the case that the dst_reg and
* src_reg are both SCALAR_VALUE registers (or we are simply doing a BPF_K * src_reg are both SCALAR_VALUE registers (or we are simply doing a BPF_K
* check, in which case we havea fake SCALAR_VALUE representing insn->imm). * check, in which case we have a fake SCALAR_VALUE representing insn->imm).
* Technically we can do similar adjustments for pointers to the same object, * Technically we can do similar adjustments for pointers to the same object,
* but we don't support that right now. * but we don't support that right now.
*/ */
@ -15341,6 +15492,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
return -EINVAL; return -EINVAL;
} }
if (env->cur_state->active_preempt_lock) {
verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_preempt_disable-ed region\n");
return -EINVAL;
}
if (regs[ctx_reg].type != PTR_TO_CTX) { if (regs[ctx_reg].type != PTR_TO_CTX) {
verbose(env, verbose(env,
"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n"); "at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
@ -16908,6 +17064,12 @@ static bool states_equal(struct bpf_verifier_env *env,
if (old->active_rcu_lock != cur->active_rcu_lock) if (old->active_rcu_lock != cur->active_rcu_lock)
return false; return false;
if (old->active_preempt_lock != cur->active_preempt_lock)
return false;
if (old->in_sleepable != cur->in_sleepable)
return false;
/* for states to be equal callsites have to be the same /* for states to be equal callsites have to be the same
* and all frame states need to be equivalent * and all frame states need to be equivalent
*/ */
@ -17364,7 +17526,7 @@ hit:
err = propagate_liveness(env, &sl->state, cur); err = propagate_liveness(env, &sl->state, cur);
/* if previous state reached the exit with precision and /* if previous state reached the exit with precision and
* current state is equivalent to it (except precsion marks) * current state is equivalent to it (except precision marks)
* the precision needs to be propagated back in * the precision needs to be propagated back in
* the current state. * the current state.
*/ */
@ -17542,7 +17704,7 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
} }
static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type, static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
bool allow_trust_missmatch) bool allow_trust_mismatch)
{ {
enum bpf_reg_type *prev_type = &env->insn_aux_data[env->insn_idx].ptr_type; enum bpf_reg_type *prev_type = &env->insn_aux_data[env->insn_idx].ptr_type;
@ -17560,7 +17722,7 @@ static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type typ
* src_reg == stack|map in some other branch. * src_reg == stack|map in some other branch.
* Reject it. * Reject it.
*/ */
if (allow_trust_missmatch && if (allow_trust_mismatch &&
base_type(type) == PTR_TO_BTF_ID && base_type(type) == PTR_TO_BTF_ID &&
base_type(*prev_type) == PTR_TO_BTF_ID) { base_type(*prev_type) == PTR_TO_BTF_ID) {
/* /*
@ -17856,6 +18018,13 @@ process_bpf_exit_full:
return -EINVAL; return -EINVAL;
} }
if (env->cur_state->active_preempt_lock && !env->cur_state->curframe) {
verbose(env, "%d bpf_preempt_enable%s missing\n",
env->cur_state->active_preempt_lock,
env->cur_state->active_preempt_lock == 1 ? " is" : "(s) are");
return -EINVAL;
}
/* We must do check_reference_leak here before /* We must do check_reference_leak here before
* prepare_func_exit to handle the case when * prepare_func_exit to handle the case when
* state->curframe > 0, it may be a callback * state->curframe > 0, it may be a callback
@ -18153,6 +18322,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
} }
} }
if (btf_record_has_field(map->record, BPF_WORKQUEUE)) {
if (is_tracing_prog_type(prog_type)) {
verbose(env, "tracing progs cannot use bpf_wq yet\n");
return -EINVAL;
}
}
if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) && if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) &&
!bpf_offload_prog_map_match(prog, map)) { !bpf_offload_prog_map_match(prog, map)) {
verbose(env, "offload device mismatch between prog and map\n"); verbose(env, "offload device mismatch between prog and map\n");
@ -18348,6 +18524,8 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
} }
if (env->used_map_cnt >= MAX_USED_MAPS) { if (env->used_map_cnt >= MAX_USED_MAPS) {
verbose(env, "The total number of maps per program has reached the limit of %u\n",
MAX_USED_MAPS);
fdput(f); fdput(f);
return -E2BIG; return -E2BIG;
} }
@ -18962,6 +19140,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
insn->code == (BPF_ST | BPF_MEM | BPF_W) || insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
insn->code == (BPF_ST | BPF_MEM | BPF_DW)) { insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
type = BPF_WRITE; type = BPF_WRITE;
} else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) ||
insn->code == (BPF_STX | BPF_ATOMIC | BPF_DW)) &&
env->insn_aux_data[i + delta].ptr_type == PTR_TO_ARENA) {
insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
env->prog->aux->num_exentries++;
continue;
} else { } else {
continue; continue;
} }
@ -19148,12 +19332,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
env->insn_aux_data[i].call_imm = insn->imm; env->insn_aux_data[i].call_imm = insn->imm;
/* point imm to __bpf_call_base+1 from JITs point of view */ /* point imm to __bpf_call_base+1 from JITs point of view */
insn->imm = 1; insn->imm = 1;
if (bpf_pseudo_func(insn)) if (bpf_pseudo_func(insn)) {
#if defined(MODULES_VADDR)
u64 addr = MODULES_VADDR;
#else
u64 addr = VMALLOC_START;
#endif
/* jit (e.g. x86_64) may emit fewer instructions /* jit (e.g. x86_64) may emit fewer instructions
* if it learns a u32 imm is the same as a u64 imm. * if it learns a u32 imm is the same as a u64 imm.
* Force a non zero here. * Set close enough to possible prog address.
*/ */
insn[1].imm = 1; insn[0].imm = (u32)addr;
insn[1].imm = addr >> 32;
}
} }
err = bpf_prog_alloc_jited_linfo(prog); err = bpf_prog_alloc_jited_linfo(prog);
@ -19226,6 +19417,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
BPF_CLASS(insn->code) == BPF_ST) && BPF_CLASS(insn->code) == BPF_ST) &&
BPF_MODE(insn->code) == BPF_PROBE_MEM32) BPF_MODE(insn->code) == BPF_PROBE_MEM32)
num_exentries++; num_exentries++;
if (BPF_CLASS(insn->code) == BPF_STX &&
BPF_MODE(insn->code) == BPF_PROBE_ATOMIC)
num_exentries++;
} }
func[i]->aux->num_exentries = num_exentries; func[i]->aux->num_exentries = num_exentries;
func[i]->aux->tail_call_reachable = env->subprog_info[i].tail_call_reachable; func[i]->aux->tail_call_reachable = env->subprog_info[i].tail_call_reachable;
@ -19557,6 +19751,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) { desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1); insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
*cnt = 1; *cnt = 1;
} else if (is_bpf_wq_set_callback_impl_kfunc(desc->func_id)) {
struct bpf_insn ld_addrs[2] = { BPF_LD_IMM64(BPF_REG_4, (long)env->prog->aux) };
insn_buf[0] = ld_addrs[0];
insn_buf[1] = ld_addrs[1];
insn_buf[2] = *insn;
*cnt = 3;
} }
return 0; return 0;
} }
@ -19832,7 +20033,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
!bpf_map_ptr_unpriv(aux)) { !bpf_map_ptr_unpriv(aux)) {
struct bpf_jit_poke_descriptor desc = { struct bpf_jit_poke_descriptor desc = {
.reason = BPF_POKE_REASON_TAIL_CALL, .reason = BPF_POKE_REASON_TAIL_CALL,
.tail_call.map = BPF_MAP_PTR(aux->map_ptr_state), .tail_call.map = aux->map_ptr_state.map_ptr,
.tail_call.key = bpf_map_key_immediate(aux), .tail_call.key = bpf_map_key_immediate(aux),
.insn_idx = i + delta, .insn_idx = i + delta,
}; };
@ -19861,7 +20062,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
return -EINVAL; return -EINVAL;
} }
map_ptr = BPF_MAP_PTR(aux->map_ptr_state); map_ptr = aux->map_ptr_state.map_ptr;
insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3, insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3,
map_ptr->max_entries, 2); map_ptr->max_entries, 2);
insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3, insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3,
@ -19969,7 +20170,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
if (bpf_map_ptr_poisoned(aux)) if (bpf_map_ptr_poisoned(aux))
goto patch_call_imm; goto patch_call_imm;
map_ptr = BPF_MAP_PTR(aux->map_ptr_state); map_ptr = aux->map_ptr_state.map_ptr;
ops = map_ptr->ops; ops = map_ptr->ops;
if (insn->imm == BPF_FUNC_map_lookup_elem && if (insn->imm == BPF_FUNC_map_lookup_elem &&
ops->map_gen_lookup) { ops->map_gen_lookup) {
@@ -20075,6 +20276,30 @@ patch_map_ops_generic:
 			goto next_insn;
 		}

+#ifdef CONFIG_X86_64
+		/* Implement bpf_get_smp_processor_id() inline. */
+		if (insn->imm == BPF_FUNC_get_smp_processor_id &&
+		    prog->jit_requested && bpf_jit_supports_percpu_insn()) {
+			/* BPF_FUNC_get_smp_processor_id inlining is an
+			 * optimization, so if pcpu_hot.cpu_number is ever
+			 * changed in some incompatible and hard to support
+			 * way, it's fine to back out this inlining logic
+			 */
+			insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0, (u32)(unsigned long)&pcpu_hot.cpu_number);
+			insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
+			insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);
+			cnt = 3;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto next_insn;
+		}
+#endif
 		/* Implement bpf_get_func_arg inline. */
 		if (prog_type == BPF_PROG_TYPE_TRACING &&
 		    insn->imm == BPF_FUNC_get_func_arg) {
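For a sense of what the inlining above buys: any program that calls bpf_get_smp_processor_id() in a hot path, like the ordinary BPF C sketch below, now gets the three-instruction per-CPU load instead of a helper call on x86-64 JITs that support the new per-CPU instruction. Nothing in the sketch is new API; it only exercises the existing helper:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  /* one counter slot per CPU, indexed with the (now inlined) helper */
  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 256);
          __type(key, __u32);
          __type(value, __u64);
  } per_cpu_hits SEC(".maps");

  SEC("xdp")
  int count_per_cpu(struct xdp_md *ctx)
  {
          __u32 cpu = bpf_get_smp_processor_id();
          __u64 *cnt = bpf_map_lookup_elem(&per_cpu_hits, &cpu);

          if (cnt)
                  __sync_fetch_and_add(cnt, 1);
          return XDP_PASS;
  }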
@@ -20158,6 +20383,62 @@ patch_map_ops_generic:
 			goto next_insn;
 		}

+		/* Implement bpf_get_branch_snapshot inline. */
+		if (IS_ENABLED(CONFIG_PERF_EVENTS) &&
+		    prog->jit_requested && BITS_PER_LONG == 64 &&
+		    insn->imm == BPF_FUNC_get_branch_snapshot) {
+			/* We are dealing with the following func protos:
+			 * u64 bpf_get_branch_snapshot(void *buf, u32 size, u64 flags);
+			 * int perf_snapshot_branch_stack(struct perf_branch_entry *entries, u32 cnt);
+			 */
+			const u32 br_entry_size = sizeof(struct perf_branch_entry);
+
+			/* struct perf_branch_entry is part of UAPI and is
+			 * used as an array element, so extremely unlikely to
+			 * ever grow or shrink
+			 */
+			BUILD_BUG_ON(br_entry_size != 24);
+
+			/* if (unlikely(flags)) return -EINVAL */
+			insn_buf[0] = BPF_JMP_IMM(BPF_JNE, BPF_REG_3, 0, 7);
+			/* Transform size (bytes) into number of entries (cnt = size / 24).
+			 * But to avoid expensive division instruction, we implement
+			 * divide-by-3 through multiplication, followed by further
+			 * division by 8 through 3-bit right shift.
+			 * Refer to book "Hacker's Delight, 2nd ed." by Henry S. Warren, Jr.,
+			 * p. 227, chapter "Unsigned Division by 3" for details and proofs.
+			 *
+			 * N / 3 <=> M * N / 2^33, where M = (2^33 + 1) / 3 = 0xaaaaaaab.
+			 */
+			insn_buf[1] = BPF_MOV32_IMM(BPF_REG_0, 0xaaaaaaab);
+			insn_buf[2] = BPF_ALU64_REG(BPF_MUL, BPF_REG_2, BPF_REG_0);
+			insn_buf[3] = BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 36);
+			/* call perf_snapshot_branch_stack implementation */
+			insn_buf[4] = BPF_EMIT_CALL(static_call_query(perf_snapshot_branch_stack));
+			/* if (entry_cnt == 0) return -ENOENT */
+			insn_buf[5] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4);
+			/* return entry_cnt * sizeof(struct perf_branch_entry) */
+			insn_buf[6] = BPF_ALU32_IMM(BPF_MUL, BPF_REG_0, br_entry_size);
+			insn_buf[7] = BPF_JMP_A(3);
+			/* return -EINVAL; */
+			insn_buf[8] = BPF_MOV64_IMM(BPF_REG_0, -EINVAL);
+			insn_buf[9] = BPF_JMP_A(1);
+			/* return -ENOENT; */
+			insn_buf[10] = BPF_MOV64_IMM(BPF_REG_0, -ENOENT);
+			cnt = 11;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			continue;
+		}
+
 		/* Implement bpf_kptr_xchg inline */
 		if (prog->jit_requested && BITS_PER_LONG == 64 &&
 		    insn->imm == BPF_FUNC_kptr_xchg &&
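As a quick, standalone sanity check of the multiply-and-shift trick used above (not part of the patch): for any u32 value n, n / 3 equals (n * 0xaaaaaaab) >> 33 when the product is computed in 64 bits, because 0xaaaaaaab = (2^33 + 1) / 3, and dividing the result by 8 via a further 3-bit shift gives n / 24. A tiny userspace check:

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
          /* Spot-check n / 24 == (n * 0xaaaaaaab) >> 36 for a few u32 values,
           * including the extremes, mirroring the inlined instruction sequence.
           */
          uint32_t samples[] = { 0, 23, 24, 25, 24 * 32, 4096, 0xffffffffu };

          for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
                  uint64_t n = samples[i];

                  assert(n / 3 == (n * 0xaaaaaaabULL) >> 33);
                  assert(n / 24 == (n * 0xaaaaaaabULL) >> 36);
          }
          return 0;
  }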


@@ -1188,9 +1188,6 @@ static const struct bpf_func_proto bpf_get_attach_cookie_proto_tracing = {

 BPF_CALL_3(bpf_get_branch_snapshot, void *, buf, u32, size, u64, flags)
 {
-#ifndef CONFIG_X86
-	return -ENOENT;
-#else
 	static const u32 br_entry_size = sizeof(struct perf_branch_entry);
 	u32 entry_cnt = size / br_entry_size;

@@ -1203,7 +1200,6 @@ BPF_CALL_3(bpf_get_branch_snapshot, void *, buf, u32, size, u64, flags)
 		return -ENOENT;

 	return entry_cnt * br_entry_size;
-#endif
 }

 static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {


@@ -13431,7 +13431,7 @@ static struct bpf_test tests[] = {
 		.stack_depth = 8,
 		.nr_testruns = NR_PATTERN_RUNS,
 	},
-	/* 64-bit atomic magnitudes */
+	/* 32-bit atomic magnitudes */
 	{
 		"ATOMIC_W_ADD: all operand magnitudes",
 		{ },


@@ -79,6 +79,51 @@ static int dummy_ops_call_op(void *image, struct bpf_dummy_ops_test_args *args)
 			    args->args[3], args->args[4]);
 }

+static const struct bpf_ctx_arg_aux *find_ctx_arg_info(struct bpf_prog_aux *aux, int offset)
+{
+	int i;
+
+	for (i = 0; i < aux->ctx_arg_info_size; i++)
+		if (aux->ctx_arg_info[i].offset == offset)
+			return &aux->ctx_arg_info[i];
+
+	return NULL;
+}
+
+/* There is only one check at the moment:
+ * - zero should not be passed for pointer parameters not marked as nullable.
+ */
+static int check_test_run_args(struct bpf_prog *prog, struct bpf_dummy_ops_test_args *args)
+{
+	const struct btf_type *func_proto = prog->aux->attach_func_proto;
+
+	for (u32 arg_no = 0; arg_no < btf_type_vlen(func_proto) ; ++arg_no) {
+		const struct btf_param *param = &btf_params(func_proto)[arg_no];
+		const struct bpf_ctx_arg_aux *info;
+		const struct btf_type *t;
+		int offset;
+
+		if (args->args[arg_no] != 0)
+			continue;
+
+		/* Program is validated already, so there is no need
+		 * to check if t is NULL.
+		 */
+		t = btf_type_skip_modifiers(bpf_dummy_ops_btf, param->type, NULL);
+		if (!btf_type_is_ptr(t))
+			continue;
+
+		offset = btf_ctx_arg_offset(bpf_dummy_ops_btf, func_proto, arg_no);
+		info = find_ctx_arg_info(prog->aux, offset);
+		if (info && (info->reg_type & PTR_MAYBE_NULL))
+			continue;
+
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 extern const struct bpf_link_ops bpf_struct_ops_link_lops;

 int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
@@ -87,7 +132,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
 	const struct bpf_struct_ops *st_ops = &bpf_bpf_dummy_ops;
 	const struct btf_type *func_proto;
 	struct bpf_dummy_ops_test_args *args;
-	struct bpf_tramp_links *tlinks;
+	struct bpf_tramp_links *tlinks = NULL;
 	struct bpf_tramp_link *link = NULL;
 	void *image = NULL;
 	unsigned int op_idx;
@@ -109,6 +154,10 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (IS_ERR(args))
 		return PTR_ERR(args);

+	err = check_test_run_args(prog, args);
+	if (err)
+		goto out;
+
 	tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL);
 	if (!tlinks) {
 		err = -ENOMEM;
@@ -232,7 +281,7 @@ static void bpf_dummy_unreg(void *kdata)
 {
 }

-static int bpf_dummy_test_1(struct bpf_dummy_ops_state *cb)
+static int bpf_dummy_ops__test_1(struct bpf_dummy_ops_state *cb__nullable)
 {
 	return 0;
 }
@@ -249,7 +298,7 @@ static int bpf_dummy_test_sleepable(struct bpf_dummy_ops_state *cb)
 }

 static struct bpf_dummy_ops __bpf_bpf_dummy_ops = {
-	.test_1 = bpf_dummy_test_1,
+	.test_1 = bpf_dummy_ops__test_1,
 	.test_2 = bpf_dummy_test_2,
 	.test_sleepable = bpf_dummy_test_sleepable,
 };
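The __nullable suffix on cb__nullable above is what marks the argument as PTR_MAYBE_NULL, and check_test_run_args() is what lets BPF_PROG_TEST_RUN pass a zero pointer for such arguments. On the BPF side, an implementation then has to test the pointer before dereferencing it. A minimal sketch, assuming a kernel that exposes the dummy struct_ops test type (struct bpf_dummy_ops and bpf_dummy_ops_state come from the kernel's own test code, the SEC names follow the usual struct_ops selftest pattern):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  SEC("struct_ops/test_1")
  int BPF_PROG(test_1, struct bpf_dummy_ops_state *state)
  {
          /* state may now legitimately be NULL; without this check the
           * verifier rejects the load below.
           */
          if (!state)
                  return 0;
          return state->val;
  }

  SEC(".struct_ops")
  struct bpf_dummy_ops dummy_1 = {
          .test_1 = (void *)test_1,
  };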


@@ -575,6 +575,13 @@ __bpf_kfunc int bpf_modify_return_test2(int a, int *b, short c, int d,
 	return a + *b + c + d + (long)e + f + g;
 }

+__bpf_kfunc int bpf_modify_return_test_tp(int nonce)
+{
+	trace_bpf_trigger_tp(nonce);
+
+	return nonce;
+}
+
 int noinline bpf_fentry_shadow_test(int a)
 {
 	return a + 1;
@@ -622,6 +629,7 @@ __bpf_kfunc_end_defs();

 BTF_KFUNCS_START(bpf_test_modify_return_ids)
 BTF_ID_FLAGS(func, bpf_modify_return_test)
 BTF_ID_FLAGS(func, bpf_modify_return_test2)
+BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
 BTF_ID_FLAGS(func, bpf_fentry_test1, KF_SLEEPABLE)
 BTF_KFUNCS_END(bpf_test_modify_return_ids)


@@ -87,6 +87,9 @@

 #include "dev.h"

+/* Keep the struct bpf_fib_lookup small so that it fits into a cacheline */
+static_assert(sizeof(struct bpf_fib_lookup) == 64, "struct bpf_fib_lookup size check");
+
 static const struct bpf_func_proto *
 bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);

@@ -5886,7 +5889,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
 	} else {
-		fl4.flowi4_mark = 0;
+		if (flags & BPF_FIB_LOOKUP_MARK)
+			fl4.flowi4_mark = params->mark;
+		else
+			fl4.flowi4_mark = 0;
 		fl4.flowi4_secid = 0;
 		fl4.flowi4_tun_key.tun_id = 0;
 		fl4.flowi4_uid = sock_net_uid(net, NULL);
@@ -6029,7 +6035,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		err = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, &res,
 						   strict);
 	} else {
-		fl6.flowi6_mark = 0;
+		if (flags & BPF_FIB_LOOKUP_MARK)
+			fl6.flowi6_mark = params->mark;
+		else
+			fl6.flowi6_mark = 0;
 		fl6.flowi6_secid = 0;
 		fl6.flowi6_tun_key.tun_id = 0;
 		fl6.flowi6_uid = sock_net_uid(net, NULL);
@@ -6107,7 +6116,7 @@ set_fwd_params:

 #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
 			     BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
-			     BPF_FIB_LOOKUP_SRC)
+			     BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK)

 BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
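With the new BPF_FIB_LOOKUP_MARK flag and the mark field in struct bpf_fib_lookup, a program can make the helper consult policy routing rules keyed on fwmark. A hedged XDP sketch: the helper signature and flags are the existing UAPI, but the packet-parsing details are elided and the mark value (42) is only illustrative:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("xdp")
  int fib_lookup_with_mark(struct xdp_md *ctx)
  {
          struct bpf_fib_lookup params = {};
          long rc;

          params.family = 2;      /* AF_INET */
          params.ifindex = ctx->ingress_ifindex;
          /* ... fill l4_protocol and the src/dst addresses from the packet ... */
          params.mark = 42;       /* match ip rules installed for fwmark 42 */

          rc = bpf_fib_lookup(ctx, &params, sizeof(params), BPF_FIB_LOOKUP_MARK);
          if (rc == BPF_FIB_LKUP_RET_SUCCESS)
                  return bpf_redirect(params.ifindex, 0);
          return XDP_PASS;
  }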


@@ -24,8 +24,16 @@ struct bpf_stab {
 #define SOCK_CREATE_FLAG_MASK				\
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)

+/* This mutex is used to
+ *  - protect race between prog/link attach/detach and link prog update, and
+ *  - protect race between releasing and accessing map in bpf_link.
+ * A single global mutex lock is used since it is expected contention is low.
+ */
+static DEFINE_MUTEX(sockmap_mutex);
+
 static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
-				struct bpf_prog *old, u32 which);
+				struct bpf_prog *old, struct bpf_link *link,
+				u32 which);
 static struct sk_psock_progs *sock_map_progs(struct bpf_map *map);

 static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
@@ -71,7 +79,9 @@ int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-	ret = sock_map_prog_update(map, prog, NULL, attr->attach_type);
+	mutex_lock(&sockmap_mutex);
+	ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
+	mutex_unlock(&sockmap_mutex);
 	fdput(f);
 	return ret;
 }
@@ -103,7 +113,9 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 		goto put_prog;
 	}

-	ret = sock_map_prog_update(map, NULL, prog, attr->attach_type);
+	mutex_lock(&sockmap_mutex);
+	ret = sock_map_prog_update(map, NULL, prog, NULL, attr->attach_type);
+	mutex_unlock(&sockmap_mutex);
 put_prog:
 	bpf_prog_put(prog);
 put_map:
@@ -1460,55 +1472,84 @@ static struct sk_psock_progs *sock_map_progs(struct bpf_map *map)
 	return NULL;
 }

-static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
-				u32 which)
+static int sock_map_prog_link_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
+				     struct bpf_link ***plink, u32 which)
 {
 	struct sk_psock_progs *progs = sock_map_progs(map);
+	struct bpf_prog **cur_pprog;
+	struct bpf_link **cur_plink;

 	if (!progs)
 		return -EOPNOTSUPP;

 	switch (which) {
 	case BPF_SK_MSG_VERDICT:
-		*pprog = &progs->msg_parser;
+		cur_pprog = &progs->msg_parser;
+		cur_plink = &progs->msg_parser_link;
 		break;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
 	case BPF_SK_SKB_STREAM_PARSER:
-		*pprog = &progs->stream_parser;
+		cur_pprog = &progs->stream_parser;
+		cur_plink = &progs->stream_parser_link;
 		break;
 #endif
 	case BPF_SK_SKB_STREAM_VERDICT:
 		if (progs->skb_verdict)
 			return -EBUSY;
-		*pprog = &progs->stream_verdict;
+		cur_pprog = &progs->stream_verdict;
+		cur_plink = &progs->stream_verdict_link;
 		break;
 	case BPF_SK_SKB_VERDICT:
 		if (progs->stream_verdict)
 			return -EBUSY;
-		*pprog = &progs->skb_verdict;
+		cur_pprog = &progs->skb_verdict;
+		cur_plink = &progs->skb_verdict_link;
 		break;
 	default:
 		return -EOPNOTSUPP;
 	}

+	*pprog = cur_pprog;
+	if (plink)
+		*plink = cur_plink;
 	return 0;
 }

+/* Handle the following four cases:
+ * prog_attach: prog != NULL, old == NULL, link == NULL
+ * prog_detach: prog == NULL, old != NULL, link == NULL
+ * link_attach: prog != NULL, old == NULL, link != NULL
+ * link_detach: prog == NULL, old != NULL, link != NULL
+ */
 static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
-				struct bpf_prog *old, u32 which)
+				struct bpf_prog *old, struct bpf_link *link,
+				u32 which)
 {
 	struct bpf_prog **pprog;
+	struct bpf_link **plink;
 	int ret;

-	ret = sock_map_prog_lookup(map, &pprog, which);
+	ret = sock_map_prog_link_lookup(map, &pprog, &plink, which);
 	if (ret)
 		return ret;

-	if (old)
-		return psock_replace_prog(pprog, prog, old);
+	/* for prog_attach/prog_detach/link_attach, return error if a bpf_link
+	 * exists for that prog.
+	 */
+	if ((!link || prog) && *plink)
+		return -EBUSY;

-	psock_set_prog(pprog, prog);
-	return 0;
+	if (old) {
+		ret = psock_replace_prog(pprog, prog, old);
+		if (!ret)
+			*plink = NULL;
+	} else {
+		psock_set_prog(pprog, prog);
+		if (link)
+			*plink = link;
+	}
+
+	return ret;
 }

 int sock_map_bpf_prog_query(const union bpf_attr *attr,
@@ -1533,7 +1574,7 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,

 	rcu_read_lock();

-	ret = sock_map_prog_lookup(map, &pprog, attr->query.attach_type);
+	ret = sock_map_prog_link_lookup(map, &pprog, NULL, attr->query.attach_type);
 	if (ret)
 		goto end;
@@ -1663,6 +1704,196 @@ void sock_map_close(struct sock *sk, long timeout)
}
EXPORT_SYMBOL_GPL(sock_map_close);
struct sockmap_link {
struct bpf_link link;
struct bpf_map *map;
enum bpf_attach_type attach_type;
};
static void sock_map_link_release(struct bpf_link *link)
{
struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
mutex_lock(&sockmap_mutex);
if (!sockmap_link->map)
goto out;
WARN_ON_ONCE(sock_map_prog_update(sockmap_link->map, NULL, link->prog, link,
sockmap_link->attach_type));
bpf_map_put_with_uref(sockmap_link->map);
sockmap_link->map = NULL;
out:
mutex_unlock(&sockmap_mutex);
}
static int sock_map_link_detach(struct bpf_link *link)
{
sock_map_link_release(link);
return 0;
}
static void sock_map_link_dealloc(struct bpf_link *link)
{
kfree(link);
}
/* Handle the following two cases:
* case 1: link != NULL, prog != NULL, old != NULL
* case 2: link != NULL, prog != NULL, old == NULL
*/
static int sock_map_link_update_prog(struct bpf_link *link,
struct bpf_prog *prog,
struct bpf_prog *old)
{
const struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
struct bpf_prog **pprog, *old_link_prog;
struct bpf_link **plink;
int ret = 0;
mutex_lock(&sockmap_mutex);
/* If old prog is not NULL, ensure old prog is the same as link->prog. */
if (old && link->prog != old) {
ret = -EPERM;
goto out;
}
/* Ensure link->prog has the same type/attach_type as the new prog. */
if (link->prog->type != prog->type ||
link->prog->expected_attach_type != prog->expected_attach_type) {
ret = -EINVAL;
goto out;
}
ret = sock_map_prog_link_lookup(sockmap_link->map, &pprog, &plink,
sockmap_link->attach_type);
if (ret)
goto out;
/* return error if the stored bpf_link does not match the incoming bpf_link. */
if (link != *plink) {
ret = -EBUSY;
goto out;
}
if (old) {
ret = psock_replace_prog(pprog, prog, old);
if (ret)
goto out;
} else {
psock_set_prog(pprog, prog);
}
bpf_prog_inc(prog);
old_link_prog = xchg(&link->prog, prog);
bpf_prog_put(old_link_prog);
out:
mutex_unlock(&sockmap_mutex);
return ret;
}
static u32 sock_map_link_get_map_id(const struct sockmap_link *sockmap_link)
{
u32 map_id = 0;
mutex_lock(&sockmap_mutex);
if (sockmap_link->map)
map_id = sockmap_link->map->id;
mutex_unlock(&sockmap_mutex);
return map_id;
}
static int sock_map_link_fill_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
const struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
u32 map_id = sock_map_link_get_map_id(sockmap_link);
info->sockmap.map_id = map_id;
info->sockmap.attach_type = sockmap_link->attach_type;
return 0;
}
static void sock_map_link_show_fdinfo(const struct bpf_link *link,
struct seq_file *seq)
{
const struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
u32 map_id = sock_map_link_get_map_id(sockmap_link);
seq_printf(seq, "map_id:\t%u\n", map_id);
seq_printf(seq, "attach_type:\t%u\n", sockmap_link->attach_type);
}
static const struct bpf_link_ops sock_map_link_ops = {
.release = sock_map_link_release,
.dealloc = sock_map_link_dealloc,
.detach = sock_map_link_detach,
.update_prog = sock_map_link_update_prog,
.fill_link_info = sock_map_link_fill_info,
.show_fdinfo = sock_map_link_show_fdinfo,
};
int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
{
struct bpf_link_primer link_primer;
struct sockmap_link *sockmap_link;
enum bpf_attach_type attach_type;
struct bpf_map *map;
int ret;
if (attr->link_create.flags)
return -EINVAL;
map = bpf_map_get_with_uref(attr->link_create.target_fd);
if (IS_ERR(map))
return PTR_ERR(map);
if (map->map_type != BPF_MAP_TYPE_SOCKMAP && map->map_type != BPF_MAP_TYPE_SOCKHASH) {
ret = -EINVAL;
goto out;
}
sockmap_link = kzalloc(sizeof(*sockmap_link), GFP_USER);
if (!sockmap_link) {
ret = -ENOMEM;
goto out;
}
attach_type = attr->link_create.attach_type;
bpf_link_init(&sockmap_link->link, BPF_LINK_TYPE_SOCKMAP, &sock_map_link_ops, prog);
sockmap_link->map = map;
sockmap_link->attach_type = attach_type;
ret = bpf_link_prime(&sockmap_link->link, &link_primer);
if (ret) {
kfree(sockmap_link);
goto out;
}
mutex_lock(&sockmap_mutex);
ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
mutex_unlock(&sockmap_mutex);
if (ret) {
bpf_link_cleanup(&link_primer);
goto out;
}
/* Increase refcnt for the prog since when old prog is replaced with
* psock_replace_prog() and psock_set_prog() its refcnt will be decreased.
*
* Actually, we do not need to increase refcnt for the prog since bpf_link
* will hold a reference. But in order to have less complexity w.r.t.
* replacing/setting prog, let us increase the refcnt to make things simpler.
*/
bpf_prog_inc(prog);
return bpf_link_settle(&link_primer);
out:
bpf_map_put_with_uref(map);
return ret;
}
static int sock_map_iter_attach_target(struct bpf_prog *prog,
				       union bpf_iter_link_info *linfo,
				       struct bpf_iter_aux_info *aux)
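sock_map_link_create() above takes the sockmap/sockhash fd as link_create.target_fd, so from userspace the attachment can be made through the generic bpf_link_create() API and torn down by simply closing the link fd. A small sketch using existing libbpf calls (the series also adds a higher-level libbpf wrapper for this, not shown here since its exact name is not quoted in this hunk):

  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>

  /* Attach a sk_skb stream-verdict program to a sockmap via a bpf_link,
   * so the attachment goes away automatically with the link fd.
   */
  static int attach_verdict_link(struct bpf_program *prog, int map_fd)
  {
          LIBBPF_OPTS(bpf_link_create_opts, opts);
          int prog_fd = bpf_program__fd(prog);

          /* target_fd is the sockmap fd for BPF_LINK_TYPE_SOCKMAP links */
          return bpf_link_create(prog_fd, map_fd, BPF_SK_SKB_STREAM_VERDICT, &opts);
  }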


@@ -1156,8 +1156,6 @@ static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
 };

 BTF_KFUNCS_START(tcp_bbr_check_kfunc_ids)
-#ifdef CONFIG_X86
-#ifdef CONFIG_DYNAMIC_FTRACE
 BTF_ID_FLAGS(func, bbr_init)
 BTF_ID_FLAGS(func, bbr_main)
 BTF_ID_FLAGS(func, bbr_sndbuf_expand)
@@ -1166,8 +1164,6 @@ BTF_ID_FLAGS(func, bbr_cwnd_event)
 BTF_ID_FLAGS(func, bbr_ssthresh)
 BTF_ID_FLAGS(func, bbr_min_tso_segs)
 BTF_ID_FLAGS(func, bbr_set_state)
-#endif
-#endif
 BTF_KFUNCS_END(tcp_bbr_check_kfunc_ids)

 static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = {


@@ -486,16 +486,12 @@ static struct tcp_congestion_ops cubictcp __read_mostly = {
 };

 BTF_KFUNCS_START(tcp_cubic_check_kfunc_ids)
-#ifdef CONFIG_X86
-#ifdef CONFIG_DYNAMIC_FTRACE
 BTF_ID_FLAGS(func, cubictcp_init)
 BTF_ID_FLAGS(func, cubictcp_recalc_ssthresh)
 BTF_ID_FLAGS(func, cubictcp_cong_avoid)
 BTF_ID_FLAGS(func, cubictcp_state)
 BTF_ID_FLAGS(func, cubictcp_cwnd_event)
 BTF_ID_FLAGS(func, cubictcp_acked)
-#endif
-#endif
 BTF_KFUNCS_END(tcp_cubic_check_kfunc_ids)

 static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = {


@@ -261,16 +261,12 @@ static struct tcp_congestion_ops dctcp_reno __read_mostly = {
 };

 BTF_KFUNCS_START(tcp_dctcp_check_kfunc_ids)
-#ifdef CONFIG_X86
-#ifdef CONFIG_DYNAMIC_FTRACE
 BTF_ID_FLAGS(func, dctcp_init)
 BTF_ID_FLAGS(func, dctcp_update_alpha)
 BTF_ID_FLAGS(func, dctcp_cwnd_event)
 BTF_ID_FLAGS(func, dctcp_ssthresh)
 BTF_ID_FLAGS(func, dctcp_cwnd_undo)
 BTF_ID_FLAGS(func, dctcp_state)
-#endif
-#endif
 BTF_KFUNCS_END(tcp_dctcp_check_kfunc_ids)

 static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = {


@@ -913,7 +913,7 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
 			tp->rtt_seq = tp->snd_nxt;
 			tp->mdev_max_us = tcp_rto_min_us(sk);

-			tcp_bpf_rtt(sk);
+			tcp_bpf_rtt(sk, mrtt_us, srtt);
 		}
 	} else {
 		/* no previous measure. */
@@ -923,7 +923,7 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
 		tp->mdev_max_us = tp->rttvar_us;
 		tp->rtt_seq = tp->snd_nxt;

-		tcp_bpf_rtt(sk);
+		tcp_bpf_rtt(sk, mrtt_us, srtt);
 	}
 	tp->srtt_us = max(1U, srtt);
 }
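Since tcp_bpf_rtt() now forwards mrtt_us and srtt, a sockops program can read both values in its BPF_SOCK_OPS_RTT_CB callback. A hedged sketch: which of args[0]/args[1] carries which value is an assumption here (the call order above suggests measurement first, smoothed value second), so check the matching selftest before relying on it:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("sockops")
  int rtt_observer(struct bpf_sock_ops *skops)
  {
          if (skops->op != BPF_SOCK_OPS_RTT_CB)
                  return 1;

          /* Assumed ordering: args[0] = mrtt_us, args[1] = srtt. The RTT
           * callback must also be enabled via bpf_sock_ops_cb_flags_set().
           */
          bpf_printk("mrtt_us=%u srtt=%u", skops->args[0], skops->args[1]);
          return 1;
  }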


@@ -31,9 +31,9 @@ see_also = $(subst " ",, \
		"\n" \
		"SEE ALSO\n" \
		"========\n" \
-		"\t**bpf**\ (2),\n" \
-		"\t**bpf-helpers**\\ (7)" \
-		$(foreach page,$(call list_pages,$(1)),",\n\t**$(page)**\\ (8)") \
+		"**bpf**\ (2),\n" \
+		"**bpf-helpers**\\ (7)" \
+		$(foreach page,$(call list_pages,$(1)),",\n**$(page)**\\ (8)") \
		"\n")

$(OUTPUT)%.8: %.rst


@@ -14,82 +14,76 @@ tool for inspection of BTF data
SYNOPSIS SYNOPSIS
======== ========
**bpftool** [*OPTIONS*] **btf** *COMMAND* **bpftool** [*OPTIONS*] **btf** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } } *OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } }
*COMMANDS* := { **dump** | **help** } *COMMANDS* := { **dump** | **help** }
BTF COMMANDS BTF COMMANDS
============= =============
| **bpftool** **btf** { **show** | **list** } [**id** *BTF_ID*] | **bpftool** **btf** { **show** | **list** } [**id** *BTF_ID*]
| **bpftool** **btf dump** *BTF_SRC* [**format** *FORMAT*] | **bpftool** **btf dump** *BTF_SRC* [**format** *FORMAT*]
| **bpftool** **btf help** | **bpftool** **btf help**
| |
| *BTF_SRC* := { **id** *BTF_ID* | **prog** *PROG* | **map** *MAP* [{**key** | **value** | **kv** | **all**}] | **file** *FILE* } | *BTF_SRC* := { **id** *BTF_ID* | **prog** *PROG* | **map** *MAP* [{**key** | **value** | **kv** | **all**}] | **file** *FILE* }
| *FORMAT* := { **raw** | **c** } | *FORMAT* := { **raw** | **c** }
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* } | *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* } | *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
DESCRIPTION DESCRIPTION
=========== ===========
**bpftool btf { show | list }** [**id** *BTF_ID*] bpftool btf { show | list } [id *BTF_ID*]
Show information about loaded BTF objects. If a BTF ID is Show information about loaded BTF objects. If a BTF ID is specified, show
specified, show information only about given BTF object, information only about given BTF object, otherwise list all BTF objects
otherwise list all BTF objects currently loaded on the currently loaded on the system.
system.
Since Linux 5.8 bpftool is able to discover information about Since Linux 5.8 bpftool is able to discover information about processes
processes that hold open file descriptors (FDs) against BTF that hold open file descriptors (FDs) against BTF objects. On such kernels
objects. On such kernels bpftool will automatically emit this bpftool will automatically emit this information as well.
information as well.
**bpftool btf dump** *BTF_SRC* bpftool btf dump *BTF_SRC*
Dump BTF entries from a given *BTF_SRC*. Dump BTF entries from a given *BTF_SRC*.
When **id** is specified, BTF object with that ID will be When **id** is specified, BTF object with that ID will be loaded and all
loaded and all its BTF types emitted. its BTF types emitted.
When **map** is provided, it's expected that map has When **map** is provided, it's expected that map has associated BTF object
associated BTF object with BTF types describing key and with BTF types describing key and value. It's possible to select whether to
value. It's possible to select whether to dump only BTF dump only BTF type(s) associated with key (**key**), value (**value**),
type(s) associated with key (**key**), value (**value**), both key and value (**kv**), or all BTF types present in associated BTF
both key and value (**kv**), or all BTF types present in object (**all**). If not specified, **kv** is assumed.
associated BTF object (**all**). If not specified, **kv**
is assumed.
When **prog** is provided, it's expected that program has When **prog** is provided, it's expected that program has associated BTF
associated BTF object with BTF types. object with BTF types.
When specifying *FILE*, an ELF file is expected, containing When specifying *FILE*, an ELF file is expected, containing .BTF section
.BTF section with well-defined BTF binary format data, with well-defined BTF binary format data, typically produced by clang or
typically produced by clang or pahole. pahole.
**format** option can be used to override default (raw) **format** option can be used to override default (raw) output format. Raw
output format. Raw (**raw**) or C-syntax (**c**) output (**raw**) or C-syntax (**c**) output formats are supported.
formats are supported.
**bpftool btf help** bpftool btf help
Print short help message. Print short help message.
OPTIONS OPTIONS
======= =======
.. include:: common_options.rst .. include:: common_options.rst
-B, --base-btf *FILE* -B, --base-btf *FILE*
Pass a base BTF object. Base BTF objects are typically used Pass a base BTF object. Base BTF objects are typically used with BTF
with BTF objects for kernel modules. To avoid duplicating objects for kernel modules. To avoid duplicating all kernel symbols
all kernel symbols required by modules, BTF objects for required by modules, BTF objects for modules are "split", they are
modules are "split", they are built incrementally on top of built incrementally on top of the kernel (vmlinux) BTF object. So the
the kernel (vmlinux) BTF object. So the base BTF reference base BTF reference should usually point to the kernel BTF.
should usually point to the kernel BTF.
When the main BTF object to process (for example, the When the main BTF object to process (for example, the module BTF to
module BTF to dump) is passed as a *FILE*, bpftool attempts dump) is passed as a *FILE*, bpftool attempts to autodetect the path
to autodetect the path for the base object, and passing for the base object, and passing this option is optional. When the main
this option is optional. When the main BTF object is passed BTF object is passed through other handles, this option becomes
through other handles, this option becomes necessary. necessary.
EXAMPLES EXAMPLES
======== ========


@@ -14,134 +14,125 @@ tool for inspection and simple manipulation of eBPF progs
SYNOPSIS SYNOPSIS
======== ========
**bpftool** [*OPTIONS*] **cgroup** *COMMAND* **bpftool** [*OPTIONS*] **cgroup** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } } *OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } }
*COMMANDS* := *COMMANDS* :=
{ **show** | **list** | **tree** | **attach** | **detach** | **help** } { **show** | **list** | **tree** | **attach** | **detach** | **help** }
CGROUP COMMANDS CGROUP COMMANDS
=============== ===============
| **bpftool** **cgroup** { **show** | **list** } *CGROUP* [**effective**] | **bpftool** **cgroup** { **show** | **list** } *CGROUP* [**effective**]
| **bpftool** **cgroup tree** [*CGROUP_ROOT*] [**effective**] | **bpftool** **cgroup tree** [*CGROUP_ROOT*] [**effective**]
| **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*] | **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
| **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG* | **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
| **bpftool** **cgroup help** | **bpftool** **cgroup help**
| |
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* } | *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *ATTACH_TYPE* := { **cgroup_inet_ingress** | **cgroup_inet_egress** | | *ATTACH_TYPE* := { **cgroup_inet_ingress** | **cgroup_inet_egress** |
| **cgroup_inet_sock_create** | **cgroup_sock_ops** | | **cgroup_inet_sock_create** | **cgroup_sock_ops** |
| **cgroup_device** | **cgroup_inet4_bind** | **cgroup_inet6_bind** | | **cgroup_device** | **cgroup_inet4_bind** | **cgroup_inet6_bind** |
| **cgroup_inet4_post_bind** | **cgroup_inet6_post_bind** | | **cgroup_inet4_post_bind** | **cgroup_inet6_post_bind** |
| **cgroup_inet4_connect** | **cgroup_inet6_connect** | | **cgroup_inet4_connect** | **cgroup_inet6_connect** |
| **cgroup_unix_connect** | **cgroup_inet4_getpeername** | | **cgroup_unix_connect** | **cgroup_inet4_getpeername** |
| **cgroup_inet6_getpeername** | **cgroup_unix_getpeername** | | **cgroup_inet6_getpeername** | **cgroup_unix_getpeername** |
| **cgroup_inet4_getsockname** | **cgroup_inet6_getsockname** | | **cgroup_inet4_getsockname** | **cgroup_inet6_getsockname** |
| **cgroup_unix_getsockname** | **cgroup_udp4_sendmsg** | | **cgroup_unix_getsockname** | **cgroup_udp4_sendmsg** |
| **cgroup_udp6_sendmsg** | **cgroup_unix_sendmsg** | | **cgroup_udp6_sendmsg** | **cgroup_unix_sendmsg** |
| **cgroup_udp4_recvmsg** | **cgroup_udp6_recvmsg** | | **cgroup_udp4_recvmsg** | **cgroup_udp6_recvmsg** |
| **cgroup_unix_recvmsg** | **cgroup_sysctl** | | **cgroup_unix_recvmsg** | **cgroup_sysctl** |
| **cgroup_getsockopt** | **cgroup_setsockopt** | | **cgroup_getsockopt** | **cgroup_setsockopt** |
| **cgroup_inet_sock_release** } | **cgroup_inet_sock_release** }
| *ATTACH_FLAGS* := { **multi** | **override** } | *ATTACH_FLAGS* := { **multi** | **override** }
DESCRIPTION DESCRIPTION
=========== ===========
**bpftool cgroup { show | list }** *CGROUP* [**effective**] bpftool cgroup { show | list } *CGROUP* [effective]
List all programs attached to the cgroup *CGROUP*. List all programs attached to the cgroup *CGROUP*.
Output will start with program ID followed by attach type, Output will start with program ID followed by attach type, attach flags and
attach flags and program name. program name.
If **effective** is specified retrieve effective programs that If **effective** is specified retrieve effective programs that will execute
will execute for events within a cgroup. This includes for events within a cgroup. This includes inherited along with attached
inherited along with attached ones. ones.
**bpftool cgroup tree** [*CGROUP_ROOT*] [**effective**] bpftool cgroup tree [*CGROUP_ROOT*] [effective]
Iterate over all cgroups in *CGROUP_ROOT* and list all Iterate over all cgroups in *CGROUP_ROOT* and list all attached programs.
attached programs. If *CGROUP_ROOT* is not specified, If *CGROUP_ROOT* is not specified, bpftool uses cgroup v2 mountpoint.
bpftool uses cgroup v2 mountpoint.
The output is similar to the output of cgroup show/list The output is similar to the output of cgroup show/list commands: it starts
commands: it starts with absolute cgroup path, followed by with absolute cgroup path, followed by program ID, attach type, attach
program ID, attach type, attach flags and program name. flags and program name.
If **effective** is specified retrieve effective programs that If **effective** is specified retrieve effective programs that will execute
will execute for events within a cgroup. This includes for events within a cgroup. This includes inherited along with attached
inherited along with attached ones. ones.
**bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*] bpftool cgroup attach *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
Attach program *PROG* to the cgroup *CGROUP* with attach type Attach program *PROG* to the cgroup *CGROUP* with attach type *ATTACH_TYPE*
*ATTACH_TYPE* and optional *ATTACH_FLAGS*. and optional *ATTACH_FLAGS*.
*ATTACH_FLAGS* can be one of: **override** if a sub-cgroup installs *ATTACH_FLAGS* can be one of: **override** if a sub-cgroup installs some
some bpf program, the program in this cgroup yields to sub-cgroup bpf program, the program in this cgroup yields to sub-cgroup program;
program; **multi** if a sub-cgroup installs some bpf program, **multi** if a sub-cgroup installs some bpf program, that cgroup program
that cgroup program gets run in addition to the program in this gets run in addition to the program in this cgroup.
cgroup.
Only one program is allowed to be attached to a cgroup with Only one program is allowed to be attached to a cgroup with no attach flags
no attach flags or the **override** flag. Attaching another or the **override** flag. Attaching another program will release old
program will release old program and attach the new one. program and attach the new one.
Multiple programs are allowed to be attached to a cgroup with Multiple programs are allowed to be attached to a cgroup with **multi**.
**multi**. They are executed in FIFO order (those that were They are executed in FIFO order (those that were attached first, run
attached first, run first). first).
Non-default *ATTACH_FLAGS* are supported by kernel version 4.14 Non-default *ATTACH_FLAGS* are supported by kernel version 4.14 and later.
and later.
*ATTACH_TYPE* can be on of: *ATTACH_TYPE* can be one of:
**ingress** ingress path of the inet socket (since 4.10);
**egress** egress path of the inet socket (since 4.10);
**sock_create** opening of an inet socket (since 4.10);
**sock_ops** various socket operations (since 4.12);
**device** device access (since 4.15);
**bind4** call to bind(2) for an inet4 socket (since 4.17);
**bind6** call to bind(2) for an inet6 socket (since 4.17);
**post_bind4** return from bind(2) for an inet4 socket (since 4.17);
**post_bind6** return from bind(2) for an inet6 socket (since 4.17);
**connect4** call to connect(2) for an inet4 socket (since 4.17);
**connect6** call to connect(2) for an inet6 socket (since 4.17);
**connect_unix** call to connect(2) for a unix socket (since 6.7);
**sendmsg4** call to sendto(2), sendmsg(2), sendmmsg(2) for an
unconnected udp4 socket (since 4.18);
**sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an
unconnected udp6 socket (since 4.18);
**sendmsg_unix** call to sendto(2), sendmsg(2), sendmmsg(2) for
an unconnected unix socket (since 6.7);
**recvmsg4** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected udp4 socket (since 5.2);
**recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected udp6 socket (since 5.2);
**recvmsg_unix** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
an unconnected unix socket (since 6.7);
**sysctl** sysctl access (since 5.2);
**getsockopt** call to getsockopt (since 5.3);
**setsockopt** call to setsockopt (since 5.3);
**getpeername4** call to getpeername(2) for an inet4 socket (since 5.8);
**getpeername6** call to getpeername(2) for an inet6 socket (since 5.8);
**getpeername_unix** call to getpeername(2) for a unix socket (since 6.7);
**getsockname4** call to getsockname(2) for an inet4 socket (since 5.8);
**getsockname6** call to getsockname(2) for an inet6 socket (since 5.8).
**getsockname_unix** call to getsockname(2) for a unix socket (since 6.7);
**sock_release** closing an userspace inet socket (since 5.9).
**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG* - **ingress** ingress path of the inet socket (since 4.10)
Detach *PROG* from the cgroup *CGROUP* and attach type - **egress** egress path of the inet socket (since 4.10)
*ATTACH_TYPE*. - **sock_create** opening of an inet socket (since 4.10)
- **sock_ops** various socket operations (since 4.12)
- **device** device access (since 4.15)
- **bind4** call to bind(2) for an inet4 socket (since 4.17)
- **bind6** call to bind(2) for an inet6 socket (since 4.17)
- **post_bind4** return from bind(2) for an inet4 socket (since 4.17)
- **post_bind6** return from bind(2) for an inet6 socket (since 4.17)
- **connect4** call to connect(2) for an inet4 socket (since 4.17)
- **connect6** call to connect(2) for an inet6 socket (since 4.17)
- **connect_unix** call to connect(2) for a unix socket (since 6.7)
- **sendmsg4** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected udp4 socket (since 4.18)
- **sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected udp6 socket (since 4.18)
- **sendmsg_unix** call to sendto(2), sendmsg(2), sendmmsg(2) for an unconnected unix socket (since 6.7)
- **recvmsg4** call to recvfrom(2), recvmsg(2), recvmmsg(2) for an unconnected udp4 socket (since 5.2)
- **recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for an unconnected udp6 socket (since 5.2)
- **recvmsg_unix** call to recvfrom(2), recvmsg(2), recvmmsg(2) for an unconnected unix socket (since 6.7)
- **sysctl** sysctl access (since 5.2)
- **getsockopt** call to getsockopt (since 5.3)
- **setsockopt** call to setsockopt (since 5.3)
- **getpeername4** call to getpeername(2) for an inet4 socket (since 5.8)
- **getpeername6** call to getpeername(2) for an inet6 socket (since 5.8)
- **getpeername_unix** call to getpeername(2) for a unix socket (since 6.7)
- **getsockname4** call to getsockname(2) for an inet4 socket (since 5.8)
- **getsockname6** call to getsockname(2) for an inet6 socket (since 5.8)
- **getsockname_unix** call to getsockname(2) for a unix socket (since 6.7)
- **sock_release** closing a userspace inet socket (since 5.9)
**bpftool prog help** bpftool cgroup detach *CGROUP* *ATTACH_TYPE* *PROG*
Print short help message. Detach *PROG* from the cgroup *CGROUP* and attach type *ATTACH_TYPE*.
bpftool prog help
Print short help message.
OPTIONS OPTIONS
======= =======
.. include:: common_options.rst .. include:: common_options.rst
-f, --bpffs -f, --bpffs
Show file names of pinned programs. Show file names of pinned programs.
EXAMPLES EXAMPLES
======== ========


@@ -14,77 +14,70 @@ tool for inspection of eBPF-related parameters for Linux kernel or net device
SYNOPSIS SYNOPSIS
======== ========
**bpftool** [*OPTIONS*] **feature** *COMMAND* **bpftool** [*OPTIONS*] **feature** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| } *OPTIONS* := { |COMMON_OPTIONS| }
*COMMANDS* := { **probe** | **help** } *COMMANDS* := { **probe** | **help** }
FEATURE COMMANDS FEATURE COMMANDS
================ ================
| **bpftool** **feature probe** [*COMPONENT*] [**full**] [**unprivileged**] [**macros** [**prefix** *PREFIX*]] | **bpftool** **feature probe** [*COMPONENT*] [**full**] [**unprivileged**] [**macros** [**prefix** *PREFIX*]]
| **bpftool** **feature list_builtins** *GROUP* | **bpftool** **feature list_builtins** *GROUP*
| **bpftool** **feature help** | **bpftool** **feature help**
| |
| *COMPONENT* := { **kernel** | **dev** *NAME* } | *COMPONENT* := { **kernel** | **dev** *NAME* }
| *GROUP* := { **prog_types** | **map_types** | **attach_types** | **link_types** | **helpers** } | *GROUP* := { **prog_types** | **map_types** | **attach_types** | **link_types** | **helpers** }
DESCRIPTION DESCRIPTION
=========== ===========
**bpftool feature probe** [**kernel**] [**full**] [**macros** [**prefix** *PREFIX*]] bpftool feature probe [kernel] [full] [macros [prefix *PREFIX*]]
Probe the running kernel and dump a number of eBPF-related Probe the running kernel and dump a number of eBPF-related parameters, such
parameters, such as availability of the **bpf**\ () system call, as availability of the **bpf**\ () system call, JIT status, eBPF program
JIT status, eBPF program types availability, eBPF helper types availability, eBPF helper functions availability, and more.
functions availability, and more.
By default, bpftool **does not run probes** for By default, bpftool **does not run probes** for **bpf_probe_write_user**\
**bpf_probe_write_user**\ () and **bpf_trace_printk**\() () and **bpf_trace_printk**\() helpers which print warnings to kernel logs.
helpers which print warnings to kernel logs. To enable them To enable them and run all probes, the **full** keyword should be used.
and run all probes, the **full** keyword should be used.
If the **macros** keyword (but not the **-j** option) is If the **macros** keyword (but not the **-j** option) is passed, a subset
passed, a subset of the output is dumped as a list of of the output is dumped as a list of **#define** macros that are ready to
**#define** macros that are ready to be included in a C be included in a C header file, for example. If, additionally, **prefix**
header file, for example. If, additionally, **prefix** is is used to define a *PREFIX*, the provided string will be used as a prefix
used to define a *PREFIX*, the provided string will be used to the names of the macros: this can be used to avoid conflicts on macro
as a prefix to the names of the macros: this can be used to names when including the output of this command as a header file.
avoid conflicts on macro names when including the output of
this command as a header file.
Keyword **kernel** can be omitted. If no probe target is Keyword **kernel** can be omitted. If no probe target is specified, probing
specified, probing the kernel is the default behaviour. the kernel is the default behaviour.
When the **unprivileged** keyword is used, bpftool will dump When the **unprivileged** keyword is used, bpftool will dump only the
only the features available to a user who does not have the features available to a user who does not have the **CAP_SYS_ADMIN**
**CAP_SYS_ADMIN** capability set. The features available in capability set. The features available in that case usually represent a
that case usually represent a small subset of the parameters small subset of the parameters supported by the system. Unprivileged users
supported by the system. Unprivileged users MUST use the MUST use the **unprivileged** keyword: This is to avoid misdetection if
**unprivileged** keyword: This is to avoid misdetection if bpftool is inadvertently run as non-root, for example. This keyword is
bpftool is inadvertently run as non-root, for example. This unavailable if bpftool was compiled without libcap.
keyword is unavailable if bpftool was compiled without
libcap.
**bpftool feature probe dev** *NAME* [**full**] [**macros** [**prefix** *PREFIX*]] bpftool feature probe dev *NAME* [full] [macros [prefix *PREFIX*]]
Probe network device for supported eBPF features and dump Probe network device for supported eBPF features and dump results to the
results to the console. console.
The keywords **full**, **macros** and **prefix** have the The keywords **full**, **macros** and **prefix** have the same role as when
same role as when probing the kernel. probing the kernel.
**bpftool feature list_builtins** *GROUP* bpftool feature list_builtins *GROUP*
List items known to bpftool. These can be BPF program types List items known to bpftool. These can be BPF program types
(**prog_types**), BPF map types (**map_types**), attach types (**prog_types**), BPF map types (**map_types**), attach types
(**attach_types**), link types (**link_types**), or BPF helper (**attach_types**), link types (**link_types**), or BPF helper functions
functions (**helpers**). The command does not probe the system, but (**helpers**). The command does not probe the system, but simply lists the
simply lists the elements that bpftool knows from compilation time, elements that bpftool knows from compilation time, as provided from libbpf
as provided from libbpf (for all object types) or from the BPF UAPI (for all object types) or from the BPF UAPI header (list of helpers). This
header (list of helpers). This can be used in scripts to iterate over can be used in scripts to iterate over BPF types or helpers.
BPF types or helpers.
**bpftool feature help** bpftool feature help
Print short help message. Print short help message.
OPTIONS OPTIONS
======= =======
.. include:: common_options.rst .. include:: common_options.rst


@@ -14,199 +14,177 @@ tool for BPF code-generation
SYNOPSIS SYNOPSIS
======== ========
**bpftool** [*OPTIONS*] **gen** *COMMAND* **bpftool** [*OPTIONS*] **gen** *COMMAND*
*OPTIONS* := { |COMMON_OPTIONS| | { **-L** | **--use-loader** } } *OPTIONS* := { |COMMON_OPTIONS| | { **-L** | **--use-loader** } }
*COMMAND* := { **object** | **skeleton** | **help** } *COMMAND* := { **object** | **skeleton** | **help** }
GEN COMMANDS GEN COMMANDS
============= =============
| **bpftool** **gen object** *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...] | **bpftool** **gen object** *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...]
| **bpftool** **gen skeleton** *FILE* [**name** *OBJECT_NAME*] | **bpftool** **gen skeleton** *FILE* [**name** *OBJECT_NAME*]
| **bpftool** **gen subskeleton** *FILE* [**name** *OBJECT_NAME*] | **bpftool** **gen subskeleton** *FILE* [**name** *OBJECT_NAME*]
| **bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...] | **bpftool** **gen min_core_btf** *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...]
| **bpftool** **gen help** | **bpftool** **gen help**
DESCRIPTION DESCRIPTION
=========== ===========
**bpftool gen object** *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...] bpftool gen object *OUTPUT_FILE* *INPUT_FILE* [*INPUT_FILE*...]
Statically link (combine) together one or more *INPUT_FILE*'s Statically link (combine) together one or more *INPUT_FILE*'s into a single
into a single resulting *OUTPUT_FILE*. All the files involved resulting *OUTPUT_FILE*. All the files involved are BPF ELF object files.
are BPF ELF object files.
The rules of BPF static linking are mostly the same as for The rules of BPF static linking are mostly the same as for user-space
user-space object files, but in addition to combining data object files, but in addition to combining data and instruction sections,
and instruction sections, .BTF and .BTF.ext (if present in .BTF and .BTF.ext (if present in any of the input files) data are combined
any of the input files) data are combined together. .BTF together. .BTF data is deduplicated, so all the common types across
data is deduplicated, so all the common types across *INPUT_FILE*'s will only be represented once in the resulting BTF
*INPUT_FILE*'s will only be represented once in the resulting information.
BTF information.
BPF static linking allows to partition BPF source code into BPF static linking allows to partition BPF source code into individually
individually compiled files that are then linked into compiled files that are then linked into a single resulting BPF object
a single resulting BPF object file, which can be used to file, which can be used to generated BPF skeleton (with **gen skeleton**
generated BPF skeleton (with **gen skeleton** command) or command) or passed directly into **libbpf** (using **bpf_object__open()**
passed directly into **libbpf** (using **bpf_object__open()** family of APIs).
family of APIs).
**bpftool gen skeleton** *FILE* bpftool gen skeleton *FILE*
Generate BPF skeleton C header file for a given *FILE*. Generate BPF skeleton C header file for a given *FILE*.
BPF skeleton is an alternative interface to existing libbpf BPF skeleton is an alternative interface to existing libbpf APIs for
APIs for working with BPF objects. Skeleton code is intended working with BPF objects. Skeleton code is intended to significantly
to significantly shorten and simplify code to load and work shorten and simplify code to load and work with BPF programs from userspace
with BPF programs from userspace side. Generated code is side. Generated code is tailored to specific input BPF object *FILE*,
tailored to specific input BPF object *FILE*, reflecting its reflecting its structure by listing out available maps, program, variables,
structure by listing out available maps, program, variables, etc. Skeleton eliminates the need to lookup mentioned components by name.
etc. Skeleton eliminates the need to lookup mentioned Instead, if skeleton instantiation succeeds, they are populated in skeleton
components by name. Instead, if skeleton instantiation structure as valid libbpf types (e.g., **struct bpf_map** pointer) and can
succeeds, they are populated in skeleton structure as valid be passed to existing generic libbpf APIs.
libbpf types (e.g., **struct bpf_map** pointer) and can be
passed to existing generic libbpf APIs.
In addition to simple and reliable access to maps and In addition to simple and reliable access to maps and programs, skeleton
programs, skeleton provides a storage for BPF links (**struct provides a storage for BPF links (**struct bpf_link**) for each BPF program
bpf_link**) for each BPF program within BPF object. When within BPF object. When requested, supported BPF programs will be
requested, supported BPF programs will be automatically automatically attached and resulting BPF links stored for further use by
attached and resulting BPF links stored for further use by user in pre-allocated fields in skeleton struct. For BPF programs that
user in pre-allocated fields in skeleton struct. For BPF can't be automatically attached by libbpf, user can attach them manually,
programs that can't be automatically attached by libbpf, but store resulting BPF link in per-program link field. All such set up
user can attach them manually, but store resulting BPF link links will be automatically destroyed on BPF skeleton destruction. This
in per-program link field. All such set up links will be eliminates the need for users to manage links manually and rely on libbpf
automatically destroyed on BPF skeleton destruction. This support to detach programs and free up resources.
eliminates the need for users to manage links manually and
rely on libbpf support to detach programs and free up
resources.
Another facility provided by BPF skeleton is an interface to Another facility provided by BPF skeleton is an interface to global
global variables of all supported kinds: mutable, read-only, variables of all supported kinds: mutable, read-only, as well as extern
as well as extern ones. This interface allows to pre-setup ones. This interface allows to pre-setup initial values of variables before
initial values of variables before BPF object is loaded and BPF object is loaded and verified by kernel. For non-read-only variables,
verified by kernel. For non-read-only variables, the same the same interface can be used to fetch values of global variables on
interface can be used to fetch values of global variables on userspace side, even if they are modified by BPF code.
userspace side, even if they are modified by BPF code.
During skeleton generation, contents of source BPF object During skeleton generation, contents of source BPF object *FILE* is
*FILE* is embedded within generated code and is thus not embedded within generated code and is thus not necessary to keep around.
necessary to keep around. This ensures skeleton and BPF This ensures skeleton and BPF object file are matching 1-to-1 and always
object file are matching 1-to-1 and always stay in sync. stay in sync. Generated code is dual-licensed under LGPL-2.1 and
Generated code is dual-licensed under LGPL-2.1 and BSD-2-Clause licenses.
BSD-2-Clause licenses.
It is a design goal and guarantee that skeleton interfaces It is a design goal and guarantee that skeleton interfaces are
are interoperable with generic libbpf APIs. User should interoperable with generic libbpf APIs. User should always be able to use
always be able to use skeleton API to create and load BPF skeleton API to create and load BPF object, and later use libbpf APIs to
object, and later use libbpf APIs to keep working with keep working with specific maps, programs, etc.
specific maps, programs, etc.
As part of skeleton, few custom functions are generated. As part of skeleton, few custom functions are generated. Each of them is
Each of them is prefixed with object name. Object name can prefixed with object name. Object name can either be derived from object
either be derived from object file name, i.e., if BPF object file name, i.e., if BPF object file name is **example.o**, BPF object name
file name is **example.o**, BPF object name will be will be **example**. Object name can be also specified explicitly through
**example**. Object name can be also specified explicitly **name** *OBJECT_NAME* parameter. The following custom functions are
through **name** *OBJECT_NAME* parameter. The following provided (assuming **example** as the object name):
custom functions are provided (assuming **example** as
the object name):

- **example__open** and **example__open_opts**.
  These functions are used to instantiate the skeleton. They correspond to libbpf's **bpf_object__open**\ () API. The **_opts** variant accepts extra **bpf_object_open_opts** options.

- **example__load**.
  This function creates maps, loads and verifies BPF programs, and initializes global data maps. It corresponds to libbpf's **bpf_object__load**\ () API.

- **example__open_and_load** combines **example__open** and **example__load** invocations in one commonly used operation.

- **example__attach** and **example__detach**.
  This pair of functions allows attaching and detaching, correspondingly, an already loaded BPF object. Only BPF programs of types supported by libbpf for auto-attachment will be auto-attached and their corresponding BPF links instantiated. For other BPF programs, the user can manually create a BPF link and assign it to the corresponding fields in the skeleton struct. **example__detach** will detach both the links created automatically and those populated by the user manually.

- **example__destroy**.
  Detach and unload BPF programs, and free up all the resources used by the skeleton and BPF object.

If the BPF object has global variables, corresponding structs with a memory layout matching the global data section layout will be created. Currently supported ones are the *.data*, *.bss*, *.rodata*, and *.kconfig* structs/data sections. These data sections/structs can be used to set up initial values of variables, if set before **example__load**. Afterwards, if the target kernel supports memory-mapped BPF arrays, the same structs can be used to fetch and update (non-read-only) data from userspace, with the same simplicity as on the BPF side.
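
A minimal usage sketch of the generated skeleton API, assuming a hypothetical **example.bpf.o** with a read-only variable *debug_level* and a *.bss* counter *event_count* (these names are illustrative, not part of this series)::

  #include <stdio.h>
  #include "example.skel.h"   /* produced by: bpftool gen skeleton example.bpf.o */

  int main(void)
  {
          struct example *skel;
          int err;

          skel = example__open();              /* like bpf_object__open() */
          if (!skel)
                  return 1;

          skel->rodata->debug_level = 1;       /* pre-set a read-only variable before load */

          err = example__load(skel);           /* create maps, load and verify programs */
          if (!err)
                  err = example__attach(skel); /* auto-attach supported program types */

          if (!err)
                  printf("events so far: %llu\n",
                         (unsigned long long)skel->bss->event_count);

          example__destroy(skel);              /* detach, unload and free everything */
          return err != 0;
  }
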
bpftool gen subskeleton *FILE*
Generate a BPF subskeleton C header file for a given *FILE*.

Subskeletons are similar to skeletons, except they do not own the corresponding maps, programs, or global variables. They require that the object file used to generate them is already loaded into a *bpf_object* by some other means.

This functionality is useful when a library is included into a larger BPF program. A subskeleton for the library would have access to all objects and globals defined in it, without having to know about the larger program.

Consequently, there are only two functions defined for subskeletons:

- **example__open(bpf_object\*)**.
  Instantiates a subskeleton from an already opened (but not necessarily loaded) **bpf_object**.

- **example__destroy()**.
  Frees the storage for the subskeleton but *does not* unload any BPF programs or maps.
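
As a rough sketch of use, assuming a library object turned into a **lib** subskeleton via **bpftool gen subskeleton lib.bpf.o** (names and the generated signatures are illustrative)::

  #include <bpf/libbpf.h>
  #include "lib.subskel.h"

  /* 'obj' was opened (and possibly loaded) by the application owning the
   * larger BPF program; the subskeleton never owns it. */
  int inspect_lib(struct bpf_object *obj)
  {
          struct lib *sub = lib__open(obj);

          if (!sub)
                  return -1;

          /* Once the owning object is loaded, library globals are reachable
           * through the subskeleton, e.g. sub->data->lib_counter. */

          lib__destroy(sub);   /* frees the subskeleton, not the BPF object */
          return 0;
  }
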
bpftool gen min_core_btf *INPUT* *OUTPUT* *OBJECT* [*OBJECT*...]
Generate a minimum BTF file as *OUTPUT*, derived from a given *INPUT* BTF file, containing all the BTF types needed to satisfy the CO-RE relocations of one or more given eBPF objects.

When kernels aren't compiled with CONFIG_DEBUG_INFO_BTF, libbpf, when loading an eBPF object, has to rely on external BTF files to be able to calculate CO-RE relocations.

Usually, an external BTF file is built from existing kernel DWARF data using pahole. It contains all the types used by its respective kernel image and, because of that, is big.

The min_core_btf feature builds smaller BTF files, customized to one or multiple eBPF objects, so they can be distributed together with an eBPF CO-RE based application, making the application portable to different kernel versions.

Check the examples below for more information on how to use it.
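
For context, a sketch of how an application might point libbpf at such a minimized BTF file when the running kernel lacks CONFIG_DEBUG_INFO_BTF (paths and names are illustrative)::

  #include <bpf/libbpf.h>

  /* Open a CO-RE object against a minimized BTF file, e.g. one produced by
   * "bpftool gen min_core_btf vmlinux.btf min.btf app.bpf.o". Only needed
   * when /sys/kernel/btf/vmlinux is not available. */
  struct bpf_object *open_with_min_btf(const char *obj_path)
  {
          LIBBPF_OPTS(bpf_object_open_opts, opts,
                  .btf_custom_path = "/usr/lib/myapp/min.btf",
          );

          return bpf_object__open_file(obj_path, &opts);
  }
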
bpftool gen help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

-L, --use-loader
For skeletons, generate a "light" skeleton (also known as "loader" skeleton). A light skeleton contains a loader eBPF program. It does not use the majority of the libbpf infrastructure, and does not need libelf.

EXAMPLES
========


@@ -14,50 +14,46 @@ tool to create BPF iterators
SYNOPSIS
========

**bpftool** [*OPTIONS*] **iter** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| }

*COMMANDS* := { **pin** | **help** }

ITER COMMANDS
=============

| **bpftool** **iter pin** *OBJ* *PATH* [**map** *MAP*]
| **bpftool** **iter help**
|
| *OBJ* := /a/file/of/bpf_iter_target.o
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }

DESCRIPTION
===========

bpftool iter pin *OBJ* *PATH* [map *MAP*]
A bpf iterator combines a kernel iterator over particular kernel data (e.g., tasks, bpf_maps, etc.) with a bpf program called for each kernel data object (e.g., one task, one bpf_map, etc.). User space can *read* the kernel iterator output through the *read()* syscall.

The *pin* command creates a bpf iterator from *OBJ* and pins it to *PATH*. The *PATH* should be located in a *bpffs* mount. It must not contain a dot character ('.'), which is reserved for future extensions of *bpffs*.

A map element bpf iterator requires an additional parameter *MAP* so the bpf program can iterate over the elements of that map. The user can have a bpf program run in the kernel for each map element, doing checking, filtering, aggregation, etc. without copying data to user space.

The user can then *cat PATH* to see the bpf iterator output.
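
For illustration, a minimal task iterator that such an *OBJ* file might contain could look roughly like this (a sketch, not code from this series)::

  // SPDX-License-Identifier: GPL-2.0
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char LICENSE[] SEC("license") = "GPL";

  /* Runs once per task; output becomes readable via read() on the pinned path. */
  SEC("iter/task")
  int dump_task(struct bpf_iter__task *ctx)
  {
          struct seq_file *seq = ctx->meta->seq;
          struct task_struct *task = ctx->task;

          if (!task)
                  return 0;

          BPF_SEQ_PRINTF(seq, "%8d %16s\n", task->pid, task->comm);
          return 0;
  }

It could then be pinned with **bpftool iter pin bpf_iter_task.o /sys/fs/bpf/tasks** and read with *cat /sys/fs/bpf/tasks*.
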
bpftool iter help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

EXAMPLES
========


@@ -14,67 +14,62 @@ tool for inspection and simple manipulation of eBPF links
SYNOPSIS
========

**bpftool** [*OPTIONS*] **link** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } }

*COMMANDS* := { **show** | **list** | **pin** | **help** }

LINK COMMANDS
=============

| **bpftool** **link { show | list }** [*LINK*]
| **bpftool** **link pin** *LINK* *FILE*
| **bpftool** **link detach** *LINK*
| **bpftool** **link help**
|
| *LINK* := { **id** *LINK_ID* | **pinned** *FILE* }

DESCRIPTION
===========

bpftool link { show | list } [*LINK*]
Show information about active links. If *LINK* is specified, show information only about the given link; otherwise list all links currently active on the system.

Output will start with the link ID followed by the link type and zero or more named attributes, some of which depend on the type of link.

Since Linux 5.8 bpftool is able to discover information about processes that hold open file descriptors (FDs) against BPF links. On such kernels bpftool will automatically emit this information as well.

bpftool link pin *LINK* *FILE*
Pin link *LINK* as *FILE*.

Note: *FILE* must be located in a *bpffs* mount. It must not contain a dot character ('.'), which is reserved for future extensions of *bpffs*.

bpftool link detach *LINK*
Force-detach link *LINK*. The BPF link and its underlying BPF program will stay valid, but they will be detached from the respective BPF hook and the BPF link will transition into a defunct state until the last open file descriptor for that link is closed.
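
A sketch of the programmatic equivalent with libbpf, assuming the link was pinned beforehand (the path is illustrative)::

  #include <unistd.h>
  #include <bpf/bpf.h>

  /* Mirror "bpftool link detach pinned /sys/fs/bpf/my_link": the link stays
   * valid but defunct until its last open FD is closed. */
  int force_detach_pinned_link(void)
  {
          int link_fd = bpf_obj_get("/sys/fs/bpf/my_link");
          int err;

          if (link_fd < 0)
                  return link_fd;

          err = bpf_link_detach(link_fd);
          close(link_fd);
          return err;
  }
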
bpftool link help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

-f, --bpffs
When showing BPF links, show file names of pinned links.

-n, --nomount
Do not automatically attempt to mount any virtual file system (such as tracefs or BPF virtual file system) when necessary.

EXAMPLES
========


@@ -14,166 +14,160 @@ tool for inspection and simple manipulation of eBPF maps
SYNOPSIS
========

**bpftool** [*OPTIONS*] **map** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } }

*COMMANDS* :=
{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |
**delete** | **pin** | **help** }

MAP COMMANDS
=============

| **bpftool** **map** { **show** | **list** } [*MAP*]
| **bpftool** **map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* \
| **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**inner_map** *MAP*] \
| [**offload_dev** *NAME*]
| **bpftool** **map dump** *MAP*
| **bpftool** **map update** *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]
| **bpftool** **map lookup** *MAP* [**key** *DATA*]
| **bpftool** **map getnext** *MAP* [**key** *DATA*]
| **bpftool** **map delete** *MAP* **key** *DATA*
| **bpftool** **map pin** *MAP* *FILE*
| **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
| **bpftool** **map peek** *MAP*
| **bpftool** **map push** *MAP* **value** *VALUE*
| **bpftool** **map pop** *MAP*
| **bpftool** **map enqueue** *MAP* **value** *VALUE*
| **bpftool** **map dequeue** *MAP*
| **bpftool** **map freeze** *MAP*
| **bpftool** **map help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* | **name** *MAP_NAME* }
| *DATA* := { [**hex**] *BYTES* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *VALUE* := { *DATA* | *MAP* | *PROG* }
| *UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
| *TYPE* := { **hash** | **array** | **prog_array** | **perf_event_array** | **percpu_hash**
| | **percpu_array** | **stack_trace** | **cgroup_array** | **lru_hash**
| | **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
| | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** | **arena** }

DESCRIPTION
===========

bpftool map { show | list } [*MAP*]
Show information about loaded maps. If *MAP* is specified, show information only about the given maps; otherwise list all maps currently loaded on the system. In case of **name**, *MAP* may match several maps which will all be shown.

Output will start with the map ID followed by the map type and zero or more named attributes (depending on kernel version).

Since Linux 5.8 bpftool is able to discover information about processes that hold open file descriptors (FDs) against BPF maps. On such kernels bpftool will automatically emit this information as well.

bpftool map create *FILE* type *TYPE* key *KEY_SIZE* value *VALUE_SIZE* entries *MAX_ENTRIES* name *NAME* [flags *FLAGS*] [inner_map *MAP*] [offload_dev *NAME*]
Create a new map with the given parameters and pin it to *bpffs* as *FILE*.

*FLAGS* should be an integer which is the combination of desired flags, e.g. 1024 for **BPF_F_MMAPABLE** (see the bpf.h UAPI header for existing flags).

To create maps of type array-of-maps or hash-of-maps, the **inner_map** keyword must be used to pass an inner map. The kernel needs it to collect metadata related to the inner maps that the new map will work with.

Keyword **offload_dev** expects a network interface name, and is used to request hardware offload for the map.
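
As a sketch of the same operation done programmatically (and of what the 1024/**BPF_F_MMAPABLE** flag enables), one might create a memory-mappable array with libbpf, pin it, and later freeze it; names and paths are illustrative::

  #include <linux/bpf.h>
  #include <bpf/bpf.h>

  /* Roughly: bpftool map create /sys/fs/bpf/cfg type array key 4 value 8 \
   *          entries 1 name cfg flags 1024, followed by bpftool map freeze. */
  int create_mmapable_array(void)
  {
          LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
          int fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "cfg",
                                  sizeof(__u32), sizeof(__u64), 1, &opts);

          if (fd < 0)
                  return fd;

          if (bpf_obj_pin(fd, "/sys/fs/bpf/cfg"))   /* pin it in bpffs */
                  return -1;

          return bpf_map_freeze(fd);   /* now read-only from user space */
  }
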
bpftool map dump *MAP*
Dump all entries in a given *MAP*. In case of **name**, *MAP* may match several maps which will all be dumped.

bpftool map update *MAP* [key *DATA*] [value *VALUE*] [*UPDATE_FLAGS*]
Update the map entry for a given *KEY*.

*UPDATE_FLAGS* can be one of: **any** update existing entry or add if it doesn't exist; **exist** update only if the entry already exists; **noexist** update only if the entry doesn't exist.

If the **hex** keyword is provided in front of the bytes sequence, the bytes are parsed as hexadecimal values, even if no "0x" prefix is added. If the keyword is not provided, then the bytes are parsed as decimal values, unless a "0x" prefix (for hexadecimal) or a "0" prefix (for octal) is provided.

bpftool map lookup *MAP* [key *DATA*]
Look up **key** in the map.

bpftool map getnext *MAP* [key *DATA*]
Get the next key. If *key* is not specified, get the first key.

bpftool map delete *MAP* key *DATA*
Remove an entry from the map.

bpftool map pin *MAP* *FILE*
Pin map *MAP* as *FILE*.

Note: *FILE* must be located in a *bpffs* mount. It must not contain a dot character ('.'), which is reserved for future extensions of *bpffs*.

bpftool map event_pipe *MAP* [cpu *N* index *M*]
Read events from a **BPF_MAP_TYPE_PERF_EVENT_ARRAY** map.

Install perf rings into a perf event array map and dump the output of any **bpf_perf_event_output**\ () call in the kernel. By default, read the number of CPUs on the system and install a perf ring for each CPU in the corresponding index in the array.

If **cpu** and **index** are specified, install a perf ring for the given **cpu** at **index** in the array (single ring).

Note that installing a perf ring into an array will silently replace any existing ring. Any other application will stop receiving events if it installed its rings earlier.
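
The libbpf counterpart of **event_pipe** is a perf buffer consumer; a sketch (the map fd and callback are illustrative)::

  #include <stdio.h>
  #include <bpf/libbpf.h>

  static void handle_event(void *ctx, int cpu, void *data, __u32 size)
  {
          printf("cpu %d: %u bytes from bpf_perf_event_output()\n", cpu, size);
  }

  /* map_fd refers to a BPF_MAP_TYPE_PERF_EVENT_ARRAY map. */
  int consume_events(int map_fd)
  {
          struct perf_buffer *pb;
          int err = 0;

          pb = perf_buffer__new(map_fd, 8 /* pages per ring */,
                                handle_event, NULL, NULL, NULL);
          if (!pb)
                  return -1;

          while (err >= 0)
                  err = perf_buffer__poll(pb, 100 /* ms */);

          perf_buffer__free(pb);
          return err;
  }
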
bpftool map peek *MAP*
Peek the next value in the queue or stack.

bpftool map push *MAP* value *VALUE*
Push *VALUE* onto the stack.

bpftool map pop *MAP*
Pop and print a value from the stack.

bpftool map enqueue *MAP* value *VALUE*
Enqueue *VALUE* into the queue.

bpftool map dequeue *MAP*
Dequeue and print a value from the queue.

bpftool map freeze *MAP*
Freeze the map as read-only from user space. Entries from a frozen map can no longer be updated or deleted with the **bpf**\ () system call. This operation is not reversible, and the map remains immutable from user space until its destruction. However, read and write permissions for BPF programs to the map remain unchanged.

bpftool map help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

-f, --bpffs
Show file names of pinned maps.

-n, --nomount
Do not automatically attempt to mount any virtual file system (such as tracefs or BPF virtual file system) when necessary.

EXAMPLES
========


@@ -14,76 +14,74 @@ tool for inspection of networking related bpf prog attachments
SYNOPSIS
========

**bpftool** [*OPTIONS*] **net** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| }

*COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }

NET COMMANDS
============

| **bpftool** **net** { **show** | **list** } [ **dev** *NAME* ]
| **bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *NAME* [ **overwrite** ]
| **bpftool** **net detach** *ATTACH_TYPE* **dev** *NAME*
| **bpftool** **net help**
|
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *ATTACH_TYPE* := { **xdp** | **xdpgeneric** | **xdpdrv** | **xdpoffload** }

DESCRIPTION
===========

bpftool net { show | list } [ dev *NAME* ]
List bpf program attachments in the kernel networking subsystem.

Currently, device driver xdp attachments, tcx, netkit and old-style tc classifier/action attachments, flow_dissector as well as netfilter attachments are implemented, i.e., for program types **BPF_PROG_TYPE_XDP**, **BPF_PROG_TYPE_SCHED_CLS**, **BPF_PROG_TYPE_SCHED_ACT**, **BPF_PROG_TYPE_FLOW_DISSECTOR**, **BPF_PROG_TYPE_NETFILTER**.

For programs attached to a particular cgroup, e.g., **BPF_PROG_TYPE_CGROUP_SKB**, **BPF_PROG_TYPE_CGROUP_SOCK**, **BPF_PROG_TYPE_SOCK_OPS** and **BPF_PROG_TYPE_CGROUP_SOCK_ADDR**, users can use **bpftool cgroup** to dump cgroup attachments. For sk_{filter, skb, msg, reuseport} and lwt/seg6 bpf programs, users should consult other tools, e.g., iproute2.

The current output will start with all xdp program attachments, followed by all tcx, netkit, then tc class/qdisc bpf program attachments, then flow_dissector and finally netfilter programs. Both xdp programs and tcx/netkit/tc programs are ordered based on ifindex number. If multiple bpf programs are attached to the same networking device through **tc**, the order will be first all bpf programs attached to tcx, netkit, then tc classes, then all bpf programs attached to non clsact qdiscs, and finally all bpf programs attached to root and clsact qdisc.

bpftool net attach *ATTACH_TYPE* *PROG* dev *NAME* [ overwrite ]
Attach bpf program *PROG* to network interface *NAME* with the type specified by *ATTACH_TYPE*. A previously attached bpf program can be replaced by using the command with the **overwrite** option. Currently, only XDP-related modes are supported for *ATTACH_TYPE*.

*ATTACH_TYPE* can be one of:

**xdp** - try native XDP and fall back to generic XDP if the NIC driver does not support it;
**xdpgeneric** - generic XDP; runs at the generic XDP hook when the packet has already entered the receive path as an skb;
**xdpdrv** - native XDP; runs at the earliest point in the driver's receive path;
**xdpoffload** - offloaded XDP; runs directly on the NIC for each packet received;

bpftool net detach *ATTACH_TYPE* dev *NAME*
Detach the bpf program attached to network interface *NAME* with the type specified by *ATTACH_TYPE*. To detach the bpf program, the same *ATTACH_TYPE* previously used for attach must be specified. Currently, only XDP-related modes are supported for *ATTACH_TYPE*.
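
For reference, the same mode distinction exists in the libbpf XDP attach API; a sketch (ifindex and fd are illustrative)::

  #include <linux/if_link.h>
  #include <bpf/libbpf.h>

  /* Roughly equivalent to "bpftool net attach xdpgeneric id PROG_ID dev NAME":
   * XDP_FLAGS_SKB_MODE forces the generic (skb-based) hook, XDP_FLAGS_DRV_MODE
   * the native driver hook, and XDP_FLAGS_HW_MODE full offload to the NIC. */
  int attach_xdp_generic(int ifindex, int prog_fd)
  {
          return bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_SKB_MODE, NULL);
  }

  int detach_xdp_generic(int ifindex)
  {
          return bpf_xdp_detach(ifindex, XDP_FLAGS_SKB_MODE, NULL);
  }
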
bpftool net help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

EXAMPLES
========


@@ -14,37 +14,37 @@ tool for inspection of perf related bpf prog attachments
SYNOPSIS
========

**bpftool** [*OPTIONS*] **perf** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| }

*COMMANDS* :=
{ **show** | **list** | **help** }

PERF COMMANDS
=============

| **bpftool** **perf** { **show** | **list** }
| **bpftool** **perf help**

DESCRIPTION
===========

bpftool perf { show | list }
List all raw_tracepoint, tracepoint and kprobe attachments in the system.

Output will start with the process id and the file descriptor in that process, followed by the bpf program id, attachment information, and attachment point. The attachment point for raw_tracepoint/tracepoint is the trace probe name. The attachment point for k[ret]probe is either a symbol name and offset, or a kernel virtual address. The attachment point for u[ret]probe is the file name and the file offset.

bpftool perf help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

EXAMPLES
========


@@ -14,250 +14,226 @@ tool for inspection and simple manipulation of eBPF progs
SYNOPSIS
========

**bpftool** [*OPTIONS*] **prog** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| |
{ **-f** | **--bpffs** } | { **-m** | **--mapcompat** } | { **-n** | **--nomount** } |
{ **-L** | **--use-loader** } }

*COMMANDS* :=
{ **show** | **list** | **dump xlated** | **dump jited** | **pin** | **load** |
**loadall** | **help** }

PROG COMMANDS
=============

| **bpftool** **prog** { **show** | **list** } [*PROG*]
| **bpftool** **prog dump xlated** *PROG* [{ **file** *FILE* | [**opcodes**] [**linum**] [**visual**] }]
| **bpftool** **prog dump jited** *PROG* [{ **file** *FILE* | [**opcodes**] [**linum**] }]
| **bpftool** **prog pin** *PROG* *FILE*
| **bpftool** **prog** { **load** | **loadall** } *OBJ* *PATH* [**type** *TYPE*] [**map** { **idx** *IDX* | **name** *NAME* } *MAP*] [{ **offload_dev** | **xdpmeta_dev** } *NAME*] [**pinmaps** *MAP_DIR*] [**autoattach**]
| **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog tracelog**
| **bpftool** **prog run** *PROG* **data_in** *FILE* [**data_out** *FILE* [**data_size_out** *L*]] [**ctx_in** *FILE* [**ctx_out** *FILE* [**ctx_size_out** *M*]]] [**repeat** *N*]
| **bpftool** **prog profile** *PROG* [**duration** *DURATION*] *METRICs*
| **bpftool** **prog help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* | **name** *MAP_NAME* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
| *TYPE* := {
| **socket** | **kprobe** | **kretprobe** | **classifier** | **action** |
| **tracepoint** | **raw_tracepoint** | **xdp** | **perf_event** | **cgroup/skb** |
| **cgroup/sock** | **cgroup/dev** | **lwt_in** | **lwt_out** | **lwt_xmit** |
| **lwt_seg6local** | **sockops** | **sk_skb** | **sk_msg** | **lirc_mode2** |
| **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
| **cgroup/connect4** | **cgroup/connect6** | **cgroup/connect_unix** |
| **cgroup/getpeername4** | **cgroup/getpeername6** | **cgroup/getpeername_unix** |
| **cgroup/getsockname4** | **cgroup/getsockname6** | **cgroup/getsockname_unix** |
| **cgroup/sendmsg4** | **cgroup/sendmsg6** | **cgroup/sendmsg_unix** |
| **cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/recvmsg_unix** | **cgroup/sysctl** |
| **cgroup/getsockopt** | **cgroup/setsockopt** | **cgroup/sock_release** |
| **struct_ops** | **fentry** | **fexit** | **freplace** | **sk_lookup**
| }
| *ATTACH_TYPE* := {
| **sk_msg_verdict** | **sk_skb_verdict** | **sk_skb_stream_verdict** |
| **sk_skb_stream_parser** | **flow_dissector**
| }
| *METRICs* := {
| **cycles** | **instructions** | **l1d_loads** | **llc_misses** |
| **itlb_misses** | **dtlb_misses**
| }

DESCRIPTION
===========

bpftool prog { show | list } [*PROG*]
Show information about loaded programs. If *PROG* is specified, show information only about the given programs; otherwise list all programs currently loaded on the system. In case of **tag** or **name**, *PROG* may match several programs which will all be shown.

Output will start with the program ID followed by the program type and zero or more named attributes (depending on kernel version).

Since Linux 5.1 the kernel can collect statistics on BPF programs (such as the total time spent running the program, and the number of times it was run). If available, bpftool shows such statistics. However, the kernel does not collect them by default, as it slightly impacts performance on each program run. Activation or deactivation of the feature is performed via the **kernel.bpf_stats_enabled** sysctl knob.

Since Linux 5.8 bpftool is able to discover information about processes that hold open file descriptors (FDs) against BPF programs. On such kernels bpftool will automatically emit this information as well.

bpftool prog dump xlated *PROG* [{ file *FILE* | [opcodes] [linum] [visual] }]
Dump the eBPF instructions of the programs from the kernel. By default, eBPF will be disassembled and printed to standard output in a human-readable format. In this case, **opcodes** controls if raw opcodes should be printed as well.

In case of **tag** or **name**, *PROG* may match several programs which will all be dumped. However, if **file** or **visual** is specified, *PROG* must match a single program.

If **file** is specified, the binary image will instead be written to *FILE*.

If **visual** is specified, a control flow graph (CFG) will be built instead, and eBPF instructions will be presented with the CFG in DOT format, on standard output.

If the programs have line_info available, the source line will be displayed. If **linum** is specified, the filename, line number and line column will also be displayed.

bpftool prog dump jited *PROG* [{ file *FILE* | [opcodes] [linum] }]
Dump the jited image (host machine code) of the program.

If *FILE* is specified, the image will be written to a file; otherwise it will be disassembled and printed to stdout. *PROG* must match a single program when **file** is specified.

**opcodes** controls if raw opcodes will be printed.

If the prog has line_info available, the source line will be displayed. If **linum** is specified, the filename, line number and line column will also be displayed.

bpftool prog pin *PROG* *FILE*
Pin program *PROG* as *FILE*.

Note: *FILE* must be located in a *bpffs* mount. It must not contain a dot character ('.'), which is reserved for future extensions of *bpffs*.

bpftool prog { load | loadall } *OBJ* *PATH* [type *TYPE*] [map { idx *IDX* | name *NAME* } *MAP*] [{ offload_dev | xdpmeta_dev } *NAME*] [pinmaps *MAP_DIR*] [autoattach]
Load bpf program(s) from binary *OBJ* and pin as *PATH*. **bpftool prog load** pins only the first program from the *OBJ* as *PATH*. **bpftool prog loadall** pins all programs from the *OBJ* under the *PATH* directory. **type** is optional; if not specified, the program type will be inferred from section names. By default bpftool will create new maps as declared in the ELF object being loaded. The **map** parameter allows for the reuse of existing maps (see the sketch below). It can be specified multiple times, each time for a different map. *IDX* refers to the index of the map to be replaced in the ELF file, counting from 0, while *NAME* allows replacing a map by name. *MAP* specifies the map to use, referring to it by **id** or through a **pinned** file. If **offload_dev** *NAME* is specified, the program will be loaded onto the given networking device (offload). If **xdpmeta_dev** *NAME* is specified, the program will become device-bound without offloading; this facilitates access to XDP metadata. The optional **pinmaps** argument can be provided to pin all maps under the *MAP_DIR* directory.

If **autoattach** is specified, the program will be attached before pinning. In that case, only the link (representing the program attached to its hook) is pinned, not the program as such, so the path won't show in **bpftool prog show -f**, only in **bpftool link show -f**. Also, this only works when bpftool (libbpf) is able to infer all necessary information from the object file; in particular, it is not supported for all program types. If a program does not support autoattach, bpftool falls back to regular pinning for that program instead.

Note: *PATH* must be located in a *bpffs* mount. It must not contain a dot character ('.'), which is reserved for future extensions of *bpffs*.
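
A sketch of the map-reuse idea with plain libbpf calls, assuming a map named **shared_stats** should be replaced by an already pinned one (names and paths are illustrative)::

  #include <unistd.h>
  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>

  /* Roughly: bpftool prog load app.bpf.o /sys/fs/bpf/app \
   *          map name shared_stats pinned /sys/fs/bpf/shared_stats */
  int load_with_reused_map(void)
  {
          struct bpf_object *obj = bpf_object__open_file("app.bpf.o", NULL);
          struct bpf_map *map;
          int pinned_fd;

          if (!obj)
                  return -1;

          map = bpf_object__find_map_by_name(obj, "shared_stats");
          pinned_fd = bpf_obj_get("/sys/fs/bpf/shared_stats");
          if (!map || pinned_fd < 0 || bpf_map__reuse_fd(map, pinned_fd) ||
              bpf_object__load(obj)) {      /* uses the existing map, creates the rest */
                  bpf_object__close(obj);
                  return -1;
          }
          close(pinned_fd);                 /* reuse_fd duplicated it internally */
          return 0;
  }
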
bpftool prog attach *PROG* *ATTACH_TYPE* [*MAP*]
Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP* parameter, with the exception of *flow_dissector*, which is attached to the current networking name space.

bpftool prog detach *PROG* *ATTACH_TYPE* [*MAP*]
Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*). Most *ATTACH_TYPEs* require a *MAP* parameter, with the exception of *flow_dissector*, which is detached from the current networking name space.

bpftool prog tracelog
Dump the trace pipe of the system to the console (stdout). Hit <Ctrl+C> to stop printing. BPF programs can write to this trace pipe at runtime with the **bpf_trace_printk**\ () helper. This should be used only for debugging purposes. For streaming data from BPF programs to user space, one can use perf events (see also **bpftool-map**\ (8)).

bpftool prog run *PROG* data_in *FILE* [data_out *FILE* [data_size_out *L*]] [ctx_in *FILE* [ctx_out *FILE* [ctx_size_out *M*]]] [repeat *N*]
Run BPF program *PROG* in the kernel testing infrastructure for BPF, meaning that the program works on the data and context provided by the user, and not on actual packets or monitored functions etc. The return value and duration for the test run are printed out to the console.

Input data is read from the *FILE* passed with **data_in**. If this *FILE* is "**-**", input data is read from standard input. Input context, if any, is read from the *FILE* passed with **ctx_in**. Again, "**-**" can be used to read from standard input, but only if standard input is not already in use for input data. If a *FILE* is passed with **data_out**, output data is written to that file. Similarly, output context is written to the *FILE* passed with **ctx_out**. For both output flows, "**-**" can be used to print to the standard output (as plain text, or JSON if the relevant option was passed). If output keywords are omitted, output data and context are discarded. Keywords **data_size_out** and **ctx_size_out** are used to pass the size (in bytes) for the output buffers to the kernel, although the default of 32 kB should be more than enough for most cases.

Keyword **repeat** is used to indicate the number of consecutive runs to perform. Note that output data and context printed to files correspond to the last of those runs. The duration printed out at the end of the runs is an average over all runs performed by the command.

Not all program types support test run. Among those which do, not all of them can take the **ctx_in**/**ctx_out** arguments. bpftool does not perform checks on program types.
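
The command wraps the same kernel facility exposed by libbpf's **bpf_prog_test_run_opts**\ (); a sketch of running an XDP program on a dummy packet (sizes and the prog fd are illustrative)::

  #include <bpf/bpf.h>

  /* Roughly: bpftool prog run id PROG_ID data_in pkt.bin repeat 100 */
  int test_run_xdp(int prog_fd)
  {
          char pkt_in[64] = { 0 };   /* dummy Ethernet frame */
          char pkt_out[256];
          LIBBPF_OPTS(bpf_test_run_opts, opts,
                  .data_in = pkt_in,
                  .data_size_in = sizeof(pkt_in),
                  .data_out = pkt_out,
                  .data_size_out = sizeof(pkt_out),
                  .repeat = 100,
          );
          int err = bpf_prog_test_run_opts(prog_fd, &opts);

          if (err)
                  return err;
          /* opts.retval holds the program's return code (e.g. XDP_PASS),
           * opts.duration the average run time in nanoseconds. */
          return (int)opts.retval;
  }
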
bpftool prog profile *PROG* [duration *DURATION*] *METRICs*
Profile *METRICs* for bpf program *PROG* for *DURATION* seconds or until the user hits <Ctrl+C>. *DURATION* is optional. If *DURATION* is not specified, the profiling will run up to **UINT_MAX** seconds.

bpftool prog help
Print short help message.

OPTIONS
=======
.. include:: common_options.rst

-f, --bpffs
When showing BPF programs, show file names of pinned programs.

-m, --mapcompat
Allow loading maps with unknown map definitions.

-n, --nomount
Do not automatically attempt to mount any virtual file system (such as tracefs or BPF virtual file system) when necessary.

-L, --use-loader
Load program as a "loader" program. This is useful to debug the generation of such programs. When this option is in use, bpftool attempts to load the programs from the object file into the kernel, but does not pin them (therefore, the *PATH* must not be provided).

When combined with the **-d**\ \|\ **--debug** option, additional debug messages are generated, and the execution of the loader program will use the **bpf_trace_printk**\ () helper to log each step of loading BTF, creating the maps, and loading the programs (see **bpftool prog tracelog** as a way to dump those messages).

EXAMPLES
========


@@ -14,61 +14,60 @@ tool to register/unregister/introspect BPF struct_ops
SYNOPSIS
========

**bpftool** [*OPTIONS*] **struct_ops** *COMMAND*

*OPTIONS* := { |COMMON_OPTIONS| }

*COMMANDS* :=
{ **show** | **list** | **dump** | **register** | **unregister** | **help** }

STRUCT_OPS COMMANDS
===================

| **bpftool** **struct_ops { show | list }** [*STRUCT_OPS_MAP*]
| **bpftool** **struct_ops dump** [*STRUCT_OPS_MAP*]
| **bpftool** **struct_ops register** *OBJ* [*LINK_DIR*]
| **bpftool** **struct_ops unregister** *STRUCT_OPS_MAP*
| **bpftool** **struct_ops help**
|
| *STRUCT_OPS_MAP* := { **id** *STRUCT_OPS_MAP_ID* | **name** *STRUCT_OPS_MAP_NAME* }
| *OBJ* := /a/file/of/bpf_struct_ops.o

DESCRIPTION
===========
**bpftool struct_ops { show | list }** [*STRUCT_OPS_MAP*] bpftool struct_ops { show | list } [*STRUCT_OPS_MAP*]
Show brief information about the struct_ops in the system. Show brief information about the struct_ops in the system. If
If *STRUCT_OPS_MAP* is specified, it shows information only *STRUCT_OPS_MAP* is specified, it shows information only for the given
for the given struct_ops. Otherwise, it lists all struct_ops struct_ops. Otherwise, it lists all struct_ops currently existing in the
currently existing in the system. system.
Output will start with struct_ops map ID, followed by its map Output will start with struct_ops map ID, followed by its map name and its
name and its struct_ops's kernel type. struct_ops's kernel type.
**bpftool struct_ops dump** [*STRUCT_OPS_MAP*] bpftool struct_ops dump [*STRUCT_OPS_MAP*]
Dump details information about the struct_ops in the system. Dump details information about the struct_ops in the system. If
If *STRUCT_OPS_MAP* is specified, it dumps information only *STRUCT_OPS_MAP* is specified, it dumps information only for the given
for the given struct_ops. Otherwise, it dumps all struct_ops struct_ops. Otherwise, it dumps all struct_ops currently existing in the
currently existing in the system. system.
**bpftool struct_ops register** *OBJ* [*LINK_DIR*] bpftool struct_ops register *OBJ* [*LINK_DIR*]
Register bpf struct_ops from *OBJ*. All struct_ops under Register bpf struct_ops from *OBJ*. All struct_ops under the ELF section
the ELF section ".struct_ops" and ".struct_ops.link" will ".struct_ops" and ".struct_ops.link" will be registered to its kernel
be registered to its kernel subsystem. For each subsystem. For each struct_ops in the ".struct_ops.link" section, a link
struct_ops in the ".struct_ops.link" section, a link will be created. You can give *LINK_DIR* to provide a directory path where
will be created. You can give *LINK_DIR* to provide a these links will be pinned with the same name as their corresponding map
directory path where these links will be pinned with the name.
same name as their corresponding map name.
**bpftool struct_ops unregister** *STRUCT_OPS_MAP* bpftool struct_ops unregister *STRUCT_OPS_MAP*
Unregister the *STRUCT_OPS_MAP* from the kernel subsystem. Unregister the *STRUCT_OPS_MAP* from the kernel subsystem.
**bpftool struct_ops help** bpftool struct_ops help
Print short help message. Print short help message.
OPTIONS OPTIONS
======= =======
.. include:: common_options.rst .. include:: common_options.rst
EXAMPLES EXAMPLES
======== ========


@ -14,57 +14,57 @@ tool for inspection and simple manipulation of eBPF programs and maps
SYNOPSIS
========
**bpftool** [*OPTIONS*] *OBJECT* { *COMMAND* | **help** }

**bpftool** **batch file** *FILE*

**bpftool** **version**

*OBJECT* := { **map** | **prog** | **link** | **cgroup** | **perf** | **net** | **feature** |
**btf** | **gen** | **struct_ops** | **iter** }

*OPTIONS* := { { **-V** | **--version** } | |COMMON_OPTIONS| }

*MAP-COMMANDS* :=
{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |
**delete** | **pin** | **event_pipe** | **help** }

*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin** |
**load** | **attach** | **detach** | **help** }

*LINK-COMMANDS* := { **show** | **list** | **pin** | **detach** | **help** }

*CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }

*PERF-COMMANDS* := { **show** | **list** | **help** }

*NET-COMMANDS* := { **show** | **list** | **help** }

*FEATURE-COMMANDS* := { **probe** | **help** }

*BTF-COMMANDS* := { **show** | **list** | **dump** | **help** }

*GEN-COMMANDS* := { **object** | **skeleton** | **min_core_btf** | **help** }

*STRUCT-OPS-COMMANDS* := { **show** | **list** | **dump** | **register** | **unregister** | **help** }

*ITER-COMMANDS* := { **pin** | **help** }

DESCRIPTION
===========
*bpftool* allows for inspection and simple modification of BPF objects on the
system.

Note that the format of the output of all tools is not guaranteed to be stable
and should not be depended upon.

OPTIONS
=======
.. include:: common_options.rst

-m, --mapcompat
  Allow loading maps with unknown map definitions.

-n, --nomount
  Do not automatically attempt to mount any virtual file system (such as
  tracefs or BPF virtual file system) when necessary.


@ -1,25 +1,23 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

-h, --help
  Print short help message (similar to **bpftool help**).

-V, --version
  Print bpftool's version number (similar to **bpftool version**), the number
  of the libbpf version in use, and optional features that were included when
  bpftool was compiled. Optional features include linking against LLVM or
  libbfd to provide the disassembler for JIT-ted programs (**bpftool prog
  dump jited**) and usage of BPF skeletons (some features like **bpftool prog
  profile** or showing pids associated to BPF objects may rely on it).

-j, --json
  Generate JSON output. For commands that cannot produce JSON, this option
  has no effect.

-p, --pretty
  Generate human-readable JSON output. Implies **-j**.

-d, --debug
  Print all logs available, even debug-level information. This includes logs
  from libbpf as well as from the verifier, when attempting to load programs.


@ -106,19 +106,19 @@ _bpftool_get_link_ids()
_bpftool_get_obj_map_names() _bpftool_get_obj_map_names()
{ {
local obj local obj maps
obj=$1 obj=$1
maps=$(objdump -j maps -t $obj 2>/dev/null | \ maps=$(objdump -j .maps -t $obj 2>/dev/null | \
command awk '/g . maps/ {print $NF}') command awk '/g . .maps/ {print $NF}')
COMPREPLY+=( $( compgen -W "$maps" -- "$cur" ) ) COMPREPLY+=( $( compgen -W "$maps" -- "$cur" ) )
} }
_bpftool_get_obj_map_idxs() _bpftool_get_obj_map_idxs()
{ {
local obj local obj nmaps
obj=$1 obj=$1
@ -136,7 +136,7 @@ _sysfs_get_netdevs()
# Retrieve type of the map that we are operating on. # Retrieve type of the map that we are operating on.
_bpftool_map_guess_map_type() _bpftool_map_guess_map_type()
{ {
local keyword ref local keyword idx ref=""
for (( idx=3; idx < ${#words[@]}-1; idx++ )); do for (( idx=3; idx < ${#words[@]}-1; idx++ )); do
case "${words[$((idx-2))]}" in case "${words[$((idx-2))]}" in
lookup|update) lookup|update)
@ -255,8 +255,9 @@ _bpftool_map_update_get_name()
_bpftool() _bpftool()
{ {
local cur prev words objword json=0 local cur prev words cword comp_args
_init_completion || return local json=0
_init_completion -- "$@" || return
# Deal with options # Deal with options
if [[ ${words[cword]} == -* ]]; then if [[ ${words[cword]} == -* ]]; then
@ -293,7 +294,7 @@ _bpftool()
esac esac
# Remove all options so completions don't have to deal with them. # Remove all options so completions don't have to deal with them.
local i local i pprev
for (( i=1; i < ${#words[@]}; )); do for (( i=1; i < ${#words[@]}; )); do
if [[ ${words[i]::1} == - ]] && if [[ ${words[i]::1} == - ]] &&
[[ ${words[i]} != "-B" ]] && [[ ${words[i]} != "--base-btf" ]]; then [[ ${words[i]} != "-B" ]] && [[ ${words[i]} != "--base-btf" ]]; then
@ -307,7 +308,7 @@ _bpftool()
prev=${words[cword - 1]} prev=${words[cword - 1]}
pprev=${words[cword - 2]} pprev=${words[cword - 2]}
local object=${words[1]} command=${words[2]} local object=${words[1]}
if [[ -z $object || $cword -eq 1 ]]; then if [[ -z $object || $cword -eq 1 ]]; then
case $cur in case $cur in
@ -324,8 +325,12 @@ _bpftool()
esac esac
fi fi
local command=${words[2]}
[[ $command == help ]] && return 0 [[ $command == help ]] && return 0
local MAP_TYPE='id pinned name'
local PROG_TYPE='id pinned tag name'
# Completion depends on object and command in use # Completion depends on object and command in use
case $object in case $object in
prog) prog)
@ -346,8 +351,6 @@ _bpftool()
;; ;;
esac esac
local PROG_TYPE='id pinned tag name'
local MAP_TYPE='id pinned name'
local METRIC_TYPE='cycles instructions l1d_loads llc_misses \ local METRIC_TYPE='cycles instructions l1d_loads llc_misses \
itlb_misses dtlb_misses' itlb_misses dtlb_misses'
case $command in case $command in
@ -457,7 +460,7 @@ _bpftool()
obj=${words[3]} obj=${words[3]}
if [[ ${words[-4]} == "map" ]]; then if [[ ${words[-4]} == "map" ]]; then
COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) ) COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
return 0 return 0
fi fi
if [[ ${words[-3]} == "map" ]]; then if [[ ${words[-3]} == "map" ]]; then
@ -541,20 +544,9 @@ _bpftool()
COMPREPLY=( $( compgen -W "$METRIC_TYPE duration" -- "$cur" ) ) COMPREPLY=( $( compgen -W "$METRIC_TYPE duration" -- "$cur" ) )
return 0 return 0
;; ;;
6)
case $prev in
duration)
return 0
;;
*)
COMPREPLY=( $( compgen -W "$METRIC_TYPE" -- "$cur" ) )
return 0
;;
esac
return 0
;;
*) *)
COMPREPLY=( $( compgen -W "$METRIC_TYPE" -- "$cur" ) ) [[ $prev == duration ]] && return 0
_bpftool_once_attr "$METRIC_TYPE"
return 0 return 0
;; ;;
esac esac
@ -612,7 +604,7 @@ _bpftool()
return 0 return 0
;; ;;
register) register)
_filedir [[ $prev == $command ]] && _filedir
return 0 return 0
;; ;;
*) *)
@ -638,9 +630,12 @@ _bpftool()
pinned) pinned)
_filedir _filedir
;; ;;
*) map)
_bpftool_one_of_list $MAP_TYPE _bpftool_one_of_list $MAP_TYPE
;; ;;
*)
_bpftool_once_attr 'map'
;;
esac esac
return 0 return 0
;; ;;
@ -652,7 +647,6 @@ _bpftool()
esac esac
;; ;;
map) map)
local MAP_TYPE='id pinned name'
case $command in case $command in
show|list|dump|peek|pop|dequeue|freeze) show|list|dump|peek|pop|dequeue|freeze)
case $prev in case $prev in
@ -793,13 +787,11 @@ _bpftool()
# map, depending on the type of the map to update. # map, depending on the type of the map to update.
case "$(_bpftool_map_guess_map_type)" in case "$(_bpftool_map_guess_map_type)" in
array_of_maps|hash_of_maps) array_of_maps|hash_of_maps)
local MAP_TYPE='id pinned name'
COMPREPLY+=( $( compgen -W "$MAP_TYPE" \ COMPREPLY+=( $( compgen -W "$MAP_TYPE" \
-- "$cur" ) ) -- "$cur" ) )
return 0 return 0
;; ;;
prog_array) prog_array)
local PROG_TYPE='id pinned tag name'
COMPREPLY+=( $( compgen -W "$PROG_TYPE" \ COMPREPLY+=( $( compgen -W "$PROG_TYPE" \
-- "$cur" ) ) -- "$cur" ) )
return 0 return 0
@ -821,7 +813,7 @@ _bpftool()
esac esac
_bpftool_once_attr 'key' _bpftool_once_attr 'key'
local UPDATE_FLAGS='any exist noexist' local UPDATE_FLAGS='any exist noexist' idx
for (( idx=3; idx < ${#words[@]}-1; idx++ )); do for (( idx=3; idx < ${#words[@]}-1; idx++ )); do
if [[ ${words[idx]} == 'value' ]]; then if [[ ${words[idx]} == 'value' ]]; then
# 'value' is present, but is not the last # 'value' is present, but is not the last
@ -893,7 +885,6 @@ _bpftool()
esac esac
;; ;;
btf) btf)
local PROG_TYPE='id pinned tag name'
local MAP_TYPE='id pinned name' local MAP_TYPE='id pinned name'
case $command in case $command in
dump) dump)
@ -1033,7 +1024,6 @@ _bpftool()
local BPFTOOL_CGROUP_ATTACH_TYPES="$(bpftool feature list_builtins attach_types 2>/dev/null | \ local BPFTOOL_CGROUP_ATTACH_TYPES="$(bpftool feature list_builtins attach_types 2>/dev/null | \
grep '^cgroup_')" grep '^cgroup_')"
local ATTACH_FLAGS='multi override' local ATTACH_FLAGS='multi override'
local PROG_TYPE='id pinned tag name'
# Check for $prev = $command first # Check for $prev = $command first
if [ $prev = $command ]; then if [ $prev = $command ]; then
_filedir _filedir
@ -1086,7 +1076,6 @@ _bpftool()
esac esac
;; ;;
net) net)
local PROG_TYPE='id pinned tag name'
local ATTACH_TYPES='xdp xdpgeneric xdpdrv xdpoffload' local ATTACH_TYPES='xdp xdpgeneric xdpdrv xdpoffload'
case $command in case $command in
show|list) show|list)
@ -1193,14 +1182,14 @@ _bpftool()
pin|detach) pin|detach)
if [[ $prev == "$command" ]]; then if [[ $prev == "$command" ]]; then
COMPREPLY=( $( compgen -W "$LINK_TYPE" -- "$cur" ) ) COMPREPLY=( $( compgen -W "$LINK_TYPE" -- "$cur" ) )
else elif [[ $pprev == "$command" ]]; then
_filedir _filedir
fi fi
return 0 return 0
;; ;;
*) *)
[[ $prev == $object ]] && \ [[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'help pin show list' -- "$cur" ) ) COMPREPLY=( $( compgen -W 'help pin detach show list' -- "$cur" ) )
;; ;;
esac esac
;; ;;


@ -244,29 +244,101 @@ int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type)
return fd; return fd;
} }
int mount_bpffs_for_pin(const char *name, bool is_dir) int create_and_mount_bpffs_dir(const char *dir_name)
{ {
char err_str[ERR_MAX_LEN]; char err_str[ERR_MAX_LEN];
char *file; bool dir_exists;
int err = 0;
if (is_bpffs(dir_name))
return err;
dir_exists = access(dir_name, F_OK) == 0;
if (!dir_exists) {
char *temp_name;
char *parent_name;
temp_name = strdup(dir_name);
if (!temp_name) {
p_err("mem alloc failed");
return -1;
}
parent_name = dirname(temp_name);
if (is_bpffs(parent_name)) {
/* nothing to do if already mounted */
free(temp_name);
return err;
}
if (access(parent_name, F_OK) == -1) {
p_err("can't create dir '%s' to pin BPF object: parent dir '%s' doesn't exist",
dir_name, parent_name);
free(temp_name);
return -1;
}
free(temp_name);
}
if (block_mount) {
p_err("no BPF file system found, not mounting it due to --nomount option");
return -1;
}
if (!dir_exists) {
err = mkdir(dir_name, S_IRWXU);
if (err) {
p_err("failed to create dir '%s': %s", dir_name, strerror(errno));
return err;
}
}
err = mnt_fs(dir_name, "bpf", err_str, ERR_MAX_LEN);
if (err) {
err_str[ERR_MAX_LEN - 1] = '\0';
p_err("can't mount BPF file system on given dir '%s': %s",
dir_name, err_str);
if (!dir_exists)
rmdir(dir_name);
}
return err;
}
int mount_bpffs_for_file(const char *file_name)
{
char err_str[ERR_MAX_LEN];
char *temp_name;
char *dir; char *dir;
int err = 0; int err = 0;
if (is_dir && is_bpffs(name)) if (access(file_name, F_OK) != -1) {
return err; p_err("can't pin BPF object: path '%s' already exists", file_name);
return -1;
}
file = malloc(strlen(name) + 1); temp_name = strdup(file_name);
if (!file) { if (!temp_name) {
p_err("mem alloc failed"); p_err("mem alloc failed");
return -1; return -1;
} }
strcpy(file, name); dir = dirname(temp_name);
dir = dirname(file);
if (is_bpffs(dir)) if (is_bpffs(dir))
/* nothing to do if already mounted */ /* nothing to do if already mounted */
goto out_free; goto out_free;
if (access(dir, F_OK) == -1) {
p_err("can't pin BPF object: dir '%s' doesn't exist", dir);
err = -1;
goto out_free;
}
if (block_mount) { if (block_mount) {
p_err("no BPF file system found, not mounting it due to --nomount option"); p_err("no BPF file system found, not mounting it due to --nomount option");
err = -1; err = -1;
@ -276,12 +348,12 @@ int mount_bpffs_for_pin(const char *name, bool is_dir)
err = mnt_fs(dir, "bpf", err_str, ERR_MAX_LEN); err = mnt_fs(dir, "bpf", err_str, ERR_MAX_LEN);
if (err) { if (err) {
err_str[ERR_MAX_LEN - 1] = '\0'; err_str[ERR_MAX_LEN - 1] = '\0';
p_err("can't mount BPF file system to pin the object (%s): %s", p_err("can't mount BPF file system to pin the object '%s': %s",
name, err_str); file_name, err_str);
} }
out_free: out_free:
free(file); free(temp_name);
return err; return err;
} }
@ -289,7 +361,7 @@ int do_pin_fd(int fd, const char *name)
{ {
int err; int err;
err = mount_bpffs_for_pin(name, false); err = mount_bpffs_for_file(name);
if (err) if (err)
return err; return err;


@ -664,7 +664,8 @@ probe_helper_ifindex(enum bpf_func_id id, enum bpf_prog_type prog_type,
probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns), buf, probe_prog_load_ifindex(prog_type, insns, ARRAY_SIZE(insns), buf,
sizeof(buf), ifindex); sizeof(buf), ifindex);
res = !grep(buf, "invalid func ") && !grep(buf, "unknown func "); res = !grep(buf, "invalid func ") && !grep(buf, "unknown func ") &&
!grep(buf, "program of this type cannot use helper ");
switch (get_vendor_id(ifindex)) { switch (get_vendor_id(ifindex)) {
case 0x19ee: /* Netronome specific */ case 0x19ee: /* Netronome specific */


@ -386,7 +386,7 @@ static int codegen_subskel_datasecs(struct bpf_object *obj, const char *obj_name
*/ */
needs_typeof = btf_is_array(var) || btf_is_ptr_to_func_proto(btf, var); needs_typeof = btf_is_array(var) || btf_is_ptr_to_func_proto(btf, var);
if (needs_typeof) if (needs_typeof)
printf("typeof("); printf("__typeof__(");
err = btf_dump__emit_type_decl(d, var_type_id, &opts); err = btf_dump__emit_type_decl(d, var_type_id, &opts);
if (err) if (err)
@ -1131,7 +1131,7 @@ static void gen_st_ops_shadow_init(struct btf *btf, struct bpf_object *obj)
continue; continue;
codegen("\ codegen("\
\n\ \n\
obj->struct_ops.%1$s = (typeof(obj->struct_ops.%1$s))\n\ obj->struct_ops.%1$s = (__typeof__(obj->struct_ops.%1$s))\n\
bpf_map__initial_value(obj->maps.%1$s, NULL);\n\ bpf_map__initial_value(obj->maps.%1$s, NULL);\n\
\n\ \n\
", ident); ", ident);


@ -76,7 +76,7 @@ static int do_pin(int argc, char **argv)
goto close_obj; goto close_obj;
} }
err = mount_bpffs_for_pin(path, false); err = mount_bpffs_for_file(path);
if (err) if (err)
goto close_link; goto close_link;


@ -526,6 +526,10 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
show_link_ifindex_json(info->netkit.ifindex, json_wtr); show_link_ifindex_json(info->netkit.ifindex, json_wtr);
show_link_attach_type_json(info->netkit.attach_type, json_wtr); show_link_attach_type_json(info->netkit.attach_type, json_wtr);
break; break;
case BPF_LINK_TYPE_SOCKMAP:
jsonw_uint_field(json_wtr, "map_id", info->sockmap.map_id);
show_link_attach_type_json(info->sockmap.attach_type, json_wtr);
break;
case BPF_LINK_TYPE_XDP: case BPF_LINK_TYPE_XDP:
show_link_ifindex_json(info->xdp.ifindex, json_wtr); show_link_ifindex_json(info->xdp.ifindex, json_wtr);
break; break;
@ -915,6 +919,11 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
show_link_ifindex_plain(info->netkit.ifindex); show_link_ifindex_plain(info->netkit.ifindex);
show_link_attach_type_plain(info->netkit.attach_type); show_link_attach_type_plain(info->netkit.attach_type);
break; break;
case BPF_LINK_TYPE_SOCKMAP:
printf("\n\t");
printf("map_id %u ", info->sockmap.map_id);
show_link_attach_type_plain(info->sockmap.attach_type);
break;
case BPF_LINK_TYPE_XDP: case BPF_LINK_TYPE_XDP:
printf("\n\t"); printf("\n\t");
show_link_ifindex_plain(info->xdp.ifindex); show_link_ifindex_plain(info->xdp.ifindex);
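The two new fields are also reachable from other userspace consumers through the
generic link info API; a minimal sketch (assuming link_fd already refers to a
sockmap link):

#include <stdio.h>
#include <bpf/bpf.h>

static void print_sockmap_link(int link_fd)
{
	struct bpf_link_info info = {};
	__u32 len = sizeof(info);

	/* Query generic link info and print the new sockmap fields. */
	if (bpf_link_get_info_by_fd(link_fd, &info, &len))
		return;
	if (info.type == BPF_LINK_TYPE_SOCKMAP)
		printf("map_id %u attach_type %u\n",
		       info.sockmap.map_id, info.sockmap.attach_type);
}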


@ -142,7 +142,8 @@ const char *get_fd_type_name(enum bpf_obj_type type);
char *get_fdinfo(int fd, const char *key); char *get_fdinfo(int fd, const char *key);
int open_obj_pinned(const char *path, bool quiet); int open_obj_pinned(const char *path, bool quiet);
int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type); int open_obj_pinned_any(const char *path, enum bpf_obj_type exp_type);
int mount_bpffs_for_pin(const char *name, bool is_dir); int mount_bpffs_for_file(const char *file_name);
int create_and_mount_bpffs_dir(const char *dir_name);
int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***)); int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(int *, char ***));
int do_pin_fd(int fd, const char *name); int do_pin_fd(int fd, const char *name);


@ -1778,7 +1778,10 @@ offload_dev:
goto err_close_obj; goto err_close_obj;
} }
err = mount_bpffs_for_pin(pinfile, !first_prog_only); if (first_prog_only)
err = mount_bpffs_for_file(pinfile);
else
err = create_and_mount_bpffs_dir(pinfile);
if (err) if (err)
goto err_close_obj; goto err_close_obj;
@ -2078,7 +2081,7 @@ static int profile_parse_metrics(int argc, char **argv)
NEXT_ARG(); NEXT_ARG();
} }
if (selected_cnt > MAX_NUM_PROFILE_METRICS) { if (selected_cnt > MAX_NUM_PROFILE_METRICS) {
p_err("too many (%d) metrics, please specify no more than %d metrics at at time", p_err("too many (%d) metrics, please specify no more than %d metrics at a time",
selected_cnt, MAX_NUM_PROFILE_METRICS); selected_cnt, MAX_NUM_PROFILE_METRICS);
return -1; return -1;
} }


@ -515,7 +515,7 @@ static int do_register(int argc, char **argv)
if (argc == 1) if (argc == 1)
linkdir = GET_ARG(); linkdir = GET_ARG();
if (linkdir && mount_bpffs_for_pin(linkdir, true)) { if (linkdir && create_and_mount_bpffs_dir(linkdir)) {
p_err("can't mount bpffs for pinning"); p_err("can't mount bpffs for pinning");
return -1; return -1;
} }


@ -111,6 +111,24 @@
.off = 0, \ .off = 0, \
.imm = IMM }) .imm = IMM })
/* Short form of movsx, dst_reg = (s8,s16,s32)src_reg */
#define BPF_MOVSX64_REG(DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_ALU64 | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
#define BPF_MOVSX32_REG(DST, SRC, OFF) \
((struct bpf_insn) { \
.code = BPF_ALU | BPF_MOV | BPF_X, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
.imm = 0 })
/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */ /* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \ #define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
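As an illustration only (not part of this patch), the new macros can be used when
hand-assembling instructions in kernel code or verifier tests; the off field
selects how many low bits of the source are treated as signed:

/* Sketch: r2 holds 0xff; the sign-extending moves interpret the low
 * 8/16 bits of the source as signed when copying.
 */
struct bpf_insn insns[] = {
	BPF_MOV64_IMM(BPF_REG_2, 0xff),
	BPF_MOVSX64_REG(BPF_REG_1, BPF_REG_2, 8),	/* r1 = (s8)r2  -> -1  */
	BPF_MOVSX32_REG(BPF_REG_3, BPF_REG_2, 16),	/* w3 = (s16)w2 -> 255 */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
};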


@ -1135,6 +1135,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_TCX = 11, BPF_LINK_TYPE_TCX = 11,
BPF_LINK_TYPE_UPROBE_MULTI = 12, BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13, BPF_LINK_TYPE_NETKIT = 13,
BPF_LINK_TYPE_SOCKMAP = 14,
__MAX_BPF_LINK_TYPE, __MAX_BPF_LINK_TYPE,
}; };
@ -3394,6 +3395,10 @@ union bpf_attr {
* for the nexthop. If the src addr cannot be derived, * for the nexthop. If the src addr cannot be derived,
* **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned. In this
* case, *params*->dmac and *params*->smac are not set either. * case, *params*->dmac and *params*->smac are not set either.
* **BPF_FIB_LOOKUP_MARK**
* Use the mark present in *params*->mark for the fib lookup.
* This option should not be used with BPF_FIB_LOOKUP_DIRECT,
* as it only has meaning for full lookups.
* *
* *ctx* is either **struct xdp_md** for XDP programs or * *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs. * **struct sk_buff** tc cls_act programs.
@ -5022,7 +5027,7 @@ union bpf_attr {
* bytes will be copied to *dst* * bytes will be copied to *dst*
* Return * Return
* The **hash_algo** is returned on success, * The **hash_algo** is returned on success,
* **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if * **-EOPNOTSUPP** if IMA is disabled or **-EINVAL** if
* invalid arguments are passed. * invalid arguments are passed.
* *
* struct socket *bpf_sock_from_file(struct file *file) * struct socket *bpf_sock_from_file(struct file *file)
@ -5508,7 +5513,7 @@ union bpf_attr {
* bytes will be copied to *dst* * bytes will be copied to *dst*
* Return * Return
* The **hash_algo** is returned on success, * The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if * **-EOPNOTSUPP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed. * invalid arguments are passed.
* *
* void *bpf_kptr_xchg(void *map_value, void *ptr) * void *bpf_kptr_xchg(void *map_value, void *ptr)
@ -6720,6 +6725,10 @@ struct bpf_link_info {
__u32 ifindex; __u32 ifindex;
__u32 attach_type; __u32 attach_type;
} netkit; } netkit;
struct {
__u32 map_id;
__u32 attach_type;
} sockmap;
}; };
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
@ -6938,6 +6947,8 @@ enum {
* socket transition to LISTEN state. * socket transition to LISTEN state.
*/ */
BPF_SOCK_OPS_RTT_CB, /* Called on every RTT. BPF_SOCK_OPS_RTT_CB, /* Called on every RTT.
* Arg1: measured RTT input (mrtt)
* Arg2: updated srtt
*/ */
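A sockops program can read the two new arguments roughly as sketched below; it
assumes RTT callbacks were already enabled with BPF_SOCK_OPS_RTT_CB_FLAG, and the
exact units follow the kernel's internal RTT bookkeeping:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("sockops")
int rtt_reporter(struct bpf_sock_ops *skops)
{
	if (skops->op == BPF_SOCK_OPS_RTT_CB) {
		__u32 mrtt = skops->args[0];	/* measured RTT sample */
		__u32 srtt = skops->args[1];	/* updated smoothed RTT */

		bpf_printk("mrtt=%u srtt=%u", mrtt, srtt);
	}
	return 1;
}

char _license[] SEC("license") = "GPL";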
BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option. BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option.
* It will be called to handle * It will be called to handle
@ -7120,6 +7131,7 @@ enum {
BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
BPF_FIB_LOOKUP_TBID = (1U << 3), BPF_FIB_LOOKUP_TBID = (1U << 3),
BPF_FIB_LOOKUP_SRC = (1U << 4), BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
}; };
enum { enum {
@ -7152,7 +7164,7 @@ struct bpf_fib_lookup {
/* output: MTU value */ /* output: MTU value */
__u16 mtu_result; __u16 mtu_result;
}; } __attribute__((packed, aligned(2)));
/* input: L3 device index for lookup /* input: L3 device index for lookup
* output: device index from FIB lookup * output: device index from FIB lookup
*/ */
@ -7197,8 +7209,19 @@ struct bpf_fib_lookup {
__u32 tbid; __u32 tbid;
}; };
__u8 smac[6]; /* ETH_ALEN */ union {
__u8 dmac[6]; /* ETH_ALEN */ /* input */
struct {
__u32 mark; /* policy routing */
/* 2 4-byte holes for input */
};
/* output: source and dest mac */
struct {
__u8 smac[6]; /* ETH_ALEN */
__u8 dmac[6]; /* ETH_ALEN */
};
};
}; };
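A rough sketch of an XDP caller passing a mark with the lookup (the destination
address and the mark value 42 are arbitrary illustration values):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#ifndef AF_INET
#define AF_INET 2
#endif

SEC("xdp")
int xdp_fib_mark(struct xdp_md *ctx)
{
	struct bpf_fib_lookup params = {};
	long rc;

	params.family	= AF_INET;
	params.ifindex	= ctx->ingress_ifindex;
	params.ipv4_dst	= bpf_htonl(0xc0a80101);	/* 192.168.1.1, example */
	params.mark	= 42;				/* consulted by policy routing */

	rc = bpf_fib_lookup(ctx, &params, sizeof(params), BPF_FIB_LOOKUP_MARK);
	return rc == BPF_FIB_LKUP_RET_SUCCESS ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";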
struct bpf_redir_neigh { struct bpf_redir_neigh {
@ -7285,6 +7308,10 @@ struct bpf_timer {
__u64 __opaque[2]; __u64 __opaque[2];
} __attribute__((aligned(8))); } __attribute__((aligned(8)));
struct bpf_wq {
__u64 __opaque[2];
} __attribute__((aligned(8)));
struct bpf_dynptr { struct bpf_dynptr {
__u64 __opaque[2]; __u64 __opaque[2];
} __attribute__((aligned(8))); } __attribute__((aligned(8)));



@ -2,7 +2,7 @@
#ifndef __BPF_CORE_READ_H__ #ifndef __BPF_CORE_READ_H__
#define __BPF_CORE_READ_H__ #define __BPF_CORE_READ_H__
#include <bpf/bpf_helpers.h> #include "bpf_helpers.h"
/* /*
* enum bpf_field_info_kind is passed as a second argument into * enum bpf_field_info_kind is passed as a second argument into


@ -137,7 +137,8 @@
/* /*
* Helper function to perform a tail call with a constant/immediate map slot. * Helper function to perform a tail call with a constant/immediate map slot.
*/ */
#if __clang_major__ >= 8 && defined(__bpf__) #if (defined(__clang__) && __clang_major__ >= 8) || (!defined(__clang__) && __GNUC__ > 12)
#if defined(__bpf__)
static __always_inline void static __always_inline void
bpf_tail_call_static(void *ctx, const void *map, const __u32 slot) bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
{ {
@ -165,6 +166,7 @@ bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
: "r0", "r1", "r2", "r3", "r4", "r5"); : "r0", "r1", "r2", "r3", "r4", "r5");
} }
#endif #endif
#endif
enum libbpf_pin_type { enum libbpf_pin_type {
LIBBPF_PIN_NONE, LIBBPF_PIN_NONE,
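With the relaxed guard, the usual constant-slot tail-call pattern now builds with
GCC BPF (13+) as well as clang; a minimal sketch:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

SEC("tc")
int classifier_entry(struct __sk_buff *skb)
{
	/* The slot must be a compile-time constant for the static variant. */
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;	/* only reached if the tail call fails */
}

char _license[] SEC("license") = "GPL";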


@ -1929,6 +1929,7 @@ static int btf_dump_int_data(struct btf_dump *d,
if (d->typed_dump->is_array_terminated) if (d->typed_dump->is_array_terminated)
break; break;
if (*(char *)data == '\0') { if (*(char *)data == '\0') {
btf_dump_type_values(d, "'\\0'");
d->typed_dump->is_array_terminated = true; d->typed_dump->is_array_terminated = true;
break; break;
} }
@ -2031,6 +2032,7 @@ static int btf_dump_array_data(struct btf_dump *d,
__u32 i, elem_type_id; __u32 i, elem_type_id;
__s64 elem_size; __s64 elem_size;
bool is_array_member; bool is_array_member;
bool is_array_terminated;
elem_type_id = array->type; elem_type_id = array->type;
elem_type = skip_mods_and_typedefs(d->btf, elem_type_id, NULL); elem_type = skip_mods_and_typedefs(d->btf, elem_type_id, NULL);
@ -2066,12 +2068,15 @@ static int btf_dump_array_data(struct btf_dump *d,
*/ */
is_array_member = d->typed_dump->is_array_member; is_array_member = d->typed_dump->is_array_member;
d->typed_dump->is_array_member = true; d->typed_dump->is_array_member = true;
is_array_terminated = d->typed_dump->is_array_terminated;
d->typed_dump->is_array_terminated = false;
for (i = 0; i < array->nelems; i++, data += elem_size) { for (i = 0; i < array->nelems; i++, data += elem_size) {
if (d->typed_dump->is_array_terminated) if (d->typed_dump->is_array_terminated)
break; break;
btf_dump_dump_type_data(d, NULL, elem_type, elem_type_id, data, 0, 0); btf_dump_dump_type_data(d, NULL, elem_type, elem_type_id, data, 0, 0);
} }
d->typed_dump->is_array_member = is_array_member; d->typed_dump->is_array_member = is_array_member;
d->typed_dump->is_array_terminated = is_array_terminated;
d->typed_dump->depth--; d->typed_dump->depth--;
btf_dump_data_pfx(d); btf_dump_data_pfx(d);
btf_dump_type_values(d, "]"); btf_dump_type_values(d, "]");


@ -149,6 +149,7 @@ static const char * const link_type_name[] = {
[BPF_LINK_TYPE_TCX] = "tcx", [BPF_LINK_TYPE_TCX] = "tcx",
[BPF_LINK_TYPE_UPROBE_MULTI] = "uprobe_multi", [BPF_LINK_TYPE_UPROBE_MULTI] = "uprobe_multi",
[BPF_LINK_TYPE_NETKIT] = "netkit", [BPF_LINK_TYPE_NETKIT] = "netkit",
[BPF_LINK_TYPE_SOCKMAP] = "sockmap",
}; };
static const char * const map_type_name[] = { static const char * const map_type_name[] = {
@ -1970,6 +1971,20 @@ static struct extern_desc *find_extern_by_name(const struct bpf_object *obj,
return NULL; return NULL;
} }
static struct extern_desc *find_extern_by_name_with_len(const struct bpf_object *obj,
const void *name, int len)
{
const char *ext_name;
int i;
for (i = 0; i < obj->nr_extern; i++) {
ext_name = obj->externs[i].name;
if (strlen(ext_name) == len && strncmp(ext_name, name, len) == 0)
return &obj->externs[i];
}
return NULL;
}
static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val, static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val,
char value) char value)
{ {
@ -7986,7 +8001,10 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj)
return 0; return 0;
} }
int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *ctx) typedef int (*kallsyms_cb_t)(unsigned long long sym_addr, char sym_type,
const char *sym_name, void *ctx);
static int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *ctx)
{ {
char sym_type, sym_name[500]; char sym_type, sym_name[500];
unsigned long long sym_addr; unsigned long long sym_addr;
@ -8026,8 +8044,13 @@ static int kallsyms_cb(unsigned long long sym_addr, char sym_type,
struct bpf_object *obj = ctx; struct bpf_object *obj = ctx;
const struct btf_type *t; const struct btf_type *t;
struct extern_desc *ext; struct extern_desc *ext;
char *res;
ext = find_extern_by_name(obj, sym_name); res = strstr(sym_name, ".llvm.");
if (sym_type == 'd' && res)
ext = find_extern_by_name_with_len(obj, sym_name, res - sym_name);
else
ext = find_extern_by_name(obj, sym_name);
if (!ext || ext->type != EXT_KSYM) if (!ext || ext->type != EXT_KSYM)
return 0; return 0;
@ -12511,6 +12534,12 @@ bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
return bpf_program_attach_fd(prog, netns_fd, "netns", NULL); return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
} }
struct bpf_link *
bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd)
{
return bpf_program_attach_fd(prog, map_fd, "sockmap", NULL);
}
struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex) struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
{ {
/* target_fd/target_ifindex use the same field in LINK_CREATE */ /* target_fd/target_ifindex use the same field in LINK_CREATE */
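Userspace usage mirrors the other attach_* APIs; a sketch with hypothetical
program and map names, assuming obj is an already loaded bpf_object:

struct bpf_program *prog;
struct bpf_link *link;
int map_fd;

prog   = bpf_object__find_program_by_name(obj, "prog_msg_verdict");
map_fd = bpf_object__find_map_fd_by_name(obj, "sock_map");

link = bpf_program__attach_sockmap(prog, map_fd);
if (!link)	/* libbpf 1.x: NULL with errno set on failure */
	fprintf(stderr, "attach_sockmap failed: %d\n", -errno);

/* ... use the link, then release it ... */
bpf_link__destroy(link);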


@ -795,6 +795,8 @@ bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
LIBBPF_API struct bpf_link * LIBBPF_API struct bpf_link *
bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd); bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd);
LIBBPF_API struct bpf_link * LIBBPF_API struct bpf_link *
bpf_program__attach_sockmap(const struct bpf_program *prog, int map_fd);
LIBBPF_API struct bpf_link *
bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex); bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
LIBBPF_API struct bpf_link * LIBBPF_API struct bpf_link *
bpf_program__attach_freplace(const struct bpf_program *prog, bpf_program__attach_freplace(const struct bpf_program *prog,
@ -1293,6 +1295,7 @@ LIBBPF_API int ring_buffer__add(struct ring_buffer *rb, int map_fd,
ring_buffer_sample_fn sample_cb, void *ctx); ring_buffer_sample_fn sample_cb, void *ctx);
LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms); LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb); LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__consume_n(struct ring_buffer *rb, size_t n);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb); LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
/** /**
@ -1367,6 +1370,17 @@ LIBBPF_API int ring__map_fd(const struct ring *r);
*/ */
LIBBPF_API int ring__consume(struct ring *r); LIBBPF_API int ring__consume(struct ring *r);
/**
* @brief **ring__consume_n()** consumes up to a requested amount of items from
* a ringbuffer without event polling.
*
* @param r A ringbuffer object.
* @param n Maximum amount of items to consume.
* @return The number of items consumed, or a negative number if any of the
* callbacks return an error.
*/
LIBBPF_API int ring__consume_n(struct ring *r, size_t n);
struct user_ring_buffer_opts { struct user_ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */ size_t sz; /* size of this struct, for forward/backward compatibility */
}; };
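The partial-consume API slots in next to the existing one, for example to bound
how much work a single loop iteration does (the budget of 64 records, the
handle_event callback and the more_pending flag are illustrative):

int n;

/* rb was created earlier with ring_buffer__new(map_fd, handle_event, NULL, NULL). */
n = ring_buffer__consume_n(rb, 64);
if (n < 0)
	fprintf(stderr, "ringbuf error: %d\n", n);
else if (n == 64)
	more_pending = true;	/* budget exhausted; records may still be queued */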


@ -416,3 +416,10 @@ LIBBPF_1.4.0 {
btf__new_split; btf__new_split;
btf_ext__raw_data; btf_ext__raw_data;
} LIBBPF_1.3.0; } LIBBPF_1.3.0;
LIBBPF_1.5.0 {
global:
bpf_program__attach_sockmap;
ring__consume_n;
ring_buffer__consume_n;
} LIBBPF_1.4.0;
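A sketch of a run-time version check an application might use before enabling an
optional code path that relies on the new symbols (useful, for instance, when
they are resolved lazily rather than linked directly):

#include <stdbool.h>
#include <bpf/libbpf.h>

static bool have_libbpf_1_5(void)
{
	/* 1.5.0 adds bpf_program__attach_sockmap(), ring__consume_n() and
	 * ring_buffer__consume_n().
	 */
	return libbpf_major_version() > 1 ||
	       (libbpf_major_version() == 1 && libbpf_minor_version() >= 5);
}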


@ -518,11 +518,6 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
__s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name, __s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
__u32 kind); __u32 kind);
typedef int (*kallsyms_cb_t)(unsigned long long sym_addr, char sym_type,
const char *sym_name, void *ctx);
int libbpf_kallsyms_parse(kallsyms_cb_t cb, void *arg);
/* handle direct returned errors */ /* handle direct returned errors */
static inline int libbpf_err(int ret) static inline int libbpf_err(int ret)
{ {


@ -448,7 +448,8 @@ int libbpf_probe_bpf_helper(enum bpf_prog_type prog_type, enum bpf_func_id helpe
/* If BPF verifier doesn't recognize BPF helper ID (enum bpf_func_id) /* If BPF verifier doesn't recognize BPF helper ID (enum bpf_func_id)
* at all, it will emit something like "invalid func unknown#181". * at all, it will emit something like "invalid func unknown#181".
* If BPF verifier recognizes BPF helper but it's not supported for * If BPF verifier recognizes BPF helper but it's not supported for
* given BPF program type, it will emit "unknown func bpf_sys_bpf#166". * given BPF program type, it will emit "unknown func bpf_sys_bpf#166"
* or "program of this type cannot use helper bpf_sys_bpf#166".
* In both cases, provided combination of BPF program type and BPF * In both cases, provided combination of BPF program type and BPF
* helper is not supported by the kernel. * helper is not supported by the kernel.
* In all other cases, probe_prog_load() above will either succeed (e.g., * In all other cases, probe_prog_load() above will either succeed (e.g.,
@ -457,7 +458,8 @@ int libbpf_probe_bpf_helper(enum bpf_prog_type prog_type, enum bpf_func_id helpe
* that), or we'll get some more specific BPF verifier error about * that), or we'll get some more specific BPF verifier error about
* some unsatisfied conditions. * some unsatisfied conditions.
*/ */
if (ret == 0 && (strstr(buf, "invalid func ") || strstr(buf, "unknown func "))) if (ret == 0 && (strstr(buf, "invalid func ") || strstr(buf, "unknown func ") ||
strstr(buf, "program of this type cannot use helper ")))
return 0; return 0;
return 1; /* assume supported */ return 1; /* assume supported */
} }
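Callers keep seeing the usual tri-state result; a short sketch probing whether
XDP programs may call bpf_fib_lookup():

int ret = libbpf_probe_bpf_helper(BPF_PROG_TYPE_XDP, BPF_FUNC_fib_lookup, NULL);

if (ret < 0)
	fprintf(stderr, "probe failed: %d\n", ret);
else
	printf("bpf_fib_lookup() from XDP: %s\n",
	       ret ? "supported" : "not supported");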


@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H #define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 1 #define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 4 #define LIBBPF_MINOR_VERSION 5
#endif /* __LIBBPF_VERSION_H */ #endif /* __LIBBPF_VERSION_H */


@ -231,7 +231,7 @@ static inline int roundup_len(__u32 len)
return (len + 7) / 8 * 8; return (len + 7) / 8 * 8;
} }
static int64_t ringbuf_process_ring(struct ring *r) static int64_t ringbuf_process_ring(struct ring *r, size_t n)
{ {
int *len_ptr, len, err; int *len_ptr, len, err;
/* 64-bit to avoid overflow in case of extreme application behavior */ /* 64-bit to avoid overflow in case of extreme application behavior */
@ -268,12 +268,42 @@ static int64_t ringbuf_process_ring(struct ring *r)
} }
smp_store_release(r->consumer_pos, cons_pos); smp_store_release(r->consumer_pos, cons_pos);
if (cnt >= n)
goto done;
} }
} while (got_new_data); } while (got_new_data);
done: done:
return cnt; return cnt;
} }
/* Consume available ring buffer(s) data without event polling, up to n
* records.
*
* Returns number of records consumed across all registered ring buffers (or
* n, whichever is less), or negative number if any of the callbacks return
* error.
*/
int ring_buffer__consume_n(struct ring_buffer *rb, size_t n)
{
int64_t err, res = 0;
int i;
for (i = 0; i < rb->ring_cnt; i++) {
struct ring *ring = rb->rings[i];
err = ringbuf_process_ring(ring, n);
if (err < 0)
return libbpf_err(err);
res += err;
n -= err;
if (n == 0)
break;
}
return res;
}
/* Consume available ring buffer(s) data without event polling. /* Consume available ring buffer(s) data without event polling.
* Returns number of records consumed across all registered ring buffers (or * Returns number of records consumed across all registered ring buffers (or
* INT_MAX, whichever is less), or negative number if any of the callbacks * INT_MAX, whichever is less), or negative number if any of the callbacks
@ -287,13 +317,15 @@ int ring_buffer__consume(struct ring_buffer *rb)
for (i = 0; i < rb->ring_cnt; i++) { for (i = 0; i < rb->ring_cnt; i++) {
struct ring *ring = rb->rings[i]; struct ring *ring = rb->rings[i];
err = ringbuf_process_ring(ring); err = ringbuf_process_ring(ring, INT_MAX);
if (err < 0) if (err < 0)
return libbpf_err(err); return libbpf_err(err);
res += err; res += err;
if (res > INT_MAX) {
res = INT_MAX;
break;
}
} }
if (res > INT_MAX)
return INT_MAX;
return res; return res;
} }
@ -314,13 +346,13 @@ int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms)
__u32 ring_id = rb->events[i].data.fd; __u32 ring_id = rb->events[i].data.fd;
struct ring *ring = rb->rings[ring_id]; struct ring *ring = rb->rings[ring_id];
err = ringbuf_process_ring(ring); err = ringbuf_process_ring(ring, INT_MAX);
if (err < 0) if (err < 0)
return libbpf_err(err); return libbpf_err(err);
res += err; res += err;
} }
if (res > INT_MAX) if (res > INT_MAX)
return INT_MAX; res = INT_MAX;
return res; return res;
} }
@ -371,17 +403,22 @@ int ring__map_fd(const struct ring *r)
return r->map_fd; return r->map_fd;
} }
int ring__consume(struct ring *r) int ring__consume_n(struct ring *r, size_t n)
{ {
int64_t res; int res;
res = ringbuf_process_ring(r); res = ringbuf_process_ring(r, n);
if (res < 0) if (res < 0)
return libbpf_err(res); return libbpf_err(res);
return res > INT_MAX ? INT_MAX : res; return res > INT_MAX ? INT_MAX : res;
} }
int ring__consume(struct ring *r)
{
return ring__consume_n(r, INT_MAX);
}
static void user_ringbuf_unmap_ring(struct user_ring_buffer *rb) static void user_ringbuf_unmap_ring(struct user_ring_buffer *rb)
{ {
if (rb->consumer_pos) { if (rb->consumer_pos) {


@ -10,5 +10,4 @@ fill_link_info/kprobe_multi_link_info # bpf_program__attach_kprobe_mu
fill_link_info/kretprobe_multi_link_info # bpf_program__attach_kprobe_multi_opts unexpected error: -95 fill_link_info/kretprobe_multi_link_info # bpf_program__attach_kprobe_multi_opts unexpected error: -95
fill_link_info/kprobe_multi_invalid_ubuff # bpf_program__attach_kprobe_multi_opts unexpected error: -95 fill_link_info/kprobe_multi_invalid_ubuff # bpf_program__attach_kprobe_multi_opts unexpected error: -95
missed/kprobe_recursion # missed_kprobe_recursion__attach unexpected error: -95 (errno 95) missed/kprobe_recursion # missed_kprobe_recursion__attach unexpected error: -95 (errno 95)
verifier_arena # JIT does not support arena arena_atomics
arena_htab # JIT does not support arena


@ -6,3 +6,4 @@ stacktrace_build_id # compare_map_keys stackid_hmap vs. sta
verifier_iterating_callbacks verifier_iterating_callbacks
verifier_arena # JIT does not support arena verifier_arena # JIT does not support arena
arena_htab # JIT does not support arena arena_htab # JIT does not support arena
arena_atomics


@ -278,11 +278,12 @@ UNPRIV_HELPERS := $(OUTPUT)/unpriv_helpers.o
TRACE_HELPERS := $(OUTPUT)/trace_helpers.o TRACE_HELPERS := $(OUTPUT)/trace_helpers.o
JSON_WRITER := $(OUTPUT)/json_writer.o JSON_WRITER := $(OUTPUT)/json_writer.o
CAP_HELPERS := $(OUTPUT)/cap_helpers.o CAP_HELPERS := $(OUTPUT)/cap_helpers.o
NETWORK_HELPERS := $(OUTPUT)/network_helpers.o
$(OUTPUT)/test_dev_cgroup: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/test_dev_cgroup: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_skb_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/test_skb_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_sock: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/test_sock: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_sock_addr: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/test_sock_addr: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(NETWORK_HELPERS)
$(OUTPUT)/test_sockmap: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/test_sockmap: $(CGROUP_HELPERS) $(TESTING_HELPERS)
$(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELPERS) $(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELPERS)
$(OUTPUT)/get_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(OUTPUT)/get_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS)
@ -443,7 +444,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h \
LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \ LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \
trace_printk.c trace_vprintk.c map_ptr_kern.c \ trace_printk.c trace_vprintk.c map_ptr_kern.c \
core_kern.c core_kern_overflow.c test_ringbuf.c \ core_kern.c core_kern_overflow.c test_ringbuf.c \
test_ringbuf_map_key.c test_ringbuf_n.c test_ringbuf_map_key.c
# Generate both light skeleton and libbpf skeleton for these # Generate both light skeleton and libbpf skeleton for these
LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \ LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \
@ -646,7 +647,7 @@ $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
# Define test_progs-cpuv4 test runner. # Define test_progs-cpuv4 test runner.
ifneq ($(CLANG_CPUV4),) ifneq ($(CLANG_CPUV4),)
TRUNNER_BPF_BUILD_RULE := CLANG_CPUV4_BPF_BUILD_RULE TRUNNER_BPF_BUILD_RULE := CLANG_CPUV4_BPF_BUILD_RULE
TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) -DENABLE_ATOMICS_TESTS
$(eval $(call DEFINE_TEST_RUNNER,test_progs,cpuv4)) $(eval $(call DEFINE_TEST_RUNNER,test_progs,cpuv4))
endif endif
@ -683,7 +684,7 @@ $(OUTPUT)/test_verifier: test_verifier.c verifier/tests.h $(BPFOBJ) | $(OUTPUT)
# Include find_bit.c to compile xskxceiver. # Include find_bit.c to compile xskxceiver.
EXTRA_SRC := $(TOOLSDIR)/lib/find_bit.c EXTRA_SRC := $(TOOLSDIR)/lib/find_bit.c
$(OUTPUT)/xskxceiver: $(EXTRA_SRC) xskxceiver.c xskxceiver.h $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel.h $(BPFOBJ) | $(OUTPUT) $(OUTPUT)/xskxceiver: $(EXTRA_SRC) xskxceiver.c xskxceiver.h $(OUTPUT)/network_helpers.o $(OUTPUT)/xsk.o $(OUTPUT)/xsk_xdp_progs.skel.h $(BPFOBJ) | $(OUTPUT)
$(call msg,BINARY,,$@) $(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@ $(Q)$(CC) $(CFLAGS) $(filter %.a %.o %.c,$^) $(LDLIBS) -o $@
@ -717,6 +718,7 @@ $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tas
$(OUTPUT)/bench_local_storage_create.o: $(OUTPUT)/bench_local_storage_create.skel.h $(OUTPUT)/bench_local_storage_create.o: $(OUTPUT)/bench_local_storage_create.skel.h
$(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h $(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h
$(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h $(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h
$(OUTPUT)/bench_bpf_crypto.o: $(OUTPUT)/crypto_bench.skel.h
$(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ) $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
$(OUTPUT)/bench: LDLIBS += -lm $(OUTPUT)/bench: LDLIBS += -lm
$(OUTPUT)/bench: $(OUTPUT)/bench.o \ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
@ -736,6 +738,7 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
$(OUTPUT)/bench_bpf_hashmap_lookup.o \ $(OUTPUT)/bench_bpf_hashmap_lookup.o \
$(OUTPUT)/bench_local_storage_create.o \ $(OUTPUT)/bench_local_storage_create.o \
$(OUTPUT)/bench_htab_mem.o \ $(OUTPUT)/bench_htab_mem.o \
$(OUTPUT)/bench_bpf_crypto.o \
# #
$(call msg,BINARY,,$@) $(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
@ -747,7 +750,7 @@ $(OUTPUT)/veristat: $(OUTPUT)/veristat.o
$(OUTPUT)/uprobe_multi: uprobe_multi.c $(OUTPUT)/uprobe_multi: uprobe_multi.c
$(call msg,BINARY,,$@) $(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) -o $@ $(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \ EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
prog_tests/tests.h map_tests/tests.h verifier/tests.h \ prog_tests/tests.h map_tests/tests.h verifier/tests.h \


@ -280,6 +280,8 @@ extern struct argp bench_strncmp_argp;
extern struct argp bench_hashmap_lookup_argp; extern struct argp bench_hashmap_lookup_argp;
extern struct argp bench_local_storage_create_argp; extern struct argp bench_local_storage_create_argp;
extern struct argp bench_htab_mem_argp; extern struct argp bench_htab_mem_argp;
extern struct argp bench_trigger_batch_argp;
extern struct argp bench_crypto_argp;
static const struct argp_child bench_parsers[] = { static const struct argp_child bench_parsers[] = {
{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 }, { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
@ -292,6 +294,8 @@ static const struct argp_child bench_parsers[] = {
{ &bench_hashmap_lookup_argp, 0, "Hashmap lookup benchmark", 0 }, { &bench_hashmap_lookup_argp, 0, "Hashmap lookup benchmark", 0 },
{ &bench_local_storage_create_argp, 0, "local-storage-create benchmark", 0 }, { &bench_local_storage_create_argp, 0, "local-storage-create benchmark", 0 },
{ &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 }, { &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 },
{ &bench_trigger_batch_argp, 0, "BPF triggering benchmark", 0 },
{ &bench_crypto_argp, 0, "bpf crypto benchmark", 0 },
{}, {},
}; };
@ -491,24 +495,31 @@ extern const struct bench bench_rename_kretprobe;
extern const struct bench bench_rename_rawtp; extern const struct bench bench_rename_rawtp;
extern const struct bench bench_rename_fentry; extern const struct bench bench_rename_fentry;
extern const struct bench bench_rename_fexit; extern const struct bench bench_rename_fexit;
extern const struct bench bench_trig_base;
extern const struct bench bench_trig_tp; /* pure counting benchmarks to establish theoretical lmits */
extern const struct bench bench_trig_rawtp; extern const struct bench bench_trig_usermode_count;
extern const struct bench bench_trig_syscall_count;
extern const struct bench bench_trig_kernel_count;
/* batched, staying mostly in-kernel benchmarks */
extern const struct bench bench_trig_kprobe; extern const struct bench bench_trig_kprobe;
extern const struct bench bench_trig_kretprobe; extern const struct bench bench_trig_kretprobe;
extern const struct bench bench_trig_kprobe_multi; extern const struct bench bench_trig_kprobe_multi;
extern const struct bench bench_trig_kretprobe_multi; extern const struct bench bench_trig_kretprobe_multi;
extern const struct bench bench_trig_fentry; extern const struct bench bench_trig_fentry;
extern const struct bench bench_trig_fexit; extern const struct bench bench_trig_fexit;
extern const struct bench bench_trig_fentry_sleep;
extern const struct bench bench_trig_fmodret; extern const struct bench bench_trig_fmodret;
extern const struct bench bench_trig_uprobe_base; extern const struct bench bench_trig_tp;
extern const struct bench bench_trig_rawtp;
/* uprobe/uretprobe benchmarks */
extern const struct bench bench_trig_uprobe_nop; extern const struct bench bench_trig_uprobe_nop;
extern const struct bench bench_trig_uretprobe_nop; extern const struct bench bench_trig_uretprobe_nop;
extern const struct bench bench_trig_uprobe_push; extern const struct bench bench_trig_uprobe_push;
extern const struct bench bench_trig_uretprobe_push; extern const struct bench bench_trig_uretprobe_push;
extern const struct bench bench_trig_uprobe_ret; extern const struct bench bench_trig_uprobe_ret;
extern const struct bench bench_trig_uretprobe_ret; extern const struct bench bench_trig_uretprobe_ret;
extern const struct bench bench_rb_libbpf; extern const struct bench bench_rb_libbpf;
extern const struct bench bench_rb_custom; extern const struct bench bench_rb_custom;
extern const struct bench bench_pb_libbpf; extern const struct bench bench_pb_libbpf;
@ -529,6 +540,8 @@ extern const struct bench bench_local_storage_tasks_trace;
extern const struct bench bench_bpf_hashmap_lookup; extern const struct bench bench_bpf_hashmap_lookup;
extern const struct bench bench_local_storage_create; extern const struct bench bench_local_storage_create;
extern const struct bench bench_htab_mem; extern const struct bench bench_htab_mem;
extern const struct bench bench_crypto_encrypt;
extern const struct bench bench_crypto_decrypt;
static const struct bench *benchs[] = { static const struct bench *benchs[] = {
&bench_count_global, &bench_count_global,
@ -539,24 +552,28 @@ static const struct bench *benchs[] = {
&bench_rename_rawtp, &bench_rename_rawtp,
&bench_rename_fentry, &bench_rename_fentry,
&bench_rename_fexit, &bench_rename_fexit,
&bench_trig_base, /* pure counting benchmarks for establishing theoretical limits */
&bench_trig_tp, &bench_trig_usermode_count,
&bench_trig_rawtp, &bench_trig_kernel_count,
&bench_trig_syscall_count,
/* batched, staying mostly in-kernel triggers */
&bench_trig_kprobe, &bench_trig_kprobe,
&bench_trig_kretprobe, &bench_trig_kretprobe,
&bench_trig_kprobe_multi, &bench_trig_kprobe_multi,
&bench_trig_kretprobe_multi, &bench_trig_kretprobe_multi,
&bench_trig_fentry, &bench_trig_fentry,
&bench_trig_fexit, &bench_trig_fexit,
&bench_trig_fentry_sleep,
&bench_trig_fmodret, &bench_trig_fmodret,
&bench_trig_uprobe_base, &bench_trig_tp,
&bench_trig_rawtp,
/* uprobes */
&bench_trig_uprobe_nop, &bench_trig_uprobe_nop,
&bench_trig_uretprobe_nop, &bench_trig_uretprobe_nop,
&bench_trig_uprobe_push, &bench_trig_uprobe_push,
&bench_trig_uretprobe_push, &bench_trig_uretprobe_push,
&bench_trig_uprobe_ret, &bench_trig_uprobe_ret,
&bench_trig_uretprobe_ret, &bench_trig_uretprobe_ret,
/* ringbuf/perfbuf benchmarks */
&bench_rb_libbpf, &bench_rb_libbpf,
&bench_rb_custom, &bench_rb_custom,
&bench_pb_libbpf, &bench_pb_libbpf,
@ -577,6 +594,8 @@ static const struct bench *benchs[] = {
&bench_bpf_hashmap_lookup, &bench_bpf_hashmap_lookup,
&bench_local_storage_create, &bench_local_storage_create,
&bench_htab_mem, &bench_htab_mem,
&bench_crypto_encrypt,
&bench_crypto_decrypt,
}; };
static void find_benchmark(void) static void find_benchmark(void)


@ -0,0 +1,185 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <argp.h>
#include "bench.h"
#include "crypto_bench.skel.h"
#define MAX_CIPHER_LEN 32
static char *input;
static struct crypto_ctx {
struct crypto_bench *skel;
int pfd;
} ctx;
static struct crypto_args {
u32 crypto_len;
char *crypto_cipher;
} args = {
.crypto_len = 16,
.crypto_cipher = "ecb(aes)",
};
enum {
ARG_CRYPTO_LEN = 5000,
ARG_CRYPTO_CIPHER = 5001,
};
static const struct argp_option opts[] = {
{ "crypto-len", ARG_CRYPTO_LEN, "CRYPTO_LEN", 0,
"Set the length of crypto buffer" },
{ "crypto-cipher", ARG_CRYPTO_CIPHER, "CRYPTO_CIPHER", 0,
"Set the cipher to use (default:ecb(aes))" },
{},
};
static error_t crypto_parse_arg(int key, char *arg, struct argp_state *state)
{
switch (key) {
case ARG_CRYPTO_LEN:
args.crypto_len = strtoul(arg, NULL, 10);
if (!args.crypto_len ||
args.crypto_len > sizeof(ctx.skel->bss->dst)) {
fprintf(stderr, "Invalid crypto buffer len (limit %zu)\n",
sizeof(ctx.skel->bss->dst));
argp_usage(state);
}
break;
case ARG_CRYPTO_CIPHER:
args.crypto_cipher = strdup(arg);
if (!strlen(args.crypto_cipher) ||
strlen(args.crypto_cipher) > MAX_CIPHER_LEN) {
fprintf(stderr, "Invalid crypto cipher len (limit %d)\n",
MAX_CIPHER_LEN);
argp_usage(state);
}
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
const struct argp bench_crypto_argp = {
.options = opts,
.parser = crypto_parse_arg,
};
static void crypto_validate(void)
{
if (env.consumer_cnt != 0) {
fprintf(stderr, "bpf crypto benchmark doesn't support consumer!\n");
exit(1);
}
}
static void crypto_setup(void)
{
LIBBPF_OPTS(bpf_test_run_opts, opts);
int err, pfd;
size_t i, sz;
sz = args.crypto_len;
if (!sz || sz > sizeof(ctx.skel->bss->dst)) {
fprintf(stderr, "invalid encrypt buffer size (source %zu, target %zu)\n",
sz, sizeof(ctx.skel->bss->dst));
exit(1);
}
setup_libbpf();
ctx.skel = crypto_bench__open();
if (!ctx.skel) {
fprintf(stderr, "failed to open skeleton\n");
exit(1);
}
snprintf(ctx.skel->bss->cipher, 128, "%s", args.crypto_cipher);
memcpy(ctx.skel->bss->key, "12345678testtest", 16);
ctx.skel->bss->key_len = 16;
ctx.skel->bss->authsize = 0;
srandom(time(NULL));
input = malloc(sz);
for (i = 0; i < sz - 1; i++)
input[i] = '1' + random() % 9;
input[sz - 1] = '\0';
ctx.skel->rodata->len = args.crypto_len;
err = crypto_bench__load(ctx.skel);
if (err) {
fprintf(stderr, "failed to load skeleton\n");
crypto_bench__destroy(ctx.skel);
exit(1);
}
pfd = bpf_program__fd(ctx.skel->progs.crypto_setup);
if (pfd < 0) {
fprintf(stderr, "failed to get fd for setup prog\n");
crypto_bench__destroy(ctx.skel);
exit(1);
}
err = bpf_prog_test_run_opts(pfd, &opts);
if (err || ctx.skel->bss->status) {
fprintf(stderr, "failed to run setup prog: err %d, status %d\n",
err, ctx.skel->bss->status);
crypto_bench__destroy(ctx.skel);
exit(1);
}
}
static void crypto_encrypt_setup(void)
{
crypto_setup();
ctx.pfd = bpf_program__fd(ctx.skel->progs.crypto_encrypt);
}
static void crypto_decrypt_setup(void)
{
crypto_setup();
ctx.pfd = bpf_program__fd(ctx.skel->progs.crypto_decrypt);
}
static void crypto_measure(struct bench_res *res)
{
res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
}
static void *crypto_producer(void *unused)
{
LIBBPF_OPTS(bpf_test_run_opts, opts,
.repeat = 64,
.data_in = input,
.data_size_in = args.crypto_len,
);
while (true)
(void)bpf_prog_test_run_opts(ctx.pfd, &opts);
return NULL;
}
const struct bench bench_crypto_encrypt = {
.name = "crypto-encrypt",
.argp = &bench_crypto_argp,
.validate = crypto_validate,
.setup = crypto_encrypt_setup,
.producer_thread = crypto_producer,
.measure = crypto_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_crypto_decrypt = {
.name = "crypto-decrypt",
.argp = &bench_crypto_argp,
.validate = crypto_validate,
.setup = crypto_decrypt_setup,
.producer_thread = crypto_producer,
.measure = crypto_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};


@@ -1,11 +1,57 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#define _GNU_SOURCE
#include <argp.h>
#include <unistd.h>
#include <stdint.h>
#include "bench.h"
#include "trigger_bench.skel.h"
#include "trace_helpers.h"
#define MAX_TRIG_BATCH_ITERS 1000
static struct {
__u32 batch_iters;
} args = {
.batch_iters = 100,
};
enum {
ARG_TRIG_BATCH_ITERS = 7000,
};
static const struct argp_option opts[] = {
{ "trig-batch-iters", ARG_TRIG_BATCH_ITERS, "BATCH_ITER_CNT", 0,
"Number of in-kernel iterations per one driver test run"},
{},
};
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
long ret;
switch (key) {
case ARG_TRIG_BATCH_ITERS:
ret = strtol(arg, NULL, 10);
if (ret < 1 || ret > MAX_TRIG_BATCH_ITERS) {
fprintf(stderr, "invalid --trig-batch-iters value (should be between %d and %d)\n",
1, MAX_TRIG_BATCH_ITERS);
argp_usage(state);
}
args.batch_iters = ret;
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
const struct argp bench_trigger_batch_argp = {
.options = opts,
.parser = parse_arg,
};
/* adjust slot shift in inc_hits() if changing */
#define MAX_BUCKETS 256

@@ -14,6 +60,8 @@
/* BPF triggering benchmarks */
static struct trigger_ctx {
	struct trigger_bench *skel;
	bool usermode_counters;
	int driver_prog_fd;
} ctx;

static struct counter base_hits[MAX_BUCKETS];
@@ -51,41 +99,63 @@ static void trigger_validate(void)
	}
}

static void *trigger_producer(void *input)
{
	if (ctx.usermode_counters) {
		while (true) {
			(void)syscall(__NR_getpgid);
			inc_counter(base_hits);
		}
	} else {
		while (true)
			(void)syscall(__NR_getpgid);
	}
	return NULL;
}

static void *trigger_producer_batch(void *input)
{
	int fd = ctx.driver_prog_fd ?: bpf_program__fd(ctx.skel->progs.trigger_driver);

	while (true)
		bpf_prog_test_run_opts(fd, NULL);
	return NULL;
}

static void trigger_measure(struct bench_res *res)
{
	if (ctx.usermode_counters)
		res->hits = sum_and_reset_counters(base_hits);
	else
		res->hits = sum_and_reset_counters(ctx.skel->bss->hits);
}

static void setup_ctx(void)
{
	setup_libbpf();

	ctx.skel = trigger_bench__open();
	if (!ctx.skel) {
		fprintf(stderr, "failed to open skeleton\n");
		exit(1);
	}

	/* default "driver" BPF program */
	bpf_program__set_autoload(ctx.skel->progs.trigger_driver, true);

	ctx.skel->rodata->batch_iters = args.batch_iters;
}

static void load_ctx(void)
{
	int err;

	err = trigger_bench__load(ctx.skel);
	if (err) {
		fprintf(stderr, "failed to open skeleton\n");
		exit(1);
	}
}

static void attach_bpf(struct bpf_program *prog)

@@ -99,66 +169,106 @@ static void attach_bpf(struct bpf_program *prog)
	}
}
static void trigger_syscall_count_setup(void)
{
	ctx.usermode_counters = true;
}

/* Batched, staying mostly in-kernel triggering setups */
static void trigger_kernel_count_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
	bpf_program__set_autoload(ctx.skel->progs.trigger_count, true);
	load_ctx();
	/* override driver program */
	ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_count);
}
static void trigger_kprobe_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kprobe, true);
	load_ctx();
	attach_bpf(ctx.skel->progs.bench_trigger_kprobe);
}

static void trigger_kretprobe_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kretprobe, true);
	load_ctx();
	attach_bpf(ctx.skel->progs.bench_trigger_kretprobe);
}

static void trigger_kprobe_multi_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kprobe_multi, true);
	load_ctx();
	attach_bpf(ctx.skel->progs.bench_trigger_kprobe_multi);
}

static void trigger_kretprobe_multi_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_kretprobe_multi, true);
	load_ctx();
	attach_bpf(ctx.skel->progs.bench_trigger_kretprobe_multi);
}

static void trigger_fentry_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_fentry, true);
	load_ctx();
	attach_bpf(ctx.skel->progs.bench_trigger_fentry);
}

static void trigger_fexit_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_fexit, true);
	load_ctx();
	attach_bpf(ctx.skel->progs.bench_trigger_fexit);
}
static void trigger_fentry_sleep_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry_sleep);
}
static void trigger_fmodret_setup(void)
{
	setup_ctx();
	bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
	bpf_program__set_autoload(ctx.skel->progs.trigger_driver_kfunc, true);
	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_fmodret, true);
	load_ctx();
	/* override driver program */
	ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_driver_kfunc);
	attach_bpf(ctx.skel->progs.bench_trigger_fmodret);
}
static void trigger_tp_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
bpf_program__set_autoload(ctx.skel->progs.trigger_driver_kfunc, true);
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_tp, true);
load_ctx();
/* override driver program */
ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_driver_kfunc);
attach_bpf(ctx.skel->progs.bench_trigger_tp);
}
static void trigger_rawtp_setup(void)
{
setup_ctx();
bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
bpf_program__set_autoload(ctx.skel->progs.trigger_driver_kfunc, true);
bpf_program__set_autoload(ctx.skel->progs.bench_trigger_rawtp, true);
load_ctx();
/* override driver program */
ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_driver_kfunc);
attach_bpf(ctx.skel->progs.bench_trigger_rawtp);
}
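Note: the trigger_driver, trigger_driver_kfunc and trigger_count programs referenced by the setups above live in progs/trigger_bench.c, which is not included in this excerpt. A minimal sketch of what such a batched driver is assumed to look like follows; the attach target (bpf_get_numa_node_id()) and the rodata batch_iters knob are illustration-only assumptions, not taken from this diff:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

/* sketch only -- set from user space (ctx.skel->rodata->batch_iters) before load */
const volatile int batch_iters = 0;

SEC("?raw_tp")
int trigger_driver(void *ctx)
{
	int i;

	for (i = 0; i < batch_iters; i++)
		(void)bpf_get_numa_node_id(); /* benchmarked kprobe/fentry progs attach here */

	return 0;
}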
/* make sure call is not inlined and not avoided by compiler, so __weak and
 * inline asm volatile in the body of the function
 *

@@ -192,7 +302,7 @@ __nocf_check __weak void uprobe_target_ret(void)
	asm volatile ("");
}
static void *uprobe_producer_count(void *input)
{
	while (true) {
		uprobe_target_nop();

@@ -226,15 +336,24 @@ static void usetup(bool use_retprobe, void *target_addr)
{
	size_t uprobe_offset;
	struct bpf_link *link;
	int err;

	setup_libbpf();

	ctx.skel = trigger_bench__open();
	if (!ctx.skel) {
		fprintf(stderr, "failed to open skeleton\n");
		exit(1);
	}

	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_uprobe, true);
	err = trigger_bench__load(ctx.skel);
	if (err) {
		fprintf(stderr, "failed to load skeleton\n");
		exit(1);
	}

	uprobe_offset = get_uprobe_offset(target_addr);
	link = bpf_program__attach_uprobe(ctx.skel->progs.bench_trigger_uprobe,
					  use_retprobe,

@@ -248,204 +367,90 @@ static void usetup(bool use_retprobe, void *target_addr)
	ctx.skel->links.bench_trigger_uprobe = link;
}
static void usermode_count_setup(void)
{
	ctx.usermode_counters = true;
}

static void uprobe_nop_setup(void)
{
	usetup(false, &uprobe_target_nop);
}

static void uretprobe_nop_setup(void)
{
	usetup(true, &uprobe_target_nop);
}

static void uprobe_push_setup(void)
{
	usetup(false, &uprobe_target_push);
}

static void uretprobe_push_setup(void)
{
	usetup(true, &uprobe_target_push);
}

static void uprobe_ret_setup(void)
{
	usetup(false, &uprobe_target_ret);
}

static void uretprobe_ret_setup(void)
{
	usetup(true, &uprobe_target_ret);
}
const struct bench bench_trig_syscall_count = {
	.name = "trig-syscall-count",
	.validate = trigger_validate,
	.setup = trigger_syscall_count_setup,
	.producer_thread = trigger_producer,
	.measure = trigger_measure,
	.report_progress = hits_drops_report_progress,
	.report_final = hits_drops_report_final,
};
/* batched (staying mostly in kernel) kprobe/fentry benchmarks */
#define BENCH_TRIG_KERNEL(KIND, NAME)				\
const struct bench bench_trig_##KIND = {			\
	.name = "trig-" NAME,					\
	.setup = trigger_##KIND##_setup,			\
	.producer_thread = trigger_producer_batch,		\
	.measure = trigger_measure,				\
	.report_progress = hits_drops_report_progress,		\
	.report_final = hits_drops_report_final,		\
	.argp = &bench_trigger_batch_argp,			\
}

BENCH_TRIG_KERNEL(kernel_count, "kernel-count");
BENCH_TRIG_KERNEL(kprobe, "kprobe");
BENCH_TRIG_KERNEL(kretprobe, "kretprobe");
BENCH_TRIG_KERNEL(kprobe_multi, "kprobe-multi");
BENCH_TRIG_KERNEL(kretprobe_multi, "kretprobe-multi");
BENCH_TRIG_KERNEL(fentry, "fentry");
BENCH_TRIG_KERNEL(fexit, "fexit");
BENCH_TRIG_KERNEL(fmodret, "fmodret");
BENCH_TRIG_KERNEL(tp, "tp");
BENCH_TRIG_KERNEL(rawtp, "rawtp");

/* uprobe benchmarks */
#define BENCH_TRIG_USERMODE(KIND, PRODUCER, NAME)		\
const struct bench bench_trig_##KIND = {			\
	.name = "trig-" NAME,					\
	.validate = trigger_validate,				\
	.setup = KIND##_setup,					\
	.producer_thread = uprobe_producer_##PRODUCER,		\
	.measure = trigger_measure,				\
	.report_progress = hits_drops_report_progress,		\
	.report_final = hits_drops_report_final,		\
}

BENCH_TRIG_USERMODE(usermode_count, count, "usermode-count");
BENCH_TRIG_USERMODE(uprobe_nop, nop, "uprobe-nop");
BENCH_TRIG_USERMODE(uprobe_push, push, "uprobe-push");
BENCH_TRIG_USERMODE(uprobe_ret, ret, "uprobe-ret");
BENCH_TRIG_USERMODE(uretprobe_nop, nop, "uretprobe-nop");
BENCH_TRIG_USERMODE(uretprobe_push, push, "uretprobe-push");
BENCH_TRIG_USERMODE(uretprobe_ret, ret, "uretprobe-ret");
const struct bench bench_trig_kretprobe_multi = {
.name = "trig-kretprobe-multi",
.validate = trigger_validate,
.setup = trigger_kretprobe_multi_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fentry = {
.name = "trig-fentry",
.validate = trigger_validate,
.setup = trigger_fentry_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fexit = {
.name = "trig-fexit",
.validate = trigger_validate,
.setup = trigger_fexit_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fentry_sleep = {
.name = "trig-fentry-sleep",
.validate = trigger_validate,
.setup = trigger_fentry_sleep_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_fmodret = {
.name = "trig-fmodret",
.validate = trigger_validate,
.setup = trigger_fmodret_setup,
.producer_thread = trigger_producer,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_base = {
.name = "trig-uprobe-base",
.setup = NULL, /* no uprobe/uretprobe is attached */
.producer_thread = uprobe_base_producer,
.measure = trigger_base_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_nop = {
.name = "trig-uprobe-nop",
.setup = uprobe_setup_nop,
.producer_thread = uprobe_producer_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_nop = {
.name = "trig-uretprobe-nop",
.setup = uretprobe_setup_nop,
.producer_thread = uprobe_producer_nop,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_push = {
.name = "trig-uprobe-push",
.setup = uprobe_setup_push,
.producer_thread = uprobe_producer_push,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_push = {
.name = "trig-uretprobe-push",
.setup = uretprobe_setup_push,
.producer_thread = uprobe_producer_push,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uprobe_ret = {
.name = "trig-uprobe-ret",
.setup = uprobe_setup_ret,
.producer_thread = uprobe_producer_ret,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};
const struct bench bench_trig_uretprobe_ret = {
.name = "trig-uretprobe-ret",
.setup = uretprobe_setup_ret,
.producer_thread = uprobe_producer_ret,
.measure = trigger_measure,
.report_progress = hits_drops_report_progress,
.report_final = hits_drops_report_final,
};


@@ -2,8 +2,22 @@
set -eufo pipefail

def_tests=( \
	usermode-count kernel-count syscall-count \
	fentry fexit fmodret \
	rawtp tp \
	kprobe kprobe-multi \
	kretprobe kretprobe-multi \
)

tests=("$@")
if [ ${#tests[@]} -eq 0 ]; then
	tests=("${def_tests[@]}")
fi

p=${PROD_CNT:-1}

for t in "${tests[@]}"; do
	summary=$(sudo ./bench -w2 -d5 -a -p$p trig-$t | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
	printf "%-15s: %s\n" $t "$summary"
done


@@ -2,7 +2,7 @@
set -eufo pipefail

for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret}
do
	summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
	printf "%-15s: %s\n" $i "$summary"


@@ -326,6 +326,16 @@ l_true:						\
})
#endif

#ifdef __BPF_FEATURE_MAY_GOTO
#define cond_break					\
	({ __label__ l_break, l_continue;		\
	asm volatile goto("may_goto %l[l_break]"	\
			  :::: l_break);		\
	goto l_continue;				\
	l_break: break;					\
	l_continue:;					\
	})
#else
#define cond_break					\
	({ __label__ l_break, l_continue;		\
	asm volatile goto("1:.byte 0xe5;		\

@@ -337,6 +347,7 @@ l_true:						\
	l_break: break;					\
	l_continue:;					\
	})
#endif

#ifndef bpf_nop_mov
#define bpf_nop_mov(var)				\

@@ -386,6 +397,28 @@ l_true:						\
		, [as]"i"((dst_as << 16) | src_as));
#endif
void bpf_preempt_disable(void) __weak __ksym;
void bpf_preempt_enable(void) __weak __ksym;
typedef struct {
} __bpf_preempt_t;
static inline __bpf_preempt_t __bpf_preempt_constructor(void)
{
__bpf_preempt_t ret = {};
bpf_preempt_disable();
return ret;
}
static inline void __bpf_preempt_destructor(__bpf_preempt_t *t)
{
bpf_preempt_enable();
}
#define bpf_guard_preempt() \
__bpf_preempt_t ___bpf_apply(preempt, __COUNTER__) \
__attribute__((__unused__, __cleanup__(__bpf_preempt_destructor))) = \
__bpf_preempt_constructor()
/* Description
 *	Assert that a conditional expression is true.
 * Returns

@@ -459,4 +492,11 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
unsigned int flags__k, void *aux__ign) __ksym;
#define bpf_wq_set_callback(timer, cb, flags) \
bpf_wq_set_callback_impl(timer, cb, flags, NULL)
#endif
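The two helpers added above compose naturally: bpf_guard_preempt() keeps a scope preemption-free via the cleanup attribute, and the may_goto-based cond_break gives the verifier a bounded exit out of loops. A minimal, hypothetical sketch combining them (program type, the map-free global counter and the loop bound are arbitrary illustration choices, not from the patch):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

char _license[] SEC("license") = "GPL";

int sum;

SEC("tc")
int guarded_sum(struct __sk_buff *skb)
{
	int i;

	bpf_guard_preempt(); /* preemption re-enabled automatically when the scope ends */

	for (i = 0; i < 1000; i++) {
		sum += i;
		cond_break; /* verifier-friendly early exit out of the loop */
	}
	return 0;
}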


@@ -494,6 +494,10 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused
	return arg;
}

__bpf_kfunc void bpf_kfunc_call_test_sleepable(void)
{
}

BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids)
BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)

@@ -520,6 +524,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_sleepable, KF_SLEEPABLE)
BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids)

static int bpf_testmod_ops_init(struct btf *btf)


@@ -96,6 +96,7 @@ void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) __ksym;
void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym;
void bpf_kfunc_call_test_destructive(void) __ksym;
void bpf_kfunc_call_test_sleepable(void) __ksym;
void bpf_kfunc_call_test_offset(struct prog_test_ref_kfunc *p);

struct prog_test_member *bpf_kfunc_call_memb_acquire(void);


@@ -429,7 +429,7 @@ int create_and_get_cgroup(const char *relative_path)
 * which is an invalid cgroup id.
 * If there is a failure, it prints the error to stderr.
 */
static unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
{
	int dirfd, err, flags, mount_id, fhsize;
	union {


@@ -13,7 +13,12 @@ CONFIG_BPF_SYSCALL=y
CONFIG_CGROUP_BPF=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_AES=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_DWARF4=y

@@ -88,3 +93,5 @@ CONFIG_VSOCKETS=y
CONFIG_VXLAN=y
CONFIG_XDP_SOCKETS=y
CONFIG_XFRM_INTERFACE=y
CONFIG_TCP_CONG_DCTCP=y
CONFIG_TCP_CONG_BBR=y


@@ -52,6 +52,8 @@ struct ipv6_packet pkt_v6 = {
	.tcp.doff = 5,
};

static const struct network_helper_opts default_opts;

int settimeo(int fd, int timeout_ms)
{
	struct timeval timeout = { .tv_sec = 3 };
@@ -185,6 +187,16 @@ close_fds:
	return NULL;
}

int start_server_addr(int type, const struct sockaddr_storage *addr, socklen_t len,
		      const struct network_helper_opts *opts)
{
	if (!opts)
		opts = &default_opts;

	return __start_server(type, 0, (struct sockaddr *)addr, len,
			      opts->timeout_ms, 0);
}

void free_fds(int *fds, unsigned int nr_close_fds)
{
	if (fds) {

@@ -258,17 +270,24 @@ static int connect_fd_to_addr(int fd,
	return 0;
}

int connect_to_addr(int type, const struct sockaddr_storage *addr, socklen_t addrlen,
		    const struct network_helper_opts *opts)
{
	int fd;

	if (!opts)
		opts = &default_opts;

	fd = socket(addr->ss_family, type, opts->proto);
	if (fd < 0) {
		log_err("Failed to create client socket");
		return -1;
	}

	if (settimeo(fd, opts->timeout_ms))
		goto error_close;

	if (connect_fd_to_addr(fd, addr, addrlen, opts->must_fail))
		goto error_close;

	return fd;

@@ -278,8 +297,6 @@ error_close:
	return -1;
}

int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts)
{
	struct sockaddr_storage addr;
@@ -442,25 +459,35 @@ struct nstoken *open_netns(const char *name)
	struct nstoken *token;

	token = calloc(1, sizeof(struct nstoken));
	if (!token) {
		log_err("Failed to malloc token");
		return NULL;
	}

	token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
	if (token->orig_netns_fd == -1) {
		log_err("Failed to open(/proc/self/ns/net)");
		goto fail;
	}

	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
	if (nsfd == -1) {
		log_err("Failed to open(%s)", nspath);
		goto fail;
	}

	err = setns(nsfd, CLONE_NEWNET);
	close(nsfd);
	if (err) {
		log_err("Failed to setns(nsfd)");
		goto fail;
	}

	return token;
fail:
	if (token->orig_netns_fd != -1)
		close(token->orig_netns_fd);
	free(token);
	return NULL;
}

@@ -470,7 +497,8 @@ void close_netns(struct nstoken *token)
	if (!token)
		return;

	if (setns(token->orig_netns_fd, CLONE_NEWNET))
		log_err("Failed to setns(orig_netns_fd)");
	close(token->orig_netns_fd);
	free(token);
}
@@ -497,3 +525,153 @@ int get_socket_local_port(int sock_fd)
	return -1;
}
int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
{
struct ifreq ifr = {0};
int sockfd, err;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sockfd < 0)
return -errno;
memcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
ring_param->cmd = ETHTOOL_GRINGPARAM;
ifr.ifr_data = (char *)ring_param;
if (ioctl(sockfd, SIOCETHTOOL, &ifr) < 0) {
err = errno;
close(sockfd);
return -err;
}
close(sockfd);
return 0;
}
int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param)
{
struct ifreq ifr = {0};
int sockfd, err;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sockfd < 0)
return -errno;
memcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
ring_param->cmd = ETHTOOL_SRINGPARAM;
ifr.ifr_data = (char *)ring_param;
if (ioctl(sockfd, SIOCETHTOOL, &ifr) < 0) {
err = errno;
close(sockfd);
return -err;
}
close(sockfd);
return 0;
}
struct send_recv_arg {
int fd;
uint32_t bytes;
int stop;
};
static void *send_recv_server(void *arg)
{
struct send_recv_arg *a = (struct send_recv_arg *)arg;
ssize_t nr_sent = 0, bytes = 0;
char batch[1500];
int err = 0, fd;
fd = accept(a->fd, NULL, NULL);
while (fd == -1) {
if (errno == EINTR)
continue;
err = -errno;
goto done;
}
if (settimeo(fd, 0)) {
err = -errno;
goto done;
}
while (bytes < a->bytes && !READ_ONCE(a->stop)) {
nr_sent = send(fd, &batch,
MIN(a->bytes - bytes, sizeof(batch)), 0);
if (nr_sent == -1 && errno == EINTR)
continue;
if (nr_sent == -1) {
err = -errno;
break;
}
bytes += nr_sent;
}
if (bytes != a->bytes) {
log_err("send %zd expected %u", bytes, a->bytes);
if (!err)
err = bytes > a->bytes ? -E2BIG : -EINTR;
}
done:
if (fd >= 0)
close(fd);
if (err) {
WRITE_ONCE(a->stop, 1);
return ERR_PTR(err);
}
return NULL;
}
int send_recv_data(int lfd, int fd, uint32_t total_bytes)
{
ssize_t nr_recv = 0, bytes = 0;
struct send_recv_arg arg = {
.fd = lfd,
.bytes = total_bytes,
.stop = 0,
};
pthread_t srv_thread;
void *thread_ret;
char batch[1500];
int err = 0;
err = pthread_create(&srv_thread, NULL, send_recv_server, (void *)&arg);
if (err) {
log_err("Failed to pthread_create");
return err;
}
/* recv total_bytes */
while (bytes < total_bytes && !READ_ONCE(arg.stop)) {
nr_recv = recv(fd, &batch,
MIN(total_bytes - bytes, sizeof(batch)), 0);
if (nr_recv == -1 && errno == EINTR)
continue;
if (nr_recv == -1) {
err = -errno;
break;
}
bytes += nr_recv;
}
if (bytes != total_bytes) {
log_err("recv %zd expected %u", bytes, total_bytes);
if (!err)
err = bytes > total_bytes ? -E2BIG : -EINTR;
}
WRITE_ONCE(arg.stop, 1);
pthread_join(srv_thread, &thread_ret);
if (IS_ERR(thread_ret)) {
log_err("Failed in thread_ret %ld", PTR_ERR(thread_ret));
err = err ? : PTR_ERR(thread_ret);
}
return err;
}
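send_recv_data() accepts a listening socket and an already connected socket, spawns a server thread that sends total_bytes, receives them on the caller side, and returns 0 on success. A rough usage sketch for a prog_tests/ caller, mirroring how bpf_tcp_ca.c uses it later in this diff (the 10 MiB size is an arbitrary example value; ASSERT_* come from test_progs.h):

static void send_recv_smoke(void)
{
	int lfd, fd;

	lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
	if (!ASSERT_NEQ(lfd, -1, "start_server"))
		return;

	fd = connect_to_fd(lfd, 0);
	if (!ASSERT_NEQ(fd, -1, "connect_to_fd"))
		goto close_srv;

	ASSERT_OK(send_recv_data(lfd, fd, 10 * 1024 * 1024), "send_recv_data");
	close(fd);
close_srv:
	close(lfd);
}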


@@ -9,8 +9,12 @@ typedef __u16 __sum16;
#include <linux/if_packet.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <linux/err.h>
#include <netinet/tcp.h>
#include <bpf/bpf_endian.h>
#include <net/if.h>

#define MAGIC_VAL 0x1234
#define NUM_ITER 100000

@@ -50,8 +54,11 @@ int start_mptcp_server(int family, const char *addr, __u16 port,
int *start_reuseport_server(int family, int type, const char *addr_str,
			    __u16 port, int timeout_ms,
			    unsigned int nr_listens);
int start_server_addr(int type, const struct sockaddr_storage *addr, socklen_t len,
		      const struct network_helper_opts *opts);
void free_fds(int *fds, unsigned int nr_close_fds);
int connect_to_addr(int type, const struct sockaddr_storage *addr, socklen_t len,
		    const struct network_helper_opts *opts);
int connect_to_fd(int server_fd, int timeout_ms);
int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts);
int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);

@@ -61,6 +68,8 @@ int make_sockaddr(int family, const char *addr_str, __u16 port,
		  struct sockaddr_storage *addr, socklen_t *len);
char *ping_command(int family);
int get_socket_local_port(int sock_fd);
int get_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);
int set_hw_ring_size(char *ifname, struct ethtool_ringparam *ring_param);

struct nstoken;
/**

@@ -71,6 +80,7 @@ struct nstoken;
 */
struct nstoken *open_netns(const char *name);
void close_netns(struct nstoken *token);
int send_recv_data(int lfd, int fd, uint32_t total_bytes);

static __u16 csum_fold(__u32 csum)
{


@@ -0,0 +1,186 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <test_progs.h>
#include "arena_atomics.skel.h"
static void test_add(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.add);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->add64_value, 3, "add64_value");
ASSERT_EQ(skel->arena->add64_result, 1, "add64_result");
ASSERT_EQ(skel->arena->add32_value, 3, "add32_value");
ASSERT_EQ(skel->arena->add32_result, 1, "add32_result");
ASSERT_EQ(skel->arena->add_stack_value_copy, 3, "add_stack_value");
ASSERT_EQ(skel->arena->add_stack_result, 1, "add_stack_result");
ASSERT_EQ(skel->arena->add_noreturn_value, 3, "add_noreturn_value");
}
static void test_sub(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.sub);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->sub64_value, -1, "sub64_value");
ASSERT_EQ(skel->arena->sub64_result, 1, "sub64_result");
ASSERT_EQ(skel->arena->sub32_value, -1, "sub32_value");
ASSERT_EQ(skel->arena->sub32_result, 1, "sub32_result");
ASSERT_EQ(skel->arena->sub_stack_value_copy, -1, "sub_stack_value");
ASSERT_EQ(skel->arena->sub_stack_result, 1, "sub_stack_result");
ASSERT_EQ(skel->arena->sub_noreturn_value, -1, "sub_noreturn_value");
}
static void test_and(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.and);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->and64_value, 0x010ull << 32, "and64_value");
ASSERT_EQ(skel->arena->and32_value, 0x010, "and32_value");
}
static void test_or(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.or);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->or64_value, 0x111ull << 32, "or64_value");
ASSERT_EQ(skel->arena->or32_value, 0x111, "or32_value");
}
static void test_xor(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.xor);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->xor64_value, 0x101ull << 32, "xor64_value");
ASSERT_EQ(skel->arena->xor32_value, 0x101, "xor32_value");
}
static void test_cmpxchg(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.cmpxchg);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->cmpxchg64_value, 2, "cmpxchg64_value");
ASSERT_EQ(skel->arena->cmpxchg64_result_fail, 1, "cmpxchg_result_fail");
ASSERT_EQ(skel->arena->cmpxchg64_result_succeed, 1, "cmpxchg_result_succeed");
ASSERT_EQ(skel->arena->cmpxchg32_value, 2, "lcmpxchg32_value");
ASSERT_EQ(skel->arena->cmpxchg32_result_fail, 1, "cmpxchg_result_fail");
ASSERT_EQ(skel->arena->cmpxchg32_result_succeed, 1, "cmpxchg_result_succeed");
}
static void test_xchg(struct arena_atomics *skel)
{
LIBBPF_OPTS(bpf_test_run_opts, topts);
int err, prog_fd;
/* No need to attach it, just run it directly */
prog_fd = bpf_program__fd(skel->progs.xchg);
err = bpf_prog_test_run_opts(prog_fd, &topts);
if (!ASSERT_OK(err, "test_run_opts err"))
return;
if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
return;
ASSERT_EQ(skel->arena->xchg64_value, 2, "xchg64_value");
ASSERT_EQ(skel->arena->xchg64_result, 1, "xchg64_result");
ASSERT_EQ(skel->arena->xchg32_value, 2, "xchg32_value");
ASSERT_EQ(skel->arena->xchg32_result, 1, "xchg32_result");
}
void test_arena_atomics(void)
{
struct arena_atomics *skel;
int err;
skel = arena_atomics__open();
if (!ASSERT_OK_PTR(skel, "arena atomics skeleton open"))
return;
if (skel->data->skip_tests) {
printf("%s:SKIP:no ENABLE_ATOMICS_TESTS or no addr_space_cast support in clang",
__func__);
test__skip();
goto cleanup;
}
err = arena_atomics__load(skel);
if (!ASSERT_OK(err, "arena atomics skeleton load"))
return;
skel->bss->pid = getpid();
if (test__start_subtest("add"))
test_add(skel);
if (test__start_subtest("sub"))
test_sub(skel);
if (test__start_subtest("and"))
test_and(skel);
if (test__start_subtest("or"))
test_or(skel);
if (test__start_subtest("xor"))
test_xor(skel);
if (test__start_subtest("cmpxchg"))
test_cmpxchg(skel);
if (test__start_subtest("xchg"))
test_xchg(skel);
cleanup:
arena_atomics__destroy(skel);
}


@@ -13,6 +13,7 @@
#include "tcp_ca_write_sk_pacing.skel.h"
#include "tcp_ca_incompl_cong_ops.skel.h"
#include "tcp_ca_unsupp_cong_op.skel.h"
#include "tcp_ca_kfunc.skel.h"

#ifndef ENOTSUPP
#define ENOTSUPP 524

@@ -20,7 +21,6 @@
static const unsigned int total_bytes = 10 * 1024 * 1024;
static int expected_stg = 0xeB9F;

static int settcpca(int fd, const char *tcp_ca)
{

@@ -33,62 +33,11 @@ static int settcpca(int fd, const char *tcp_ca)
	return 0;
}
static void *server(void *arg)
{
int lfd = (int)(long)arg, err = 0, fd;
ssize_t nr_sent = 0, bytes = 0;
char batch[1500];
fd = accept(lfd, NULL, NULL);
while (fd == -1) {
if (errno == EINTR)
continue;
err = -errno;
goto done;
}
if (settimeo(fd, 0)) {
err = -errno;
goto done;
}
while (bytes < total_bytes && !READ_ONCE(stop)) {
nr_sent = send(fd, &batch,
MIN(total_bytes - bytes, sizeof(batch)), 0);
if (nr_sent == -1 && errno == EINTR)
continue;
if (nr_sent == -1) {
err = -errno;
break;
}
bytes += nr_sent;
}
ASSERT_EQ(bytes, total_bytes, "send");
done:
if (fd >= 0)
close(fd);
if (err) {
WRITE_ONCE(stop, 1);
return ERR_PTR(err);
}
return NULL;
}
static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
{
	int lfd = -1, fd = -1;
	int err;

	lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
	if (!ASSERT_NEQ(lfd, -1, "socket"))
		return;

@@ -99,12 +48,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
		return;
	}

	if (settcpca(lfd, tcp_ca) || settcpca(fd, tcp_ca))
		goto done;

	if (sk_stg_map) {

@@ -115,7 +59,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
	}

	/* connect to server */
	err = connect_fd_to_fd(fd, lfd, 0);
	if (!ASSERT_NEQ(err, -1, "connect"))
		goto done;

@@ -129,26 +73,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
		goto done;
	}

	ASSERT_OK(send_recv_data(lfd, fd, total_bytes), "send_recv_data");

done:
	close(lfd);

@@ -304,7 +229,7 @@ static void test_rel_setsockopt(void)
	struct bpf_dctcp_release *rel_skel;
	libbpf_print_fn_t old_print_fn;

	err_str = "program of this type cannot use helper bpf_setsockopt";
	found = false;

	old_print_fn = libbpf_set_print(libbpf_debug_print);

@@ -518,6 +443,15 @@ static void test_link_replace(void)
	tcp_ca_update__destroy(skel);
}

static void test_tcp_ca_kfunc(void)
{
	struct tcp_ca_kfunc *skel;

	skel = tcp_ca_kfunc__open_and_load();
	ASSERT_OK_PTR(skel, "tcp_ca_kfunc__open_and_load");
	tcp_ca_kfunc__destroy(skel);
}

void test_bpf_tcp_ca(void)
{
	if (test__start_subtest("dctcp"))

@@ -546,4 +480,6 @@ void test_bpf_tcp_ca(void)
		test_multi_links();
	if (test__start_subtest("link_replace"))
		test_link_replace();
	if (test__start_subtest("tcp_ca_kfunc"))
		test_tcp_ca_kfunc();
}


@@ -10,6 +10,7 @@
#include <netinet/tcp.h>
#include <test_progs.h>

#include "network_helpers.h"
#include "progs/test_cls_redirect.h"
#include "test_cls_redirect.skel.h"

@@ -35,39 +36,6 @@ struct tuple {
	struct addr_port dst;
};
static int start_server(const struct sockaddr *addr, socklen_t len, int type)
{
int fd = socket(addr->sa_family, type, 0);
if (CHECK_FAIL(fd == -1))
return -1;
if (CHECK_FAIL(bind(fd, addr, len) == -1))
goto err;
if (type == SOCK_STREAM && CHECK_FAIL(listen(fd, 128) == -1))
goto err;
return fd;
err:
close(fd);
return -1;
}
static int connect_to_server(const struct sockaddr *addr, socklen_t len,
int type)
{
int fd = socket(addr->sa_family, type, 0);
if (CHECK_FAIL(fd == -1))
return -1;
if (CHECK_FAIL(connect(fd, addr, len)))
goto err;
return fd;
err:
close(fd);
return -1;
}
static bool fill_addr_port(const struct sockaddr *sa, struct addr_port *ap)
{
	const struct sockaddr_in6 *in6;

@@ -98,14 +66,14 @@ static bool set_up_conn(const struct sockaddr *addr, socklen_t len, int type,
	socklen_t slen = sizeof(ss);
	struct sockaddr *sa = (struct sockaddr *)&ss;

	*server = start_server_addr(type, (struct sockaddr_storage *)addr, len, NULL);
	if (*server < 0)
		return false;

	if (CHECK_FAIL(getsockname(*server, sa, &slen)))
		goto close_server;

	*conn = connect_to_addr(type, (struct sockaddr_storage *)sa, slen, NULL);
	if (*conn < 0)
		goto close_server;


@@ -0,0 +1,197 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/in6.h>
#include <linux/if_alg.h>
#include "test_progs.h"
#include "network_helpers.h"
#include "crypto_sanity.skel.h"
#include "crypto_basic.skel.h"
#define NS_TEST "crypto_sanity_ns"
#define IPV6_IFACE_ADDR "face::1"
static const unsigned char crypto_key[] = "testtest12345678";
static const char plain_text[] = "stringtoencrypt0";
static int opfd = -1, tfmfd = -1;
static const char algo[] = "ecb(aes)";
static int init_afalg(void)
{
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "skcipher",
.salg_name = "ecb(aes)"
};
tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (tfmfd == -1)
return errno;
if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) == -1)
return errno;
if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, crypto_key, 16) == -1)
return errno;
opfd = accept(tfmfd, NULL, 0);
if (opfd == -1)
return errno;
return 0;
}
static void deinit_afalg(void)
{
if (tfmfd != -1)
close(tfmfd);
if (opfd != -1)
close(opfd);
}
static void do_crypt_afalg(const void *src, void *dst, int size, bool encrypt)
{
struct msghdr msg = {};
struct cmsghdr *cmsg;
char cbuf[CMSG_SPACE(4)] = {0};
struct iovec iov;
msg.msg_control = cbuf;
msg.msg_controllen = sizeof(cbuf);
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_ALG;
cmsg->cmsg_type = ALG_SET_OP;
cmsg->cmsg_len = CMSG_LEN(4);
*(__u32 *)CMSG_DATA(cmsg) = encrypt ? ALG_OP_ENCRYPT : ALG_OP_DECRYPT;
iov.iov_base = (char *)src;
iov.iov_len = size;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
sendmsg(opfd, &msg, 0);
read(opfd, dst, size);
}
void test_crypto_basic(void)
{
RUN_TESTS(crypto_basic);
}
void test_crypto_sanity(void)
{
LIBBPF_OPTS(bpf_tc_hook, qdisc_hook, .attach_point = BPF_TC_EGRESS);
LIBBPF_OPTS(bpf_tc_opts, tc_attach_enc);
LIBBPF_OPTS(bpf_tc_opts, tc_attach_dec);
LIBBPF_OPTS(bpf_test_run_opts, opts);
struct nstoken *nstoken = NULL;
struct crypto_sanity *skel;
char afalg_plain[16] = {0};
char afalg_dst[16] = {0};
struct sockaddr_in6 addr;
int sockfd, err, pfd;
socklen_t addrlen;
u16 udp_test_port;
skel = crypto_sanity__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel open"))
return;
SYS(fail, "ip netns add %s", NS_TEST);
SYS(fail, "ip -net %s -6 addr add %s/128 dev lo nodad", NS_TEST, IPV6_IFACE_ADDR);
SYS(fail, "ip -net %s link set dev lo up", NS_TEST);
nstoken = open_netns(NS_TEST);
if (!ASSERT_OK_PTR(nstoken, "open_netns"))
goto fail;
err = init_afalg();
if (!ASSERT_OK(err, "AF_ALG init fail"))
goto fail;
qdisc_hook.ifindex = if_nametoindex("lo");
if (!ASSERT_GT(qdisc_hook.ifindex, 0, "if_nametoindex lo"))
goto fail;
skel->bss->key_len = 16;
skel->bss->authsize = 0;
udp_test_port = skel->data->udp_test_port;
memcpy(skel->bss->key, crypto_key, sizeof(crypto_key));
snprintf(skel->bss->algo, 128, "%s", algo);
pfd = bpf_program__fd(skel->progs.skb_crypto_setup);
if (!ASSERT_GT(pfd, 0, "skb_crypto_setup fd"))
goto fail;
err = bpf_prog_test_run_opts(pfd, &opts);
if (!ASSERT_OK(err, "skb_crypto_setup") ||
!ASSERT_OK(opts.retval, "skb_crypto_setup retval"))
goto fail;
if (!ASSERT_OK(skel->bss->status, "skb_crypto_setup status"))
goto fail;
err = bpf_tc_hook_create(&qdisc_hook);
if (!ASSERT_OK(err, "create qdisc hook"))
goto fail;
addrlen = sizeof(addr);
err = make_sockaddr(AF_INET6, IPV6_IFACE_ADDR, udp_test_port,
(void *)&addr, &addrlen);
if (!ASSERT_OK(err, "make_sockaddr"))
goto fail;
tc_attach_enc.prog_fd = bpf_program__fd(skel->progs.encrypt_sanity);
err = bpf_tc_attach(&qdisc_hook, &tc_attach_enc);
if (!ASSERT_OK(err, "attach encrypt filter"))
goto fail;
sockfd = socket(AF_INET6, SOCK_DGRAM, 0);
if (!ASSERT_NEQ(sockfd, -1, "encrypt socket"))
goto fail;
err = sendto(sockfd, plain_text, sizeof(plain_text), 0, (void *)&addr, addrlen);
close(sockfd);
if (!ASSERT_EQ(err, sizeof(plain_text), "encrypt send"))
goto fail;
do_crypt_afalg(plain_text, afalg_dst, sizeof(afalg_dst), true);
if (!ASSERT_OK(skel->bss->status, "encrypt status"))
goto fail;
if (!ASSERT_STRNEQ(skel->bss->dst, afalg_dst, sizeof(afalg_dst), "encrypt AF_ALG"))
goto fail;
tc_attach_enc.flags = tc_attach_enc.prog_fd = tc_attach_enc.prog_id = 0;
err = bpf_tc_detach(&qdisc_hook, &tc_attach_enc);
if (!ASSERT_OK(err, "bpf_tc_detach encrypt"))
goto fail;
tc_attach_dec.prog_fd = bpf_program__fd(skel->progs.decrypt_sanity);
err = bpf_tc_attach(&qdisc_hook, &tc_attach_dec);
if (!ASSERT_OK(err, "attach decrypt filter"))
goto fail;
sockfd = socket(AF_INET6, SOCK_DGRAM, 0);
if (!ASSERT_NEQ(sockfd, -1, "decrypt socket"))
goto fail;
err = sendto(sockfd, afalg_dst, sizeof(afalg_dst), 0, (void *)&addr, addrlen);
close(sockfd);
if (!ASSERT_EQ(err, sizeof(afalg_dst), "decrypt send"))
goto fail;
do_crypt_afalg(afalg_dst, afalg_plain, sizeof(afalg_plain), false);
if (!ASSERT_OK(skel->bss->status, "decrypt status"))
goto fail;
if (!ASSERT_STRNEQ(skel->bss->dst, afalg_plain, sizeof(afalg_plain), "decrypt AF_ALG"))
goto fail;
tc_attach_dec.flags = tc_attach_dec.prog_fd = tc_attach_dec.prog_id = 0;
err = bpf_tc_detach(&qdisc_hook, &tc_attach_dec);
ASSERT_OK(err, "bpf_tc_detach decrypt");
fail:
close_netns(nstoken);
deinit_afalg();
SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null");
crypto_sanity__destroy(skel);
}


@@ -98,7 +98,8 @@ done:
static void test_dummy_multiple_args(void)
{
	struct bpf_dummy_ops_state st = { 7 };
	__u64 args[5] = {(__u64)&st, -100, 0x8a5f, 'c', 0x1234567887654321ULL};
	LIBBPF_OPTS(bpf_test_run_opts, attr,
		.ctx_in = args,
		.ctx_size_in = sizeof(args),

@@ -115,6 +116,7 @@ static void test_dummy_multiple_args(void)
	fd = bpf_program__fd(skel->progs.test_2);
	err = bpf_prog_test_run_opts(fd, &attr);
	ASSERT_OK(err, "test_run");
	args[0] = 7;
	for (i = 0; i < ARRAY_SIZE(args); i++) {
		snprintf(name, sizeof(name), "arg %zu", i);
		ASSERT_EQ(skel->bss->test_2_args[i], args[i], name);

@@ -125,7 +127,8 @@ static void test_dummy_multiple_args(void)
static void test_dummy_sleepable(void)
{
	struct bpf_dummy_ops_state st;
	__u64 args[1] = {(__u64)&st};
	LIBBPF_OPTS(bpf_test_run_opts, attr,
		.ctx_in = args,
		.ctx_size_in = sizeof(args),

@@ -144,6 +147,31 @@ static void test_dummy_sleepable(void)
	dummy_st_ops_success__destroy(skel);
}
/* dummy_st_ops.test_sleepable() parameter is not marked as nullable,
* thus bpf_prog_test_run_opts() below should be rejected as it tries
* to pass NULL for this parameter.
*/
static void test_dummy_sleepable_reject_null(void)
{
__u64 args[1] = {0};
LIBBPF_OPTS(bpf_test_run_opts, attr,
.ctx_in = args,
.ctx_size_in = sizeof(args),
);
struct dummy_st_ops_success *skel;
int fd, err;
skel = dummy_st_ops_success__open_and_load();
if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load"))
return;
fd = bpf_program__fd(skel->progs.test_sleepable);
err = bpf_prog_test_run_opts(fd, &attr);
ASSERT_EQ(err, -EINVAL, "test_run");
dummy_st_ops_success__destroy(skel);
}
void test_dummy_st_ops(void)
{
	if (test__start_subtest("dummy_st_ops_attach"))

@@ -156,6 +184,8 @@ void test_dummy_st_ops(void)
		test_dummy_multiple_args();
	if (test__start_subtest("dummy_sleepable"))
		test_dummy_sleepable();
	if (test__start_subtest("dummy_sleepable_reject_null"))
		test_dummy_sleepable_reject_null();

	RUN_TESTS(dummy_st_ops_fail);
}
