Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit b3fce974d4 by Jakub Kicinski, 2022-07-22 16:55:43 -07:00
88 changed files with 3461 additions and 863 deletions


@@ -369,7 +369,8 @@ No additional type data follow ``btf_type``.
 * ``name_off``: offset to a valid C identifier
 * ``info.kind_flag``: 0
 * ``info.kind``: BTF_KIND_FUNC
-* ``info.vlen``: 0
+* ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL
+  or BTF_FUNC_EXTERN)
 * ``type``: a BTF_KIND_FUNC_PROTO type

 No additional type data follow ``btf_type``.

@@ -380,6 +381,9 @@ type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
 :ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
 (ABI).

+Currently, only linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL are
+supported in the kernel.
+
 2.2.13 BTF_KIND_FUNC_PROTO
 ~~~~~~~~~~~~~~~~~~~~~~~~~~


@@ -19,6 +19,7 @@ that goes into great technical depth about the BPF Architecture.
 faq
 syscall_api
 helpers
+kfuncs
 programs
 maps
 bpf_prog_run


@ -0,0 +1,170 @@
=============================
BPF Kernel Functions (kfuncs)
=============================
1. Introduction
===============
BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel.
2. Defining a kfunc
===================
There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be per program type.
If you are not creating a BPF wrapper for an existing kernel function, skip ahead
to :ref:`BPF_kfunc_nodef`.
2.1 Creating a wrapper kfunc
----------------------------
When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.
An example is given below::
/* Disables missing prototype warnings */
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"Global kfuncs as their definitions will be in BTF");
struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
{
return find_get_task_by_vpid(nr);
}
__diag_pop();
A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.
2.2 Annotating kfunc parameters
-------------------------------
Similar to BPF helpers, there is sometimes a need for additional context required
by the verifier to make the usage of kernel functions safer and more useful.
Hence, we can annotate a parameter by suffixing the name of the argument of the
kfunc with a __tag, where tag may be one of the supported annotations.
2.2.1 __sz Annotation
---------------------
This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::
void bpf_memzero(void *mem, int mem__sz)
{
...
}
Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without __sz annotation, the size of the type
of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
pointer.
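
For reference, a minimal sketch of the BPF program side is shown below. It
assumes the hypothetical bpf_memzero() kfunc above has been registered for the
program type in use; the verifier then checks that the buffer passed as mem
points to valid memory of at least mem__sz bytes::

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Declaration of the hypothetical kfunc for use from BPF C */
extern void bpf_memzero(void *mem, int mem__sz) __ksym;

SEC("tc")
int zero_buf(struct __sk_buff *skb)
{
    char buf[16] = {};

    /* buf is a 16-byte buffer, so any mem__sz up to 16 is accepted */
    bpf_memzero(buf, sizeof(buf));
    return 0;
}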
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
-------------------------------------
When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.
2.4 Annotating kfuncs
---------------------
In addition to kfuncs' arguments, the verifier may need more information about the
type of kfunc(s) being registered with the BPF subsystem. To do so, we define
flags on a set of kfuncs as follows::
BTF_SET8_START(bpf_task_set)
BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
BTF_SET8_END(bpf_task_set)
This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.
2.4.1 KF_ACQUIRE flag
---------------------
The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier fails the
loading of the BPF program until no lingering references remain in all possible
explored states of the program.
2.4.2 KF_RET_NULL flag
----------------------
The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often used in combination with the KF_ACQUIRE
flag, but the two are orthogonal to each other.
2.4.3 KF_RELEASE flag
---------------------
The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in.
All copies of the pointer being released are invalidated as a result of invoking
kfunc with this flag.
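
As a rough sketch of how KF_ACQUIRE, KF_RET_NULL and KF_RELEASE interact from
the BPF program side, consider the hypothetical bpf_get_task_pid() and
bpf_put_pid() kfuncs from the set above (declarations and section name are
illustrative)::

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct pid *bpf_get_task_pid(struct task_struct *task, int type) __ksym;
extern void bpf_put_pid(struct pid *pid) __ksym;

SEC("tp_btf/task_newtask")
int BPF_PROG(on_newtask, struct task_struct *task, u64 clone_flags)
{
    struct pid *pid;

    pid = bpf_get_task_pid(task, 0); /* KF_ACQUIRE | KF_RET_NULL */
    if (!pid)                        /* NULL check required by KF_RET_NULL */
        return 0;

    /* ... use pid ... */

    bpf_put_pid(pid);                /* KF_RELEASE: hand the reference back */
    return 0;
}

Omitting the NULL check or the final bpf_put_pid() call makes the verifier
reject the program.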
2.4.4 KF_KPTR_GET flag
----------------------
The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
KF_ACQUIRE and KF_RET_NULL flags.
2.4.5 KF_TRUSTED_ARGS flag
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments will always be refcounted, and have
their offset set to 0. It can be used to enforce that a pointer to a refcounted
object acquired from a kfunc or BPF helper is passed as an argument to this
kfunc without any modifications (e.g. pointer arithmetic) such that it is
trusted and points to the original object. This flag is often used for kfuncs
that operate (change some property, perform some operation) on an object that
was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to
ensure the integrity of the operation being performed on the expected object.
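
For illustration, assume a hypothetical acquire kfunc alloc_foo() and a
KF_TRUSTED_ARGS kfunc set_foo_data() (the same names used in the comment added
to the btf.h header later in this change). The verifier then roughly behaves
as follows::

struct foo *f = alloc_foo();   /* acquire kfunc: referenced pointer, offset 0 */

set_foo_data(f, 42);           /* allowed: unmodified acquired pointer */
set_foo_data(f->next, 42);     /* rejected: non-referenced pointer */
set_foo_data(&f->next, 42);    /* rejected: referenced, but wrong type */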
2.5 Registering the kfuncs
--------------------------
Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::
BTF_SET8_START(bpf_task_set)
BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
BTF_SET8_END(bpf_task_set)
static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
.owner = THIS_MODULE,
.set = &bpf_task_set,
};
static int init_subsystem(void)
{
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
}
late_initcall(init_subsystem);


@ -0,0 +1,185 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
===============================================
BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
===============================================
.. note::
- ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19
- ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6
- Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
were introduced in version 4.10
``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general
purpose hash map storage. Both the key and the value can be structs,
allowing for composite keys and values.
The kernel is responsible for allocating and freeing key/value pairs, up
to the max_entries limit that you specify. Hash maps use pre-allocation
of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be
used to disable pre-allocation when it is too memory expensive.
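
A sketch of a libbpf-style map definition with pre-allocation disabled is shown
below (map name, sizes and types are illustrative):

.. code-block:: c

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __uint(map_flags, BPF_F_NO_PREALLOC); /* allocate elements on demand */
    __type(key, __u32);
    __type(value, __u64);
} on_demand_hash SEC(".maps");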
``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per
CPU. The per-cpu values are stored internally in an array.
The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
variants add LRU semantics to their respective hash tables. An LRU hash
will automatically evict the least recently used entries when the hash
table reaches capacity. An LRU hash maintains an internal LRU list that
is used to select elements for eviction. This internal LRU list is
shared across CPUs but it is possible to request a per CPU LRU list with
the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
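
As a sketch, requesting per-CPU LRU lists when creating such a map directly
through libbpf could look like this (map name and sizes are illustrative):

.. code-block:: c

#include <linux/bpf.h>
#include <bpf/bpf.h>

int create_lru_hash(void)
{
    LIBBPF_OPTS(bpf_map_create_opts, opts,
                .map_flags = BPF_F_NO_COMMON_LRU);

    /* one LRU list per CPU instead of a single shared list */
    return bpf_map_create(BPF_MAP_TYPE_LRU_HASH, "lru_hash",
                          sizeof(__u32), sizeof(__u64), 1024, &opts);
}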
Usage
=====
.. c:function::
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
Hash entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``flags``
parameter can be used to control the update behaviour:
- ``BPF_ANY`` will create a new element or update an existing element
- ``BPF_NOEXIST`` will create a new element only if one did not already
exist
- ``BPF_EXIST`` will update an existing element
``bpf_map_update_elem()`` returns 0 on success, or negative error in
case of failure.
.. c:function::
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
helper. This helper returns a pointer to the value associated with
``key``, or ``NULL`` if no entry was found.
.. c:function::
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
Hash entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case
of failure.
Per CPU Hashes
--------------
For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
automatically access the hash slot for the current CPU.
.. c:function::
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
The ``bpf_map_lookup_percpu_elem()`` helper can be used to look up the
value in the hash slot for a specific CPU. It returns the value associated
with ``key`` on ``cpu``, or ``NULL`` if no entry was found or ``cpu`` is
invalid.
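
As a sketch, a BPF program can sum the per-CPU slots by looking up each CPU in
turn. This assumes a ``BPF_MAP_TYPE_PERCPU_HASH`` map named ``percpu_stats``
that uses the ``struct key`` and ``struct value`` types shown in the Examples
section below:

.. code-block:: c

#define MAX_CPUS 128 /* illustrative upper bound for the loop */

static __u64 total_packets(__u32 srcip)
{
    struct key key = { .srcip = srcip };
    __u64 total = 0;
    int cpu;

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        struct value *v;

        v = bpf_map_lookup_percpu_elem(&percpu_stats, &key, cpu);
        if (v)
            total += v->packets;
    }
    return total;
}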
Concurrency
-----------
Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by
programs running on different CPUs. Since kernel version 5.1, the BPF
infrastructure provides ``struct bpf_spin_lock`` to synchronise access.
See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
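
A minimal sketch of guarding a hash map value with a spin lock is shown below
(the value layout and map are illustrative; the selftest above is the complete
reference):

.. code-block:: c

struct locked_value {
    struct bpf_spin_lock lock;
    __u64 counter;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, struct locked_value);
} counters SEC(".maps");

static void bump_counter(__u32 key)
{
    struct locked_value *v = bpf_map_lookup_elem(&counters, &key);

    if (!v)
        return;
    bpf_spin_lock(&v->lock); /* serialises concurrent updates */
    v->counter++;
    bpf_spin_unlock(&v->lock);
}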
Userspace
---------
.. c:function::
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
In userspace, it is possible to iterate through the keys of a hash using
libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by
calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
``NULL``. Subsequent calls will fetch the next key that follows the
current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if
cur_key is the last key in the hash, or negative error in case of
failure.
Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()``
will instead return the *first* key in the hash table, which is
undesirable. It is recommended to use batched lookup if there is going
to be key deletion intermixed with ``bpf_map_get_next_key()``.
Examples
========
Please see the ``tools/testing/selftests/bpf`` directory for functional
examples. The code snippets below demonstrate API usage.
This example shows how to declare an LRU Hash with a struct key and a
struct value.
.. code-block:: c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct key {
__u32 srcip;
};
struct value {
__u64 packets;
__u64 bytes;
};
struct {
__uint(type, BPF_MAP_TYPE_LRU_HASH);
__uint(max_entries, 32);
__type(key, struct key);
__type(value, struct value);
} packet_stats SEC(".maps");
This example shows how to create or update hash values using atomic
instructions:
.. code-block:: c
static void update_stats(__u32 srcip, int bytes)
{
struct key key = {
.srcip = srcip,
};
struct value *value = bpf_map_lookup_elem(&packet_stats, &key);
if (value) {
__sync_fetch_and_add(&value->packets, 1);
__sync_fetch_and_add(&value->bytes, bytes);
} else {
struct value newval = { 1, bytes };
bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST);
}
}
The following shows a userspace program walking the elements of the map declared above:
.. code-block:: c
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
static void walk_hash_elements(int map_fd)
{
struct key *cur_key = NULL;
struct key next_key;
struct value value;
int err;
for (;;) {
err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
if (err)
break;
bpf_map_lookup_elem(map_fd, &next_key, &value);
// Use key and value here
cur_key = &next_key;
}
}


@ -510,6 +510,9 @@ u32 aarch64_insn_gen_load_store_imm(enum aarch64_insn_register reg,
unsigned int imm, unsigned int imm,
enum aarch64_insn_size_type size, enum aarch64_insn_size_type size,
enum aarch64_insn_ldst_type type); enum aarch64_insn_ldst_type type);
u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr,
enum aarch64_insn_register reg,
bool is64bit);
u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1, u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
enum aarch64_insn_register reg2, enum aarch64_insn_register reg2,
enum aarch64_insn_register base, enum aarch64_insn_register base,


@ -323,7 +323,7 @@ static u32 aarch64_insn_encode_ldst_size(enum aarch64_insn_size_type type,
return insn; return insn;
} }
static inline long branch_imm_common(unsigned long pc, unsigned long addr, static inline long label_imm_common(unsigned long pc, unsigned long addr,
long range) long range)
{ {
long offset; long offset;
@ -354,7 +354,7 @@ u32 __kprobes aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr,
* ARM64 virtual address arrangement guarantees all kernel and module * ARM64 virtual address arrangement guarantees all kernel and module
* texts are within +/-128M. * texts are within +/-128M.
*/ */
offset = branch_imm_common(pc, addr, SZ_128M); offset = label_imm_common(pc, addr, SZ_128M);
if (offset >= SZ_128M) if (offset >= SZ_128M)
return AARCH64_BREAK_FAULT; return AARCH64_BREAK_FAULT;
@ -382,7 +382,7 @@ u32 aarch64_insn_gen_comp_branch_imm(unsigned long pc, unsigned long addr,
u32 insn; u32 insn;
long offset; long offset;
offset = branch_imm_common(pc, addr, SZ_1M); offset = label_imm_common(pc, addr, SZ_1M);
if (offset >= SZ_1M) if (offset >= SZ_1M)
return AARCH64_BREAK_FAULT; return AARCH64_BREAK_FAULT;
@ -421,7 +421,7 @@ u32 aarch64_insn_gen_cond_branch_imm(unsigned long pc, unsigned long addr,
u32 insn; u32 insn;
long offset; long offset;
offset = branch_imm_common(pc, addr, SZ_1M); offset = label_imm_common(pc, addr, SZ_1M);
insn = aarch64_insn_get_bcond_value(); insn = aarch64_insn_get_bcond_value();
@ -543,6 +543,28 @@ u32 aarch64_insn_gen_load_store_imm(enum aarch64_insn_register reg,
return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm); return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm);
} }
u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr,
enum aarch64_insn_register reg,
bool is64bit)
{
u32 insn;
long offset;
offset = label_imm_common(pc, addr, SZ_1M);
if (offset >= SZ_1M)
return AARCH64_BREAK_FAULT;
insn = aarch64_insn_get_ldr_lit_value();
if (is64bit)
insn |= BIT(30);
insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn, reg);
return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_19, insn,
offset >> 2);
}
u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1, u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
enum aarch64_insn_register reg2, enum aarch64_insn_register reg2,
enum aarch64_insn_register base, enum aarch64_insn_register base,


@ -80,6 +80,12 @@
#define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE) #define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE)
#define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD) #define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD)
/* LDR (literal) */
#define A64_LDR32LIT(Wt, offset) \
aarch64_insn_gen_load_literal(0, offset, Wt, false)
#define A64_LDR64LIT(Xt, offset) \
aarch64_insn_gen_load_literal(0, offset, Xt, true)
/* Load/store register pair */ /* Load/store register pair */
#define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \ #define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \
aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \ aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \
@ -270,6 +276,7 @@
#define A64_BTI_C A64_HINT(AARCH64_INSN_HINT_BTIC) #define A64_BTI_C A64_HINT(AARCH64_INSN_HINT_BTIC)
#define A64_BTI_J A64_HINT(AARCH64_INSN_HINT_BTIJ) #define A64_BTI_J A64_HINT(AARCH64_INSN_HINT_BTIJ)
#define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC) #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
#define A64_NOP A64_HINT(AARCH64_INSN_HINT_NOP)
/* DMB */ /* DMB */
#define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH) #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)


@ -10,6 +10,7 @@
#include <linux/bitfield.h> #include <linux/bitfield.h>
#include <linux/bpf.h> #include <linux/bpf.h>
#include <linux/filter.h> #include <linux/filter.h>
#include <linux/memory.h>
#include <linux/printk.h> #include <linux/printk.h>
#include <linux/slab.h> #include <linux/slab.h>
@ -18,6 +19,7 @@
#include <asm/cacheflush.h> #include <asm/cacheflush.h>
#include <asm/debug-monitors.h> #include <asm/debug-monitors.h>
#include <asm/insn.h> #include <asm/insn.h>
#include <asm/patching.h>
#include <asm/set_memory.h> #include <asm/set_memory.h>
#include "bpf_jit.h" #include "bpf_jit.h"
@ -78,6 +80,15 @@ struct jit_ctx {
int fpb_offset; int fpb_offset;
}; };
struct bpf_plt {
u32 insn_ldr; /* load target */
u32 insn_br; /* branch to target */
u64 target; /* target value */
};
#define PLT_TARGET_SIZE sizeof_field(struct bpf_plt, target)
#define PLT_TARGET_OFFSET offsetof(struct bpf_plt, target)
static inline void emit(const u32 insn, struct jit_ctx *ctx) static inline void emit(const u32 insn, struct jit_ctx *ctx)
{ {
if (ctx->image != NULL) if (ctx->image != NULL)
@ -140,6 +151,12 @@ static inline void emit_a64_mov_i64(const int reg, const u64 val,
} }
} }
static inline void emit_bti(u32 insn, struct jit_ctx *ctx)
{
if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
emit(insn, ctx);
}
/* /*
* Kernel addresses in the vmalloc space use at most 48 bits, and the * Kernel addresses in the vmalloc space use at most 48 bits, and the
* remaining bits are guaranteed to be 0x1. So we can compose the address * remaining bits are guaranteed to be 0x1. So we can compose the address
@ -159,6 +176,14 @@ static inline void emit_addr_mov_i64(const int reg, const u64 val,
} }
} }
static inline void emit_call(u64 target, struct jit_ctx *ctx)
{
u8 tmp = bpf2a64[TMP_REG_1];
emit_addr_mov_i64(tmp, target, ctx);
emit(A64_BLR(tmp), ctx);
}
static inline int bpf2a64_offset(int bpf_insn, int off, static inline int bpf2a64_offset(int bpf_insn, int off,
const struct jit_ctx *ctx) const struct jit_ctx *ctx)
{ {
@ -235,13 +260,30 @@ static bool is_lsi_offset(int offset, int scale)
return true; return true;
} }
/* generated prologue:
* bti c // if CONFIG_ARM64_BTI_KERNEL
* mov x9, lr
* nop // POKE_OFFSET
* paciasp // if CONFIG_ARM64_PTR_AUTH_KERNEL
* stp x29, lr, [sp, #-16]!
* mov x29, sp
* stp x19, x20, [sp, #-16]!
* stp x21, x22, [sp, #-16]!
* stp x25, x26, [sp, #-16]!
* stp x27, x28, [sp, #-16]!
* mov x25, sp
* mov tcc, #0
* // PROLOGUE_OFFSET
*/
#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
/* Offset of nop instruction in bpf prog entry to be poked */
#define POKE_OFFSET (BTI_INSNS + 1)
/* Tail call offset to jump into */ /* Tail call offset to jump into */
#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \ #define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8)
IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
#define PROLOGUE_OFFSET 9
#else
#define PROLOGUE_OFFSET 8
#endif
static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
{ {
@ -280,12 +322,14 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
* *
*/ */
emit_bti(A64_BTI_C, ctx);
emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
emit(A64_NOP, ctx);
/* Sign lr */ /* Sign lr */
if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)) if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
emit(A64_PACIASP, ctx); emit(A64_PACIASP, ctx);
/* BTI landing pad */
else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
emit(A64_BTI_C, ctx);
/* Save FP and LR registers to stay align with ARM64 AAPCS */ /* Save FP and LR registers to stay align with ARM64 AAPCS */
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx); emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
@ -312,8 +356,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
} }
/* BTI landing pad for the tail call, done with a BR */ /* BTI landing pad for the tail call, done with a BR */
if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) emit_bti(A64_BTI_J, ctx);
emit(A64_BTI_J, ctx);
} }
emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx); emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx);
@ -557,6 +600,53 @@ static int emit_ll_sc_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
return 0; return 0;
} }
void dummy_tramp(void);
asm (
" .pushsection .text, \"ax\", @progbits\n"
" .global dummy_tramp\n"
" .type dummy_tramp, %function\n"
"dummy_tramp:"
#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)
" bti j\n" /* dummy_tramp is called via "br x10" */
#endif
" mov x10, x30\n"
" mov x30, x9\n"
" ret x10\n"
" .size dummy_tramp, .-dummy_tramp\n"
" .popsection\n"
);
/* build a plt initialized like this:
*
* plt:
* ldr tmp, target
* br tmp
* target:
* .quad dummy_tramp
*
* when a long jump trampoline is attached, target is filled with the
* trampoline address, and when the trampoline is removed, target is
* restored to dummy_tramp address.
*/
static void build_plt(struct jit_ctx *ctx)
{
const u8 tmp = bpf2a64[TMP_REG_1];
struct bpf_plt *plt = NULL;
/* make sure target is 64-bit aligned */
if ((ctx->idx + PLT_TARGET_OFFSET / AARCH64_INSN_SIZE) % 2)
emit(A64_NOP, ctx);
plt = (struct bpf_plt *)(ctx->image + ctx->idx);
/* plt is called via bl, no BTI needed here */
emit(A64_LDR64LIT(tmp, 2 * AARCH64_INSN_SIZE), ctx);
emit(A64_BR(tmp), ctx);
if (ctx->image)
plt->target = (u64)&dummy_tramp;
}
static void build_epilogue(struct jit_ctx *ctx) static void build_epilogue(struct jit_ctx *ctx)
{ {
const u8 r0 = bpf2a64[BPF_REG_0]; const u8 r0 = bpf2a64[BPF_REG_0];
@ -991,8 +1081,7 @@ emit_cond_jmp:
&func_addr, &func_addr_fixed); &func_addr, &func_addr_fixed);
if (ret < 0) if (ret < 0)
return ret; return ret;
emit_addr_mov_i64(tmp, func_addr, ctx); emit_call(func_addr, ctx);
emit(A64_BLR(tmp), ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx); emit(A64_MOV(1, r0, A64_R(0)), ctx);
break; break;
} }
@ -1336,6 +1425,13 @@ static int validate_code(struct jit_ctx *ctx)
if (a64_insn == AARCH64_BREAK_FAULT) if (a64_insn == AARCH64_BREAK_FAULT)
return -1; return -1;
} }
return 0;
}
static int validate_ctx(struct jit_ctx *ctx)
{
if (validate_code(ctx))
return -1;
if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries)) if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries))
return -1; return -1;
@ -1356,7 +1452,7 @@ struct arm64_jit_data {
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
{ {
int image_size, prog_size, extable_size; int image_size, prog_size, extable_size, extable_align, extable_offset;
struct bpf_prog *tmp, *orig_prog = prog; struct bpf_prog *tmp, *orig_prog = prog;
struct bpf_binary_header *header; struct bpf_binary_header *header;
struct arm64_jit_data *jit_data; struct arm64_jit_data *jit_data;
@ -1426,13 +1522,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
ctx.epilogue_offset = ctx.idx; ctx.epilogue_offset = ctx.idx;
build_epilogue(&ctx); build_epilogue(&ctx);
build_plt(&ctx);
extable_align = __alignof__(struct exception_table_entry);
extable_size = prog->aux->num_exentries * extable_size = prog->aux->num_exentries *
sizeof(struct exception_table_entry); sizeof(struct exception_table_entry);
/* Now we know the actual image size. */ /* Now we know the actual image size. */
prog_size = sizeof(u32) * ctx.idx; prog_size = sizeof(u32) * ctx.idx;
image_size = prog_size + extable_size; /* also allocate space for plt target */
extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align);
image_size = extable_offset + extable_size;
header = bpf_jit_binary_alloc(image_size, &image_ptr, header = bpf_jit_binary_alloc(image_size, &image_ptr,
sizeof(u32), jit_fill_hole); sizeof(u32), jit_fill_hole);
if (header == NULL) { if (header == NULL) {
@ -1444,7 +1544,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
ctx.image = (__le32 *)image_ptr; ctx.image = (__le32 *)image_ptr;
if (extable_size) if (extable_size)
prog->aux->extable = (void *)image_ptr + prog_size; prog->aux->extable = (void *)image_ptr + extable_offset;
skip_init_ctx: skip_init_ctx:
ctx.idx = 0; ctx.idx = 0;
ctx.exentry_idx = 0; ctx.exentry_idx = 0;
@ -1458,9 +1558,10 @@ skip_init_ctx:
} }
build_epilogue(&ctx); build_epilogue(&ctx);
build_plt(&ctx);
/* 3. Extra pass to validate JITed code. */ /* 3. Extra pass to validate JITed code. */
if (validate_code(&ctx)) { if (validate_ctx(&ctx)) {
bpf_jit_binary_free(header); bpf_jit_binary_free(header);
prog = orig_prog; prog = orig_prog;
goto out_off; goto out_off;
@ -1537,3 +1638,583 @@ bool bpf_jit_supports_subprog_tailcalls(void)
{ {
return true; return true;
} }
static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
int args_off, int retval_off, int run_ctx_off,
bool save_ret)
{
u32 *branch;
u64 enter_prog;
u64 exit_prog;
struct bpf_prog *p = l->link.prog;
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
if (p->aux->sleepable) {
enter_prog = (u64)__bpf_prog_enter_sleepable;
exit_prog = (u64)__bpf_prog_exit_sleepable;
} else {
enter_prog = (u64)__bpf_prog_enter;
exit_prog = (u64)__bpf_prog_exit;
}
if (l->cookie == 0) {
/* if cookie is zero, one instruction is enough to store it */
emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx);
} else {
emit_a64_mov_i64(A64_R(10), l->cookie, ctx);
emit(A64_STR64I(A64_R(10), A64_SP, run_ctx_off + cookie_off),
ctx);
}
/* save p to callee saved register x19 to avoid loading p with mov_i64
* each time.
*/
emit_addr_mov_i64(A64_R(19), (const u64)p, ctx);
/* arg1: prog */
emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
/* arg2: &run_ctx */
emit(A64_ADD_I(1, A64_R(1), A64_SP, run_ctx_off), ctx);
emit_call(enter_prog, ctx);
/* if (__bpf_prog_enter(prog) == 0)
* goto skip_exec_of_prog;
*/
branch = ctx->image + ctx->idx;
emit(A64_NOP, ctx);
/* save return value to callee saved register x20 */
emit(A64_MOV(1, A64_R(20), A64_R(0)), ctx);
emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
if (!p->jited)
emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
emit_call((const u64)p->bpf_func, ctx);
if (save_ret)
emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
if (ctx->image) {
int offset = &ctx->image[ctx->idx] - branch;
*branch = A64_CBZ(1, A64_R(0), offset);
}
/* arg1: prog */
emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
/* arg2: start time */
emit(A64_MOV(1, A64_R(1), A64_R(20)), ctx);
/* arg3: &run_ctx */
emit(A64_ADD_I(1, A64_R(2), A64_SP, run_ctx_off), ctx);
emit_call(exit_prog, ctx);
}
static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
int args_off, int retval_off, int run_ctx_off,
u32 **branches)
{
int i;
/* The first fmod_ret program will receive a garbage return value.
* Set this to 0 to avoid confusing the program.
*/
emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
for (i = 0; i < tl->nr_links; i++) {
invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
run_ctx_off, true);
/* if (*(u64 *)(sp + retval_off) != 0)
* goto do_fexit;
*/
emit(A64_LDR64I(A64_R(10), A64_SP, retval_off), ctx);
/* Save the location of branch, and generate a nop.
* This nop will be replaced with a cbnz later.
*/
branches[i] = ctx->image + ctx->idx;
emit(A64_NOP, ctx);
}
}
static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
{
int i;
for (i = 0; i < nargs; i++) {
emit(A64_STR64I(i, A64_SP, args_off), ctx);
args_off += 8;
}
}
static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
{
int i;
for (i = 0; i < nargs; i++) {
emit(A64_LDR64I(i, A64_SP, args_off), ctx);
args_off += 8;
}
}
/* Based on the x86's implementation of arch_prepare_bpf_trampoline().
*
* bpf prog and function entry before bpf trampoline hooked:
* mov x9, lr
* nop
*
* bpf prog and function entry after bpf trampoline hooked:
* mov x9, lr
* bl <bpf_trampoline or plt>
*
*/
static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
struct bpf_tramp_links *tlinks, void *orig_call,
int nargs, u32 flags)
{
int i;
int stack_size;
int retaddr_off;
int regs_off;
int retval_off;
int args_off;
int nargs_off;
int ip_off;
int run_ctx_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
bool save_ret;
u32 **branches = NULL;
/* trampoline stack layout:
* [ parent ip ]
* [ FP ]
* SP + retaddr_off [ self ip ]
* [ FP ]
*
* [ padding ] align SP to multiples of 16
*
* [ x20 ] callee saved reg x20
* SP + regs_off [ x19 ] callee saved reg x19
*
* SP + retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
* BPF_TRAMP_F_RET_FENTRY_RET
*
* [ argN ]
* [ ... ]
* SP + args_off [ arg1 ]
*
* SP + nargs_off [ args count ]
*
* SP + ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
*
* SP + run_ctx_off [ bpf_tramp_run_ctx ]
*/
stack_size = 0;
run_ctx_off = stack_size;
/* room for bpf_tramp_run_ctx */
stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
ip_off = stack_size;
/* room for IP address argument */
if (flags & BPF_TRAMP_F_IP_ARG)
stack_size += 8;
nargs_off = stack_size;
/* room for args count */
stack_size += 8;
args_off = stack_size;
/* room for args */
stack_size += nargs * 8;
/* room for return value */
retval_off = stack_size;
save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
if (save_ret)
stack_size += 8;
/* room for callee saved registers, currently x19 and x20 are used */
regs_off = stack_size;
stack_size += 16;
/* round up to multiples of 16 to avoid SPAlignmentFault */
stack_size = round_up(stack_size, 16);
/* return address locates above FP */
retaddr_off = stack_size + 8;
/* bpf trampoline may be invoked by 3 instruction types:
* 1. bl, attached to bpf prog or kernel function via short jump
* 2. br, attached to bpf prog or kernel function via long jump
* 3. blr, working as a function pointer, used by struct_ops.
* So BTI_JC should be used here to support both br and blr.
*/
emit_bti(A64_BTI_JC, ctx);
/* frame for parent function */
emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
/* frame for patched function */
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
/* allocate stack space */
emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
if (flags & BPF_TRAMP_F_IP_ARG) {
/* save ip address of the traced function */
emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx);
emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
}
/* save args count*/
emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
/* save args */
save_args(ctx, args_off, nargs);
/* save callee saved registers */
emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
if (flags & BPF_TRAMP_F_CALL_ORIG) {
emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
emit_call((const u64)__bpf_tramp_enter, ctx);
}
for (i = 0; i < fentry->nr_links; i++)
invoke_bpf_prog(ctx, fentry->links[i], args_off,
retval_off, run_ctx_off,
flags & BPF_TRAMP_F_RET_FENTRY_RET);
if (fmod_ret->nr_links) {
branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *),
GFP_KERNEL);
if (!branches)
return -ENOMEM;
invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
run_ctx_off, branches);
}
if (flags & BPF_TRAMP_F_CALL_ORIG) {
restore_args(ctx, args_off, nargs);
/* call original func */
emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
emit(A64_BLR(A64_R(10)), ctx);
/* store return value */
emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
/* reserve a nop for bpf_tramp_image_put */
im->ip_after_call = ctx->image + ctx->idx;
emit(A64_NOP, ctx);
}
/* update the branches saved in invoke_bpf_mod_ret with cbnz */
for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
int offset = &ctx->image[ctx->idx] - branches[i];
*branches[i] = A64_CBNZ(1, A64_R(10), offset);
}
for (i = 0; i < fexit->nr_links; i++)
invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
run_ctx_off, false);
if (flags & BPF_TRAMP_F_CALL_ORIG) {
im->ip_epilogue = ctx->image + ctx->idx;
emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
emit_call((const u64)__bpf_tramp_exit, ctx);
}
if (flags & BPF_TRAMP_F_RESTORE_REGS)
restore_args(ctx, args_off, nargs);
/* restore callee saved register x19 and x20 */
emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
emit(A64_LDR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
if (save_ret)
emit(A64_LDR64I(A64_R(0), A64_SP, retval_off), ctx);
/* reset SP */
emit(A64_MOV(1, A64_SP, A64_FP), ctx);
/* pop frames */
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
if (flags & BPF_TRAMP_F_SKIP_FRAME) {
/* skip patched function, return to parent */
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
emit(A64_RET(A64_R(9)), ctx);
} else {
/* return to patched function */
emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
emit(A64_RET(A64_R(10)), ctx);
}
if (ctx->image)
bpf_flush_icache(ctx->image, ctx->image + ctx->idx);
kfree(branches);
return ctx->idx;
}
int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
void *image_end, const struct btf_func_model *m,
u32 flags, struct bpf_tramp_links *tlinks,
void *orig_call)
{
int ret;
int nargs = m->nr_args;
int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
struct jit_ctx ctx = {
.image = NULL,
.idx = 0,
};
/* the first 8 arguments are passed by registers */
if (nargs > 8)
return -ENOTSUPP;
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
if (ret < 0)
return ret;
if (ret > max_insns)
return -EFBIG;
ctx.image = image;
ctx.idx = 0;
jit_fill_hole(image, (unsigned int)(image_end - image));
ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
if (ret > 0 && validate_code(&ctx) < 0)
ret = -EINVAL;
if (ret > 0)
ret *= AARCH64_INSN_SIZE;
return ret;
}
static bool is_long_jump(void *ip, void *target)
{
long offset;
/* NULL target means this is a NOP */
if (!target)
return false;
offset = (long)target - (long)ip;
return offset < -SZ_128M || offset >= SZ_128M;
}
static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
void *addr, void *plt, u32 *insn)
{
void *target;
if (!addr) {
*insn = aarch64_insn_gen_nop();
return 0;
}
if (is_long_jump(ip, addr))
target = plt;
else
target = addr;
*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
(unsigned long)target,
type);
return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
}
/* Replace the branch instruction from @ip to @old_addr in a bpf prog or a bpf
* trampoline with the branch instruction from @ip to @new_addr. If @old_addr
* or @new_addr is NULL, the old or new instruction is NOP.
*
* When @ip is the bpf prog entry, a bpf trampoline is being attached or
* detached. Since bpf trampoline and bpf prog are allocated separately with
* vmalloc, the address distance may exceed 128MB, the maximum branch range.
* So long jump should be handled.
*
* When a bpf prog is constructed, a plt pointing to empty trampoline
* dummy_tramp is placed at the end:
*
* bpf_prog:
* mov x9, lr
* nop // patchsite
* ...
* ret
*
* plt:
* ldr x10, target
* br x10
* target:
* .quad dummy_tramp // plt target
*
* This is also the state when no trampoline is attached.
*
* When a short-jump bpf trampoline is attached, the patchsite is patched
* to a bl instruction to the trampoline directly:
*
* bpf_prog:
* mov x9, lr
* bl <short-jump bpf trampoline address> // patchsite
* ...
* ret
*
* plt:
* ldr x10, target
* br x10
* target:
* .quad dummy_tramp // plt target
*
* When a long-jump bpf trampoline is attached, the plt target is filled with
* the trampoline address and the patchsite is patched to a bl instruction to
* the plt:
*
* bpf_prog:
* mov x9, lr
* bl plt // patchsite
* ...
* ret
*
* plt:
* ldr x10, target
* br x10
* target:
* .quad <long-jump bpf trampoline address> // plt target
*
* The dummy_tramp is used to prevent another CPU from jumping to unknown
* locations during the patching process, making the patching process easier.
*/
int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
void *old_addr, void *new_addr)
{
int ret;
u32 old_insn;
u32 new_insn;
u32 replaced;
struct bpf_plt *plt = NULL;
unsigned long size = 0UL;
unsigned long offset = ~0UL;
enum aarch64_insn_branch_type branch_type;
char namebuf[KSYM_NAME_LEN];
void *image = NULL;
u64 plt_target = 0ULL;
bool poking_bpf_entry;
if (!__bpf_address_lookup((unsigned long)ip, &size, &offset, namebuf))
/* Only poking bpf text is supported. Since kernel function
* entry is set up by ftrace, we rely on ftrace to poke kernel
* functions.
*/
return -ENOTSUPP;
image = ip - offset;
/* zero offset means we're poking bpf prog entry */
poking_bpf_entry = (offset == 0UL);
/* bpf prog entry, find plt and the real patchsite */
if (poking_bpf_entry) {
/* plt locates at the end of bpf prog */
plt = image + size - PLT_TARGET_OFFSET;
/* skip to the nop instruction in bpf prog entry:
* bti c // if BTI enabled
* mov x9, x30
* nop
*/
ip = image + POKE_OFFSET * AARCH64_INSN_SIZE;
}
/* long jump is only possible at bpf prog entry */
if (WARN_ON((is_long_jump(ip, new_addr) || is_long_jump(ip, old_addr)) &&
!poking_bpf_entry))
return -EINVAL;
if (poke_type == BPF_MOD_CALL)
branch_type = AARCH64_INSN_BRANCH_LINK;
else
branch_type = AARCH64_INSN_BRANCH_NOLINK;
if (gen_branch_or_nop(branch_type, ip, old_addr, plt, &old_insn) < 0)
return -EFAULT;
if (gen_branch_or_nop(branch_type, ip, new_addr, plt, &new_insn) < 0)
return -EFAULT;
if (is_long_jump(ip, new_addr))
plt_target = (u64)new_addr;
else if (is_long_jump(ip, old_addr))
/* if the old target is a long jump and the new target is not,
* restore the plt target to dummy_tramp, so there is always a
* legal and harmless address stored in plt target, and we'll
* never jump from plt to an unknown place.
*/
plt_target = (u64)&dummy_tramp;
if (plt_target) {
/* non-zero plt_target indicates we're patching a bpf prog,
* which is read only.
*/
if (set_memory_rw(PAGE_MASK & ((uintptr_t)&plt->target), 1))
return -EFAULT;
WRITE_ONCE(plt->target, plt_target);
set_memory_ro(PAGE_MASK & ((uintptr_t)&plt->target), 1);
/* since plt target points to either the new trampoline
* or dummy_tramp, even if another CPU reads the old plt
* target value before fetching the bl instruction to plt,
* it will be brought back by dummy_tramp, so no barrier is
* required here.
*/
}
/* if the old target and the new target are both long jumps, no
* patching is required
*/
if (old_insn == new_insn)
return 0;
mutex_lock(&text_mutex);
if (aarch64_insn_read(ip, &replaced)) {
ret = -EFAULT;
goto out;
}
if (replaced != old_insn) {
ret = -EFAULT;
goto out;
}
/* We call aarch64_insn_patch_text_nosync() to replace instruction
* atomically, so no other CPUs will fetch a half-new and half-old
* instruction. But there is a chance that another CPU executes the
* old instruction after the patching operation finishes (e.g.,
* pipeline not flushed, or icache not synchronized yet).
*
* 1. when a new trampoline is attached, it is not a problem for
* different CPUs to jump to different trampolines temporarily.
*
* 2. when an old trampoline is freed, we should wait for all other
* CPUs to exit the trampoline and make sure the trampoline is no
* longer reachable, since bpf_tramp_image_put() function already
* uses percpu_ref and task-based rcu to do the sync, no need to call
* the sync version here, see bpf_tramp_image_put() for details.
*/
ret = aarch64_insn_patch_text_nosync(ip, new_insn);
out:
mutex_unlock(&text_mutex);
return ret;
}


@ -1950,23 +1950,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
return 0; return 0;
} }
static bool is_valid_bpf_tramp_flags(unsigned int flags)
{
if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
(flags & BPF_TRAMP_F_SKIP_FRAME))
return false;
/*
* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
* and it must be used alone.
*/
if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
(flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
return false;
return true;
}
/* Example: /* Example:
* __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev); * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
* its 'struct btf_func_model' will be nr_args=2 * its 'struct btf_func_model' will be nr_args=2
@ -2045,9 +2028,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
if (nr_args > 6) if (nr_args > 6)
return -ENOTSUPP; return -ENOTSUPP;
if (!is_valid_bpf_tramp_flags(flags))
return -EINVAL;
/* Generated trampoline stack layout: /* Generated trampoline stack layout:
* *
* RBP + 8 [ return address ] * RBP + 8 [ return address ]
@ -2153,10 +2133,15 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
if (flags & BPF_TRAMP_F_CALL_ORIG) { if (flags & BPF_TRAMP_F_CALL_ORIG) {
restore_regs(m, &prog, nr_args, regs_off); restore_regs(m, &prog, nr_args, regs_off);
/* call original function */ if (flags & BPF_TRAMP_F_ORIG_STACK) {
if (emit_call(&prog, orig_call, prog)) { emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
ret = -EINVAL; EMIT2(0xff, 0xd0); /* call *rax */
goto cleanup; } else {
/* call original function */
if (emit_call(&prog, orig_call, prog)) {
ret = -EINVAL;
goto cleanup;
}
} }
/* remember return value in a stack for bpf prog to access */ /* remember return value in a stack for bpf prog to access */
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8); emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
@ -2520,3 +2505,28 @@ bool bpf_jit_supports_subprog_tailcalls(void)
{ {
return true; return true;
} }
void bpf_jit_free(struct bpf_prog *prog)
{
if (prog->jited) {
struct x64_jit_data *jit_data = prog->aux->jit_data;
struct bpf_binary_header *hdr;
/*
* If we fail the final pass of JIT (from jit_subprogs),
* the program may not be finalized yet. Call finalize here
* before freeing it.
*/
if (jit_data) {
bpf_jit_binary_pack_finalize(prog, jit_data->header,
jit_data->rw_header);
kvfree(jit_data->addrs);
kfree(jit_data);
}
hdr = bpf_jit_binary_pack_hdr(prog);
bpf_jit_binary_pack_free(hdr, NULL);
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
}
bpf_prog_unlock_free(prog);
}


@ -47,6 +47,7 @@ struct kobject;
struct mem_cgroup; struct mem_cgroup;
struct module; struct module;
struct bpf_func_state; struct bpf_func_state;
struct ftrace_ops;
extern struct idr btf_idr; extern struct idr btf_idr;
extern spinlock_t btf_idr_lock; extern spinlock_t btf_idr_lock;
@ -221,7 +222,7 @@ struct bpf_map {
u32 btf_vmlinux_value_type_id; u32 btf_vmlinux_value_type_id;
struct btf *btf; struct btf *btf;
#ifdef CONFIG_MEMCG_KMEM #ifdef CONFIG_MEMCG_KMEM
struct mem_cgroup *memcg; struct obj_cgroup *objcg;
#endif #endif
char name[BPF_OBJ_NAME_LEN]; char name[BPF_OBJ_NAME_LEN];
struct bpf_map_off_arr *off_arr; struct bpf_map_off_arr *off_arr;
@ -751,6 +752,16 @@ struct btf_func_model {
/* Return the return value of fentry prog. Only used by bpf_struct_ops. */ /* Return the return value of fentry prog. Only used by bpf_struct_ops. */
#define BPF_TRAMP_F_RET_FENTRY_RET BIT(4) #define BPF_TRAMP_F_RET_FENTRY_RET BIT(4)
/* Get original function from stack instead of from provided direct address.
* Makes sense for trampolines with fexit or fmod_ret programs.
*/
#define BPF_TRAMP_F_ORIG_STACK BIT(5)
/* This trampoline is on a function with another ftrace_ops with IPMODIFY,
* e.g., a live patch. This flag is set and cleared by ftrace callbacks.
*/
#define BPF_TRAMP_F_SHARE_IPMODIFY BIT(6)
/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
* bytes on x86. * bytes on x86.
*/ */
@ -833,9 +844,11 @@ struct bpf_tramp_image {
struct bpf_trampoline { struct bpf_trampoline {
/* hlist for trampoline_table */ /* hlist for trampoline_table */
struct hlist_node hlist; struct hlist_node hlist;
struct ftrace_ops *fops;
/* serializes access to fields of this trampoline */ /* serializes access to fields of this trampoline */
struct mutex mutex; struct mutex mutex;
refcount_t refcnt; refcount_t refcnt;
u32 flags;
u64 key; u64 key;
struct { struct {
struct btf_func_model model; struct btf_func_model model;
@ -1044,7 +1057,6 @@ struct bpf_prog_aux {
bool sleepable; bool sleepable;
bool tail_call_reachable; bool tail_call_reachable;
bool xdp_has_frags; bool xdp_has_frags;
bool use_bpf_prog_pack;
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto; const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */ /* function name for valid attach_btf_id */
@ -1255,9 +1267,6 @@ struct bpf_dummy_ops {
int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr); union bpf_attr __user *uattr);
#endif #endif
int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
int cgroup_atype);
void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
#else #else
static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
{ {
@ -1281,6 +1290,13 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map,
{ {
return -EINVAL; return -EINVAL;
} }
#endif
#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
int cgroup_atype);
void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
#else
static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog, static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
int cgroup_atype) int cgroup_atype)
{ {
@ -1921,7 +1937,8 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs); struct bpf_reg_state *regs);
int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id, const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs); struct bpf_reg_state *regs,
u32 kfunc_flags);
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg); struct bpf_reg_state *reg);
int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog, int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,


@ -345,10 +345,10 @@ struct bpf_verifier_state_list {
}; };
struct bpf_loop_inline_state { struct bpf_loop_inline_state {
int initialized:1; /* set to true upon first entry */ unsigned int initialized:1; /* set to true upon first entry */
int fit_for_inline:1; /* true if callback function is the same unsigned int fit_for_inline:1; /* true if callback function is the same
* at each call and flags are always zero * at each call and flags are always zero
*/ */
u32 callback_subprogno; /* valid when fit_for_inline is true */ u32 callback_subprogno; /* valid when fit_for_inline is true */
}; };


@ -12,14 +12,43 @@
#define BTF_TYPE_EMIT(type) ((void)(type *)0) #define BTF_TYPE_EMIT(type) ((void)(type *)0)
#define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)
enum btf_kfunc_type { /* These need to be macros, as the expressions are used in assembler input */
BTF_KFUNC_TYPE_CHECK, #define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */
BTF_KFUNC_TYPE_ACQUIRE, #define KF_RELEASE (1 << 1) /* kfunc is a release function */
BTF_KFUNC_TYPE_RELEASE, #define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */
BTF_KFUNC_TYPE_RET_NULL, #define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */
BTF_KFUNC_TYPE_KPTR_ACQUIRE, /* Trusted arguments are those which are meant to be referenced arguments with
BTF_KFUNC_TYPE_MAX, * unchanged offset. It is used to enforce that pointers obtained from acquire
}; * kfuncs remain unmodified when being passed to helpers taking trusted args.
*
* Consider
* struct foo {
* int data;
* struct foo *next;
* };
*
* struct bar {
* int data;
* struct foo f;
* };
*
* struct foo *f = alloc_foo(); // Acquire kfunc
* struct bar *b = alloc_bar(); // Acquire kfunc
*
* If a kfunc set_foo_data() wants to operate only on the allocated object, it
* will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
*
* set_foo_data(f, 42); // Allowed
* set_foo_data(f->next, 42); // Rejected, non-referenced pointer
* set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
* set_foo_data(&b->f, 42); // Rejected, referenced, but bad offset
*
* In the final case, usually for the purposes of type matching, it is deduced
* by looking at the type of the member at the offset, but due to the
* requirement of trusted argument, this deduction will be strict and not done
* for this case.
*/
#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
struct btf; struct btf;
struct btf_member; struct btf_member;
@ -30,16 +59,7 @@ struct btf_id_set;
struct btf_kfunc_id_set { struct btf_kfunc_id_set {
struct module *owner; struct module *owner;
union { struct btf_id_set8 *set;
struct {
struct btf_id_set *check_set;
struct btf_id_set *acquire_set;
struct btf_id_set *release_set;
struct btf_id_set *ret_null_set;
struct btf_id_set *kptr_acquire_set;
};
struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
};
}; };
struct btf_id_dtor_kfunc { struct btf_id_dtor_kfunc {
@ -378,9 +398,9 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset); const char *btf_name_by_offset(const struct btf *btf, u32 offset);
struct btf *btf_parse_vmlinux(void); struct btf *btf_parse_vmlinux(void);
struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
bool btf_kfunc_id_set_contains(const struct btf *btf, u32 *btf_kfunc_id_set_contains(const struct btf *btf,
enum bpf_prog_type prog_type, enum bpf_prog_type prog_type,
enum btf_kfunc_type type, u32 kfunc_btf_id); u32 kfunc_btf_id);
int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
const struct btf_kfunc_id_set *s); const struct btf_kfunc_id_set *s);
s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id); s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
@ -397,12 +417,11 @@ static inline const char *btf_name_by_offset(const struct btf *btf,
{ {
return NULL; return NULL;
} }
static inline bool btf_kfunc_id_set_contains(const struct btf *btf, static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
enum bpf_prog_type prog_type, enum bpf_prog_type prog_type,
enum btf_kfunc_type type,
u32 kfunc_btf_id) u32 kfunc_btf_id)
{ {
return false; return NULL;
} }
static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
const struct btf_kfunc_id_set *s) const struct btf_kfunc_id_set *s)

View File

@ -8,6 +8,15 @@ struct btf_id_set {
u32 ids[]; u32 ids[];
}; };
struct btf_id_set8 {
u32 cnt;
u32 flags;
struct {
u32 id;
u32 flags;
} pairs[];
};
#ifdef CONFIG_DEBUG_INFO_BTF #ifdef CONFIG_DEBUG_INFO_BTF
#include <linux/compiler.h> /* for __PASTE */ #include <linux/compiler.h> /* for __PASTE */
@ -25,7 +34,7 @@ struct btf_id_set {
#define BTF_IDS_SECTION ".BTF_ids" #define BTF_IDS_SECTION ".BTF_ids"
#define ____BTF_ID(symbol) \ #define ____BTF_ID(symbol, word) \
asm( \ asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
".local " #symbol " ; \n" \ ".local " #symbol " ; \n" \
@ -33,10 +42,11 @@ asm( \
".size " #symbol ", 4; \n" \ ".size " #symbol ", 4; \n" \
#symbol ": \n" \ #symbol ": \n" \
".zero 4 \n" \ ".zero 4 \n" \
word \
".popsection; \n"); ".popsection; \n");
#define __BTF_ID(symbol) \ #define __BTF_ID(symbol, word) \
____BTF_ID(symbol) ____BTF_ID(symbol, word)
#define __ID(prefix) \ #define __ID(prefix) \
__PASTE(prefix, __COUNTER__) __PASTE(prefix, __COUNTER__)
@ -46,7 +56,14 @@ asm( \
* to 4 zero bytes. * to 4 zero bytes.
*/ */
#define BTF_ID(prefix, name) \ #define BTF_ID(prefix, name) \
__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__)) __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), "")
#define ____BTF_ID_FLAGS(prefix, name, flags) \
__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), ".long " #flags "\n")
#define __BTF_ID_FLAGS(prefix, name, flags, ...) \
____BTF_ID_FLAGS(prefix, name, flags)
#define BTF_ID_FLAGS(prefix, name, ...) \
__BTF_ID_FLAGS(prefix, name, ##__VA_ARGS__, 0)
/* /*
* The BTF_ID_LIST macro defines pure (unsorted) list * The BTF_ID_LIST macro defines pure (unsorted) list
@ -145,10 +162,51 @@ asm( \
".popsection; \n"); \ ".popsection; \n"); \
extern struct btf_id_set name; extern struct btf_id_set name;
/*
* The BTF_SET8_START/END macro pair defines a sorted list of
* BTF IDs and their flags, plus its member count, with the
* following layout:
*
* BTF_SET8_START(list)
* BTF_ID_FLAGS(type1, name1, flags)
* BTF_ID_FLAGS(type2, name2, flags)
* BTF_SET8_END(list)
*
* __BTF_ID__set8__list:
* .zero 8
* list:
* __BTF_ID__type1__name1__3:
* .zero 4
* .word (1 << 0) | (1 << 2)
* __BTF_ID__type2__name2__5:
* .zero 4
* .word (1 << 3) | (1 << 1) | (1 << 2)
*
*/
#define __BTF_SET8_START(name, scope) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
"." #scope " __BTF_ID__set8__" #name "; \n" \
"__BTF_ID__set8__" #name ":; \n" \
".zero 8 \n" \
".popsection; \n");
#define BTF_SET8_START(name) \
__BTF_ID_LIST(name, local) \
__BTF_SET8_START(name, local)
#define BTF_SET8_END(name) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
".size __BTF_ID__set8__" #name ", .-" #name " \n" \
".popsection; \n"); \
extern struct btf_id_set8 name;
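For reference, a small sketch of how the resulting (id, flags) pairs can be walked at run time; the kernel's own lookup, btf_id_set8_contains() added later in this series, does a bsearch over the same pairs array instead of a linear scan:

/* Sketch: find the flags stored next to a given BTF ID in a set8. */
static u32 *example_set8_find_flags(struct btf_id_set8 *set, u32 btf_id)
{
	u32 i;

	for (i = 0; i < set->cnt; i++) {
		if (set->pairs[i].id == btf_id)
			return &set->pairs[i].flags;
	}
	return NULL;
}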
#else #else
#define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; #define BTF_ID_LIST(name) static u32 __maybe_unused name[5];
#define BTF_ID(prefix, name) #define BTF_ID(prefix, name)
#define BTF_ID_FLAGS(prefix, name, ...)
#define BTF_ID_UNUSED #define BTF_ID_UNUSED
#define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n]; #define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n];
#define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1]; #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1];
@ -156,6 +214,8 @@ extern struct btf_id_set name;
#define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 }; #define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 };
#define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 }; #define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 };
#define BTF_SET_END(name) #define BTF_SET_END(name)
#define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 };
#define BTF_SET8_END(name)
#endif /* CONFIG_DEBUG_INFO_BTF */ #endif /* CONFIG_DEBUG_INFO_BTF */

View File

@ -1027,6 +1027,14 @@ u64 bpf_jit_alloc_exec_limit(void);
void *bpf_jit_alloc_exec(unsigned long size); void *bpf_jit_alloc_exec(unsigned long size);
void bpf_jit_free_exec(void *addr); void bpf_jit_free_exec(void *addr);
void bpf_jit_free(struct bpf_prog *fp); void bpf_jit_free(struct bpf_prog *fp);
struct bpf_binary_header *
bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
{
return list_empty(&fp->aux->ksym.lnode) ||
fp->aux->ksym.lnode.prev == LIST_POISON2;
}
struct bpf_binary_header * struct bpf_binary_header *
bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image, bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image,

View File

@ -208,6 +208,43 @@ enum {
FTRACE_OPS_FL_DIRECT = BIT(17), FTRACE_OPS_FL_DIRECT = BIT(17),
}; };
/*
* FTRACE_OPS_CMD_* commands allow the ftrace core logic to request changes
* to a ftrace_ops. Note, the requests may fail.
*
* ENABLE_SHARE_IPMODIFY_SELF - enable a DIRECT ops to work on the same
* function as an ops with IPMODIFY. Called
* when the DIRECT ops is being registered.
* This is called with both direct_mutex and
* ftrace_lock locked.
*
* ENABLE_SHARE_IPMODIFY_PEER - enable a DIRECT ops to work on the same
* function as an ops with IPMODIFY. Called
* when the other ops (the one with IPMODIFY)
* is being registered.
* This is called with direct_mutex locked.
*
* DISABLE_SHARE_IPMODIFY_PEER - stop a DIRECT ops from working on the same
* function as an ops with IPMODIFY. Called
* when the other ops (the one with IPMODIFY)
* is being unregistered.
* This is called with direct_mutex locked.
*/
enum ftrace_ops_cmd {
FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF,
FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER,
FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER,
};
/*
* For most ftrace_ops_cmd values, the callback returns:
*        0 - Success.
*        Negative on failure. The exact value depends on the
*        callback.
*/
typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd);
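Outside of BPF (which implements this callback as bpf_tramp_ftrace_ops_func() later in this series), a DIRECT ops user would wire the hook up roughly as below. This is only a sketch with made-up names; how the trampoline reacts to each command is entirely up to the caller:

/* Sketch of a ftrace_ops_func_t implementation for a DIRECT ops. */
static int example_ops_func(struct ftrace_ops *op, enum ftrace_ops_cmd cmd)
{
	switch (cmd) {
	case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF:
	case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
		/* Switch the direct trampoline into a mode that honours the
		 * ip modified by the IPMODIFY ops, or return -EBUSY to refuse
		 * sharing the function.
		 */
		return 0;
	case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
		/* The IPMODIFY ops is gone; the normal mode is fine again. */
		return 0;
	default:
		return -EINVAL;
	}
}

/* Registered later through register_ftrace_direct_multi(). */
static struct ftrace_ops example_direct_ops = {
	.ops_func = example_ops_func,
};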
#ifdef CONFIG_DYNAMIC_FTRACE #ifdef CONFIG_DYNAMIC_FTRACE
/* The hash used to know what functions callbacks trace */ /* The hash used to know what functions callbacks trace */
struct ftrace_ops_hash { struct ftrace_ops_hash {
@ -250,6 +287,7 @@ struct ftrace_ops {
unsigned long trampoline; unsigned long trampoline;
unsigned long trampoline_size; unsigned long trampoline_size;
struct list_head list; struct list_head list;
ftrace_ops_func_t ops_func;
#endif #endif
}; };
@ -340,6 +378,7 @@ unsigned long ftrace_find_rec_direct(unsigned long ip);
int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr); int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr); int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr); int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr);
#else #else
struct ftrace_ops; struct ftrace_ops;
@ -384,6 +423,10 @@ static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned lo
{ {
return -ENODEV; return -ENODEV;
} }
static inline int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
{
return -ENODEV;
}
#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS

View File

@ -2487,6 +2487,14 @@ static inline void skb_set_tail_pointer(struct sk_buff *skb, const int offset)
#endif /* NET_SKBUFF_DATA_USES_OFFSET */ #endif /* NET_SKBUFF_DATA_USES_OFFSET */
static inline void skb_assert_len(struct sk_buff *skb)
{
#ifdef CONFIG_DEBUG_NET
if (WARN_ONCE(!skb->len, "%s\n", __func__))
DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
#endif /* CONFIG_DEBUG_NET */
}
/* /*
* Add data to an sk_buff * Add data to an sk_buff
*/ */

View File

@ -84,4 +84,23 @@ void nf_conntrack_lock(spinlock_t *lock);
extern spinlock_t nf_conntrack_expect_lock; extern spinlock_t nf_conntrack_expect_lock;
/* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */
#if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
(IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \
IS_ENABLED(CONFIG_NF_CT_NETLINK))
static inline void __nf_ct_set_timeout(struct nf_conn *ct, u64 timeout)
{
if (timeout > INT_MAX)
timeout = INT_MAX;
WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout);
}
int __nf_ct_change_timeout(struct nf_conn *ct, u64 cta_timeout);
void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off);
int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status);
#endif
#endif /* _NF_CONNTRACK_CORE_H */ #endif /* _NF_CONNTRACK_CORE_H */
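These helpers are the common backend for both ctnetlink and the new CT kfuncs added later in this series. The exact kfunc names and bodies live in net/netfilter/nf_conntrack_bpf.c and are not part of this header, so the following is only a sketch of the intended call pattern (example_ct_set_timeout is a made-up name):

/* Sketch: set the timeout of a not-yet-confirmed entry via the shared helper,
 * leaving already inserted entries to __nf_ct_change_timeout().
 */
static int example_ct_set_timeout(struct nf_conn *ct, u32 timeout_ms)
{
	if (nf_ct_is_confirmed(ct))
		return -EPERM;

	__nf_ct_set_timeout(ct, msecs_to_jiffies(timeout_ms));
	return 0;
}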

View File

@ -44,6 +44,15 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
xp_set_rxq_info(pool, rxq); xp_set_rxq_info(pool, rxq);
} }
static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
return pool->heads[0].xdp.rxq->napi_id;
#else
return 0;
#endif
}
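The helper exists so that the xsk send path can feed the pool's napi_id into the socket busy-poll machinery even for send-only sockets. A sketch of the intended use, assuming the usual busy-poll helpers from net/busy_poll.h (the exact call site in xdp/xsk.c may differ):

/* Sketch: mark the socket's napi_id from the pool before busy polling. */
static void example_xsk_busy_loop(struct sock *sk, struct xdp_sock *xs)
{
	if (sk_can_busy_loop(sk)) {
		__sk_mark_napi_id_once(sk, xsk_pool_get_napi_id(xs->pool));
		sk_busy_loop(sk, 1 /* nonblock */);
	}
}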
static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
unsigned long attrs) unsigned long attrs)
{ {
@ -198,6 +207,11 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
{ {
} }
static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool)
{
return 0;
}
static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
unsigned long attrs) unsigned long attrs)
{ {

View File

@ -2361,7 +2361,8 @@ union bpf_attr {
* Pull in non-linear data in case the *skb* is non-linear and not * Pull in non-linear data in case the *skb* is non-linear and not
* all of *len* are part of the linear section. Make *len* bytes * all of *len* are part of the linear section. Make *len* bytes
* from *skb* readable and writable. If a zero value is passed for * from *skb* readable and writable. If a zero value is passed for
* *len*, then the whole length of the *skb* is pulled. * *len*, then all bytes in the linear part of *skb* will be made
* readable and writable.
* *
* This helper is only needed for reading and writing with direct * This helper is only needed for reading and writing with direct
* packet access. * packet access.
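Given the clarified semantics, a program that needs direct access to bytes that may sit beyond the linear part should pass the number of bytes it actually intends to touch rather than 0. A minimal sketch in BPF C (section name and parsing logic are illustrative only):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

SEC("tc")
int pull_then_parse(struct __sk_buff *skb)
{
	__u32 need = sizeof(struct ethhdr) + sizeof(struct iphdr);
	void *data, *data_end;

	/* Make at least 'need' bytes readable and writable in the linear part. */
	if (bpf_skb_pull_data(skb, need) < 0)
		return TC_ACT_OK;

	/* Packet pointers must be reloaded after bpf_skb_pull_data(). */
	data = (void *)(long)skb->data;
	data_end = (void *)(long)skb->data_end;
	if (data + need > data_end)
		return TC_ACT_OK;

	/* ... direct packet access on the first 'need' bytes ... */
	return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";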

View File

@ -70,10 +70,8 @@ int array_map_alloc_check(union bpf_attr *attr)
attr->map_flags & BPF_F_PRESERVE_ELEMS) attr->map_flags & BPF_F_PRESERVE_ELEMS)
return -EINVAL; return -EINVAL;
if (attr->value_size > KMALLOC_MAX_SIZE) /* avoid overflow on round_up(map->value_size) */
/* if value_size is bigger, the user space won't be able to if (attr->value_size > INT_MAX)
* access the elements.
*/
return -E2BIG; return -E2BIG;
return 0; return 0;
@ -156,6 +154,11 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
return &array->map; return &array->map;
} }
static void *array_map_elem_ptr(struct bpf_array* array, u32 index)
{
return array->value + (u64)array->elem_size * index;
}
/* Called from syscall or from eBPF program */ /* Called from syscall or from eBPF program */
static void *array_map_lookup_elem(struct bpf_map *map, void *key) static void *array_map_lookup_elem(struct bpf_map *map, void *key)
{ {
@ -165,7 +168,7 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
if (unlikely(index >= array->map.max_entries)) if (unlikely(index >= array->map.max_entries))
return NULL; return NULL;
return array->value + array->elem_size * (index & array->index_mask); return array->value + (u64)array->elem_size * (index & array->index_mask);
} }
static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
@ -203,7 +206,7 @@ static int array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{ {
struct bpf_array *array = container_of(map, struct bpf_array, map); struct bpf_array *array = container_of(map, struct bpf_array, map);
struct bpf_insn *insn = insn_buf; struct bpf_insn *insn = insn_buf;
u32 elem_size = round_up(map->value_size, 8); u32 elem_size = array->elem_size;
const int ret = BPF_REG_0; const int ret = BPF_REG_0;
const int map_ptr = BPF_REG_1; const int map_ptr = BPF_REG_1;
const int index = BPF_REG_2; const int index = BPF_REG_2;
@ -272,7 +275,7 @@ int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
* access 'value_size' of them, so copying rounded areas * access 'value_size' of them, so copying rounded areas
* will not leak any kernel data * will not leak any kernel data
*/ */
size = round_up(map->value_size, 8); size = array->elem_size;
rcu_read_lock(); rcu_read_lock();
pptr = array->pptrs[index & array->index_mask]; pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
@ -339,7 +342,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
value, map->value_size); value, map->value_size);
} else { } else {
val = array->value + val = array->value +
array->elem_size * (index & array->index_mask); (u64)array->elem_size * (index & array->index_mask);
if (map_flags & BPF_F_LOCK) if (map_flags & BPF_F_LOCK)
copy_map_value_locked(map, val, value, false); copy_map_value_locked(map, val, value, false);
else else
@ -376,7 +379,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
* returned or zeros which were zero-filled by percpu_alloc, * returned or zeros which were zero-filled by percpu_alloc,
* so no kernel data leaks possible * so no kernel data leaks possible
*/ */
size = round_up(map->value_size, 8); size = array->elem_size;
rcu_read_lock(); rcu_read_lock();
pptr = array->pptrs[index & array->index_mask]; pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
@ -408,8 +411,7 @@ static void array_map_free_timers(struct bpf_map *map)
return; return;
for (i = 0; i < array->map.max_entries; i++) for (i = 0; i < array->map.max_entries; i++)
bpf_timer_cancel_and_free(array->value + array->elem_size * i + bpf_timer_cancel_and_free(array_map_elem_ptr(array, i) + map->timer_off);
map->timer_off);
} }
/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@ -420,7 +422,7 @@ static void array_map_free(struct bpf_map *map)
if (map_value_has_kptrs(map)) { if (map_value_has_kptrs(map)) {
for (i = 0; i < array->map.max_entries; i++) for (i = 0; i < array->map.max_entries; i++)
bpf_map_free_kptrs(map, array->value + array->elem_size * i); bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
bpf_map_free_kptr_off_tab(map); bpf_map_free_kptr_off_tab(map);
} }
@ -556,7 +558,7 @@ static void *bpf_array_map_seq_start(struct seq_file *seq, loff_t *pos)
index = info->index & array->index_mask; index = info->index & array->index_mask;
if (info->percpu_value_buf) if (info->percpu_value_buf)
return array->pptrs[index]; return array->pptrs[index];
return array->value + array->elem_size * index; return array_map_elem_ptr(array, index);
} }
static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos) static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
@ -575,7 +577,7 @@ static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
index = info->index & array->index_mask; index = info->index & array->index_mask;
if (info->percpu_value_buf) if (info->percpu_value_buf)
return array->pptrs[index]; return array->pptrs[index];
return array->value + array->elem_size * index; return array_map_elem_ptr(array, index);
} }
static int __bpf_array_map_seq_show(struct seq_file *seq, void *v) static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
@ -583,6 +585,7 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
struct bpf_iter_seq_array_map_info *info = seq->private; struct bpf_iter_seq_array_map_info *info = seq->private;
struct bpf_iter__bpf_map_elem ctx = {}; struct bpf_iter__bpf_map_elem ctx = {};
struct bpf_map *map = info->map; struct bpf_map *map = info->map;
struct bpf_array *array = container_of(map, struct bpf_array, map);
struct bpf_iter_meta meta; struct bpf_iter_meta meta;
struct bpf_prog *prog; struct bpf_prog *prog;
int off = 0, cpu = 0; int off = 0, cpu = 0;
@ -603,7 +606,7 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
ctx.value = v; ctx.value = v;
} else { } else {
pptr = v; pptr = v;
size = round_up(map->value_size, 8); size = array->elem_size;
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
bpf_long_memcpy(info->percpu_value_buf + off, bpf_long_memcpy(info->percpu_value_buf + off,
per_cpu_ptr(pptr, cpu), per_cpu_ptr(pptr, cpu),
@ -633,11 +636,12 @@ static int bpf_iter_init_array_map(void *priv_data,
{ {
struct bpf_iter_seq_array_map_info *seq_info = priv_data; struct bpf_iter_seq_array_map_info *seq_info = priv_data;
struct bpf_map *map = aux->map; struct bpf_map *map = aux->map;
struct bpf_array *array = container_of(map, struct bpf_array, map);
void *value_buf; void *value_buf;
u32 buf_size; u32 buf_size;
if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
buf_size = round_up(map->value_size, 8) * num_possible_cpus(); buf_size = array->elem_size * num_possible_cpus();
value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN); value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN);
if (!value_buf) if (!value_buf)
return -ENOMEM; return -ENOMEM;
@ -690,7 +694,7 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
if (is_percpu) if (is_percpu)
val = this_cpu_ptr(array->pptrs[i]); val = this_cpu_ptr(array->pptrs[i]);
else else
val = array->value + array->elem_size * i; val = array_map_elem_ptr(array, i);
num_elems++; num_elems++;
key = i; key = i;
ret = callback_fn((u64)(long)map, (u64)(long)&key, ret = callback_fn((u64)(long)map, (u64)(long)&key,
@ -1322,7 +1326,7 @@ static int array_of_map_gen_lookup(struct bpf_map *map,
struct bpf_insn *insn_buf) struct bpf_insn *insn_buf)
{ {
struct bpf_array *array = container_of(map, struct bpf_array, map); struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 elem_size = round_up(map->value_size, 8); u32 elem_size = array->elem_size;
struct bpf_insn *insn = insn_buf; struct bpf_insn *insn = insn_buf;
const int ret = BPF_REG_0; const int ret = BPF_REG_0;
const int map_ptr = BPF_REG_1; const int map_ptr = BPF_REG_1;

View File

@ -63,10 +63,11 @@ BTF_ID(func, bpf_lsm_socket_post_create)
BTF_ID(func, bpf_lsm_socket_socketpair) BTF_ID(func, bpf_lsm_socket_socketpair)
BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks) BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks)
#ifdef CONFIG_CGROUP_BPF
void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
bpf_func_t *bpf_func) bpf_func_t *bpf_func)
{ {
const struct btf_param *args; const struct btf_param *args __maybe_unused;
if (btf_type_vlen(prog->aux->attach_func_proto) < 1 || if (btf_type_vlen(prog->aux->attach_func_proto) < 1 ||
btf_id_set_contains(&bpf_lsm_current_hooks, btf_id_set_contains(&bpf_lsm_current_hooks,
@ -75,9 +76,9 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
return; return;
} }
#ifdef CONFIG_NET
args = btf_params(prog->aux->attach_func_proto); args = btf_params(prog->aux->attach_func_proto);
#ifdef CONFIG_NET
if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET]) if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET])
*bpf_func = __cgroup_bpf_run_lsm_socket; *bpf_func = __cgroup_bpf_run_lsm_socket;
else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK]) else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK])
@ -86,6 +87,7 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
#endif #endif
*bpf_func = __cgroup_bpf_run_lsm_current; *bpf_func = __cgroup_bpf_run_lsm_current;
} }
#endif
int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
const struct bpf_prog *prog) const struct bpf_prog *prog)
@ -219,6 +221,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_retval: case BPF_FUNC_get_retval:
return prog->expected_attach_type == BPF_LSM_CGROUP ? return prog->expected_attach_type == BPF_LSM_CGROUP ?
&bpf_get_retval_proto : NULL; &bpf_get_retval_proto : NULL;
#ifdef CONFIG_NET
case BPF_FUNC_setsockopt: case BPF_FUNC_setsockopt:
if (prog->expected_attach_type != BPF_LSM_CGROUP) if (prog->expected_attach_type != BPF_LSM_CGROUP)
return NULL; return NULL;
@ -239,6 +242,7 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
prog->aux->attach_btf_id)) prog->aux->attach_btf_id))
return &bpf_unlocked_sk_getsockopt_proto; return &bpf_unlocked_sk_getsockopt_proto;
return NULL; return NULL;
#endif
default: default:
return tracing_prog_func_proto(func_id, prog); return tracing_prog_func_proto(func_id, prog);
} }

View File

@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
tlinks[BPF_TRAMP_FENTRY].links[0] = link; tlinks[BPF_TRAMP_FENTRY].links[0] = link;
tlinks[BPF_TRAMP_FENTRY].nr_links = 1; tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
* and it must be used alone.
*/
flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0; flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
return arch_prepare_bpf_trampoline(NULL, image, image_end, return arch_prepare_bpf_trampoline(NULL, image, image_end,
model, flags, tlinks, NULL); model, flags, tlinks, NULL);

View File

@ -213,7 +213,7 @@ enum {
}; };
struct btf_kfunc_set_tab { struct btf_kfunc_set_tab {
struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX]; struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX];
}; };
struct btf_id_dtor_kfunc_tab { struct btf_id_dtor_kfunc_tab {
@ -1116,7 +1116,8 @@ __printf(2, 3) static void btf_show(struct btf_show *show, const char *fmt, ...)
*/ */
#define btf_show_type_value(show, fmt, value) \ #define btf_show_type_value(show, fmt, value) \
do { \ do { \
if ((value) != 0 || (show->flags & BTF_SHOW_ZERO) || \ if ((value) != (__typeof__(value))0 || \
(show->flags & BTF_SHOW_ZERO) || \
show->state.depth == 0) { \ show->state.depth == 0) { \
btf_show(show, "%s%s" fmt "%s%s", \ btf_show(show, "%s%s" fmt "%s%s", \
btf_show_indent(show), \ btf_show_indent(show), \
@ -1615,7 +1616,7 @@ static void btf_free_id(struct btf *btf)
static void btf_free_kfunc_set_tab(struct btf *btf) static void btf_free_kfunc_set_tab(struct btf *btf)
{ {
struct btf_kfunc_set_tab *tab = btf->kfunc_set_tab; struct btf_kfunc_set_tab *tab = btf->kfunc_set_tab;
int hook, type; int hook;
if (!tab) if (!tab)
return; return;
@ -1624,10 +1625,8 @@ static void btf_free_kfunc_set_tab(struct btf *btf)
*/ */
if (btf_is_module(btf)) if (btf_is_module(btf))
goto free_tab; goto free_tab;
for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) { for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++)
for (type = 0; type < ARRAY_SIZE(tab->sets[0]); type++) kfree(tab->sets[hook]);
kfree(tab->sets[hook][type]);
}
free_tab: free_tab:
kfree(tab); kfree(tab);
btf->kfunc_set_tab = NULL; btf->kfunc_set_tab = NULL;
@ -6171,13 +6170,14 @@ static bool is_kfunc_arg_mem_size(const struct btf *btf,
static int btf_check_func_arg_match(struct bpf_verifier_env *env, static int btf_check_func_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id, const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs, struct bpf_reg_state *regs,
bool ptr_to_mem_ok) bool ptr_to_mem_ok,
u32 kfunc_flags)
{ {
enum bpf_prog_type prog_type = resolve_prog_type(env->prog); enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
bool rel = false, kptr_get = false, trusted_arg = false;
struct bpf_verifier_log *log = &env->log; struct bpf_verifier_log *log = &env->log;
u32 i, nargs, ref_id, ref_obj_id = 0; u32 i, nargs, ref_id, ref_obj_id = 0;
bool is_kfunc = btf_is_kernel(btf); bool is_kfunc = btf_is_kernel(btf);
bool rel = false, kptr_get = false;
const char *func_name, *ref_tname; const char *func_name, *ref_tname;
const struct btf_type *t, *ref_t; const struct btf_type *t, *ref_t;
const struct btf_param *args; const struct btf_param *args;
@ -6209,10 +6209,9 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
if (is_kfunc) { if (is_kfunc) {
/* Only kfunc can be release func */ /* Only kfunc can be release func */
rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog), rel = kfunc_flags & KF_RELEASE;
BTF_KFUNC_TYPE_RELEASE, func_id); kptr_get = kfunc_flags & KF_KPTR_GET;
kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog), trusted_arg = kfunc_flags & KF_TRUSTED_ARGS;
BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id);
} }
/* check that BTF function arguments match actual types that the /* check that BTF function arguments match actual types that the
@ -6237,10 +6236,19 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
return -EINVAL; return -EINVAL;
} }
/* Check if the argument must be a referenced pointer; args + i has
* already been verified to be a pointer (after skipping modifiers).
*/
if (is_kfunc && trusted_arg && !reg->ref_obj_id) {
bpf_log(log, "R%d must be referenced\n", regno);
return -EINVAL;
}
ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id); ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
ref_tname = btf_name_by_offset(btf, ref_t->name_off); ref_tname = btf_name_by_offset(btf, ref_t->name_off);
if (rel && reg->ref_obj_id) /* Trusted args have the same offset checks as release arguments */
if (trusted_arg || (rel && reg->ref_obj_id))
arg_type |= OBJ_RELEASE; arg_type |= OBJ_RELEASE;
ret = check_func_arg_reg_off(env, reg, regno, arg_type); ret = check_func_arg_reg_off(env, reg, regno, arg_type);
if (ret < 0) if (ret < 0)
@ -6338,7 +6346,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_tname = btf_name_by_offset(reg_btf,
reg_ref_t->name_off); reg_ref_t->name_off);
if (!btf_struct_ids_match(log, reg_btf, reg_ref_id, if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
reg->off, btf, ref_id, rel && reg->ref_obj_id)) { reg->off, btf, ref_id,
trusted_arg || (rel && reg->ref_obj_id))) {
bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n", bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
func_name, i, func_name, i,
btf_type_str(ref_t), ref_tname, btf_type_str(ref_t), ref_tname,
@ -6441,7 +6450,7 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
return -EINVAL; return -EINVAL;
is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL; is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global); err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, 0);
/* Compiler optimizations can remove arguments from static functions /* Compiler optimizations can remove arguments from static functions
* or mismatched type can be passed into a global function. * or mismatched type can be passed into a global function.
@ -6454,9 +6463,10 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
const struct btf *btf, u32 func_id, const struct btf *btf, u32 func_id,
struct bpf_reg_state *regs) struct bpf_reg_state *regs,
u32 kfunc_flags)
{ {
return btf_check_func_arg_match(env, btf, func_id, regs, true); return btf_check_func_arg_match(env, btf, func_id, regs, true, kfunc_flags);
} }
/* Convert BTF of a function into bpf_reg_state if possible /* Convert BTF of a function into bpf_reg_state if possible
@ -6853,6 +6863,11 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id)
return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL; return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL;
} }
static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
{
return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func);
}
enum { enum {
BTF_MODULE_F_LIVE = (1 << 0), BTF_MODULE_F_LIVE = (1 << 0),
}; };
@ -7101,16 +7116,16 @@ BTF_TRACING_TYPE_xxx
/* Kernel Function (kfunc) BTF ID set registration API */ /* Kernel Function (kfunc) BTF ID set registration API */
static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
enum btf_kfunc_type type, struct btf_id_set8 *add_set)
struct btf_id_set *add_set, bool vmlinux_set)
{ {
bool vmlinux_set = !btf_is_module(btf);
struct btf_kfunc_set_tab *tab; struct btf_kfunc_set_tab *tab;
struct btf_id_set *set; struct btf_id_set8 *set;
u32 set_cnt; u32 set_cnt;
int ret; int ret;
if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) { if (hook >= BTF_KFUNC_HOOK_MAX) {
ret = -EINVAL; ret = -EINVAL;
goto end; goto end;
} }
@ -7126,7 +7141,7 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
btf->kfunc_set_tab = tab; btf->kfunc_set_tab = tab;
} }
set = tab->sets[hook][type]; set = tab->sets[hook];
/* Warn when register_btf_kfunc_id_set is called twice for the same hook /* Warn when register_btf_kfunc_id_set is called twice for the same hook
* for module sets. * for module sets.
*/ */
@ -7140,7 +7155,7 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
* pointer and return. * pointer and return.
*/ */
if (!vmlinux_set) { if (!vmlinux_set) {
tab->sets[hook][type] = add_set; tab->sets[hook] = add_set;
return 0; return 0;
} }
@ -7149,7 +7164,7 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
* and concatenate all individual sets being registered. While each set * and concatenate all individual sets being registered. While each set
* is individually sorted, they may become unsorted when concatenated, * is individually sorted, they may become unsorted when concatenated,
* hence re-sorting the final set again is required to make binary * hence re-sorting the final set again is required to make binary
* searching the set using btf_id_set_contains function work. * searching the set using btf_id_set8_contains function work.
*/ */
set_cnt = set ? set->cnt : 0; set_cnt = set ? set->cnt : 0;
@ -7164,8 +7179,8 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
} }
/* Grow set */ /* Grow set */
set = krealloc(tab->sets[hook][type], set = krealloc(tab->sets[hook],
offsetof(struct btf_id_set, ids[set_cnt + add_set->cnt]), offsetof(struct btf_id_set8, pairs[set_cnt + add_set->cnt]),
GFP_KERNEL | __GFP_NOWARN); GFP_KERNEL | __GFP_NOWARN);
if (!set) { if (!set) {
ret = -ENOMEM; ret = -ENOMEM;
@ -7173,15 +7188,15 @@ static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
} }
/* For newly allocated set, initialize set->cnt to 0 */ /* For newly allocated set, initialize set->cnt to 0 */
if (!tab->sets[hook][type]) if (!tab->sets[hook])
set->cnt = 0; set->cnt = 0;
tab->sets[hook][type] = set; tab->sets[hook] = set;
/* Concatenate the two sets */ /* Concatenate the two sets */
memcpy(set->ids + set->cnt, add_set->ids, add_set->cnt * sizeof(set->ids[0])); memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0]));
set->cnt += add_set->cnt; set->cnt += add_set->cnt;
sort(set->ids, set->cnt, sizeof(set->ids[0]), btf_id_cmp_func, NULL); sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
return 0; return 0;
end: end:
@ -7189,38 +7204,25 @@ end:
return ret; return ret;
} }
static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, static u32 *__btf_kfunc_id_set_contains(const struct btf *btf,
const struct btf_kfunc_id_set *kset)
{
bool vmlinux_set = !btf_is_module(btf);
int type, ret = 0;
for (type = 0; type < ARRAY_SIZE(kset->sets); type++) {
if (!kset->sets[type])
continue;
ret = __btf_populate_kfunc_set(btf, hook, type, kset->sets[type], vmlinux_set);
if (ret)
break;
}
return ret;
}
static bool __btf_kfunc_id_set_contains(const struct btf *btf,
enum btf_kfunc_hook hook, enum btf_kfunc_hook hook,
enum btf_kfunc_type type,
u32 kfunc_btf_id) u32 kfunc_btf_id)
{ {
struct btf_id_set *set; struct btf_id_set8 *set;
u32 *id;
if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) if (hook >= BTF_KFUNC_HOOK_MAX)
return false; return NULL;
if (!btf->kfunc_set_tab) if (!btf->kfunc_set_tab)
return false; return NULL;
set = btf->kfunc_set_tab->sets[hook][type]; set = btf->kfunc_set_tab->sets[hook];
if (!set) if (!set)
return false; return NULL;
return btf_id_set_contains(set, kfunc_btf_id); id = btf_id_set8_contains(set, kfunc_btf_id);
if (!id)
return NULL;
/* The flags for BTF ID are located next to it */
return id + 1;
} }
static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type) static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
@ -7248,14 +7250,14 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
* keeping the reference for the duration of the call provides the necessary * keeping the reference for the duration of the call provides the necessary
* protection for looking up a well-formed btf->kfunc_set_tab. * protection for looking up a well-formed btf->kfunc_set_tab.
*/ */
bool btf_kfunc_id_set_contains(const struct btf *btf, u32 *btf_kfunc_id_set_contains(const struct btf *btf,
enum bpf_prog_type prog_type, enum bpf_prog_type prog_type,
enum btf_kfunc_type type, u32 kfunc_btf_id) u32 kfunc_btf_id)
{ {
enum btf_kfunc_hook hook; enum btf_kfunc_hook hook;
hook = bpf_prog_type_to_kfunc_hook(prog_type); hook = bpf_prog_type_to_kfunc_hook(prog_type);
return __btf_kfunc_id_set_contains(btf, hook, type, kfunc_btf_id); return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
} }
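A sketch of how a caller is expected to consume the new return value; the real call sites (for example the verifier's kfunc handling) are not part of this hunk, and the function below is illustrative only:

static int example_check_kfunc_call(struct bpf_verifier_env *env,
				    const struct btf *btf, u32 func_id,
				    struct bpf_reg_state *regs)
{
	u32 *kfunc_flags;

	kfunc_flags = btf_kfunc_id_set_contains(btf,
						resolve_prog_type(env->prog),
						func_id);
	if (!kfunc_flags)
		return -EACCES;	/* kfunc not registered for this prog type */

	/* *kfunc_flags is the BTF_ID_FLAGS() value (KF_RELEASE, ...). */
	return btf_check_kfunc_arg_match(env, btf, func_id, regs, *kfunc_flags);
}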
/* This function must be invoked only from initcalls/module init functions */ /* This function must be invoked only from initcalls/module init functions */
@ -7282,7 +7284,7 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
return PTR_ERR(btf); return PTR_ERR(btf);
hook = bpf_prog_type_to_kfunc_hook(prog_type); hook = bpf_prog_type_to_kfunc_hook(prog_type);
ret = btf_populate_kfunc_set(btf, hook, kset); ret = btf_populate_kfunc_set(btf, hook, kset->set);
btf_put(btf); btf_put(btf);
return ret; return ret;
} }

View File

@ -652,12 +652,6 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
return fp->jited && !bpf_prog_was_classic(fp); return fp->jited && !bpf_prog_was_classic(fp);
} }
static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
{
return list_empty(&fp->aux->ksym.lnode) ||
fp->aux->ksym.lnode.prev == LIST_POISON2;
}
void bpf_prog_kallsyms_add(struct bpf_prog *fp) void bpf_prog_kallsyms_add(struct bpf_prog *fp)
{ {
if (!bpf_prog_kallsyms_candidate(fp) || if (!bpf_prog_kallsyms_candidate(fp) ||
@ -833,15 +827,6 @@ struct bpf_prog_pack {
#define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE) #define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)
static size_t bpf_prog_pack_size = -1;
static size_t bpf_prog_pack_mask = -1;
static int bpf_prog_chunk_count(void)
{
WARN_ON_ONCE(bpf_prog_pack_size == -1);
return bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE;
}
static DEFINE_MUTEX(pack_mutex); static DEFINE_MUTEX(pack_mutex);
static LIST_HEAD(pack_list); static LIST_HEAD(pack_list);
@ -849,55 +834,33 @@ static LIST_HEAD(pack_list);
* CONFIG_MMU=n. Use PAGE_SIZE in these cases. * CONFIG_MMU=n. Use PAGE_SIZE in these cases.
*/ */
#ifdef PMD_SIZE #ifdef PMD_SIZE
#define BPF_HPAGE_SIZE PMD_SIZE #define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes())
#define BPF_HPAGE_MASK PMD_MASK
#else #else
#define BPF_HPAGE_SIZE PAGE_SIZE #define BPF_PROG_PACK_SIZE PAGE_SIZE
#define BPF_HPAGE_MASK PAGE_MASK
#endif #endif
static size_t select_bpf_prog_pack_size(void) #define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE)
{
size_t size;
void *ptr;
size = BPF_HPAGE_SIZE * num_online_nodes();
ptr = module_alloc(size);
/* Test whether we can get huge pages. If not just use PAGE_SIZE
* packs.
*/
if (!ptr || !is_vm_area_hugepages(ptr)) {
size = PAGE_SIZE;
bpf_prog_pack_mask = PAGE_MASK;
} else {
bpf_prog_pack_mask = BPF_HPAGE_MASK;
}
vfree(ptr);
return size;
}
static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns) static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns)
{ {
struct bpf_prog_pack *pack; struct bpf_prog_pack *pack;
pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(bpf_prog_chunk_count())), pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)),
GFP_KERNEL); GFP_KERNEL);
if (!pack) if (!pack)
return NULL; return NULL;
pack->ptr = module_alloc(bpf_prog_pack_size); pack->ptr = module_alloc(BPF_PROG_PACK_SIZE);
if (!pack->ptr) { if (!pack->ptr) {
kfree(pack); kfree(pack);
return NULL; return NULL;
} }
bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size); bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE);
bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE); bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE);
list_add_tail(&pack->list, &pack_list); list_add_tail(&pack->list, &pack_list);
set_vm_flush_reset_perms(pack->ptr); set_vm_flush_reset_perms(pack->ptr);
set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
return pack; return pack;
} }
@ -909,10 +872,7 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
void *ptr = NULL; void *ptr = NULL;
mutex_lock(&pack_mutex); mutex_lock(&pack_mutex);
if (bpf_prog_pack_size == -1) if (size > BPF_PROG_PACK_SIZE) {
bpf_prog_pack_size = select_bpf_prog_pack_size();
if (size > bpf_prog_pack_size) {
size = round_up(size, PAGE_SIZE); size = round_up(size, PAGE_SIZE);
ptr = module_alloc(size); ptr = module_alloc(size);
if (ptr) { if (ptr) {
@ -924,9 +884,9 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn
goto out; goto out;
} }
list_for_each_entry(pack, &pack_list, list) { list_for_each_entry(pack, &pack_list, list) {
pos = bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
nbits, 0); nbits, 0);
if (pos < bpf_prog_chunk_count()) if (pos < BPF_PROG_CHUNK_COUNT)
goto found_free_area; goto found_free_area;
} }
@ -950,18 +910,15 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
struct bpf_prog_pack *pack = NULL, *tmp; struct bpf_prog_pack *pack = NULL, *tmp;
unsigned int nbits; unsigned int nbits;
unsigned long pos; unsigned long pos;
void *pack_ptr;
mutex_lock(&pack_mutex); mutex_lock(&pack_mutex);
if (hdr->size > bpf_prog_pack_size) { if (hdr->size > BPF_PROG_PACK_SIZE) {
module_memfree(hdr); module_memfree(hdr);
goto out; goto out;
} }
pack_ptr = (void *)((unsigned long)hdr & bpf_prog_pack_mask);
list_for_each_entry(tmp, &pack_list, list) { list_for_each_entry(tmp, &pack_list, list) {
if (tmp->ptr == pack_ptr) { if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) {
pack = tmp; pack = tmp;
break; break;
} }
@ -971,14 +928,14 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
goto out; goto out;
nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size); nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size);
pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT; pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT;
WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size), WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size),
"bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n"); "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n");
bitmap_clear(pack->bitmap, pos, nbits); bitmap_clear(pack->bitmap, pos, nbits);
if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0,
bpf_prog_chunk_count(), 0) == 0) { BPF_PROG_CHUNK_COUNT, 0) == 0) {
list_del(&pack->list); list_del(&pack->list);
module_memfree(pack->ptr); module_memfree(pack->ptr);
kfree(pack); kfree(pack);
@ -1155,7 +1112,6 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog,
bpf_prog_pack_free(ro_header); bpf_prog_pack_free(ro_header);
return PTR_ERR(ptr); return PTR_ERR(ptr);
} }
prog->aux->use_bpf_prog_pack = true;
return 0; return 0;
} }
@ -1179,17 +1135,23 @@ void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header,
bpf_jit_uncharge_modmem(size); bpf_jit_uncharge_modmem(size);
} }
struct bpf_binary_header *
bpf_jit_binary_pack_hdr(const struct bpf_prog *fp)
{
unsigned long real_start = (unsigned long)fp->bpf_func;
unsigned long addr;
addr = real_start & BPF_PROG_CHUNK_MASK;
return (void *)addr;
}
static inline struct bpf_binary_header * static inline struct bpf_binary_header *
bpf_jit_binary_hdr(const struct bpf_prog *fp) bpf_jit_binary_hdr(const struct bpf_prog *fp)
{ {
unsigned long real_start = (unsigned long)fp->bpf_func; unsigned long real_start = (unsigned long)fp->bpf_func;
unsigned long addr; unsigned long addr;
if (fp->aux->use_bpf_prog_pack) addr = real_start & PAGE_MASK;
addr = real_start & BPF_PROG_CHUNK_MASK;
else
addr = real_start & PAGE_MASK;
return (void *)addr; return (void *)addr;
} }
@ -1202,11 +1164,7 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
if (fp->jited) { if (fp->jited) {
struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp); struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
if (fp->aux->use_bpf_prog_pack) bpf_jit_binary_free(hdr);
bpf_jit_binary_pack_free(hdr, NULL /* rw_buffer */);
else
bpf_jit_binary_free(hdr);
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp)); WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
} }

View File

@ -845,7 +845,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
struct bpf_dtab_netdev *dev; struct bpf_dtab_netdev *dev;
dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev), dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev),
GFP_ATOMIC | __GFP_NOWARN, GFP_NOWAIT | __GFP_NOWARN,
dtab->map.numa_node); dtab->map.numa_node);
if (!dev) if (!dev)
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);

View File

@ -61,7 +61,7 @@
* *
* As regular device interrupt handlers and soft interrupts are forced into * As regular device interrupt handlers and soft interrupts are forced into
* thread context, the existing code which does * thread context, the existing code which does
* spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*(); * spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
* just works. * just works.
* *
* In theory the BPF locks could be converted to regular spinlocks as well, * In theory the BPF locks could be converted to regular spinlocks as well,
@ -978,7 +978,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
goto dec_count; goto dec_count;
} }
l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size, l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
GFP_ATOMIC | __GFP_NOWARN, GFP_NOWAIT | __GFP_NOWARN,
htab->map.numa_node); htab->map.numa_node);
if (!l_new) { if (!l_new) {
l_new = ERR_PTR(-ENOMEM); l_new = ERR_PTR(-ENOMEM);
@ -996,7 +996,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
} else { } else {
/* alloc_percpu zero-fills */ /* alloc_percpu zero-fills */
pptr = bpf_map_alloc_percpu(&htab->map, size, 8, pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
GFP_ATOMIC | __GFP_NOWARN); GFP_NOWAIT | __GFP_NOWARN);
if (!pptr) { if (!pptr) {
kfree(l_new); kfree(l_new);
l_new = ERR_PTR(-ENOMEM); l_new = ERR_PTR(-ENOMEM);

View File

@ -165,7 +165,7 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
} }
new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size), new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size),
__GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN, __GFP_ZERO | GFP_NOWAIT | __GFP_NOWARN,
map->numa_node); map->numa_node);
if (!new) if (!new)
return -ENOMEM; return -ENOMEM;

View File

@ -285,7 +285,7 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
if (value) if (value)
size += trie->map.value_size; size += trie->map.value_size;
node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN, node = bpf_map_kmalloc_node(&trie->map, size, GFP_NOWAIT | __GFP_NOWARN,
trie->map.numa_node); trie->map.numa_node);
if (!node) if (!node)
return NULL; return NULL;

View File

@ -9,7 +9,7 @@ LLVM_STRIP ?= llvm-strip
TOOLS_PATH := $(abspath ../../../../tools) TOOLS_PATH := $(abspath ../../../../tools)
BPFTOOL_SRC := $(TOOLS_PATH)/bpf/bpftool BPFTOOL_SRC := $(TOOLS_PATH)/bpf/bpftool
BPFTOOL_OUTPUT := $(abs_out)/bpftool BPFTOOL_OUTPUT := $(abs_out)/bpftool
DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool
BPFTOOL ?= $(DEFAULT_BPFTOOL) BPFTOOL ?= $(DEFAULT_BPFTOOL)
LIBBPF_SRC := $(TOOLS_PATH)/lib/bpf LIBBPF_SRC := $(TOOLS_PATH)/lib/bpf
@ -61,9 +61,5 @@ $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(LIBBPF_OU
OUTPUT=$(abspath $(dir $@))/ prefix= \ OUTPUT=$(abspath $(dir $@))/ prefix= \
DESTDIR=$(LIBBPF_DESTDIR) $(abspath $@) install_headers DESTDIR=$(LIBBPF_DESTDIR) $(abspath $@) install_headers
$(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT) $(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
$(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) \ $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap
OUTPUT=$(BPFTOOL_OUTPUT)/ \
LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \
LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/ \
prefix= DESTDIR=$(abs_out)/ install-bin

View File

@ -419,35 +419,53 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
#ifdef CONFIG_MEMCG_KMEM #ifdef CONFIG_MEMCG_KMEM
static void bpf_map_save_memcg(struct bpf_map *map) static void bpf_map_save_memcg(struct bpf_map *map)
{ {
map->memcg = get_mem_cgroup_from_mm(current->mm); /* Currently if a map is created by a process belonging to the root
* memory cgroup, get_obj_cgroup_from_current() will return NULL.
* So we have to check whether map->objcg is NULL each time it is
* used.
*/
map->objcg = get_obj_cgroup_from_current();
} }
static void bpf_map_release_memcg(struct bpf_map *map) static void bpf_map_release_memcg(struct bpf_map *map)
{ {
mem_cgroup_put(map->memcg); if (map->objcg)
obj_cgroup_put(map->objcg);
}
static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
{
if (map->objcg)
return get_mem_cgroup_from_objcg(map->objcg);
return root_mem_cgroup;
} }
void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags, void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
int node) int node)
{ {
struct mem_cgroup *old_memcg; struct mem_cgroup *memcg, *old_memcg;
void *ptr; void *ptr;
old_memcg = set_active_memcg(map->memcg); memcg = bpf_map_get_memcg(map);
old_memcg = set_active_memcg(memcg);
ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node); ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
set_active_memcg(old_memcg); set_active_memcg(old_memcg);
mem_cgroup_put(memcg);
return ptr; return ptr;
} }
void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags) void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
{ {
struct mem_cgroup *old_memcg; struct mem_cgroup *memcg, *old_memcg;
void *ptr; void *ptr;
old_memcg = set_active_memcg(map->memcg); memcg = bpf_map_get_memcg(map);
old_memcg = set_active_memcg(memcg);
ptr = kzalloc(size, flags | __GFP_ACCOUNT); ptr = kzalloc(size, flags | __GFP_ACCOUNT);
set_active_memcg(old_memcg); set_active_memcg(old_memcg);
mem_cgroup_put(memcg);
return ptr; return ptr;
} }
@ -455,12 +473,14 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
size_t align, gfp_t flags) size_t align, gfp_t flags)
{ {
struct mem_cgroup *old_memcg; struct mem_cgroup *memcg, *old_memcg;
void __percpu *ptr; void __percpu *ptr;
old_memcg = set_active_memcg(map->memcg); memcg = bpf_map_get_memcg(map);
old_memcg = set_active_memcg(memcg);
ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT); ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT);
set_active_memcg(old_memcg); set_active_memcg(old_memcg);
mem_cgroup_put(memcg);
return ptr; return ptr;
} }

View File

@ -13,6 +13,7 @@
#include <linux/static_call.h> #include <linux/static_call.h>
#include <linux/bpf_verifier.h> #include <linux/bpf_verifier.h>
#include <linux/bpf_lsm.h> #include <linux/bpf_lsm.h>
#include <linux/delay.h>
/* dummy _ops. The verifier will operate on target program's ops. */ /* dummy _ops. The verifier will operate on target program's ops. */
const struct bpf_verifier_ops bpf_extension_verifier_ops = { const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@ -29,6 +30,81 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
/* serializes access to trampoline_table */ /* serializes access to trampoline_table */
static DEFINE_MUTEX(trampoline_mutex); static DEFINE_MUTEX(trampoline_mutex);
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, enum ftrace_ops_cmd cmd)
{
struct bpf_trampoline *tr = ops->private;
int ret = 0;
if (cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF) {
/* This is called inside register_ftrace_direct_multi(), so
* tr->mutex is already locked.
*/
lockdep_assert_held_once(&tr->mutex);
/* Instead of updating the trampoline here, we propagate
* -EAGAIN to register_ftrace_direct_multi(). Then we can
* retry register_ftrace_direct_multi() after updating the
* trampoline.
*/
if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
!(tr->flags & BPF_TRAMP_F_ORIG_STACK)) {
if (WARN_ON_ONCE(tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY))
return -EBUSY;
tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
return -EAGAIN;
}
return 0;
}
/* The normal locking order is
* tr->mutex => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
*
* The following two commands are called from
*
* prepare_direct_functions_for_ipmodify
* cleanup_direct_functions_after_ipmodify
*
* In both cases, direct_mutex is already locked. Use
* mutex_trylock(&tr->mutex) to avoid a deadlock when racing with
* something else that is modifying this same trampoline.
*/
if (!mutex_trylock(&tr->mutex)) {
/* sleep 1 ms to make sure whatever is holding tr->mutex makes
* some progress.
*/
msleep(1);
return -EAGAIN;
}
switch (cmd) {
case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
!(tr->flags & BPF_TRAMP_F_ORIG_STACK))
ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
break;
case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
tr->flags &= ~BPF_TRAMP_F_SHARE_IPMODIFY;
if (tr->flags & BPF_TRAMP_F_ORIG_STACK)
ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
break;
default:
ret = -EINVAL;
break;
};
mutex_unlock(&tr->mutex);
return ret;
}
#endif
bool bpf_prog_has_trampoline(const struct bpf_prog *prog) bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
{ {
enum bpf_attach_type eatype = prog->expected_attach_type; enum bpf_attach_type eatype = prog->expected_attach_type;
@ -89,6 +165,16 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
tr = kzalloc(sizeof(*tr), GFP_KERNEL); tr = kzalloc(sizeof(*tr), GFP_KERNEL);
if (!tr) if (!tr)
goto out; goto out;
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
if (!tr->fops) {
kfree(tr);
tr = NULL;
goto out;
}
tr->fops->private = tr;
tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
#endif
tr->key = key; tr->key = key;
INIT_HLIST_NODE(&tr->hlist); INIT_HLIST_NODE(&tr->hlist);
@ -128,7 +214,7 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
int ret; int ret;
if (tr->func.ftrace_managed) if (tr->func.ftrace_managed)
ret = unregister_ftrace_direct((long)ip, (long)old_addr); ret = unregister_ftrace_direct_multi(tr->fops, (long)old_addr);
else else
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL); ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
@ -137,15 +223,20 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
return ret; return ret;
} }
static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr) static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr,
bool lock_direct_mutex)
{ {
void *ip = tr->func.addr; void *ip = tr->func.addr;
int ret; int ret;
if (tr->func.ftrace_managed) if (tr->func.ftrace_managed) {
ret = modify_ftrace_direct((long)ip, (long)old_addr, (long)new_addr); if (lock_direct_mutex)
else ret = modify_ftrace_direct_multi(tr->fops, (long)new_addr);
else
ret = modify_ftrace_direct_multi_nolock(tr->fops, (long)new_addr);
} else {
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr); ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr);
}
return ret; return ret;
} }
@ -163,10 +254,12 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
if (bpf_trampoline_module_get(tr)) if (bpf_trampoline_module_get(tr))
return -ENOENT; return -ENOENT;
if (tr->func.ftrace_managed) if (tr->func.ftrace_managed) {
ret = register_ftrace_direct((long)ip, (long)new_addr); ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 0);
else ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
} else {
ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr); ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
}
if (ret) if (ret)
bpf_trampoline_module_put(tr); bpf_trampoline_module_put(tr);
@ -332,11 +425,11 @@ out:
return ERR_PTR(err); return ERR_PTR(err);
} }
static int bpf_trampoline_update(struct bpf_trampoline *tr) static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
{ {
struct bpf_tramp_image *im; struct bpf_tramp_image *im;
struct bpf_tramp_links *tlinks; struct bpf_tramp_links *tlinks;
u32 flags = BPF_TRAMP_F_RESTORE_REGS; u32 orig_flags = tr->flags;
bool ip_arg = false; bool ip_arg = false;
int err, total; int err, total;
@ -358,15 +451,31 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
goto out; goto out;
} }
/* clear all bits except SHARE_IPMODIFY */
tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
if (tlinks[BPF_TRAMP_FEXIT].nr_links || if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME; /* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
* should not be set together.
*/
tr->flags |= BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
} else {
tr->flags |= BPF_TRAMP_F_RESTORE_REGS;
}
if (ip_arg) if (ip_arg)
flags |= BPF_TRAMP_F_IP_ARG; tr->flags |= BPF_TRAMP_F_IP_ARG;
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
again:
if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
(tr->flags & BPF_TRAMP_F_CALL_ORIG))
tr->flags |= BPF_TRAMP_F_ORIG_STACK;
#endif
err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE, err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
&tr->func.model, flags, tlinks, &tr->func.model, tr->flags, tlinks,
tr->func.addr); tr->func.addr);
if (err < 0) if (err < 0)
goto out; goto out;
@ -375,17 +484,34 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
WARN_ON(!tr->cur_image && tr->selector); WARN_ON(!tr->cur_image && tr->selector);
if (tr->cur_image) if (tr->cur_image)
/* progs already running at this address */ /* progs already running at this address */
err = modify_fentry(tr, tr->cur_image->image, im->image); err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
else else
/* first time registering */ /* first time registering */
err = register_fentry(tr, im->image); err = register_fentry(tr, im->image);
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
if (err == -EAGAIN) {
/* -EAGAIN from bpf_tramp_ftrace_ops_func. Now that
* BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
* trampoline again and retry the registration.
*/
/* reset fops->func and fops->trampoline for re-register */
tr->fops->func = NULL;
tr->fops->trampoline = 0;
goto again;
}
#endif
if (err) if (err)
goto out; goto out;
if (tr->cur_image) if (tr->cur_image)
bpf_tramp_image_put(tr->cur_image); bpf_tramp_image_put(tr->cur_image);
tr->cur_image = im; tr->cur_image = im;
tr->selector++; tr->selector++;
out: out:
/* If any error happens, restore previous flags */
if (err)
tr->flags = orig_flags;
kfree(tlinks); kfree(tlinks);
return err; return err;
} }
@ -451,7 +577,7 @@ static int __bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_tr
hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]); hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]);
tr->progs_cnt[kind]++; tr->progs_cnt[kind]++;
err = bpf_trampoline_update(tr); err = bpf_trampoline_update(tr, true /* lock_direct_mutex */);
if (err) { if (err) {
hlist_del_init(&link->tramp_hlist); hlist_del_init(&link->tramp_hlist);
tr->progs_cnt[kind]--; tr->progs_cnt[kind]--;
@ -484,7 +610,7 @@ static int __bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_
} }
hlist_del_init(&link->tramp_hlist); hlist_del_init(&link->tramp_hlist);
tr->progs_cnt[kind]--; tr->progs_cnt[kind]--;
return bpf_trampoline_update(tr); return bpf_trampoline_update(tr, true /* lock_direct_mutex */);
} }
/* bpf_trampoline_unlink_prog() should never fail. */ /* bpf_trampoline_unlink_prog() should never fail. */
@ -498,7 +624,7 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampolin
return err; return err;
} }
#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
static void bpf_shim_tramp_link_release(struct bpf_link *link) static void bpf_shim_tramp_link_release(struct bpf_link *link)
{ {
struct bpf_shim_tramp_link *shim_link = struct bpf_shim_tramp_link *shim_link =
@ -712,6 +838,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
* multiple rcu callbacks. * multiple rcu callbacks.
*/ */
hlist_del(&tr->hlist); hlist_del(&tr->hlist);
kfree(tr->fops);
kfree(tr); kfree(tr);
out: out:
mutex_unlock(&trampoline_mutex); mutex_unlock(&trampoline_mutex);


@ -5533,17 +5533,6 @@ static bool arg_type_is_mem_size(enum bpf_arg_type type)
type == ARG_CONST_SIZE_OR_ZERO; type == ARG_CONST_SIZE_OR_ZERO;
} }
static bool arg_type_is_alloc_size(enum bpf_arg_type type)
{
return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
}
static bool arg_type_is_int_ptr(enum bpf_arg_type type)
{
return type == ARG_PTR_TO_INT ||
type == ARG_PTR_TO_LONG;
}
static bool arg_type_is_release(enum bpf_arg_type type) static bool arg_type_is_release(enum bpf_arg_type type)
{ {
return type & OBJ_RELEASE; return type & OBJ_RELEASE;
@ -5929,7 +5918,8 @@ skip_type_check:
meta->ref_obj_id = reg->ref_obj_id; meta->ref_obj_id = reg->ref_obj_id;
} }
if (arg_type == ARG_CONST_MAP_PTR) { switch (base_type(arg_type)) {
case ARG_CONST_MAP_PTR:
/* bpf_map_xxx(map_ptr) call: remember that map_ptr */ /* bpf_map_xxx(map_ptr) call: remember that map_ptr */
if (meta->map_ptr) { if (meta->map_ptr) {
/* Use map_uid (which is unique id of inner map) to reject: /* Use map_uid (which is unique id of inner map) to reject:
@ -5954,7 +5944,8 @@ skip_type_check:
} }
meta->map_ptr = reg->map_ptr; meta->map_ptr = reg->map_ptr;
meta->map_uid = reg->map_uid; meta->map_uid = reg->map_uid;
} else if (arg_type == ARG_PTR_TO_MAP_KEY) { break;
case ARG_PTR_TO_MAP_KEY:
/* bpf_map_xxx(..., map_ptr, ..., key) call: /* bpf_map_xxx(..., map_ptr, ..., key) call:
* check that [key, key + map->key_size) are within * check that [key, key + map->key_size) are within
* stack limits and initialized * stack limits and initialized
@ -5971,7 +5962,8 @@ skip_type_check:
err = check_helper_mem_access(env, regno, err = check_helper_mem_access(env, regno,
meta->map_ptr->key_size, false, meta->map_ptr->key_size, false,
NULL); NULL);
} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) { break;
case ARG_PTR_TO_MAP_VALUE:
if (type_may_be_null(arg_type) && register_is_null(reg)) if (type_may_be_null(arg_type) && register_is_null(reg))
return 0; return 0;
@ -5987,14 +5979,16 @@ skip_type_check:
err = check_helper_mem_access(env, regno, err = check_helper_mem_access(env, regno,
meta->map_ptr->value_size, false, meta->map_ptr->value_size, false,
meta); meta);
} else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) { break;
case ARG_PTR_TO_PERCPU_BTF_ID:
if (!reg->btf_id) { if (!reg->btf_id) {
verbose(env, "Helper has invalid btf_id in R%d\n", regno); verbose(env, "Helper has invalid btf_id in R%d\n", regno);
return -EACCES; return -EACCES;
} }
meta->ret_btf = reg->btf; meta->ret_btf = reg->btf;
meta->ret_btf_id = reg->btf_id; meta->ret_btf_id = reg->btf_id;
} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) { break;
case ARG_PTR_TO_SPIN_LOCK:
if (meta->func_id == BPF_FUNC_spin_lock) { if (meta->func_id == BPF_FUNC_spin_lock) {
if (process_spin_lock(env, regno, true)) if (process_spin_lock(env, regno, true))
return -EACCES; return -EACCES;
@ -6005,12 +5999,15 @@ skip_type_check:
verbose(env, "verifier internal error\n"); verbose(env, "verifier internal error\n");
return -EFAULT; return -EFAULT;
} }
} else if (arg_type == ARG_PTR_TO_TIMER) { break;
case ARG_PTR_TO_TIMER:
if (process_timer_func(env, regno, meta)) if (process_timer_func(env, regno, meta))
return -EACCES; return -EACCES;
} else if (arg_type == ARG_PTR_TO_FUNC) { break;
case ARG_PTR_TO_FUNC:
meta->subprogno = reg->subprogno; meta->subprogno = reg->subprogno;
} else if (base_type(arg_type) == ARG_PTR_TO_MEM) { break;
case ARG_PTR_TO_MEM:
/* The access to this pointer is only checked when we hit the /* The access to this pointer is only checked when we hit the
* next is_mem_size argument below. * next is_mem_size argument below.
*/ */
@ -6020,11 +6017,14 @@ skip_type_check:
fn->arg_size[arg], false, fn->arg_size[arg], false,
meta); meta);
} }
} else if (arg_type_is_mem_size(arg_type)) { break;
bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO); case ARG_CONST_SIZE:
err = check_mem_size_reg(env, reg, regno, false, meta);
err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta); break;
} else if (arg_type_is_dynptr(arg_type)) { case ARG_CONST_SIZE_OR_ZERO:
err = check_mem_size_reg(env, reg, regno, true, meta);
break;
case ARG_PTR_TO_DYNPTR:
if (arg_type & MEM_UNINIT) { if (arg_type & MEM_UNINIT) {
if (!is_dynptr_reg_valid_uninit(env, reg)) { if (!is_dynptr_reg_valid_uninit(env, reg)) {
verbose(env, "Dynptr has to be an uninitialized dynptr\n"); verbose(env, "Dynptr has to be an uninitialized dynptr\n");
@ -6058,21 +6058,28 @@ skip_type_check:
err_extra, arg + 1); err_extra, arg + 1);
return -EINVAL; return -EINVAL;
} }
} else if (arg_type_is_alloc_size(arg_type)) { break;
case ARG_CONST_ALLOC_SIZE_OR_ZERO:
if (!tnum_is_const(reg->var_off)) { if (!tnum_is_const(reg->var_off)) {
verbose(env, "R%d is not a known constant'\n", verbose(env, "R%d is not a known constant'\n",
regno); regno);
return -EACCES; return -EACCES;
} }
meta->mem_size = reg->var_off.value; meta->mem_size = reg->var_off.value;
} else if (arg_type_is_int_ptr(arg_type)) { break;
case ARG_PTR_TO_INT:
case ARG_PTR_TO_LONG:
{
int size = int_ptr_type_to_size(arg_type); int size = int_ptr_type_to_size(arg_type);
err = check_helper_mem_access(env, regno, size, false, meta); err = check_helper_mem_access(env, regno, size, false, meta);
if (err) if (err)
return err; return err;
err = check_ptr_alignment(env, reg, 0, size, true); err = check_ptr_alignment(env, reg, 0, size, true);
} else if (arg_type == ARG_PTR_TO_CONST_STR) { break;
}
case ARG_PTR_TO_CONST_STR:
{
struct bpf_map *map = reg->map_ptr; struct bpf_map *map = reg->map_ptr;
int map_off; int map_off;
u64 map_addr; u64 map_addr;
@ -6111,9 +6118,12 @@ skip_type_check:
verbose(env, "string is not zero-terminated\n"); verbose(env, "string is not zero-terminated\n");
return -EINVAL; return -EINVAL;
} }
} else if (arg_type == ARG_PTR_TO_KPTR) { break;
}
case ARG_PTR_TO_KPTR:
if (process_kptr_func(env, regno, meta)) if (process_kptr_func(env, regno, meta))
return -EACCES; return -EACCES;
break;
} }
return err; return err;
@ -7160,6 +7170,7 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn, static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p) int *insn_idx_p)
{ {
enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
const struct bpf_func_proto *fn = NULL; const struct bpf_func_proto *fn = NULL;
enum bpf_return_type ret_type; enum bpf_return_type ret_type;
enum bpf_type_flag ret_flag; enum bpf_type_flag ret_flag;
@ -7321,7 +7332,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
} }
break; break;
case BPF_FUNC_set_retval: case BPF_FUNC_set_retval:
if (env->prog->expected_attach_type == BPF_LSM_CGROUP) { if (prog_type == BPF_PROG_TYPE_LSM &&
env->prog->expected_attach_type == BPF_LSM_CGROUP) {
if (!env->prog->aux->attach_func_proto->type) { if (!env->prog->aux->attach_func_proto->type) {
/* Make sure programs that attach to void /* Make sure programs that attach to void
* hooks don't try to modify return value. * hooks don't try to modify return value.
@ -7550,6 +7562,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int err, insn_idx = *insn_idx_p; int err, insn_idx = *insn_idx_p;
const struct btf_param *args; const struct btf_param *args;
struct btf *desc_btf; struct btf *desc_btf;
u32 *kfunc_flags;
bool acq; bool acq;
/* skip for now, but return error when we find this in fixup_kfunc_call */ /* skip for now, but return error when we find this in fixup_kfunc_call */
@ -7565,18 +7578,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
func_name = btf_name_by_offset(desc_btf, func->name_off); func_name = btf_name_by_offset(desc_btf, func->name_off);
func_proto = btf_type_by_id(desc_btf, func->type); func_proto = btf_type_by_id(desc_btf, func->type);
if (!btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
BTF_KFUNC_TYPE_CHECK, func_id)) { if (!kfunc_flags) {
verbose(env, "calling kernel function %s is not allowed\n", verbose(env, "calling kernel function %s is not allowed\n",
func_name); func_name);
return -EACCES; return -EACCES;
} }
acq = *kfunc_flags & KF_ACQUIRE;
acq = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
BTF_KFUNC_TYPE_ACQUIRE, func_id);
/* Check the arguments */ /* Check the arguments */
err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs); err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, *kfunc_flags);
if (err < 0) if (err < 0)
return err; return err;
/* In case of release function, we get register number of refcounted /* In case of release function, we get register number of refcounted
@ -7620,8 +7631,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
regs[BPF_REG_0].btf = desc_btf; regs[BPF_REG_0].btf = desc_btf;
regs[BPF_REG_0].type = PTR_TO_BTF_ID; regs[BPF_REG_0].type = PTR_TO_BTF_ID;
regs[BPF_REG_0].btf_id = ptr_type_id; regs[BPF_REG_0].btf_id = ptr_type_id;
if (btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), if (*kfunc_flags & KF_RET_NULL) {
BTF_KFUNC_TYPE_RET_NULL, func_id)) {
regs[BPF_REG_0].type |= PTR_MAYBE_NULL; regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */ /* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
regs[BPF_REG_0].id = ++env->id_gen; regs[BPF_REG_0].id = ++env->id_gen;
@ -12562,6 +12572,7 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
case BPF_PROG_TYPE_TRACEPOINT: case BPF_PROG_TYPE_TRACEPOINT:
case BPF_PROG_TYPE_PERF_EVENT: case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_RAW_TRACEPOINT: case BPF_PROG_TYPE_RAW_TRACEPOINT:
case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
return true; return true;
default: default:
return false; return false;
@ -13620,6 +13631,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
/* Below members will be freed only at prog->aux */ /* Below members will be freed only at prog->aux */
func[i]->aux->btf = prog->aux->btf; func[i]->aux->btf = prog->aux->btf;
func[i]->aux->func_info = prog->aux->func_info; func[i]->aux->func_info = prog->aux->func_info;
func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
func[i]->aux->poke_tab = prog->aux->poke_tab; func[i]->aux->poke_tab = prog->aux->poke_tab;
func[i]->aux->size_poke_tab = prog->aux->size_poke_tab; func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;
@ -13632,9 +13644,6 @@ static int jit_subprogs(struct bpf_verifier_env *env)
poke->aux = func[i]->aux; poke->aux = func[i]->aux;
} }
/* Use bpf_prog_F_tag to indicate functions in stack traces.
* Long term would need debug info to populate names
*/
func[i]->aux->name[0] = 'F'; func[i]->aux->name[0] = 'F';
func[i]->aux->stack_depth = env->subprog_info[i].stack_depth; func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
func[i]->jit_requested = 1; func[i]->jit_requested = 1;


@ -30,6 +30,7 @@
#include <linux/module.h> #include <linux/module.h>
#include <linux/kernel.h> #include <linux/kernel.h>
#include <linux/bsearch.h> #include <linux/bsearch.h>
#include <linux/btf_ids.h>
/* /*
* These will be re-linked against their real values * These will be re-linked against their real values
@ -799,6 +800,96 @@ static const struct seq_operations kallsyms_op = {
.show = s_show .show = s_show
}; };
#ifdef CONFIG_BPF_SYSCALL
struct bpf_iter__ksym {
__bpf_md_ptr(struct bpf_iter_meta *, meta);
__bpf_md_ptr(struct kallsym_iter *, ksym);
};
static int ksym_prog_seq_show(struct seq_file *m, bool in_stop)
{
struct bpf_iter__ksym ctx;
struct bpf_iter_meta meta;
struct bpf_prog *prog;
meta.seq = m;
prog = bpf_iter_get_info(&meta, in_stop);
if (!prog)
return 0;
ctx.meta = &meta;
ctx.ksym = m ? m->private : NULL;
return bpf_iter_run_prog(prog, &ctx);
}
static int bpf_iter_ksym_seq_show(struct seq_file *m, void *p)
{
return ksym_prog_seq_show(m, false);
}
static void bpf_iter_ksym_seq_stop(struct seq_file *m, void *p)
{
if (!p)
(void) ksym_prog_seq_show(m, true);
else
s_stop(m, p);
}
static const struct seq_operations bpf_iter_ksym_ops = {
.start = s_start,
.next = s_next,
.stop = bpf_iter_ksym_seq_stop,
.show = bpf_iter_ksym_seq_show,
};
static int bpf_iter_ksym_init(void *priv_data, struct bpf_iter_aux_info *aux)
{
struct kallsym_iter *iter = priv_data;
reset_iter(iter, 0);
/* cache here as in kallsyms_open() case; use current process
* credentials to tell BPF iterators if values should be shown.
*/
iter->show_value = kallsyms_show_value(current_cred());
return 0;
}
DEFINE_BPF_ITER_FUNC(ksym, struct bpf_iter_meta *meta, struct kallsym_iter *ksym)
static const struct bpf_iter_seq_info ksym_iter_seq_info = {
.seq_ops = &bpf_iter_ksym_ops,
.init_seq_private = bpf_iter_ksym_init,
.fini_seq_private = NULL,
.seq_priv_size = sizeof(struct kallsym_iter),
};
static struct bpf_iter_reg ksym_iter_reg_info = {
.target = "ksym",
.feature = BPF_ITER_RESCHED,
.ctx_arg_info_size = 1,
.ctx_arg_info = {
{ offsetof(struct bpf_iter__ksym, ksym),
PTR_TO_BTF_ID_OR_NULL },
},
.seq_info = &ksym_iter_seq_info,
};
BTF_ID_LIST(btf_ksym_iter_id)
BTF_ID(struct, kallsym_iter)
static int __init bpf_ksym_iter_register(void)
{
ksym_iter_reg_info.ctx_arg_info[0].btf_id = *btf_ksym_iter_id;
return bpf_iter_reg_target(&ksym_iter_reg_info);
}
late_initcall(bpf_ksym_iter_register);
#endif /* CONFIG_BPF_SYSCALL */
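A minimal BPF-side sketch of consuming the new "ksym" iterator target (not part of this diff; it assumes the usual libbpf headers and that vmlinux.h exposes struct kallsym_iter's value/type/name/show_value fields as defined in kallsyms.c):

SEC("iter/ksym")
int dump_ksym(struct bpf_iter__ksym *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct kallsym_iter *iter = ctx->ksym;

	if (!iter)
		return 0;

	/* show_value was cached from kallsyms_show_value() in the init callback */
	if (iter->show_value)
		BPF_SEQ_PRINTF(seq, "0x%lx %c %s\n", iter->value, iter->type, iter->name);
	else
		BPF_SEQ_PRINTF(seq, "0x0 %c %s\n", iter->type, iter->name);
	return 0;
}

The iterator is then driven from user space like any other BPF iterator, e.g. via bpf_iter_create() on the attach link and read() on the returned fd.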
static inline int kallsyms_for_perf(void) static inline int kallsyms_for_perf(void)
{ {
#ifdef CONFIG_PERF_EVENTS #ifdef CONFIG_PERF_EVENTS


@ -1861,6 +1861,8 @@ static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
ftrace_hash_rec_update_modify(ops, filter_hash, 1); ftrace_hash_rec_update_modify(ops, filter_hash, 1);
} }
static bool ops_references_ip(struct ftrace_ops *ops, unsigned long ip);
/* /*
* Try to update IPMODIFY flag on each ftrace_rec. Return 0 if it is OK * Try to update IPMODIFY flag on each ftrace_rec. Return 0 if it is OK
* or no-needed to update, -EBUSY if it detects a conflict of the flag * or no-needed to update, -EBUSY if it detects a conflict of the flag
@ -1869,6 +1871,13 @@ static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops,
* - If the hash is NULL, it hits all recs (if IPMODIFY is set, this is rejected) * - If the hash is NULL, it hits all recs (if IPMODIFY is set, this is rejected)
* - If the hash is EMPTY_HASH, it hits nothing * - If the hash is EMPTY_HASH, it hits nothing
* - Anything else hits the recs which match the hash entries. * - Anything else hits the recs which match the hash entries.
*
* DIRECT ops does not have IPMODIFY flag, but we still need to check it
* against functions with FTRACE_FL_IPMODIFY. If there is any overlap, call
* ops_func(SHARE_IPMODIFY_SELF) to make sure current ops can share with
* IPMODIFY. If ops_func(SHARE_IPMODIFY_SELF) returns non-zero, propagate
* the return value to the caller and eventually to the owner of the DIRECT
* ops.
*/ */
static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops, static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
struct ftrace_hash *old_hash, struct ftrace_hash *old_hash,
@ -1877,17 +1886,26 @@ static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
struct ftrace_page *pg; struct ftrace_page *pg;
struct dyn_ftrace *rec, *end = NULL; struct dyn_ftrace *rec, *end = NULL;
int in_old, in_new; int in_old, in_new;
bool is_ipmodify, is_direct;
/* Only update if the ops has been registered */ /* Only update if the ops has been registered */
if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
return 0; return 0;
if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) is_ipmodify = ops->flags & FTRACE_OPS_FL_IPMODIFY;
is_direct = ops->flags & FTRACE_OPS_FL_DIRECT;
/* neither IPMODIFY nor DIRECT, skip */
if (!is_ipmodify && !is_direct)
return 0;
if (WARN_ON_ONCE(is_ipmodify && is_direct))
return 0; return 0;
/* /*
* Since the IPMODIFY is a very address sensitive action, we do not * Since the IPMODIFY and DIRECT are very address sensitive
* allow ftrace_ops to set all functions to new hash. * actions, we do not allow ftrace_ops to set all functions to new
* hash.
*/ */
if (!new_hash || !old_hash) if (!new_hash || !old_hash)
return -EINVAL; return -EINVAL;
@ -1905,12 +1923,32 @@ static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
continue; continue;
if (in_new) { if (in_new) {
/* New entries must ensure no others are using it */ if (rec->flags & FTRACE_FL_IPMODIFY) {
if (rec->flags & FTRACE_FL_IPMODIFY) int ret;
goto rollback;
rec->flags |= FTRACE_FL_IPMODIFY; /* Cannot have two ipmodify on same rec */
} else /* Removed entry */ if (is_ipmodify)
goto rollback;
FTRACE_WARN_ON(rec->flags & FTRACE_FL_DIRECT);
/*
* Another ops with IPMODIFY is already
* attached. We are now attaching a direct
* ops. Run SHARE_IPMODIFY_SELF, to check
* whether sharing is supported.
*/
if (!ops->ops_func)
return -EBUSY;
ret = ops->ops_func(ops, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF);
if (ret)
return ret;
} else if (is_ipmodify) {
rec->flags |= FTRACE_FL_IPMODIFY;
}
} else if (is_ipmodify) {
rec->flags &= ~FTRACE_FL_IPMODIFY; rec->flags &= ~FTRACE_FL_IPMODIFY;
}
} while_for_each_ftrace_rec(); } while_for_each_ftrace_rec();
return 0; return 0;
@ -2454,8 +2492,7 @@ static void call_direct_funcs(unsigned long ip, unsigned long pip,
struct ftrace_ops direct_ops = { struct ftrace_ops direct_ops = {
.func = call_direct_funcs, .func = call_direct_funcs,
.flags = FTRACE_OPS_FL_IPMODIFY .flags = FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS
| FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS
| FTRACE_OPS_FL_PERMANENT, | FTRACE_OPS_FL_PERMANENT,
/* /*
* By declaring the main trampoline as this trampoline * By declaring the main trampoline as this trampoline
@ -3072,14 +3109,14 @@ static inline int ops_traces_mod(struct ftrace_ops *ops)
} }
/* /*
* Check if the current ops references the record. * Check if the current ops references the given ip.
* *
* If the ops traces all functions, then it was already accounted for. * If the ops traces all functions, then it was already accounted for.
* If the ops does not trace the current record function, skip it. * If the ops does not trace the current record function, skip it.
* If the ops ignores the function via notrace filter, skip it. * If the ops ignores the function via notrace filter, skip it.
*/ */
static inline bool static bool
ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec) ops_references_ip(struct ftrace_ops *ops, unsigned long ip)
{ {
/* If ops isn't enabled, ignore it */ /* If ops isn't enabled, ignore it */
if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
@ -3091,16 +3128,29 @@ ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
/* The function must be in the filter */ /* The function must be in the filter */
if (!ftrace_hash_empty(ops->func_hash->filter_hash) && if (!ftrace_hash_empty(ops->func_hash->filter_hash) &&
!__ftrace_lookup_ip(ops->func_hash->filter_hash, rec->ip)) !__ftrace_lookup_ip(ops->func_hash->filter_hash, ip))
return false; return false;
/* If in notrace hash, we ignore it too */ /* If in notrace hash, we ignore it too */
if (ftrace_lookup_ip(ops->func_hash->notrace_hash, rec->ip)) if (ftrace_lookup_ip(ops->func_hash->notrace_hash, ip))
return false; return false;
return true; return true;
} }
/*
* Check if the current ops references the record.
*
* If the ops traces all functions, then it was already accounted for.
* If the ops does not trace the current record function, skip it.
* If the ops ignores the function via notrace filter, skip it.
*/
static bool
ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
{
return ops_references_ip(ops, rec->ip);
}
static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs) static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
{ {
bool init_nop = ftrace_need_init_nop(); bool init_nop = ftrace_need_init_nop();
@ -5215,6 +5265,8 @@ static struct ftrace_direct_func *ftrace_alloc_direct_func(unsigned long addr)
return direct; return direct;
} }
static int register_ftrace_function_nolock(struct ftrace_ops *ops);
/** /**
* register_ftrace_direct - Call a custom trampoline directly * register_ftrace_direct - Call a custom trampoline directly
* @ip: The address of the nop at the beginning of a function * @ip: The address of the nop at the beginning of a function
@ -5286,7 +5338,7 @@ int register_ftrace_direct(unsigned long ip, unsigned long addr)
ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0); ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0);
if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED)) { if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED)) {
ret = register_ftrace_function(&direct_ops); ret = register_ftrace_function_nolock(&direct_ops);
if (ret) if (ret)
ftrace_set_filter_ip(&direct_ops, ip, 1, 0); ftrace_set_filter_ip(&direct_ops, ip, 1, 0);
} }
@ -5545,8 +5597,7 @@ int modify_ftrace_direct(unsigned long ip,
} }
EXPORT_SYMBOL_GPL(modify_ftrace_direct); EXPORT_SYMBOL_GPL(modify_ftrace_direct);
#define MULTI_FLAGS (FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_DIRECT | \ #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
FTRACE_OPS_FL_SAVE_REGS)
static int check_direct_multi(struct ftrace_ops *ops) static int check_direct_multi(struct ftrace_ops *ops)
{ {
@ -5639,7 +5690,7 @@ int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
ops->flags = MULTI_FLAGS; ops->flags = MULTI_FLAGS;
ops->trampoline = FTRACE_REGS_ADDR; ops->trampoline = FTRACE_REGS_ADDR;
err = register_ftrace_function(ops); err = register_ftrace_function_nolock(ops);
out_remove: out_remove:
if (err) if (err)
@ -5691,22 +5742,8 @@ int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
} }
EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi); EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi);
/** static int
* modify_ftrace_direct_multi - Modify an existing direct 'multi' call __modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
* to call something else
* @ops: The address of the struct ftrace_ops object
* @addr: The address of the new trampoline to call at @ops functions
*
* This is used to unregister currently registered direct caller and
* register new one @addr on functions registered in @ops object.
*
* Note there's window between ftrace_shutdown and ftrace_startup calls
* where there will be no callbacks called.
*
* Returns: zero on success. Non zero on error, which includes:
* -EINVAL - The @ops object was not properly registered.
*/
int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
{ {
struct ftrace_hash *hash; struct ftrace_hash *hash;
struct ftrace_func_entry *entry, *iter; struct ftrace_func_entry *entry, *iter;
@ -5717,20 +5754,15 @@ int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
int i, size; int i, size;
int err; int err;
if (check_direct_multi(ops)) lockdep_assert_held_once(&direct_mutex);
return -EINVAL;
if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
return -EINVAL;
mutex_lock(&direct_mutex);
/* Enable the tmp_ops to have the same functions as the direct ops */ /* Enable the tmp_ops to have the same functions as the direct ops */
ftrace_ops_init(&tmp_ops); ftrace_ops_init(&tmp_ops);
tmp_ops.func_hash = ops->func_hash; tmp_ops.func_hash = ops->func_hash;
err = register_ftrace_function(&tmp_ops); err = register_ftrace_function_nolock(&tmp_ops);
if (err) if (err)
goto out_direct; return err;
/* /*
* Now the ftrace_ops_list_func() is called to do the direct callers. * Now the ftrace_ops_list_func() is called to do the direct callers.
@ -5754,7 +5786,64 @@ int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
/* Removing the tmp_ops will add the updated direct callers to the functions */ /* Removing the tmp_ops will add the updated direct callers to the functions */
unregister_ftrace_function(&tmp_ops); unregister_ftrace_function(&tmp_ops);
out_direct: return err;
}
/**
* modify_ftrace_direct_multi_nolock - Modify an existing direct 'multi' call
* to call something else
* @ops: The address of the struct ftrace_ops object
* @addr: The address of the new trampoline to call at @ops functions
*
* This is used to unregister currently registered direct caller and
* register new one @addr on functions registered in @ops object.
*
* Note there's window between ftrace_shutdown and ftrace_startup calls
* where there will be no callbacks called.
*
* Caller should already have direct_mutex locked, so we don't lock
* direct_mutex here.
*
* Returns: zero on success. Non zero on error, which includes:
* -EINVAL - The @ops object was not properly registered.
*/
int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
{
if (check_direct_multi(ops))
return -EINVAL;
if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
return -EINVAL;
return __modify_ftrace_direct_multi(ops, addr);
}
EXPORT_SYMBOL_GPL(modify_ftrace_direct_multi_nolock);
/**
* modify_ftrace_direct_multi - Modify an existing direct 'multi' call
* to call something else
* @ops: The address of the struct ftrace_ops object
* @addr: The address of the new trampoline to call at @ops functions
*
* This is used to unregister currently registered direct caller and
* register new one @addr on functions registered in @ops object.
*
* Note there's window between ftrace_shutdown and ftrace_startup calls
* where there will be no callbacks called.
*
* Returns: zero on success. Non zero on error, which includes:
* -EINVAL - The @ops object was not properly registered.
*/
int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
{
int err;
if (check_direct_multi(ops))
return -EINVAL;
if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
return -EINVAL;
mutex_lock(&direct_mutex);
err = __modify_ftrace_direct_multi(ops, addr);
mutex_unlock(&direct_mutex); mutex_unlock(&direct_mutex);
return err; return err;
} }
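With both variants available, callers pick based on whether they already hold direct_mutex; a minimal sketch of the intended usage (the wrapper name and the lock_direct_mutex flag are illustrative, mirroring how the BPF trampoline code chooses between the two):

static int retarget_direct_call(struct ftrace_ops *ops, unsigned long new_addr,
				bool lock_direct_mutex)
{
	/* The _nolock variant expects the caller to already hold direct_mutex,
	 * e.g. when invoked from an ftrace_ops ops_func callback.
	 */
	if (lock_direct_mutex)
		return modify_ftrace_direct_multi(ops, new_addr);
	return modify_ftrace_direct_multi_nolock(ops, new_addr);
}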
@ -7965,6 +8054,143 @@ int ftrace_is_dead(void)
return ftrace_disabled; return ftrace_disabled;
} }
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/*
* When registering ftrace_ops with IPMODIFY, it is necessary to make sure
* it doesn't conflict with any direct ftrace_ops. If there is existing
* direct ftrace_ops on a kernel function being patched, call
* FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER on it to enable sharing.
*
* @ops: ftrace_ops being registered.
*
* Returns:
* 0 on success;
* Negative on failure.
*/
static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops)
{
struct ftrace_func_entry *entry;
struct ftrace_hash *hash;
struct ftrace_ops *op;
int size, i, ret;
lockdep_assert_held_once(&direct_mutex);
if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY))
return 0;
hash = ops->func_hash->filter_hash;
size = 1 << hash->size_bits;
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
unsigned long ip = entry->ip;
bool found_op = false;
mutex_lock(&ftrace_lock);
do_for_each_ftrace_op(op, ftrace_ops_list) {
if (!(op->flags & FTRACE_OPS_FL_DIRECT))
continue;
if (ops_references_ip(op, ip)) {
found_op = true;
break;
}
} while_for_each_ftrace_op(op);
mutex_unlock(&ftrace_lock);
if (found_op) {
if (!op->ops_func)
return -EBUSY;
ret = op->ops_func(op, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER);
if (ret)
return ret;
}
}
}
return 0;
}
/*
* Similar to prepare_direct_functions_for_ipmodify, clean up after ops
* with IPMODIFY is unregistered. The cleanup is optional for most DIRECT
* ops.
*/
static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops)
{
struct ftrace_func_entry *entry;
struct ftrace_hash *hash;
struct ftrace_ops *op;
int size, i;
if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY))
return;
mutex_lock(&direct_mutex);
hash = ops->func_hash->filter_hash;
size = 1 << hash->size_bits;
for (i = 0; i < size; i++) {
hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
unsigned long ip = entry->ip;
bool found_op = false;
mutex_lock(&ftrace_lock);
do_for_each_ftrace_op(op, ftrace_ops_list) {
if (!(op->flags & FTRACE_OPS_FL_DIRECT))
continue;
if (ops_references_ip(op, ip)) {
found_op = true;
break;
}
} while_for_each_ftrace_op(op);
mutex_unlock(&ftrace_lock);
/* The cleanup is optional, ignore any errors */
if (found_op && op->ops_func)
op->ops_func(op, FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER);
}
}
mutex_unlock(&direct_mutex);
}
#define lock_direct_mutex() mutex_lock(&direct_mutex)
#define unlock_direct_mutex() mutex_unlock(&direct_mutex)
#else /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops)
{
return 0;
}
static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops)
{
}
#define lock_direct_mutex() do { } while (0)
#define unlock_direct_mutex() do { } while (0)
#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
/*
* Similar to register_ftrace_function, except we don't lock direct_mutex.
*/
static int register_ftrace_function_nolock(struct ftrace_ops *ops)
{
int ret;
ftrace_ops_init(ops);
mutex_lock(&ftrace_lock);
ret = ftrace_startup(ops, 0);
mutex_unlock(&ftrace_lock);
return ret;
}
/** /**
* register_ftrace_function - register a function for profiling * register_ftrace_function - register a function for profiling
* @ops: ops structure that holds the function for profiling. * @ops: ops structure that holds the function for profiling.
@ -7980,14 +8206,15 @@ int register_ftrace_function(struct ftrace_ops *ops)
{ {
int ret; int ret;
ftrace_ops_init(ops); lock_direct_mutex();
ret = prepare_direct_functions_for_ipmodify(ops);
if (ret < 0)
goto out_unlock;
mutex_lock(&ftrace_lock); ret = register_ftrace_function_nolock(ops);
ret = ftrace_startup(ops, 0);
mutex_unlock(&ftrace_lock);
out_unlock:
unlock_direct_mutex();
return ret; return ret;
} }
EXPORT_SYMBOL_GPL(register_ftrace_function); EXPORT_SYMBOL_GPL(register_ftrace_function);
@ -8006,6 +8233,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
ret = ftrace_shutdown(ops, 0); ret = ftrace_shutdown(ops, 0);
mutex_unlock(&ftrace_lock); mutex_unlock(&ftrace_lock);
cleanup_direct_functions_after_ipmodify(ops);
return ret; return ret;
} }
EXPORT_SYMBOL_GPL(unregister_ftrace_function); EXPORT_SYMBOL_GPL(unregister_ftrace_function);


@ -691,52 +691,35 @@ noinline void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len)
{ {
} }
noinline void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p)
{
}
__diag_pop(); __diag_pop();
ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO); ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO);
BTF_SET_START(test_sk_check_kfunc_ids) BTF_SET8_START(test_sk_check_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test1) BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
BTF_ID(func, bpf_kfunc_call_test2) BTF_ID_FLAGS(func, bpf_kfunc_call_test2)
BTF_ID(func, bpf_kfunc_call_test3) BTF_ID_FLAGS(func, bpf_kfunc_call_test3)
BTF_ID(func, bpf_kfunc_call_test_acquire) BTF_ID_FLAGS(func, bpf_kfunc_call_test_acquire, KF_ACQUIRE | KF_RET_NULL)
BTF_ID(func, bpf_kfunc_call_memb_acquire) BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL)
BTF_ID(func, bpf_kfunc_call_test_release) BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE)
BTF_ID(func, bpf_kfunc_call_memb_release) BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE)
BTF_ID(func, bpf_kfunc_call_memb1_release) BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE)
BTF_ID(func, bpf_kfunc_call_test_kptr_get) BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET)
BTF_ID(func, bpf_kfunc_call_test_pass_ctx) BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx)
BTF_ID(func, bpf_kfunc_call_test_pass1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1)
BTF_ID(func, bpf_kfunc_call_test_pass2) BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass2)
BTF_ID(func, bpf_kfunc_call_test_fail1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail1)
BTF_ID(func, bpf_kfunc_call_test_fail2) BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail2)
BTF_ID(func, bpf_kfunc_call_test_fail3) BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3)
BTF_ID(func, bpf_kfunc_call_test_mem_len_pass1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
BTF_ID(func, bpf_kfunc_call_test_mem_len_fail1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
BTF_ID(func, bpf_kfunc_call_test_mem_len_fail2) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
BTF_SET_END(test_sk_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS)
BTF_SET8_END(test_sk_check_kfunc_ids)
BTF_SET_START(test_sk_acquire_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_acquire)
BTF_ID(func, bpf_kfunc_call_memb_acquire)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_SET_END(test_sk_acquire_kfunc_ids)
BTF_SET_START(test_sk_release_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_release)
BTF_ID(func, bpf_kfunc_call_memb_release)
BTF_ID(func, bpf_kfunc_call_memb1_release)
BTF_SET_END(test_sk_release_kfunc_ids)
BTF_SET_START(test_sk_ret_null_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_acquire)
BTF_ID(func, bpf_kfunc_call_memb_acquire)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_SET_END(test_sk_ret_null_kfunc_ids)
BTF_SET_START(test_sk_kptr_acquire_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_SET_END(test_sk_kptr_acquire_kfunc_ids)
static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
u32 size, u32 headroom, u32 tailroom) u32 size, u32 headroom, u32 tailroom)
@ -955,6 +938,9 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
{ {
struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb;
if (!skb->len)
return -EINVAL;
if (!__skb) if (!__skb)
return 0; return 0;
@ -1617,12 +1603,8 @@ out:
} }
static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = { static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.check_set = &test_sk_check_kfunc_ids, .set = &test_sk_check_kfunc_ids,
.acquire_set = &test_sk_acquire_kfunc_ids,
.release_set = &test_sk_release_kfunc_ids,
.ret_null_set = &test_sk_ret_null_kfunc_ids,
.kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids
}; };
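The same one-set conversion repeats for every kfunc user below; consolidated, the new registration pattern for a hypothetical module looks like this (the my_* names are illustrative, the macros and register_btf_kfunc_id_set() are the real API introduced here):

BTF_SET8_START(my_kfunc_ids)
BTF_ID_FLAGS(func, my_obj_acquire, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, my_obj_release, KF_RELEASE)
BTF_ID_FLAGS(func, my_obj_peek)		/* no special semantics */
BTF_SET8_END(my_kfunc_ids)

static const struct btf_kfunc_id_set my_kfunc_set = {
	.owner	= THIS_MODULE,
	.set	= &my_kfunc_ids,
};

static int __init my_mod_init(void)
{
	/* a single set now carries acquire/release/ret_null info via flags */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &my_kfunc_set);
}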
BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids) BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)


@ -4168,6 +4168,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
bool again = false; bool again = false;
skb_reset_mac_header(skb); skb_reset_mac_header(skb);
skb_assert_len(skb);
if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP)) if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED); __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);


@ -237,7 +237,7 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *, BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
data, int, headlen, int, offset) data, int, headlen, int, offset)
{ {
u16 tmp, *ptr; __be16 tmp, *ptr;
const int len = sizeof(tmp); const int len = sizeof(tmp);
if (offset >= 0) { if (offset >= 0) {
@ -264,7 +264,7 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *, BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
data, int, headlen, int, offset) data, int, headlen, int, offset)
{ {
u32 tmp, *ptr; __be32 tmp, *ptr;
const int len = sizeof(tmp); const int len = sizeof(tmp);
if (likely(offset >= 0)) { if (likely(offset >= 0)) {


@ -462,7 +462,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
if (copied == len) if (copied == len)
break; break;
} while (i != msg_rx->sg.end); } while (!sg_is_last(sge));
if (unlikely(peek)) { if (unlikely(peek)) {
msg_rx = sk_psock_next_msg(psock, msg_rx); msg_rx = sk_psock_next_msg(psock, msg_rx);
@ -472,7 +472,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
} }
msg_rx->sg.start = i; msg_rx->sg.start = i;
if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { if (!sge->length && sg_is_last(sge)) {
msg_rx = sk_psock_dequeue_msg(psock); msg_rx = sk_psock_dequeue_msg(psock);
kfree_sk_msg(msg_rx); kfree_sk_msg(msg_rx);
} }


@ -197,17 +197,17 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
} }
} }
BTF_SET_START(bpf_tcp_ca_check_kfunc_ids) BTF_SET8_START(bpf_tcp_ca_check_kfunc_ids)
BTF_ID(func, tcp_reno_ssthresh) BTF_ID_FLAGS(func, tcp_reno_ssthresh)
BTF_ID(func, tcp_reno_cong_avoid) BTF_ID_FLAGS(func, tcp_reno_cong_avoid)
BTF_ID(func, tcp_reno_undo_cwnd) BTF_ID_FLAGS(func, tcp_reno_undo_cwnd)
BTF_ID(func, tcp_slow_start) BTF_ID_FLAGS(func, tcp_slow_start)
BTF_ID(func, tcp_cong_avoid_ai) BTF_ID_FLAGS(func, tcp_cong_avoid_ai)
BTF_SET_END(bpf_tcp_ca_check_kfunc_ids) BTF_SET8_END(bpf_tcp_ca_check_kfunc_ids)
static const struct btf_kfunc_id_set bpf_tcp_ca_kfunc_set = { static const struct btf_kfunc_id_set bpf_tcp_ca_kfunc_set = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.check_set = &bpf_tcp_ca_check_kfunc_ids, .set = &bpf_tcp_ca_check_kfunc_ids,
}; };
static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = { static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {


@ -1154,24 +1154,24 @@ static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
.set_state = bbr_set_state, .set_state = bbr_set_state,
}; };
BTF_SET_START(tcp_bbr_check_kfunc_ids) BTF_SET8_START(tcp_bbr_check_kfunc_ids)
#ifdef CONFIG_X86 #ifdef CONFIG_X86
#ifdef CONFIG_DYNAMIC_FTRACE #ifdef CONFIG_DYNAMIC_FTRACE
BTF_ID(func, bbr_init) BTF_ID_FLAGS(func, bbr_init)
BTF_ID(func, bbr_main) BTF_ID_FLAGS(func, bbr_main)
BTF_ID(func, bbr_sndbuf_expand) BTF_ID_FLAGS(func, bbr_sndbuf_expand)
BTF_ID(func, bbr_undo_cwnd) BTF_ID_FLAGS(func, bbr_undo_cwnd)
BTF_ID(func, bbr_cwnd_event) BTF_ID_FLAGS(func, bbr_cwnd_event)
BTF_ID(func, bbr_ssthresh) BTF_ID_FLAGS(func, bbr_ssthresh)
BTF_ID(func, bbr_min_tso_segs) BTF_ID_FLAGS(func, bbr_min_tso_segs)
BTF_ID(func, bbr_set_state) BTF_ID_FLAGS(func, bbr_set_state)
#endif #endif
#endif #endif
BTF_SET_END(tcp_bbr_check_kfunc_ids) BTF_SET8_END(tcp_bbr_check_kfunc_ids)
static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = { static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.check_set = &tcp_bbr_check_kfunc_ids, .set = &tcp_bbr_check_kfunc_ids,
}; };
static int __init bbr_register(void) static int __init bbr_register(void)


@ -485,22 +485,22 @@ static struct tcp_congestion_ops cubictcp __read_mostly = {
.name = "cubic", .name = "cubic",
}; };
BTF_SET_START(tcp_cubic_check_kfunc_ids) BTF_SET8_START(tcp_cubic_check_kfunc_ids)
#ifdef CONFIG_X86 #ifdef CONFIG_X86
#ifdef CONFIG_DYNAMIC_FTRACE #ifdef CONFIG_DYNAMIC_FTRACE
BTF_ID(func, cubictcp_init) BTF_ID_FLAGS(func, cubictcp_init)
BTF_ID(func, cubictcp_recalc_ssthresh) BTF_ID_FLAGS(func, cubictcp_recalc_ssthresh)
BTF_ID(func, cubictcp_cong_avoid) BTF_ID_FLAGS(func, cubictcp_cong_avoid)
BTF_ID(func, cubictcp_state) BTF_ID_FLAGS(func, cubictcp_state)
BTF_ID(func, cubictcp_cwnd_event) BTF_ID_FLAGS(func, cubictcp_cwnd_event)
BTF_ID(func, cubictcp_acked) BTF_ID_FLAGS(func, cubictcp_acked)
#endif #endif
#endif #endif
BTF_SET_END(tcp_cubic_check_kfunc_ids) BTF_SET8_END(tcp_cubic_check_kfunc_ids)
static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = { static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.check_set = &tcp_cubic_check_kfunc_ids, .set = &tcp_cubic_check_kfunc_ids,
}; };
static int __init cubictcp_register(void) static int __init cubictcp_register(void)


@ -239,22 +239,22 @@ static struct tcp_congestion_ops dctcp_reno __read_mostly = {
.name = "dctcp-reno", .name = "dctcp-reno",
}; };
BTF_SET_START(tcp_dctcp_check_kfunc_ids) BTF_SET8_START(tcp_dctcp_check_kfunc_ids)
#ifdef CONFIG_X86 #ifdef CONFIG_X86
#ifdef CONFIG_DYNAMIC_FTRACE #ifdef CONFIG_DYNAMIC_FTRACE
BTF_ID(func, dctcp_init) BTF_ID_FLAGS(func, dctcp_init)
BTF_ID(func, dctcp_update_alpha) BTF_ID_FLAGS(func, dctcp_update_alpha)
BTF_ID(func, dctcp_cwnd_event) BTF_ID_FLAGS(func, dctcp_cwnd_event)
BTF_ID(func, dctcp_ssthresh) BTF_ID_FLAGS(func, dctcp_ssthresh)
BTF_ID(func, dctcp_cwnd_undo) BTF_ID_FLAGS(func, dctcp_cwnd_undo)
BTF_ID(func, dctcp_state) BTF_ID_FLAGS(func, dctcp_state)
#endif #endif
#endif #endif
BTF_SET_END(tcp_dctcp_check_kfunc_ids) BTF_SET8_END(tcp_dctcp_check_kfunc_ids)
static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = { static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.check_set = &tcp_dctcp_check_kfunc_ids, .set = &tcp_dctcp_check_kfunc_ids,
}; };
static int __init dctcp_register(void) static int __init dctcp_register(void)


@ -55,57 +55,131 @@ enum {
NF_BPF_CT_OPTS_SZ = 12, NF_BPF_CT_OPTS_SZ = 12,
}; };
static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple,
u32 tuple_len, u8 protonum, u8 dir,
struct nf_conntrack_tuple *tuple)
{
union nf_inet_addr *src = dir ? &tuple->dst.u3 : &tuple->src.u3;
union nf_inet_addr *dst = dir ? &tuple->src.u3 : &tuple->dst.u3;
union nf_conntrack_man_proto *sport = dir ? (void *)&tuple->dst.u
: &tuple->src.u;
union nf_conntrack_man_proto *dport = dir ? &tuple->src.u
: (void *)&tuple->dst.u;
if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP))
return -EPROTO;
memset(tuple, 0, sizeof(*tuple));
switch (tuple_len) {
case sizeof(bpf_tuple->ipv4):
tuple->src.l3num = AF_INET;
src->ip = bpf_tuple->ipv4.saddr;
sport->tcp.port = bpf_tuple->ipv4.sport;
dst->ip = bpf_tuple->ipv4.daddr;
dport->tcp.port = bpf_tuple->ipv4.dport;
break;
case sizeof(bpf_tuple->ipv6):
tuple->src.l3num = AF_INET6;
memcpy(src->ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr));
sport->tcp.port = bpf_tuple->ipv6.sport;
memcpy(dst->ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr));
dport->tcp.port = bpf_tuple->ipv6.dport;
break;
default:
return -EAFNOSUPPORT;
}
tuple->dst.protonum = protonum;
tuple->dst.dir = dir;
return 0;
}
static struct nf_conn *
__bpf_nf_ct_alloc_entry(struct net *net, struct bpf_sock_tuple *bpf_tuple,
u32 tuple_len, struct bpf_ct_opts *opts, u32 opts_len,
u32 timeout)
{
struct nf_conntrack_tuple otuple, rtuple;
struct nf_conn *ct;
int err;
if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
opts_len != NF_BPF_CT_OPTS_SZ)
return ERR_PTR(-EINVAL);
if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS))
return ERR_PTR(-EINVAL);
err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto,
IP_CT_DIR_ORIGINAL, &otuple);
if (err < 0)
return ERR_PTR(err);
err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto,
IP_CT_DIR_REPLY, &rtuple);
if (err < 0)
return ERR_PTR(err);
if (opts->netns_id >= 0) {
net = get_net_ns_by_id(net, opts->netns_id);
if (unlikely(!net))
return ERR_PTR(-ENONET);
}
ct = nf_conntrack_alloc(net, &nf_ct_zone_dflt, &otuple, &rtuple,
GFP_ATOMIC);
if (IS_ERR(ct))
goto out;
memset(&ct->proto, 0, sizeof(ct->proto));
__nf_ct_set_timeout(ct, timeout * HZ);
ct->status |= IPS_CONFIRMED;
out:
if (opts->netns_id >= 0)
put_net(net);
return ct;
}
static struct nf_conn *__bpf_nf_ct_lookup(struct net *net, static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
struct bpf_sock_tuple *bpf_tuple, struct bpf_sock_tuple *bpf_tuple,
u32 tuple_len, u8 protonum, u32 tuple_len, struct bpf_ct_opts *opts,
s32 netns_id, u8 *dir) u32 opts_len)
{ {
struct nf_conntrack_tuple_hash *hash; struct nf_conntrack_tuple_hash *hash;
struct nf_conntrack_tuple tuple; struct nf_conntrack_tuple tuple;
struct nf_conn *ct; struct nf_conn *ct;
int err;
if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP)) if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
opts_len != NF_BPF_CT_OPTS_SZ)
return ERR_PTR(-EINVAL);
if (unlikely(opts->l4proto != IPPROTO_TCP && opts->l4proto != IPPROTO_UDP))
return ERR_PTR(-EPROTO); return ERR_PTR(-EPROTO);
if (unlikely(netns_id < BPF_F_CURRENT_NETNS)) if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS))
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
memset(&tuple, 0, sizeof(tuple)); err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto,
switch (tuple_len) { IP_CT_DIR_ORIGINAL, &tuple);
case sizeof(bpf_tuple->ipv4): if (err < 0)
tuple.src.l3num = AF_INET; return ERR_PTR(err);
tuple.src.u3.ip = bpf_tuple->ipv4.saddr;
tuple.src.u.tcp.port = bpf_tuple->ipv4.sport;
tuple.dst.u3.ip = bpf_tuple->ipv4.daddr;
tuple.dst.u.tcp.port = bpf_tuple->ipv4.dport;
break;
case sizeof(bpf_tuple->ipv6):
tuple.src.l3num = AF_INET6;
memcpy(tuple.src.u3.ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr));
tuple.src.u.tcp.port = bpf_tuple->ipv6.sport;
memcpy(tuple.dst.u3.ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr));
tuple.dst.u.tcp.port = bpf_tuple->ipv6.dport;
break;
default:
return ERR_PTR(-EAFNOSUPPORT);
}
tuple.dst.protonum = protonum; if (opts->netns_id >= 0) {
net = get_net_ns_by_id(net, opts->netns_id);
if (netns_id >= 0) {
net = get_net_ns_by_id(net, netns_id);
if (unlikely(!net)) if (unlikely(!net))
return ERR_PTR(-ENONET); return ERR_PTR(-ENONET);
} }
hash = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple); hash = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple);
if (netns_id >= 0) if (opts->netns_id >= 0)
put_net(net); put_net(net);
if (!hash) if (!hash)
return ERR_PTR(-ENOENT); return ERR_PTR(-ENOENT);
ct = nf_ct_tuplehash_to_ctrack(hash); ct = nf_ct_tuplehash_to_ctrack(hash);
if (dir) opts->dir = NF_CT_DIRECTION(hash);
*dir = NF_CT_DIRECTION(hash);
return ct; return ct;
} }
@ -114,6 +188,43 @@ __diag_push();
__diag_ignore_all("-Wmissing-prototypes", __diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in nf_conntrack BTF"); "Global functions as their definitions will be in nf_conntrack BTF");
struct nf_conn___init {
struct nf_conn ct;
};
/* bpf_xdp_ct_alloc - Allocate a new CT entry
*
* Parameters:
* @xdp_ctx - Pointer to ctx (xdp_md) in XDP program
* Cannot be NULL
* @bpf_tuple - Pointer to memory representing the tuple to look up
* Cannot be NULL
* @tuple__sz - Length of the tuple structure
* Must be one of sizeof(bpf_tuple->ipv4) or
* sizeof(bpf_tuple->ipv6)
* @opts - Additional options for allocation (documented above)
* Cannot be NULL
* @opts__sz - Length of the bpf_ct_opts structure
* Must be NF_BPF_CT_OPTS_SZ (12)
*/
struct nf_conn___init *
bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz)
{
struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx;
struct nf_conn *nfct;
nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz,
opts, opts__sz, 10);
if (IS_ERR(nfct)) {
if (opts)
opts->error = PTR_ERR(nfct);
return NULL;
}
return (struct nf_conn___init *)nfct;
}
/* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a /* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a
* reference to it * reference to it
* *
@ -138,25 +249,50 @@ bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
struct net *caller_net; struct net *caller_net;
struct nf_conn *nfct; struct nf_conn *nfct;
BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ);
if (!opts)
return NULL;
if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] ||
opts__sz != NF_BPF_CT_OPTS_SZ) {
opts->error = -EINVAL;
return NULL;
}
caller_net = dev_net(ctx->rxq->dev); caller_net = dev_net(ctx->rxq->dev);
nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto, nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
opts->netns_id, &opts->dir);
if (IS_ERR(nfct)) { if (IS_ERR(nfct)) {
opts->error = PTR_ERR(nfct); if (opts)
opts->error = PTR_ERR(nfct);
return NULL; return NULL;
} }
return nfct; return nfct;
} }
/* bpf_skb_ct_alloc - Allocate a new CT entry
*
* Parameters:
* @skb_ctx - Pointer to ctx (__sk_buff) in TC program
* Cannot be NULL
* @bpf_tuple - Pointer to memory representing the tuple to look up
* Cannot be NULL
* @tuple__sz - Length of the tuple structure
* Must be one of sizeof(bpf_tuple->ipv4) or
* sizeof(bpf_tuple->ipv6)
* @opts - Additional options for allocation (documented above)
* Cannot be NULL
* @opts__sz - Length of the bpf_ct_opts structure
* Must be NF_BPF_CT_OPTS_SZ (12)
*/
struct nf_conn___init *
bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz)
{
struct sk_buff *skb = (struct sk_buff *)skb_ctx;
struct nf_conn *nfct;
struct net *net;
net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10);
if (IS_ERR(nfct)) {
if (opts)
opts->error = PTR_ERR(nfct);
return NULL;
}
return (struct nf_conn___init *)nfct;
}
/* bpf_skb_ct_lookup - Lookup CT entry for the given tuple, and acquire a /* bpf_skb_ct_lookup - Lookup CT entry for the given tuple, and acquire a
* reference to it * reference to it
* *
@ -181,20 +317,31 @@ bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
struct net *caller_net; struct net *caller_net;
struct nf_conn *nfct; struct nf_conn *nfct;
BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ); caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
if (!opts) if (IS_ERR(nfct)) {
return NULL; if (opts)
if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] || opts->error = PTR_ERR(nfct);
opts__sz != NF_BPF_CT_OPTS_SZ) {
opts->error = -EINVAL;
return NULL; return NULL;
} }
caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); return nfct;
nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto, }
opts->netns_id, &opts->dir);
if (IS_ERR(nfct)) { /* bpf_ct_insert_entry - Add the provided entry into a CT map
opts->error = PTR_ERR(nfct); *
* This must be invoked for referenced PTR_TO_BTF_ID.
*
* @nfct - Pointer to referenced nf_conn___init object, obtained
* using bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
*/
struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i)
{
struct nf_conn *nfct = (struct nf_conn *)nfct_i;
int err;
err = nf_conntrack_hash_check_insert(nfct);
if (err < 0) {
nf_conntrack_free(nfct);
return NULL; return NULL;
} }
return nfct; return nfct;
@ -217,50 +364,90 @@ void bpf_ct_release(struct nf_conn *nfct)
nf_ct_put(nfct); nf_ct_put(nfct);
} }
/* bpf_ct_set_timeout - Set timeout of allocated nf_conn
*
* Sets the default timeout of newly allocated nf_conn before insertion.
* This helper must be invoked for refcounted pointer to nf_conn___init.
*
* Parameters:
* @nfct - Pointer to referenced nf_conn object, obtained using
* bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
* @timeout - Timeout in msecs.
*/
void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout)
{
__nf_ct_set_timeout((struct nf_conn *)nfct, msecs_to_jiffies(timeout));
}
/* bpf_ct_change_timeout - Change timeout of inserted nf_conn
*
* Change timeout associated of the inserted or looked up nf_conn.
* This helper must be invoked for refcounted pointer to nf_conn.
*
* Parameters:
* @nfct - Pointer to referenced nf_conn object, obtained using
* bpf_ct_insert_entry, bpf_xdp_ct_lookup, or bpf_skb_ct_lookup.
* @timeout - New timeout in msecs.
*/
int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout)
{
return __nf_ct_change_timeout(nfct, msecs_to_jiffies(timeout));
}
/* bpf_ct_set_status - Set status field of allocated nf_conn
*
* Set the status field of the newly allocated nf_conn before insertion.
* This must be invoked for referenced PTR_TO_BTF_ID to nf_conn___init.
*
* Parameters:
* @nfct - Pointer to referenced nf_conn object, obtained using
* bpf_xdp_ct_alloc or bpf_skb_ct_alloc.
* @status - New status value.
*/
int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status)
{
return nf_ct_change_status_common((struct nf_conn *)nfct, status);
}
/* bpf_ct_change_status - Change status of inserted nf_conn
*
* Change the status field of the provided connection tracking entry.
* This must be invoked for referenced PTR_TO_BTF_ID to nf_conn.
*
* Parameters:
* @nfct - Pointer to referenced nf_conn object, obtained using
* bpf_ct_insert_entry, bpf_xdp_ct_lookup or bpf_skb_ct_lookup.
* @status - New status value.
*/
int bpf_ct_change_status(struct nf_conn *nfct, u32 status)
{
return nf_ct_change_status_common(nfct, status);
}
__diag_pop() __diag_pop()
BTF_SET_START(nf_ct_xdp_check_kfunc_ids) BTF_SET8_START(nf_ct_kfunc_set)
BTF_ID(func, bpf_xdp_ct_lookup) BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
BTF_ID(func, bpf_ct_release) BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
BTF_SET_END(nf_ct_xdp_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_skb_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_ct_insert_entry, KF_ACQUIRE | KF_RET_NULL | KF_RELEASE)
BTF_ID_FLAGS(func, bpf_ct_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS)
BTF_SET8_END(nf_ct_kfunc_set)
BTF_SET_START(nf_ct_tc_check_kfunc_ids) static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = {
BTF_ID(func, bpf_skb_ct_lookup) .owner = THIS_MODULE,
BTF_ID(func, bpf_ct_release) .set = &nf_ct_kfunc_set,
BTF_SET_END(nf_ct_tc_check_kfunc_ids)
BTF_SET_START(nf_ct_acquire_kfunc_ids)
BTF_ID(func, bpf_xdp_ct_lookup)
BTF_ID(func, bpf_skb_ct_lookup)
BTF_SET_END(nf_ct_acquire_kfunc_ids)
BTF_SET_START(nf_ct_release_kfunc_ids)
BTF_ID(func, bpf_ct_release)
BTF_SET_END(nf_ct_release_kfunc_ids)
/* Both sets are identical */
#define nf_ct_ret_null_kfunc_ids nf_ct_acquire_kfunc_ids
static const struct btf_kfunc_id_set nf_conntrack_xdp_kfunc_set = {
.owner = THIS_MODULE,
.check_set = &nf_ct_xdp_check_kfunc_ids,
.acquire_set = &nf_ct_acquire_kfunc_ids,
.release_set = &nf_ct_release_kfunc_ids,
.ret_null_set = &nf_ct_ret_null_kfunc_ids,
};
static const struct btf_kfunc_id_set nf_conntrack_tc_kfunc_set = {
.owner = THIS_MODULE,
.check_set = &nf_ct_tc_check_kfunc_ids,
.acquire_set = &nf_ct_acquire_kfunc_ids,
.release_set = &nf_ct_release_kfunc_ids,
.ret_null_set = &nf_ct_ret_null_kfunc_ids,
}; };
int register_nf_conntrack_bpf(void) int register_nf_conntrack_bpf(void)
{ {
int ret; int ret;
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_xdp_kfunc_set); ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set);
return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_tc_kfunc_set); return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
} }
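A BPF-side sketch of the alloc/insert flow these kfuncs enable (not part of this diff; it assumes the standard libbpf headers, that vmlinux.h/BTF exposes struct bpf_ct_opts and struct nf_conn___init, and uses arbitrary addresses, ports and timeout):

extern struct nf_conn___init *
bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
		 __u32 tuple__sz, struct bpf_ct_opts *opts, __u32 opts__sz) __ksym;
extern struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) __ksym;
extern void bpf_ct_set_timeout(struct nf_conn___init *nfct, __u32 timeout) __ksym;
extern void bpf_ct_release(struct nf_conn *nfct) __ksym;

SEC("xdp")
int xdp_add_ct_entry(struct xdp_md *ctx)
{
	struct bpf_ct_opts opts = { .netns_id = -1, .l4proto = IPPROTO_TCP };
	struct bpf_sock_tuple tup = {};
	struct nf_conn___init *ct_i;
	struct nf_conn *ct;

	tup.ipv4.saddr = bpf_htonl(0xc0a80101);	/* 192.168.1.1 */
	tup.ipv4.daddr = bpf_htonl(0xc0a80102);	/* 192.168.1.2 */
	tup.ipv4.sport = bpf_htons(12345);
	tup.ipv4.dport = bpf_htons(80);

	ct_i = bpf_xdp_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct_i)
		return XDP_PASS;

	bpf_ct_set_timeout(ct_i, 10000);	/* msecs, only valid before insertion */
	ct = bpf_ct_insert_entry(ct_i);		/* consumes ct_i whether or not it succeeds */
	if (ct)
		bpf_ct_release(ct);		/* drop the reference returned by insert */
	return XDP_PASS;
}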


@ -2806,3 +2806,65 @@ err_expect:
free_percpu(net->ct.stat);
return ret;
}
#if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
(IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \
IS_ENABLED(CONFIG_NF_CT_NETLINK))
/* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */
int __nf_ct_change_timeout(struct nf_conn *ct, u64 timeout)
{
if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status))
return -EPERM;
__nf_ct_set_timeout(ct, timeout);
if (test_bit(IPS_DYING_BIT, &ct->status))
return -ETIME;
return 0;
}
EXPORT_SYMBOL_GPL(__nf_ct_change_timeout);
void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off)
{
unsigned int bit;
/* Ignore these unchangable bits */
on &= ~IPS_UNCHANGEABLE_MASK;
off &= ~IPS_UNCHANGEABLE_MASK;
for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
if (on & (1 << bit))
set_bit(bit, &ct->status);
else if (off & (1 << bit))
clear_bit(bit, &ct->status);
}
}
EXPORT_SYMBOL_GPL(__nf_ct_change_status);
int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status)
{
unsigned long d;
d = ct->status ^ status;
if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING))
/* unchangeable */
return -EBUSY;
if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
/* SEEN_REPLY bit can only be set */
return -EBUSY;
if (d & IPS_ASSURED && !(status & IPS_ASSURED))
/* ASSURED bit can only be set */
return -EBUSY;
__nf_ct_change_status(ct, status, 0);
return 0;
}
EXPORT_SYMBOL_GPL(nf_ct_change_status_common);
#endif

View File

@ -1891,45 +1891,10 @@ ctnetlink_parse_nat_setup(struct nf_conn *ct,
}
#endif
-static void
-__ctnetlink_change_status(struct nf_conn *ct, unsigned long on,
-unsigned long off)
-{
-unsigned int bit;
-/* Ignore these unchangable bits */
-on &= ~IPS_UNCHANGEABLE_MASK;
-off &= ~IPS_UNCHANGEABLE_MASK;
-for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
-if (on & (1 << bit))
-set_bit(bit, &ct->status);
-else if (off & (1 << bit))
-clear_bit(bit, &ct->status);
-}
-}
static int
ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
{
-unsigned long d;
-unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
-d = ct->status ^ status;
-if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING))
-/* unchangeable */
-return -EBUSY;
-if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
-/* SEEN_REPLY bit can only be set */
-return -EBUSY;
-if (d & IPS_ASSURED && !(status & IPS_ASSURED))
-/* ASSURED bit can only be set */
-return -EBUSY;
-__ctnetlink_change_status(ct, status, 0);
-return 0;
+return nf_ct_change_status_common(ct, ntohl(nla_get_be32(cda[CTA_STATUS])));
}
static int
@ -2024,16 +1989,7 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
static int ctnetlink_change_timeout(struct nf_conn *ct,
const struct nlattr * const cda[])
{
-u64 timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
-if (timeout > INT_MAX)
-timeout = INT_MAX;
-WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout);
-if (test_bit(IPS_DYING_BIT, &ct->status))
-return -ETIME;
-return 0;
+return __nf_ct_change_timeout(ct, (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ);
}
#if defined(CONFIG_NF_CONNTRACK_MARK)
@ -2293,9 +2249,7 @@ ctnetlink_create_conntrack(struct net *net,
goto err1;
timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
-if (timeout > INT_MAX)
-timeout = INT_MAX;
-ct->timeout = (u32)timeout + nfct_time_stamp;
+__nf_ct_set_timeout(ct, timeout);
rcu_read_lock();
if (cda[CTA_HELP]) {
@ -2837,7 +2791,7 @@ ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
* unchangeable bits but do not error out. Also user programs
* are allowed to clear the bits that they are allowed to change.
*/
-__ctnetlink_change_status(ct, status, ~status);
+__nf_ct_change_status(ct, status, ~status);
return 0;
}

View File

@ -639,8 +639,11 @@ static int __xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len
if (unlikely(need_wait))
return -EOPNOTSUPP;
-if (sk_can_busy_loop(sk))
+if (sk_can_busy_loop(sk)) {
+if (xs->zc)
+__sk_mark_napi_id_once(sk, xsk_pool_get_napi_id(xs->pool));
sk_busy_loop(sk, 1); /* only support non-blocking sockets */
+}
if (xs->zc && xsk_no_wakeup(sk))
return 0;
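A hedged user-space sketch (not from this diff) of how an AF_XDP socket fd is opted into busy polling, which is the mode exercised by the fix above. The fallback sockopt numbers and the budget/timeout values are illustrative assumptions.

#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

static int enable_busy_poll(int xsk_fd)
{
	int opt = 1;

	if (setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &opt, sizeof(opt)))
		return -1;
	opt = 20;	/* busy-poll for up to 20us per call */
	if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL, &opt, sizeof(opt)))
		return -1;
	opt = 64;	/* packet budget per busy-poll iteration */
	if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &opt, sizeof(opt)))
		return -1;
	return 0;
}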

View File

@ -282,12 +282,10 @@ $(LIBBPF): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(LIBBPF_OU
BPFTOOLDIR := $(TOOLS_PATH)/bpf/bpftool
BPFTOOL_OUTPUT := $(abspath $(BPF_SAMPLES_PATH))/bpftool
-BPFTOOL := $(BPFTOOL_OUTPUT)/bpftool
-$(BPFTOOL): $(LIBBPF) $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT)
+BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool
+$(BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT)
$(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../ \
-OUTPUT=$(BPFTOOL_OUTPUT)/ \
-LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \
-LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/
+OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap
$(LIBBPF_OUTPUT) $(BPFTOOL_OUTPUT):
$(call msg,MKDIR,$@)

View File

@ -17,6 +17,7 @@
#include <bpf/libbpf.h>
#include "bpf_insn.h"
#include "sock_example.h"
+#include "bpf_util.h"
#define BPF_F_PIN (1 << 0)
#define BPF_F_GET (1 << 1)
@ -52,7 +53,7 @@ static int bpf_prog_create(const char *object)
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_EXIT_INSN(),
};
-size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn);
+size_t insns_cnt = ARRAY_SIZE(insns);
struct bpf_object *obj;
int err;

View File

@ -29,6 +29,7 @@
#include <bpf/bpf.h>
#include "bpf_insn.h"
#include "sock_example.h"
+#include "bpf_util.h"
char bpf_log_buf[BPF_LOG_BUF_SIZE];
@ -58,7 +59,7 @@ static int test_sock(void)
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(),
};
-size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+size_t insns_cnt = ARRAY_SIZE(prog);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,

View File

@ -31,6 +31,7 @@
#include <bpf/bpf.h>
#include "bpf_insn.h"
+#include "bpf_util.h"
enum {
MAP_KEY_PACKETS,
@ -70,7 +71,7 @@ static int prog_load(int map_fd, int verdict)
BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
BPF_EXIT_INSN(),
};
-size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+size_t insns_cnt = ARRAY_SIZE(prog);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.log_buf = bpf_log_buf,
.log_size = BPF_LOG_BUF_SIZE,

View File

@ -523,7 +523,7 @@ int main(int argc, char **argv)
return -1;
}
-for (f = 0; f < sizeof(map_flags) / sizeof(*map_flags); f++) {
+for (f = 0; f < ARRAY_SIZE(map_flags); f++) {
test_lru_loss0(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
test_lru_loss1(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
test_parallel_lru_loss(BPF_MAP_TYPE_LRU_HASH, map_flags[f],

View File

@ -12,6 +12,8 @@
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
+#include "bpf_util.h"
static int map_fd[7];
#define PORT_A (map_fd[0])
@ -28,7 +30,7 @@ static const char * const test_names[] = {
"Hash of Hash",
};
-#define NR_TESTS (sizeof(test_names) / sizeof(*test_names))
+#define NR_TESTS ARRAY_SIZE(test_names)
static void check_map_id(int inner_map_fd, int map_in_map_fd, uint32_t key)
{

View File

@ -8,6 +8,7 @@
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "trace_helpers.h"
+#include "bpf_util.h"
#ifdef __mips__
#define MAX_ENTRIES 6000 /* MIPS n64 syscalls start at 5000 */
@ -24,7 +25,7 @@ static void install_accept_all_seccomp(void)
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
-.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+.len = (unsigned short)ARRAY_SIZE(filter),
.filter = filter,
};
if (prctl(PR_SET_SECCOMP, 2, &prog))

View File

@ -33,7 +33,7 @@ struct {
} tx_port_native SEC(".maps");
/* store egress interface mac address */
-const volatile char tx_mac_addr[ETH_ALEN];
+const volatile __u8 tx_mac_addr[ETH_ALEN];
static __always_inline int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
{
@ -73,6 +73,7 @@ int xdp_redirect_map_egress(struct xdp_md *ctx)
{
void *data_end = (void *)(long)ctx->data_end;
void *data = (void *)(long)ctx->data;
+u8 *mac_addr = (u8 *) tx_mac_addr;
struct ethhdr *eth = data;
u64 nh_off;
@ -80,7 +81,8 @@ int xdp_redirect_map_egress(struct xdp_md *ctx)
if (data + nh_off > data_end)
return XDP_DROP;
-__builtin_memcpy(eth->h_source, (const char *)tx_mac_addr, ETH_ALEN);
+barrier_var(mac_addr); /* prevent optimizing out memcpy */
+__builtin_memcpy(eth->h_source, mac_addr, ETH_ALEN);
return XDP_PASS;
}

View File

@ -40,6 +40,8 @@ static const struct option long_options[] = {
{}
};
+static int verbose = 0;
int main(int argc, char **argv)
{
struct bpf_devmap_val devmap_val = {};
@ -79,6 +81,7 @@ int main(int argc, char **argv)
break;
case 'v':
sample_switch_mode();
+verbose = 1;
break;
case 's':
mask |= SAMPLE_REDIRECT_MAP_CNT;
@ -134,6 +137,12 @@ int main(int argc, char **argv)
ret = EXIT_FAIL;
goto end_destroy;
}
+if (verbose)
+printf("Egress ifindex:%d using src MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
+ifindex_out,
+skel->rodata->tx_mac_addr[0], skel->rodata->tx_mac_addr[1],
+skel->rodata->tx_mac_addr[2], skel->rodata->tx_mac_addr[3],
+skel->rodata->tx_mac_addr[4], skel->rodata->tx_mac_addr[5]);
}
skel->rodata->from_match[0] = ifindex_in;

View File

@ -333,27 +333,7 @@ class PrinterRST(Printer):
.. Copyright (C) All BPF authors and contributors from 2014 to present.
.. See git log include/uapi/linux/bpf.h in kernel tree for details.
..
-.. %%%LICENSE_START(VERBATIM)
-.. Permission is granted to make and distribute verbatim copies of this
-.. manual provided the copyright notice and this permission notice are
-.. preserved on all copies.
-..
-.. Permission is granted to copy and distribute modified versions of this
-.. manual under the conditions for verbatim copying, provided that the
-.. entire resulting derived work is distributed under the terms of a
-.. permission notice identical to this one.
-..
-.. Since the Linux kernel and libraries are constantly changing, this
-.. manual page may be incorrect or out-of-date. The author(s) assume no
-.. responsibility for errors or omissions, or for damages resulting from
-.. the use of the information contained herein. The author(s) may not
-.. have taken the same level of care in the production of this manual,
-.. which is licensed free of charge, as they might when working
-.. professionally.
-..
-.. Formatted or processed versions of this manual, if unaccompanied by
-.. the source, must acknowledge the copyright and authors of this work.
-.. %%%LICENSE_END
+.. SPDX-License-Identifier: Linux-man-pages-copyleft
..
.. Please do not edit this file. It was generated from the documentation
.. located in file include/uapi/linux/bpf.h of the Linux kernel sources

View File

@ -45,6 +45,19 @@
* .zero 4
* __BTF_ID__func__vfs_fallocate__4:
* .zero 4
+*
+* set8 - store symbol size into first 4 bytes and sort following
+* ID list
+*
+* __BTF_ID__set8__list:
+* .zero 8
+* list:
+* __BTF_ID__func__vfs_getattr__3:
+* .zero 4
+* .word (1 << 0) | (1 << 2)
+* __BTF_ID__func__vfs_fallocate__5:
+* .zero 4
+* .word (1 << 3) | (1 << 1) | (1 << 2)
*/
#define _GNU_SOURCE
@ -72,6 +85,7 @@
#define BTF_TYPEDEF "typedef"
#define BTF_FUNC "func"
#define BTF_SET "set"
+#define BTF_SET8 "set8"
#define ADDR_CNT 100
@ -84,6 +98,7 @@ struct btf_id {
};
int addr_cnt;
bool is_set;
+bool is_set8;
Elf64_Addr addr[ADDR_CNT];
};
@ -231,14 +246,14 @@ static char *get_id(const char *prefix_end)
return id;
}
-static struct btf_id *add_set(struct object *obj, char *name)
+static struct btf_id *add_set(struct object *obj, char *name, bool is_set8)
{
/*
* __BTF_ID__set__name
* name = ^
* id = ^
*/
-char *id = name + sizeof(BTF_SET "__") - 1;
+char *id = name + (is_set8 ? sizeof(BTF_SET8 "__") : sizeof(BTF_SET "__")) - 1;
int len = strlen(name);
if (id >= name + len) {
@ -444,9 +459,21 @@ static int symbols_collect(struct object *obj)
} else if (!strncmp(prefix, BTF_FUNC, sizeof(BTF_FUNC) - 1)) {
obj->nr_funcs++;
id = add_symbol(&obj->funcs, prefix, sizeof(BTF_FUNC) - 1);
+/* set8 */
+} else if (!strncmp(prefix, BTF_SET8, sizeof(BTF_SET8) - 1)) {
+id = add_set(obj, prefix, true);
+/*
+* SET8 objects store list's count, which is encoded
+* in symbol's size, together with 'cnt' field hence
+* that - 1.
+*/
+if (id) {
+id->cnt = sym.st_size / sizeof(uint64_t) - 1;
+id->is_set8 = true;
+}
/* set */
} else if (!strncmp(prefix, BTF_SET, sizeof(BTF_SET) - 1)) {
-id = add_set(obj, prefix);
+id = add_set(obj, prefix, false);
/*
* SET objects store list's count, which is encoded
* in symbol's size, together with 'cnt' field hence
@ -571,7 +598,8 @@ static int id_patch(struct object *obj, struct btf_id *id)
int *ptr = data->d_buf;
int i;
-if (!id->id && !id->is_set)
+/* For set, set8, id->id may be 0 */
+if (!id->id && !id->is_set && !id->is_set8)
pr_err("WARN: resolve_btfids: unresolved symbol %s\n", id->name);
for (i = 0; i < id->addr_cnt; i++) {
@ -643,13 +671,13 @@ static int sets_patch(struct object *obj)
}
idx = idx / sizeof(int);
-base = &ptr[idx] + 1;
+base = &ptr[idx] + (id->is_set8 ? 2 : 1);
cnt = ptr[idx];
pr_debug("sorting addr %5lu: cnt %6d [%s]\n",
(idx + 1) * sizeof(int), cnt, id->name);
-qsort(base, cnt, sizeof(int), cmp_id);
+qsort(base, cnt, id->is_set8 ? sizeof(uint64_t) : sizeof(int), cmp_id);
next = rb_next(next);
}

View File

@ -4,7 +4,7 @@ include ../../scripts/Makefile.include
OUTPUT ?= $(abspath .output)/
BPFTOOL_OUTPUT := $(OUTPUT)bpftool/
-DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool
+DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bootstrap/bpftool
BPFTOOL ?= $(DEFAULT_BPFTOOL)
LIBBPF_SRC := $(abspath ../../lib/bpf)
BPFOBJ_OUTPUT := $(OUTPUT)libbpf/
@ -86,6 +86,5 @@ $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(BPFOBJ_OU
$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(BPFOBJ_OUTPUT) \
DESTDIR=$(BPFOBJ_OUTPUT) prefix= $(abspath $@) install_headers
-$(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT)
-$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) \
-ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD)
+$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
+$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) bootstrap

View File

@ -2361,7 +2361,8 @@ union bpf_attr {
* Pull in non-linear data in case the *skb* is non-linear and not
* all of *len* are part of the linear section. Make *len* bytes
* from *skb* readable and writable. If a zero value is passed for
-* *len*, then the whole length of the *skb* is pulled.
+* *len*, then all bytes in the linear part of *skb* will be made
+* readable and writable.
*
* This helper is only needed for reading and writing with direct
* packet access.
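A hedged sketch (not part of this diff) of the usage pattern the clarified wording describes: calling bpf_skb_pull_data() with len == 0 from a tc program before direct packet access. The program and section names are illustrative.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define TC_ACT_OK 0

SEC("tc")
int pull_then_parse(struct __sk_buff *skb)
{
	void *data, *data_end;

	/* len == 0: make the whole linear part readable and writable */
	if (bpf_skb_pull_data(skb, 0))
		return TC_ACT_OK;

	data = (void *)(long)skb->data;
	data_end = (void *)(long)skb->data_end;
	if (data + sizeof(struct ethhdr) > data_end)
		return TC_ACT_OK;
	/* the Ethernet header can now be accessed directly here */
	return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";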

View File

@ -2,6 +2,8 @@
#ifndef __BPF_TRACING_H__
#define __BPF_TRACING_H__
+#include <bpf/bpf_helpers.h>
/* Scan the ARCH passed in from ARCH env variable (see Makefile) */
#if defined(__TARGET_ARCH_x86)
#define bpf_target_x86
@ -140,7 +142,7 @@ struct pt_regs___s390 {
#define __PT_RC_REG gprs[2]
#define __PT_SP_REG gprs[15]
#define __PT_IP_REG psw.addr
-#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; })
+#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2)
#elif defined(bpf_target_arm)
@ -174,7 +176,7 @@ struct pt_regs___arm64 {
#define __PT_RC_REG regs[0]
#define __PT_SP_REG sp
#define __PT_IP_REG pc
-#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; })
+#define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x)
#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0)
#elif defined(bpf_target_mips)
@ -493,39 +495,62 @@ typeof(name(0)) name(struct pt_regs *ctx) \
} \
static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
+/* If kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER, read pt_regs directly */
#define ___bpf_syscall_args0() ctx
-#define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
-#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
+#define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_SYSCALL(regs)
+#define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_SYSCALL(regs)
+#define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_SYSCALL(regs)
+#define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_SYSCALL(regs)
+#define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_SYSCALL(regs)
#define ___bpf_syscall_args(args...) ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args)
+/* If kernel doesn't have CONFIG_ARCH_HAS_SYSCALL_WRAPPER, we have to BPF_CORE_READ from pt_regs */
+#define ___bpf_syswrap_args0() ctx
+#define ___bpf_syswrap_args1(x) ___bpf_syswrap_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args2(x, args...) ___bpf_syswrap_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args3(x, args...) ___bpf_syswrap_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args4(x, args...) ___bpf_syswrap_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args5(x, args...) ___bpf_syswrap_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs)
+#define ___bpf_syswrap_args(args...) ___bpf_apply(___bpf_syswrap_args, ___bpf_narg(args))(args)
/*
-* BPF_KPROBE_SYSCALL is a variant of BPF_KPROBE, which is intended for
+* BPF_KSYSCALL is a variant of BPF_KPROBE, which is intended for
* tracing syscall functions, like __x64_sys_close. It hides the underlying
* platform-specific low-level way of getting syscall input arguments from
* struct pt_regs, and provides a familiar typed and named function arguments
* syntax and semantics of accessing syscall input parameters.
*
-* Original struct pt_regs* context is preserved as 'ctx' argument. This might
+* Original struct pt_regs * context is preserved as 'ctx' argument. This might
* be necessary when using BPF helpers like bpf_perf_event_output().
*
-* This macro relies on BPF CO-RE support.
+* At the moment BPF_KSYSCALL does not handle all the calling convention
+* quirks for mmap(), clone() and compat syscalls transparrently. This may or
+* may not change in the future. User needs to take extra measures to handle
+* such quirks explicitly, if necessary.
+*
+* This macro relies on BPF CO-RE support and virtual __kconfig externs.
*/
-#define BPF_KPROBE_SYSCALL(name, args...) \
+#define BPF_KSYSCALL(name, args...) \
name(struct pt_regs *ctx); \
+extern _Bool LINUX_HAS_SYSCALL_WRAPPER __kconfig; \
static __attribute__((always_inline)) typeof(name(0)) \
____##name(struct pt_regs *ctx, ##args); \
typeof(name(0)) name(struct pt_regs *ctx) \
{ \
-struct pt_regs *regs = PT_REGS_SYSCALL_REGS(ctx); \
+struct pt_regs *regs = LINUX_HAS_SYSCALL_WRAPPER \
+? (struct pt_regs *)PT_REGS_PARM1(ctx) \
+: ctx; \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
-return ____##name(___bpf_syscall_args(args)); \
+if (LINUX_HAS_SYSCALL_WRAPPER) \
+return ____##name(___bpf_syswrap_args(args)); \
+else \
+return ____##name(___bpf_syscall_args(args)); \
_Pragma("GCC diagnostic pop") \
} \
static __attribute__((always_inline)) typeof(name(0)) \
____##name(struct pt_regs *ctx, ##args)
+#define BPF_KPROBE_SYSCALL BPF_KSYSCALL
#endif
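A hedged usage sketch of the new macro (not part of this diff): tracing the openat syscall via the declarative ksyscall section. The argument list follows the openat prototype; the filename read and message format are illustrative, not selftest code.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("ksyscall/openat")
int BPF_KSYSCALL(probe_openat, int dfd, const char *filename, int flags)
{
	char buf[64] = {};

	/* filename is a user-space pointer captured from the syscall args */
	bpf_probe_read_user_str(buf, sizeof(buf), filename);
	bpf_printk("openat(%s, flags=%d)", buf, flags);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";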

View File

@ -2045,7 +2045,7 @@ static int btf_dump_get_enum_value(struct btf_dump *d,
*value = *(__s64 *)data;
return 0;
case 4:
-*value = is_signed ? *(__s32 *)data : *(__u32 *)data;
+*value = is_signed ? (__s64)*(__s32 *)data : *(__u32 *)data;
return 0;
case 2:
*value = is_signed ? *(__s16 *)data : *(__u16 *)data;

View File

@ -533,7 +533,7 @@ void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *attach_name,
gen->attach_kind = kind;
ret = snprintf(gen->attach_target, sizeof(gen->attach_target), "%s%s",
prefix, attach_name);
-if (ret == sizeof(gen->attach_target))
+if (ret >= sizeof(gen->attach_target))
gen->error = -ENOSPC;
}

View File

@ -1694,7 +1694,7 @@ static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val,
switch (ext->kcfg.type) {
case KCFG_BOOL:
if (value == 'm') {
-pr_warn("extern (kcfg) %s=%c should be tristate or char\n",
+pr_warn("extern (kcfg) '%s': value '%c' implies tristate or char type\n",
ext->name, value);
return -EINVAL;
}
@ -1715,7 +1715,7 @@ static int set_kcfg_value_tri(struct extern_desc *ext, void *ext_val,
case KCFG_INT:
case KCFG_CHAR_ARR:
default:
-pr_warn("extern (kcfg) %s=%c should be bool, tristate, or char\n",
+pr_warn("extern (kcfg) '%s': value '%c' implies bool, tristate, or char type\n",
ext->name, value);
return -EINVAL;
}
@ -1729,7 +1729,8 @@ static int set_kcfg_value_str(struct extern_desc *ext, char *ext_val,
size_t len;
if (ext->kcfg.type != KCFG_CHAR_ARR) {
-pr_warn("extern (kcfg) %s=%s should be char array\n", ext->name, value);
+pr_warn("extern (kcfg) '%s': value '%s' implies char array type\n",
+ext->name, value);
return -EINVAL;
}
@ -1743,7 +1744,7 @@ static int set_kcfg_value_str(struct extern_desc *ext, char *ext_val,
/* strip quotes */
len -= 2;
if (len >= ext->kcfg.sz) {
-pr_warn("extern (kcfg) '%s': long string config %s of (%zu bytes) truncated to %d bytes\n",
+pr_warn("extern (kcfg) '%s': long string '%s' of (%zu bytes) truncated to %d bytes\n",
ext->name, value, len, ext->kcfg.sz - 1);
len = ext->kcfg.sz - 1;
}
@ -1800,13 +1801,20 @@ static bool is_kcfg_value_in_range(const struct extern_desc *ext, __u64 v)
static int set_kcfg_value_num(struct extern_desc *ext, void *ext_val,
__u64 value)
{
-if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) {
-pr_warn("extern (kcfg) %s=%llu should be integer\n",
+if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR &&
+ext->kcfg.type != KCFG_BOOL) {
+pr_warn("extern (kcfg) '%s': value '%llu' implies integer, char, or boolean type\n",
ext->name, (unsigned long long)value);
return -EINVAL;
}
+if (ext->kcfg.type == KCFG_BOOL && value > 1) {
+pr_warn("extern (kcfg) '%s': value '%llu' isn't boolean compatible\n",
+ext->name, (unsigned long long)value);
+return -EINVAL;
+}
if (!is_kcfg_value_in_range(ext, value)) {
-pr_warn("extern (kcfg) %s=%llu value doesn't fit in %d bytes\n",
+pr_warn("extern (kcfg) '%s': value '%llu' doesn't fit in %d bytes\n",
ext->name, (unsigned long long)value, ext->kcfg.sz);
return -ERANGE;
}
@ -1870,16 +1878,19 @@ static int bpf_object__process_kconfig_line(struct bpf_object *obj,
/* assume integer */
err = parse_u64(value, &num);
if (err) {
-pr_warn("extern (kcfg) %s=%s should be integer\n",
-ext->name, value);
+pr_warn("extern (kcfg) '%s': value '%s' isn't a valid integer\n", ext->name, value);
return err;
}
+if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) {
+pr_warn("extern (kcfg) '%s': value '%s' implies integer type\n", ext->name, value);
+return -EINVAL;
+}
err = set_kcfg_value_num(ext, ext_val, num);
break;
}
if (err)
return err;
-pr_debug("extern (kcfg) %s=%s\n", ext->name, value);
+pr_debug("extern (kcfg) '%s': set to %s\n", ext->name, value);
return 0;
}
@ -2320,6 +2331,37 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
return 0;
}
static size_t adjust_ringbuf_sz(size_t sz)
{
__u32 page_sz = sysconf(_SC_PAGE_SIZE);
__u32 mul;
/* if user forgot to set any size, make sure they see error */
if (sz == 0)
return 0;
/* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
* a power-of-2 multiple of kernel's page size. If user diligently
* satisified these conditions, pass the size through.
*/
if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
return sz;
/* Otherwise find closest (page_sz * power_of_2) product bigger than
* user-set size to satisfy both user size request and kernel
* requirements and substitute correct max_entries for map creation.
*/
for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
if (mul * page_sz > sz)
return mul * page_sz;
}
/* if it's impossible to satisfy the conditions (i.e., user size is
* very close to UINT_MAX but is not a power-of-2 multiple of
* page_size) then just return original size and let kernel reject it
*/
return sz;
}
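To make the rounding above concrete, a hedged user-space sketch (not from this diff): on a 4 KiB page system, a requested 1,000,000-byte ring buffer is not a power-of-2 multiple of the page size, so libbpf would round it up to 1,048,576. The object file name "prog.bpf.o" and map name "rb" are assumptions; error handling is trimmed.

#include <bpf/libbpf.h>

static int ringbuf_size_demo(void)
{
	struct bpf_object *obj;
	struct bpf_map *rb;

	obj = bpf_object__open_file("prog.bpf.o", NULL);
	if (!obj)
		return -1;
	rb = bpf_object__find_map_by_name(obj, "rb");
	if (!rb)
		return -1;
	/* 1,000,000 is not a power-of-2 multiple of the page size... */
	bpf_map__set_max_entries(rb, 1000 * 1000);
	/* ...so libbpf substitutes the next fitting size, 1048576 here */
	return (int)bpf_map__max_entries(rb);
}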
static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def)
{
map->def.type = def->map_type;
@ -2333,6 +2375,10 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
map->btf_key_type_id = def->key_type_id;
map->btf_value_type_id = def->value_type_id;
+/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
+if (map->def.type == BPF_MAP_TYPE_RINGBUF)
+map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
if (def->parts & MAP_DEF_MAP_TYPE)
pr_debug("map '%s': found type = %u.\n", map->name, def->map_type);
@ -3687,7 +3733,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
ext->kcfg.type = find_kcfg_type(obj->btf, t->type,
&ext->kcfg.is_signed);
if (ext->kcfg.type == KCFG_UNKNOWN) {
-pr_warn("extern (kcfg) '%s' type is unsupported\n", ext_name);
+pr_warn("extern (kcfg) '%s': type is unsupported\n", ext_name);
return -ENOTSUP;
}
} else if (strcmp(sec_name, KSYMS_SEC) == 0) {
@ -4232,7 +4278,7 @@ int bpf_map__set_autocreate(struct bpf_map *map, bool autocreate)
int bpf_map__reuse_fd(struct bpf_map *map, int fd)
{
struct bpf_map_info info = {};
-__u32 len = sizeof(info);
+__u32 len = sizeof(info), name_len;
int new_fd, err;
char *new_name;
@ -4242,7 +4288,12 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
if (err)
return libbpf_err(err);
-new_name = strdup(info.name);
+name_len = strlen(info.name);
+if (name_len == BPF_OBJ_NAME_LEN - 1 && strncmp(map->name, info.name, name_len) == 0)
+new_name = strdup(map->name);
+else
+new_name = strdup(info.name);
if (!new_name)
return libbpf_err(-errno);
@ -4301,9 +4352,15 @@ struct bpf_map *bpf_map__inner_map(struct bpf_map *map)
int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
{
-if (map->fd >= 0)
+if (map->obj->loaded)
return libbpf_err(-EBUSY);
map->def.max_entries = max_entries;
+/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
+if (map->def.type == BPF_MAP_TYPE_RINGBUF)
+map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
return 0;
}
@ -4654,6 +4711,8 @@ static int probe_kern_btf_enum64(void)
strs, sizeof(strs)));
}
+static int probe_kern_syscall_wrapper(void);
enum kern_feature_result {
FEAT_UNKNOWN = 0,
FEAT_SUPPORTED = 1,
@ -4722,6 +4781,9 @@ static struct kern_feature_desc {
[FEAT_BTF_ENUM64] = {
"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
},
+[FEAT_SYSCALL_WRAPPER] = {
+"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
+},
};
bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
@ -4854,37 +4916,6 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
static void bpf_map__destroy(struct bpf_map *map);
-static size_t adjust_ringbuf_sz(size_t sz)
-{
-__u32 page_sz = sysconf(_SC_PAGE_SIZE);
-__u32 mul;
-/* if user forgot to set any size, make sure they see error */
-if (sz == 0)
-return 0;
-/* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
-* a power-of-2 multiple of kernel's page size. If user diligently
-* satisified these conditions, pass the size through.
-*/
-if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
-return sz;
-/* Otherwise find closest (page_sz * power_of_2) product bigger than
-* user-set size to satisfy both user size request and kernel
-* requirements and substitute correct max_entries for map creation.
-*/
-for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
-if (mul * page_sz > sz)
-return mul * page_sz;
-}
-/* if it's impossible to satisfy the conditions (i.e., user size is
-* very close to UINT_MAX but is not a power-of-2 multiple of
-* page_size) then just return original size and let kernel reject it
-*/
-return sz;
-}
static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner)
{
LIBBPF_OPTS(bpf_map_create_opts, create_attr);
@ -4923,9 +4954,6 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
}
switch (def->type) {
-case BPF_MAP_TYPE_RINGBUF:
-map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
-/* fallthrough */
case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
case BPF_MAP_TYPE_CGROUP_ARRAY:
case BPF_MAP_TYPE_STACK_TRACE:
@ -7282,14 +7310,14 @@ static int kallsyms_cb(unsigned long long sym_addr, char sym_type,
return 0;
if (ext->is_set && ext->ksym.addr != sym_addr) {
-pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
+pr_warn("extern (ksym) '%s': resolution is ambiguous: 0x%llx or 0x%llx\n",
sym_name, ext->ksym.addr, sym_addr);
return -EINVAL;
}
if (!ext->is_set) {
ext->is_set = true;
ext->ksym.addr = sym_addr;
-pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr);
+pr_debug("extern (ksym) '%s': set to 0x%llx\n", sym_name, sym_addr);
}
return 0;
}
@ -7493,28 +7521,52 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
for (i = 0; i < obj->nr_extern; i++) {
ext = &obj->externs[i];
-if (ext->type == EXT_KCFG &&
-strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) {
-void *ext_val = kcfg_data + ext->kcfg.data_off;
-__u32 kver = get_kernel_version();
-if (!kver) {
-pr_warn("failed to get kernel version\n");
-return -EINVAL;
-}
-err = set_kcfg_value_num(ext, ext_val, kver);
-if (err)
-return err;
-pr_debug("extern (kcfg) %s=0x%x\n", ext->name, kver);
-} else if (ext->type == EXT_KCFG && str_has_pfx(ext->name, "CONFIG_")) {
-need_config = true;
-} else if (ext->type == EXT_KSYM) {
+if (ext->type == EXT_KSYM) {
if (ext->ksym.type_id)
need_vmlinux_btf = true;
else
need_kallsyms = true;
continue;
} else if (ext->type == EXT_KCFG) {
void *ext_ptr = kcfg_data + ext->kcfg.data_off;
__u64 value = 0;
/* Kconfig externs need actual /proc/config.gz */
if (str_has_pfx(ext->name, "CONFIG_")) {
need_config = true;
continue;
}
/* Virtual kcfg externs are customly handled by libbpf */
if (strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) {
value = get_kernel_version();
if (!value) {
pr_warn("extern (kcfg) '%s': failed to get kernel version\n", ext->name);
return -EINVAL;
}
} else if (strcmp(ext->name, "LINUX_HAS_BPF_COOKIE") == 0) {
value = kernel_supports(obj, FEAT_BPF_COOKIE);
} else if (strcmp(ext->name, "LINUX_HAS_SYSCALL_WRAPPER") == 0) {
value = kernel_supports(obj, FEAT_SYSCALL_WRAPPER);
} else if (!str_has_pfx(ext->name, "LINUX_") || !ext->is_weak) {
/* Currently libbpf supports only CONFIG_ and LINUX_ prefixed
* __kconfig externs, where LINUX_ ones are virtual and filled out
* customly by libbpf (their values don't come from Kconfig).
* If LINUX_xxx variable is not recognized by libbpf, but is marked
* __weak, it defaults to zero value, just like for CONFIG_xxx
* externs.
*/
pr_warn("extern (kcfg) '%s': unrecognized virtual extern\n", ext->name);
return -EINVAL;
}
err = set_kcfg_value_num(ext, ext_ptr, value);
if (err)
return err;
pr_debug("extern (kcfg) '%s': set to 0x%llx\n",
ext->name, (long long)value);
} else {
-pr_warn("unrecognized extern '%s'\n", ext->name);
+pr_warn("extern '%s': unrecognized extern kind\n", ext->name);
return -EINVAL;
}
}
@ -7550,10 +7602,10 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
ext = &obj->externs[i];
if (!ext->is_set && !ext->is_weak) {
-pr_warn("extern %s (strong) not resolved\n", ext->name);
+pr_warn("extern '%s' (strong): not resolved\n", ext->name);
return -ESRCH;
} else if (!ext->is_set) {
-pr_debug("extern %s (weak) not resolved, defaulting to zero\n",
+pr_debug("extern '%s' (weak): not resolved, defaulting to zero\n",
ext->name);
}
}
@ -8381,6 +8433,7 @@ int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log
static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link);
@ -8401,6 +8454,8 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("uretprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
SEC_DEF("kprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
SEC_DEF("kretprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
+SEC_DEF("ksyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall),
+SEC_DEF("kretsyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall),
SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt),
SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE),
SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE),
@ -9757,7 +9812,7 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
{
struct perf_event_attr attr = {};
char errmsg[STRERR_BUFSIZE];
-int type, pfd, err;
+int type, pfd;
if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
return -EINVAL;
@ -9793,14 +9848,7 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
pid < 0 ? -1 : pid /* pid */,
pid == -1 ? 0 : -1 /* cpu */,
-1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
-if (pfd < 0) {
-err = -errno;
-pr_warn("%s perf_event_open() failed: %s\n",
-uprobe ? "uprobe" : "kprobe",
-libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
-return err;
-}
-return pfd;
+return pfd >= 0 ? pfd : -errno;
}
static int append_to_file(const char *file, const char *fmt, ...)
@ -9823,6 +9871,34 @@ static int append_to_file(const char *file, const char *fmt, ...)
return err;
}
#define DEBUGFS "/sys/kernel/debug/tracing"
#define TRACEFS "/sys/kernel/tracing"
static bool use_debugfs(void)
{
static int has_debugfs = -1;
if (has_debugfs < 0)
has_debugfs = access(DEBUGFS, F_OK) == 0;
return has_debugfs == 1;
}
static const char *tracefs_path(void)
{
return use_debugfs() ? DEBUGFS : TRACEFS;
}
static const char *tracefs_kprobe_events(void)
{
return use_debugfs() ? DEBUGFS"/kprobe_events" : TRACEFS"/kprobe_events";
}
static const char *tracefs_uprobe_events(void)
{
return use_debugfs() ? DEBUGFS"/uprobe_events" : TRACEFS"/uprobe_events";
}
static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz,
const char *kfunc_name, size_t offset)
{
@ -9835,9 +9911,7 @@ static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz,
static int add_kprobe_event_legacy(const char *probe_name, bool retprobe,
const char *kfunc_name, size_t offset)
{
-const char *file = "/sys/kernel/debug/tracing/kprobe_events";
-return append_to_file(file, "%c:%s/%s %s+0x%zx",
+return append_to_file(tracefs_kprobe_events(), "%c:%s/%s %s+0x%zx",
retprobe ? 'r' : 'p',
retprobe ? "kretprobes" : "kprobes",
probe_name, kfunc_name, offset);
@ -9845,18 +9919,16 @@ static int add_kprobe_event_legacy(const char *probe_name, bool retprobe,
static int remove_kprobe_event_legacy(const char *probe_name, bool retprobe)
{
-const char *file = "/sys/kernel/debug/tracing/kprobe_events";
-return append_to_file(file, "-:%s/%s", retprobe ? "kretprobes" : "kprobes", probe_name);
+return append_to_file(tracefs_kprobe_events(), "-:%s/%s",
+retprobe ? "kretprobes" : "kprobes", probe_name);
}
static int determine_kprobe_perf_type_legacy(const char *probe_name, bool retprobe)
{
char file[256];
-snprintf(file, sizeof(file),
-"/sys/kernel/debug/tracing/events/%s/%s/id",
-retprobe ? "kretprobes" : "kprobes", probe_name);
+snprintf(file, sizeof(file), "%s/events/%s/%s/id",
+tracefs_path(), retprobe ? "kretprobes" : "kprobes", probe_name);
return parse_uint_from_file(file, "%d\n");
}
@ -9905,6 +9977,60 @@ err_clean_legacy:
return err;
}
static const char *arch_specific_syscall_pfx(void)
{
#if defined(__x86_64__)
return "x64";
#elif defined(__i386__)
return "ia32";
#elif defined(__s390x__)
return "s390x";
#elif defined(__s390__)
return "s390";
#elif defined(__arm__)
return "arm";
#elif defined(__aarch64__)
return "arm64";
#elif defined(__mips__)
return "mips";
#elif defined(__riscv)
return "riscv";
#else
return NULL;
#endif
}
static int probe_kern_syscall_wrapper(void)
{
char syscall_name[64];
const char *ksys_pfx;
ksys_pfx = arch_specific_syscall_pfx();
if (!ksys_pfx)
return 0;
snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
if (determine_kprobe_perf_type() >= 0) {
int pfd;
pfd = perf_event_open_probe(false, false, syscall_name, 0, getpid(), 0);
if (pfd >= 0)
close(pfd);
return pfd >= 0 ? 1 : 0;
} else { /* legacy mode */
char probe_name[128];
gen_kprobe_legacy_event_name(probe_name, sizeof(probe_name), syscall_name, 0);
if (add_kprobe_event_legacy(probe_name, false, syscall_name, 0) < 0)
return 0;
(void)remove_kprobe_event_legacy(probe_name, false);
return 1;
}
}
struct bpf_link *
bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
const char *func_name,
@ -9990,6 +10116,29 @@ struct bpf_link *bpf_program__attach_kprobe(const struct bpf_program *prog,
return bpf_program__attach_kprobe_opts(prog, func_name, &opts);
}
struct bpf_link *bpf_program__attach_ksyscall(const struct bpf_program *prog,
const char *syscall_name,
const struct bpf_ksyscall_opts *opts)
{
LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
char func_name[128];
if (!OPTS_VALID(opts, bpf_ksyscall_opts))
return libbpf_err_ptr(-EINVAL);
if (kernel_supports(prog->obj, FEAT_SYSCALL_WRAPPER)) {
snprintf(func_name, sizeof(func_name), "__%s_sys_%s",
arch_specific_syscall_pfx(), syscall_name);
} else {
snprintf(func_name, sizeof(func_name), "__se_sys_%s", syscall_name);
}
kprobe_opts.retprobe = OPTS_GET(opts, retprobe, false);
kprobe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
return bpf_program__attach_kprobe_opts(prog, func_name, &kprobe_opts);
}
/* Adapted from perf/util/string.c */
static bool glob_match(const char *str, const char *pat)
{
@ -10160,6 +10309,27 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf
return libbpf_get_error(*link);
}
static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
{
LIBBPF_OPTS(bpf_ksyscall_opts, opts);
const char *syscall_name;
*link = NULL;
/* no auto-attach for SEC("ksyscall") and SEC("kretsyscall") */
if (strcmp(prog->sec_name, "ksyscall") == 0 || strcmp(prog->sec_name, "kretsyscall") == 0)
return 0;
opts.retprobe = str_has_pfx(prog->sec_name, "kretsyscall/");
if (opts.retprobe)
syscall_name = prog->sec_name + sizeof("kretsyscall/") - 1;
else
syscall_name = prog->sec_name + sizeof("ksyscall/") - 1;
*link = bpf_program__attach_ksyscall(prog, syscall_name, &opts);
return *link ? 0 : -errno;
}
static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link)
{
LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
@ -10208,9 +10378,7 @@ static void gen_uprobe_legacy_event_name(char *buf, size_t buf_sz,
static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe,
const char *binary_path, size_t offset)
{
-const char *file = "/sys/kernel/debug/tracing/uprobe_events";
-return append_to_file(file, "%c:%s/%s %s:0x%zx",
+return append_to_file(tracefs_uprobe_events(), "%c:%s/%s %s:0x%zx",
retprobe ? 'r' : 'p',
retprobe ? "uretprobes" : "uprobes",
probe_name, binary_path, offset);
@ -10218,18 +10386,16 @@ static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe,
static inline int remove_uprobe_event_legacy(const char *probe_name, bool retprobe)
{
-const char *file = "/sys/kernel/debug/tracing/uprobe_events";
-return append_to_file(file, "-:%s/%s", retprobe ? "uretprobes" : "uprobes", probe_name);
+return append_to_file(tracefs_uprobe_events(), "-:%s/%s",
+retprobe ? "uretprobes" : "uprobes", probe_name);
}
static int determine_uprobe_perf_type_legacy(const char *probe_name, bool retprobe)
{
char file[512];
-snprintf(file, sizeof(file),
-"/sys/kernel/debug/tracing/events/%s/%s/id",
-retprobe ? "uretprobes" : "uprobes", probe_name);
+snprintf(file, sizeof(file), "%s/events/%s/%s/id",
+tracefs_path(), retprobe ? "uretprobes" : "uprobes", probe_name);
return parse_uint_from_file(file, "%d\n");
}
@ -10545,7 +10711,10 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
ref_ctr_off = OPTS_GET(opts, ref_ctr_offset, 0);
pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
-if (binary_path && !strchr(binary_path, '/')) {
+if (!binary_path)
+return libbpf_err_ptr(-EINVAL);
+if (!strchr(binary_path, '/')) {
err = resolve_full_path(binary_path, full_binary_path,
sizeof(full_binary_path));
if (err) {
@ -10559,11 +10728,6 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
if (func_name) {
long sym_off;
-if (!binary_path) {
-pr_warn("prog '%s': name-based attach requires binary_path\n",
-prog->name);
-return libbpf_err_ptr(-EINVAL);
-}
sym_off = elf_find_func_offset(binary_path, func_name);
if (sym_off < 0)
return libbpf_err_ptr(sym_off);
@ -10711,6 +10875,9 @@ struct bpf_link *bpf_program__attach_usdt(const struct bpf_program *prog,
return libbpf_err_ptr(-EINVAL);
}
+if (!binary_path)
+return libbpf_err_ptr(-EINVAL);
if (!strchr(binary_path, '/')) {
err = resolve_full_path(binary_path, resolved_path, sizeof(resolved_path));
if (err) {
@ -10776,9 +10943,8 @@ static int determine_tracepoint_id(const char *tp_category,
char file[PATH_MAX];
int ret;
-ret = snprintf(file, sizeof(file),
-"/sys/kernel/debug/tracing/events/%s/%s/id",
-tp_category, tp_name);
+ret = snprintf(file, sizeof(file), "%s/events/%s/%s/id",
+tracefs_path(), tp_category, tp_name);
if (ret < 0)
return -errno;
if (ret >= sizeof(file)) {
@ -11728,6 +11894,22 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
return cpu_buf->fd;
}
int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, size_t *buf_size)
{
struct perf_cpu_buf *cpu_buf;
if (buf_idx >= pb->cpu_cnt)
return libbpf_err(-EINVAL);
cpu_buf = pb->cpu_bufs[buf_idx];
if (!cpu_buf)
return libbpf_err(-ENOENT);
*buf = cpu_buf->base;
*buf_size = pb->mmap_size;
return 0;
}
/*
* Consume data from perf ring buffer corresponding to slot *buf_idx* in
* PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to

View File

@ -457,6 +457,52 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog,
const char *pattern,
const struct bpf_kprobe_multi_opts *opts);
struct bpf_ksyscall_opts {
/* size of this struct, for forward/backward compatiblity */
size_t sz;
/* custom user-provided value fetchable through bpf_get_attach_cookie() */
__u64 bpf_cookie;
/* attach as return probe? */
bool retprobe;
size_t :0;
};
#define bpf_ksyscall_opts__last_field retprobe
/**
* @brief **bpf_program__attach_ksyscall()** attaches a BPF program
* to kernel syscall handler of a specified syscall. Optionally it's possible
* to request to install retprobe that will be triggered at syscall exit. It's
* also possible to associate BPF cookie (though options).
*
* Libbpf automatically will determine correct full kernel function name,
* which depending on system architecture and kernel version/configuration
* could be of the form __<arch>_sys_<syscall> or __se_sys_<syscall>, and will
* attach specified program using kprobe/kretprobe mechanism.
*
* **bpf_program__attach_ksyscall()** is an API counterpart of declarative
* **SEC("ksyscall/<syscall>")** annotation of BPF programs.
*
* At the moment **SEC("ksyscall")** and **bpf_program__attach_ksyscall()** do
* not handle all the calling convention quirks for mmap(), clone() and compat
* syscalls. It also only attaches to "native" syscall interfaces. If host
* system supports compat syscalls or defines 32-bit syscalls in 64-bit
* kernel, such syscall interfaces won't be attached to by libbpf.
*
* These limitations may or may not change in the future. Therefore it is
* recommended to use SEC("kprobe") for these syscalls or if working with
* compat and 32-bit interfaces is required.
*
* @param prog BPF program to attach
* @param syscall_name Symbolic name of the syscall (e.g., "bpf")
* @param opts Additional options (see **struct bpf_ksyscall_opts**)
* @return Reference to the newly created BPF link; or NULL is returned on
* error, error code is stored in errno
*/
LIBBPF_API struct bpf_link *
bpf_program__attach_ksyscall(const struct bpf_program *prog,
const char *syscall_name,
const struct bpf_ksyscall_opts *opts);
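A hedged sketch of calling the new API from user space (not part of this diff): attaching an already-loaded program to the bpf(2) syscall handler with a cookie. The object/program names are assumptions and error handling is trimmed.

#include <bpf/libbpf.h>

static struct bpf_link *attach_bpf_syscall_probe(struct bpf_object *obj)
{
	LIBBPF_OPTS(bpf_ksyscall_opts, opts,
		.retprobe = false,
		.bpf_cookie = 0x1234,
	);
	struct bpf_program *prog;

	prog = bpf_object__find_program_by_name(obj, "probe_bpf_syscall");
	if (!prog)
		return NULL;
	/* resolves to __<arch>_sys_bpf or __se_sys_bpf, whichever the kernel uses */
	return bpf_program__attach_ksyscall(prog, "bpf", &opts);
}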
struct bpf_uprobe_opts {
/* size of this struct, for forward/backward compatiblity */
size_t sz;
@ -1053,6 +1099,22 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
/**
* @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying
* memory region of the ring buffer.
* This ring buffer can be used to implement a custom event consumer.
* The ring buffer starts with the *struct perf_event_mmap_page*, which
* holds the ring buffer management fields; when accessing the header
* structure it's important to be SMP aware.
* You can refer to *perf_event_read_simple* for a simple example.
* @param pb the perf buffer structure
* @param buf_idx the buffer index to retrieve
* @param buf (out) gets the base pointer of the mmap()'ed memory
* @param buf_size (out) gets the size of the mmap()'ed region
* @return 0 on success, negative error code for failure
*/
LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
size_t *buf_size);
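A short sketch of how a custom consumer might use this accessor (the helper name and the acquire-load idiom are illustrative assumptions; only perf_buffer__buffer() and struct perf_event_mmap_page come from the patch/kernel UAPI):

#include <linux/perf_event.h>
#include <bpf/libbpf.h>

/* Sketch only: fetch the raw ring buffer for CPU index 0 and read the
 * current producer position. On SMP, data_head must be read with proper
 * ordering; an acquire load is used here for illustration.
 */
static int peek_cpu0_head(struct perf_buffer *pb, __u64 *head)
{
	struct perf_event_mmap_page *hdr;
	void *base;
	size_t size;
	int err;

	err = perf_buffer__buffer(pb, 0, &base, &size);
	if (err)
		return err;

	hdr = base;
	*head = __atomic_load_n(&hdr->data_head, __ATOMIC_ACQUIRE);
	return 0;
}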
struct bpf_prog_linfo; struct bpf_prog_linfo;
struct bpf_prog_info; struct bpf_prog_info;

View File

@ -356,10 +356,12 @@ LIBBPF_0.8.0 {
LIBBPF_1.0.0 { LIBBPF_1.0.0 {
global: global:
bpf_prog_query_opts; bpf_prog_query_opts;
bpf_program__attach_ksyscall;
btf__add_enum64; btf__add_enum64;
btf__add_enum64_value; btf__add_enum64_value;
libbpf_bpf_attach_type_str; libbpf_bpf_attach_type_str;
libbpf_bpf_link_type_str; libbpf_bpf_link_type_str;
libbpf_bpf_map_type_str; libbpf_bpf_map_type_str;
libbpf_bpf_prog_type_str; libbpf_bpf_prog_type_str;
perf_buffer__buffer;
}; };

View File

@ -108,9 +108,9 @@ static inline bool str_has_sfx(const char *str, const char *sfx)
size_t str_len = strlen(str); size_t str_len = strlen(str);
size_t sfx_len = strlen(sfx); size_t sfx_len = strlen(sfx);
if (sfx_len <= str_len) if (sfx_len > str_len)
return strcmp(str + str_len - sfx_len, sfx); return false;
return false; return strcmp(str + str_len - sfx_len, sfx) == 0;
} }
/* Symbol versioning is different between static and shared library. /* Symbol versioning is different between static and shared library.
@ -352,6 +352,8 @@ enum kern_feature_id {
FEAT_BPF_COOKIE, FEAT_BPF_COOKIE,
/* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */ /* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */
FEAT_BTF_ENUM64, FEAT_BTF_ENUM64,
/* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */
FEAT_SYSCALL_WRAPPER,
__FEAT_CNT, __FEAT_CNT,
}; };

View File

@ -6,7 +6,6 @@
#include <linux/errno.h> #include <linux/errno.h>
#include <bpf/bpf_helpers.h> #include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h> #include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
/* Below types and maps are internal implementation details of libbpf's USDT /* Below types and maps are internal implementation details of libbpf's USDT
* support and are subjects to change. Also, bpf_usdt_xxx() API helpers should * support and are subjects to change. Also, bpf_usdt_xxx() API helpers should
@ -30,14 +29,6 @@
#ifndef BPF_USDT_MAX_IP_CNT #ifndef BPF_USDT_MAX_IP_CNT
#define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT) #define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT)
#endif #endif
/* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is
* the only dependency on CO-RE, so if it's undesirable, user can override
* BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not.
*/
#ifndef BPF_USDT_HAS_BPF_COOKIE
#define BPF_USDT_HAS_BPF_COOKIE \
bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt)
#endif
enum __bpf_usdt_arg_type { enum __bpf_usdt_arg_type {
BPF_USDT_ARG_CONST, BPF_USDT_ARG_CONST,
@ -83,15 +74,12 @@ struct {
__type(value, __u32); __type(value, __u32);
} __bpf_usdt_ip_to_spec_id SEC(".maps") __weak; } __bpf_usdt_ip_to_spec_id SEC(".maps") __weak;
/* don't rely on user's BPF code to have latest definition of bpf_func_id */ extern const _Bool LINUX_HAS_BPF_COOKIE __kconfig;
enum bpf_func_id___usdt {
BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */
};
static __always_inline static __always_inline
int __bpf_usdt_spec_id(struct pt_regs *ctx) int __bpf_usdt_spec_id(struct pt_regs *ctx)
{ {
if (!BPF_USDT_HAS_BPF_COOKIE) { if (!LINUX_HAS_BPF_COOKIE) {
long ip = PT_REGS_IP(ctx); long ip = PT_REGS_IP(ctx);
int *spec_id_ptr; int *spec_id_ptr;

View File

@ -148,13 +148,13 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
.write = bpf_testmod_test_write, .write = bpf_testmod_test_write,
}; };
BTF_SET_START(bpf_testmod_check_kfunc_ids) BTF_SET8_START(bpf_testmod_check_kfunc_ids)
BTF_ID(func, bpf_testmod_test_mod_kfunc) BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_SET_END(bpf_testmod_check_kfunc_ids) BTF_SET8_END(bpf_testmod_check_kfunc_ids)
static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = { static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = {
.owner = THIS_MODULE, .owner = THIS_MODULE,
.check_set = &bpf_testmod_check_kfunc_ids, .set = &bpf_testmod_check_kfunc_ids,
}; };
extern int bpf_fentry_test1(int a); extern int bpf_fentry_test1(int a);
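For context, a set converted to the new 8-byte format as above still gets registered the same way; a sketch of the registration call (the call site is elsewhere in bpf_testmod and not part of this hunk, and the program type shown is illustrative):

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/init.h>

/* Sketch only: register the BTF_SET8-based kfunc set defined above. */
static int __init register_example_kfuncs(void)
{
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
					 &bpf_testmod_kfunc_set);
}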

View File

@ -27,6 +27,7 @@
#include "bpf_iter_test_kern5.skel.h" #include "bpf_iter_test_kern5.skel.h"
#include "bpf_iter_test_kern6.skel.h" #include "bpf_iter_test_kern6.skel.h"
#include "bpf_iter_bpf_link.skel.h" #include "bpf_iter_bpf_link.skel.h"
#include "bpf_iter_ksym.skel.h"
static int duration; static int duration;
@ -1120,6 +1121,19 @@ static void test_link_iter(void)
bpf_iter_bpf_link__destroy(skel); bpf_iter_bpf_link__destroy(skel);
} }
static void test_ksym_iter(void)
{
struct bpf_iter_ksym *skel;
skel = bpf_iter_ksym__open_and_load();
if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load"))
return;
do_dummy_read(skel->progs.dump_ksym);
bpf_iter_ksym__destroy(skel);
}
#define CMP_BUFFER_SIZE 1024 #define CMP_BUFFER_SIZE 1024
static char task_vma_output[CMP_BUFFER_SIZE]; static char task_vma_output[CMP_BUFFER_SIZE];
static char proc_maps_output[CMP_BUFFER_SIZE]; static char proc_maps_output[CMP_BUFFER_SIZE];
@ -1267,4 +1281,6 @@ void test_bpf_iter(void)
test_buf_neg_offset(); test_buf_neg_offset();
if (test__start_subtest("link-iter")) if (test__start_subtest("link-iter"))
test_link_iter(); test_link_iter();
if (test__start_subtest("ksym"))
test_ksym_iter();
} }

View File

@ -2,13 +2,29 @@
#include <test_progs.h> #include <test_progs.h>
#include <network_helpers.h> #include <network_helpers.h>
#include "test_bpf_nf.skel.h" #include "test_bpf_nf.skel.h"
#include "test_bpf_nf_fail.skel.h"
static char log_buf[1024 * 1024];
struct {
const char *prog_name;
const char *err_msg;
} test_bpf_nf_fail_tests[] = {
{ "alloc_release", "kernel function bpf_ct_release args#0 expected pointer to STRUCT nf_conn but" },
{ "insert_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" },
{ "lookup_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" },
{ "set_timeout_after_insert", "kernel function bpf_ct_set_timeout args#0 expected pointer to STRUCT nf_conn___init but" },
{ "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" },
{ "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" },
{ "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" },
};
enum { enum {
TEST_XDP, TEST_XDP,
TEST_TC_BPF, TEST_TC_BPF,
}; };
void test_bpf_nf_ct(int mode) static void test_bpf_nf_ct(int mode)
{ {
struct test_bpf_nf *skel; struct test_bpf_nf *skel;
int prog_fd, err; int prog_fd, err;
@ -39,14 +55,60 @@ void test_bpf_nf_ct(int mode)
ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id"); ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id");
ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup"); ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup");
ASSERT_EQ(skel->bss->test_eafnosupport, -EAFNOSUPPORT, "Test EAFNOSUPPORT for invalid len__tuple"); ASSERT_EQ(skel->bss->test_eafnosupport, -EAFNOSUPPORT, "Test EAFNOSUPPORT for invalid len__tuple");
ASSERT_EQ(skel->data->test_alloc_entry, 0, "Test for alloc new entry");
ASSERT_EQ(skel->data->test_insert_entry, 0, "Test for insert new entry");
ASSERT_EQ(skel->data->test_succ_lookup, 0, "Test for successful lookup");
/* allow some tolerance for test_delta_timeout value to avoid races. */
ASSERT_GT(skel->bss->test_delta_timeout, 8, "Test for min ct timeout update");
ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update");
/* expected status is IPS_SEEN_REPLY */
ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update ");
end: end:
test_bpf_nf__destroy(skel); test_bpf_nf__destroy(skel);
} }
static void test_bpf_nf_ct_fail(const char *prog_name, const char *err_msg)
{
LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
.kernel_log_size = sizeof(log_buf),
.kernel_log_level = 1);
struct test_bpf_nf_fail *skel;
struct bpf_program *prog;
int ret;
skel = test_bpf_nf_fail__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "test_bpf_nf_fail__open"))
return;
prog = bpf_object__find_program_by_name(skel->obj, prog_name);
if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
goto end;
bpf_program__set_autoload(prog, true);
ret = test_bpf_nf_fail__load(skel);
if (!ASSERT_ERR(ret, "test_bpf_nf_fail__load must fail"))
goto end;
if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
fprintf(stderr, "Expected: %s\n", err_msg);
fprintf(stderr, "Verifier: %s\n", log_buf);
}
end:
test_bpf_nf_fail__destroy(skel);
}
void test_bpf_nf(void) void test_bpf_nf(void)
{ {
int i;
if (test__start_subtest("xdp-ct")) if (test__start_subtest("xdp-ct"))
test_bpf_nf_ct(TEST_XDP); test_bpf_nf_ct(TEST_XDP);
if (test__start_subtest("tc-bpf-ct")) if (test__start_subtest("tc-bpf-ct"))
test_bpf_nf_ct(TEST_TC_BPF); test_bpf_nf_ct(TEST_TC_BPF);
for (i = 0; i < ARRAY_SIZE(test_bpf_nf_fail_tests); i++) {
if (test__start_subtest(test_bpf_nf_fail_tests[i].prog_name))
test_bpf_nf_ct_fail(test_bpf_nf_fail_tests[i].prog_name,
test_bpf_nf_fail_tests[i].err_msg);
}
} }

View File

@ -5338,7 +5338,7 @@ static void do_test_pprint(int test_num)
ret = snprintf(pin_path, sizeof(pin_path), "%s/%s", ret = snprintf(pin_path, sizeof(pin_path), "%s/%s",
"/sys/fs/bpf", test->map_name); "/sys/fs/bpf", test->map_name);
if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long", if (CHECK(ret >= sizeof(pin_path), "pin_path %s/%s is too long",
"/sys/fs/bpf", test->map_name)) { "/sys/fs/bpf", test->map_name)) {
err = -1; err = -1;
goto done; goto done;

View File

@ -39,6 +39,7 @@ static struct test_case {
"CONFIG_STR=\"abracad\"\n" "CONFIG_STR=\"abracad\"\n"
"CONFIG_MISSING=0", "CONFIG_MISSING=0",
.data = { .data = {
.unkn_virt_val = 0,
.bpf_syscall = false, .bpf_syscall = false,
.tristate_val = TRI_MODULE, .tristate_val = TRI_MODULE,
.bool_val = true, .bool_val = true,
@ -121,7 +122,7 @@ static struct test_case {
void test_core_extern(void) void test_core_extern(void)
{ {
const uint32_t kern_ver = get_kernel_version(); const uint32_t kern_ver = get_kernel_version();
int err, duration = 0, i, j; int err, i, j;
struct test_core_extern *skel = NULL; struct test_core_extern *skel = NULL;
uint64_t *got, *exp; uint64_t *got, *exp;
int n = sizeof(*skel->data) / sizeof(uint64_t); int n = sizeof(*skel->data) / sizeof(uint64_t);
@ -136,19 +137,17 @@ void test_core_extern(void)
continue; continue;
skel = test_core_extern__open_opts(&opts); skel = test_core_extern__open_opts(&opts);
if (CHECK(!skel, "skel_open", "skeleton open failed\n")) if (!ASSERT_OK_PTR(skel, "skel_open"))
goto cleanup; goto cleanup;
err = test_core_extern__load(skel); err = test_core_extern__load(skel);
if (t->fails) { if (t->fails) {
CHECK(!err, "skel_load", ASSERT_ERR(err, "skel_load_should_fail");
"shouldn't succeed open/load of skeleton\n");
goto cleanup; goto cleanup;
} else if (CHECK(err, "skel_load", } else if (!ASSERT_OK(err, "skel_load")) {
"failed to open/load skeleton\n")) {
goto cleanup; goto cleanup;
} }
err = test_core_extern__attach(skel); err = test_core_extern__attach(skel);
if (CHECK(err, "attach_raw_tp", "failed attach: %d\n", err)) if (!ASSERT_OK(err, "attach_raw_tp"))
goto cleanup; goto cleanup;
usleep(1); usleep(1);
@ -158,9 +157,7 @@ void test_core_extern(void)
got = (uint64_t *)skel->data; got = (uint64_t *)skel->data;
exp = (uint64_t *)&t->data; exp = (uint64_t *)&t->data;
for (j = 0; j < n; j++) { for (j = 0; j < n; j++) {
CHECK(got[j] != exp[j], "check_res", ASSERT_EQ(got[j], exp[j], "result");
"result #%d: expected %llx, but got %llx\n",
j, (__u64)exp[j], (__u64)got[j]);
} }
cleanup: cleanup:
test_core_extern__destroy(skel); test_core_extern__destroy(skel);

View File

@ -364,6 +364,8 @@ static int get_syms(char ***symsp, size_t *cntp)
continue; continue;
if (!strncmp(name, "rcu_", 4)) if (!strncmp(name, "rcu_", 4))
continue; continue;
if (!strcmp(name, "bpf_dispatcher_xdp_func"))
continue;
if (!strncmp(name, "__ftrace_invalid_address__", if (!strncmp(name, "__ftrace_invalid_address__",
sizeof("__ftrace_invalid_address__") - 1)) sizeof("__ftrace_invalid_address__") - 1))
continue; continue;

View File

@ -50,6 +50,13 @@ void test_ringbuf_multi(void)
if (CHECK(!skel, "skel_open", "skeleton open failed\n")) if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
return; return;
/* validate ringbuf size adjustment logic */
ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_before");
ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size + 1), "rb1_resize");
ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), 2 * page_size, "rb1_size_after");
ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size), "rb1_reset");
ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_final");
proto_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, page_size, NULL); proto_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, page_size, NULL);
if (CHECK(proto_fd < 0, "bpf_map_create", "bpf_map_create failed\n")) if (CHECK(proto_fd < 0, "bpf_map_create", "bpf_map_create failed\n"))
goto cleanup; goto cleanup;
@ -65,6 +72,10 @@ void test_ringbuf_multi(void)
close(proto_fd); close(proto_fd);
proto_fd = -1; proto_fd = -1;
/* make sure we can't resize ringbuf after object load */
if (!ASSERT_ERR(bpf_map__set_max_entries(skel->maps.ringbuf1, 3 * page_size), "rb1_resize_after_load"))
goto cleanup;
/* only trigger BPF program for current process */ /* only trigger BPF program for current process */
skel->bss->pid = getpid(); skel->bss->pid = getpid();

View File

@ -122,6 +122,8 @@ void test_skeleton(void)
ASSERT_EQ(skel->bss->out_mostly_var, 123, "out_mostly_var"); ASSERT_EQ(skel->bss->out_mostly_var, 123, "out_mostly_var");
ASSERT_EQ(bss->huge_arr[ARRAY_SIZE(bss->huge_arr) - 1], 123, "huge_arr");
elf_bytes = test_skeleton__elf_bytes(&elf_bytes_sz); elf_bytes = test_skeleton__elf_bytes(&elf_bytes_sz);
ASSERT_OK_PTR(elf_bytes, "elf_bytes"); ASSERT_OK_PTR(elf_bytes, "elf_bytes");
ASSERT_GE(elf_bytes_sz, 0, "elf_bytes_sz"); ASSERT_GE(elf_bytes_sz, 0, "elf_bytes_sz");

View File

@ -22,6 +22,7 @@
#define BTF_F_NONAME BTF_F_NONAME___not_used #define BTF_F_NONAME BTF_F_NONAME___not_used
#define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
#define BTF_F_ZERO BTF_F_ZERO___not_used #define BTF_F_ZERO BTF_F_ZERO___not_used
#define bpf_iter__ksym bpf_iter__ksym___not_used
#include "vmlinux.h" #include "vmlinux.h"
#undef bpf_iter_meta #undef bpf_iter_meta
#undef bpf_iter__bpf_map #undef bpf_iter__bpf_map
@ -44,6 +45,7 @@
#undef BTF_F_NONAME #undef BTF_F_NONAME
#undef BTF_F_PTR_RAW #undef BTF_F_PTR_RAW
#undef BTF_F_ZERO #undef BTF_F_ZERO
#undef bpf_iter__ksym
struct bpf_iter_meta { struct bpf_iter_meta {
struct seq_file *seq; struct seq_file *seq;
@ -151,3 +153,8 @@ enum {
BTF_F_PTR_RAW = (1ULL << 2), BTF_F_PTR_RAW = (1ULL << 2),
BTF_F_ZERO = (1ULL << 3), BTF_F_ZERO = (1ULL << 3),
}; };
struct bpf_iter__ksym {
struct bpf_iter_meta *meta;
struct kallsym_iter *ksym;
};

View File

@ -0,0 +1,74 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022, Oracle and/or its affiliates. */
#include "bpf_iter.h"
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
unsigned long last_sym_value = 0;
static inline char tolower(char c)
{
if (c >= 'A' && c <= 'Z')
c += ('a' - 'A');
return c;
}
static inline char toupper(char c)
{
if (c >= 'a' && c <= 'z')
c -= ('a' - 'A');
return c;
}
/* Dump symbols with their maximum size; the latter is calculated by caching
* symbol N's value and, when iterating on symbol N+1, printing the maximum
* size of symbol N as (address of N+1) - (address of N).
*/
SEC("iter/ksym")
int dump_ksym(struct bpf_iter__ksym *ctx)
{
struct seq_file *seq = ctx->meta->seq;
struct kallsym_iter *iter = ctx->ksym;
__u32 seq_num = ctx->meta->seq_num;
unsigned long value;
char type;
int ret;
if (!iter)
return 0;
if (seq_num == 0) {
BPF_SEQ_PRINTF(seq, "ADDR TYPE NAME MODULE_NAME KIND MAX_SIZE\n");
return 0;
}
if (last_sym_value)
BPF_SEQ_PRINTF(seq, "0x%x\n", iter->value - last_sym_value);
else
BPF_SEQ_PRINTF(seq, "\n");
value = iter->show_value ? iter->value : 0;
last_sym_value = value;
type = iter->type;
if (iter->module_name[0]) {
type = iter->exported ? toupper(type) : tolower(type);
BPF_SEQ_PRINTF(seq, "0x%llx %c %s [ %s ] ",
value, type, iter->name, iter->module_name);
} else {
BPF_SEQ_PRINTF(seq, "0x%llx %c %s ", value, type, iter->name);
}
if (!iter->pos_arch_end || iter->pos_arch_end > iter->pos)
BPF_SEQ_PRINTF(seq, "CORE ");
else if (!iter->pos_mod_end || iter->pos_mod_end > iter->pos)
BPF_SEQ_PRINTF(seq, "MOD ");
else if (!iter->pos_ftrace_mod_end || iter->pos_ftrace_mod_end > iter->pos)
BPF_SEQ_PRINTF(seq, "FTRACE_MOD ");
else if (!iter->pos_bpf_end || iter->pos_bpf_end > iter->pos)
BPF_SEQ_PRINTF(seq, "BPF ");
else
BPF_SEQ_PRINTF(seq, "KPROBE ");
return 0;
}
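
A user-space sketch (not part of the patch) for consuming this iterator's output; it is roughly what the selftest's do_dummy_read() does, except that it prints the text instead of discarding it. The function name is hypothetical; the libbpf/bpf calls are standard iterator plumbing:

#include <stdio.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* Sketch only: create a bpf_iter link for the dump_ksym program above,
 * instantiate an iterator fd, and stream its seq_file output to stdout.
 */
static int read_ksym_iter(struct bpf_program *prog)
{
	struct bpf_link *link;
	char buf[4096];
	ssize_t len;
	int iter_fd;

	link = bpf_program__attach_iter(prog, NULL);
	if (!link)
		return -1;

	iter_fd = bpf_iter_create(bpf_link__fd(link));
	if (iter_fd < 0) {
		bpf_link__destroy(link);
		return -1;
	}

	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, len, stdout);

	close(iter_fd);
	bpf_link__destroy(link);
	return 0;
}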

View File

@ -64,9 +64,9 @@ int BPF_KPROBE(handle_sys_prctl)
return 0; return 0;
} }
SEC("kprobe/" SYS_PREFIX "sys_prctl") SEC("ksyscall/prctl")
int BPF_KPROBE_SYSCALL(prctl_enter, int option, unsigned long arg2, int BPF_KSYSCALL(prctl_enter, int option, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5) unsigned long arg3, unsigned long arg4, unsigned long arg5)
{ {
pid_t pid = bpf_get_current_pid_tgid() >> 32; pid_t pid = bpf_get_current_pid_tgid() >> 32;

View File

@ -1,11 +1,10 @@
// SPDX-License-Identifier: GPL-2.0 // SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2017 Facebook // Copyright (c) 2017 Facebook
#include <linux/ptrace.h> #include "vmlinux.h"
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h> #include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h> #include <bpf/bpf_tracing.h>
#include <stdbool.h> #include <bpf/bpf_core_read.h>
#include "bpf_misc.h" #include "bpf_misc.h"
int kprobe_res = 0; int kprobe_res = 0;
@ -31,8 +30,8 @@ int handle_kprobe(struct pt_regs *ctx)
return 0; return 0;
} }
SEC("kprobe/" SYS_PREFIX "sys_nanosleep") SEC("ksyscall/nanosleep")
int BPF_KPROBE(handle_kprobe_auto) int BPF_KSYSCALL(handle_kprobe_auto, struct __kernel_timespec *req, struct __kernel_timespec *rem)
{ {
kprobe2_res = 11; kprobe2_res = 11;
return 0; return 0;
@ -56,11 +55,11 @@ int handle_kretprobe(struct pt_regs *ctx)
return 0; return 0;
} }
SEC("kretprobe/" SYS_PREFIX "sys_nanosleep") SEC("kretsyscall/nanosleep")
int BPF_KRETPROBE(handle_kretprobe_auto) int BPF_KRETPROBE(handle_kretprobe_auto, int ret)
{ {
kretprobe2_res = 22; kretprobe2_res = 22;
return 0; return ret;
} }
SEC("uprobe") SEC("uprobe")

View File

@ -8,6 +8,8 @@
#define EINVAL 22 #define EINVAL 22
#define ENOENT 2 #define ENOENT 2
extern unsigned long CONFIG_HZ __kconfig;
int test_einval_bpf_tuple = 0; int test_einval_bpf_tuple = 0;
int test_einval_reserved = 0; int test_einval_reserved = 0;
int test_einval_netns_id = 0; int test_einval_netns_id = 0;
@ -16,6 +18,11 @@ int test_eproto_l4proto = 0;
int test_enonet_netns_id = 0; int test_enonet_netns_id = 0;
int test_enoent_lookup = 0; int test_enoent_lookup = 0;
int test_eafnosupport = 0; int test_eafnosupport = 0;
int test_alloc_entry = -EINVAL;
int test_insert_entry = -EAFNOSUPPORT;
int test_succ_lookup = -ENOENT;
u32 test_delta_timeout = 0;
u32 test_status = 0;
struct nf_conn; struct nf_conn;
@ -26,31 +33,44 @@ struct bpf_ct_opts___local {
u8 reserved[3]; u8 reserved[3];
} __attribute__((preserve_access_index)); } __attribute__((preserve_access_index));
struct nf_conn *bpf_xdp_ct_alloc(struct xdp_md *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32, struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32) __ksym; struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32, struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32) __ksym; struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym;
void bpf_ct_release(struct nf_conn *) __ksym; void bpf_ct_release(struct nf_conn *) __ksym;
void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
int bpf_ct_change_status(struct nf_conn *, u32) __ksym;
static __always_inline void static __always_inline void
nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32, nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32), struct bpf_ct_opts___local *, u32),
struct nf_conn *(*alloc_fn)(void *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32),
void *ctx) void *ctx)
{ {
struct bpf_ct_opts___local opts_def = { .l4proto = IPPROTO_TCP, .netns_id = -1 }; struct bpf_ct_opts___local opts_def = { .l4proto = IPPROTO_TCP, .netns_id = -1 };
struct bpf_sock_tuple bpf_tuple; struct bpf_sock_tuple bpf_tuple;
struct nf_conn *ct; struct nf_conn *ct;
int err;
__builtin_memset(&bpf_tuple, 0, sizeof(bpf_tuple.ipv4)); __builtin_memset(&bpf_tuple, 0, sizeof(bpf_tuple.ipv4));
ct = func(ctx, NULL, 0, &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, NULL, 0, &opts_def, sizeof(opts_def));
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
else else
test_einval_bpf_tuple = opts_def.error; test_einval_bpf_tuple = opts_def.error;
opts_def.reserved[0] = 1; opts_def.reserved[0] = 1;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def));
opts_def.reserved[0] = 0; opts_def.reserved[0] = 0;
opts_def.l4proto = IPPROTO_TCP; opts_def.l4proto = IPPROTO_TCP;
if (ct) if (ct)
@ -59,21 +79,24 @@ nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
test_einval_reserved = opts_def.error; test_einval_reserved = opts_def.error;
opts_def.netns_id = -2; opts_def.netns_id = -2;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def));
opts_def.netns_id = -1; opts_def.netns_id = -1;
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
else else
test_einval_netns_id = opts_def.error; test_einval_netns_id = opts_def.error;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def) - 1); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def) - 1);
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
else else
test_einval_len_opts = opts_def.error; test_einval_len_opts = opts_def.error;
opts_def.l4proto = IPPROTO_ICMP; opts_def.l4proto = IPPROTO_ICMP;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def));
opts_def.l4proto = IPPROTO_TCP; opts_def.l4proto = IPPROTO_TCP;
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
@ -81,37 +104,75 @@ nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32,
test_eproto_l4proto = opts_def.error; test_eproto_l4proto = opts_def.error;
opts_def.netns_id = 0xf00f; opts_def.netns_id = 0xf00f;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def));
opts_def.netns_id = -1; opts_def.netns_id = -1;
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
else else
test_enonet_netns_id = opts_def.error; test_enonet_netns_id = opts_def.error;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def));
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
else else
test_enoent_lookup = opts_def.error; test_enoent_lookup = opts_def.error;
ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def, sizeof(opts_def)); ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def,
sizeof(opts_def));
if (ct) if (ct)
bpf_ct_release(ct); bpf_ct_release(ct);
else else
test_eafnosupport = opts_def.error; test_eafnosupport = opts_def.error;
bpf_tuple.ipv4.saddr = bpf_get_prandom_u32(); /* src IP */
bpf_tuple.ipv4.daddr = bpf_get_prandom_u32(); /* dst IP */
bpf_tuple.ipv4.sport = bpf_get_prandom_u32(); /* src port */
bpf_tuple.ipv4.dport = bpf_get_prandom_u32(); /* dst port */
ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
sizeof(opts_def));
if (ct) {
struct nf_conn *ct_ins;
bpf_ct_set_timeout(ct, 10000);
bpf_ct_set_status(ct, IPS_CONFIRMED);
ct_ins = bpf_ct_insert_entry(ct);
if (ct_ins) {
struct nf_conn *ct_lk;
ct_lk = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4),
&opts_def, sizeof(opts_def));
if (ct_lk) {
/* update ct entry timeout */
bpf_ct_change_timeout(ct_lk, 10000);
test_delta_timeout = ct_lk->timeout - bpf_jiffies64();
test_delta_timeout /= CONFIG_HZ;
test_status = IPS_SEEN_REPLY;
bpf_ct_change_status(ct_lk, IPS_SEEN_REPLY);
bpf_ct_release(ct_lk);
test_succ_lookup = 0;
}
bpf_ct_release(ct_ins);
test_insert_entry = 0;
}
test_alloc_entry = 0;
}
} }
SEC("xdp") SEC("xdp")
int nf_xdp_ct_test(struct xdp_md *ctx) int nf_xdp_ct_test(struct xdp_md *ctx)
{ {
nf_ct_test((void *)bpf_xdp_ct_lookup, ctx); nf_ct_test((void *)bpf_xdp_ct_lookup, (void *)bpf_xdp_ct_alloc, ctx);
return 0; return 0;
} }
SEC("tc") SEC("tc")
int nf_skb_ct_test(struct __sk_buff *ctx) int nf_skb_ct_test(struct __sk_buff *ctx)
{ {
nf_ct_test((void *)bpf_skb_ct_lookup, ctx); nf_ct_test((void *)bpf_skb_ct_lookup, (void *)bpf_skb_ct_alloc, ctx);
return 0; return 0;
} }

View File

@ -0,0 +1,134 @@
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
struct nf_conn;
struct bpf_ct_opts___local {
s32 netns_id;
s32 error;
u8 l4proto;
u8 reserved[3];
} __attribute__((preserve_access_index));
struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym;
void bpf_ct_release(struct nf_conn *) __ksym;
void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
int bpf_ct_change_status(struct nf_conn *, u32) __ksym;
SEC("?tc")
int alloc_release(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
bpf_ct_release(ct);
return 0;
}
SEC("?tc")
int insert_insert(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
ct = bpf_ct_insert_entry(ct);
if (!ct)
return 0;
ct = bpf_ct_insert_entry(ct);
return 0;
}
SEC("?tc")
int lookup_insert(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_lookup(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
bpf_ct_insert_entry(ct);
return 0;
}
SEC("?tc")
int set_timeout_after_insert(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
ct = bpf_ct_insert_entry(ct);
if (!ct)
return 0;
bpf_ct_set_timeout(ct, 0);
return 0;
}
SEC("?tc")
int set_status_after_insert(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
ct = bpf_ct_insert_entry(ct);
if (!ct)
return 0;
bpf_ct_set_status(ct, 0);
return 0;
}
SEC("?tc")
int change_timeout_after_alloc(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
bpf_ct_change_timeout(ct, 0);
return 0;
}
SEC("?tc")
int change_status_after_alloc(struct __sk_buff *ctx)
{
struct bpf_ct_opts___local opts = {};
struct bpf_sock_tuple tup = {};
struct nf_conn *ct;
ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
if (!ct)
return 0;
bpf_ct_change_status(ct, 0);
return 0;
}
char _license[] SEC("license") = "GPL";

View File

@ -11,6 +11,7 @@
static int (*bpf_missing_helper)(const void *arg1, int arg2) = (void *) 999; static int (*bpf_missing_helper)(const void *arg1, int arg2) = (void *) 999;
extern int LINUX_KERNEL_VERSION __kconfig; extern int LINUX_KERNEL_VERSION __kconfig;
extern int LINUX_UNKNOWN_VIRTUAL_EXTERN __kconfig __weak;
extern bool CONFIG_BPF_SYSCALL __kconfig; /* strong */ extern bool CONFIG_BPF_SYSCALL __kconfig; /* strong */
extern enum libbpf_tristate CONFIG_TRISTATE __kconfig __weak; extern enum libbpf_tristate CONFIG_TRISTATE __kconfig __weak;
extern bool CONFIG_BOOL __kconfig __weak; extern bool CONFIG_BOOL __kconfig __weak;
@ -22,6 +23,7 @@ extern const char CONFIG_STR[8] __kconfig __weak;
extern uint64_t CONFIG_MISSING __kconfig __weak; extern uint64_t CONFIG_MISSING __kconfig __weak;
uint64_t kern_ver = -1; uint64_t kern_ver = -1;
uint64_t unkn_virt_val = -1;
uint64_t bpf_syscall = -1; uint64_t bpf_syscall = -1;
uint64_t tristate_val = -1; uint64_t tristate_val = -1;
uint64_t bool_val = -1; uint64_t bool_val = -1;
@ -38,6 +40,7 @@ int handle_sys_enter(struct pt_regs *ctx)
int i; int i;
kern_ver = LINUX_KERNEL_VERSION; kern_ver = LINUX_KERNEL_VERSION;
unkn_virt_val = LINUX_UNKNOWN_VIRTUAL_EXTERN;
bpf_syscall = CONFIG_BPF_SYSCALL; bpf_syscall = CONFIG_BPF_SYSCALL;
tristate_val = CONFIG_TRISTATE; tristate_val = CONFIG_TRISTATE;
bool_val = CONFIG_BOOL; bool_val = CONFIG_BOOL;

View File

@ -1,35 +1,20 @@
// SPDX-License-Identifier: GPL-2.0 // SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <linux/ptrace.h>
#include <linux/bpf.h>
#include <netinet/in.h>
#include <bpf/bpf_helpers.h> #include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h> #include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include "bpf_misc.h" #include "bpf_misc.h"
static struct sockaddr_in old; static struct sockaddr_in old;
SEC("kprobe/" SYS_PREFIX "sys_connect") SEC("ksyscall/connect")
int BPF_KPROBE(handle_sys_connect) int BPF_KSYSCALL(handle_sys_connect, int fd, struct sockaddr_in *uservaddr, int addrlen)
{ {
#if SYSCALL_WRAPPER == 1
struct pt_regs *real_regs;
#endif
struct sockaddr_in new; struct sockaddr_in new;
void *ptr;
#if SYSCALL_WRAPPER == 0 bpf_probe_read_user(&old, sizeof(old), uservaddr);
ptr = (void *)PT_REGS_PARM2(ctx);
#else
real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
bpf_probe_read_kernel(&ptr, sizeof(ptr), &PT_REGS_PARM2(real_regs));
#endif
bpf_probe_read_user(&old, sizeof(old), ptr);
__builtin_memset(&new, 0xab, sizeof(new)); __builtin_memset(&new, 0xab, sizeof(new));
bpf_probe_write_user(ptr, &new, sizeof(new)); bpf_probe_write_user(uservaddr, &new, sizeof(new));
return 0; return 0;
} }

View File

@ -51,6 +51,8 @@ int out_dynarr[4] SEC(".data.dyn") = { 1, 2, 3, 4 };
int read_mostly_var __read_mostly; int read_mostly_var __read_mostly;
int out_mostly_var; int out_mostly_var;
char huge_arr[16 * 1024 * 1024];
SEC("raw_tp/sys_enter") SEC("raw_tp/sys_enter")
int handler(const void *ctx) int handler(const void *ctx)
{ {
@ -71,6 +73,8 @@ int handler(const void *ctx)
out_mostly_var = read_mostly_var; out_mostly_var = read_mostly_var;
huge_arr[sizeof(huge_arr) - 1] = 123;
return 0; return 0;
} }

View File

@ -239,7 +239,7 @@ bool parse_udp(void *data, void *data_end,
udp = data + off; udp = data + off;
if (udp + 1 > data_end) if (udp + 1 > data_end)
return 0; return false;
if (!is_icmp) { if (!is_icmp) {
pckt->flow.port16[0] = udp->source; pckt->flow.port16[0] = udp->source;
pckt->flow.port16[1] = udp->dest; pckt->flow.port16[1] = udp->dest;
@ -247,7 +247,7 @@ bool parse_udp(void *data, void *data_end,
pckt->flow.port16[0] = udp->dest; pckt->flow.port16[0] = udp->dest;
pckt->flow.port16[1] = udp->source; pckt->flow.port16[1] = udp->source;
} }
return 1; return true;
} }
static __attribute__ ((noinline)) static __attribute__ ((noinline))
@ -261,7 +261,7 @@ bool parse_tcp(void *data, void *data_end,
tcp = data + off; tcp = data + off;
if (tcp + 1 > data_end) if (tcp + 1 > data_end)
return 0; return false;
if (tcp->syn) if (tcp->syn)
pckt->flags |= (1 << 1); pckt->flags |= (1 << 1);
if (!is_icmp) { if (!is_icmp) {
@ -271,7 +271,7 @@ bool parse_tcp(void *data, void *data_end,
pckt->flow.port16[0] = tcp->dest; pckt->flow.port16[0] = tcp->dest;
pckt->flow.port16[1] = tcp->source; pckt->flow.port16[1] = tcp->source;
} }
return 1; return true;
} }
static __attribute__ ((noinline)) static __attribute__ ((noinline))
@ -287,7 +287,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
void *data; void *data;
if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr))) if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr)))
return 0; return false;
data = (void *)(long)xdp->data; data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end; data_end = (void *)(long)xdp->data_end;
new_eth = data; new_eth = data;
@ -295,7 +295,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
old_eth = data + sizeof(struct ipv6hdr); old_eth = data + sizeof(struct ipv6hdr);
if (new_eth + 1 > data_end || if (new_eth + 1 > data_end ||
old_eth + 1 > data_end || ip6h + 1 > data_end) old_eth + 1 > data_end || ip6h + 1 > data_end)
return 0; return false;
memcpy(new_eth->eth_dest, cval->mac, 6); memcpy(new_eth->eth_dest, cval->mac, 6);
memcpy(new_eth->eth_source, old_eth->eth_dest, 6); memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
new_eth->eth_proto = 56710; new_eth->eth_proto = 56710;
@ -314,7 +314,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
ip6h->saddr.in6_u.u6_addr32[2] = 3; ip6h->saddr.in6_u.u6_addr32[2] = 3;
ip6h->saddr.in6_u.u6_addr32[3] = ip_suffix; ip6h->saddr.in6_u.u6_addr32[3] = ip_suffix;
memcpy(ip6h->daddr.in6_u.u6_addr32, dst->dstv6, 16); memcpy(ip6h->daddr.in6_u.u6_addr32, dst->dstv6, 16);
return 1; return true;
} }
static __attribute__ ((noinline)) static __attribute__ ((noinline))
@ -335,7 +335,7 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
ip_suffix <<= 15; ip_suffix <<= 15;
ip_suffix ^= pckt->flow.src; ip_suffix ^= pckt->flow.src;
if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr))) if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr)))
return 0; return false;
data = (void *)(long)xdp->data; data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end; data_end = (void *)(long)xdp->data_end;
new_eth = data; new_eth = data;
@ -343,7 +343,7 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
old_eth = data + sizeof(struct iphdr); old_eth = data + sizeof(struct iphdr);
if (new_eth + 1 > data_end || if (new_eth + 1 > data_end ||
old_eth + 1 > data_end || iph + 1 > data_end) old_eth + 1 > data_end || iph + 1 > data_end)
return 0; return false;
memcpy(new_eth->eth_dest, cval->mac, 6); memcpy(new_eth->eth_dest, cval->mac, 6);
memcpy(new_eth->eth_source, old_eth->eth_dest, 6); memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
new_eth->eth_proto = 8; new_eth->eth_proto = 8;
@ -367,8 +367,8 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
csum += *next_iph_u16++; csum += *next_iph_u16++;
iph->check = ~((csum & 0xffff) + (csum >> 16)); iph->check = ~((csum & 0xffff) + (csum >> 16));
if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr))) if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
return 0; return false;
return 1; return true;
} }
static __attribute__ ((noinline)) static __attribute__ ((noinline))
@ -386,10 +386,10 @@ bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
else else
new_eth->eth_proto = 56710; new_eth->eth_proto = 56710;
if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct ipv6hdr))) if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct ipv6hdr)))
return 0; return false;
*data = (void *)(long)xdp->data; *data = (void *)(long)xdp->data;
*data_end = (void *)(long)xdp->data_end; *data_end = (void *)(long)xdp->data_end;
return 1; return true;
} }
static __attribute__ ((noinline)) static __attribute__ ((noinline))
@ -404,10 +404,10 @@ bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
memcpy(new_eth->eth_dest, old_eth->eth_dest, 6); memcpy(new_eth->eth_dest, old_eth->eth_dest, 6);
new_eth->eth_proto = 8; new_eth->eth_proto = 8;
if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr))) if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
return 0; return false;
*data = (void *)(long)xdp->data; *data = (void *)(long)xdp->data;
*data_end = (void *)(long)xdp->data_end; *data_end = (void *)(long)xdp->data_end;
return 1; return true;
} }
static __attribute__ ((noinline)) static __attribute__ ((noinline))

View File

@ -106,9 +106,9 @@ bpftool prog loadall \
bpftool map update pinned $BPF_DIR/maps/tx_port key 0 0 0 0 value 122 0 0 0 bpftool map update pinned $BPF_DIR/maps/tx_port key 0 0 0 0 value 122 0 0 0
bpftool map update pinned $BPF_DIR/maps/tx_port key 1 0 0 0 value 133 0 0 0 bpftool map update pinned $BPF_DIR/maps/tx_port key 1 0 0 0 value 133 0 0 0
bpftool map update pinned $BPF_DIR/maps/tx_port key 2 0 0 0 value 111 0 0 0 bpftool map update pinned $BPF_DIR/maps/tx_port key 2 0 0 0 value 111 0 0 0
ip link set dev veth1 xdp pinned $BPF_DIR/progs/redirect_map_0 ip link set dev veth1 xdp pinned $BPF_DIR/progs/xdp_redirect_map_0
ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1 ip link set dev veth2 xdp pinned $BPF_DIR/progs/xdp_redirect_map_1
ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2 ip link set dev veth3 xdp pinned $BPF_DIR/progs/xdp_redirect_map_2
ip -n ${NS1} link set dev veth11 xdp obj xdp_dummy.o sec xdp ip -n ${NS1} link set dev veth11 xdp obj xdp_dummy.o sec xdp
ip -n ${NS2} link set dev veth22 xdp obj xdp_tx.o sec xdp ip -n ${NS2} link set dev veth22 xdp obj xdp_tx.o sec xdp

View File

@ -251,6 +251,7 @@
.expected_insns = { PSEUDO_CALL_INSN() }, .expected_insns = { PSEUDO_CALL_INSN() },
.unexpected_insns = { HELPER_CALL_INSN() }, .unexpected_insns = { HELPER_CALL_INSN() },
.result = ACCEPT, .result = ACCEPT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } }, .func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
.func_info_cnt = 2, .func_info_cnt = 2,
BTF_TYPES BTF_TYPES

View File

@ -218,6 +218,59 @@
.result = REJECT, .result = REJECT,
.errstr = "variable ptr_ access var_off=(0x0; 0x7) disallowed", .errstr = "variable ptr_ access var_off=(0x0; 0x7) disallowed",
}, },
{
"calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
.insns = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 16),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.fixup_kfunc_btf_id = {
{ "bpf_kfunc_call_test_acquire", 3 },
{ "bpf_kfunc_call_test_ref", 8 },
{ "bpf_kfunc_call_test_ref", 10 },
},
.result_unpriv = REJECT,
.result = REJECT,
.errstr = "R1 must be referenced",
},
{
"calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
.insns = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.fixup_kfunc_btf_id = {
{ "bpf_kfunc_call_test_acquire", 3 },
{ "bpf_kfunc_call_test_ref", 8 },
{ "bpf_kfunc_call_test_release", 10 },
},
.result_unpriv = REJECT,
.result = ACCEPT,
},
{ {
"calls: basic sanity", "calls: basic sanity",
.insns = { .insns = {