Add an ability to create an empty BTF object from scratch. This is going to be
used by pahole for BTF encoding. And also by selftest for convenient creation
of BTF objects.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-7-andriin@fb.com
Allow internal BTF representation to switch from default read-only mode, in
which raw BTF data is a single non-modifiable block of memory with BTF header,
types, and strings layed out sequentially and contiguously in memory, into
a writable representation with types and strings data split out into separate
memory regions, that can be dynamically expanded.
Such writable internal representation is transparent to users of libbpf APIs,
but allows to append new types and strings at the end of BTF, which is
a typical use case when generating BTF programmatically. All the basic
guarantees of BTF types and strings layout is preserved, i.e., user can get
`struct btf_type *` pointer and read it directly. Such btf_type pointers might
be invalidated if BTF is modified, so some care is required in such mixed
read/write scenarios.
Switch from read-only to writable configuration happens automatically the
first time when user attempts to modify BTF by either adding a new type or new
string. It is still possible to get raw BTF data, which is a single piece of
memory that can be persisted in ELF section or into a file as raw BTF. Such
raw data memory is also still owned by BTF and will be freed either when BTF
object is freed or if another modification to BTF happens, as any modification
invalidates BTF raw representation.
This patch adds the first two BTF manipulation APIs: btf__add_str(), which
allows to add arbitrary strings to BTF string section, and btf__find_str()
which allows to find existing string offset, but not add it if it's missing.
All the added strings are automatically deduplicated. This is achieved by
maintaining an additional string lookup index for all unique strings. Such
index is built when BTF is switched to modifiable mode. If at that time BTF
strings section contained duplicate strings, they are not de-duplicated. This
is done specifically to not modify the existing content of BTF (types, their
string offsets, etc), which can cause confusion and is especially important
property if there is struct btf_ext associated with struct btf. By following
this "imperfect deduplication" process, btf_ext is kept consitent and correct.
If deduplication of strings is necessary, it can be forced by doing BTF
deduplication, at which point all the strings will be eagerly deduplicated and
all string offsets both in struct btf and struct btf_ext will be updated.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com
Calculating a hash of zero-terminated string is a common need when using
hashmap, so extract it for reuse.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com
Managing dynamically-sized array is a common, but not trivial functionality,
which significant amount of logic and code to implement properly. So instead
of re-implementing it all the time, extract it into a helper function ans
reuse.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-4-andriin@fb.com
Refactor internals of struct btf to remove assumptions that BTF header, type
data, and string data are layed out contiguously in a memory in a single
memory allocation. Now we have three separate pointers pointing to the start
of each respective are: header, types, strings. In the next patches, these
pointers will be re-assigned to point to independently allocated memory areas,
if BTF needs to be modified.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-3-andriin@fb.com
Refactor implementation of internal BTF type index to not use direct pointers.
Instead it uses offset relative to the start of types data section. This
allows for types data to be reallocatable, enabling implementation of
modifiable BTF.
As now getting type by ID has an extra indirection step, convert all internal
type lookups to a new helper btf_type_id(), that returns non-const pointer to
a type by its ID.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-2-andriin@fb.com
The test_overhead prog_test included an fmod_ret program that attached to
__set_task_comm() in the kernel. However, this function was never listed as
allowed for return modification, so this only worked because of the
verifier skipping tests when a trampoline already existed for the attach
point. Now that the verifier checks have been fixed, remove fmod_ret from
the test so it works again.
Fixes: 4eaf0b5c5e ("selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The check_attach_btf_id() function really does three things:
1. It performs a bunch of checks on the program to ensure that the
attachment is valid.
2. It stores a bunch of state about the attachment being requested in
the verifier environment and struct bpf_prog objects.
3. It allocates a trampoline for the attachment.
This patch splits out (1.) and (3.) into separate functions which will
perform the checks, but return the computed values instead of directly
modifying the environment. This is done in preparation for reusing the
checks when the actual attachment is happening, which will allow tracing
programs to have multiple (compatible) attachments.
This also fixes a bug where a bunch of checks were skipped if a trampoline
already existed for the tracing target.
Fixes: 6ba43b761c ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Fixes: 1e6c62a882 ("bpf: Introduce sleepable BPF programs")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.
Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
From the checks and commit messages for modify_return, it seems it was
never the intention that it should be possible to attach a tracing program
with expected_attach_type == BPF_MODIFY_RETURN to another BPF program.
However, check_attach_modify_return() will only look at the function name,
so if the target function starts with "security_", the attach will be
allowed even for bpf2bpf attachment.
Fix this oversight by also blocking the modification if a target program is
supplied.
Fixes: 18644cec71 ("bpf: Fix use-after-free in fmod_ret check")
Fixes: 6ba43b761c ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lorenz Bauer says:
====================
Changes in v2:
- Check sk_fullsock in map_update_elem (Martin)
Enable calling map_update_elem on sockmaps from bpf_iter context. This
in turn allows us to copy a sockmap by iterating its elements.
The change itself is tiny, all thanks to the ground work from Martin,
whose series [1] this patch is based on. I updated the tests to do some
copying, and also included two cleanups.
I'm sending this out now rather than when Martin's series has landed
because I hope this can get in before the merge window (potentially)
closes this weekend.
1: https://lore.kernel.org/bpf/20200925000337.3853598-1-kafai@fb.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since we can now call map_update_elem(sockmap) from bpf_iter context
it's possible to copy a sockmap or sockhash in the kernel. Add a
selftest which exercises this.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200928090805.23343-5-lmb@cloudflare.com
The shared header to define SOCKMAP_MAX_ENTRIES is a bit overkill.
Dynamically allocate the sock_fd array based on bpf_map__max_entries
instead.
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200928090805.23343-4-lmb@cloudflare.com
We compare socket cookies to ensure that insertion into a sockmap worked.
Pull this out into a helper function for use in other tests.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200928090805.23343-3-lmb@cloudflare.com
Allow passing a pointer to a BTF struct sock_common* when updating
a sockmap or sockhash. Since BTF pointers can fault and therefore be
NULL at runtime we need to add an additional !sk check to
sock_map_update_elem. Since we may be passed a request or timewait
socket we also need to check sk_fullsock. Doing this allows calling
map_update_elem on sockmap from bpf_iter context, which uses
BTF pointers.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com
This test runs test_run for raw_tracepoint program. The test covers ctx
input, retval output, and running on correct cpu.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-4-songliubraving@fb.com
Add bpf_prog_test_run_opts() with support of new fields in bpf_attr.test,
namely, flags and cpu. Also extend _opts operations to support outputs via
opts.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-3-songliubraving@fb.com
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resource locally.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
Fix possible crash in socket_release when an out-of-memory error has
occurred in the bind call. If a socket using the XDP_SHARED_UMEM flag
encountered an error in xp_create_and_assign_umem, the bind code
jumped to the exit routine but erroneously forgot to set the err value
before jumping. This meant that the exit routine thought the setup
went well and set the state of the socket to XSK_BOUND. The xsk socket
release code will then, at application exit, think that this is a
properly setup socket, when it is not, leading to a crash when all
fields in the socket have in fact not been initialized properly. Fix
this by setting the err variable in xsk_bind so that the socket is not
set to XSK_BOUND which leads to the clean-up in xsk_release not being
triggered.
Fixes: 1c1efc2af1 ("xsk: Create and free buffer pool independently from umem")
Reported-by: syzbot+ddc7b4944bc61da19b81@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1601112373-10595-1-git-send-email-magnus.karlsson@gmail.com
The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types
denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET
and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the
type PTR_TO_SOCKET is guaranteed to _not_ have a null value.
In contrast PTR_TO_BTF_ID and BTF_TO_BTF_ID_OR_NULL have slightly
different meanings. A PTR_TO_BTF_TO_ID may be a pointer to NULL value,
but it is safe to read this pointer in the program context because
the program context will handle any faults. The fallout is for
PTR_TO_BTF_ID the verifier can assume reads are safe, but can not
use the type in branch analysis. Additionally, authors need to be
extra careful when passing PTR_TO_BTF_ID into helpers. In general
helpers consuming type PTR_TO_BTF_ID will need to assume it may
be null.
Seeing the above is not obvious to readers without the back knowledge
lets add a comment in the type definition.
Editorial comment, as networking and tracing programs get closer
and more tightly merged we may need to consider a new type that we
can ensure is non-null for branch analysis and also passing into
helpers.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
If we AND two values together that are known in the 32bit subregs, but not
known in the 64bit registers we rely on the tnum value to report the 32bit
subreg is known. And do not use mark_reg_known() directly from
scalar32_min_max_and()
Add an AND test to cover the case with known 32bit subreg, but unknown
64bit reg.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In BPF_AND and BPF_OR alu cases we have this pattern when the src and dst
tnum is a constant.
1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off)
2 scalar32_min_max_[op]
3 if (known) return
4 scalar_min_max_[op]
5 if (known)
6 __mark_reg_known(dst_reg,
dst_reg->var_off.value [op] src_reg.var_off.value)
The result is in 1 we calculate the var_off value and store it in the
dst_reg. Then in 6 we duplicate this logic doing the op again on the
value.
The duplication comes from the the tnum_[op] handlers because they have
already done the value calcuation. For example this is tnum_and().
struct tnum tnum_and(struct tnum a, struct tnum b)
{
u64 alpha, beta, v;
alpha = a.value | a.mask;
beta = b.value | b.mask;
v = a.value & b.value;
return TNUM(v, alpha & beta & ~v);
}
So lets remove the redundant op calculation. Its confusing for readers
and unnecessary. Its also not harmful because those ops have the
property, r1 & r1 = r1 and r1 | r1 = r1.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin KaFai Lau says:
====================
This set allows networking prog type to directly read fields from
the in-kernel socket type, e.g. "struct tcp_sock".
Patch 2 has the details on the use case.
v3:
- Pass arg_btf_id instead of fn into check_reg_type() in Patch 1 (Lorenz)
- Move arg_btf_id from func_proto to struct bpf_reg_types in Patch 2 (Lorenz)
- Remove test_sock_fields from .gitignore in Patch 8 (Andrii)
- Add tests to have better coverage on the modified helpers (Alexei)
Patch 13 is added.
- Use "void *sk" as the helper argument in UAPI bpf.h
v3:
- ARG_PTR_TO_SOCK_COMMON_OR_NULL was attempted in v2. The _OR_NULL was
needed because the PTR_TO_BTF_ID could be NULL but note that a could be NULL
PTR_TO_BTF_ID is not a scalar NULL to the verifier. "_OR_NULL" implicitly
gives an expectation that the helper can take a scalar NULL which does
not make sense in most (except one) helpers. Passing scalar NULL
should be rejected at the verification time.
Thus, this patch uses ARG_PTR_TO_BTF_ID_SOCK_COMMON to specify that the
helper can take both the btf-id ptr or the legacy PTR_TO_SOCK_COMMON but
not scalar NULL. It requires the func_proto to explicitly specify the
arg_btf_id such that there is a very clear expectation that the helper
can handle a NULL PTR_TO_BTF_ID.
v2:
- Add ARG_PTR_TO_SOCK_COMMON_OR_NULL (Lorenz)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch attaches a classifier prog to the ingress filter.
It exercises the following helpers with different socket pointer
types in different logical branches:
1. bpf_sk_release()
2. bpf_sk_assign()
3. bpf_skc_to_tcp_request_sock(), bpf_skc_to_tcp_sock()
4. bpf_tcp_gen_syncookie, bpf_tcp_check_syncookie
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000458.3859627-1-kafai@fb.com
The enum tcp_ca_state is available in <linux/tcp.h>.
Remove it from the bpf_tcp_helpers.h to avoid conflict when the bpf prog
needs to include both both <linux/tcp.h> and bpf_tcp_helpers.h.
Modify the bpf_cubic.c and bpf_dctcp.c to use <linux/tcp.h> instead.
The <linux/stddef.h> is needed by <linux/tcp.h>.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000452.3859313-1-kafai@fb.com
This test uses bpf_skc_to_tcp_sock() to get a kernel tcp_sock ptr "ktp".
Access the ktp->lsndtime and also pass ktp to bpf_sk_storage_get().
It also exercises the bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id()
with the "ktp". To do that, a parent cgroup and a child cgroup are
created. The bpf prog is attached to the child cgroup.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000446.3858975-1-kafai@fb.com
This patch uses start_server() and connect_to_fd() from network_helpers.h
to remove the network testing boiler plate codes. epoll is no longer
needed also since the timeout has already been taken care of also.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000440.3858639-1-kafai@fb.com
skel is used.
Global variables are used to store the result from bpf prog.
addr_map, sock_result_map, and tcp_sock_result_map are gone.
Instead, global variables listen_tp, srv_sa6, cli_tp,, srv_tp,
listen_sk, srv_sk, and cli_sk are added.
Because of that, bpf_addr_array_idx and bpf_result_array_idx are also
no longer needed.
CHECK() macro from test_progs.h is reused and bail as soon as
a CHECK failure.
shutdown() is used to ensure the previous data-ack is received.
The bytes_acked, bytes_received, and the pkt_out_cnt checks are
using "<" to accommodate the final ack may not have been received/sent.
It is enough since it is not the focus of this test.
The sk local storage is all initialized to 0xeB9F now, so the
check_sk_pkt_out_cnt() always checks with the 0xeB9F base. It is to
keep things simple.
The next patch will reuse helpers from network_helpers.h to simplify
things further.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000434.3858204-1-kafai@fb.com
This is a mechanical change to
1. move test_sock_fields.c to prog_tests/sock_fields.c
2. rename progs/test_sock_fields_kern.c to progs/test_sock_fields.c
Minimal change is made to the code itself. Next patch will make
changes to use new ways of writing test, e.g. use skel and global
variables.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000427.3857814-1-kafai@fb.com
The patch tests for:
1. bpf_sk_release() can be called on a tcp_sock btf_id ptr.
2. Ensure the tcp_sock btf_id pointer cannot be used
after bpf_sk_release().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000421.3857616-1-kafai@fb.com
This patch changes the bpf_sk_assign() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.
The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL". Meaning
it specifically takes a literal NULL. ARG_PTR_TO_BTF_ID_SOCK_COMMON
does not allow a literal NULL, so another ARG type is required
for this purpose and another follow-up patch can be used if
there is such need.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com
This patch changes the bpf_tcp_*_syncookie() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com
This patch changes the bpf_sk_storage_*() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.
A micro benchmark has been done on a "cgroup_skb/egress" bpf program
which does a bpf_sk_storage_get(). It was driven by netperf doing
a 4096 connected UDP_STREAM test with 64bytes packet.
The stats from "kernel.bpf_stats_enabled" shows no meaningful difference.
The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
no longer needed, so they are removed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com
The previous patch allows the networking bpf prog to use the
bpf_skc_to_*() helpers to get a PTR_TO_BTF_ID socket pointer,
e.g. "struct tcp_sock *". It allows the bpf prog to read all the
fields of the tcp_sock.
This patch changes the bpf_sk_release() and bpf_sk_*cgroup_id()
to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will
work with the pointer returned by the bpf_skc_to_*() helpers
also. For example, the following will work:
sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
if (!sk)
return;
tp = bpf_skc_to_tcp_sock(sk);
if (!tp) {
bpf_sk_release(sk);
return;
}
lsndtime = tp->lsndtime;
/* Pass tp to bpf_sk_release() will also work */
bpf_sk_release(tp);
Since PTR_TO_BTF_ID could be NULL, the helper taking
ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime.
A btf_id of "struct sock" may not always mean a fullsock. Regardless
the helper's running context may get a non-fullsock or not,
considering fullsock check/handling is pretty cheap, it is better to
keep the same verifier expectation on helper that takes ARG_PTR_TO_BTF_ID*
will be able to handle the minisock situation. In the bpf_sk_*cgroup_id()
case, it will try to get a fullsock by using sk_to_full_sk() as its
skb variant bpf_sk"b"_*cgroup_id() has already been doing.
bpf_sk_release can already handle minisock, so nothing special has to
be done.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com
There is a constant need to add more fields into the bpf_tcp_sock
for the bpf programs running at tc, sock_ops...etc.
A current workaround could be to use bpf_probe_read_kernel(). However,
other than making another helper call for reading each field and missing
CO-RE, it is also not as intuitive to use as directly reading
"tp->lsndtime" for example. While already having perfmon cap to do
bpf_probe_read_kernel(), it will be much easier if the bpf prog can
directly read from the tcp_sock.
This patch tries to do that by using the existing casting-helpers
bpf_skc_to_*() whose func_proto returns a btf_id. For example, the
func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
kernel "struct tcp_sock".
These helpers are also added to is_ptr_cast_function().
It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id.
That will keep the ref-tracking works properly.
The bpf_skc_to_* helpers are made available to most of the bpf prog
types in filter.c. The bpf_skc_to_* helpers will be limited by
perfmon cap.
This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON. The helper accepting
this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_*()
helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
they will accept pointer obtained from skb->sk.
Instead of specifying both arg_type and arg_btf_id in the same func_proto
which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of
the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is
always the same. Discussion in this thread:
https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/
The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
expecting a PTR_TO_BTF_ID which could be NULL. This is the same
behavior as the existing helper taking ARG_PTR_TO_BTF_ID.
The _SOCK_COMMON part means the helper is also expecting the legacy
SOCK_COMMON pointer.
By excluding the _OR_NULL part, the bpf prog cannot call helper
with a literal NULL which doesn't make sense in most cases.
e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO_*_OR_NULL
reg has to do a NULL check first before passing into the helper or else
the bpf prog will be rejected. This behavior is nothing new and
consistent with the current expectation during bpf-prog-load.
[ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
ARG_PTR_TO_SOCK* of other existing helpers later such that
those existing helpers can take the PTR_TO_BTF_ID returned by
the bpf_skc_to_*() helpers.
The only special case is bpf_sk_lookup_assign() which can accept a
literal NULL ptr. It has to be handled specially in another follow
up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ]
[ When converting the older helpers that take ARG_PTR_TO_SOCK* in
the later patch, if the kernel does not support BTF,
ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
because no reg->type could have PTR_TO_BTF_ID in this case.
It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
here though because these helpers must require BTF vmlinux to begin
with. ]
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com
check_reg_type() checks whether a reg can be used as an arg of a
func_proto. For PTR_TO_BTF_ID, the check is actually not
completely done until the reg->btf_id is pointing to a
kernel struct that is acceptable by the func_proto.
Thus, this patch moves the btf_id check into check_reg_type().
"arg_type" and "arg_btf_id" are passed to check_reg_type() instead of
"compatible". The compatible_reg_types[] usage is localized in
check_reg_type() now.
The "if (!btf_id) verbose(...); " is also removed since it won't happen.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200925000344.3854828-1-kafai@fb.com
This reverts commit 31f23a6a18.
This change made many selftests/bpf flaky: flow_dissector, sk_lookup, sk_assign and others.
There was no issue in the code.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florian Fainelli says:
====================
net: dsa: bcm_sf2: Additional DT changes
This patch series includes some additional changes to the bcm_sf2 in
order to support the Device Tree firmwares provided on such platforms.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to include MDIO address 0, which is how our Device Tree blobs
indicate where to find the external BCM53125 switches.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the switch driver is written such that port 5 or 8 could be CPU
ports, the use case on Broadcom STB chips is to use port 8 exclusively.
The platform firmware does make port 5 comply to a proper DSA CPU port
binding by specifiying an "ethernet" phandle. This is undesirable for
now until we have an user-space configuration mechanism (such as
devlink) which could support dynamically changing the port flavor at
run time.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
George Cherian says:
====================
octeontx2: Add support for VLAN based flow distribution
This series add support for VLAN based flow distribution for octeontx2
netdev driver. This adds support for configuring the same via ethtool.
Following tests have been done.
- Multi VLAN flow with same SD
- Multi VLAN flow with same SDFN
- Single VLAN flow with multi SD
- Single VLAN flow with multi SDFN
All tests done for udp/tcp both v4 and v6
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to control rx-flow-hash based on VLAN.
By default VLAN plus 4-tuple based hashing is enabled.
Changes can be done runtime using ethtool
To enable 2-tuple plus VLAN based flow distribution
# ethtool -N <intf> rx-flow-hash <prot> sdv
To enable 4-tuple plus VLAN based flow distribution
# ethtool -N <intf> rx-flow-hash <prot> sdfnv
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for PF/VF drivers to choose RSS flow key algorithm
with VLAN tag included in hashing input data. Only CTAG is considered.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kernel-doc expects the function prototype to be just after
the kernel-doc markup, as otherwise it will get it all wrong:
./net/core/dev.c:10036: warning: Excess function parameter 'dev' description in 'WAIT_REFS_MIN_MSECS'
Fixes: 0e4be9e57e ("net: use exponential backoff in netdev_wait_allrefs")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Marko says:
====================
net: mdio-ipq4019: add Clause 45 support
This patch series adds support for Clause 45 to the driver.
While at it also change some defines to upper case to match rest of the driver.
Changes since v4:
* Rebase onto net-next.git
Changes since v1:
* Drop clock patches, these need further investigation and
no user for non default configuration has been found
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While up-streaming the IPQ4019 driver it was thought that the controller had no Clause 45 support,
but it actually does and its activated by writing a bit to the mode register.
So lets add it as newer SoC-s use the same controller and Clause 45 compliant PHY-s.
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Cc: Luka Perkov <luka.perkov@sartura.hr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the commit adding the IPQ4019 MDIO driver, defines for timeout and sleep partially used lower case.
Lets change it to upper case in line with the rest of driver defines.
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Cc: Luka Perkov <luka.perkov@sartura.hr>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subbaraya Sundeep says:
====================
Introduce mbox tracepoints for Octeontx2
This patchset adds tracepoints support for mailbox.
In Octeontx2, PFs and VFs need to communicate with AF
for allocating and freeing resources. Once all the
configuration is done by AF for a PF/VF then packet I/O
can happen on PF/VF queues. When an interface
is brought up many mailbox messages are sent
to AF for initializing queues. Say a VF is brought up
then each message is sent to PF and PF forwards to
AF and response also traverses from AF to PF and then VF.
To aid debugging, tracepoints are added at places where
messages are allocated, sent and message interrupts.
Below is the trace of one of the messages from VF to AF
and AF response back to VF:
~ # echo 1 > /sys/kernel/tracing/events/rvu/enable
~ # ifconfig eth20 up
[ 279.379559] eth20 NIC Link is UP 10000 Mbps Full duplex
~ # cat /sys/kernel/tracing/trace
ifconfig-171 [000] .... 275.753345: otx2_msg_alloc: [0002:02:00.1] msg:(0x400) size:40
ifconfig-171 [000] ...1 275.753347: otx2_msg_send: [0002:02:00.1] sent 1 msg(s) of size:48
<idle>-0 [001] dNh1 275.753356: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt VF(s) to PF (0x1)
kworker/u9:1-90 [001] ...1 275.753364: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48
kworker/u9:1-90 [001] d.h. 275.753367: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2)
kworker/u9:2-167 [002] .... 275.753535: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0
kworker/u9:2-167 [002] ...1 275.753537: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32
<idle>-0 [003] d.h1 275.753543: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1)
<idle>-0 [001] d.h2 275.754376: otx2_msg_interrupt: [0002:02:00.1] mbox interrupt PF to VF (0x1)
v3 changes:
Removed EXPORT_TRACEPOINT_SYMBOLS of otx2_msg_send and otx2_msg_check
since they are called locally only
v2 changes:
Removed otx2_msg_err tracepoint since it is similar to devlink_hwerr
and it will be used instead when devlink supported is added.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>