There were quite a few overlapping sets of changes here.
Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.
Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly. If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.
In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().
Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.
The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Lets add test cases to cover really all possible direct packet
access tests for good/bad access cases so we keep tracking them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb->mark field is a union with reserved_tailroom which is used
in the TCP code paths from stream memory allocation. Allowing SK_SKB
programs to set this field creates a conflict with future code
optimizations, such as "gifting" the skb to the egress path instead
of creating a new skb and doing a memcpy.
Because we do not have a released version of SK_SKB yet lets just
remove it for now. A more appropriate scratch pad to use at the
socket layer is dev_scratch, but lets add that in future kernels
when needed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f1174f77b5 ("bpf/verifier: rework value tracking")
removed the crafty selection of which pointer types are
allowed to be modified. This is OK for most pointer types
since adjust_ptr_min_max_vals() will catch operations on
immutable pointers. One exception is PTR_TO_CTX which is
now allowed to be offseted freely.
The intent of aforementioned commit was to allow context
access via modified registers. The offset passed to
->is_valid_access() verifier callback has been adjusted
by the value of the variable offset.
What is missing, however, is taking the variable offset
into account when the context register is used. Or in terms
of the code adding the offset to the value passed to the
->convert_ctx_access() callback. This leads to the following
eBPF user code:
r1 += 68
r0 = *(u32 *)(r1 + 8)
exit
being translated to this in kernel space:
0: (07) r1 += 68
1: (61) r0 = *(u32 *)(r1 +180)
2: (95) exit
Offset 8 is corresponding to 180 in the kernel, but offset
76 is valid too. Verifier will "accept" access to offset
68+8=76 but then "convert" access to offset 8 as 180.
Effective access to offset 248 is beyond the kernel context.
(This is a __sk_buff example on a debug-heavy kernel -
packet mark is 8 -> 180, 76 would be data.)
Dereferencing the modified context pointer is not as easy
as dereferencing other types, because we have to translate
the access to reading a field in kernel structures which is
usually at a different offset and often of a different size.
To allow modifying the pointer we would have to make sure
that given eBPF instruction will always access the same
field or the fields accessed are "compatible" in terms of
offset and size...
Disallow dereferencing modified context pointers and add
to selftests the test case described here.
Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
with addition of tnum logic the verifier got smart enough and
we can enforce return codes at program load time.
For now do so for cgroup-bpf program types.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the libbpf to provide API support to
allow specifying BPF object name.
In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used. Regarding section name, all maps are
under the same section named "maps". Hence, section name
is not a good choice for map's name. To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.
This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.
The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument. For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.
In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code. For bpf_prog, bpf_load.c does
not collect the function symbol name. We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add various test_verifier selftests, and a simple xdp/tc functional
test that is being attached to veths. Also let new versions of clang
use the recently added -mcpu=probe support [1] for the BPF target,
so that it can probe the underlying kernel for BPF insn set extensions.
We could also just set this options always, where older versions just
ignore it and give a note to the user that the -mcpu value is not
supported, but given emitting the note cannot be turned off from clang
side lets not confuse users running selftests with it, thus fallback
to the default generic one when we see that clang doesn't support it.
Also allow CPU option to be overridden in the Makefile from command
line.
[1] d7276a40d8
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tests packet read/writes and additional skb fields.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test makes a read through a map value pointer, then considers pruning
a branch where the register holds an adjusted map value pointer. It
should not prune, but currently it does.
Signed-off-by: Alexei Starovoitov <ast@fb.com>
[ecree@solarflare.com: added test-name and patch description]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Writes in straight-line code should not prevent reads from propagating
along jumps. With current verifier code, the jump from 3 to 5 does not
add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.
Verifier output:
0: (61) r2 = *(u32 *)(r1 +0)
1: (35) if r2 >= 0x0 goto pc+1
R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
2: (b7) r0 = 0
3: (35) if r2 >= 0x0 goto pc+1
R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
4: (b7) r0 = 0
5: (95) exit
from 3 to 5: safe
from 1 to 3: safe
processed 8 insns, stack depth 0
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds tests to access new __sk_buff members from sk skb program
type.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test cases to the verifier selftest suite in order to verify that
i) direct packet access, and ii) dynamic map value access is working
with the changes related to the new instructions.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.
The TCP conflict was a case of local variables in a function
being removed from both net and net-next.
In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable ctx accesses and stack accesses aren't allowed, because we can't
determine what type of value will be read.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of selftests fell foul of the changed MAX_PACKET_OFF handling.
For instance, "direct packet access: test2" was potentially reading four
bytes from pkt + 0xffff, which could take it past the verifier's limit,
causing the program to be rejected (checks against pkt_end didn't give
us any reg->range).
Increase the shifts by one so that R2 is now mask 0x7fff instead of
mask 0xffff.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the verifier's error messages have changed, and some constructs
that previously couldn't be verified are now accepted.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We really must check with #if __BYTE_ORDER == XYZ instead of
just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.
E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine though e.g. see 13da9e200f ("Revert "endian: #define
__BYTE_ORDER"") and 415586c9e6 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context, but not for
user space. Lets also make sure to properly include endian.h.
After that, suite passes for me:
./test_verifier: ELF 64-bit MSB executable, [...]
Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux
Before fix: Summary: 505 PASSED, 11 FAILED
After fix: Summary: 516 PASSED, 0 FAILED
Fixes: 18f3d6be6b ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a bug in the verifier's handling of BPF_SUB: [a,b] - [c,d] yields
was [a-c, b-d] rather than the correct [a-d, b-c]. So here is a test
which, with the bogus handling, will produce ranges of [0,0] and thus
allowed accesses; whereas the correct handling will give a range of
[-255, 255] (and hence the right-shift will give a range of [0, 255]) and
the accesses will be rejected.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of more test cases to BPF selftests that are related
to mixed signed and unsigned checks.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These failed due to a bug in verifier bounds handling.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the few existing test cases that used mixed signed/unsigned
bounds and switch them only to one flavor. Reason why we need this
is that proper boundaries cannot be derived from mixed tests.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the test_verifier case, it's quite hard to parse log level 2 to
figure out what's causing an issue when used to log level 1. We do
want to use bpf_verify_program() in order to simulate some of the
tests with strict alignment. So just add an argument to pass the level
and put it to 1 for test_verifier.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add couple of verifier test cases for x|imm += pkt_ptr, including the
imm += x extension.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leaking kernel addresses on unpriviledged is generally disallowed,
for example, verifier rejects the following:
0: (b7) r0 = 0
1: (18) r2 = 0xffff897e82304400
3: (7b) *(u64 *)(r1 +48) = r2
R2 leaks addr into ctx
Doing pointer arithmetic on them is also forbidden, so that they
don't turn into unknown value and then get leaked out. However,
there's xadd as a special case, where we don't check the src reg
for being a pointer register, e.g. the following will pass:
0: (b7) r0 = 0
1: (7b) *(u64 *)(r1 +48) = r0
2: (18) r2 = 0xffff897e82304400 ; map
4: (db) lock *(u64 *)(r1 +48) += r2
5: (95) exit
We could store the pointer into skb->cb, loose the type context,
and then read it out from there again to leak it eventually out
of a map value. Or more easily in a different variant, too:
0: (bf) r6 = r1
1: (7a) *(u64 *)(r10 -8) = 0
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x0
6: (85) call bpf_map_lookup_elem#1
7: (15) if r0 == 0x0 goto pc+3
R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
8: (b7) r3 = 0
9: (7b) *(u64 *)(r0 +0) = r3
10: (db) lock *(u64 *)(r0 +0) += r6
11: (b7) r0 = 0
12: (95) exit
from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
11: (b7) r0 = 0
12: (95) exit
Prevent this by checking xadd src reg for pointer types. Also
add a couple of test cases related to this.
Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add test cases in test_verifier and test_progs.
Negative tests are added in test_verifier as well.
The test in test_progs will compare the value of narrower ctx field
load result vs. the masked value of normal full-field load result,
and will fail if they are not the same.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
__u8 h = __sk_buff->hash, or
__u16 p = __sk_buff->protocol
__u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.
This patch solves the issue by:
. Introduce a new parameter ctx_field_size to carry the
field size of narrower load from prog type
specific *__is_valid_access validator back to verifier.
. The non-zero ctx_field_size for a memory access indicates
(1). underlying prog type specific convert_ctx_accesses
supporting non-whole-field access
(2). the current insn is a narrower or whole field access.
. In verifier, for such loads where load memory size is
less than ctx_field_size, verifier transforms it
to a full field load followed by proper masking.
. Currently, __sk_buff and bpf_perf_event_data->sample_period
are supporting narrowing loads.
. Narrower stores are still not allowed as typical ctx stores
are just normal stores.
Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The selftests depend on using the shell exit code as a mean of
detecting the success or failure of test-binary executed. The
appropiate output "[PASS]" or "[FAIL]" in generated by
tools/testing/selftests/lib.mk.
Notice that the exit code is masked with 255. Thus, be careful if
using the number of errors as the exit code, as 256 errors would be
seen as a success.
There are two standard defined exit(3) codes:
/usr/include/stdlib.h
#define EXIT_FAILURE 1 /* Failing exit status. */
#define EXIT_SUCCESS 0 /* Successful exit status. */
Fix test_verifier.c to not use the negative value of variable
"results", but instead return EXIT_FAILURE.
Fix test_align.c and test_progs.c to actually use exit codes, before
they were always indicating success regardless of results.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds various verifier test cases:
1) A test case for the pruning issue when tracking alignment
is used.
2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
arithmetic turns such register into UNKNOWN_VALUE type.
3) Test cases for the special treatment of LD_ABS/LD_IND to
make sure verifier doesn't break calling convention here.
Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
storing temporary data, so they really must be marked as
NOT_INIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.
Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add several test cases around ldimm64, fp arithmetic and direct
packet access.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.
Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.
[1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix artifact of merge resolution
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of test cases, for example, probing for xadd on a spilled
pointer to packet and map_value_adj register, various other map_value_adj
tests including the unaligned load/store, and trying out pointer arithmetic
on map_value_adj register itself. For the unaligned load/store, we need
to figure out whether the architecture has efficient unaligned access and
need to mark affected tests accordingly.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
llvm can optimize the 'if (ptr > data_end)' checks to be in the order
slightly different than the original C code which will confuse verifier.
Like:
if (ptr + 16 > data_end)
return TC_ACT_SHOT;
// may be followed by
if (ptr + 14 > data_end)
return TC_ACT_SHOT;
while llvm can see that 'ptr' is valid for all 16 bytes,
the verifier could not.
Fix verifier logic to account for such case and add a test.
Reported-by: Huapeng Zhou <hzhou@fb.com>
Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test cases for array of maps and hash of maps.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent merge of 'linux-kselftest-4.11-rc1' tree broke bpf test build.
None of the tests were building and test_verifier.c had tons of compiler errors.
Fix it and add #ifdef CAP_IS_SUPPORTED to support old versions of libcap.
Tested on centos 6.8 and 7
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If selftests are run as root, then execute the unprivileged checks as
well. This switch from 243 to 368 tests.
The test numbers are suffixed with "/u" when executed as unprivileged or
with "/p" when executed as privileged.
The geteuid() check is replaced with a capability check.
Handling capabilities requires the libcap dependency.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch fixes the case when adding a zero value to the packet
pointer. The zero value could come from src_reg equals type
BPF_K or CONST_IMM. The patch fixes both, otherwise the verifer
reports the following error:
[...]
R0=imm0,min_value=0,max_value=0
R1=pkt(id=0,off=0,r=4)
R2=pkt_end R3=fp-12
R4=imm4,min_value=4,max_value=4
R5=pkt(id=0,off=4,r=4)
269: (bf) r2 = r0 // r2 becomes imm0
270: (77) r2 >>= 3
271: (bf) r4 = r1 // r4 becomes pkt ptr
272: (0f) r4 += r2 // r4 += 0
addition of negative constant to packet pointer is not allowed
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two tests are based on the work done for f23cc643f9. The first test is
just a basic one to make sure we don't allow AND'ing negative values, even if it
would result in a valid index for the array. The second is a cleaned up version
of the original testcase provided by Jann Horn that resulted in the commit.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William reported couple of issues in relation to direct packet
access. Typical scheme is to check for data + [off] <= data_end,
where [off] can be either immediate or coming from a tracked
register that contains an immediate, depending on the branch, we
can then access the data. However, in case of calculating [off]
for either the mentioned test itself or for access after the test
in a more "complex" way, then the verifier will stop tracking the
CONST_IMM marked register and will mark it as UNKNOWN_VALUE one.
Adding that UNKNOWN_VALUE typed register to a pkt() marked
register, the verifier then bails out in check_packet_ptr_add()
as it finds the registers imm value below 48. In the first below
example, that is due to evaluate_reg_imm_alu() not handling right
shifts and thus marking the register as UNKNOWN_VALUE via helper
__mark_reg_unknown_value() that resets imm to 0.
In the second case the same happens at the time when r4 is set
to r4 &= r5, where it transitions to UNKNOWN_VALUE from
evaluate_reg_imm_alu(). Later on r4 we shift right by 3 inside
evaluate_reg_alu(), where the register's imm turns into 3. That
is, for registers with type UNKNOWN_VALUE, imm of 0 means that
we don't know what value the register has, and for imm > 0 it
means that the value has [imm] upper zero bits. F.e. when shifting
an UNKNOWN_VALUE register by 3 to the right, no matter what value
it had, we know that the 3 upper most bits must be zero now.
This is to make sure that ALU operations with unknown registers
don't overflow. Meaning, once we know that we have more than 48
upper zero bits, or, in other words cannot go beyond 0xffff offset
with ALU ops, such an addition will track the target register
as a new pkt() register with a new id, but 0 offset and 0 range,
so for that a new data/data_end test will be required. Is the source
register a CONST_IMM one that is to be added to the pkt() register,
or the source instruction is an add instruction with immediate
value, then it will get added if it stays within max 0xffff bounds.
>From there, pkt() type, can be accessed should reg->off + imm be
within the access range of pkt().
[...]
from 28 to 30: R0=imm1,min_value=1,max_value=1
R1=pkt(id=0,off=0,r=22) R2=pkt_end
R3=imm144,min_value=144,max_value=144
R4=imm0,min_value=0,max_value=0
R5=inv48,min_value=2054,max_value=2054 R10=fp
30: (bf) r5 = r3
31: (07) r5 += 23
32: (77) r5 >>= 3
33: (bf) r6 = r1
34: (0f) r6 += r5
cannot add integer value with 0 upper zero bits to ptr_to_packet
[...]
from 52 to 80: R0=imm1,min_value=1,max_value=1
R1=pkt(id=0,off=0,r=34) R2=pkt_end R3=inv
R4=imm272 R5=inv56,min_value=17,max_value=17
R6=pkt(id=0,off=26,r=34) R10=fp
80: (07) r4 += 71
81: (18) r5 = 0xfffffff8
83: (5f) r4 &= r5
84: (77) r4 >>= 3
85: (0f) r1 += r4
cannot add integer value with 3 upper zero bits to ptr_to_packet
Thus to get above use-cases working, evaluate_reg_imm_alu() has
been extended for further ALU ops. This is fine, because we only
operate strictly within realm of CONST_IMM types, so here we don't
care about overflows as they will happen in the simulated but also
real execution and interaction with pkt() in check_packet_ptr_add()
will check actual imm value once added to pkt(), but it's irrelevant
before.
With regards to 06c1c04972 ("bpf: allow helpers access to variable
memory") that works on UNKNOWN_VALUE registers, the verifier becomes
now a bit smarter as it can better resolve ALU ops, so we need to
adapt two test cases there, as min/max bound tracking only becomes
necessary when registers were spilled to stack. So while mask was
set before to track upper bound for UNKNOWN_VALUE case, it's now
resolved directly as CONST_IMM, and such contructs are only necessary
when f.e. registers are spilled.
For commit 6b17387307 ("bpf: recognize 64bit immediate loads as
consts") that initially enabled dw load tracking only for nfp jit/
analyzer, I did couple of tests on large, complex programs and we
don't increase complexity badly (my tests were in ~3% range on avg).
I've added a couple of tests similar to affected code above, and
it works fine with verifier now.
Reported-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Gianluca Borello <g.borello@gmail.com>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, helpers that read and write from/to the stack can do so using
a pair of arguments of type ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE.
ARG_CONST_STACK_SIZE accepts a constant register of type CONST_IMM, so
that the verifier can safely check the memory access. However, requiring
the argument to be a constant can be limiting in some circumstances.
Since the current logic keeps track of the minimum and maximum value of
a register throughout the simulated execution, ARG_CONST_STACK_SIZE can
be changed to also accept an UNKNOWN_VALUE register in case its
boundaries have been set and the range doesn't cause invalid memory
accesses.
One common situation when this is useful:
int len;
char buf[BUFSIZE]; /* BUFSIZE is 128 */
if (some_condition)
len = 42;
else
len = 84;
some_helper(..., buf, len & (BUFSIZE - 1));
The compiler can often decide to assign the constant values 42 or 48
into a variable on the stack, instead of keeping it in a register. When
the variable is then read back from stack into the register in order to
be passed to the helper, the verifier will not be able to recognize the
register as constant (the verifier is not currently tracking all
constant writes into memory), and the program won't be valid.
However, by allowing the helper to accept an UNKNOWN_VALUE register,
this program will work because the bitwise AND operation will set the
range of possible values for the UNKNOWN_VALUE register to [0, BUFSIZE),
so the verifier can guarantee the helper call will be safe (assuming the
argument is of type ARG_CONST_STACK_SIZE_OR_ZERO, otherwise one more
check against 0 would be needed). Custom ranges can be set not only with
ALU operations, but also by explicitly comparing the UNKNOWN_VALUE
register with constants.
Another very common example happens when intercepting system call
arguments and accessing user-provided data of variable size using
bpf_probe_read(). One can load at runtime the user-provided length in an
UNKNOWN_VALUE register, and then read that exact amount of data up to a
compile-time determined limit in order to fit into the proper local
storage allocated on the stack, without having to guess a suboptimal
access size at compile time.
Also, in case the helpers accepting the UNKNOWN_VALUE register operate
in raw mode, disable the raw mode so that the program is required to
initialize all memory, since there is no guarantee the helper will fill
it completely, leaving possibilities for data leak (just relevant when
the memory used by the helper is the stack, not when using a pointer to
map element value or packet). In other words, ARG_PTR_TO_RAW_STACK will
be treated as ARG_PTR_TO_STACK.
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 484611357c ("bpf: allow access into map value arrays")
introduces the ability to do pointer math inside a map element value via
the PTR_TO_MAP_VALUE_ADJ register type.
The current support doesn't handle the case where a PTR_TO_MAP_VALUE_ADJ
is spilled into the stack, limiting several use cases, especially when
generating bpf code from a compiler.
Handle this case by explicitly enabling the register type
PTR_TO_MAP_VALUE_ADJ to be spilled. Also, make sure that min_value and
max_value are reset just for BPF_LDX operations that don't result in a
restore of a spilled register from stack.
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable helpers to directly access a map element value by passing a
register type PTR_TO_MAP_VALUE (or PTR_TO_MAP_VALUE_ADJ) to helper
arguments ARG_PTR_TO_STACK or ARG_PTR_TO_RAW_STACK.
This enables several use cases. For example, a typical tracing program
might want to capture pathnames passed to sys_open() with:
struct trace_data {
char pathname[PATHLEN];
};
SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
struct trace_data data;
bpf_probe_read(data.pathname, sizeof(data.pathname), ctx->di);
/* consume data.pathname, for example via
* bpf_trace_printk() or bpf_perf_event_output()
*/
}
Such a program could easily hit the stack limit in case PATHLEN needs to
be large or more local variables need to exist, both of which are quite
common scenarios. Allowing direct helper access to map element values,
one could do:
struct bpf_map_def SEC("maps") scratch_map = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(struct trace_data),
.max_entries = 1,
};
SEC("kprobe/sys_open")
int bpf_sys_open(struct pt_regs *ctx)
{
int id = 0;
struct trace_data *p = bpf_map_lookup_elem(&scratch_map, &id);
if (!p)
return;
bpf_probe_read(p->pathname, sizeof(p->pathname), ctx->di);
/* consume p->pathname, for example via
* bpf_trace_printk() or bpf_perf_event_output()
*/
}
And wouldn't risk exhausting the stack.
Code changes are loosely modeled after commit 6841de8b0d ("bpf: allow
helpers access the packet directly"). Unlike with PTR_TO_PACKET, these
changes just work with ARG_PTR_TO_STACK and ARG_PTR_TO_RAW_STACK (not
ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ...): adding those would be
trivial, but since there is not currently a use case for that, it's
reasonable to limit the set of changes.
Also, add new tests to make sure accesses to map element values from
helpers never go out of boundary, even when adjusted.
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>