korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-12 05:24:12 +08:00

Author	SHA1	Message	Date
Daniel Borkmann	8b614aebec	bpf: move clearing of A/X into classic to eBPF migration prologue Back in the days where eBPF (or back then "internal BPF" ;->) was not exposed to user space, and only the classic BPF programs internally translated into eBPF programs, we missed the fact that for classic BPF A and X needed to be cleared. It was fixed back then via `83d5b7ef99` ("net: filter: initialize A and X registers"), and thus classic BPF specifics were added to the eBPF interpreter core to work around it. This added some confusion for JIT developers later on that take the eBPF interpreter code as an example for deriving their JIT. F.e. in `f75298f5c3` ("s390/bpf: clear correct BPF accumulator register"), at least X could leak stack memory. Furthermore, since this is only needed for classic BPF translations and not for eBPF (verifier takes care that read access to regs cannot be done uninitialized), more complexity is added to JITs as they need to determine whether they deal with migrations or native eBPF where they can just omit clearing A/X in their prologue and thus reduce image size a bit, see f.e. `cde66c2d88` ("s390/bpf: Only clear A and X for converted BPF programs"). In other cases (x86, arm64), A and X is being cleared in the prologue also for eBPF case, which is unnecessary. Lets move this into the BPF migration in bpf_convert_filter() where it actually belongs as long as the number of eBPF JITs are still few. It can thus be done generically; allowing us to remove the quirk from __bpf_prog_run() and to slightly reduce JIT image size in case of eBPF, while reducing code duplication on this matter in current(/future) eBPF JITs. 
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Yang Shi <yang.shi@linaro.org> Acked-by: Yang Shi <yang.shi@linaro.org> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-18 16:04:51 -05:00
Daniel Borkmann	a91263d520	ebpf: migrate bpf_prog's flags to bitfield As we need to add further flags to the bpf_prog structure, let's migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Also add tags for the kmemchecker, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, it's not being accessed in performance critical paths. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 05:02:39 -07:00
Kaixu Xia	2c9c3bbbbf	bpf: s390: Fix build error caused by the struct bpf_array member name changed There is a build error that "'struct bpf_array' has no member named 'prog'" on s390. In commit `2a36f0b92e` ("bpf: Make the bpf_prog_array_map more generic"), the member 'prog' of struct bpf_array is replaced by 'ptrs'. So this patch fixes it. Fixes: `2a36f0b92e` ("bpf: Make the bpf_prog_array_map more generic") Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-11 11:49:40 -07:00
Daniel Borkmann	7b36f92934	bpf: provide helper that indicates eBPF was migrated During recent discussions we had with Michael, we found that it would be useful to have an indicator that tells the JIT that an eBPF program had been migrated from classic instructions into eBPF instructions, as only in that case A and X need to be cleared in the prologue. Such eBPF programs do not set a particular type, but all have BPF_PROG_TYPE_UNSPEC. Thus, introduce a small helper for `cde66c2d88` ("s390/bpf: Only clear A and X for converted BPF programs") and possibly others in future. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 11:13:20 -07:00
Michael Holzheu	9db7f2b818	s390/bpf: recache skb->data/hlen for skb_vlan_push/pop Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via helper functions. These functions may change skb->data/hlen. This data is cached by s390 JIT to improve performance of ld_abs/ld_ind instructions. Therefore after a change we have to reload the data. In case of usage of skb_vlan_push/pop, in the prologue we store the SKB pointer on the stack and restore it after BPF_JMP_CALL to skb_vlan_push/pop. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 14:59:58 -07:00
Michael Holzheu	cde66c2d88	s390/bpf: Only clear A and X for converted BPF programs Only classic BPF programs that have been converted to eBPF need to clear the A and X registers. We can check for converted programs with: bpf_prog->type == BPF_PROG_TYPE_UNSPEC So add the check and skip initialization for real eBPF programs. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 14:59:58 -07:00
Michael Holzheu	ce2b6ad9c1	s390/bpf: increase BPF_SIZE_MAX Currently we have the restriction that jitted BPF programs can have a maximum size of one page. The reason is that we use short displacements for the literal pool. The 20 bit displacements are available since z990 and BPF requires z196 as minimum. Therefore we can remove this restriction and use everywhere 20 bit signed long displacements. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 14:59:58 -07:00
Michael Holzheu	1df03ffdde	s390/bpf: Fix multiple macro expansions The EMIT6_DISP_LH macro passes the "disp" parameter to the _EMIT6_DISP_LH macro. The _EMIT6_DISP_LH macro uses the "disp" parameter twice: unsigned int __disp_h = ((u32)disp) & 0xff000; unsigned int __disp_l = ((u32)disp) & 0x00fff; The EMIT6_DISP_LH is used several times with EMIT_CONST_U64() as "disp" parameter. Therefore always two constants are created per usage of EMIT6_DISP_LH. Fix this and add variable "_disp" to avoid multiple expansions. * v2: Move "_disp" to _EMIT6_DISP_LH as suggested by Joe Perches Fixes: `0546231057` ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 14:59:58 -07:00
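The double-expansion problem described in 1df03ffdde can be reproduced outside the kernel. The following is a minimal sketch, not the JIT's actual code: `next_literal_offset()` is a hypothetical stand-in for `EMIT_CONST_U64()`, and the two `SPLIT_DISP_*` macros are invented names that only mirror the shape of `_EMIT6_DISP_LH` before and after the fix.

```c
#include <assert.h>

/* Hypothetical stand-in for EMIT_CONST_U64(): every call adds a pool entry. */
static int pool_entries;

static unsigned int next_literal_offset(void)
{
	return 8u * (unsigned int)pool_entries++;
}

/* Buggy shape: "disp" appears twice, so its side effect runs twice. */
#define SPLIT_DISP_BUGGY(disp, hi, lo)			\
do {							\
	(hi) = ((unsigned int)(disp)) & 0xff000;	\
	(lo) = ((unsigned int)(disp)) & 0x00fff;	\
} while (0)

/* Fixed shape: evaluate "disp" once into a local and reuse the local. */
#define SPLIT_DISP_FIXED(disp, hi, lo)			\
do {							\
	unsigned int _disp = (unsigned int)(disp);	\
	(hi) = _disp & 0xff000;				\
	(lo) = _disp & 0x00fff;				\
} while (0)

/* Number of pool entries created by one use of the buggy macro. */
int entries_created_buggy(void)
{
	unsigned int hi, lo;

	pool_entries = 0;
	SPLIT_DISP_BUGGY(next_literal_offset(), hi, lo);
	(void)hi;
	(void)lo;
	return pool_entries;
}

/* Number of pool entries created by one use of the fixed macro. */
int entries_created_fixed(void)
{
	unsigned int hi, lo;

	pool_entries = 0;
	SPLIT_DISP_FIXED(next_literal_offset(), hi, lo);
	(void)hi;
	(void)lo;
	return pool_entries;
}

/* Usage: the buggy macro creates two constants, the fixed one creates one. */
void check_expansions(void)
{
	assert(entries_created_buggy() == 2);
	assert(entries_created_fixed() == 1);
}
```

Copying the argument into a local before use is the usual way to make a function-like macro safe for arguments with side effects.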
Michael Holzheu	f75298f5c3	s390/bpf: clear correct BPF accumulator register Currently we assumed the following BPF to eBPF register mapping: - BPF_REG_A -> BPF_REG_7 - BPF_REG_X -> BPF_REG_8 Unfortunately this mapping is wrong. The correct mapping is: - BPF_REG_A -> BPF_REG_0 - BPF_REG_X -> BPF_REG_7 So clear the correct registers and use the BPF_REG_A and BPF_REG_X macros instead of BPF_REG_0/7. Fixes: `0546231057` ("s390/bpf: Add s390x eBPF JIT compiler backend") Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 14:59:58 -07:00
Alexei Starovoitov	4e10df9a60	bpf: introduce bpf_skb_vlan_push/pop() helpers Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via helper functions. These functions may change skb->data/hlen which are cached by some JITs to improve performance of ld_abs/ld_ind instructions. Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls, re-compute header len and re-cache skb->data/hlen back into cpu registers. Note, skb->data/hlen are not directly accessible from the programs, so any changes to skb->data done either by these helpers or by other TC actions are safe. eBPF JIT supported by three architectures: - arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is. - x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls (experiments showed that conditional re-caching is slower). - s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present in the program (re-caching is tbd). These helpers allow more scalable handling of vlan from the programs. Instead of creating thousands of vlan netdevs on top of eth0 and attaching TC+ingress+bpf to all of them, the program can be attached to eth0 directly and manipulate vlans as necessary. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:52:31 -07:00
Michael Holzheu	b035b60ded	s390/bpf: Fix backward jumps Currently all backward jumps crash for JITed s390x eBPF programs with an illegal instruction program check and kernel panic. Because for negative values the opcode of the jump instruction is overridden by the negative branch offset, an illegal instruction is generated by the JIT: 000003ff802da378: c01100000002 lgfi %r1,2 000003ff802da37e: fffffff52065 unknown <-- illegal instruction 000003ff802da384: b904002e lgr %r2,%r14 So fix this and mask the offset in order not to damage the opcode. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-06-25 09:39:18 +02:00
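The opcode damage in b035b60ded comes from OR-ing a sign-extended negative offset into an instruction word. The sketch below assumes a simplified, hypothetical 32-bit layout (opcode in the upper halfword, offset in the lower halfword) rather than the real s390x instruction formats. The function names and the opcode value are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified, hypothetical layout: opcode in the upper halfword,
 * signed branch offset in the lower halfword.
 */
uint32_t encode_branch_unmasked(uint16_t opcode, int offset)
{
	/* A negative offset converts to 0xffffxxxx and overwrites the opcode. */
	return ((uint32_t)opcode << 16) | (uint32_t)offset;
}

uint32_t encode_branch_masked(uint16_t opcode, int offset)
{
	/* Masking keeps only the 16 offset bits, so the opcode survives. */
	return ((uint32_t)opcode << 16) | ((uint32_t)offset & 0xffffu);
}

/* Usage: forward jumps encode identically, backward jumps do not. */
void check_branch_encoding(void)
{
	assert(encode_branch_unmasked(0xa7f4, 2) == 0xa7f40002u);
	assert(encode_branch_masked(0xa7f4, 2) == 0xa7f40002u);
	assert(encode_branch_unmasked(0xa7f4, -2) == 0xfffffffeu);
	assert(encode_branch_masked(0xa7f4, -2) == 0xa7f4fffeu);
}
```

The unmasked variant only misbehaves for negative offsets, which matches the commit's observation that backward jumps were the ones that crashed.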
Michael Holzheu	6651ee070b	s390/bpf: implement bpf_tail_call() helper bpf_tail_call() arguments: - ctx......: Context pointer - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table - index....: Index in the jump table In this implementation s390x JIT does stack unwinding and jumps into the callee program prologue. Caller and callee use the same stack. With this patch a tail call generates the following code on s390x: if (index >= array->map.max_entries) goto out 000003ff8001c7e4: e31030100016 llgf %r1,16(%r3) 000003ff8001c7ea: ec41001fa065 clgrj %r4,%r1,10,3ff8001c828 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT) goto out; 000003ff8001c7f0: a7080001 lhi %r0,1 000003ff8001c7f4: eb10f25000fa laal %r1,%r0,592(%r15) 000003ff8001c7fa: ec120017207f clij %r1,32,2,3ff8001c828 prog = array->prog[index]; if (prog == NULL) goto out; 000003ff8001c800: eb140003000d sllg %r1,%r4,3 000003ff8001c806: e31310800004 lg %r1,128(%r3,%r1) 000003ff8001c80c: ec18000e007d clgij %r1,0,8,3ff8001c828 Restore registers before calling function 000003ff8001c812: eb68f2980004 lmg %r6,%r8,664(%r15) 000003ff8001c818: ebbff2c00004 lmg %r11,%r15,704(%r15) goto *(prog->bpf_func + tail_call_start); 000003ff8001c81e: e31100200004 lg %r1,32(%r1,%r0) 000003ff8001c824: 47f01006 bc 15,6(%r1) Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-09 11:47:10 -07:00
Michael Holzheu	88aeca15d6	s390/bpf: fix bpf frame pointer setup Currently the bpf frame pointer is set to the old r15. This is wrong because of packed stack. Fix this and adjust the frame pointer to respect packed stack. This now generates a prolog like the following: 3ff8001c3fa: eb67f0480024 stmg %r6,%r7,72(%r15) 3ff8001c400: ebcff0780024 stmg %r12,%r15,120(%r15) 3ff8001c406: b904001f lgr %r1,%r15 <- load backchain 3ff8001c40a: 41d0f048 la %r13,72(%r15) <- load adjusted bfp 3ff8001c40e: a7fbfd98 aghi %r15,-616 3ff8001c412: e310f0980024 stg %r1,152(%r15) <- save backchain Fixes: `0546231057` ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-03 19:31:39 -07:00
Michael Holzheu	bbac1c9488	s390/bpf: fix stack allocation On s390x we have to provide 160 bytes stack space before we can call the next function. From the 160 bytes that we got from the previous function we only use 11 * 8 bytes and have 160 - 11 * 8 bytes left. Currently for BPF we allocate additional 160 - 11 * 8 bytes for the next function. This is wrong because then the next function only gets: (160 - 11 * 8) + (160 - 11 * 8) = 2 * 72 = 144 bytes Fix this and allocate enough memory for the next function. Fixes: `0546231057` ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-03 19:31:39 -07:00
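The arithmetic in bbac1c9488 can be checked directly. This is a small sketch using only the numbers quoted in the commit message. The constant and function names are made up for illustration, and `extra_needed()` is the minimum implied by the arithmetic, not necessarily the exact amount the patch allocates.

```c
#include <assert.h>

enum {
	ABI_FRAME_SIZE = 160,                     /* bytes the next function expects */
	USED_BY_JIT = 11 * 8,                     /* bytes of the caller's frame in use */
	LEFT_OVER = ABI_FRAME_SIZE - USED_BY_JIT  /* bytes still free */
};

/* Before the fix: only another LEFT_OVER bytes were allocated. */
int next_frame_before_fix(void)
{
	return LEFT_OVER + LEFT_OVER;
}

/* Minimum additional allocation so the next function gets a full frame. */
int extra_needed(void)
{
	return ABI_FRAME_SIZE - LEFT_OVER;
}

/* Usage: 144 bytes is 16 short of the required 160. */
void check_stack_arithmetic(void)
{
	assert(LEFT_OVER == 72);
	assert(next_frame_before_fix() == 144);
	assert(ABI_FRAME_SIZE - next_frame_before_fix() == 16);
	assert(extra_needed() == 88);
}
```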
Michael Holzheu	b9b4b1cef1	s390/bpf: Fix gcov stack space problem When compiling the kernel for GCOV (CONFIG_GCOV_KERNEL,-fprofile-arcs), gcc allocates a lot of stack space because of the large switch statement in bpf_jit_insn(). This leads to the following compile warning: arch/s390/net/bpf_jit_comp.c: In function 'bpf_jit_prog': arch/s390/net/bpf_jit_comp.c:1144:1: warning: frame size of function 'bpf_jit_prog' is 12592 bytes which is more than half the stack size. The dynamic check would not be reliable. No check emitted for this function. arch/s390/net/bpf_jit_comp.c:1144:1: warning: the frame size of 12504 bytes is larger than 1024 bytes [-Wframe-larger-than=] And indeed gcc allocates 12592 bytes of stack space: # objdump -d arch/s390/net/bpf_jit_comp.o ... 0000000000000c60 <bpf_jit_prog>: c60: eb 6f f0 48 00 24 stmg %r6,%r15,72(%r15) c66: b9 04 00 ef lgr %r14,%r15 c6a: e3 f0 fe d0 fc 71 lay %r15,-12592(%r15) As a workaround of that problem we now define bpf_jit_insn() as noinline which then reduces the stack space. # objdump -d arch/s390/net/bpf_jit_comp.o ... 0000000000000070 <bpf_jit_insn>: 70: eb 6f f0 48 00 24 stmg %r6,%r15,72(%r15) 76: c0 d0 00 00 00 00 larl %r13,76 <bpf_jit_insn+0x6> 7c: a7 f1 3f 80 tmll %r15,16256 80: b9 04 00 ef lgr %r14,%r15 84: e3 f0 ff a0 ff 71 lay %r15,-96(%r15) Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-04-30 13:50:36 +02:00
Michael Holzheu	771aada9ac	s390/bpf: Adjust ALU64_DIV/MOD to match interpreter change The s390x ALU64_DIV/MOD has been implemented according to the eBPF interpreter specification that used do_div(). This function does a 64-bit by 32-bit divide. It turned out that this was wrong and now the interpreter uses div64_u64_rem() for full 64-bit division. So fix this and use full 64-bit division in the s390x eBPF backend code. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-04-30 13:50:34 +02:00
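The difference that 771aada9ac addresses shows up whenever the divisor does not fit in 32 bits. The sketch below models the two behaviours in plain C. The function names are hypothetical, and the zero-divisor guards exist only to keep the sketch well defined; they do not describe how the interpreter or the JIT handles division by zero.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-by-32-bit divide: only the low 32 bits of the divisor count. */
uint64_t div_trunc32(uint64_t dividend, uint64_t divisor)
{
	uint32_t d32 = (uint32_t)divisor;

	return d32 ? dividend / d32 : 0; /* guard is for this sketch only */
}

/* Models a full 64-bit divide. */
uint64_t div_full64(uint64_t dividend, uint64_t divisor)
{
	return divisor ? dividend / divisor : 0; /* guard is for this sketch only */
}

/* Usage: a divisor of 2^32 + 2 is silently treated as 2 by the 32-bit form. */
void check_division(void)
{
	uint64_t dividend = 0x200000004ULL;
	uint64_t divisor = 0x100000002ULL;

	assert(div_full64(dividend, divisor) == 2);
	assert(div_trunc32(dividend, divisor) == 0x100000002ULL);
}
```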
Michael Holzheu	0546231057	s390/bpf: Add s390x eBPF JIT compiler backend Replace 32 bit BPF JIT backend with new 64 bit eBPF backend. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-04-15 12:23:49 +02:00
Michael Holzheu	fe82bbae36	s390/bpf: Zero extend parameters before calling C function The s390x ABI requires to zero extend parameters before functions are called. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-15 11:10:41 +01:00
Michael Holzheu	1a92b2deaf	s390/bpf: Fix sk_load_byte_msh() In sk_load_byte_msh() sk_load_byte_slow() is called instead of sk_load_byte_msh_slow(). Fix this and call the correct function. Besides of this load only one byte instead of two and fix the comment. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-15 11:10:37 +01:00
Michael Holzheu	d86eb7448e	s390/bpf: Fix offset parameter for skb_copy_bits() Currently the offset parameter for skb_copy_bits is changed in sk_load_word() and sk_load_half(). Therefore it is not correct when calling skb_copy_bits(). Fix this and use the original offset for the function call. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-15 11:10:33 +01:00
Michael Holzheu	db9aa8f432	s390/bpf: Fix skb_copy_bits() parameter passing The skb_copy_bits() function has the following signature: int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len) Currently in bpf_jit.S the "to" and "len" parameters have been exchanged. So fix this and call the function with the correct parameters. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-15 11:10:29 +01:00
Michael Holzheu	5a80244246	s390/bpf: Fix JMP_JGE_K (A >= K) and JMP_JGT_K (A > K) Currently the signed COMPARE HALFWORD IMMEDIATE (chi) and COMPARE (c) instructions are used to compare "A" with "K". This is not correct because "A" and "K" are both unsigned. To fix this remove the chi instruction (no unsigned analogon available) and use the unsigned COMPARE LOGICAL (cl) instruction instead of COMPARE (c). Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-15 08:17:42 +01:00
Michael Holzheu	ae75097459	s390/bpf: Fix JMP_JGE_X (A >= X) and JMP_JGT_X (A > X) Currently the signed COMPARE (cr) instruction is used to compare "A" with "X". This is not correct because "A" and "X" are both unsigned. To fix this use the unsigned COMPARE LOGICAL (clr) instruction instead. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-09 10:10:32 +01:00
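Commits 5a80244246 and ae75097459 both replace a signed comparison with an unsigned one. A minimal sketch of why that matters is shown here. The function names are hypothetical, and the conversion to `int32_t` assumes the usual two's-complement wrap that GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Models a signed compare: both operands reinterpreted as int32_t. */
int greater_signed(uint32_t a, uint32_t x)
{
	return (int32_t)a > (int32_t)x;
}

/* Models an unsigned (logical) compare, as classic BPF requires. */
int greater_unsigned(uint32_t a, uint32_t x)
{
	return a > x;
}

/* Usage: values with the top bit set are where the two disagree. */
void check_compare(void)
{
	assert(greater_signed(5, 3) == 1);
	assert(greater_unsigned(5, 3) == 1);
	assert(greater_signed(0x80000000u, 1) == 0);   /* seen as a negative number */
	assert(greater_unsigned(0x80000000u, 1) == 1); /* seen as 2147483648 */
}
```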
Michael Holzheu	df3eed3d28	s390/bpf: Fix ALU_NEG (A = -A) Currently the LOAD NEGATIVE (lnr) instruction is used for ALU_NEG. This instruction always loads the negative value. Therefore, if A is already negative, it remains unchanged. To fix this use LOAD COMPLEMENT (lcr) instead. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2015-01-09 10:10:30 +01:00
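The ALU_NEG bug in df3eed3d28 follows from the semantics the commit describes: LOAD NEGATIVE always yields a non-positive value, while LOAD COMPLEMENT flips the sign. The sketch below models both in C. The function names are hypothetical, and the arithmetic goes through `uint32_t` only to avoid signed overflow in the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Models LOAD COMPLEMENT as described in the commit: result is -a. */
int32_t load_complement(int32_t a)
{
	return (int32_t)(0u - (uint32_t)a);
}

/* Models LOAD NEGATIVE as described in the commit: result is -|a|. */
int32_t load_negative(int32_t a)
{
	return a < 0 ? a : load_complement(a);
}

/* Usage: the two agree for positive input and differ for negative input. */
void check_negation(void)
{
	assert(load_negative(5) == -5);
	assert(load_complement(5) == -5);
	assert(load_negative(-5) == -5);  /* unchanged, so A = -A is not computed */
	assert(load_complement(-5) == 5); /* correct negation */
}
```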
Hannes Frederic Sowa	233577a220	net: filter: constify detection of pkt_type_offset Currently we have 2 pkt_type_offset functions doing the same thing and spread across the architecture files. Remove those and replace them with a PKT_TYPE_OFFSET macro helper which gets the constant value from a zero sized sk_buff member right in front of the bitfield with offsetof. This new offset marker does not change size of struct sk_buff. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:07:21 -04:00
Daniel Borkmann	286aad3c40	net: bpf: be friendly to kmemcheck Reported by Mikulas Patocka, kmemcheck currently barks out a false positive since we don't have special kmemcheck annotation for bitfields used in bpf_prog structure. We currently have jited:1, len:31 and thus when accessing len while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that we're reading uninitialized memory. As we don't need the whole bit universe for pages member, we can just split it to u16 and use a bool flag for jited instead of a bitfield. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 16:58:56 -07:00
Daniel Borkmann	738cbe72ad	net: bpf: consolidate JIT binary allocator Introduced in commit `314beb9bca` ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") and later on replicated in `aa2d2c73c2` ("s390/bpf,jit: address randomize and write protect jit code") for s390 architecture, write protection for BPF JIT images got added and a random start address of the JIT code, so that it's not on a page boundary anymore. Since both use a very similar allocator for the BPF binary header, we can consolidate this code into the BPF core as it's mostly JIT independent anyway. This will also allow for future archs that support DEBUG_SET_MODULE_RONX to just reuse instead of reimplementing it. JIT tested on x86_64 and s390x with BPF test suite. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 16:58:56 -07:00
Daniel Borkmann	60a3b2253c	net: bpf: make eBPF interpreter images read-only With eBPF getting more extended and exposure to user space on its way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit `314beb9bca` ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) has to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. 
Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-05 12:02:48 -07:00
Alexei Starovoitov	7ae457c1e5	net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_' prefix - everything that is pure BPF is changed to 'bpf_' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern orig_prog; unsigned int (bpf_func)(const struct sk_buff skb, const struct bpf_insn filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter ' and BPF_PROG_RUN to be used with 'struct bpf_prog ' __sk_filter_release(struct sk_filter ) gains __bpf_prog_release(struct bpf_prog ) helper function also perform related renames for the functions that work with 'struct bpf_prog ', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock )/sk_detach_filter(struct sock ) and SK_RUN_FILTER(struct sk_filter , ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog )/bpf_prog_destroy(struct bpf_prog ) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei 
Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-02 15:03:58 -07:00
Daniel Borkmann	3480593131	net: filter: get rid of BPF_S_* enum This patch finally allows us to get rid of the BPF_S_* enum. Currently, the code performs unnecessary encode and decode workarounds in seccomp and filter migration itself when a filter is being attached in order to overcome BPF_S_* encoding which is not used anymore by the new interpreter resp. JIT compilers. Keeping it around would mean that also in future we would need to extend and maintain this enum and related encoders/decoders. We can get rid of all that and save us these operations during filter attaching. Naturally, also JIT compilers need to be updated by this. Before JIT conversion is being done, each compiler checks if A is being loaded at startup to obtain information if it needs to emit instructions to clear A first. Since BPF extensions are a subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements for extensions can be removed at that point. To ease and minimize code changes in the classic JITs, we have introduced bpf_anc_helper(). Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int), arm (JIT, int), i386 (int), ppc64 (JIT, int); for sparc we unfortunately didn't have access, but changes are analogous to the rest. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Chema Gonzalez <chemag@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:16:58 -07:00
Heiko Carstens	e84d2f8d2a	net: filter: s390: fix JIT address randomization This is the s390 variant of Alexei's JIT bug fix. (patch description below stolen from Alexei's patch) bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages: round_up(4000 + 4 + 128) == 8192 then 'hole' is computed as 8192 - (4000 + 4) = 4188 If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free(): kernel BUG at arch/x86/mm/pageattr.c:887! Call Trace: [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0 [<ffffffff8106c90c>] worker_thread+0x11c/0x370 since bpf_jit_free() does: unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; struct bpf_binary_header *header = (void *)addr; to compute start address of 'bpf_binary_header' and header->pages will pass junk to: set_memory_rw(addr, header->pages); Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page. Fixes: `aa2d2c73c2` ("s390/bpf,jit: address randomize and write protect jit code") Reported-by: Alexei Starovoitov <ast@plumgrid.com> Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 16:10:16 -04:00
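The page arithmetic in e84d2f8d2a can be replayed with the numbers from the commit message. This is a sketch under stated assumptions: a 4096-byte page, the 4-byte header used in the commit's example, and hypothetical helper names. Clamping the hole to one page minus the header is one way to keep the image start and the header in the same page, and it is not claimed to be the exact code of the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */
#define SKETCH_HDR_SIZE 4u     /* header size used in the commit's arithmetic */

unsigned int round_up_to_page(unsigned int n)
{
	return (n + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/* Hole as computed before the fix: all slack in the allocation. */
unsigned int hole_unbounded(unsigned int proglen)
{
	unsigned int sz = round_up_to_page(proglen + SKETCH_HDR_SIZE + 128);

	return sz - (proglen + SKETCH_HDR_SIZE);
}

/* Hole clamped so that header and image start share the first page. */
unsigned int hole_bounded(unsigned int proglen)
{
	unsigned int hole = hole_unbounded(proglen);
	unsigned int limit = SKETCH_PAGE_SIZE - SKETCH_HDR_SIZE;

	return hole < limit ? hole : limit;
}

/* Usage: replay the 4000-byte example from the commit message. */
void check_hole(void)
{
	assert(round_up_to_page(4000 + SKETCH_HDR_SIZE + 128) == 8192);
	assert(hole_unbounded(4000) == 4188);
	/* The largest random offset can push the image start into page two. */
	assert(SKETCH_HDR_SIZE + (hole_unbounded(4000) - 1) >= SKETCH_PAGE_SIZE);
	/* With the clamp, the image always starts inside the first page. */
	assert(hole_bounded(4000) == 4092);
	assert(SKETCH_HDR_SIZE + (hole_bounded(4000) - 1) < SKETCH_PAGE_SIZE);
}
```

Once the image start stays in the first page, masking `fp->bpf_func` with `PAGE_MASK` lands on the header again, which is what `bpf_jit_free()` relies on.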
Martin Schwidefsky	6e0de81759	s390/bpf,jit: initialize A register if 1st insn is BPF_S_LDX_B_MSH The A register needs to be initialized to zero in the prolog if the first instruction of the BPF program is BPF_S_LDX_B_MSH to prevent leaking the content of %r5 to user space. Cc: <stable@vger.kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-04-25 14:03:25 +02:00
Daniel Borkmann	f8bbbfc3b9	net: filter: add jited flag to indicate jit compiled filters This patch adds a jited flag into sk_filter struct in order to indicate whether a filter is currently jited or not. The size of sk_filter is not being expanded as the 32 bit 'len' member allows upper bits to be reused since a filter can currently only grow as large as BPF_MAXINSNS. Therefore, there's enough room also for other in future needed flags to reuse 'len' field if necessary. The jited flag also allows for having alternative interpreter functions running as currently, we can only detect jit compiled filters by testing fp->bpf_func to not equal the address of sk_run_filter(). Joint work with Alexei Starovoitov. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-31 00:45:08 -04:00
Tom Herbert	61b905da33	net: Rename skb->rxhash to skb->hash The packet hash can be considered a property of the packet, not just on RX path. This patch changes name of rxhash and l4_rxhash skbuff fields to be hash and l4_hash respectively. This includes changing uses of the field in the code which don't call the access functions. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-26 15:58:20 -04:00
Heiko Carstens	3af57f78c3	s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions The s390 bpf jit compiler emits the signed divide instructions "dr" and "d" for unsigned divisions. This can cause problems: the dividend will be zero extended to a 64 bit value and the divisor is the 32 bit signed value as specified A or X accumulator, even though A and X are supposed to be treated as unsigned values. The divide instructions will generate an exception if the result cannot be expressed with a 32 bit signed value. This is the case if e.g. the dividend is 0xffffffff and the divisor either 1 or also 0xffffffff (signed: -1). To avoid all these issues simply use unsigned divide instructions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-17 18:54:49 -08:00
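The failing cases named in 3af57f78c3 can be checked with ordinary integer arithmetic. The sketch below follows the commit's description of the signed divide (zero-extended 64-bit dividend, 32-bit signed divisor, quotient must fit a signed 32-bit value). The function names are hypothetical, and the zero-divisor handling is there only to keep the sketch well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the signed divide described in the commit would succeed. */
int signed_divide_fits(uint32_t a, uint32_t x)
{
	int64_t dividend = (int64_t)a;         /* zero extended to 64 bits */
	int64_t divisor = (int64_t)(int32_t)x; /* treated as signed 32 bits */
	int64_t quotient;

	if (divisor == 0)
		return 0;
	quotient = dividend / divisor;
	return quotient >= INT32_MIN && quotient <= INT32_MAX;
}

/* Unsigned 32-bit divide, which is what classic BPF semantics call for. */
uint32_t unsigned_divide(uint32_t a, uint32_t x)
{
	return x ? a / x : 0; /* guard is for this sketch only */
}

/* Usage: both examples from the commit overflow the signed quotient. */
void check_divide(void)
{
	assert(signed_divide_fits(0xffffffffu, 1) == 0);
	assert(signed_divide_fits(0xffffffffu, 0xffffffffu) == 0);
	assert(signed_divide_fits(100, 7) == 1);
	assert(unsigned_divide(0xffffffffu, 1) == 0xffffffffu);
	assert(unsigned_divide(0xffffffffu, 0xffffffffu) == 1);
}
```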
Eric Dumazet	aee636c480	bpf: do not use reciprocal divide At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-15 17:02:08 -08:00
Martin Schwidefsky	6cef30034c	s390/bpf,jit: fix prolog oddity The prolog of functions generated by the bpf jit compiler uses an instruction sequence with an "ahi" instruction to create stack space instead of using an "aghi" instruction. Using the 32-bit "ahi" is not wrong as the stack we are operating on is an order-4 allocation which is always aligned to 16KB. But it is more consistent to use an "aghi" as the stack pointer is a 64-bit value. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-10-24 17:16:59 +02:00
Heiko Carstens	0f20822a69	s390/dis: move disassembler function prototypes to proper header file Now that the in-kernel disassembler has its own header file, move the disassembler-related function prototypes to that header file. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-10-24 17:16:48 +02:00
Alexei Starovoitov	d45ed4a4e3	net: fix unsafe set_memory_rw from softirq on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] * DEADLOCK * [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? 
vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-07 15:16:45 -04:00
Heiko Carstens	4784955a52	s390/bpf,jit: fix address randomization Add missing braces to the hole calculation. Without them the calculation resulted in an addition instead of a subtraction, which in turn means that the jit compiler could try to write out of bounds of the allocated piece of memory. This bug was introduced with `aa2d2c73` "s390/bpf,jit: address randomize and write protect jit code". Fixes this one: [ 37.320956] Unable to handle kernel pointer dereference at virtual kernel address 000003ff80231000 [ 37.320984] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 37.320993] Modules linked in: dm_multipath scsi_dh eadm_sch dm_mod ctcm fsm autofs4 [ 37.321007] CPU: 28 PID: 6443 Comm: multipathd Not tainted 3.10.9-61.x.20130829-s390xdefault #1 [ 37.321011] task: 0000004ada778000 ti: 0000004ae3304000 task.ti: 0000004ae3304000 [ 37.321014] Krnl PSW : 0704c00180000000 000000000012d1de (bpf_jit_compile+0x198e/0x23d0) [ 37.321022] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 000000004350207d 0000004a00000001 0000000000000007 000003ff80231002 [ 37.321029] 0000000000000007 000003ff80230ffe 00000000a7740000 000003ff80230f76 [ 37.321032] 000003ffffffffff 000003ff00000000 000003ff0000007d 000000000071e820 [ 37.321035] 0000004adbe99950 000000000071ea18 0000004af3d9e7c0 0000004ae3307b80 [ 37.321046] Krnl Code: 000000000012d1d0: 41305004 la %r3,4(%r5) 000000000012d1d4: e330f0f80021 clg %r3,248(%r15) #000000000012d1da: a7240009 brc 2,12d1ec >000000000012d1de: 50805000 st %r8,0(%r5) 000000000012d1e2: e330f0f00004 lg %r3,240(%r15) 000000000012d1e8: 41303004 la %r3,4(%r3) 000000000012d1ec: e380f0e00004 lg %r8,224(%r15) 000000000012d1f2: e330f0f00024 stg %r3,240(%r15) [ 37.321074] Call Trace: [ 37.321077] ([<000000000012da78>] bpf_jit_compile+0x2228/0x23d0) [ 37.321083] [<00000000006007c2>] sk_attach_filter+0xfe/0x214 [ 37.321090] [<00000000005d2d92>] sock_setsockopt+0x926/0xbdc [ 37.321097] [<00000000005cbfb6>] SyS_setsockopt+0x8a/0xe8 [ 37.321101] [<00000000005ccaa8>] 
SyS_socketcall+0x264/0x364 [ 37.321106] [<0000000000713f1c>] sysc_nr_ok+0x22/0x28 [ 37.321113] [<000003fffce10ea8>] 0x3fffce10ea8 [ 37.321118] INFO: lockdep is turned off. [ 37.321121] Last Breaking-Event-Address: [ 37.321124] [<000000000012d192>] bpf_jit_compile+0x1942/0x23d0 [ 37.321132] [ 37.321135] Kernel panic - not syncing: Fatal exception: panic_on_oops Cc: stable@vger.kernel.org # v3.11 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2013-09-04 17:18:55 +02:00
Heiko Carstens	c9a7afa380	s390/bpf,jit: add pkt_type support s390 version of `3b58908a` "x86: bpf_jit_comp: add pkt_type support". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2013-07-18 12:44:38 +02:00
Heiko Carstens	aa2d2c73c2	s390/bpf,jit: address randomize and write protect jit code This is the s390 variant of `314beb9b` "x86: bpf_jit_comp: secure bpf jit against spraying attacks". With this change the whole jit code and literal pool will be write protected after creation. In addition the start address of the jit code won't be always on a page boundary anymore. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-07-18 12:44:37 +02:00
Heiko Carstens	fee1b5488d	s390/bpf,jit: use generic jit dumper This is the s390 backend of `79617801` "filter: bpf_jit_comp: refactor and unify BPF JIT image dump output". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-07-18 12:44:35 +02:00
Heiko Carstens	1eeb74782d	s390/bpf,jit: call module_free() from any context The workqueue workaround is no longer needed. Same as `5199dfe531` "sparc: bpf_jit_comp: can call module_free() from any context". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-07-18 12:44:34 +02:00
Stelian Nirlu	3d04fea5e7	s390/bpf,jit: use kcalloc instead of kmalloc and memset Signed-off-by: Stelian Nirlu <steliannirlu@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-04-17 14:07:27 +02:00
Heiko Carstens	5303a0fe8c	s390/bpf,jit: add vlan tag support s390 version of `855ddb56` "x86: bpf_jit_comp: add vlan tag support". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-14 15:55:20 +01:00
Heiko Carstens	916908df24	s390/bpf,jit: add support for XOR instruction Add support for XOR instruction for use with X/K. s390 JIT support for the new BPF_S_ALU_XOR_* instructions introduced with `9e49e889` "filter: add XOR instruction for use with X/K". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-12-03 10:44:05 -05:00
Heiko Carstens	3247274536	s390/bpf,jit: add support MOD instruction Add support for MOD operation for s390's JIT. Same as `280050cc` "x86 bpf_jit: support MOD operation" for x86 which adds JIT support for the generic new MOD operation introduced with `b6069a9570` "filter: add MOD operation". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-12-03 10:44:02 -05:00
Heiko Carstens	c59eed111b	s390/bpf,jit: add support for BPF_S_ANC_ALU_XOR_X instruction Add support for new BPF_S_ANC_ALU_XOR_X instruction which got added with `ffe06c17` "filter: add XOR operation". s390 version of `4bfaddf1` "x86 bpf_jit: support BPF_S_ANC_ALU_XOR_X instruction". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-09-26 15:45:28 +02:00
Heiko Carstens	68d9884dbc	s390/bpf,jit: improve code generation Make use of new immediate instructions that came with the extended immediate and general instruction extension facilities. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-09-26 15:44:49 +02:00
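
The failure mode described in `3af57f78c3` (signed divide instructions used for unsigned BPF divisions) can be modelled in plain C. This is only a userspace sketch, not the s390 JIT code: the helper names `fits_s32`, `signed_quotient` and `unsigned_quotient` are hypothetical, and the model assumes the usual two's-complement conversion when a 32-bit value is reinterpreted as signed. It shows that the two cases named in the commit message produce quotients a signed 32-bit result cannot hold, while the unsigned division is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the 64-bit quotient can be expressed as a signed 32-bit value. */
static int fits_s32(int64_t q)
{
	return q >= INT32_MIN && q <= INT32_MAX;
}

/*
 * Model of the signed divide: the dividend is zero extended to 64 bits,
 * the divisor is taken as a signed 32-bit value (assumes two's complement).
 */
static int64_t signed_quotient(uint32_t a, uint32_t x)
{
	return (int64_t)(uint64_t)a / (int64_t)(int32_t)x;
}

/* Model of the unsigned divide: both operands are plain unsigned 32-bit. */
static uint32_t unsigned_quotient(uint32_t a, uint32_t x)
{
	return a / x;
}

/* Illustrative checks for the two cases named in the commit message. */
static void check_division_examples(void)
{
	/* 0xffffffff / 1: quotient 4294967295 exceeds INT32_MAX. */
	assert(!fits_s32(signed_quotient(0xffffffffu, 1u)));
	/* 0xffffffff / -1: quotient -4294967295 is below INT32_MIN. */
	assert(!fits_s32(signed_quotient(0xffffffffu, 0xffffffffu)));
	/* The unsigned divisions give the results BPF expects. */
	assert(unsigned_quotient(0xffffffffu, 1u) == 0xffffffffu);
	assert(unsigned_quotient(0xffffffffu, 0xffffffffu) == 1u);
}
```

On real hardware the out-of-range quotient raises an exception rather than returning a value; the `fits_s32` check stands in for that condition.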


51 Commits
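
The off-by-one behaviour reported in `aee636c480` ("bpf: do not use reciprocal divide") can be reproduced with a small userspace sketch. The two helpers below are an assumption: they follow the shape of the kernel's reciprocal divide helpers of that era as best recalled (precompute roughly 2^32 / k, then multiply and shift), written here with standard C types rather than copied from the kernel tree. The check function is hypothetical and exists only to show inputs where the approximation disagrees with a real division.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute R = ceil(2^32 / k), truncated to 32 bits (assumed helper shape). */
static uint32_t reciprocal_value(uint32_t k)
{
	uint64_t val = (1ULL << 32) + (k - 1);

	return (uint32_t)(val / k);
}

/* Approximate a / k as (a * R) >> 32, with R from reciprocal_value(k). */
static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}

/* Illustrative checks: small dividends work, large ones can be off by one. */
static void check_reciprocal_examples(void)
{
	uint32_t r3 = reciprocal_value(3);

	/* Small dividend: the approximation matches the real quotient. */
	assert(reciprocal_divide(9u, r3) == 9u / 3u);

	/* Dividend 2^31: the real quotient is 715827882, the approximation is one higher. */
	assert(2147483648u / 3u == 715827882u);
	assert(reciprocal_divide(2147483648u, r3) == 715827883u);

	/* Divisor 1: R truncates to 0, so every quotient comes out as 0. */
	assert(reciprocal_value(1u) == 0u);
	assert(reciprocal_divide(5u, reciprocal_value(1u)) == 0u);
}
```

The rounding error in R is scaled by the dividend, so it only becomes visible for large dividends, which is consistent with the commit's remark that the helper is "not generic enough" for arbitrary 32-bit BPF operands.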