korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 16:54:20 +08:00

Author	SHA1	Message	Date
Peter Oskolkov	3e0bd37ce0	bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap This patch adds all needed plumbing in preparation to allowing bpf programs to do IP encapping via bpf_lwt_push_encap. Actual implementation is added in the next patch in the patchset. Of note: - bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT prog types in addition to BPF_PROG_TYPE_LWT_IN; - if the skb being encapped has GSO set, encapsulation is limited to IPIP/IP+GRE/IP+GUE (both IPv4 and IPv6); - as route lookups are different for ingress vs egress, the single external bpf_lwt_push_encap BPF helper is routed internally to either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs, depending on prog type. v8 changes: fixed a typo. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-13 18:27:55 -08:00
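The internal routing described in the last bullet of the commit above can be pictured with a small userspace sketch. This is not the kernel code: the enum, the two stub handlers, and `pick_push_encap()` are hypothetical names chosen for illustration. Only the idea that one external helper maps to an ingress or egress implementation, depending on program type, comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two LWT program types named above. */
enum prog_type { PROG_LWT_IN, PROG_LWT_XMIT, PROG_OTHER };

typedef int (*push_encap_fn)(void *skb, const void *hdr, size_t len);

/* Stubs (assumption): real ingress/egress variants would perform
 * different route lookups; these only return distinct markers. */
static int lwt_in_push_encap(void *skb, const void *hdr, size_t len)
{
	(void)skb; (void)hdr; (void)len;
	return 1;	/* marker: ingress path taken */
}

static int lwt_xmit_push_encap(void *skb, const void *hdr, size_t len)
{
	(void)skb; (void)hdr; (void)len;
	return 2;	/* marker: egress path taken */
}

/* Select the implementation from the program type; NULL means the
 * helper is not available to that program type. */
static push_encap_fn pick_push_encap(enum prog_type type)
{
	switch (type) {
	case PROG_LWT_IN:
		return lwt_in_push_encap;
	case PROG_LWT_XMIT:
		return lwt_xmit_push_encap;
	default:
		return NULL;
	}
}

/* Usage example. */
static void encap_dispatch_demo(void)
{
	push_encap_fn in = pick_push_encap(PROG_LWT_IN);
	push_encap_fn xmit = pick_push_encap(PROG_LWT_XMIT);

	assert(in == lwt_in_push_encap);
	assert(xmit == lwt_xmit_push_encap);
	assert(pick_push_encap(PROG_OTHER) == NULL);
}
```

Choosing the function pointer once per program type keeps the externally visible helper name stable while the two paths stay separate.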
Jakub Kicinski	dd27c2e3d0	bpf: offload: add priv field for drivers Currently bpf_offload_dev does not have any priv pointer, forcing the drivers to work backwards from the netdev in program metadata. This is not great given programs are conceptually associated with the offload device, and it means one or two unnecessary dereferences. Add a priv pointer to bpf_offload_dev. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-12 17:07:09 +01:00
Prashant Bhole	ebbed0f46e	tools: bpftool: doc, add text about feature-subcommand This patch adds missing information about feature-subcommand in bpftool.rst Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-12 17:06:18 +01:00
Alexei Starovoitov	ecdf68e2bb	Merge branch 'bpf-prog-build' Jiong Wang says: ==================== This set improves bpf object file related rules in selftests Makefile. - tell git to ignore the build dir "alu32". - extend sub-register mode compilation to all bpf object files to give LLVM compiler bpf back-end more exercise. - auto-generate bpf kernel object file list. - relax sub-register mode compilation criteria. v1 -> v2: - rename "kern_progs" to "progs". (Alexei) - spin a new patch to remove build server kernel requirement for sub-register mode compilation (Alexei) - rebase on top of KaFai’s latest "test_sock_fields" patch set. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-11 20:31:39 -08:00
Jiong Wang	64e39ee2c8	selftests: bpf: relax sub-register mode compilation criteria Sub-register mode compilation was enabled only when eBPF "v3" processor support was present both at compilation time inside LLVM and at runtime inside the kernel. Given that build and test servers are often separate, this patch removes the runtime support criterion. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-11 20:31:38 -08:00
Jiong Wang	bd4aed0ee7	selftests: bpf: centre kernel bpf objects under new subdir "progs" At the moment, all kernel bpf objects are listed under BPF_OBJ_FILES. Listing them manually sometimes causes patch conflicts when people add new testcases simultaneously. It is better to centre all the related source files under a subdir "progs", then auto-generate the object file list. Suggested-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-11 20:31:38 -08:00
Jiong Wang	4836b4637e	selftests: bpf: extend sub-register mode compilation to all bpf object files At the moment, we only do extra sub-register mode compilation on bpf object files used by "test_progs". These object files are really loaded and executed. This patch further extends sub-register mode compilation to all bpf object files, even those without corresponding runtime tests. This helps test LLVM sub-register code-gen, because the kernel bpf selftests have many more C testcases of reasonable size and complexity than the LLVM testsuite, which only contains unit tests. There was some file duplication inside BPF_OBJ_FILES_DUAL_COMPILE, which is now removed. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-11 20:31:38 -08:00
Jiong Wang	1727a9dce6	selftests: bpf: add "alu32" to .gitignore "alu32" is a build dir and contains various files for BPF sub-register code-gen testing. This patch tells git to ignore it. Suggested-by: Yonghong Song <yhs@fb.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-11 20:31:38 -08:00
Alexei Starovoitov	d105fa983c	Merge branch 'skb_sk-sk_fullsock-tcp_sock' Martin KaFai Lau says: ==================== This series adds __sk_buff->sk, "struct bpf_tcp_sock", BPF_FUNC_sk_fullsock and BPF_FUNC_tcp_sock. Together, they provide a common way to expose the members of "struct tcp_sock" and "struct bpf_sock" for the bpf_prog to access. The patch series first adds a bpf_sock pointer to __sk_buff and a new helper BPF_FUNC_sk_fullsock. It then adds BPF_FUNC_tcp_sock to get a bpf_tcp_sock pointer from a bpf_sock pointer. The current use case is to allow a cg_skb_bpf_prog to provide per cgroup traffic policing/shaping. Please see individual patch for details. v2: - Patch 1 depends on commit `d623876646` ("bpf: Fix narrow load on a bpf_sock returned from sk_lookup()") in the bpf branch. - Add sk_to_full_sk() to bpf_sk_fullsock() and bpf_tcp_sock() such that there is a way to access the listener's sk and tcp_sk when __sk_buff->sk is a request_sock. The comments in the uapi bpf.h is updated accordingly. - bpf_ctx_range_till() is used in bpf_sock_common_is_valid_access() in patch 1. Saved a few lines. - Patch 2 is new in v2 and it adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the bpf_sock. Narrow load is allowed on them. The "state" (i.e. sk_state) has already been used in INET_DIAG (e.g. ss -t) and getsockopt(TCP_INFO). - While at it in the new patch 2, also allow narrow load on some existing fields of the bpf_sock, which are "family", "type", "protocol" and "src_port". Only allow loading from first byte for now. i.e. does not allow narrow load starting from the 2nd byte. - Add some narrow load tests to the test_verifier's sock.c ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:18 -08:00
Martin KaFai Lau	e0b27b3f97	bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock This patch adds a C program to show the usage on skb->sk and bpf_tcp_sock. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
Martin KaFai Lau	fb47d1d931	bpf: Add skb->sk, bpf_sk_fullsock and bpf_tcp_sock tests to test_verifer This patch tests accessing the skb->sk and the new helpers, bpf_sk_fullsock and bpf_tcp_sock. The errstr of some existing "reference tracking" tests is changed with s/bpf_sock/sock/ and s/socket/sock/ where "sock" is from the verifier's reg_type_str[]. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
Martin KaFai Lau	281f9e7572	bpf: Sync bpf.h to tools/ This patch sync the uapi bpf.h to tools/. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
Martin KaFai Lau	655a51e536	bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock This patch adds a helper function BPF_FUNC_tcp_sock and it is currently available for cg_skb and sched_(cls|act): struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk); int cg_skb_foo(struct __sk_buff *skb) { struct bpf_tcp_sock *tp; struct bpf_sock *sk; __u32 snd_cwnd; sk = skb->sk; if (!sk) return 1; tp = bpf_tcp_sock(sk); if (!tp) return 1; snd_cwnd = tp->snd_cwnd; /* ... */ return 1; } A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide read-only access. bpf_tcp_sock has all the existing tcp_sock's fields that has already been exposed by the bpf_sock_ops. i.e. no new tcp_sock's fields are exposed in bpf.h. This helper returns a pointer to the tcp_sock. If it is not a tcp_sock or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it returns NULL. Hence, the caller needs to check for NULL before accessing it. The current use case is to expose members from tcp_sock to allow a cg_skb_bpf_prog to provide per cgroup traffic policing/shaping. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
Martin KaFai Lau	9b1f3d6e5a	bpf: Refactor sock_ops_convert_ctx_access The next patch will introduce a new "struct bpf_tcp_sock" which exposes the same tcp_sock's fields already exposed in "struct bpf_sock_ops". This patch refactor the existing convert_ctx_access() codes for "struct bpf_sock_ops" to get them ready to be reused for "struct bpf_tcp_sock". The "rtt_min" is not refactored in this patch because its handling is different from other fields. The SOCK_OPS_GET_TCP_SOCK_FIELD is new. All other SOCK_OPS_XXX_FIELD changes are code move only. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
Martin KaFai Lau	aa65d6960a	bpf: Add state, dst_ip4, dst_ip6 and dst_port to bpf_sock This patch adds "state", "dst_ip4", "dst_ip6" and "dst_port" to the bpf_sock. The userspace has already been using "state", e.g. inet_diag (ss -t) and getsockopt(TCP_INFO). This patch also allows narrow load on the following existing fields: "family", "type", "protocol" and "src_port". Unlike IP address, the load offset is restricted to the first byte for them but it can be relaxed later if there is a use case. This patch also folds __sock_filter_check_size() into bpf_sock_is_valid_access() since it is not called from anywhere else. All bpf_sock checking is in one place. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
Martin KaFai Lau	46f8bc9275	bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)" before accessing the fields in sock. For example, in __netdev_pick_tx: static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev) { /* ... */ struct sock *sk = skb->sk; if (queue_index != new_index && sk && sk_fullsock(sk) && rcu_access_pointer(sk->sk_dst_cache)) sk_tx_queue_set(sk, new_index); /* ... */ return queue_index; } This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff" where a few of the convert_ctx_access() in filter.c has already been accessing the skb->sk sock_common's fields, e.g. sock_ops_convert_ctx_access(). "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier. Some of the fields in "bpf_sock" will not be directly accessible through the "__sk_buff->sk" pointer. It is limited by the new "bpf_sock_common_is_valid_access()". e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock are not allowed. The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)" can be used to get a sk with all accessible fields in "bpf_sock". This helper is added to both cg_skb and sched_(cls|act). int cg_skb_foo(struct __sk_buff *skb) { struct bpf_sock *sk; sk = skb->sk; if (!sk) return 1; sk = bpf_sk_fullsock(sk); if (!sk) return 1; if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP) return 1; /* some_traffic_shaping(); */ return 1; } (1) The sk is read only (2) There is no new "struct bpf_sock_common" introduced. (3) Future kernel sock's members could be added to bpf_sock only instead of repeatedly adding at multiple places like currently in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc. (4) After "sk = skb->sk", the reg holding sk is in type PTR_TO_SOCK_COMMON_OR_NULL. (5) After bpf_sk_fullsock(), the return type will be in type PTR_TO_SOCKET_OR_NULL which is the same as the return type of bpf_sk_lookup_xxx(). However, bpf_sk_fullsock() does not take refcnt. The acquire_reference_state() is only depending on the return type now. To avoid it, a new is_acquire_function() is checked before calling acquire_reference_state(). (6) The WARN_ON in "release_reference_state()" is no longer an internal verifier bug. When reg->id is not found in state->refs[], it means the bpf_prog does something wrong like "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has never been acquired by calling "bpf_sk_fullsock(skb->sk)". A -EINVAL and a verbose are done instead of WARN_ON. A test is added to the test_verifier in a later patch. Since the WARN_ON in "release_reference_state()" is no longer needed, "__release_reference_state()" is folded into "release_reference_state()" also. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:46:17 -08:00
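Points (5) and (6) of the commit above describe two rules: only acquiring helpers register a reference, and releasing an id that was never acquired is reported as a program error (-EINVAL) rather than as a verifier bug. The toy model below illustrates both. It is a userspace sketch, not the verifier's data structures: `struct ref_state`, `call_helper()`, `release_reference()` and the enum values are hypothetical, and only the name `is_acquire_function()` is borrowed from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_REFS 8

/* Hypothetical helper ids; only the lookup helper takes a reference. */
enum helper { HELPER_SK_LOOKUP_TCP, HELPER_SK_FULLSOCK };

struct ref_state {
	int ids[MAX_REFS];	/* ids of references currently held */
	int n;			/* number of valid entries in ids[] */
	int next_id;		/* last id handed out */
};

static bool is_acquire_function(enum helper h)
{
	return h == HELPER_SK_LOOKUP_TCP;
}

/* Model a helper call that returns a socket pointer. Returns the new
 * reference id, 0 if the helper does not acquire, -ENOMEM if full. */
static int call_helper(struct ref_state *st, enum helper h)
{
	if (!is_acquire_function(h))
		return 0;
	if (st->n == MAX_REFS)
		return -ENOMEM;
	st->ids[st->n++] = ++st->next_id;
	return st->next_id;
}

/* Release by id. An unknown id means the program released something
 * it never acquired, so report -EINVAL instead of warning. */
static int release_reference(struct ref_state *st, int id)
{
	for (int i = 0; i < st->n; i++) {
		if (st->ids[i] == id) {
			st->ids[i] = st->ids[--st->n];
			return 0;
		}
	}
	return -EINVAL;
}

/* Usage example. */
static void ref_tracking_demo(void)
{
	struct ref_state st = { .n = 0, .next_id = 0 };
	int looked_up = call_helper(&st, HELPER_SK_LOOKUP_TCP);
	int full = call_helper(&st, HELPER_SK_FULLSOCK);
	int bad = release_reference(&st, full);		/* never acquired */
	int good = release_reference(&st, looked_up);

	assert(looked_up == 1 && full == 0);
	assert(bad == -EINVAL && good == 0);
	assert(st.n == 0);
}
```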
Martin KaFai Lau	5f4566498d	bpf: Fix narrow load on a bpf_sock returned from sk_lookup() By adding this test to test_verifier: { "reference tracking: access sk->src_ip4 (narrow load)", .insns = { BPF_SK_LOOKUP, BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2), BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), BPF_EMIT_CALL(BPF_FUNC_sk_release), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, The above test loads 2 bytes from sk->src_ip4 where sk is obtained by bpf_sk_lookup_tcp(). It hits an internal verifier error from convert_ctx_accesses(): [root@arch-fb-vm1 bpf]# ./test_verifier 665 665 Failed to load prog 'Invalid argument'! 0: (b7) r2 = 0 1: (63) *(u32 *)(r10 -8) = r2 2: (7b) *(u64 *)(r10 -16) = r2 3: (7b) *(u64 *)(r10 -24) = r2 4: (7b) *(u64 *)(r10 -32) = r2 5: (7b) *(u64 *)(r10 -40) = r2 6: (7b) *(u64 *)(r10 -48) = r2 7: (bf) r2 = r10 8: (07) r2 += -48 9: (b7) r3 = 36 10: (b7) r4 = 0 11: (b7) r5 = 0 12: (85) call bpf_sk_lookup_tcp#84 13: (bf) r6 = r0 14: (15) if r0 == 0x0 goto pc+3 R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1 15: (69) r2 = *(u16 *)(r0 +26) 16: (bf) r1 = r6 17: (85) call bpf_sk_release#86 18: (95) exit from 14 to 18: safe processed 20 insns (limit 131072), stack depth 48 bpf verifier is misconfigured Summary: 0 PASSED, 0 SKIPPED, 1 FAILED The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly loaded (meaning load any 1 or 2 bytes of the src_ip4) by marking info->ctx_field_size. However, this marked ctx_field_size is not used. This patch fixes it. Due to the recent refactoring in test_verifier, this new test will be added to the bpf-next branch (together with the bpf_tcp_sock patchset) to avoid merge conflict. Fixes: `c64b798328` ("bpf: Add PTR_TO_SOCKET verifier type") Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-10 19:37:41 -08:00
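A "narrow load" reads fewer bytes than the full width of a context field, as in the 2-byte read at `src_ip4 + 2` in the test above. The check below is a simplified sketch of what such a validity rule could look like. `narrow_load_ok()` is a hypothetical function, and the exact conditions (power-of-two size, natural alignment, staying inside the field) are assumptions for illustration rather than the verifier's implementation. The `first_byte_only` flag models the "first byte" restriction that the bpf_sock commit in this list describes for "family", "type", "protocol" and "src_port".

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether a load of `size` bytes at byte offset `off` inside a
 * field of `field_size` bytes is an acceptable narrow load.
 * Assumption: the rules below are a simplification for illustration. */
static bool narrow_load_ok(unsigned int off, unsigned int size,
			   unsigned int field_size, bool first_byte_only)
{
	if (size == 0 || size > field_size)
		return false;
	if (size & (size - 1))		/* not a power of two */
		return false;
	if (off % size)			/* not naturally aligned */
		return false;
	if (off + size > field_size)	/* runs past the field */
		return false;
	if (first_byte_only && off != 0)
		return false;
	return true;
}

/* Usage example. */
static void narrow_load_demo(void)
{
	/* The case from the test: 2 bytes at src_ip4 + 2 (4-byte field). */
	assert(narrow_load_ok(2, 2, 4, false));
	assert(!narrow_load_ok(1, 2, 4, false));	/* misaligned */
	assert(!narrow_load_ok(0, 8, 4, false));	/* wider than field */
	assert(narrow_load_ok(0, 1, 4, true));
	assert(!narrow_load_ok(1, 1, 4, true));		/* not first byte */
}
```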
Alexei Starovoitov	28bbfc3a25	Merge branch 'btf-api-extensions' Andrii Nakryiko says: ==================== This patchset introduces a set of new APIs that make it possible to work with BTF more effectively (and without involving kernel) for applications like pahole that need to manipulate .BTF and .BTF.ext data. Patch #1 changes existing btf__new() API call to only load and initialize struct btf, while exposing new btf__load() API to attempt to load and validate BTF in kernel. Patch #2 adds btf__get_raw_data() API allowing to get access to raw BTF data from struct btf. Patch #3 adds similar btf_ext__get_raw_data() API for working with struct btf_ext. Patch #4 removes not-yet-stable btf__get_strings() API which was added to be able to test contents of struct btf for btf__dedup(). It's now superseded by raw APIs. v3->v4: - formatting fixes - renamed btf_ext functions/structs to use "setup" language instead of "copy" - removed btf__get_strings from libbpf.map v2->v3: - const void* variants of btf__get_raw_data() - added btf_ext__get_raw_data() - removed btf__get_strings() and adapted test_btf.c to use btf__get_raw_data() v1->v2: - btf_load() returns just error, not fd - fix ordering in libbpf.map ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-08 12:04:14 -08:00
Andrii Nakryiko	49b57e0d01	tools/bpf: remove btf__get_strings() superseded by raw data API Now that we have btf__get_raw_data() it's trivial for tests to iterate over all strings for testing purposes, which eliminates the need for btf__get_strings() API. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-08 12:04:13 -08:00
Andrii Nakryiko	ae4ab4b411	btf: expose API to work with raw btf_ext data This patch changes struct btf_ext to retain original data in sequential block of memory, which makes it possible to expose btf_ext__get_raw_data() interface similar to btf__get_raw_data(), allowing users of libbpf to get access to raw representation of .BTF.ext section. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-08 12:04:13 -08:00
Andrii Nakryiko	02c874460f	btf: expose API to work with raw btf data This patch exposes new API btf__get_raw_data() that allows to get a copy of raw BTF data out of struct btf. This is useful for external programs that need to manipulate raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info and then writing it back to file. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-08 12:04:13 -08:00
Andrii Nakryiko	d29d87f7e6	btf: separate btf creation and loading This change splits out previous btf__new functionality of constructing struct btf and loading it into kernel into two: - btf__new() just creates and initializes struct btf - btf__load() attempts to load existing struct btf into kernel btf__free will still close BTF fd, if it was ever loaded successfully into kernel. This change allows users of libbpf to manipulate BTF using its API, without the need to unnecessarily load it into kernel. One of the intended use cases is pahole, which will do DWARF to BTF conversion and then use libbpf to do type deduplication, while then handling ELF sections overwriting and other concerns on its own. Fixes: `2d3feca8c4` ("bpf: btf: print map dump and lookup with btf info") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-08 12:04:13 -08:00
Yonghong Song	a4021a3579	tools/bpf: add log_level to bpf_load_program_attr The kernel verifier has three levels of logs: 0: no logs 1: logs mostly useful > 1: verbose Current libbpf API functions bpf_load_program_xattr() and bpf_load_program() cannot specify log_level. The bcc, however, provides an interface for user to specify log_level 2 for verbose output. This patch added log_level into structure bpf_load_program_attr, so users, including bcc, can use bpf_load_program_xattr() to change log_level. The supported log_level is 0, 1, and 2. The bpf selftest test_sock.c is modified to enable log_level = 2. If the "verbose" in test_sock.c is changed to true, the test will output logs like below: $ ./test_sock func#0 @0 0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 0: (bf) r6 = r1 1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 1: (61) r7 = *(u32 *)(r6 +28) invalid bpf_context access off=28 size=4 Test case: bind4 load with invalid access: src_ip6 .. [PASS] ... Test case: bind6 allow all .. [PASS] Summary: 16 PASSED, 0 FAILED Some test_sock tests are negative tests and verbose verifier log will be printed out as shown in the above. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-07 18:22:31 -08:00
Andrii Nakryiko	62b8cea62e	tools/bpf: add missing strings.h include A few files in libbpf are using bzero() function (defined in strings.h header), but don't include corresponding header. When libbpf is added as a dependency to pahole, this nondeterministically causes warnings on some machines: bpf.c:225:2: warning: implicit declaration of function 'bzero' [-Wimplicit-function-declaration] bzero(&attr, sizeof(attr)); ^~~~~ Signed-off-by: Andrii Nakryiko <andriin@fb.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-02-07 18:18:42 -08:00
Moritz Fischer	71bd106d25	net: fixed-phy: Add fixed_phy_register_with_gpiod() API Add fixed_phy_register_with_gpiod() API. It lets users create a fixed_phy instance that uses a GPIO descriptor which was obtained externally e.g. through platform data. This enables platform devices (non-DT based) to use GPIOs for link status. Signed-off-by: Moritz Fischer <mdf@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:11:58 -08:00
David S. Miller	a4751093a2	Merge branch 'Add-comphy-support-for-Armada-38x' Russell King says: ==================== Add comphy support for Armada 38x This series adds support for the comphy for Armada 38x, which allows these SoCs to use 2500BASE-X mode with appropriate SFP modules. Tested on SolidRun Clearfog after updating for the 5.0 merge window changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:26 -08:00
Russell King	f548ced15f	ARM: dts: clearfog: add comphy settings for Ethernet interfaces Add the comphy settings for the Ethernet interfaces. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:26 -08:00
Russell King	a10c1c8191	net: marvell: neta: add comphy support Add support for the common phy binding, so that we can reconfigure the comphy according to the desired ethernet speed. This will allow us to support 1000base-X and 2500base-X SFPs dynamically on SolidRun Clearfog. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:26 -08:00
Russell King	4ca124f4d9	dt-bindings: net: mvneta: add phys property Add an optional phys property to the mvneta binding documentation for the common phy. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:26 -08:00
Russell King	f3a6a9f370	ARM: dts: add description for Armada 38x common phy Add the DT description for the Armada 38x common phy. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:25 -08:00
Russell King	14dc100b44	phy: armada38x: add common phy support Add support for the Armada 38x common phy to allow us to change the speed of the Ethernet serdes lane. This driver only supports manipulation of the speed, it does not support configuration of the common phy. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:25 -08:00
Russell King	120382714c	dt-bindings: phy: Armada 38x common phy bindings Add the Marvell Armada 38x common phy bindings. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:10:25 -08:00
David S. Miller	f06f095f32	Merge branch 'smc-next' Ursula Braun says: ==================== net/smc: patches 2019-02-07 here are patches for SMC: * patches 1, 3, and 6 are cleanups without functional change * patch 2 postpones closing of internal clcsock * patches 4 and 5 improve link group creation locking * patch 7 restores AF_SMC as diag_family field ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:19 -08:00
Karsten Graul	232dc8ef64	net/smc: original socket family in inet_sock_diag Commit `ed75986f4a` ("net/smc: ipv6 support for smc_diag.c") changed the value of the diag_family field. The idea was to indicate the family of the IP address in the inet_diag_sockid field. But the change makes it impossible to distinguish an inet_sock_diag response message from SMC sock_diag response. This patch restores the original behaviour and sends AF_SMC as value of the diag_family field. Fixes: `ed75986f4a` ("net/smc: ipv6 support for smc_diag.c") Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:19 -08:00
Karsten Graul	8fc002b01a	net/smc: move code to clear the conn->lgr field The lgr field of an smc_connection is set in smc_conn_create() and should be cleared in smc_conn_free() for consistency reasons, so move the responsible code. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:19 -08:00
Hans Wippel	72a36a8aec	net/smc: use client and server LGR pending locks for SMC-R If SMC client and server connections are both established at the same time, smc_connect_rdma() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-R, there are two types of LGRs (client and server LGRs) which can be protected by separate locks. So, this patch splits the LGR pending lock into two separate locks for client and server to avoid the locking issue for SMC-R. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:18 -08:00
Hans Wippel	62c7139f3e	net/smc: unlock LGR pending lock earlier for SMC-D If SMC client and server connections are both established at the same time, smc_connect_ism() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-D, the LGR pending lock is not needed while smc_listen_work() is waiting for the CLC confirm message. So, this patch releases the lock earlier for SMC-D to avoid the locking issue. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:18 -08:00
Ursula Braun	a225d2cd88	net/smc: use smc_curs_copy() for SMC-D SMC already provides a wrapper for atomic64 calls to be architecture independent. Use this wrapper for SMC-D as well. Reported-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:18 -08:00
Ursula Braun	b03faa1faf	net/smc: postpone release of clcsock According to RFC7609 (http://www.rfc-editor.org/info/rfc7609) first the SMC-R connection is shut down and then the normal TCP connection FIN processing drives cleanup of the internal TCP connection. The unconditional release of the clcsock during active socket closing has to be postponed if the peer has not yet signalled socket closing. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:18 -08:00
Ursula Braun	41c80be24b	s390/net: move pnet constants There is no need to define these PNETID related constants in the pnet.h file, since they are just used locally within pnet.c. Just code cleanup, no functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 18:06:18 -08:00
Petr Machata	fc4aa1ca16	net: vxlan: Free a leaked vetoed multicast rdst When an rdst is rejected by a driver, the current code removes it from the remote list, but neglects to free it. This is triggered by tools/testing/selftests/drivers/net/mlxsw/vxlan_fdb_veto.sh and shows as the following kmemleak trace: unreferenced object 0xffff88817fa3d888 (size 96): comm "softirq", pid 0, jiffies 4372702718 (age 165.252s) hex dump (first 32 bytes): 02 00 00 00 c6 33 64 03 80 f5 a2 61 81 88 ff ff .....3d....a.... 06 df 71 ae ff ff ff ff 0c 00 00 00 04 d2 6a 6b ..q...........jk backtrace: [<00000000296b27ac>] kmem_cache_alloc_trace+0x1ae/0x370 [<0000000075c86dc6>] vxlan_fdb_append.part.12+0x62/0x3b0 [vxlan] [<00000000e0414b63>] vxlan_fdb_update+0xc61/0x1020 [vxlan] [<00000000f330c4bd>] vxlan_fdb_add+0x2e8/0x3d0 [vxlan] [<0000000008f81c2c>] rtnl_fdb_add+0x4c2/0xa10 [<00000000bdc4b270>] rtnetlink_rcv_msg+0x6dd/0x970 [<000000006701f2ce>] netlink_rcv_skb+0x290/0x410 [<00000000c08a5487>] rtnetlink_rcv+0x15/0x20 [<00000000d5f54b1e>] netlink_unicast+0x43f/0x5e0 [<00000000db4336bb>] netlink_sendmsg+0x789/0xcd0 [<00000000e1ee26b6>] sock_sendmsg+0xba/0x100 [<00000000ba409802>] ___sys_sendmsg+0x631/0x960 [<000000003c332113>] __sys_sendmsg+0xea/0x180 [<00000000f4139144>] __x64_sys_sendmsg+0x78/0xb0 [<000000006d1ddc59>] do_syscall_64+0x94/0x410 [<00000000c8defa9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe Move vxlan_dst_free() up and schedule a call thereof to plug this leak. Fixes: `61f46fe8c6` ("vxlan: Allow vetoing of FDB notifications") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 11:17:08 -08:00
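The leak pattern fixed above (an entry is unlinked from a list after a veto, but never freed) can be reproduced in a few lines of userspace C. The sketch below is hypothetical throughout: `struct rdst`, `rdst_add_checked()`, the `accept` callback and the `live_rdsts` counter are illustration names, not vxlan code. It also frees the entry immediately, whereas the commit schedules a call to vxlan_dst_free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct rdst {
	int id;
	struct rdst *next;
};

static int live_rdsts;	/* allocation counter, used to spot leaks */

/* Allocate an entry and push it onto the head of the list. */
static struct rdst *rdst_add(struct rdst **head, int id)
{
	struct rdst *rd = malloc(sizeof(*rd));

	if (!rd)
		return NULL;
	rd->id = id;
	rd->next = *head;
	*head = rd;
	live_rdsts++;
	return rd;
}

/* Add an entry, then let a "driver" callback veto it. A vetoed entry
 * is unlinked and freed; unlinking alone would leak it. */
static bool rdst_add_checked(struct rdst **head, int id,
			     bool (*accept)(int id))
{
	struct rdst *rd = rdst_add(head, id);

	if (!rd)
		return false;
	if (accept(id))
		return true;

	*head = rd->next;	/* unlink: rd is the head just pushed */
	free(rd);		/* the step that plugs the leak */
	live_rdsts--;
	return false;
}

static void rdst_list_free(struct rdst **head)
{
	while (*head) {
		struct rdst *rd = *head;

		*head = rd->next;
		free(rd);
		live_rdsts--;
	}
}

static bool accept_even(int id)
{
	return id % 2 == 0;
}

/* Usage example. */
static void rdst_veto_demo(void)
{
	struct rdst *head = NULL;
	bool kept = rdst_add_checked(&head, 2, accept_even);
	bool vetoed = !rdst_add_checked(&head, 3, accept_even);

	assert(kept && vetoed);
	assert(live_rdsts == 1 && head && head->id == 2);
	rdst_list_free(&head);
	assert(live_rdsts == 0 && head == NULL);
}
```

Counting live allocations plays the role that kmemleak plays in the trace above: if the `free(rd)` line is removed, the counter stays above zero after the list is torn down.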
David S. Miller	0739d24d0c	Merge branch 'devlink-health' Eran Ben Elisha says: ==================== Devlink health reporting and recovery system The health mechanism is targeted for Real Time Alerting, in order to know when something bad has happened to a PCI device - Provide alert debug information - Self healing - If problem needs vendor support, provide a way to gather all needed debugging information. The main idea is to unify and centralize driver health reports in the generic devlink instance and allow the user to set different attributes of the health reporting and recovery procedures. The devlink health reporter: Device driver creates a "health reporter" for each error/health type. Error/Health type can be known/generic (e.g. pci error, fw error, rx/tx error) or unknown (driver specific). For each registered health reporter a driver can issue error/health reports asynchronously. All health reports handling is done by devlink. Device driver can provide specific callbacks for each "health reporter", e.g. - Recovery procedures - Diagnostics and object dump procedures - OOB initial attributes Different parts of the driver can register different types of health reporters with different handlers. Once an error is reported, devlink health will do the following actions: * A log is sent to the kernel trace events buffer * Health status and statistics are updated for the reporter instance * Object dump is taken and saved at the reporter instance (as long as there is no other dump which is already stored) * An auto recovery attempt is made, depending on: - Auto-recovery configuration - Grace period vs. time passed since last recover The user interface: User can access/change each reporter's attributes and driver specific callbacks via devlink, e.g. per error type (per health reporter) - Configure reporter's generic attributes (like: Disable/enable auto recovery) - Invoke recovery procedure - Run diagnostics - Object dump The devlink health interface (via netlink): DEVLINK_CMD_HEALTH_REPORTER_GET: Retrieves status and configuration info per DEV and reporter. DEVLINK_CMD_HEALTH_REPORTER_SET: Allows reporter-related configuration setting. DEVLINK_CMD_HEALTH_REPORTER_RECOVER: Triggers a reporter's recovery procedure. DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE: Retrieves diagnostics data from a reporter on a device. DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET: Retrieves the last stored dump. Devlink health saves a single dump. If a dump is not already stored by devlink for this reporter, devlink generates a new dump. Dump output is defined by the reporter. DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR: Clears the last saved dump file for the specified reporter. [Diagram: netlink passes requests for ops (diagnose, recover, dump) to the devlink reporter, which executes ops in mlx5_core; mlx5_core creates health reporters and sends health reports to the devlink health handler, which issues requests for ops (recover, dump) to the reporter.] In this patchset, mlx5e TX reporter is implemented. Cmdline format: devlink health show [DEV reporter REPORTER_NAME] devlink health recover DEV reporter REPORTER_NAME devlink health diagnose DEV reporter REPORTER_NAME devlink health dump show DEV reporter REPORTER_NAME devlink health dump clear DEV reporter REPORTER_NAME devlink health set DEV reporter REPORTER_NAME NAME VALUE Cmdline examples: $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 0 last_dump_ts N/A parameters: grace_period 500 auto_recover false $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": false },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: false sqn: 142 HW state: 1 stopped: false $devlink health recover pci/0000:00:09 reporter tx $devlink health set pci/0000:00:09.0 reporter tx grace_period 3500 $devlink health set pci/0000:00:09.0 reporter tx auto_recover false Changelog: v4: - Rebase on latest net-next - Remove trace_devlink_health signature exposure in case CONFIG_NET_DEVLINK is not defined as it shall only be used from devlink. v3: - Redesign of devlink <-> driver fmsg API - Various bug fixes v2: - Remove FW* reporters to decrease the amount of patches in the patchset ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:39 -08:00
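The report-handling steps listed in the 'devlink-health' cover letter (log, update status and statistics, take a dump if none is stored, then attempt auto recovery subject to configuration and grace period) can be sketched as a toy Python model. This is not kernel code: the `Reporter` class, its field names, and the `recover`/`dump` callbacks are hypothetical stand-ins, and the exact conditions the kernel applies may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Reporter:
    """Toy model of one health reporter; all names are illustrative."""
    name: str
    recover: Callable[[], bool]   # hypothetical driver recovery callback
    dump: Callable[[], dict]      # hypothetical driver dump callback
    auto_recover: bool = True
    grace_period_ms: int = 500
    healthy: bool = True
    error_count: int = 0
    recovery_count: int = 0
    saved_dump: Optional[dict] = None
    last_recovery_ms: Optional[int] = None
    log: List[str] = field(default_factory=list)


def health_report(r: Reporter, msg: str, now_ms: int) -> bool:
    """Handle one error report; return True if the reporter ends up healthy."""
    # 1. Log the report (stands in for the kernel trace event).
    r.log.append(f"{r.name}: {msg}")
    # 2. Update status and statistics.
    r.error_count += 1
    r.healthy = False
    # 3. Take a dump only if none is stored already.
    if r.saved_dump is None:
        r.saved_dump = r.dump()
    # 4. Attempt auto recovery, subject to configuration and grace period.
    if not r.auto_recover:
        return False
    if (r.last_recovery_ms is not None
            and now_ms - r.last_recovery_ms < r.grace_period_ms):
        return False  # assumed rule: too soon after the previous recovery
    if r.recover():
        r.healthy = True
        r.recovery_count += 1
        r.last_recovery_ms = now_ms
    return r.healthy
```

In this model a second report that arrives inside the grace period is counted and logged but leaves the reporter in the error state until a later report or a manual recover.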
Aya Levin	db2ab7a08f	devlink: Add Documentation/networking/devlink-health.txt This patch adds a new file documenting the devlink health mechanism. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:29 -08:00
Eran Ben Elisha	7d91126b1a	net/mlx5e: Add tx timeout support for mlx5e tx reporter With this patch, the ndo_tx_timeout callback is redirected to the tx reporter in order to detect a tx timeout error and report it to devlink health. (The watchdog detects tx timeouts, but the driver verifies that the issue still exists before launching any recover method.) In addition, recovery from a tx timeout caused by a lost interrupt is added to the tx reporter recover method. Recovering from a tx timeout caused by a lost interrupt is not a new feature in the driver; this patch reorganizes the functionality and moves it to the tx reporter recovery flow. tx timeout example (with auto_recover set to false; if it is set to true, the manual recover and diagnose sections are irrelevant): $cat /sys/kernel/debug/tracing/trace ... devlink_health_report: bus_name=pci dev_name=0000:00:09.0 driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a, CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000 $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 0 last_dump_ts N/A parameters: grace_period 500 auto_recover false $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": true },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: true sqn: 142 HW state: 1 stopped: false $devlink health recover pci/0000:00:09 reporter tx $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 1 last_dump_ts N/A parameters: grace_period 500 auto_recover false Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:29 -08:00
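The JSON form of the diagnose output quoted in the tx timeout example lends itself to scripting. A minimal sketch, assuming the output keeps the shape shown in that commit message (an "SQs" array whose entries carry "sqn" and "stopped"); the helper name `stopped_sqs` is made up for illustration:

```python
import json
from typing import List

# Sample copied from the diagnose output quoted in the commit message.
DIAGNOSE_JSON = """
{ "SQs": [ { "sqn": 138, "HW state": 1, "stopped": true },
           { "sqn": 142, "HW state": 1, "stopped": false } ] }
"""


def stopped_sqs(diagnose_output: str) -> List[int]:
    """Return the sqn of every send queue reported as stopped."""
    report = json.loads(diagnose_output)
    return [sq["sqn"] for sq in report.get("SQs", []) if sq.get("stopped")]


print(stopped_sqs(DIAGNOSE_JSON))  # prints [138]
```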
Eran Ben Elisha	de8650a820	net/mlx5e: Add tx reporter support Add mlx5e tx reporter to devlink health reporters. This reporter will be responsible for diagnosing, reporting and recovering from tx errors. This patch declares the TX reporter operations and creates it using the devlink health API. Currently, this reporter supports reporting and recovering from send error CQE only. In addition, it adds diagnose information for the open SQs. For a local SQ recover (due to a driver error report), an SQ recover failure makes the whole recover operation count as failed. For a full tx recover, the driver attempts to close and reopen the channels; if that succeeds, the recover counts as successful. The SQ recover from error CQE flow is not a new feature in the driver; this patch reorganizes the functions and adapts them for the devlink health API. For this purpose, move code from en_main.c to a new file named reporter_tx.c. Diagnose output: $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": false },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: false sqn: 142 HW state: 1 stopped: false Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:29 -08:00
Eran Ben Elisha	35455e23e6	devlink: Add health dump {get,clear} commands Add devlink health dump commands, in order to run a dump operation over a specific reporter. The supported operations are dump_get, to get the last saved dump (if none exists, dump now), and dump_clear, to clear the last saved dump. The driver's callback for the diagnose command is expected to fill it via the devlink fmsg API. Devlink will parse it and convert it to netlink attributes (nla) in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:29 -08:00
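The dump_get / dump_clear semantics described in this commit (return the last saved dump, take one on demand if none exists, and clear it on request) amount to a single-slot cache. A rough Python model, with a hypothetical `DumpStore` class and `take_dump` callback standing in for the reporter and the driver's dump procedure:

```python
from typing import Callable, Optional


class DumpStore:
    """Toy model of a reporter's single saved dump (names are illustrative)."""

    def __init__(self, take_dump: Callable[[], dict]):
        self._take_dump = take_dump          # hypothetical driver dump callback
        self._saved: Optional[dict] = None   # at most one dump is kept

    def dump_get(self) -> dict:
        # Return the saved dump; if none exists, take one now and keep it.
        if self._saved is None:
            self._saved = self._take_dump()
        return self._saved

    def dump_clear(self) -> None:
        # Forget the saved dump so the next dump_get takes a fresh one.
        self._saved = None
```

Repeated `dump_get` calls return the same stored dump until `dump_clear` is called, which mirrors the "devlink health saves a single dump" behaviour stated in the cover letter.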
Eran Ben Elisha	fca42a2794	devlink: Add health diagnose command Add devlink health diagnose command, in order to run a diagnose operation over a specific reporter. The driver's callback for the diagnose command is expected to fill it via the devlink fmsg API. Devlink will parse it and convert it to netlink attributes (nla) in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:28 -08:00
Eran Ben Elisha	20a0943a5b	devlink: Add health recover command Add devlink health recover command to the uapi, in order to allow the user to execute a recover operation over a specific reporter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:28 -08:00
Eran Ben Elisha	a1e55ec0a0	devlink: Add health set command Add devlink health set command, in order to set configuration parameters for a specific reporter. Supported parameters are: - graceful_period: Time interval between auto recoveries (in msec) - auto_recover: Determines if the devlink shall execute recover upon receiving error for the reporter Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:28 -08:00
Eran Ben Elisha	7afe335a8b	devlink: Add health get command Add devlink health get command to provide reporter/s data for user space. Add the ability to get data per reporter or dump data from all available reporters. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 10:34:28 -08:00


812488 Commits