korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 12:28:41 +08:00

Author	SHA1	Message	Date
Andrii Nakryiko	7ca6112159	libbpf: Add API that copies all BTF types from one BTF object to another Add a bulk copying api, btf__add_btf(), that speeds up and simplifies appending entire contents of one BTF object to another one, taking care of copying BTF type data, adjusting resulting BTF type IDs according to their new locations in the destination BTF object, as well as copying and deduplicating all the referenced strings and updating all the string offsets in new BTF types as appropriate. This API is intended to be used from tools that are generating and otherwise manipulating BTFs generically, such as pahole. In pahole's case, this API is useful for speeding up parallelized BTF encoding, as it allows pahole to offload all the intricacies of BTF type copying to libbpf and handle the parallelization aspects of the process. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/bpf/20211006051107.17921-2-andrii@kernel.org	2021-10-06 15:35:46 +02:00
Jie Meng	57a610f1c5	bpf, x64: Save bytes for DIV by reducing reg copies Instead of unconditionally performing push/pop on %rax/%rdx in case of division/modulo, we can save a few bytes in case of destination register being either BPF r0 (%rax) or r3 (%rdx) since the result is written in there anyway. Also, we do not need to copy the source to %r11 unless the source is either %rax, %rdx or an immediate. For example, before the patch: 22: push %rax 23: push %rdx 24: mov %rsi,%r11 27: xor %edx,%edx 29: div %r11 2c: mov %rax,%r11 2f: pop %rdx 30: pop %rax 31: mov %r11,%rax After: 22: push %rdx 23: xor %edx,%edx 25: div %rsi 28: pop %rdx Signed-off-by: Jie Meng <jmeng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211002035626.2041910-1-jmeng@fb.com	2021-10-06 15:24:36 +02:00
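The register-saving rule described in the commit above can be sketched as a small decision function. This is a minimal illustration in plain C, not the JIT's actual code: the enum, the struct, and `plan_div()` are hypothetical names, and only the three conditions stated in the commit message are modeled.

```c
#include <stdbool.h>

/* Hypothetical register tags; only the cases named in the commit matter. */
enum x86_reg { REG_RAX, REG_RDX, REG_RSI, REG_OTHER };

/* Which extra instructions a DIV/MOD sequence needs (illustrative only). */
struct div_plan {
	bool push_rax;    /* save/restore %rax around the division */
	bool push_rdx;    /* save/restore %rdx around the division */
	bool copy_to_r11; /* move the divisor into %r11 first */
};

static struct div_plan plan_div(enum x86_reg dst, enum x86_reg src,
				bool src_is_imm)
{
	struct div_plan p;

	/* The result lands in the destination anyway, so it needs no saving. */
	p.push_rax = (dst != REG_RAX);
	p.push_rdx = (dst != REG_RDX);
	/* A divisor in %rax/%rdx, or an immediate, must be moved out of the way. */
	p.copy_to_r11 = src_is_imm || src == REG_RAX || src == REG_RDX;
	return p;
}
```

With the destination in %rax and the divisor in %rsi, this plan saves only %rdx and divides by %rsi directly, which matches the "After" listing in the commit message.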
Andrey Ignatov	0640c77c46	bpf: Avoid retpoline for bpf_for_each_map_elem Similarly to `09772d92cd` ("bpf: avoid retpoline for lookup/update/delete calls on maps") and `84430d4232` ("bpf, verifier: avoid retpoline for map push/pop/peek operation") avoid indirect call while calling bpf_for_each_map_elem. Before (a program fragment): ; if (rules_map) { 142: (15) if r4 == 0x0 goto pc+8 143: (bf) r3 = r10 ; bpf_for_each_map_elem(rules_map, process_each_rule, &ctx, 0); 144: (07) r3 += -24 145: (bf) r1 = r4 146: (18) r2 = subprog[+5] 148: (b7) r4 = 0 149: (85) call bpf_for_each_map_elem#143680 <-- indirect call via helper After (same program fragment): ; if (rules_map) { 142: (15) if r4 == 0x0 goto pc+8 143: (bf) r3 = r10 ; bpf_for_each_map_elem(rules_map, process_each_rule, &ctx, 0); 144: (07) r3 += -24 145: (bf) r1 = r4 146: (18) r2 = subprog[+5] 148: (b7) r4 = 0 149: (85) call bpf_for_each_array_elem#170336 <-- direct call On a benchmark that calls bpf_for_each_map_elem() once and does many other things (mostly checking fields in skb) with CONFIG_RETPOLINE=y it makes program faster. Before: ============================================================================ Benchmark.cpp time/iter iters/s ============================================================================ IngressMatchByRemoteEndpoint 80.78ns 12.38M IngressMatchByRemoteIP 80.66ns 12.40M IngressMatchByRemotePort 80.87ns 12.37M After: ============================================================================ Benchmark.cpp time/iter iters/s ============================================================================ IngressMatchByRemoteEndpoint 73.49ns 13.61M IngressMatchByRemoteIP 71.48ns 13.99M IngressMatchByRemotePort 70.39ns 14.21M Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211006001838.75607-1-rdna@fb.com	2021-10-05 19:22:33 -07:00
Alexei Starovoitov	32a16f6bfe	Merge branch 'Support kernel module function calls from eBPF' Kumar Kartikeya says: ==================== This set enables kernel module function calls, and also modifies verifier logic to permit invalid kernel function calls as long as they are pruned as part of dead code elimination. This is done to provide better runtime portability for BPF objects, which can conditionally disable parts of code that are pruned later by the verifier (e.g. const volatile vars, kconfig options). libbpf modifications are made along with kernel changes to support module function calls. It also converts TCP congestion control objects to use the module kfunc support instead of relying on IS_BUILTIN ifdef. Changelog: ---------- v6 -> v7 v6: https://lore.kernel.org/bpf/20210930062948.1843919-1-memxor@gmail.com * Let __bpf_check_kfunc_call take kfunc_btf_id_list instead of generating callbacks (Andrii) * Rename it to bpf_check_mod_kfunc_call to reflect usage * Remove OOM checks (Alexei) * Remove resolve_btfids invocation for bpf_testmod (Andrii) * Move fd_array_cnt initialization near fd_array alloc (Andrii) * Rename helper to btf_find_by_name_kind and pass start_id (Andrii) * memset when data is NULL in add_data (Alexei) * Fix other nits v5 -> v6 v5: https://lore.kernel.org/bpf/20210927145941.1383001-1-memxor@gmail.com * Rework gen_loader relocation emits * Only emit bpf_btf_find_by_name_kind call when required (Alexei) * Refactor code to emit ksym var and func relo into separate helpers, this will be easier to add future weak/typeless ksym support to (for my followup) * Count references for both ksym var and funcs, and avoid calling helpers unless required for both of them. This also means we share fds between ksym vars for the module BTFs. 
Also be careful with this when closing BTF fd so that we only close one instance of the fd for each ksym v4 -> v5 v4: https://lore.kernel.org/bpf/20210920141526.3940002-1-memxor@gmail.com * Address comments from Alexei * Use reserved fd_array area in loader map instead of creating a new map * Drop selftest testing the 256 kfunc limit, however selftest testing reuse of BTF fd for same kfunc in gen_loader and libbpf is kept * Address comments from Andrii * Make --no-fail the default for resolve_btfids, i.e. only fail if we find BTF section and cannot process it * Use obj->btf_modules array to store index in the fd_array, so that we don't have to do any searching to reuse the index, instead only set it the first time a module BTF's fd is used * Make find_ksym_btf_id to return struct module_btf * in last parameter * Improve logging when index becomes bigger than INT16_MAX * Add btf__find_by_name_kind_own internal helper to only start searching for kfunc ID in module BTF, since find_ksym_btf_id already checks vmlinux BTF before iterating over module BTFs. * Fix various other nits * Fixes for failing selftests on BPF CI * Rearrange/cleanup selftests * Avoid testing kfunc limit (Alexei) * Do test gen_loader and libbpf BTF fd index dedup with 256 calls * Move invalid kfunc failure test to verifier selftest * Minimize duplication * Use consistent bpf_<type>_check_kfunc_call naming for module kfunc callback * Since we try to add fd using add_data while we can, cherry pick Alexei's patch from CO-RE RFC series to align gen_loader data. 
v3 -> v4 v3: https://lore.kernel.org/bpf/20210915050943.679062-1-memxor@gmail.com * Address comments from Alexei * Drop MAX_BPF_STACK change, instead move map_fd and BTF fd to BPF array map and pass fd_array using BPF_PSEUDO_MAP_IDX_VALUE * Address comments from Andrii * Fix selftest to store to variable for observing function call instead of printk and polluting CI logs * Drop use of raw_tp for testing, instead reuse classifier based prog_test_run * Drop index + 1 based insn->off convention for kfunc module calls * Expand selftests to cover more corner cases * Misc cleanups v2 -> v3 v2: https://lore.kernel.org/bpf/20210914123750.460750-1-memxor@gmail.com * Fix issues pointed out by Kernel Test Robot * Fix find_kfunc_desc to also take offset into consideration when comparing RFC v1 -> v2 v1: https://lore.kernel.org/bpf/20210830173424.1385796-1-memxor@gmail.com * Address comments from Alexei * Reuse fd_array instead of introducing kfunc_btf_fds array * Take btf and module reference as needed, instead of preloading * Add BTF_KIND_FUNC relocation support to gen_loader infrastructure * Address comments from Andrii * Drop hashmap in libbpf for finding index of existing BTF in fd_array * Preserve invalid kfunc calls only when the symbol is weak * Adjust verifier selftests ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-10-05 17:07:42 -07:00
Kumar Kartikeya Dwivedi	c48e51c8b0	bpf: selftests: Add selftests for module kfunc support This adds selftests that test the success and failure paths for module kfuncs (in the presence of invalid kfunc calls) for both libbpf and gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can add module BTF ID set from bpf_testmod. This also introduces a couple of test cases to verifier selftests for validating whether we get an error or not depending on whether an invalid kfunc call remains after elimination of unreachable instructions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com	2021-10-05 17:07:42 -07:00
Kumar Kartikeya Dwivedi	18f4fccbf3	libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations This change updates the BPF syscall loader to relocate BTF_KIND_FUNC relocations, with support for weak kfunc relocations. The general idea is to move map_fds to loader map, and also use the data for storing kfunc BTF fds. Since both reuse the fd_array parameter, they need to be kept together. For map_fds, we reserve MAX_USED_MAPS slots in a region, and for kfunc, we reserve MAX_KFUNC_DESCS. This is done so that insn->off has more chances of being <= INT16_MAX than treating data map as a sparse array and adding fd as needed. When the MAX_KFUNC_DESCS limit is reached, we fall back to the sparse array model, so that as long as it does remain <= INT16_MAX, we pass an index relative to the start of fd_array. We store all ksyms in an array where we try to avoid calling the bpf_btf_find_by_name_kind helper, and also reuse the BTF fd that was already stored. This also speeds up the loading process compared to emitting calls in all cases, in later tests. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-9-memxor@gmail.com	2021-10-05 17:07:42 -07:00
Kumar Kartikeya Dwivedi	466b2e1397	libbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0 Preserve these calls as it allows verifier to succeed in loading the program if they are determined to be unreachable after dead code elimination during program load. If not, the verifier will fail at runtime. This is done for ext->is_weak symbols similar to the case for variable ksyms. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-8-memxor@gmail.com	2021-10-05 17:07:42 -07:00
Kumar Kartikeya Dwivedi	9dbe601563	libbpf: Support kernel module function calls This patch adds libbpf support for kernel module function call support. The fd_array parameter is used during BPF program load to pass module BTFs referenced by the program. insn->off is set to index into this array, but starts from 1, because insn->off as 0 is reserved for btf_vmlinux. We try to use existing insn->off for a module, since the kernel limits the maximum distinct module BTFs for kfuncs to 256, and also because index must never exceed the maximum allowed value that can fit in insn->off (INT16_MAX). In the future, if kernel interprets signed offset as unsigned for kfunc calls, this limit can be increased to UINT16_MAX. Also introduce a btf__find_by_name_kind_own helper to start searching from module BTF's start id when we know that the BTF ID is not present in vmlinux BTF (in find_ksym_btf_id). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-7-memxor@gmail.com	2021-10-05 17:07:42 -07:00
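The indexing convention described in the commit above (slot 0 reserved for vmlinux BTF, an existing slot reused for a module already seen, at most 256 distinct module BTFs, index bounded by INT16_MAX) can be sketched as follows. This is a simplified model under those stated assumptions, not libbpf's implementation: `struct fd_table` and `module_btf_off()` are hypothetical names.

```c
#include <stdint.h>

#define MAX_MODULE_BTFS 256 /* limit on distinct module BTFs stated in the commit */

/* Hypothetical table; slot 0 is never handed out (reserved for vmlinux BTF). */
struct fd_table {
	int fds[MAX_MODULE_BTFS + 1];
	int cnt; /* number of module BTF fds stored in slots 1..cnt */
};

/*
 * Return the value to place in insn->off for a module BTF fd, reusing the
 * existing slot when the fd was already added. Returns -1 when the table is
 * full or the next index would not fit in a signed 16-bit offset.
 */
static int module_btf_off(struct fd_table *t, int btf_fd)
{
	int i;

	for (i = 1; i <= t->cnt; i++)
		if (t->fds[i] == btf_fd)
			return i;

	if (t->cnt >= MAX_MODULE_BTFS || t->cnt + 1 > INT16_MAX)
		return -1;

	t->fds[++t->cnt] = btf_fd;
	return t->cnt;
}
```

Reusing the slot keeps the number of distinct indices equal to the number of distinct modules, so many calls into the same module consume a single entry.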
Kumar Kartikeya Dwivedi	0e32dfc80b	bpf: Enable TCP congestion control kfunc from modules This commit moves BTF ID lookup into the newly added registration helper, in a way that the bbr, cubic, and dctcp implementation set up their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not dependent on modules are looked up from the wrapper function. This lifts the restriction for them to be compiled as built in objects, and can be loaded as modules if required. Also modify Makefile.modfinal to call resolve_btfids for each module. Note that since kernel kfunc_ids never overlap with module kfunc_ids, we only match the owner for module btf id sets. See following commits for background on use of: CONFIG_X86 ifdef: `569c484f99` (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86) CONFIG_DYNAMIC_FTRACE ifdef: `7aae231ac9` (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE) Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com	2021-10-05 17:07:41 -07:00
Kumar Kartikeya Dwivedi	f614f2c755	tools: Allow specifying base BTF file in resolve_btfids This commit allows specifying the base BTF for resolving btf id lists/sets during link time in the resolve_btfids tool. The base BTF is set to NULL if no path is passed. This allows resolving BTF ids for module kernel objects. Also, drop the --no-fail option, as it is only used in case .BTF_ids section is not present, instead make no-fail the default mode. The long option name is same as that of pahole. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-5-memxor@gmail.com	2021-10-05 17:07:41 -07:00
Kumar Kartikeya Dwivedi	14f267d95f	bpf: btf: Introduce helpers for dynamic BTF set registration This adds helpers for registering btf_id_set from modules and the bpf_check_mod_kfunc_call callback that can be used to look them up. With in-kernel sets, the way this is supposed to work is that the in-kernel callback looks up within the in-kernel kfunc whitelist, and then defers to the dynamic BTF set lookup if it doesn't find the BTF id. If there is no in-kernel BTF id set, this callback can be used directly. Also fix includes for btf.h and bpfptr.h so that they can be included in isolation. This is in preparation for their usage in tcp_bbr, tcp_cubic and tcp_dctcp modules in the next patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-4-memxor@gmail.com	2021-10-05 17:07:41 -07:00
Kumar Kartikeya Dwivedi	a5d8272752	bpf: Be conservative while processing invalid kfunc calls This patch also modifies the BPF verifier to only return error for invalid kfunc calls specially marked by userspace (with insn->imm == 0, insn->off == 0) after the verifier has eliminated dead instructions. This can be handled in the fixup stage, and skip processing during add and check stages. If such an invalid call is dropped, the fixup stage will not encounter insn->imm as 0, otherwise it bails out and returns an error. This will be exposed as weak ksym support in libbpf in later patches. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-3-memxor@gmail.com	2021-10-05 17:07:41 -07:00
Kumar Kartikeya Dwivedi	2357672c54	bpf: Introduce BPF support for kernel module function calls This change adds support on the kernel side to allow for BPF programs to call kernel module functions. Userspace will prepare an array of module BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter. In the kernel, the module BTFs are placed in the auxiliary struct for bpf_prog, and loaded as needed. The verifier then uses insn->off to index into the fd_array. insn->off 0 is reserved for vmlinux BTF (for backwards compat), so userspace must use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is sorted based on offset in an array, and each offset corresponds to one descriptor, with a max limit up to 256 such module BTFs. We also change existing kfunc_tab to distinguish each element based on imm, off pair as each such call will now be distinct. Another change is to check_kfunc_call callback, which now includes a struct module * pointer, this is to be used in later patch such that the kfunc_id and module pointer are matched for dynamically registered BTF sets from loadable modules, so that same kfunc_id in two modules doesn't lead to check_kfunc_call succeeding. For the duration of the check_kfunc_call, the reference to struct module exists, as it returns the pointer stored in kfunc_btf_tab. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com	2021-10-05 17:07:41 -07:00
Jakub Kicinski	d0f1c248b4	Merge tag 'for-net-next-2021-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for MediaTek MT7922 and MT7921 - Enable support for AOSP extension in Qualcomm WCN399x and Realtek 8822C/8852A. - Add initial support for link quality and audio/codec offload. - Rework of sockets sendmsg to avoid locking issues. - Add vhci suspend/resume emulation. ==================== Link: https://lore.kernel.org/r/20211001230850.3635543-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-05 07:41:16 -07:00
Jakub Kicinski	49ed8dde37	net: usb: use eth_hw_addr_set() for dev->addr_len cases Convert usb drivers from memcpy(... dev->addr_len) to eth_hw_addr_set(): @@ expression dev, np; @@ - memcpy(dev->dev_addr, np, dev->addr_len) + eth_hw_addr_set(dev, np) Manually checked these are either usbnet or pure etherdevs. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:16:48 +01:00
Jakub Kicinski	a05e4c0af4	ethernet: use eth_hw_addr_set() for dev->addr_len cases Convert all Ethernet drivers from memcpy(... dev->addr_len) to eth_hw_addr_set(): @@ expression dev, np; @@ - memcpy(dev->dev_addr, np, dev->addr_len) + eth_hw_addr_set(dev, np) In theory addr_len may not be ETH_ALEN, but we don't expect non-Ethernet devices to live under this directory, and only the following cases of setting addr_len exist: - cxgb4 for mgmt device, and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:16:48 +01:00
David S. Miller	5e8fba848e	Merge branch 'mlx4-const-dev_addr' Jakub Kicinski says: ==================== mlx4: prep for constant dev->dev_addr This patch converts mlx4 for dev->dev_addr being const. It converts to use of common helpers but also removes some seemingly unnecessary idiosyncrasies. Please review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:15:35 +01:00
Jakub Kicinski	ebb1fdb589	mlx4: constify args for const dev_addr netdev->dev_addr will become const soon. Make sure all functions which pass it around mark appropriate args as const. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:15:35 +01:00
Jakub Kicinski	e04ffd120f	mlx4: remove custom dev_addr clearing mlx4_en_u64_to_mac() takes the dev->dev_addr pointer and writes to it byte by byte. It also clears the two bytes _after_ ETH_ALEN which seems unnecessary. dev->addr_len is set to ETH_ALEN just before the call. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:15:35 +01:00
Jakub Kicinski	1bb96a07f9	mlx4: replace mlx4_u64_to_mac() with u64_to_ether_addr() mlx4_u64_to_mac() predates the common helper but doesn't make the argument constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:15:35 +01:00
Jakub Kicinski	ded6e16b37	mlx4: replace mlx4_mac_to_u64() with ether_addr_to_u64() mlx4_mac_to_u64() predates and opencodes ether_addr_to_u64(). It doesn't make the argument constant so it'll be problematic when dev->dev_addr becomes a const. Convert to the generic helper. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:15:35 +01:00
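For readers unfamiliar with the generic helpers named in the two mlx4 commits above, the conversion they perform can be sketched in userspace C. This is an illustrative stand-in that assumes the first address byte is the most significant; `mac_to_u64()` and `u64_to_mac()` are hypothetical names, not the kernel definitions. The input pointer is const-qualified, which is the property the commits rely on once dev->dev_addr becomes const.

```c
#include <stdint.h>

#define ETH_ALEN 6 /* length of an Ethernet address in bytes */

/* Pack a 6-byte MAC address into the low 48 bits, first byte most significant. */
static uint64_t mac_to_u64(const uint8_t *addr)
{
	uint64_t u = 0;
	int i;

	for (i = 0; i < ETH_ALEN; i++)
		u = (u << 8) | addr[i];
	return u;
}

/* Unpack the low 48 bits of u into a 6-byte MAC address buffer. */
static void u64_to_mac(uint64_t u, uint8_t *addr)
{
	int i;

	for (i = ETH_ALEN - 1; i >= 0; i--) {
		addr[i] = u & 0xff;
		u >>= 8;
	}
}
```

Because the source address is only read, a caller can pass a const device address without a cast.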
Florian Westphal	549017aa1b	netlink: remove netlink_broadcast_filtered No users in tree since commit `a3498436b3` ("netns: restrict uevents"), so remove this functionality. Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 13:07:03 +01:00
David S. Miller	95bf387e35	Merge tag 'mlx5-updates-2021-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-10-04 Misc updates for mlx5 driver 1) Add TX max rate support for MQPRIO channel mode 2) Trivial TC action and modify header refactoring 3) TC support for accept action in fdb offloads 4) Allow single IRQ for PCI functions 5) Bridge offload: Pop PVID VLAN header on egress miss Vlad Buslov says: ================= With current architecture of mlx5 bridge offload it is possible for a packet to match in ingress table by source MAC (resulting VLAN header push in case of port with configured PVID) and then miss in egress table when destination MAC is not in FDB. Due to the lack of hardware learning in NICs, this, in turn, results in the packet going to software data path with PVID VLAN already added by hardware. This doesn't break software bridge since it accepts either untagged packets or packets with any provisioned VLAN on ports with PVID, but can break ingress TC, if affected part of Ethernet header is matched by classifier. Improve compatibility with software TC by restoring the packet header on egress miss. Effectively, this change implements atomicity of mlx5 bridge offload implementation - packet is either modified and redirected to destination port or appears unmodified in software. ================= ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 11:42:38 +01:00
Rafał Miłecki	45c9d96668	net: bgmac: support MDIO described in DT Check ethernet controller DT node for "mdio" subnode and use it with of_mdiobus_register() when present. That allows specifying MDIO and its PHY devices in a standard DT based way. This is required for BCM53573 SoC support. That family is sometimes called Northstar (by marketing?) but is quite different from it. It uses different CPU(s) and many different hw blocks. One of shared blocks in BCM53573 is Ethernet controller. Switch however is not SRAB accessible (as it is on Northstar) but is MDIO attached. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 11:38:37 +01:00
Rafał Miłecki	b537550918	net: bgmac: improve handling PHY 1. Use info from DT if available It allows describing for example a fixed link. It's more accurate than just guessing there may be one (depending on a chipset). 2. Verify PHY ID before trying to connect PHY PHY addr 0x1e (30) is special in Broadcom routers and means a switch connected as MDIO devices instead of a real PHY. Don't try connecting to it. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 11:38:37 +01:00
Jakub Kicinski	ceca777dab	ethernet: ehea: add missing cast We need to cast the pointer, unlike memcpy() eth_hw_addr_set() does not take void *. The driver already casts &port->mac_addr to u8 * in other places. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `a96d317fb1` ("ethernet: use eth_hw_addr_set()") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 11:35:56 +01:00
David S. Miller	fb8ece514d	sparc: Fix typo. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-05 11:34:17 +01:00
Shay Drory	f891b7cdbd	net/mlx5: Enable single IRQ for PCI Function Prior to this patch the driver requires two IRQs to function properly, one required IRQ for control and at least one required IRQ for IO. This requirement can be relaxed to one as the driver now allows sharing of IRQs, so control and IO EQs can share the same irq. This is needed for high scale amount of VFs. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:57 -07:00
Shay Drory	3663ad34bc	net/mlx5: Shift control IRQ to the last index Control IRQ is the first IRQ vector. This complicates handling of completion IRQs as we need to offset them by one. In the next patch, there are scenarios where completion and control EQs will share the same IRQ, for example functions with a single IRQ. To ease such scenarios, we shift control IRQ to the end of the IRQ array. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:57 -07:00
Vlad Buslov	575baa92fd	net/mlx5: Bridge, pop VLAN on egress table miss Create lowest priority flow group in egress table with single rule that matches on special reg_c1 value that is set on ingress VLAN push with single action that pops VLAN. The flow destination is skip table that is used to skip any further processing of packet in FDB bridge priority. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:56 -07:00
Vlad Buslov	5249001d69	net/mlx5: Bridge, mark reg_c1 when pushing VLAN On ingress VLAN push also assign value 0x7FE to reg_c1 tunnel id+opts bits (tunnel id 0, which is not a valid tunnel id, and option 0x7FE which was reserved by one of previous patches in the series). In following patch the reg value is matched on egress miss to restore the packet to its original state by removing the VLAN before passing it to the software data path. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:56 -07:00
Vlad Buslov	64fc4b3589	net/mlx5: Bridge, extract VLAN pop code to dedicated functions Following patches in series need to pop VLAN when packet misses on egress. To reuse existing bridge VLAN pop handling code, extract it to dedicated helpers mlx5_esw_bridge_pkt_reformat_vlan_pop_supported() and mlx5_esw_bridge_pkt_reformat_vlan_pop_create(). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:56 -07:00
Vlad Buslov	a1a6e7217e	net/mlx5: Bridge, refactor eswitch instance usage Several functions in bridge.c excessively obtain pointer to parent eswitch instance by dereferencing br_offloads->esw on every usage and following patches in this series add even more usages of eswitch. Introduce local variable 'esw' and use it instead. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:55 -07:00
Vlad Buslov	6ba2e2b33d	net/mlx5e: Support accept action Support TC generic 'accept' action in mlx5 by introducing MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is required because existing 'slow path' flag can be flipped by tunneling subsystem when neighbor changes state. Introduce new helper function mlx5_esw_attr_flags_skip() to check whether attribute flags for 'slow path' or 'accept' action are set and use it in eswitch code instead of direct bit manipulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:54 -07:00
Chris Mi	2f8ec867b6	net/mlx5e: Specify out ifindex when looking up encap route There is a use case that the local and remote VTEPs are in the same host. Currently, the out ifindex is not specified when looking up the encap route for offloads. So in this case, a local route is returned and the route dev is lo. Actual tunnel interface can be created with a parameter "dev" [1], which specifies the physical device to use for tunnel endpoint communication. Pass this parameter to driver when looking up encap route for offloads. So that a unicast route will be returned. [1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789 Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:54 -07:00
Vlad Buslov	3222efd4b3	net/mlx5e: Reserve a value from TC tunnel options mapping Reserve one more value from TC tunnel options range to be used by bridge offload in following patches. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:54 -07:00
Roi Dayan	d4f401d9ab	net/mlx5e: Move parse fdb check into actions_match_supported_fdb() The parse fdb/nic actions funcs parse the actions and then call actions_match_supported() for final check. Move related check in parse_tc_fdb_actions() into actions_match_supported_fdb() for more organized code. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:53 -07:00
Roi Dayan	9c1d3511a2	net/mlx5e: Split actions_match_supported() into a sub function There will probably be more checks, some for nic flows, some for fdb flows and some are shared checks. Split it for fdb and nic to avoid the function getting too big. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:53 -07:00
Roi Dayan	d9581e2fa7	net/mlx5e: Move mod hdr allocation to a single place Move mod hdr allocation chunk from parse_tc_fdb_actions() and parse_tc_nic_actions() to a shared function. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:52 -07:00
Roi Dayan	61c6f0d190	net/mlx5e: TC, Refactor sample offload error flow Refactor sample unoffload to be symmetric to sample offload. Use the existing del_post_rule() to release the post rule. Also mlx5e_tc_sample_unoffload() should not return post_rule which is NULL when post actions are supported. Sample offload works with this NULL because many places in the code use IS_ERR() instead of IS_ERR_OR_NULL() to check that the rule is valid, and when the rule is detected as sample offload the code does not use the rule. Let's be consistent and avoid returning NULL anyway, returning the pre rule instead, as in the CT case, which is not NULL. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:52 -07:00
Tariq Toukan	80743c4f8d	net/mlx5e: Add TX max rate support for MQPRIO channel mode Add driver max_rate support for the MQPRIO bw_rlimit shaper in channel mode. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:52 -07:00
Tariq Toukan	e0ee689117	net/mlx5e: Specify SQ stats struct for mlx5e_open_txqsq() Let the caller of mlx5e_open_txqsq() directly pass the SQ stats structure pointer. This replaces logic involving the qos_queue_group_id parameter, and helps generalizing its role in the next patch. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-10-04 18:10:51 -07:00
David S. Miller	1660034361	Merge branch 'phy-10g-mode-helper' Russell King says: ==================== Add phylink helper for 10G modes During the last cycle, there was discussion about adding a helper to set the 10G link modes for phylink, which resulted in these two patches that introduce such a helper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 13:50:05 +01:00
Russell King (Oracle)	14ad41c74f	net: ethernet: use phylink_set_10g_modes() Update three drivers to use the new phylink_set_10g_modes() helper: Cadence macb, Freescale DPAA2 and Marvell PP2. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 13:50:05 +01:00
Russell King (Oracle)	a2c27a61b4	net: phylink: add phylink_set_10g_modes() helper Add a helper for setting 10Gigabit modes, so we have one central place that sets all appropriate 10G modes for a driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 13:50:05 +01:00
MichelleJin	23b0826048	net: ipv6: fix use after free of struct seg6_pernet_data sdata->tun_src should be freed before sdata is freed because sdata->tun_src is allocated after sdata allocation. So the order of kfree(sdata) and kfree(rcu_dereference_raw(sdata->tun_src)) is swapped. Fixes: f04ed7d277 ("net: ipv6: check return value of rhashtable_init") Signed-off-by: MichelleJin <shjy180909@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 13:40:19 +01:00
David S. Miller	e4addd4ed9	Merge branch 'qed-new-fw' Prabhakar Kushwaha says: ==================== qed: new firmware version 8.59.1.0 support This series integrates new firmware version 8.59.1.0, along with updated HSI (hardware software interface) to use the FW, into the family of qed drivers (fastlinq devices). This FW does not reside in the NVRAM. It needs to be programmed to device during driver load as part of the initialization sequence. Similar to previous FW support series, this FW is tightly linked to software and pf function driver. This means FW release is not backward compatible, and driver should always run with the FW it was designed against. FW binary blob is already submitted & accepted in linux-firmware repo. Patches in the series include: patch 1 - qed: Fix kernel-doc warnings patch 2 - qed: Remove e4_ and _e4 from FW HSI patch 3 - qed: split huge qed_hsi.h header file patch 4-8 - HSI (hardware software interface) changes patch 9 - qed: Add '_GTT' suffix to the IRO RAM macros patch 10 - qed: Update debug related changes patch 11 - qed: rdma: Update TCP silly-window-syndrome timeout patch 12 - qed: Update the TCP active termination 2 MSL timer patch 13 - qed: fix ll2 establishment during load of RDMA driver In addition, this patch series also fixes existing checkpatch warnings and checks which are missing. Changes for v2: - Incorporated Jakub's comments. - New patch introduced to fix all kernel-doc issues in qed driver. - Fixed warning: ‘qed_mfw_ext_20g’ defined but not used. - Fixed warning related to kernel-doc with respect to this series. - Removed inline function declaration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 12:55:49 +01:00
Manish Chopra	17696cada7	qed: fix ll2 establishment during load of RDMA driver If the stats ID of an LL2 (light l2) queue exceeds the total number of statistics counters, it may cause a system crash upon enabling RDMA on all PFs. This patch makes sure that the stats ID of the LL2 queue doesn't exceed the max allowed value. Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 12:55:49 +01:00
Prabhakar Kushwaha	a64aa0a8b9	qed: Update the TCP active termination 2 MSL timer ("TIME_WAIT") Initialize 2 MSL timeout value used for the TCP TIME_WAIT state to non-zero default. This patch also removes magic number from qedi/qedi_main.c. Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Nikolay Assa <nassa@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 12:55:49 +01:00
Nikolay Assa	3a6f5d0cbd	qed: Update TCP silly-window-syndrome timeout for iwarp, scsi Update TCP silly-window-syndrome timeout, for the cases where initiator's small TCP window size prevents FW from transmitting packets on the connection. Timeout causes FW to retransmit window probes if needed, preventing I/O stall if initiator ignores first window probe. Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Nikolay Assa <nassa@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 12:55:49 +01:00

1044271 Commits