linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-29 14:05:19 +08:00

Author	SHA1	Message	Date
Grygorii Strashko	06095f34f8	net: ethernet: ti: cpsw: fix allmulti cfg in dual_mac mode Now CPSW ALE will set/clean Host port bit in Unregistered Multicast Flood Mask (UNREG_MCAST_FLOOD_MASK) for every VLAN without checking if this port belongs to VLAN or not when ALLMULTI mode flag is set for nedev. This is working in non dual_mac mode, but in dual_mac - it causes enabling/disabling ALLMULTI flag for both ports. Hence fix it by adding additional parameter to cpsw_ale_set_allmulti() to specify ALE port number for which ALLMULTI has to be enabled and check if port belongs to VLAN before modifying UNREG_MCAST_FLOOD_MASK. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:49 -04:00
Grygorii Strashko	91c88659a7	net: ethernet: ti: ale: use define for host port in cpsw_ale_set_allmulti() Use ALE_PORT_HOST define for host port in cpsw_ale_set_allmulti() instead of constants. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:49 -04:00
Grygorii Strashko	af9f4e6a33	net: ethernet: ti: ale: fix mcast super setting Use correct define ALE_SUPER for ALE Multicast Address Table Entry Supervisory Packet (SUPER) bit setting instead of ALE_BLOCKED. No issues were observed till now as it have never been set, but it's going to be used by new CPSW switch driver. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	10ae805477	net: ethernet: ti: cpsw: drop cpsw_tx_packet_submit() Drop unnecessary wrapper function cpsw_tx_packet_submit() which is used only in one place. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	d183a9428d	net: ethernet: ti: cpsw: use devm_alloc_etherdev_mqs() Use devm_alloc_etherdev_mqs() and simplify code. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	56bf8a5df3	net: ethernet: ti: cpsw: drop pinctrl_pm_select_default_state call Drop pinctrl_pm_select_default_state call from probe as default pinctrl state is set by DD core. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	c8fb566875	net: ethernet: ti: cpsw: use local var dev in probe Use local variable struct device *dev in probe to simplify code. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	9763a891a5	net: ethernet: ti: cpsw: update cpsw_split_res() to accept cpsw_common Update cpsw_split_res() to accept struct cpsw_common instead of struct net_device to simplify code. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	16f5416482	net: ethernet: ti: cpsw: drop CONFIG_TI_CPSW_ALE config option All TI drivers CPSW/NETCP can't work without ALE, hence simplify build of those drivers by always linking cpsw_ale and drop CONFIG_TI_CPSW_ALE config option. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	99f6297182	net: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option Both drivers CPSW and EMAC can't work without CPDMA, hence simplify build of those drivers by always linking davinci_cpdma and drop TI_DAVINCI_CPDMA config option. Note. the davinci_emac driver module was changed to "ti_davinci_emac" to make build work. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
Grygorii Strashko	68cf027f3d	net: ethernet: ti: convert to SPDX license identifiers Replace textual license with SPDX-License-Identifier. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:11:48 -04:00
David S. Miller	84ee91640f	Merge branch 'strict-netlink-validation' Johannes Berg says: ==================== strict netlink validation Here's a respin, with the following changes: * change message when rejecting unknown attribute types (David Ahern) * drop nl80211 patch - I'll apply it separately * remove NL_VALIDATE_POLICY - we have a lot of calls to nla_parse() that really should be without a policy as it has previously been validated - need to find a good way to handle this later * include the correct generic netlink change (d'oh, sorry) ==================== Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:22 -04:00
Johannes Berg	ef6243acb4	genetlink: optionally validate strictly/dumps Add options to strictly validate messages and dump messages, sometimes perhaps validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everwhere using the following spatch: @@ identifier ops; expression X; @@ struct genl_ops ops[] = { ..., { .cmd = X, + .validate = GENL_DONT_VALIDATE_STRICT \| GENL_DONT_VALIDATE_DUMP, ... }, ... }; For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:22 -04:00
Johannes Berg	56738f4608	netlink: add strict parsing for future attributes Unfortunately, we cannot add strict parsing for all attributes, as that would break existing userspace. We currently warn about it, but that's about all we can do. For new attributes, however, the story is better: nobody is using them, so we can reject bad sizes. Also, for new attributes, we need not accept them when the policy doesn't declare their usage. David Ahern and I went back and forth on how to best encode this, and the best way we found was to have a "boundary type", from which point on new attributes have all possible validation applied, and NLA_UNSPEC is rejected. As we didn't want to add another argument to all functions that get a netlink policy, the workaround is to encode that boundary in the first entry of the policy array (which is for type 0 and thus probably not really valid anyway). I put it into the validation union for the rare possibility that somebody is actually using attribute 0, which would continue to work fine unless they tried to use the extended validation, which isn't likely. We also didn't find any in-tree users with type 0. The reason for setting the "start strict here" attribute is that we never really need to start strict from 0, which is invalid anyway (or in legacy families where that isn't true, it cannot be set to strict), so we can thus reserve the value 0 for "don't do this check" and don't have to add the tag to all policies right now. Thus, policies can now opt in to this validation, which we should do for all existing policies, at least when adding new attributes. Note that entirely new policies won't need to set it, as the use of that should be using nla_parse()/nlmsg_parse() etc. which anyway do fully strict validation now, regardless of this. So in effect, this patch only covers the "existing command with new attribute" case. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:22 -04:00
Johannes Berg	3de6440354	netlink: re-add parse/validate functions in strict mode This re-adds the parse and validate functions like nla_parse() that are now actually strict after the previous rename and were just split out to make sure everything is converted (and if not compilation of the previous patch would fail.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:22 -04:00
Johannes Berg	8cb081746c	netlink: make validation more configurable for future strictness We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be everything. The current _strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to _parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:21 -04:00
Johannes Berg	6f455f5f4e	netlink: add NLA_MIN_LEN Rather than using NLA_UNSPEC for this type of thing, use NLA_MIN_LEN so we can make NLA_UNSPEC be NLA_REJECT under certain conditions for future attributes. While at it, also use NLA_EXACT_LEN for the struct example. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:21 -04:00
David S. Miller	f6ad55a6a1	Merge branch 'nla_nest_start' Michal Kubecek says: ==================== make nla_nest_start() add NLA_F_NESTED flag One of the comments in recent review of the ethtool netlink series pointed out that proposed ethnl_nest_start() helper which adds NLA_F_NESTED to second argument of nla_nest_start() is not really specific to ethtool netlink code. That is hard to argue with as closer inspection revealed that exactly the same helper already exists in ipset code (except it's a macro rather than an inline function). Another observation was that even if NLA_F_NESTED flag was introduced in 2007, only few netlink based interfaces set it in kernel generated messages and even many recently added APIs omit it. That is unfortunate as without the flag, message parsers not familiar with attribute semantics cannot recognize nested attributes and do not see message structure; this affects e.g. wireshark dissector or mnl_nlmsg_fprintf() from libmnl. This is why I'm suggesting to rename existing nla_nest_start() to different name (nla_nest_start_noflag) and reintroduce nla_nest_start() as a wrapper adding NLA_F_NESTED flag. This is implemented in first patch which is mostly generated by spatch. Second patch drops ipset helper macros which lose their purpose. Third patch cleans up minor coding style issues found by checkpatch.pl in first patch. We could leave nla_nest_start() untouched and simply add a wrapper adding NLA_F_NESTED but that would probably preserve the state when even most new code doesn't set the flag. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:03:44 -04:00
Michal Kubecek	f78c6032c4	net: fix two coding style issues This is a simple cleanup addressing two coding style issues found by checkpatch.pl in an earlier patch. It's submitted as a separate patch to keep the original patch as it was generated by spatch. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:03:44 -04:00
Michal Kubecek	12ad5f65f0	ipset: drop ipset_nest_start() and ipset_nest_end() After the previous commit, both ipset_nest_start() and ipset_nest_end() are just aliases for nla_nest_start() and nla_nest_end() so that there is no need to keep them. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:03:44 -04:00
Michal Kubecek	ae0be8de9a	netlink: make nla_nest_start() add NLA_F_NESTED flag Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 \| NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:03:44 -04:00
David S. Miller	c7881b4a97	Merge branch 'net-tls-small-code-cleanup' Jakub Kicinski says: ==================== net/tls: small code cleanup This small patch set cleans up tls (mostly offload parts). Other than avoiding unnecessary error messages - no functional changes here. v2 (Saeed): - fix up Review tags; - remove the warning on failure completely. ==================== Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:54 -04:00
Jakub Kicinski	63a1c95f3f	net/tls: byte swap device req TCP seq no upon setting To avoid a sparse warning byteswap the be32 sequence number before it's stored in the atomic value. While at it drop unnecessary brackets and use kernel's u64 type. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
Jakub Kicinski	da68b4ad02	net/tls: move definition of tls ops into net/tls.h There seems to be no reason for tls_ops to be defined in netdevice.h which is included in a lot of places. Don't wrap the struct/enum declaration in ifdefs, it trickles down unnecessary ifdefs into driver code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
Jakub Kicinski	9e9957973c	net/tls: remove old exports of sk_destruct functions tls_device_sk_destruct being set on a socket used to indicate that socket is a kTLS device one. That is no longer true - now we use sk_validate_xmit_skb pointer for that purpose. Remove the export. tls_device_attach() needs to be moved. While at it, remove the dead declaration of tls_sk_destruct(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
Jakub Kicinski	e49d268db9	net/tls: don't log errors every time offload can't proceed Currently when CONFIG_TLS_DEVICE is set each time kTLS connection is opened and the offload is not successful (either because the underlying device doesn't support it or e.g. it's tables are full) a rate limited error will be printed to the logs. There is nothing wrong with failing TLS offload. SW path will process the packets just fine, drop the noisy messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
Alexei Starovoitov	9076c49bdc	Merge branch 'sk-local-storage' Martin KaFai Lau says: ==================== v4: - Move checks to map_alloc_check in patch 1 (Stanislav Fomichev) - Refactor BTF encoding macros to test_btf.h at a new patch 4 (Stanislav Fomichev) - Refactor getenv and add print PASS message at the end of the test in patch 6 (Yonghong Song) v3: - Replace spinlock_types.h with spinlock.h in patch 1 (kbuild test robot <lkp@intel.com>) v2: - Add the "test_maps.h" file in patch 5 This series introduces the BPF sk local storage. The details is in the patch 1 commit message. ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:49 -07:00
Martin KaFai Lau	263d0b3533	bpf: Add ene-to-end test for bpf_sk_storage_* helpers This patch rides on an existing BPF_PROG_TYPE_CGROUP_SKB test (test_sock_fields.c) to do a TCP end-to-end test on the new bpf_sk_storage_* helpers. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	51a0e301a5	bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps This patch adds BPF_MAP_TYPE_SK_STORAGE test to test_maps. The src file is rather long, so it is put into another dir map_tests/ and compile like the current prog_tests/ does. Other existing tests in test_maps can also be re-factored into map_tests/ in the future. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	7a9bb9762d	bpf: Add verifier tests for the bpf_sk_storage This patch adds verifier tests for the bpf_sk_storage: 1. ARG_PTR_TO_MAP_VALUE_OR_NULL 2. Map and helper compatibility (e.g. disallow bpf_map_loookup_elem) It also takes this chance to remove the unused struct btf_raw_data and uses the BTF encoding macros from "test_btf.h". Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	3f4d4c7410	bpf: Refactor BTF encoding macro to test_btf.h Refactor common BTF encoding macros for other tests to use. The libbpf may reuse some of them in the future which requires some more thoughts before publishing as a libbpf API. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	a19f89f366	bpf: Support BPF_MAP_TYPE_SK_STORAGE in bpf map probing This patch supports probing for the new BPF_MAP_TYPE_SK_STORAGE. BPF_MAP_TYPE_SK_STORAGE enforces BTF usage, so the new probe requires to create and load a BTF also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	948d930e3d	bpf: Sync bpf.h to tools This patch sync the bpf.h to tools/. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:05 -07:00
Martin KaFai Lau	6ac99e8f23	bpf: Introduce bpf sk local storage After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get the listener sock by "bpf_get_listener_sock()" - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running context. this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefit). When bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, wasting memory to store the duplicated keys (i.e. 4 tuples here) in each of the bpf map. [ The smallest key could be the sk pointer itself which requires some enhancement in the verifier and it is a separate topic. ] Also, the bpf prog needs to clean up the elem when sk is freed. Otherwise, the bpf map will become full and un-usable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined which then usually ended-up with an over-provisioned map in production. Even the map was re-sizable, while the sk naturally come and go away already, this potential re-size operation is arguably redundant if the data can be directly connected to the sk itself instead of proxy-ing through a bpf map. This patch introduces sk->sk_bpf_storage to provide local storage space at sk for bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk. The design optimizes the bpf prog's lookup (and then optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected. BPF_MAP_TYPE_SK_STORAGE: ----------------------- To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wise sk dump (e.g. "ss -ta") in the future. The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete a "sk-local-storage" data from a particular sk. Think of the map as a meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are mostly: 1. Define the size of a "sk-local-storage" type. 2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.) 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed. sk->sk_bpf_storage: ------------------ The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search on the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested. To allow very fast lookup, it should be as fast as looking up an array at a stable-offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "type" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[] which is a stable sized array. Each "sk-local-storage" type has a stable offset to the cache[] array. In the future, a map's flag could be introduced to do cache opt-out/enforcement if it became necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow. All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction. bpf_sk_storage_get() and bpf_sk_storage_delete(): ------------------------------------------------ Instead of using bpf_map_(lookup\|update\|delete)_elem(), the bpf prog needs to use the new helper bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to "create" new elem if one does not exist in the sk. It is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, it has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch. Misc notes: ---------- 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id. Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage. Supporting a kernel defined btf with 4 tuples as the return key could be explored later also. 2. The sk->sk_lock cannot be acquired. Atomic operations is used instead. e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details in synchronization cases and considerations. 3. The mem is charged to the sk->sk_omem_alloc as the sk filter does. Benchmark: --------- Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: One bpf prog with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key. (verifier is modified to support sk ptr as the key That should have shortened the key lookup time.) Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE. Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets. BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run) 27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633 loaded_at 2019-04-15T13:46:39-0700 uid 0 xlated 344B jited 258B memlock 4096B map_ids 16 btf_id 5 BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run) 30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739 loaded_at 2019-04-15T13:47:54-0700 uid 0 xlated 168B jited 156B memlock 4096B map_ids 17 btf_id 6 Here is a high-level picture on how are the objects organized: sk ┌──────┐ │ │ │ │ │ │ │sk_bpf_storage─────▶ bpf_sk_storage └──────┘ ┌───────┐ ┌───────────┤ list │ │ │ │ │ │ │ │ │ │ │ └───────┘ │ │ elem │ ┌────────┐ ├─▶│ snode │ │ ├────────┤ │ │ data │ bpf_map │ ├────────┤ ┌─────────┐ │ │map_node│◀─┬─────┤ list │ │ └────────┘ │ │ │ │ │ │ │ │ elem │ │ │ │ ┌────────┐ │ └─────────┘ └─▶│ snode │ │ ├────────┤ │ bpf_map │ data │ │ ┌─────────┐ ├────────┤ │ │ list ├───────▶│map_node│ │ │ │ └────────┘ │ │ │ │ │ │ elem │ └─────────┘ ┌────────┐ │ ┌─▶│ snode │ │ │ ├────────┤ │ │ │ data │ │ │ ├────────┤ │ │ │map_node│◀─┘ │ └────────┘ │ │ │ ┌───────┐ sk └──────────│ list │ ┌──────┐ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ │sk_bpf_storage───────▶bpf_sk_storage └──────┘ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-27 09:07:04 -07:00
Alexei Starovoitov	3745dc24aa	Merge branch 'writeable-bpf-tracepoints' Matt Mullins says: ==================== This adds an opt-in interface for tracepoints to expose a writable context to BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that are attached, while supporting read-only access from existing BPF_PROG_TYPE_RAW_TRACEPOINT programs, as well as from non-BPF-based tracepoints. The initial motivation is to support tracing that can be observed from the remote end of an NBD socket, e.g. by adding flags to the struct nbd_request header. Earlier attempts included adding an NBD-specific tracepoint fd, but in code review, I was recommended to implement it more generically -- as a result, this patchset is far simpler than my initial try. v4->v5: * rebased onto bpf-next/master and fixed merge conflicts * "tools: sync bpf.h" also syncs comments that have previously changed in bpf-next v3->v4: * fixed a silly copy/paste typo in include/trace/events/bpf_test_run.h (_TRACE_NBD_H -> _TRACE_BPF_TEST_RUN_H) * fixed incorrect/misleading wording in patch 1's commit message, since the pointer cannot be directly dereferenced in a BPF_PROG_TYPE_RAW_TRACEPOINT * cleaned up the error message wording if the prog_tests fail * Addressed feedback from Yonghong * reject non-pointer-sized accesses to the buffer pointer * use sizeof(struct nbd_request) as one-byte-past-the-end in raw_tp_writable_reject_nbd_invalid.c * use BPF_MOV64_IMM instead of BPF_LD_IMM64 v2->v3: * Andrew addressed Josef's comments: * C-style commenting in nbd.c * Collapsed identical events into a single DECLARE_EVENT_CLASS. This saves about 2kB of kernel text v1->v2: * add selftests * sync tools/include/uapi/linux/bpf.h * reject variable offset into the buffer * add string representation of PTR_TO_TP_BUFFER to reg_type_str ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:05:44 -07:00
Matt Mullins	e950e84336	selftests: bpf: test writable buffers in raw tps This tests that: * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it uses either: * a variable offset to the tracepoint buffer, or * an offset beyond the size of the tracepoint buffer * a tracer can modify the buffer provided when attached to a writable tracepoint in bpf_prog_test_run Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
Matt Mullins	4635b0ae4d	tools: sync bpf.h This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, and fixes up the error: enumeration value ‘BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE’ not handled in switch [-Werror=switch-enum] build errors it would otherwise cause in libbpf. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
Andrew Hall	2abd2de712	nbd: add tracepoints for send/receive timing This adds four tracepoints to nbd, enabling separate tracing of payload and header sending/receipt. In the send path for headers that have already been sent, we also explicitly initialize the handle so it can be referenced by the later tracepoint. Signed-off-by: Andrew Hall <hall@fb.com> Signed-off-by: Matt Mullins <mmullins@fb.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
Matt Mullins	ea106722c7	nbd: trace sending nbd requests This adds a tracepoint that can both observe the nbd request being sent to the server, as well as modify that request , e.g., setting a flag in the request that will cause the server to collect detailed tracing data. The struct request * being handled is included to permit correlation with the block tracepoints. Signed-off-by: Matt Mullins <mmullins@fb.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
Matt Mullins	9df1c28bb7	bpf: add writable context for raw tracepoints This is an opt-in interface that allows a tracepoint to provide a safe buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program. The size of the buffer must be a compile-time constant, and is checked before allowing a BPF program to attach to a tracepoint that uses this feature. The pointer to this buffer will be the first argument of tracepoints that opt in; the pointer is valid and can be bpf_probe_read() by both BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a tracepoint, but the buffer to which it points may only be written by the latter. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 19:04:19 -07:00
Daniel Borkmann	34b8ab091f	bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016, lets add support for STADD and use that in favor of LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register, therefore add LDADD to the instruction encoder along with STADD as special case and use it in the JIT for CPUs that advertise LSE atomics in CPUID register. If immediate offset in the BPF XADD insn is 0, then use dst register directly instead of temporary one. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 18:53:40 -07:00
Daniel Borkmann	8968c67a82	bpf, arm64: remove prefetch insn in xadd mapping Prefetch-with-intent-to-write is currently part of the XADD mapping in the AArch64 JIT and follows the kernel's implementation of atomic_add. This may interfere with other threads executing the LDXR/STXR loop, leading to potential starvation and fairness issues. Drop the optional prefetch instruction. Fixes: `85f68fe898` ("bpf, arm64: implement jiting of BPF_XADD") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-04-26 18:53:15 -07:00
David S. Miller	30e5a9a5ba	Various updates, notably: * extended key ID support (from 802.11-2016) * per-STA TX power control support * mac80211 TX performance improvements * HE (802.11ax) updates * mesh link probing support * enhancements of multi-BSSID support (also related to HE) * OWE userspace processing support -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlzC5YYACgkQB8qZga/f l8QDWg/+N7wm+l7bTMx4hjJzZZ60n9fBvyGJx0gsnPVH8wdOiPoh/epuI04I8I4m pGNbGvPB9Z4z2tD56XsIQnXf88ab3R27bRupSSW1vtzVSbDhg8wQ7jg0nABrdyDS PgoTmDMfVERLewXdntqRANzVYGfoWSOzo1u6A0Xhys8FqxxX/eD+Vdo4dKzmeN47 +LDfuCpInVPn0TOpFp5IJ4+B4a0dhkz2/Q1BOE7NquXVvk4X77VJohV/BgQJ04Io yt7mn5rzYM6j4o1XLACxUEHkXvht6h34abG0yHRnuoAEp/sdPz2jAXT4OxYqs6x0 XdLdr8gZgkMnnYaOQef74uJ2Ku+4A1ootjXSPazA7BWX0X5GqHnET/INk2S6cQPj C95LYfKC0ICD0qfioBmmHx8icDGoovcaswCju2ozfqWaD4Lwr3BcesnNDFtkHD9o aYaTTGGSxFyr2bZWTDpv4D4H5g3V4srRJsXs+SokL54nvlwd/smUJ4PVTLomP9y2 XswRtLdoiUsCrJy967CXfhsxnE5SRhmBQE38Jq8/pzetlRk2spvJJC5MGYF0O/nT 0UHbrjBCFUT2s8jv+gWWabOBUovsNJlgaxFwrZ/eNVIk0DK0ERoMV3V4MktU8uza Y339T14kxw4wlY2z5pOmEgkxmKZbPb55dBba04JEZzz9zDTawTk= =JQOx -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2019-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Various updates, notably: * extended key ID support (from 802.11-2016) * per-STA TX power control support * mac80211 TX performance improvements * HE (802.11ax) updates * mesh link probing support * enhancements of multi-BSSID support (also related to HE) * OWE userspace processing support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 16:05:52 -04:00
David S. Miller	148f025d41	Merge branch 'hns3-next' Huazhong Tan says: ==================== code optimizations & bugfixes for HNS3 driver This patch-set includes code optimizations and bugfixes for the HNS3 ethernet controller driver. [patch 1/11 - 3/11] fixes some bugs about the IO path [patch 4/11 - 6/11] includes some optimization and bugfixes about mailbox handling [patch 7/11 - 11/11] adds misc code optimizations and bugfixes. Change log: V2->V3: fixes comments from Neil Horman, removes [patch 8/12] V1->V2: adds modification on [patch 8/12] ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:29 -04:00
Weihang Li	96490a1c09	net: hns3: remove reset after command send failed It's meaningless to trigger reset when failed to send command to IMP, because the failure is usually caused by no authority, illegal command and so on. When that happened, we just need to return the status code for further debugging. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:29 -04:00
Huazhong Tan	7b8f622e53	net: hns3: prevent double free in hns3_put_ring_config() This patch adds a check for the hns3_put_ring_config() to prevent double free, and for more readable, move the NULL assignment of priv->ring_data into the hns3_put_ring_config(). Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:28 -04:00
liuzhongzhu	fd85717d28	net: hns3: extend the loopback state acquisition time The test results show that the maximum time of hardware return to mac link state is 500MS.The software needs to set twice the maximum time of hardware return state (1000MS). If not modified, the loopback test returns probability failure. Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:28 -04:00
Huazhong Tan	fba2efdae8	net: hns3: fix pause configure fail problem When configure pause, current implementation returns directly after setup PFC without setup BP, which is not sufficient. So this patch fixes it, only return while setting PFC failed. Fixes: `44e59e375b` ("net: hns3: do not return GE PFC setting err when initializing") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:28 -04:00
Huazhong Tan	146e92c13f	net: hns3: not reset TQP in the DOWN while VF resetting Since the hardware does not handle mailboxes and the hardware reset include TQP reset, so it is unnecessary to reset TQP in the hclgevf_ae_stop() while doing VF reset. Also it is unnecessary to reset the remaining TQP when one reset fails. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:28 -04:00
Huazhong Tan	b7048d324b	net: hns3: use a reserved byte to identify need_resp flag This patch uses a reserved byte in the hclge_mbx_vf_to_pf_cmd to save the need_resp flag, so when PF received the mailbox, it can use it to decise whether send a response to VF. For hclge_set_vf_uc_mac_addr(), it should use mbx_need_resp flag to decide whether send response to VF. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-26 12:13:28 -04:00

1 2 3 4 5 ...

827934 Commits