linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-04 09:34:12 +08:00

Author	SHA1	Message	Date
Vasily Averin	6126891c6d	memcg: enable accounting for IP address and routing-related objects An netadmin inside container can use 'ip a a' and 'ip r a' to assign a large number of ipv4/ipv6 addresses and routing entries and force kernel to allocate megabytes of unaccounted memory for long-lived per-netdevice related kernel objects: 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node', 'struct rt6_info', 'struct fib_rules' and ip_fib caches. These objects can be manually removed, though usually they lives in memory till destroy of its net namespace. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. One of such objects is the 'struct fib6_node' mostly allocated in net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section: write_lock_bh(&table->tb6_lock); err = fib6_add(&table->tb6_root, rt, info, mxc); write_unlock_bh(&table->tb6_lock); In this case it is not enough to simply add SLAB_ACCOUNT to corresponding kmem cache. The proper memory cgroup still cannot be found due to the incorrect 'in_interrupt()' check used in memcg_kmem_bypass(). Obsoleted in_interrupt() does not describe real execution context properly. >From include/linux/preempt.h: The following macros are deprecated and should not be used in new code: in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled To verify the current execution context new macro should be used instead: in_task() - We're in task context Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 06:00:38 -07:00
Vasily Averin	c948f51c16	memcg: enable accounting for net_device and Tx/Rx queues Container netadmin can create a lot of fake net devices, then create a new net namespace and repeat it again and again. Net device can request the creation of up to 4096 tx and rx queues, and force kernel to allocate up to several tens of megabytes memory per net device. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 06:00:38 -07:00
David S. Miller	2967eed908	Merge branch 'bridge-vlan-multicast' Nikolay Aleksandrov says: ==================== net: bridge: multicast: add vlan support This patchset adds initial per-vlan multicast support, most of the code deals with moving to multicast context pointers from bridge/port pointers. That allows us to switch them with the per-vlan contexts when a multicast packet is being processed and vlan multicast snooping has been enabled. That is controlled by a global bridge option added in patch 06 which is off by default (BR_BOOLOPT_MCAST_VLAN_SNOOPING). It is important to note that this option can change only under RTNL and doesn't require multicast_lock, so we need to be careful when retrieving mcast contexts in parallel. For packet processing they are switched only once in br_multicast_rcv() and then used until the packet has been processed. For the most part we need these contexts only to read config values and check if they are disabled. The global mcast state which is maintained consists of querier and router timers, the rest are config options. The port mcast state which is maintained consists of query timer and link to router port list if it's ever marked as a router port. Port multicast contexts _must_ be used only with their respective global contexts, that is a bridge port's mcast context must be used only with bridge's global mcast context and a vlan/port's mcast context must be used only with that vlan's global mcast context due to the router port lists. This way a bridge port can be marked as a router in multiple vlans, but might not be a router in some other vlan. Also this allows us to have per-vlan querier elections, per-vlan queries and basically the whole multicast state becomes per-vlan when the option is enabled. One of the hardest parts is synchronization with vlan's memory management, that is done through a new vlan flag: BR_VLFLAG_MCAST_ENABLED which is changed only under multicast_lock. When a vlan is being destroyed first that flag is removed under the lock, then the multicast context is torn down which includes waiting for any outstanding context timers. Since all of the vlan processing depends on BR_VLFLAG_MCAST_ENABLED it must be checked first if the contexts are vlan and the multicast_lock has been acquired. That is done by all IGMP/MLD packet processing functions and timers. When processing a packet we have RCU so the vlan memory won't be freed, but if the flag is missing we must not process it. The timers are synchronized in the same way with the addition of waiting for them to finish in case they are running after removing the flag under multicast_lock (i.e. they were waiting for the lock). Multicast vlan snooping requires vlan filtering to be enabled, if it's disabled then snooping gets automatically disabled, too. BR_VLFLAG_GLOBAL_MCAST_ENABLED controls if a vlan has BR_VLFLAG_MCAST_ENABLED set which is used in all vlan disabled checks. We need both flags because one is controlled by user-space globally (BR_VLFLAG_GLOBAL_MCAST_ENABLED) and the other is for a particular bridge/vlan or port/vlan entry (BR_VLFLAG_MCAST_ENABLED). Since the latter is also used for synchronization between the multicast and vlan code, and also controlled by BR_VLFLAG_GLOBAL_MCAST_ENABLED we rely on it when checking if a vlan context is disabled. The multicast fast-path has 3 new bit tests on the cache-hot bridge flags field, I didn't observe any measurable difference. I haven't forced either context options to be always disabled when the other type is enabled because the state consists of timers which either expire (router) or don't affect the normal operation. Some options, like the mcast querier one, won't be allowed to change for the disabled context type, that will come with a future patch-set which adds per-vlan querier control. Another important addition is the global vlan options, so far we had only per bridge/port vlan options but in order to control vlan multicast snooping globally we need to add a new type of global vlan options. They can be changed only on the bridge device and are dumped only when a special flag is set in the dump request. The first global option is vlan mcast snooping control, it controls the vlan BR_VLFLAG_GLOBAL_MCAST_ENABLED private flag. It can be set only on master vlan entries. There will be many more global vlan options in the future both for multicast config and other per-vlan options (e.g. STP). There's a lot of room for improvements, I'll do some of the initial ones but splitting the state to different contexts opens the door for a lot more. Also any new multicast options become vlan-supported with very little to no effort by using the same contexts. Short patch description: patches 01-04: initial mcast context add, no functional changes patch 05: adds vlan mcast init and control helpers and uses them on vlan create/destroy patch 06: adds a global bridge mcast vlan snooping knob (default off) patches 07-08: add a helper for users which must derive the contexts based on current bridge and vlan options (e.g. timers) patch 09: adds checks for disabled vlan contexts in packet processing and timers patch 10: adds support for per-vlan querier and tagged queries patch 11: adds router port vlan id in the notifications patches 12-14: add global vlan options support (change, dump, notify) patch 15: adds per-vlan global mcast snooping control Future patch-sets which build on this one (in order): - vlan state mcast handling - user-space mdb contexts (currently only the bridge contexts are used there) - all bridge multicast config options added per-vlan global and per vlan/port - iproute2 support for all the new uAPIs - selftests This set has been stress-tested (deleting/adding ports/vlans while changing vlan mcast snooping while processing IGMP/MLD packets), and also has passed all bridge self-tests. I'm sending this set as early as possible since there're a few more related sets that should go in the same release to get proper and full mcast vlan snooping support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:29 -07:00
Nikolay Aleksandrov	9dee572c38	net: bridge: vlan: add mcast snooping control Add a new global vlan option which controls whether multicast snooping is enabled or disabled for a single vlan. It controls the vlan private flag: BR_VLFLAG_GLOBAL_MCAST_ENABLED. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	9aba624d7c	net: bridge: vlan: notify when global options change Add support for global options notifications. They use only RTM_NEWVLAN since global options can only be set and are contained in a separate vlan global options attribute. Notifications are compressed in ranges where possible, i.e. the sequential vlan options are equal. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	743a53d963	net: bridge: vlan: add support for dumping global vlan options Add a new vlan options dump flag which causes only global vlan options to be dumped. The dumps are done only with bridge devices, ports are ignored. They support vlan compression if the options in sequential vlans are equal (currently always true). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	47ecd2dbd8	net: bridge: vlan: add support for global options We can have two types of vlan options depending on context: - per-device vlan options (split in per-bridge and per-port) - global vlan options The second type wasn't supported in the bridge until now, but we need them for per-vlan multicast support, per-vlan STP support and other options which require global vlan context. They are contained in the global bridge vlan context even if the vlan is not configured on the bridge device itself. This patch adds initial netlink attributes and support for setting these global vlan options, they can only be set (RTM_NEWVLAN) and the operation must use the bridge device. Since there are no such options yet it shouldn't have any functional effect. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	1e9ca45662	net: bridge: multicast: include router port vlan id in notifications Use the port multicast context to check if the router port is a vlan and in case it is include its vlan id in the notification. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	615cc23e62	net: bridge: multicast: add vlan querier and query support Add basic vlan context querier support, if the contexts passed to multicast_alloc_query are vlan then the query will be tagged. Also handle querier start/stop of vlan contexts. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	4cdd0d10f3	net: bridge: multicast: check if should use vlan mcast ctx Add helpers which check if the current bridge/port multicast context should be used (i.e. they're not disabled) and use them for Rx IGMP/MLD processing, timers and new group addition. It is important for vlans to disable processing of timer/packet after the multicast_lock is obtained if the vlan context doesn't have BR_VLFLAG_MCAST_ENABLED. There are two cases when that flag is missing: - if the vlan is getting destroyed it will be removed and timers will be stopped - if the vlan mcast snooping is being disabled Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	eb1593a0b4	net: bridge: multicast: use the port group to port context helper We need to use the new port group to port context helper in places where we cannot pass down the proper context (i.e. functions that can be called by timers or outside the packet snooping paths). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	74edfd483d	net: bridge: multicast: add helper to get port mcast context from port group Add br_multicast_pg_to_port_ctx() which returns the proper port multicast context from either port or vlan based on bridge option and vlan flags. As the comment inside explains the locking is a bit tricky, we rely on the fact that BR_VLFLAG_MCAST_ENABLED requires multicast_lock to change and we also require it to be held to call that helper. If we find the vlan under rcu and it still has the flag then we can be sure it will be alive until we unlock multicast_lock which should be enough. Note that the context might change from vlan to bridge between different calls to this helper as the mcast vlan knob requires only rtnl so it should be used carefully and for read-only/check purposes. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	f4b7002a70	net: bridge: add vlan mcast snooping knob Add a global knob that controls if vlan multicast snooping is enabled. The proper contexts (vlan or bridge-wide) will be chosen based on the knob when processing packets and changing bridge device state. Note that vlans have their individual mcast snooping enabled by default, but this knob is needed to turn on bridge vlan snooping. It is disabled by default. To enable the knob vlan filtering must also be enabled, it doesn't make sense to have vlan mcast snooping without vlan filtering since that would lead to inconsistencies. Disabling vlan filtering will also automatically disable vlan mcast snooping. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:20 -07:00
Nikolay Aleksandrov	7b54aaaf53	net: bridge: multicast: add vlan state initialization and control Add helpers to enable/disable vlan multicast based on its flags, we need two flags because we need to know if the vlan has multicast enabled globally (user-controlled) and if it has it enabled on the specific device (bridge or port). The new private vlan flags are: - BR_VLFLAG_MCAST_ENABLED: locally enabled multicast on the device, used when removing a vlan, toggling vlan mcast snooping and controlling single vlan (kernel-controlled, valid under RTNL and multicast_lock) - BR_VLFLAG_GLOBAL_MCAST_ENABLED: globally enabled multicast for the vlan, used to control the bridge-wide vlan mcast snooping for a single vlan (user-controlled, can be checked under any context) Bridge vlan contexts are created with multicast snooping enabled by default to be in line with the current bridge snooping defaults. In order to actually activate per vlan snooping and context usage a bridge-wide knob will be added later which will default to disabled. If that knob is enabled then automatically all vlan snooping will be enabled. All vlan contexts are initialized with the current bridge multicast context defaults. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov	613d61dbef	net: bridge: vlan: add global and per-port multicast context Add global and per-port vlan multicast context, only initialized but still not used. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov	adc47037a7	net: bridge: multicast: use multicast contexts instead of bridge or port Pass multicast context pointers to multicast functions instead of bridge/port. This would make it easier later to switch these contexts to their per-vlan versions. The patch is basically search and replace, no functional changes. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov	d3d065c003	net: bridge: multicast: factor out bridge multicast context Factor out the bridge's global multicast context into a separate structure which will later be used for per-vlan global context. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:19 -07:00
Nikolay Aleksandrov	9632233e7d	net: bridge: multicast: factor out port multicast context Factor out the port's multicast context into a separate structure which will later be shared for per-port,vlan context. No functional changes intended. We need the structure even if bridge multicast is not defined to pass down as pointer to forwarding functions. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-20 05:41:19 -07:00
Liu Jian	23d2b94043	igmp: Add ip_mc_list lock in ip_check_mc_rcu I got below panic when doing fuzz test: Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G B 5.14.0-rc1-00195-gcff5c4254439-dirty #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x7a/0x9b panic+0x2cd/0x5af end_report.cold+0x5a/0x5a kasan_report+0xec/0x110 ip_check_mc_rcu+0x556/0x5d0 __mkroute_output+0x895/0x1740 ip_route_output_key_hash_rcu+0x2d0/0x1050 ip_route_output_key_hash+0x182/0x2e0 ip_route_output_flow+0x28/0x130 udp_sendmsg+0x165d/0x2280 udpv6_sendmsg+0x121e/0x24f0 inet6_sendmsg+0xf7/0x140 sock_sendmsg+0xe9/0x180 ____sys_sendmsg+0x2b8/0x7a0 ___sys_sendmsg+0xf0/0x160 __sys_sendmmsg+0x17e/0x3c0 __x64_sys_sendmmsg+0x9e/0x100 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x462eb9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9 RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff It is one use-after-free in ip_check_mc_rcu. In ip_mc_del_src, the ip_sf_list of pmc has been freed under pmc->lock protection. But access to ip_sf_list in ip_check_mc_rcu is not protected by the lock. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-17 12:58:29 -07:00
David S. Miller	ab0441b4a9	Merge branch 'vmxnet3-version-6' Ronak Doshi says: ==================== vmxnet3: upgrade to version 6 vmxnet3 emulation has recently added several new features which includes increase in queues supported, remove power of 2 limitation on queues, add RSS for ESP IPv6, etc. This patch series extends the vmxnet3 driver to leverage these new features. Compatibility is maintained using existing vmxnet3 versioning mechanism as follows: - new features added to vmxnet3 emulation are associated with new vmxnet3 version viz. vmxnet3 version 6. - emulation advertises all the versions it supports to the driver. - during initialization, vmxnet3 driver picks the highest version number supported by both the emulation and the driver and configures emulation to run at that version. In particular, following changes are introduced: Patch 1: This patch introduces utility macros for vmxnet3 version 6 comparison and updates Copyright information. Patch 2: This patch adds support to increase maximum Tx/Rx queues from 8 to 32. Patch 3: This patch removes the limitation of power of 2 on the queues. Patch 4: Uses existing get_rss_hash_opts and set_rss_hash_opts methods to add support for ESP IPv6 RSS. Patch 5: This patch reports correct RSS hash type based on the type of RSS performed. Patch 6: This patch updates maximum configurable mtu to 9190. Patch 7: With all vmxnet3 version 6 changes incorporated in the vmxnet3 driver, with this patch, the driver can configure emulation to run at vmxnet3 version 6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:15 -07:00
Ronak Doshi	ce2639ad69	vmxnet3: update to version 6 With all vmxnet3 version 6 changes incorporated in the vmxnet3 driver, the driver can configure emulation to run at vmxnet3 version 6, provided the emulation advertises support for version 6. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	8c5663e461	vmxnet3: increase maximum configurable mtu to 9190 This patch increases the maximum configurable mtu to 9190 to accommodate jumbo packets of overlay traffic. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	b3973bb400	vmxnet3: set correct hash type based on rss information As vmxnet3 supports IP/TCP/UDP RSS, this patch sets appropriate hash type based on the type of RSS performed. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	79d124bb36	vmxnet3: add support for ESP IPv6 RSS Vmxnet3 version 4 added support for ESP RSS. However, only IPv4 was supported. With vmxnet3 version 6, this patch enables RSS for ESP IPv6 packets as well. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	15ccf2f4b0	vmxnet3: remove power of 2 limitation on the queues With version 6, vmxnet3 relaxes the restriction on queues to be power of two. This is helpful in cases (Edge VM) where vcpus are less than 8 and device requires more than 4 queues. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	39f9895a00	vmxnet3: add support for 32 Tx/Rx queues Currently, vmxnet3 supports maximum of 8 Tx/Rx queues. With increase in number of vcpus on a VM, to achieve better performance and utilize idle vcpus, we need to increase the max number of queues supported. This patch enhances vmxnet3 to support maximum of 32 Tx/Rx queues. Increasing the Rx queues also increases the probability of distrubuting the traffic from different flows to different queues with RSS. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Ronak Doshi	69dbef0d1c	vmxnet3: prepare for version 6 changes vmxnet3 is currently at version 4 and this patch initiates the preparation to accommodate changes for upto version 6. Introduced utility macros for vmxnet3 version 6 comparison and update Copyright information. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:32:14 -07:00
Xin Long	f4919ff59c	tipc: keep the skb in rcv queue until the whole data is read Currently, when userspace reads a datagram with a buffer that is smaller than this datagram, the data will be truncated and only part of it can be received by users. It doesn't seem right that users don't know the datagram size and have to use a huge buffer to read it to avoid the truncation. This patch to fix it by keeping the skb in rcv queue until the whole data is read by users. Only the last msg of the datagram will be marked with MSG_EOR, just as TCP/SCTP does. Note that this will work as above only when MSG_EOR is set in the flags parameter of recvmsg(), so that it won't break any old user applications. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:28:09 -07:00
David S. Miller	5242b0c6b5	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/t nguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2021-07-16 Vinicius Costa Gomes says: Add support for steering traffic to specific RX queues using Flex Filters. As the name implies, Flex Filters are more flexible than using Layer-2, VLAN or MAC address filters, one of the reasons is that they allow "AND" operations more easily, e.g. when the user wants to steer some traffic based on the source MAC address and the packet ethertype. Future work include adding support for offloading tc-u32 filters to the hardware. The series is divided as follows: Patch 1/5, add the low level primitives for configuring Flex filters. Patch 2/5 and 3/5, allow ethtool to manage Flex filters. Patch 4/5, when specifying filters that have multiple predicates, use Flex filters. Patch 5/5, Adds support for exposing the i225 LEDs using the LED subsystem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 17:25:13 -07:00
Kurt Kanzenbach	cf8331825a	igc: Export LEDs Each i225 has three LEDs. Export them via the LED class framework. Each LED is controllable via sysfs. Example: $ cd /sys/class/leds/igc_led0 $ cat brightness # Current Mode $ cat max_brightness # 15 $ echo 0 > brightness # Mode 0 $ echo 1 > brightness # Mode 1 The brightness field here reflects the different LED modes ranging from 0 to 15. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-07-16 14:08:12 -07:00
Kurt Kanzenbach	7374426221	igc: Make flex filter more flexible Currently flex filters are only used for filters containing user data. However, it makes sense to utilize them also for filters having multiple conditions, because that's not supported by the driver at the moment. Add it. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-07-16 14:08:03 -07:00
Vinicius Costa Gomes	7991487ecb	igc: Allow for Flex Filters to be installed Allows Flex Filters to be installed. The previous restriction to the types of filters that can be installed can now be lifted. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-07-16 14:07:55 -07:00
Kurt Kanzenbach	2b477d057e	igc: Integrate flex filter into ethtool ops Use the flex filter mechanism to extend the current ethtool filter operations by intercoperating the user data. This allows to match eight more bytes within a Ethernet frame in addition to macs, ether types and vlan. The matching pattern looks like this: * dest_mac [6] * src_mac [6] * tpid [2] * vlan tci [2] * ether type [2] * user data [8] This can be used to match Profinet traffic classes by FrameID range. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-07-16 14:07:33 -07:00
Kurt Kanzenbach	6574631b50	igc: Add possibility to add flex filter The Intel i225 NIC has the possibility to add flex filters which can match up to the first 128 byte of a packet. These filters are useful for all kind of packet matching. One particular use case is Profinet, as the different traffic classes are distinguished by the frame id range which cannot be matched by any other means. Add code to configure and enable flex filters. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-07-16 14:07:24 -07:00
Voon Weifeng	08041a9af9	net: phy: marvell10g: enable WoL for 88X3310 and 88E2110 Implement Wake-on-LAN feature for 88X3310 and 88E2110. This is done by enabling WoL interrupt and WoL detection and configuring MAC address into WoL magic packet registers Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ling Pei Lee <pei.lee.ling@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 13:22:12 -07:00
Joakim Zhang	86a176f485	ARM: dts: imx7-mba7: remove un-used "phy-reset-delay" property Remove un-used "phy-reset-delay" property which found when do dtbs_check (set additionalProperties: false in fsl,fec.yaml). Double check current driver and commit history, "phy-reset-delay" never comes up, so it should be safe to remove it. $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml arch/arm/boot/dts/imx7d-mba7.dt.yaml: ethernet@30be0000: 'phy-reset-delay' does not match any of the regexes: 'pinctrl-[0-9]+' arch/arm/boot/dts/imx7d-mba7.dt.yaml: ethernet@30bf0000: 'phy-reset-delay' does not match any of the regexes: 'pinctrl-[0-9]+' /arch/arm/boot/dts/imx7s-mba7.dt.yaml: ethernet@30be0000: 'phy-reset-delay' does not match any of the regexes: 'pinctrl-[0-9]+' Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 12:34:29 -07:00
Joakim Zhang	95740a9a3a	ARM: dts: imx35: correct node name for FEC Correct node name for FEC which found when do dtbs_check. $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml arch/arm/boot/dts/imx35-eukrea-mbimxsd35-baseboard.dt.yaml: fec@50038000: $nodename:0: 'fec@50038000' does not match '^ethernet(@.)?$' arch/arm/boot/dts/imx35-pdk.dt.yaml: fec@50038000: $nodename:0: 'fec@50038000' does not match '^ethernet(@.)?$' Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 12:34:29 -07:00
Joakim Zhang	96e4781b3d	dt-bindings: net: fec: convert fsl,*fec bindings to yaml In order to automate the verification of DT nodes convert fsl-fec.txt to fsl,fec.yaml, and pass binding check with below command. $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml DTEX Documentation/devicetree/bindings/net/fsl,fec.example.dts DTC Documentation/devicetree/bindings/net/fsl,fec.example.dt.yaml CHECK Documentation/devicetree/bindings/net/fsl,fec.example.dt.yaml Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 12:34:29 -07:00
Peilin Ye	d4861fc6be	netdevsim: Add multi-queue support Currently netdevsim only supports a single queue per port, which is insufficient for testing multi-queue TC schedulers e.g. sch_mq. Extend the current sysfs interface so that users can create ports with multiple queues: $ echo "[ID] [PORT_COUNT] [NUM_QUEUES]" > /sys/bus/netdevsim/new_device As an example, echoing "2 4 8" creates 4 ports, with 8 queues per port. Note, this is compatible with the current interface, with default number of queues set to 1. For example, echoing "2 4" creates 4 ports with 1 queue per port; echoing "2" simply creates 1 port with 1 queue. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 11:17:56 -07:00
Mark Gray	b83d23a2a3	openvswitch: Introduce per-cpu upcall dispatch The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is a correspondingly large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. The corresponding user space code can be found at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html Bugzilla: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 11:06:33 -07:00
Bill Wendling	919d527956	bnx2x: remove unused variable 'cur_data_offset' Fix the clang build warning: drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1862:13: error: variable 'cur_data_offset' set but not used [-Werror,-Wunused-but-set-variable] dma_addr_t cur_data_offset; Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 10:56:55 -07:00
Christophe JAILLET	a99f030b24	net: switchdev: Simplify 'mlxsw_sp_mc_write_mdb_entry()' Use 'bitmap_alloc()/bitmap_free()' instead of hand-writing it. This makes the code less verbose. Also, use 'bitmap_alloc()' instead of 'bitmap_zalloc()' because the bitmap is fully overridden by a 'bitmap_copy()' call just after its allocation. While at it, remove an extra and unneeded space. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 10:54:18 -07:00
Yajun Deng	f79a3bcb1a	net/sched: Remove unnecessary if statement It has been deal with the 'if (err' statement in rtnetlink_send() and rtnl_unicast(). so remove unnecessary if statement. v2: use the raw name rtnetlink_send(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 10:46:35 -07:00
Yajun Deng	cfdf0d9ae7	rtnetlink: use nlmsg_notify() in rtnetlink_send() The netlink_{broadcast, unicast} don't deal with 'if (err > 0' statement but nlmsg_{multicast, unicast} do. The nlmsg_notify() contains them. so use nlmsg_notify() instead. so that the caller wouldn't deal with 'if (err > 0' statement. v2: use nlmsg_notify() will do well. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 10:46:35 -07:00
Haiyue Wang	63a9192b8f	gve: fix the wrong AdminQ buffer overflow check The 'tail' pointer is also free-running count, so it needs to be masked as 'adminq_prod_cnt' does, to become an index value of AdminQ buffer. Fixes: `5cdad90de6` ("gve: Batch AQ commands for creating and destroying queues.") Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 10:41:40 -07:00
David S. Miller	82a1ffe57e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-07-15 The following pull-request contains BPF updates for your net-next tree. We've added 45 non-merge commits during the last 15 day(s) which contain a total of 52 files changed, 3122 insertions(+), 384 deletions(-). The main changes are: 1) Introduce bpf timers, from Alexei. 2) Add sockmap support for unix datagram socket, from Cong. 3) Fix potential memleak and UAF in the verifier, from He. 4) Add bpf_get_func_ip helper, from Jiri. 5) Improvements to generic XDP mode, from Kumar. 6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi. =================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-15 22:40:10 -07:00
Alexei Starovoitov	c50524ec4e	Merge branch 'sockmap: add sockmap support for unix datagram socket' Cong Wang says: ==================== From: Cong Wang <cong.wang@bytedance.com> This is the last patchset of the original large patchset. In the previous patchset, a new BPF sockmap program BPF_SK_SKB_VERDICT was introduced and UDP began to support it too. In this patchset, we add BPF_SK_SKB_VERDICT support to Unix datagram socket, so that we can finally splice Unix datagram socket and UDP socket. Please check each patch description for more details. To see the big picture, the previous patchsets are available here: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=1e0ab70778bd86a90de438cc5e1535c115a7c396 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=89d69c5d0fbcabd8656459bc8b1a476d6f1efee4 and this patchset is available here: https://github.com/congwang/linux/tree/sockmap3 Acked-by: John Fastabend <john.fastabend@gmail.com> --- v5: lift socket state check for dgram remove ->unhash() case add retries for EAGAIN in all test cases remove an unused parameter of __unix_dgram_recvmsg() rebase on the latest bpf-next v4: fix af_unix disconnect case add unix_unhash() split out two small patches reduce u->iolock critical section remove an unused parameter of __unix_dgram_recvmsg() v3: fix Kconfig dependency make unix_read_sock() static fix a UAF in unix_release() add a missing header unix_bpf.c v2: separate out from the original large patchset rebase to the latest bpf-next clean up unix_read_sock() export sock_map_close() factor out some helpers in selftests for code reuse ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-07-15 18:17:51 -07:00
Cong Wang	a2ffda38dc	selftests/bpf: Add test cases for redirection between udp and unix Add two test cases to ensure redirection between udp and unix work bidirectionally. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-12-xiyou.wangcong@gmail.com	2021-07-15 18:17:51 -07:00
Cong Wang	5ea905dd43	selftests/bpf: Add a test case for unix sockmap Add a test case to ensure redirection between two AF_UNIX datagram sockets work. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-11-xiyou.wangcong@gmail.com	2021-07-15 18:17:50 -07:00
Cong Wang	0626bc2ff6	selftests/bpf: Factor out add_to_sockmap() Factor out a common helper add_to_sockmap() which adds two sockets into a sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-10-xiyou.wangcong@gmail.com	2021-07-15 18:17:50 -07:00

1 2 3 4 5 ...

1029626 Commits