korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-25 13:14:07 +08:00

Author	SHA1	Message	Date
David Ahern	5d1f0f09b5	nexthop: Rename nexthop_free_mpath nexthop_free_mpath really should be nexthop_free_group. Rename it. Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 20:49:51 -08:00
Julian Wiedmann	2c3b4456c8	net/af_iucv: build SG skbs for TRANS_HIPER sockets The TX path no longer falls apart when some of its SG skbs are later linearized by lower layers of the stack. So enable the use of SG skbs in iucv_sock_sendmsg() again. This effectively reverts commit `dc5367bcc5` ("net/af_iucv: don't use paged skbs for TX on HiperSockets"). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 20:36:22 -08:00
Julian Wiedmann	80bc97aa0a	net/af_iucv: don't track individual TX skbs for TRANS_HIPER sockets Stop maintaining the skb_send_q list for TRANS_HIPER sockets. Not only is it extra overhead, but keeping around a list of skb clones means that we later also have to match the ->sk_txnotify() calls against these clones and free them accordingly. The current matching logic (comparing the skbs' shinfo location) is frustratingly fragile, and breaks if the skb's head is mangled in any sort of way while passing from dev_queue_xmit() to the device's HW queue. Also adjust the interface for ->sk_txnotify(), to make clear that we don't actually care about any skb internals. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 20:36:21 -08:00
Julian Wiedmann	ef6af7bdb9	net/af_iucv: count packets in the xmit path The TX code keeps track of all skbs that are in-flight but haven't actually been sent out yet. For native IUCV sockets that's not a huge deal, but with TRANS_HIPER sockets it would be much better if we didn't need to maintain a list of skb clones. Note that we actually only care about the _count_ of skbs in this stage of the TX pipeline. So as prep work for removing the skb tracking on TRANS_HIPER sockets, keep track of the skb count in a separate variable and pair any list {enqueue, unlink} with a count {increment, decrement}. Then replace all occurrences where we currently look at the skb list's fill level. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 20:36:21 -08:00
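The pairing described in this commit message (every list enqueue/unlink matched by a counter increment/decrement) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct pkt`, `struct tx_queue`, and the `tx_*` helpers are hypothetical names, not the af_iucv code or any kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an skb and its send queue; not kernel types. */
struct pkt {
	struct pkt *next;
};

struct tx_queue {
	struct pkt *head;
	int pending;	/* in-flight count, kept separately from the list */
};

/* Every enqueue bumps the counter, so list and count cannot drift apart. */
void tx_enqueue(struct tx_queue *q, struct pkt *p)
{
	p->next = q->head;
	q->head = p;
	q->pending++;
}

/* Unlink decrements only if the packet really was on the list. */
int tx_unlink(struct tx_queue *q, struct pkt *p)
{
	struct pkt **pp;

	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			p->next = NULL;
			q->pending--;
			assert(q->pending >= 0);
			return 1;
		}
	}
	return 0;
}

/* Callers consult the counter, not the list's fill level. */
int tx_is_idle(const struct tx_queue *q)
{
	return q->pending == 0;
}
```

Once every caller relies on `pending` rather than on the list, the list can later be dropped for one class of sockets without touching those callers, which appears to be the point of doing this as preparatory work.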
Julian Wiedmann	c464444fa2	net/af_iucv: don't lookup the socket on TX notification Whoever called iucv_sk(sk)->sk_txnotify() must already know that they're dealing with an af_iucv socket. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 20:36:21 -08:00
Alexander Egorenkov	27e9c1de52	net/af_iucv: remove WARN_ONCE on malformed RX packets syzbot reported the following finding: AF_IUCV failed to receive skb, len=0 WARNING: CPU: 0 PID: 522 at net/iucv/af_iucv.c:2039 afiucv_hs_rcv+0x174/0x190 net/iucv/af_iucv.c:2039 CPU: 0 PID: 522 Comm: syz-executor091 Not tainted 5.10.0-rc1-syzkaller-07082-g55027a88ec9f #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: [<00000000b87ea538>] afiucv_hs_rcv+0x178/0x190 net/iucv/af_iucv.c:2039 ([<00000000b87ea534>] afiucv_hs_rcv+0x174/0x190 net/iucv/af_iucv.c:2039) [<00000000b796533e>] __netif_receive_skb_one_core+0x13e/0x188 net/core/dev.c:5315 [<00000000b79653ce>] __netif_receive_skb+0x46/0x1c0 net/core/dev.c:5429 [<00000000b79655fe>] netif_receive_skb_internal+0xb6/0x220 net/core/dev.c:5534 [<00000000b796ac3a>] netif_receive_skb+0x42/0x318 net/core/dev.c:5593 [<00000000b6fd45f4>] tun_rx_batched.isra.0+0x6fc/0x860 drivers/net/tun.c:1485 [<00000000b6fddc4e>] tun_get_user+0x1c26/0x27f0 drivers/net/tun.c:1939 [<00000000b6fe0f00>] tun_chr_write_iter+0x158/0x248 drivers/net/tun.c:1968 [<00000000b4f22bfa>] call_write_iter include/linux/fs.h:1887 [inline] [<00000000b4f22bfa>] new_sync_write+0x442/0x648 fs/read_write.c:518 [<00000000b4f238fe>] vfs_write.part.0+0x36e/0x5d8 fs/read_write.c:605 [<00000000b4f2984e>] vfs_write+0x10e/0x148 fs/read_write.c:615 [<00000000b4f29d0e>] ksys_write+0x166/0x290 fs/read_write.c:658 [<00000000b8dc4ab4>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 Last Breaking-Event-Address: [<00000000b8dc64d4>] __s390_indirect_jump_r14+0x0/0xc Malformed RX packets shouldn't generate any warnings because debugging info already flows to dropmon via the kfree_skb(). Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 20:36:21 -08:00
Jakub Kicinski	c358f95205	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/can/dev.c `b552766c87` ("can: dev: prevent potential information leak in can_fill_info()") `3e77f70e73` ("can: dev: move driver related infrastructure into separate subdir") `0a042c6ec9` ("can: dev: move netlink related code into seperate file") Code move. drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c `57ac4a31c4` ("net/mlx5e: Correctly handle changing the number of queues when the interface is down") `214baf2287` ("net/mlx5e: Support HTB offload") Adjacent code changes net/switchdev/switchdev.c `20776b465c` ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP") `ffb68fc58e` ("net: switchdev: remove the transaction structure from port object notifiers") `bae33f2b5a` ("net: switchdev: remove the transaction structure from port attributes") Transaction parameter gets dropped otherwise keep the fix. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 17:09:31 -08:00
Jakub Kicinski	24a790da0a	mlx5 subfunction support Parav Pandit Says: ================= This patchset introduces support for mlx5 subfunction (SF). A subfunction is a lightweight function that has a parent PCI function on which it is deployed. mlx5 subfunction has its own function capabilities and its own resources. This means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These queues are neither shared nor stolen from the parent PCI function. When a subfunction is RDMA capable, it has its own QP1, GID table and rdma resources neither shared nor stolen from the parent PCI function. A subfunction has a dedicated window in PCI BAR space that is not shared with the other subfunctions or the parent PCI function. This ensures that all class devices of the subfunction access only the assigned PCI BAR space. A subfunction supports eswitch representation through which it supports tc offloads. The user must configure the eswitch to send/receive packets from/to the subfunction port. Subfunctions share PCI level resources such as PCI MSI-X IRQs with other subfunctions and/or with their parent PCI function. 
Patch summary: -------------- Patch 1 to 4 prepare devlink Patch 5 to 7 add mlx5 SF device support Patch 8 to 11 add mlx5 SF devlink port support Patch 12 to 14 add documentation Patch-1 prepares code to handle multiple port function attributes Patch-2 introduces devlink pcisf port flavour similar to pcipf and pcivf Patch-3 adds port add and delete driver callbacks Patch-4 adds port function state get and set callbacks Patch-5 mlx5 vhca event notifier support to distribute subfunction state change notification Patch-6 adds SF auxiliary device Patch-7 adds SF auxiliary driver Patch-8 prepares eswitch to handle SF vport Patch-9 adds eswitch helpers to add/remove SF vport Patch-10 implements devlink port add/del callbacks Patch-11 implements devlink port function get/set callbacks Patch-12 to 14 add documentation Patch-12 adds mlx5 port function documentation Patch-13 adds subfunction documentation Patch-14 adds mlx5 subfunction documentation Subfunction support is discussed in detail in RFC [1] and [2]. RFC [1] and extension [2] describe requirements, design and proposed plumbing using devlink, auxiliary bus and sysfs for systemd/udev support. Functionality of this patchset is best explained using real examples further below. overview: -------- A subfunction can be created and deleted by a user using devlink port add/delete interface. A subfunction can be configured using devlink port function attribute before it is activated. When a subfunction is activated, it results in an auxiliary device on the host PCI device where it is deployed. A driver binds to the auxiliary device that further creates supported class devices. 
example subfunction usage sequence: ----------------------------------- Change device to switchdev mode: $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev Add a devlink port of subfunction flavour: $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 Configure mac address of the port function: $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 Now activate the function: $ devlink port function set ens2f0npf0sf88 state active Now use the auxiliary device and class devices: $ devlink dev show pci/0000:06:00.0 auxiliary/mlx5_core.sf.4 $ ip link show 127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff altname enp6s0f0np0 129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff $ rdma dev show 43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112 44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112 After use inactivate the function: $ devlink port function set ens2f0npf0sf88 state inactive Now delete the subfunction port: $ devlink port del ens2f0npf0sf88 [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/ [2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2 ================= Merge tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux * tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Add devlink subfunction port documentation devlink: Extend devlink port documentation for subfunctions devlink: Add devlink port documentation net/mlx5: SF, Port function state change support net/mlx5: SF, Add port add delete functionality net/mlx5: E-switch, Add eswitch helpers for SF vport net/mlx5: E-switch, Prepare eswitch to handle SF vport net/mlx5: SF, Add auxiliary device driver net/mlx5: SF, Add auxiliary device support net/mlx5: Introduce vhca state event notifier devlink: Support get and set state of port function devlink: Support add and delete devlink port devlink: Introduce PCI SF port flavour and port attribute devlink: Prepare code to fill multiple port function attributes Link: https://lore.kernel.org/r/20210122193658.282884-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 16:57:19 -08:00
Linus Torvalds	909b447dcc	Networking fixes for 5.11-rc6, including fixes from can, xfrm, wireless, wireless-drivers and netfilter trees. Nothing scary, Intel WiFi-related fixes seemed most notable to the users. Current release - regressions: - dsa: microchip: ksz8795: fix KSZ8794 port map again to program the CPU port correctly Current release - new code bugs: - iwlwifi: pcie: reschedule in long-running memory reads Previous releases - regressions: - iwlwifi: dbg: don't try to overwrite read-only FW data - iwlwifi: provide gso_type to GSO packets - octeontx2: make sure the buffer is 128 byte aligned - tcp: make TCP_USER_TIMEOUT accurate for zero window probes - xfrm: fix wraparound in xfrm_policy_addr_delta() - xfrm: fix oops in xfrm_replay_advance_bmp due to a race between CPUs in presence of packet reorder - tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN - wext: fix NULL-ptr-dereference with cfg80211's lack of commit() Previous releases - always broken: - igc: fix link speed advertising - stmmac: configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing - team: protect features update by RCU to avoid deadlock - xfrm: fix disable_xfrm sysctl when used on xfrm interfaces themselves - fec: fix temporary RMII clock reset on link up - can: dev: prevent potential information leak in can_fill_info() Misc: - mrp: fix bad packing of MRP test packet structures - uapi: fix big endian definition of ipv6_rpl_sr_hdr - add David Ahern to IPv4/IPv6 maintainers Signed-off-by: Jakub Kicinski <kuba@kernel.org> Merge tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski. * tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits) rxrpc: Fix memory leak in rxrpc_lookup_local mlxsw: spectrum_span: Do not overwrite policer configuration selftests: forwarding: Specify interface when invoking mausezahn stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing net: usb: cdc_ether: added support for Thales Cinterion PLSx3 modem family. ibmvnic: Ensure that CRQ entry read are correctly ordered MAINTAINERS: add missing header for bonding net: decnet: fix netdev refcount leaking on error path net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP can: dev: prevent potential information leak in can_fill_info() net: fec: Fix temporary RMII clock reset on link up net: lapb: Add locking to the lapb module team: protect features update by RCU to avoid deadlock MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset net/mlx5e: Revert parameters on errors when changing trust state without reset net/mlx5e: Correctly handle changing the number of queues when the interface is down net/mlx5e: Fix CT rule + encap slow path offload and deletion net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled ...	2021-01-28 15:24:43 -08:00
Takeshi Misawa	b8323f7288	rxrpc: Fix memory leak in rxrpc_lookup_local Commit `9ebeddef58` ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record") Then release ref in __rxrpc_put_peer and rxrpc_put_peer_locked. struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp) - peer->local = local; + peer->local = rxrpc_get_local(local); rxrpc_discard_prealloc also needs to release the ref when discarding. syzbot report: BUG: memory leak unreferenced object 0xffff8881080ddc00 (size 256): comm "syz-executor339", pid 8462, jiffies 4294942238 (age 12.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 0a 00 00 00 00 c0 00 08 81 88 ff ff ................ backtrace: [<000000002b6e495f>] kmalloc include/linux/slab.h:552 [inline] [<000000002b6e495f>] kzalloc include/linux/slab.h:682 [inline] [<000000002b6e495f>] rxrpc_alloc_local net/rxrpc/local_object.c:79 [inline] [<000000002b6e495f>] rxrpc_lookup_local+0x1c1/0x760 net/rxrpc/local_object.c:244 [<000000006b43a77b>] rxrpc_bind+0x174/0x240 net/rxrpc/af_rxrpc.c:149 [<00000000fd447a55>] afs_open_socket+0xdb/0x200 fs/afs/rxrpc.c:64 [<000000007fd8867c>] afs_net_init+0x2b4/0x340 fs/afs/main.c:126 [<0000000063d80ec1>] ops_init+0x4e/0x190 net/core/net_namespace.c:152 [<00000000073c5efa>] setup_net+0xde/0x2d0 net/core/net_namespace.c:342 [<00000000a6744d5b>] copy_net_ns+0x19f/0x3e0 net/core/net_namespace.c:483 [<0000000017d3aec3>] create_new_namespaces+0x199/0x4f0 kernel/nsproxy.c:110 [<00000000186271ef>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226 [<000000002de7bac4>] ksys_unshare+0x2fe/0x5c0 kernel/fork.c:2957 [<00000000349b12ba>] __do_sys_unshare kernel/fork.c:3025 [inline] [<00000000349b12ba>] __se_sys_unshare kernel/fork.c:3023 [inline] [<00000000349b12ba>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3023 [<000000006d178ef7>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 [<00000000637076d4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `9ebeddef58` ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record") Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Reported-and-tested-by: syzbot+305326672fed51b205f7@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/161183091692.3506637.3206605651502458810.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 13:12:14 -08:00
Eric Dumazet	bbc20b7042	net: reduce indentation level in sk_clone_lock() Rework initial test to jump over init code if memory allocation has failed. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210127152731.748663-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 11:02:39 -08:00
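The restructuring this commit describes (test the allocation first and jump past the initialisation code on failure, so the init code is no longer nested inside an `if`) looks roughly like the following userspace sketch. `struct conn` and `conn_clone()` are hypothetical names chosen for illustration; this is not the body of sk_clone_lock().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type standing in for a socket. */
struct conn {
	int id;
	char name[16];
};

/* Returns a fresh copy of src with a new id, or NULL if allocation fails. */
struct conn *conn_clone(const struct conn *src)
{
	struct conn *copy;

	assert(src != NULL);

	copy = malloc(sizeof(*copy));
	if (!copy)
		goto out;	/* jump over all init code on failure */

	/* Init code sits at the top indentation level. */
	memcpy(copy, src, sizeof(*copy));
	copy->id = src->id + 1;
out:
	return copy;
}
```

The behaviour is the same as wrapping the init code in `if (copy) { ... }`; only the nesting depth changes.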
Aya Levin	e78ab16459	devlink: Add DMAC filter generic packet trap Add packet trap that can report packets that were dropped due to destination MAC filtering. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 19:53:40 -08:00
Jakub Kicinski	5998dd0217	More updates: * many minstrel improvements, including removal of the old minstrel in favour of minstrel_ht * speed improvements on FQ * support for RX decapsulation (header conversion) offload * RTNL reduction: limit RTNL usage in the wireless stack mostly to where really needed (regulatory not yet) to reduce contention on it * various other small updates Merge tag 'mac80211-next-for-net-next-2021-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next * tag 'mac80211-next-for-net-next-2021-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (24 commits) mac80211: minstrel_ht: fix regression in the max_prob_rate fix virt_wifi: fix deadlock on RTNL cfg80211: avoid holding the RTNL when calling the driver cfg80211: change netdev registration/unregistration semantics mac80211: minstrel_ht: fix rounding error in throughput calculation mac80211: minstrel_ht: increase stats update interval mac80211: minstrel_ht: fix max probability rate selection mac80211: minstrel_ht: improve sample rate selection mac80211: minstrel_ht: improve ampdu length estimation mac80211: minstrel_ht: remove old ewma based rate average code mac80211: remove legacy minstrel rate control mac80211: minstrel_ht: add support for OFDM rates on non-HT clients mac80211: minstrel_ht: clean up CCK code mac80211: introduce aql_enable node in debugfs cfg80211: Add phyrate conversion support for extended MCS in 60GHz band cfg80211: add VHT rate entries for MCS-10 and MCS-11 mac80211: reduce peer HE MCS/NSS to own capabilities mac80211: remove NSS number of 160MHz if not support 160MHz for HE mac80211_hwsim: add 6GHz channels mac80211: add LDPC encoding to ieee80211_parse_tx_radiotap ... Link: https://lore.kernel.org/r/20210127210915.135550-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 19:01:06 -08:00
Jakub Kicinski	df9d80470a	linux-can-next-for-5.12-20210127 Merge tag 'linux-can-next-for-5.12-20210127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2021-01-27 The first two patches are by me and fix typos on the CAN gw protocol and the flexcan driver. The next patch is by Vincent Mailhol and targets the CAN driver infrastructure, it exports the function that converts the CAN state into a human readable string. A patch by me, which target the CAN driver infrastructure, too, makes the calculation in can_fd_len2dlc() more readable. A patch by Tom Rix fixes a checkpatch warning in the mcba_usb driver. The next seven patches target the mcp251xfd driver. Su Yanjun's patch replaces several hardcoded assumptions when calling regmap, by using regmap_get_val_bytes(). The remaining patches are by me. First an open coded check is replaced by an existing helper function, then in the TX path the padding for CAN-FD frames is cleaned up. The next two patches clean up the RTR frame handling in the RX and TX path. Then support for len8_dlc is added. The last patch adds BQL support. 
* tag 'linux-can-next-for-5.12-20210127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: mcp251xfd: add BQL support can: mcp251xfd: add len8_dlc support can: mcp251xfd: mcp251xfd_tx_obj_from_skb(): don't copy data for RTR CAN frames in TX-path can: mcp251xfd: mcp251xfd_hw_rx_obj_to_skb(): don't copy data for RTR CAN frames in RX-path can: mcp251xfd: mcp251xfd_tx_obj_from_skb(): clean up padding of CAN-FD frames can: mcp251xfd: mcp251xfd_start_xmit(): use mcp251xfd_get_tx_free() to check TX is is full can: mcp251xfd: replace sizeof(u32) with val_bytes in regmap can: mcba_usb: remove h from printk format specifier can: length: can_fd_len2dlc(): make legnth calculation readable again can: dev: export can_get_state_str() function can: flexcan: fix typos can: gw: fix typo ==================== Link: https://lore.kernel.org/r/20210127092227.2775573-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 18:53:10 -08:00
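For context on the can_fd_len2dlc() patch mentioned above: CAN FD encodes payload lengths in a 4-bit DLC, where DLC 0 to 8 map to 0 to 8 bytes and DLC 9 to 15 map to 12, 16, 20, 24, 32, 48 and 64 bytes. The sketch below shows the length-to-DLC direction with a simple search. It is an illustration only; `fd_len2dlc()` and `dlc2len` are names made up here, and the kernel helper is implemented differently.

```c
#include <assert.h>

/* Payload size in bytes for each CAN FD DLC value 0..15. */
static const unsigned char dlc2len[16] = {
	0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64
};

/*
 * Hypothetical helper: return the smallest DLC whose payload size can
 * hold len bytes. Lengths above 64 are clamped to the largest DLC (15).
 */
unsigned char fd_len2dlc(unsigned int len)
{
	unsigned char dlc;

	if (len > 64)
		return 15;

	for (dlc = 0; dlc < 16; dlc++)
		if (dlc2len[dlc] >= len)
			break;

	assert(dlc < 16);	/* always true for len <= 64 */
	return dlc;
}
```

For example, a 13-byte payload does not fit DLC 9 (12 bytes), so it is rounded up to DLC 10 (16 bytes), and the frame has to be padded accordingly.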
Hoang Huu Le	2a9063b7ff	tipc: remove duplicated code in tipc_msg_create Remove a duplicate code checking for header size in tipc_msg_create() as it's already being done in tipc_msg_init(). Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20210127025123.6390-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 18:50:07 -08:00
Jakub Kicinski	0f764eec3e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Honor stateful expressions defined in the set from the dynset extension. The set definition provides a stateful expression that must be used by the dynset expression in case it is specified. 2) Missing timeout extension in the set element in the dynset extension leads to inconsistent ruleset listing, not allowing the user to restore timeout and expiration on ruleset reload. 3) Do not dump the stateful expression from the dynset extension if it is coming from the set definition. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nft_dynset: dump expressions when set definition contains no expressions netfilter: nft_dynset: add timeout extension to template netfilter: nft_dynset: honor stateful expressions in set definition ==================== Link: https://lore.kernel.org/r/20210127132512.5472-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:53:46 -08:00
Nikolay Aleksandrov	2dba407f99	net: bridge: multicast: make tracked EHT hosts limit configurable Add two new port attributes which make EHT hosts limit configurable and export the current number of tracked EHT hosts: - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed. Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've increased it to 40 to have space for two more future entries. v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c, no functional change Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
Nikolay Aleksandrov	89268b056e	net: bridge: multicast: add per-port EHT hosts limit Add a default limit of 512 for number of tracked EHT hosts per-port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
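The two bridge commits above describe a per-port limit on tracked EHT hosts with a default of 512, a setter that rejects 0, and a counter of currently tracked hosts. A minimal userspace sketch of that behaviour follows. The struct, the function names and the specific error codes (`-EINVAL`, `-ENOSPC`) are assumptions made for illustration, not the bridge implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EHT_HOSTS_LIMIT_DEFAULT 512	/* default stated in the commit message */

/* Hypothetical per-port state; not the kernel's bridge port structure. */
struct port_eht {
	uint32_t hosts_limit;	/* configurable upper bound */
	uint32_t hosts_cnt;	/* hosts currently tracked */
};

void port_eht_init(struct port_eht *p)
{
	assert(p != NULL);
	p->hosts_limit = EHT_HOSTS_LIMIT_DEFAULT;
	p->hosts_cnt = 0;
}

/* A limit of 0 is not allowed; the error code is an assumption. */
int port_eht_set_limit(struct port_eht *p, uint32_t limit)
{
	if (limit == 0)
		return -EINVAL;
	p->hosts_limit = limit;
	return 0;
}

/* Refuse to track a new host once the limit is reached. */
int port_eht_track_host(struct port_eht *p)
{
	if (p->hosts_cnt >= p->hosts_limit)
		return -ENOSPC;	/* assumed error code */
	p->hosts_cnt++;
	return 0;
}
```

In the kernel the two values are exposed through the netlink attributes named in the commit, IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT and IFLA_BRPORT_MCAST_EHT_HOSTS_CNT; the sketch models only the bookkeeping behind them.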
Vadim Fedorenko	3f96d64497	net: decnet: fix netdev refcount leaking on error path On building the route there is an assumption that the destination could be local. In this case loopback_dev is used to get the address. If the address still cannot be retrieved, dn_route_output_slow returns EADDRNOTAVAIL with the loopback_dev reference taken. Cannot find hash for the fixes tag because this code was introduced a long time ago. I don't think that this bug has ever fired but the patch is done just to have a consistent code base. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1611619334-20955-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:33:46 -08:00
Masahiro Yamada	864e898ba3	net: remove redundant 'depends on NET' These Kconfig files are included from net/Kconfig, inside the if NET ... endif. Remove 'depends on NET', which we know is already met. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125232026.106855-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:04:12 -08:00
Masahiro Yamada	d32f834cd6	net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile CONFIG_NET_L3_MASTER_DEV is a bool option. Change the ifeq conditional to the standard obj-$(CONFIG_NET_L3_MASTER_DEV) form. Use obj-y in net/l3mdev/Makefile because Kbuild visits this Makefile only when CONFIG_NET_L3_MASTER_DEV=y. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-4-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	0cfd99b487	net: switchdev: use obj-$(CONFIG_NET_SWITCHDEV) form in net/Makefile CONFIG_NET_SWITCHDEV is a bool option. Change the ifeq conditional to the standard obj-$(CONFIG_NET_SWITCHDEV) form. Use obj-y in net/switchdev/Makefile because Kbuild visits this Makefile only when CONFIG_NET_SWITCHDEV=y. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-3-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	1e328ed559	net: dcb: use obj-$(CONFIG_DCB) form in net/Makefile CONFIG_DCB is a bool option. Change the ifeq conditional to the standard obj-$(CONFIG_DCB) form. Use obj-y in net/dcb/Makefile because Kbuild visits this Makefile only when CONFIG_DCB=y. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-2-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	8b5f4eb3ab	net: move CONFIG_NET guard to top Makefile When CONFIG_NET is disabled, nothing under the net/ directory is compiled. Move the CONFIG_NET guard to the top Makefile so the net/ directory is entirely skipped. When Kbuild visits net/Makefile, CONFIG_NET is obviously 'y' because CONFIG_NET is a bool option. Clean up net/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	69783429cd	net: sysctl: remove redundant #ifdef CONFIG_NET CONFIG_NET is a bool option, and this file is compiled only when CONFIG_NET=y. Remove #ifdef CONFIG_NET, which we know is always met. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231421.105936-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:02:43 -08:00
Matthieu Baerts	1f2f1931b2	mptcp: pm nl: reduce variable scope To avoid confusions like when working on the previous patch, better to declare and assign this variable only where it is needed. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:53:55 -08:00
Matthieu Baerts	7b9b0f7e12	mptcp: pm nl: support IPv4 mapped in v6 addresses On one side, we can allow the creation of subflows between v4 mapped in v6 and v4 addresses. For that we look for v4mapped addresses between the local address we want to select and the remote one. On the other side, we also properly deal with received v4mapped addresses, either announced ones or set via Netlink. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/122 Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:53:52 -08:00
Matthieu Baerts	50a13bc394	mptcp: support MPJoin with IPv4 mapped in v6 sk With an IPv4 mapped in v6 socket, we were trying to call inet6_bind() with an IPv4 address resulting in a -EINVAL error because the given addr_len -- size of the address structure -- was too short. We now make sure to use address structures for the same family as the MPTCP socket for both the bind() and the connect(). It means we convert v4 addresses to v4 mapped in v6 or the opposite if needed. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/122 Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:53:47 -08:00
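The entry above says addresses are converted between plain IPv4 and IPv4-mapped IPv6 so that they match the family of the MPTCP socket before bind() and connect(). The conversion itself can be sketched with Python's standard `ipaddress` module. The function name `to_socket_family` is hypothetical and this is only an illustration of the mapping, not the kernel implementation.

```python
import ipaddress
import socket


def to_socket_family(addr, family):
    """Return addr as an address object matching the socket family.

    addr is a textual IPv4 or IPv6 address; family is socket.AF_INET
    or socket.AF_INET6.
    """
    ip = ipaddress.ip_address(addr)
    if family == socket.AF_INET6 and ip.version == 4:
        # Build ::ffff:a.b.c.d by placing 0xffff above the 32-bit address.
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    if family == socket.AF_INET and ip.version == 6:
        mapped = ip.ipv4_mapped  # None unless the address is ::ffff:a.b.c.d
        if mapped is None:
            raise ValueError("not an IPv4-mapped address: %s" % ip)
        return mapped
    return ip  # already in the right family
```

A native IPv6 address has no IPv4 equivalent, so the sketch raises instead of guessing; how the kernel handles that case is not described in the commit message.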
Di Zhu	275b1e88ca	pktgen: fix misuse of BUG_ON() in pktgen_thread_worker() pktgen creates threads for all online CPUs and binds each thread to its respective CPU. When a thread is first woken up, it compares the CPU it is currently running on with the CPU specified at creation time; if the two are not equal, BUG_ON() fires and panics the system. Note that these threads can be migrated to other CPUs before they start running, because of CPU hotplug after the threads have been created. So the BUG_ON() used here seems unreasonable, and we can replace it with WARN_ON() to just print a warning rather than panic the system. Signed-off-by: Di Zhu <zhudi21@huawei.com> Link: https://lore.kernel.org/r/20210125124229.19334-1-zhudi21@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:46:37 -08:00
Rasmus Villemoes	20776b465c	net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP It's not true that switchdev_port_obj_notify() only inspects the ->handled field of "struct switchdev_notifier_port_obj_info" if call_switchdev_blocking_notifiers() returns 0 - there's a WARN_ON() triggering for a non-zero return combined with ->handled not being true. But the real problem here is that -EOPNOTSUPP is not being properly handled. The wrapper functions switchdev_handle_port_obj_add() et al change a return value of -EOPNOTSUPP to 0, and the treatment of ->handled in switchdev_port_obj_notify() seems to be designed to change that back to -EOPNOTSUPP in case nobody actually acted on the notifier (i.e., everybody returned -EOPNOTSUPP). Currently, as soon as some device down the stack passes the check_cb() check, ->handled gets set to true, which means that switchdev_port_obj_notify() cannot actually ever return -EOPNOTSUPP. This, for example, means that the detection of hardware offload support in the MRP code is broken: switchdev_port_obj_add() used by br_mrp_switchdev_send_ring_test() always returns 0, so since the MRP code thinks the generation of MRP test frames has been offloaded, no such frames are actually put on the wire. Similarly, br_mrp_switchdev_set_ring_role() also always returns 0, causing mrp->ring_role_offloaded to be set to 1. To fix this, continue to set ->handled true if any callback returns success or any error distinct from -EOPNOTSUPP. But if all the callbacks return -EOPNOTSUPP, make sure that ->handled stays false, so the logic in switchdev_port_obj_notify() can propagate that information. 
Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Fixes: `f30f0601eb` ("switchdev: Add helpers to aid traversal through lower devices") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Link: https://lore.kernel.org/r/20210125124116.102928-1-rasmus.villemoes@prevas.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:41:59 -08:00
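The rule in the switchdev entry above (set `->handled` on success or on any error other than -EOPNOTSUPP, and leave it false when every callback returns -EOPNOTSUPP) can be modelled in a few lines. This is a simplified Python sketch, not the kernel code: `notify_lower_devices` and the list of callables are hypothetical, and stopping at the first real error is an assumption of the model.

```python
import errno

EOPNOTSUPP = errno.EOPNOTSUPP


def notify_lower_devices(callbacks):
    """Return what the caller of the notifier chain would see.

    Each callback stands in for one lower device and returns 0 on
    success or a negative errno value.
    """
    handled = False
    err = 0
    for cb in callbacks:
        rc = cb()
        if rc != -EOPNOTSUPP:
            # Success or a real error: somebody acted on the object.
            handled = True
        if rc != 0 and rc != -EOPNOTSUPP:
            err = rc  # a real error is reported as-is
            break
    if err:
        return err
    # The wrappers turn -EOPNOTSUPP into 0; the handled flag lets the
    # notifier turn it back when nobody acted on the notification.
    return 0 if handled else -EOPNOTSUPP
```

With this rule a caller such as the MRP code can tell "offloaded" (0) apart from "no device supports this" (-EOPNOTSUPP), which is the distinction the commit message says was lost.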
Felix Fietkau	d3b9b45f7e	mac80211: minstrel_ht: fix regression in the max_prob_rate fix Since mi->max_prob_rate is overwritten after the loop that calls minstrel_ht_set_best_prob_rate, the new best rate needs to be written to *dest Fixes: `a7fca4e403` ("mac80211: minstrel_ht: fix max probability rate selection") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210126154409.6755-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-01-27 22:06:38 +01:00
Marc Kleine-Budde	12da7a1f3c	can: gw: fix typo This patch fixes a typo found by codespell. Fixes: `94c23097f9` ("can: gw: support modification of Classical CAN DLCs") Link: https://lore.kernel.org/r/20210127085529.2768537-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-27 10:01:46 +01:00
Praveen Chaudhary	6b2e04bc24	net: allow user to set metric on default route learned via Router Advertisement For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config /etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This patch adds sysctl for the same. Logs: For IPv4: Config in /etc/network/interfaces: auto eth0 iface eth0 inet dhcp metric 4261413864 IPv4 Kernel Route Table: $ ip route list default via 172.21.47.1 dev eth0 metric 4261413864 FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.] Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m i.e. User can prefer Default Router learned via Routing Protocol in IPv4. Similar behavior is not possible for IPv6, without this fix. After fix [for IPv6]: sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489705 IP monitor: [When IPv6 RA is received] default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high Kernel IPv6 routing table $ ip -6 route list default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.]
Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m If the metric is changed later, the effect will be seen only when next IPv6 RA is received, because the default route must be fully controlled by RA msg. Below metric is changed from 1996489705 to 1996489704. $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704 net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704 IP monitor: [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric] Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 18:39:45 -08:00
Xie He	b491e6a739	net: lapb: Add locking to the lapb module In the lapb module, the timers may run concurrently with other code in this module, and there is currently no locking to prevent the code from racing on "struct lapb_cb". This patch adds locking to prevent racing. 1. Add "spinlock_t lock" to "struct lapb_cb"; Add "spin_lock_bh" and "spin_unlock_bh" to APIs, timer functions and notifier functions. 2. Add "bool t1timer_stop, t2timer_stop" to "struct lapb_cb" to make us able to ask running timers to abort; Modify "lapb_stop_t1timer" and "lapb_stop_t2timer" to make them able to abort running timers; Modify "lapb_t2timer_expiry" and "lapb_t1timer_expiry" to make them abort after they are stopped by "lapb_stop_t1timer", "lapb_stop_t2timer", and "lapb_start_t1timer", "lapb_start_t2timer". 3. Let lapb_unregister wait for other API functions and running timers to stop. 4. The lapb_device_event function calls lapb_disconnect_request. In order to avoid trying to hold the lock twice, add a new function named "__lapb_disconnect_request" which assumes the lock is held, and make it called by lapb_disconnect_request and lapb_device_event. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20210126040939.69995-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 17:53:45 -08:00
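Point 2 of the lapb entry above, a stop flag that lets an already-running timer callback abort once it gets the lock, can be illustrated with a small model. This is a Python sketch using `threading.Lock` in place of the kernel spinlock. The class `LapbTimerModel` is hypothetical, and taking the lock inside the start/stop helpers is a simplification made here so the example is self-contained.

```python
import threading


class LapbTimerModel:
    """Toy model of the T1 timer stop flag protected by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.t1timer_stop = True  # no timer armed initially
        self.t1_expiries = 0

    def start_t1timer(self):
        with self.lock:
            self.t1timer_stop = False

    def stop_t1timer(self):
        with self.lock:
            # A callback that is already running but still waiting for
            # the lock will see this flag and abort.
            self.t1timer_stop = True

    def t1timer_expiry(self):
        """Timer callback: do the work only if the timer was not stopped."""
        with self.lock:
            if self.t1timer_stop:
                return False  # stopped while we were about to run
            self.t1_expiries += 1
            return True
```

The point of the flag is that cancelling a timer cannot recall a callback that has already started; checking the flag under the same lock closes that window.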
Nikolay Aleksandrov	3e841bacf7	net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation Fix the messed up indentation in br_multicast_eht_set_entry_lookup(). Fixes: `baa74d39ca` ("net: bridge: multicast: add EHT source set handling functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 16:26:50 -08:00
Jakub Kicinski	c5e9e8d48a	Merge tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of fixes: * fix 160 MHz channel switch in mac80211 * fix a staging driver to not deadlock due to some recent cfg80211 changes * fix NULL-ptr deref if cfg80211 returns -EINPROGRESS to wext (syzbot) * pause TX in mac80211 in type change to prevent crashes (syzbot) * tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: staging: rtl8723bs: fix wireless regulatory API misuse mac80211: pause TX while changing interface type wext: fix NULL-ptr-dereference with cfg80211's lack of commit() mac80211: 160MHz with extended NSS BW in CSA ==================== Link: https://lore.kernel.org/r/20210126130529.75225-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 15:23:18 -08:00
Johannes Berg	054c9939b4	mac80211: pause TX while changing interface type syzbot reported a crash that happened when changing the interface type around a lot, and while it might have been easy to fix just the symptom there, a little deeper investigation found that really the reason is that we allowed packets to be transmitted while in the middle of changing the interface type. Disallow TX by stopping the queues while changing the type. Fixes: `34d4bc4d41` ("mac80211: support runtime interface type changes") Reported-by: syzbot+d7a3b15976bf7de2238a@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20210122171115.b321f98f4d4f.I6997841933c17b093535c31d29355be3c0c39628@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-01-26 11:59:45 +01:00
Johannes Berg	5122565188	wext: fix NULL-ptr-dereference with cfg80211's lack of commit() Since cfg80211 doesn't implement commit, we never really cared about that code there (and it's configured out w/o CONFIG_WIRELESS_EXT). After all, since it has no commit, it shouldn't return -EIWCOMMIT to indicate commit is needed. However, EIWCOMMIT is actually an alias for EINPROGRESS, which _can_ happen if e.g. we try to change the frequency but we're already in the process of connecting to some network, and drivers could return that value (or even cfg80211 itself might). This then causes us to crash because dev->wireless_handlers is NULL but we try to check dev->wireless_handlers->standard[0]. Fix this by also checking dev->wireless_handlers. Also simplify the code a little bit. Cc: stable@vger.kernel.org Reported-by: syzbot+444248c79e117bc99f46@syzkaller.appspotmail.com Reported-by: syzbot+8b2a88a09653d4084179@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20210121171621.2076e4a37d5a.I5d9c72220fe7bb133fb718751da0180a57ecba4e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-01-26 11:59:42 +01:00
Johannes Berg	a05829a722	cfg80211: avoid holding the RTNL when calling the driver Currently, _everything_ in cfg80211 holds the RTNL, and if you have a slow USB device (or a few) you can get some bad lock contention on that. Fix that by re-adding a mutex to each wiphy/rdev as we had at some point, so we have locking for the wireless_dev lists and all the other things in there, and also so that drivers still don't have to worry too much about it (they still won't get parallel calls for a single device). Then, we can restrict the RTNL to a few cases where we add or remove interfaces and really need the added protection. Some of the global list management still also uses the RTNL, since we need to have it anyway for netdev management, but we only hold the RTNL for very short periods of time here. Link: https://lore.kernel.org/r/20210122161942.81df9f5e047a.I4a8e1a60b18863ea8c5e6d3a0faeafb2d45b2f40@changeid Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> [marvell driver issues] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-01-26 11:55:50 +01:00
Jiapeng Zhong	8d21c882ab	bridge: Use PTR_ERR_OR_ZERO instead of if(IS_ERR(...)) + PTR_ERR coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code. Fix the following coccicheck warnings: ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be used. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:23:07 -08:00
Pengcheng Yang	62d9f1a694	tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN Upon receiving a cumulative ACK that changes the congestion state from Disorder to Open, the TLP timer is not set. If the sender is app-limited, it can only wait for the RTO timer to expire and retransmit. The reason for this is that the TLP timer is set before the congestion state changes in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. This commit has two additional benefits: 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to avoid spurious RTO caused by RTO timer early expires. 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder timer is set. Fixes: `df92c8394e` ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Link: https://lore.kernel.org/netdev/1611311242-6675-1-git-send-email-yangpc@wangsu.com Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1611464834-23030-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 21:33:01 -08:00
Alexander Lobakin	36707061d6	udp: allow forwarding of plain (non-fraglisted) UDP GRO packets Commit `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") not only added support for fraglisted UDP GRO, but also tweaked some logic in such a way that non-fraglisted UDP GRO started to work for forwarding too. Commit `2e4ef10f58` ("net: add GSO UDP L4 and GSO fraglists to the list of software-backed types") added GSO UDP L4 to the list of software GSO to allow virtual netdevs to forward them as is up to the real drivers. Tests showed that currently forwarding and NATing of plain UDP GRO packets are performed fully correctly, regardless of whether the target netdevice has support for hardware/driver GSO UDP L4 or not. Add the last element and allow forming plain UDP GRO packets if we are on the forwarding path and the new NETIF_F_GRO_UDP_FWD is enabled on the receiving netdevice. If both NETIF_F_GRO_FRAGLIST and NETIF_F_GRO_UDP_FWD are set, fraglisted GRO takes precedence. This keeps the current behaviour and is generally more optimal for now, as the number of NICs with hardware USO offload is relatively small. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:18:16 -08:00
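The precedence rule stated in the entry above can be written down as a tiny decision function. This Python sketch covers only the forwarding path and only the two feature flags the commit message names. The function name and the string results are hypothetical, and the real GRO code takes more inputs than this.

```python
def forwarding_udp_gro_mode(gro_fraglist, gro_udp_fwd):
    """Pick the UDP GRO flavour for a forwarded flow.

    gro_fraglist models NETIF_F_GRO_FRAGLIST and gro_udp_fwd models
    NETIF_F_GRO_UDP_FWD on the receiving netdevice.
    """
    if gro_fraglist:
        return "fraglist"  # takes precedence when both are set
    if gro_udp_fwd:
        return "plain"  # non-fraglisted UDP GRO, newly allowed
    return "none"  # default: no UDP GRO on the forwarding path
```

Because NETIF_F_GRO_UDP_FWD defaults to off, the third branch corresponds to the unchanged default datapath.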
Alexander Lobakin	6f1c0ea133	net: introduce a netdev feature for UDP GRO forwarding Introduce a new netdev feature, NETIF_F_GRO_UDP_FWD, to allow user to turn UDP GRO on and off for forwarding. Defaults to off to not change current datapath. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:16:24 -08:00
Enke Chen	344db93ae3	tcp: make TCP_USER_TIMEOUT accurate for zero window probes The TCP_USER_TIMEOUT is checked by the 0-window probe timer. As the timer has backoff with a max interval of about two minutes, the actual timeout for TCP_USER_TIMEOUT can be off by up to two minutes. In this patch the TCP_USER_TIMEOUT is made more accurate by taking it into account when computing the timer value for the 0-window probes. This patch is similar to and builds on top of the one that made TCP_USER_TIMEOUT accurate for RTOs in commit `b701a99e43` ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy"). Fixes: `9721e709fa` ("tcp: simplify window probe aborting on USER_TIMEOUT") Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210122191306.GA99540@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 19:32:51 -08:00
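The idea in the entry above, taking TCP_USER_TIMEOUT into account when computing the next zero-window probe interval, amounts to clamping the backed-off interval to the time remaining before the user timeout. A minimal sketch in Python follows, working in milliseconds. The function names, the 1 ms floor, and the parameter layout are assumptions for illustration; the kernel works in jiffies and has its own helper.

```python
MIN_TIMER_MS = 1  # hypothetical floor so the timer still fires


def probe0_interval_ms(base_ms, backoff, max_ms):
    """Exponential backoff for zero-window probes, capped at max_ms."""
    return min(base_ms << backoff, max_ms)


def clamp_to_user_timeout(when_ms, user_timeout_ms, elapsed_ms):
    """Shorten the probe interval so it does not overshoot the user timeout.

    elapsed_ms is the time since probing started. A user timeout of 0
    means the option is unset and the interval is left alone.
    """
    if user_timeout_ms == 0:
        return when_ms
    remaining = max(user_timeout_ms - elapsed_ms, MIN_TIMER_MS)
    return min(when_ms, remaining)
```

Without the clamp, a 30 s user timeout checked only when a probe timer backed off to two minutes fires could be noticed up to two minutes late, which is the inaccuracy the commit message describes.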
Pan Bian	3a30537cee	NFC: fix resource leak when target index is invalid Goto to the label put_dev instead of the label error to fix potential resource leak on path that the target index is invalid. Fixes: `c4fbb6515a` ("NFC: The core part should generate the target index") Signed-off-by: Pan Bian <bianpan2016@163.com> Link: https://lore.kernel.org/r/20210121152748.98409-1-bianpan2016@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:34:35 -08:00
Pan Bian	d8f923c3ab	NFC: fix possible resource leak Put the device to avoid resource leak on path that the polling flag is invalid. Fixes: `a831b91320` ("NFC: Do not return EBUSY when stopping a poll that's already stopped") Signed-off-by: Pan Bian <bianpan2016@163.com> Link: https://lore.kernel.org/r/20210121153745.122184-1-bianpan2016@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:34:31 -08:00
Rasmus Villemoes	6781939054	net: mrp: move struct definitions out of uapi None of these are actually used in the kernel/userspace interface - there's a userspace component of implementing MRP, and userspace will need to construct certain frames to put on the wire, but there's no reason the kernel should provide the relevant definitions in a UAPI header. In fact, some of those definitions were broken until previous commit, so only keep the few that are actually referenced in the kernel code, and move them to the br_private_mrp.h header. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 12:38:42 -08:00
Danielle Ratson	321f7ab0d4	mlxsw: Register physical ports as a devlink resource The switch ASIC has a limited capacity of physical ('flavour physical' in devlink terminology) ports that it can support. While each system is brought up with a different number of ports, this number can be increased via splitting up to the ASIC's limit. Expose physical ports as a devlink resource so that user space will have visibility to the maximum number of ports that can be supported and the current occupancy. In addition, add a "Generic Resources" section in devlink-resource documentation so the different drivers will be aligned by the same resource name when exposing to user space. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:42:13 -08:00
Maxim Mikityanskiy	8327158624	sch_htb: Stats for offloaded HTB This commit adds support for statistics of offloaded HTB. Bytes and packets counters for leaf and inner nodes are supported, the values are taken from per-queue qdiscs, and the numbers that the user sees should have the same behavior as the software (non-offloaded) HTB. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Maxim Mikityanskiy	d03b195b5a	sch_htb: Hierarchical QoS hardware offload HTB doesn't scale well because of contention on a single lock, and it also consumes CPU. This patch adds support for offloading HTB to hardware that supports hierarchical rate limiting. In the offload mode, HTB passes control commands to the driver using ndo_setup_tc. The driver has to replicate the whole hierarchy of classes and their settings (rate, ceil) in the NIC. Every modification of the HTB tree caused by the admin results in ndo_setup_tc being called. After this setup, the HTB algorithm is done completely in the NIC. An SQ (send queue) is created for every leaf class and attached to the hierarchy, so that the NIC can calculate and obey aggregated rate limits, too. In the future, it can be changed, so that multiple SQs will back a single leaf class. ndo_select_queue is responsible for selecting the right queue that serves the traffic class of each packet. The data path works as follows: a packet is classified by clsact, the driver selects a hardware queue according to its class, and the packet is enqueued into this queue's qdisc. This solution addresses two main problems of scaling HTB: 1. Contention by flow classification. Currently the filters are attached to the HTB instance as follows: # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 classid 1:10 It's possible to move classification to clsact egress hook, which is thread-safe and lock-free: # tc filter add dev eth0 egress protocol ip flower dst_port 80 action skbedit priority 1:10 This way classification still happens in software, but the lock contention is eliminated, and it happens before selecting the TX queue, allowing the driver to translate the class to the corresponding hardware queue in ndo_select_queue. Note that this is already compatible with non-offloaded HTB and doesn't require changes to the kernel nor iproute2. 2. Contention by handling packets. 
HTB is not multi-queue, it attaches to a whole net device, and handling of all packets takes the same lock. When HTB is offloaded, it registers itself as a multi-queue qdisc, similarly to mq: HTB is attached to the netdev, and each queue has its own qdisc. Some features of HTB may be not supported by some particular hardware, for example, the maximum number of classes may be limited, the granularity of rate and ceil parameters may be different, etc. - so, the offload is not enabled by default, a new parameter is used to enable it: # tc qdisc replace dev eth0 root handle 1: htb offload Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00

63792 Commits