linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 08:14:15 +08:00

Yunsheng Lin c4fef01ba4 net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc		2021-06-23 12:17:35 -07:00

Currently pfifo_fast has both the TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK flags set, but queue discipline bypass does not work for a lockless qdisc because the skb is always enqueued to the qdisc, even when the qdisc is empty; see __dev_xmit_skb().

This patch calls sch_direct_xmit() to transmit the skb directly to the driver when a lockless qdisc is empty, which avoids the enqueue and dequeue operations.

qdisc->empty is not a reliable indicator of an empty qdisc, because there is a time window between enqueuing and setting qdisc->empty. So we use the MISSED state added in commit `a90c57f2ce` ("net: sched: fix packet stuck problem for lockless qdisc"). That state indicates lock contention, which suggests it is better not to bypass the qdisc, in order to avoid packet reordering.

To make the MISSED state a reliable indicator of an empty qdisc, testing and clearing of the MISSED state must happen under the protection of qdisc->seqlock; only setting the MISSED state may be done without that protection. A test of the MISSED state is also added outside the protection of qdisc->seqlock, to avoid an unnecessary spin_trylock() in the contended case.

As enqueuing is not done under the protection of qdisc->seqlock, there is still a potential data race, as mentioned by Jakub [1]:

      thread1                   thread2                   thread3
qdisc_run_begin() # true
                          qdisc_run_begin(q)
                              set(MISSED)
pfifo_fast_dequeue
  clear(MISSED)
  # recheck the queue
qdisc_run_end()
                              enqueue skb1
                                                    qdisc empty # true
                                                    qdisc_run_begin() # true
                                                    sch_direct_xmit() # skb2
                          qdisc_run_begin()
                              set(MISSED)

When the above happens, skb1 enqueued by thread2 is transmitted after skb2 is transmitted by thread3, because setting the MISSED state and enqueuing are not done under qdisc->seqlock. If qdisc bypass is disabled, skb1 has a better chance of being transmitted before skb2.

This patch does not handle the above data race, because we view it as similar to the following: even if CPU1 and CPU2 write an skb at the same time to two sockets that both lead to the same qdisc, there is no guarantee which skb will hit the qdisc first, because many factors such as interrupts, softirqs, cache misses and scheduling affect that.

The following cases need special handling:

1. When the MISSED state is cleared before another round of dequeuing in pfifo_fast_dequeue(), __qdisc_run() might not be able to dequeue all skbs in one round and might call __netif_schedule(), which could result in a non-empty qdisc without MISSED set. To avoid this, the MISSED state is set for a lockless qdisc and __netif_schedule() is called at the end of qdisc_run_end().

2. For a similar reason, the MISSED state also needs to be set for a lockless qdisc when requeuing an skb, instead of calling __netif_schedule() directly.

3. When the netdev queue is stopped, the MISSED state needs to be cleared, otherwise there may be unnecessary __netif_schedule() calls. So a new DRAINING state is added to indicate this case, which also indicates a non-empty qdisc.

4. There is already a netif_xmit_frozen_or_stopped() check in dequeue_skb() and in sch_direct_xmit(), both under the protection of qdisc->seqlock, but the same check in __dev_xmit_skb() is done without that protection, which might make the empty indication of a lockless qdisc unreliable. So remove the check in __dev_xmit_skb(); the checks under the protection of qdisc->seqlock seem to be enough to avoid the CPU consumption problem when the netdev queue is stopped.

[1] https://lkml.org/lkml/2021/5/29/215

Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
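
The bypass decision described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code from the patch: every name prefixed with `model_` or `MODEL_` is hypothetical, qdisc->seqlock is modelled as a C11 atomic boolean used as a trylock, and __netif_schedule() is reduced to a counter. It shows three ideas from the message: the qdisc is treated as empty only when neither MISSED nor DRAINING is set, a contended CPU sets MISSED and gives up if it was already set, and the empty test is repeated once the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the qdisc state bits; not kernel definitions. */
enum {
	MODEL_STATE_MISSED   = 1u << 0, /* some CPU failed to take the lock */
	MODEL_STATE_DRAINING = 1u << 1, /* netdev queue stopped, qdisc not empty */
};
#define MODEL_STATE_NON_EMPTY (MODEL_STATE_MISSED | MODEL_STATE_DRAINING)

enum model_path { MODEL_BYPASS, MODEL_ENQUEUE };

struct model_qdisc {
	atomic_bool seqlock;      /* models qdisc->seqlock as a trylock */
	atomic_uint state;        /* MISSED / DRAINING bits */
	unsigned int reschedules; /* counts modelled __netif_schedule() calls */
};

static void model_init(struct model_qdisc *q)
{
	atomic_init(&q->seqlock, false);
	atomic_init(&q->state, 0);
	q->reschedules = 0;
}

static bool model_trylock(struct model_qdisc *q)
{
	/* The lock is taken only if it was previously free. */
	return !atomic_exchange(&q->seqlock, true);
}

static bool model_is_empty(struct model_qdisc *q)
{
	return !(atomic_load(&q->state) & MODEL_STATE_NON_EMPTY);
}

static bool model_run_begin(struct model_qdisc *q)
{
	if (model_trylock(q))
		return true;
	/* Contended: if MISSED was already set, do not retry the lock. */
	if (atomic_fetch_or(&q->state, MODEL_STATE_MISSED) & MODEL_STATE_MISSED)
		return false;
	/* Either we get the lock now, or the holder sees MISSED on release. */
	return model_trylock(q);
}

static void model_run_end(struct model_qdisc *q)
{
	assert(atomic_load(&q->seqlock)); /* caller must hold the lock */
	atomic_store(&q->seqlock, false);
	if (atomic_load(&q->state) & MODEL_STATE_MISSED)
		q->reschedules++; /* stands in for __netif_schedule() */
}

/* Decide whether a packet may skip the queue in this simplified model. */
static enum model_path model_xmit(struct model_qdisc *q)
{
	if (model_is_empty(q) && model_run_begin(q)) {
		/* Retest under the lock before trusting the empty indication. */
		enum model_path path =
			model_is_empty(q) ? MODEL_BYPASS : MODEL_ENQUEUE;
		model_run_end(q);
		return path;
	}
	return MODEL_ENQUEUE;
}
```

In this model an idle qdisc yields `MODEL_BYPASS`, while a qdisc with MISSED or DRAINING set, or one whose lock is held by another caller, falls back to `MODEL_ENQUEUE`. The model is single-threaded in spirit and does not reproduce the kernel's memory-ordering requirements or the enqueue race shown in the thread diagram.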
..
6lowpan	6lowpan: Fix some typos in nhc_udp.c	2021-03-24 17:52:11 -07:00
9p	9p/trans_virtio: Fix spelling mistakes	2021-06-02 14:01:55 -07:00
802
8021q	net: vlan: pass thru all GSO_SOFTWARE in hw_enc_features	2021-06-18 11:58:03 -07:00
appletalk	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
atm	atm: Use list_for_each_entry() to simplify code in resources.c	2021-06-10 14:08:09 -07:00
ax25	net/ax25: Delete obsolete TODO file	2021-03-30 16:54:50 -07:00
batman-adv	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
bluetooth	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
bpf	bpf: Prepare bpf syscall to be used from kernel and user space.	2021-05-19 00:33:40 +02:00
bpfilter	net: remove redundant 'depends on NET'	2021-01-27 17:04:12 -08:00
bridge	bridge: cfm: remove redundant return	2021-06-22 10:35:15 -07:00
caif	net: caif: modify the label out_err to out	2021-06-18 12:07:09 -07:00
can	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
ceph	libceph: Fix spelling mistakes	2021-06-03 13:24:23 -07:00
core	net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc	2021-06-23 12:17:35 -07:00
dcb	net: dcb: Return the correct errno code	2021-06-01 17:01:33 -07:00
dccp	dccp: tfrc: fix doc warnings in tfrc_equation.c	2021-06-10 14:08:49 -07:00
decnet	decnet: Fix spelling mistakes	2021-06-02 14:01:55 -07:00
dns_resolver	net: remove redundant 'depends on NET'	2021-01-27 17:04:12 -08:00
dsa	net: dsa: remove cross-chip support from the MRP notifiers	2021-06-21 12:50:20 -07:00
ethernet	of: net: pass the dst buffer to of_get_mac_address()	2021-04-13 14:35:02 -07:00
ethtool	ethtool: Validate module EEPROM offset as part of policy	2021-06-22 10:40:54 -07:00
hsr	net: hsr: don't check sequence number if tag removal is offloaded	2021-06-16 12:13:01 -07:00
ieee802154	ieee802154: fix error return code in ieee802154_llsec_getparams()	2021-06-03 10:59:49 +02:00
ife	net: remove redundant 'depends on NET'	2021-01-27 17:04:12 -08:00
ipv4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
ipv6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
iucv	net/af_iucv: clean up some forward declarations	2021-06-12 13:06:33 -07:00
kcm	revert "net: kcm: fix memory leak in kcm_sendmsg"	2021-06-07 13:34:37 -07:00
key
l2tp	l2tp: Fix spelling mistakes	2021-06-07 14:08:30 -07:00
l3mdev	l3mdev: Correct function names in the kerneldoc comments	2021-03-28 17:56:55 -07:00
lapb	net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c	2021-06-08 16:31:25 -07:00
llc	llc2: Remove redundant assignment to rc	2021-04-27 14:16:14 -07:00
mac80211	mac80211: handle various extensible elements correctly	2021-06-18 13:25:49 +02:00
mac802154	net: mac802154: Fix general protection fault	2021-04-06 22:42:16 +02:00
mpls	mpls: Remove redundant assignment to err	2021-04-27 14:17:00 -07:00
mptcp	mptcp: refine mptcp_cleanup_rbuf	2021-06-22 14:36:01 -07:00
ncsi	net/ncsi: Fix spelling mistakes	2021-06-07 14:08:30 -07:00
netfilter	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
netlabel	netlabel: Fix memory leak in netlbl_mgmt_add_common	2021-06-15 11:19:04 -07:00
netlink	netlink: disable IRQs for netlink_lock_table()	2021-05-17 15:31:03 -07:00
netrom	net: netrom: nr_in: Remove redundant assignment to ns	2021-04-28 13:59:08 -07:00
nfc	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net	2021-06-07 13:01:52 -07:00
nsh
openvswitch	openvswitch: add trace points	2021-06-22 10:47:32 -07:00
packet	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
phonet
psample	psample: Add additional metadata attributes	2021-03-14 15:00:43 -07:00
qrtr	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
rds	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-06-18 19:47:02 -07:00
rfkill	Another set of updates, all over the map:	2021-04-20 16:44:04 -07:00
rose	net: rose: Fix fall-through warnings for Clang	2021-03-10 12:45:15 -08:00
rxrpc	rxrpc: Fix a typo	2021-06-02 14:01:55 -07:00
sched	net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc	2021-06-23 12:17:35 -07:00
sctp	sctp: process sctp over udp icmp err on sctp side	2021-06-22 11:28:52 -07:00
smc	net/smc: Fix ENODATA tests in smc_nl_get_fback_stats()	2021-06-21 12:16:58 -07:00
strparser
sunrpc	xprtrdma: Revert `586a0787ce`	2021-05-27 08:46:19 -04:00
switchdev	net: bridge: propagate extack through switchdev_port_attr_set	2021-02-14 17:38:11 -08:00
tipc	tipc:subscr.c: fix a spelling mistake	2021-06-10 13:48:43 -07:00
tls	skbuff: add a parameter to __skb_frag_unref	2021-06-07 14:11:47 -07:00
unix	__unix_find_socket_byname(): don't pass hash and type separately	2021-06-21 12:28:49 -07:00
vmw_vsock	virtio/vsock: avoid NULL deref in virtio_transport_seqpacket_allow()	2021-06-22 09:49:37 -07:00
wireless	cfg80211: avoid double free of PMSR request	2021-06-18 13:25:24 +02:00
x25	net: x25: Use list_for_each_entry() to simplify code in x25_route.c	2021-06-10 14:08:09 -07:00
xdp	xdp: Extend xdp_redirect_map with broadcast support	2021-05-26 09:46:16 +02:00
xfrm	xfrm: ipcomp: remove unnecessary get_cpu()	2021-04-19 12:49:29 +02:00
compat.c	net: Return the correct errno code	2021-06-03 15:13:56 -07:00
devres.c	net: devres: Correct a grammatical error	2021-06-11 12:55:28 -07:00
Kconfig	bpf, kconfig: Add consolidated menu entry for bpf with core options	2021-05-11 13:56:16 -07:00
Makefile	net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile	2021-01-27 17:03:52 -08:00
socket.c	net: add pf_family_names[] for protocol family	2021-06-21 14:41:54 -07:00
sysctl_net.c	net: Ensure net namespace isolation of sysctls	2021-04-12 13:27:11 -07:00