linux/net
Yunsheng Lin 102b55ee92 net: sched: fix tx action rescheduling issue during deactivation
Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
qdisc before calling __qdisc_run(), which ultimately clear the
STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
is set before clearing STATE_MISSED, there may be rescheduling
of net_tx_action() at the end of qdisc_run_end(), see below:

CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
          .                   .                     .
          .            set STATE_MISSED             .
          .           __netif_schedule()            .
          .                   .           set STATE_DEACTIVATED
          .                   .                qdisc_reset()
          .                   .                     .
          .<---------------   .              synchronize_net()
clear __QDISC_STATE_SCHED  |  .                     .
          .                |  .                     .
          .                |  .            some_qdisc_is_busy()
          .                |  .               return *false*
          .                |  .                     .
  test STATE_DEACTIVATED   |  .                     .
__qdisc_run() *not* called |  .                     .
          .                |  .                     .
   test STATE_MISS         |  .                     .
 __netif_schedule()--------|  .                     .
          .                   .                     .
          .                   .                     .

__qdisc_run() is not called by net_tx_atcion() in CPU0 because
CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
is called at the end of qdisc_run_end(), causing tx action
rescheduling problem.

qdisc_run() called by net_tx_action() runs in the softirq context,
which should has the same semantic as the qdisc_run() called by
__dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
synchronize_net() between STATE_DEACTIVATED flag being set and
qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
bail out for the deactived lockless qdisc in net_tx_action(), and
qdisc_reset() will reset all skb not dequeued yet.

So add the rcu_read_lock() explicitly to protect the qdisc_run()
and do the STATE_DEACTIVATED checking in net_tx_action() before
calling qdisc_run_begin(). Another option is to do the checking in
the qdisc_run_end(), but it will add unnecessary overhead for
non-tx_action case, because __dev_queue_xmit() will not see qdisc
with STATE_DEACTIVATED after synchronize_net(), the qdisc with
STATE_DEACTIVATED can only be seen by net_tx_action() because of
__netif_schedule().

The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
between net_tx_action() and qdisc_reset(), see:
commit d518d2ed86 ("net/sched: fix race between deactivation
and dequeue for NOLOCK qdisc"). As the bailout added above for
deactived lockless qdisc in net_tx_action() provides better
protection for the race without calling qdisc_run() at all, so
remove the STATE_DEACTIVATED checking in qdisc_run().

After qdisc_reset(), there is no skb in qdisc to be dequeued, so
clear the STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has
    avoid the endless rescheduling problem, but there may still
    be a unnecessary rescheduling, so adjust the commit log.
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:05:46 -07:00
..
6lowpan 6lowpan: Fix some typos in nhc_udp.c 2021-03-24 17:52:11 -07:00
9p net: 9p: Correct function names in the kerneldoc comments 2021-03-28 17:56:56 -07:00
802
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-26 12:00:00 -07:00
appletalk appletalk: Fix skb allocation size in loopback case 2021-02-12 16:40:28 -08:00
atm net: atm: pppoatm: use new API for wakeup tasklet 2021-01-29 18:24:05 -08:00
ax25 net/ax25: Delete obsolete TODO file 2021-03-30 16:54:50 -07:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-09 20:48:35 -07:00
bluetooth Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
bpf bpf: selftests: Add kfunc_call test 2021-03-26 20:41:52 -07:00
bpfilter net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
bridge bridge: Fix possible races between assigning rx_handler_data and setting IFF_BRIDGE_PORT bit 2021-04-29 15:33:17 -07:00
caif net: caif: Use netif_rx_any_context(). 2021-02-15 13:21:48 -08:00
can can: isotp: prevent race between isotp_bind() and isotp_setsockopt() 2021-05-12 08:52:47 +02:00
ceph Notable items here are a series to take advantage of David Howells' 2021-05-06 10:27:02 -07:00
core net: sched: fix tx action rescheduling issue during deactivation 2021-05-14 15:05:46 -07:00
dcb net: dcb: use obj-$(CONFIG_DCB) form in net/Makefile 2021-01-27 17:03:52 -08:00
dccp net: dccp: use net_generic storage 2021-04-09 16:34:56 -07:00
decnet net/decnet: Delete obsolete TODO file 2021-03-30 16:54:50 -07:00
dns_resolver net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
dsa net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count 2021-05-10 14:36:59 -07:00
ethernet of: net: pass the dst buffer to of_get_mac_address() 2021-04-13 14:35:02 -07:00
ethtool ethtool: fix missing NLM_F_MULTI flag when dumping 2021-05-05 12:41:10 -07:00
hsr net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info 2021-05-03 13:33:54 -07:00
ieee802154 net: remove the new_ifindex argument from dev_change_net_namespace 2021-04-07 14:43:28 -07:00
ife net: remove redundant 'depends on NET' 2021-01-27 17:04:12 -08:00
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2021-05-11 16:05:56 -07:00
ipv6 net: Remove redundant assignment to err 2021-04-29 15:34:15 -07:00
iucv iucv: af_iucv.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
kcm kcm: kcmsock.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
key af_key: relax availability checks for skb size calculation 2021-01-04 10:05:50 +01:00
l2tp net: fix a concurrency bug in l2tp_tunnel_register() 2021-04-27 14:23:13 -07:00
l3mdev l3mdev: Correct function names in the kerneldoc comments 2021-03-28 17:56:55 -07:00
lapb net: lapb: Make "lapb_t1timer_running" able to detect an already running timer 2021-03-23 14:14:50 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 mac80211: extend protection against mixed key and fragment cache attacks 2021-05-11 20:14:50 +02:00
mac802154 net: mac802154: Fix general protection fault 2021-04-06 22:42:16 +02:00
mpls mpls: Remove redundant assignment to err 2021-04-27 14:17:00 -07:00
mptcp mptcp: fix data stream corruption 2021-05-11 16:19:17 -07:00
ncsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-09 20:48:35 -07:00
netfilter netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version 2021-05-14 01:42:52 +02:00
netlabel Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
netlink netlink: don't call ->netlink_bind with table lock held 2021-04-16 17:01:04 -07:00
netrom net: netrom: nr_in: Remove redundant assignment to ns 2021-04-28 13:59:08 -07:00
nfc net/nfc/rawsock.c: fix a permission check bug 2021-05-10 14:21:02 -07:00
nsh
openvswitch openvswitch: meter: fix race when getting now_ms. 2021-05-13 15:54:59 -07:00
packet net: packetmmap: fix only tx timestamp on request 2021-05-12 14:00:04 -07:00
phonet
psample psample: Add additional metadata attributes 2021-03-14 15:00:43 -07:00
qrtr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-04-26 12:00:00 -07:00
rds RDMA merge window pull request 2021-05-01 09:15:05 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose net: rose: Fix fall-through warnings for Clang 2021-03-10 12:45:15 -08:00
rxrpc Networking changes for 5.13. 2021-04-29 11:57:23 -07:00
sched net: sched: fix tx action rescheduling issue during deactivation 2021-05-14 15:05:46 -07:00
sctp sctp: delay auto_asconf init until binding the first addr 2021-05-03 13:36:21 -07:00
smc smc: disallow TCP_ULP in smc_setsockopt() 2021-05-05 12:52:45 -07:00
strparser
sunrpc NFS client updates for Linux 5.13 2021-05-07 11:23:41 -07:00
switchdev net: bridge: propagate extack through switchdev_port_attr_set 2021-02-14 17:38:11 -08:00
tipc Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv" 2021-05-14 15:01:58 -07:00
tls tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT 2021-05-14 15:03:25 -07:00
unix af_unix: handle idmapped mounts 2021-01-24 14:27:18 +01:00
vmw_vsock vsock/vmci: Remove redundant assignment to err 2021-04-30 15:00:59 -07:00
wireless cfg80211: mitigate A-MSDU aggregation attacks 2021-05-11 20:13:13 +02:00
x25 af_x25.c: Fix a spello 2021-03-28 17:31:13 -07:00
xdp xsk: Fix for xp_aligned_validate_desc() when len == chunk_size 2021-05-04 00:28:06 +02:00
xfrm xfrm: ipcomp: remove unnecessary get_cpu() 2021-04-19 12:49:29 +02:00
compat.c
devres.c
Kconfig bpf, kconfig: Add consolidated menu entry for bpf with core options 2021-05-11 13:56:16 -07:00
Makefile net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile 2021-01-27 17:03:52 -08:00
socket.c net: Fix a misspell in socket.c 2021-03-25 16:56:27 -07:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00