linux/net/sched
Eric Dumazet 95ecba62e2 net: fix races in netdev_tx_sent_queue()/dev_watchdog()
Some workloads hit the infamous dev_watchdog() message:

"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"

It seems possible to hit this even for perfectly normal
BQL-enabled drivers:

1) Assume a TX queue was idle for more than dev->watchdog_timeo
   (5 seconds unless changed by the driver)

2) Assume a big packet is sent, exceeding current BQL limit.

3) Driver ndo_start_xmit() puts the packet in TX ring,
   and netdev_tx_sent_queue() is called.

4) __QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
   before txq->trans_start has been written.

5) txq->trans_start is written later, from netdev_start_xmit():

    if (rc == NETDEV_TX_OK)
            txq_trans_update(txq);

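dev_watchdog() running on another CPU could read the old
txq->trans_start, and then see __QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet. The queue then looks stopped for longer than
dev->watchdog_timeo, and a spurious timeout is reported.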
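
The interleaving in steps 1) to 5) can be replayed deterministically in a small user-space model. This is only a sketch: `struct model_txq`, the `model_*` helpers, and the tick values are hypothetical stand-ins, not the kernel's `struct netdev_queue` or its real code paths.

```c
#include <stdbool.h>

/* Hypothetical simplified model; not the kernel's struct netdev_queue. */
struct model_txq {
	unsigned long trans_start;	/* tick of the last recorded transmit */
	bool stack_xoff;		/* stands in for the STACK_XOFF state bit */
};

/* Watchdog check: queue is stopped and trans_start is older than timeo. */
static bool model_watchdog_fires(const struct model_txq *q,
				 unsigned long now, unsigned long timeo)
{
	return q->stack_xoff && (now - q->trans_start) > timeo;
}

/* Step 4: the XOFF bit is set while trans_start is still stale. */
static void model_sent_queue_old(struct model_txq *q)
{
	q->stack_xoff = true;
}

/* Step 5: trans_start is refreshed only after the transmit returned OK. */
static void model_trans_update(struct model_txq *q, unsigned long now)
{
	q->trans_start = now;
}

/*
 * Replays the sequence for a queue that was idle for `idle` ticks and
 * returns true if a watchdog running between steps 4 and 5 would report
 * a timeout.
 */
static bool model_race_window(unsigned long idle, unsigned long timeo)
{
	struct model_txq q = { .trans_start = 0, .stack_xoff = false };
	unsigned long now = idle;
	bool fired;

	model_sent_queue_old(&q);			/* step 4 */
	fired = model_watchdog_fires(&q, now, timeo);	/* watchdog on another CPU */
	model_trans_update(&q, now);			/* step 5, too late */
	return fired;
}
```

In this model, `model_race_window(6000, 5000)` reports a timeout even though the packet was queued at the very same tick, while `model_race_window(1000, 5000)` does not, which matches the precondition in step 1).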

To solve the issue, write txq->trans_start right before an XOFF bit
is set:

- __QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()

From dev_watchdog(), we have to read txq->state before txq->trans_start.

Add memory barriers to enforce correct ordering.
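
The ordering described above can be sketched in user space with C11 atomics. This is an illustration under stated assumptions, not the kernel patch: `struct sketch_txq`, `SK_XOFF_BIT`, and the `sketch_*` functions are hypothetical, and C11 fences stand in for whatever barrier primitives the kernel actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SK_XOFF_BIT 1u	/* hypothetical stand-in for an XOFF state bit */

/* Hypothetical model of the two fields involved in the race. */
struct sketch_txq {
	_Atomic unsigned long trans_start;
	_Atomic unsigned int state;
};

/* Writer side: publish trans_start first, then set the XOFF bit. */
static void sketch_stop_queue(struct sketch_txq *q, unsigned long now)
{
	atomic_store_explicit(&q->trans_start, now, memory_order_relaxed);
	/* Keep the trans_start store ordered before the state update. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&q->state, SK_XOFF_BIT, memory_order_relaxed);
}

/* Reader side: read state first, and trans_start only afterwards. */
static bool sketch_watchdog_timed_out(struct sketch_txq *q,
				      unsigned long now, unsigned long timeo)
{
	unsigned int state;
	unsigned long start;

	state = atomic_load_explicit(&q->state, memory_order_relaxed);
	if (!(state & SK_XOFF_BIT))
		return false;	/* queue not stopped, nothing to check */

	/* Keep the state load ordered before the trans_start load. */
	atomic_thread_fence(memory_order_acquire);
	start = atomic_load_explicit(&q->trans_start, memory_order_relaxed);
	return (now - start) > timeo;
}
```

With this pairing, a reader that observes the XOFF bit also observes a trans_start at least as recent as the one written just before the bit was set, so a queue stopped at tick 6000 is not reported as timed out when checked at tick 6000 with a 5000-tick timeout.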

In the future, we could avoid writing over txq->trans_start for normal
operations, and rename this field to txq->xoff_start_time.

Fixes: bec251bc8b ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21 12:54:25 +02:00
..
act_api.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-06-20 13:49:59 -07:00
act_bpf.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
act_connmark.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_csum.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_ct.c sched: act_ct: avoid -Wflex-array-member-not-at-end warning 2024-08-12 17:54:24 -07:00
act_ctinfo.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_gact.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_gate.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_ife.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-02-22 15:29:26 -08:00
act_mpls.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_nat.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_pedit.c net: sched: Annotate struct tc_pedit with __counted_by 2024-02-19 10:58:24 +00:00
act_police.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_sample.c net: sched: act_sample: add action cookie to sample 2024-07-05 17:45:47 -07:00
act_simple.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_skbedit.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
act_skbmod.c net/sched: act_skbmod: convert comma to semicolon 2024-07-11 17:12:15 -07:00
act_tunnel_key.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
act_vlan.c tc: adjust network header after 2nd vlan push 2024-08-27 11:37:42 +02:00
cls_api.c net: sched: cls_api: fix slab-use-after-free in fl_dump_key 2024-04-10 08:28:26 +01:00
cls_basic.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_bpf.c net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-05-23 14:14:23 -07:00
cls_cgroup.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_flow.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_flower.c net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK 2024-07-15 09:14:39 -07:00
cls_fw.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_matchall.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_route.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
cls_u32.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
em_canid.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
em_cmp.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
em_ipset.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_ipt.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_meta.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
em_nbyte.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
em_text.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
em_u32.c net: fill in MODULE_DESCRIPTION()s for net/sched 2024-02-09 14:12:02 -08:00
ematch.c net_sched: reject TCF_EM_SIMPLE case for complex ematch module 2022-12-19 09:43:18 +00:00
Kconfig net: sched: Remove NET_ACT_IPT from Kconfig 2024-02-13 11:24:35 +01:00
Makefile net/sched: Retire ipt action 2024-01-02 12:41:16 +00:00
sch_api.c net/sched: accept TCA_STAB only for root qdisc 2024-10-08 15:38:56 -07:00
sch_blackhole.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_cake.c sch_cake: constify inverse square root cache 2024-09-10 18:31:52 -07:00
sch_cbs.c net_sched: sch_cbs: implement lockless cbs_dump() 2024-04-19 11:34:07 +01:00
sch_choke.c net_sched: sch_choke: implement lockless choke_dump() 2024-04-19 11:34:07 +01:00
sch_codel.c net_sched: sch_codel: implement lockless codel_dump() 2024-04-19 11:34:07 +01:00
sch_drr.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_etf.c net_sched: sch_tfs: implement lockless etf_dump() 2024-04-19 11:34:07 +01:00
sch_ets.c net_sched: sch_ets: implement lockless ets_dump() 2024-04-19 11:34:07 +01:00
sch_fifo.c net_sched: sch_fifo: implement lockless __fifo_dump() 2024-04-19 11:34:07 +01:00
sch_fq_codel.c net_sched: sch_fq_codel: implement lockless fq_codel_dump() 2024-04-19 11:34:07 +01:00
sch_fq_pie.c net_sched: sch_fq_pie: implement lockless fq_pie_dump() 2024-04-19 11:34:07 +01:00
sch_fq.c net_sched: sch_fq: fix incorrect behavior for small weights 2024-08-27 08:20:45 -07:00
sch_frag.c net: dst: remove unnecessary input parameter in dst_alloc and dst_init 2023-09-12 11:42:25 +02:00
sch_generic.c net: fix races in netdev_tx_sent_queue()/dev_watchdog() 2024-10-21 12:54:25 +02:00
sch_gred.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_hfsc.c net_sched: sch_hfsc: implement lockless accesses to q->defcls 2024-04-19 11:34:08 +01:00
sch_hhf.c net_sched: sch_hhf: implement lockless hhf_dump() 2024-04-19 11:34:08 +01:00
sch_htb.c net/sched: fix false lockdep warning on qdisc root lock 2024-04-26 10:46:41 +02:00
sch_ingress.c bpf: Fix too early release of tcx_entry 2024-07-08 14:07:31 -07:00
sch_mq.c net: sched: add rcu annotations around qdisc->qdisc_sleeping 2023-06-07 10:25:39 +01:00
sch_mqprio_lib.c net: sched: Fill in missing MODULE_DESCRIPTION for qdiscs 2023-11-01 21:49:09 -07:00
sch_mqprio_lib.h net/sched: mqprio: allow per-TC user input of FP adminStatus 2023-04-13 22:22:10 -07:00
sch_mqprio.c netlink: introduce type-checking attribute iteration 2024-03-29 15:06:02 -07:00
sch_multiq.c net: sched: sch_multiq: fix possible OOB write in multiq_tune() 2024-06-05 10:50:19 +01:00
sch_netem.c sch/netem: fix use after free in netem_dequeue 2024-09-03 11:44:23 -07:00
sch_pie.c net_sched: sch_pie: implement lockless pie_dump() 2024-04-19 11:34:08 +01:00
sch_plug.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_prio.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_qfq.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_red.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_sfb.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_sfq.c net_sched: sch_sfq: annotate data-races around q->perturb_period 2024-05-02 19:01:35 -07:00
sch_skbprio.c net_sched: sch_skbprio: implement lockless skbprio_dump() 2024-04-19 11:34:08 +01:00
sch_taprio.c net: sched: consistently use rcu_replace_pointer() in taprio_change() 2024-09-08 11:18:57 +01:00
sch_tbf.c net/sched: Add module aliases for cls_,sch_,act_ modules 2024-02-02 10:57:55 -08:00
sch_teql.c net: annotate writes on dev->mtu from ndo_change_mtu() 2024-05-07 16:19:14 -07:00