linux/include
Daniel Borkmann 230bb13650 bpf: Fix too early release of tcx_entry
[ Upstream commit 1cb6f0bae5 ]

Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported
an issue that the tcx_entry can be released too early leading to a use
after free (UAF) when an active old-style ingress or clsact qdisc with a
shared tc block is later replaced by another ingress or clsact instance.

Essentially, the sequence to trigger the UAF (one example) can be as follows:

  1. A network namespace is created
  2. An ingress qdisc is created. This allocates a tcx_entry, and
     &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the
     same time, a tcf block with index 1 is created.
  3. chain0 is attached to the tcf block. chain0 must be connected to
     the block linked to the ingress qdisc to later reach the function
     tcf_chain0_head_change_cb_del() which triggers the UAF.
  4. Create and graft a clsact qdisc. This causes the ingress qdisc
     created in step 1 to be removed, thus freeing the previously linked
     tcx_entry:

     rtnetlink_rcv_msg()
       => tc_modify_qdisc()
         => qdisc_create()
           => clsact_init() [a]
         => qdisc_graft()
           => qdisc_destroy()
             => __qdisc_destroy()
               => ingress_destroy() [b]
                 => tcx_entry_free()
                   => kfree_rcu() // tcx_entry freed

  5. Finally, the network namespace is closed. This registers the
     cleanup_net worker, and during the process of releasing the
     remaining clsact qdisc, it accesses the tcx_entry that was
     already freed in step 4, causing the UAF to occur:

     cleanup_net()
       => ops_exit_list()
         => default_device_exit_batch()
           => unregister_netdevice_many()
             => unregister_netdevice_many_notify()
               => dev_shutdown()
                 => qdisc_put()
                   => clsact_destroy() [c]
                     => tcf_block_put_ext()
                       => tcf_chain0_head_change_cb_del()
                         => tcf_chain_head_change_item()
                           => clsact_chain_head_change()
                             => mini_qdisc_pair_swap() // UAF

There are also other variants, the gist is to add an ingress (or clsact)
qdisc with a specific shared block, then to replace that qdisc, waiting
for the tcx_entry kfree_rcu() to be executed and subsequently accessing
the current active qdisc's miniq one way or another.

The correct fix is to turn the miniq_active boolean into a counter. What
can be observed, at step 2 above, the counter transitions from 0->1, at
step [a] from 1->2 (in order for the miniq object to remain active during
the replacement), then in [b] from 2->1 and finally [c] 1->0 with the
eventual release. The reference counter in general ranges from [0,2] and
it does not need to be atomic since all access to the counter is protected
by the rtnl mutex. With this in place, there is no longer a UAF happening
and the tcx_entry is freed at the correct time.

Fixes: e420bed025 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Pedro Pinto <xten@osec.io>
Co-developed-by: Pedro Pinto <xten@osec.io>
Signed-off-by: Pedro Pinto <xten@osec.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hyunwoo Kim <v4bel@theori.io>
Cc: Wongi Lee <qwerty@theori.io>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240708133130.11609-1-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-18 13:21:12 +02:00
..
acpi ACPI: EC: Evaluate orphan _REG under EC device 2024-06-27 13:49:10 +02:00
asm-generic sched: Add missing memory barrier in switch_mm_cid 2024-04-27 17:11:41 +02:00
clocksource
crypto crypto: af_alg - Disallow multiple in-flight AIO requests 2024-01-25 15:35:16 -08:00
drm drm/edid: Parse topology block for all DispID structure v1.x 2024-06-12 11:12:05 +02:00
dt-bindings clk: renesas: r8a779g0: Correct PFC/GPIO parent clocks 2024-03-26 18:19:47 -04:00
keys
kunit
kvm KVM: arm64: Fix host-programmed guest events in nVHE 2024-04-10 16:35:48 +02:00
linux mm: prevent derefencing NULL ptr in pfn_section_valid() 2024-07-18 13:21:10 +02:00
math-emu
media media: cec: core: avoid recursive cec_claim_log_addrs 2024-06-12 11:12:43 +02:00
memory
misc
net bpf: Fix too early release of tcx_entry 2024-07-18 13:21:12 +02:00
pcmcia
ras
rdma RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz 2023-12-13 18:45:16 +01:00
rv rv: Set variable 'da_mon_##name' to static 2023-09-01 21:00:00 -04:00
scsi scsi: mpi3mr: Fix ATA NCQ priority support 2024-06-21 14:38:25 +02:00
soc soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request 2024-06-16 13:47:33 +02:00
sound ASoC: tas2781: Fix wrong loading calibrated data sequence 2024-06-12 11:12:47 +02:00
target
trace tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() 2024-07-05 09:34:07 +02:00
uapi connector: Fix invalid conversion in cn_proc.h 2024-07-11 12:49:20 +02:00
ufs Merge branch 'fixes' into misc 2023-09-02 08:25:19 +01:00
vdso
video fbdev: stifb: Make the STI next font pointer a 32-bit signed offset 2023-11-28 17:19:58 +00:00
xen xen/events: reduce externally visible helper functions 2024-03-01 13:34:57 +01:00