Go to file
Florian Westphal 4318608dc2 inet: inet_defrag: prevent sk release while still in use
commit 18685451fc upstream.

ip_local_out() and other functions can pass skb->sk as function argument.

If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.

This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.

Eric Dumazet made an initial analysis of this bug.  Quoting Eric:
  Calling ip_defrag() in output path is also implying skb_orphan(),
  which is buggy because output path relies on sk not disappearing.

  A relevant old patch about the issue was :
  8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")

  [..]

  net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
  inet socket, not an arbitrary one.

  If we orphan the packet in ipvlan, then downstream things like FQ
  packet scheduler will not work properly.

  We need to change ip_defrag() to only use skb_orphan() when really
  needed, ie whenever frag_list is going to be used.

Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this:

If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accouting to reflect the
fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.

This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.

In the former case, things work as before, skb is orphaned.  This is
safe because skb gets queued/stolen and won't continue past reasm engine.

In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accouting when inet_frag inflates truesize.

Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17 15:10:41 +02:00
arch x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency 2024-10-17 15:10:40 +02:00
block block: remove the blk_flush_integrity call in blk_integrity_unregister 2024-09-12 11:07:42 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:22:01 +02:00
crypto crypto: aead,cipher - zeroize key buffer after use 2024-07-18 13:07:27 +02:00
Documentation hwspinlock: Introduce hwspin_lock_bust() 2024-09-12 11:07:41 +02:00
drivers gpio: prevent potential speculation leaks in gpio_device_get_desc() 2024-10-17 15:10:41 +02:00
fs ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry() 2024-10-17 15:10:40 +02:00
include inet: inet_defrag: prevent sk release while still in use 2024-10-17 15:10:41 +02:00
init init/main.c: Fix potential static_command_line memory overflow 2024-04-27 17:05:28 +02:00
io_uring io_uring/io-wq: limit retrying worker initialisation 2024-08-19 05:45:22 +02:00
ipc ipc/sem: Fix dangling sem_array access in semtimedop race 2022-12-08 11:28:45 +01:00
kernel cgroup: Make operations on the cgroup root_list RCU safe 2024-10-17 15:10:40 +02:00
lib lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc() 2024-09-12 11:07:50 +02:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm mm: avoid leaving partial pfn mappings around in error case 2024-10-17 15:10:34 +02:00
net inet: inet_defrag: prevent sk release while still in use 2024-10-17 15:10:41 +02:00
samples Add gitignore file for samples/fanotify/ subdirectory 2024-07-27 10:46:16 +02:00
scripts scripts: kconfig: merge_config: config files: add a trailing newline 2024-10-17 15:10:33 +02:00
security smack: unix sockets: fix accept()ed socket label 2024-09-12 11:07:45 +02:00
sound ASoC: tda7419: fix module autoloading 2024-10-17 15:10:39 +02:00
tools selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() 2024-10-17 15:10:35 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:27:15 +01:00
virt KVM: Always flush async #PF workqueue when vCPU is being destroyed 2024-04-10 16:18:34 +02:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS trace: Relocate event helper files 2024-04-10 16:19:24 +02:00
Makefile Linux 5.15.167 2024-09-12 11:07:53 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.