It was pointed out[1] that using folio_test_hwpoison() is wrong as we need
to check the indiviual page that has poison. folio_test_hwpoison() only
checks the head page so go back to using PageHWPoison().
User-visible effects include existing hwpoison-inject tests possibly
failing as unpoisoning a single subpage could lead to unpoisoning an
entire folio. Memory unpoisoning could also not work as expected as
the function will break early due to only checking the head page and
not the actually poisoned subpage.
[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com
Fixes: a6fddef49e ("mm/memory-failure: convert unpoison_memory() to folios")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The bug is the error handling:
if (tmp < nr_bytes) {
"tmp" can hold negative error codes but because "nr_bytes" is type size_t
the negative error codes are treated as very high positive values
(success). Fix this by changing "nr_bytes" to type ssize_t. The
"nr_bytes" variable is used to store values between 1 and PAGE_SIZE and
they can fit in ssize_t without any issue.
Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mountain
Fixes: 5d8de293c2 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The lack of mailmap updates for @codeaurora.org addresses reduces the
usefulness of tools such as get_maintainer.pl. Some recent (and welcome!)
additions has been made to improve the situation, this concludes the
effort.
Link: https://lkml.kernel.org/r/20230720210256.1296567-1-quic_bjorande@quicinc.com
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
T-Head is a vendor of processor core IP, and they have recently introduced
the RISC-V TH1520 SoC. Remove 'thead' as a typo of 'thread' to avoid
checkpatch incorrectly warning that 'thead' is typo in patches that add
support for T-Head designs in the kernel.
Link: https://lkml.kernel.org/r/20230723010329.674186-1-dfustini@baylibre.com
Link: https://www.t-head.cn/
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Diederik de Haas <didi.debian@cknow.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Luca Ceresoli <luca.ceresoli@bootlin.com> # versaclock5
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form
"mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)".
EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set
up with pmd entries which fit the pmd_bad() check: so 0d940a9b27 warns
and clears those entries, which would ruin running Win16 binaries.
The failing pte_offset_map() stopped such a kernel from even booting,
until a few commits later be872f83bf changed the pagewalk to tolerate
that: but it needs to be even more careful, to not spoil those entries.
I might have preferred to change init_espfix_ap() not to use "bad" pmd
entries; or to leave them out of the efi_mm dump. But there is great
value in staying away from there, and a pagewalk check of address against
TASK_SIZE may protect from other such aberrations too.
Link: https://lkml.kernel.org/r/22bca736-4cab-9ee5-6a52-73a3b2bbe865@google.com
Closes: https://lore.kernel.org/linux-mm/CABXGCsN3JqXckWO=V7p=FhPU1tK03RE1w9UE6xL5Y86SMk209w@mail.gmail.com/
Fixes: 0d940a9b27 ("mm/pgtable: allow pte_offset_map[_lock]() to fail")
Fixes: be872f83bf ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
HWPoison: my reading of folio_test_hwpoison() is that it only tests the
head page of a large folio, whereas splice_folio_into_pipe() will splice
as much of the folio as it can: so for safety we should also check the
has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned.
(Perhaps that ugliness can be improved at the mm end later.)
The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask
it for "part" not "len".
Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com
Fixes: bd194b1871 ("shmem: Implement splice-read")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The noswap mount option is surely not one of the three options for sizing:
move its description down.
The huge= mount option does not accept numeric values: those are just in
an internal enum. Delete those numbers, and follow the manpage text more
closely (but there's not yet any fadvise() or fcntl() which applies here).
/sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
still using the words deny and force, to help as informal reminders).
[rdunlap@infradead.org: fixup Docs table for huge mount options]
Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d0f5a85442 ("shmem: update documentation")
Fixes: 2c6efe9cf2 ("shmem: add support to ignore swap")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit 9b0da3f223.
The sigio.c is clearly user space code which is handled by
arch/um/scripts/Makefile.rules (see USER_OBJS rule).
The above mentioned commit simply broke this agreement,
we may not use Linux kernel internal headers in them without
thorough thinking.
Hence, revert the wrong commit.
Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Herve Codina <herve.codina@bootlin.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Richard Weinberger <richard@nod.at>
Cc: Yang Guang <yang.guang5@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The maple tree register cache is based on a much more modern data structure
than the rbtree cache and makes optimisation choices which are probably
more appropriate for modern systems than those made by the rbtree cache. In
v6.5 it has also acquired the ability to generate multi-register writes in
sync operations, bringing performance up to parity with the rbtree cache
there.
Update the sti-sas driver to use the more modern data structure.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230712-asoc-st-maple-v1-5-46eab2c0ce23@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The maple tree register cache is based on a much more modern data structure
than the rbtree cache and makes optimisation choices which are probably
more appropriate for modern systems than those made by the rbtree cache. In
v6.5 it has also acquired the ability to generate multi-register writes in
sync operations, bringing performance up to parity with the rbtree cache
there.
Update the stac9766 driver to use the more modern data structure.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230712-asoc-st-maple-v1-4-46eab2c0ce23@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The maple tree register cache is based on a much more modern data structure
than the rbtree cache and makes optimisation choices which are probably
more appropriate for modern systems than those made by the rbtree cache. In
v6.5 it has also acquired the ability to generate multi-register writes in
sync operations, bringing performance up to parity with the rbtree cache
there.
Update the sta529 driver to use the more modern data structure.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230712-asoc-st-maple-v1-3-46eab2c0ce23@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The maple tree register cache is based on a much more modern data structure
than the rbtree cache and makes optimisation choices which are probably
more appropriate for modern systems than those made by the rbtree cache. In
v6.5 it has also acquired the ability to generate multi-register writes in
sync operations, bringing performance up to parity with the rbtree cache
there.
Update the sta350 driver to use the more modern data structure.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230712-asoc-st-maple-v1-2-46eab2c0ce23@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The maple tree register cache is based on a much more modern data structure
than the rbtree cache and makes optimisation choices which are probably
more appropriate for modern systems than those made by the rbtree cache. In
v6.5 it has also acquired the ability to generate multi-register writes in
sync operations, bringing performance up to parity with the rbtree cache
there.
Update the sta32x driver to use the more modern data structure.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230712-asoc-st-maple-v1-1-46eab2c0ce23@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
- Driver fixes for
- Out of bound fix for hisilicon phy
- Qualcomm synopsis femto phy for keeping clock enabled during suspend
and enabling ref clocks
- Mediatek driver fixes for upper limit test and error code
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmTCVkUACgkQfBQHDyUj
g0cwAg/+KSYSB1//gIqKQDEzCztwGOeyXPBedF7x/hsxmd0PgJKTJjPKhVMvwDhs
7Nfk/TR5LcEfZvVUrwQ2kykZTCR03Kqn7egSOq+W+vx57zVyLpQqxV0Gm79LEPPk
w7ncXf//b2pQ5yRI164skzCWcB6BkpyTaUlH/jZdWXB8FXGkxbdMLDhwmU2Xd5YU
49Jv3fH5Poym92RRALGZoFJg/jZxmniX/sV8NXqNVkND2mwKuz9psH/iCWh2+ofU
1TZYyvwfdODCdR3+kYDXY46uUttPy/xeh2vEHCKYlfr/Zp0pbnLM0/jBD3o/1x2C
/Dgk1eQjMoZNNUgLtaIv2FjSdQRNPX0u8Ml9iLvoFNgQR5LB1gzTHDmtwIRtOHpN
BInIWWkoGgEgOjxybzPkRTCXhWS/eErAK1K4ZXh24xQfkivBH3YGiDiMpG3eR4Wy
4rurv/UxDKqqvCdNkweiYCI41N4ThyzG6qr8fcat2a9ZZmdqlvc9lBFlCVXdxmQj
1p4ZAwriGmo4zFP1/jxTKk1wU/peEh7hzBa8dsyTP77LYO9adcFi+OVnkNM24CJh
w9cO4EqDl01rZ30vHbh1J1b1yRX3Cwa1+JGHMnGg96g1dJb+mC+1xU6nZ60sMcew
NKnFeANY94SGEeN2Bn4BDaIcFCZ1UBfcxs0PHeaGO+81s7tj5j8=
=nyhr
-----END PGP SIGNATURE-----
Merge tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
- Out of bound fix for hisilicon phy
- Qualcomm synopsis femto phy for keeping clock enabled during suspend
and enabling ref clocks
- Mediatek driver fixes for upper limit test and error code
* tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code
phy: qcom-snps-femto-v2: properly enable ref clock
phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend
phy: mediatek: hdmi: mt8195: fix prediv bad upper limit test
phy: phy-mtk-dp: Fix an error code in probe()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmTCTAsACgkQxWXV+ddt
WDvhYhAAluWrfM2ZzhY/tdDeKUNpf0NIAFGZIV4QP/2E43yIPC2+xMPqW/AnBnIP
k28gOhgoH7LBP/cr0IrFHMz8Glges3cHz1UxFjJZgjiU3mAA0mgkIttPpzms7vqi
3SVUxL2bJkebJy53nOpZcHlrcWveg+q0hTUslquCYBb3dA4gb61HBwzA2e0wKFeB
wYw/gQtEy3TkHQPAxVjUF28ASoaroNKsE9QjfLZV0FDn0u0zBFxqpqj7bFUay++i
sG3nPVZsqKcgIX7sUSwrpv4XAFu8fHz+GAQqCNqTxKCJ0ZZzsgzJtKs+12rv7dZC
EvRt0jEt+DgwvmEy7j250TEbcI9rMaQuny8yt2j9sNKH/m9bW0BjptCtwghDoL89
0D6qqicHbA+dJNq8/kDyxV6xC2Git2Ck0fpOfiU7YzhAFECZc/DkidvXa1keMUay
usspO+YHOjDtlq0zJ0xixbxCseJfrj4habieVKZ/CnAvb84082ZiLcMxFqop/ewB
WHKNB0O2+P78xoa7/Be6tp/w1HaaW8ZHvkPicD9d4khKJrXAKLNc/Xny4OqRT14z
sWWaFuNjC7kIUT15EAQNj0wgymA7XcTL9gM1uuSO95PN+M3j4CleApzEvR3dn9FX
gmoxuwVfVsJKKcwo6WFByqzu03kuSladEFasSHQAJbh3jyU9LUY=
=Y/po
-----END PGP SIGNATURE-----
Merge tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix accounting of global block reserve size when block group tree is
enabled
- the async discard has been enabled in 6.2 unconditionally, but for
zoned mode it does not make that much sense to do it asynchronously
as the zones are reset as needed
- error handling and proper error value propagation fixes
* tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: check for commit error at btrfs_attach_transaction_barrier()
btrfs: check if the transaction was aborted at btrfs_wait_for_commit()
btrfs: remove BUG_ON()'s in add_new_free_space()
btrfs: account block group tree when calculating global reserve size
btrfs: zoned: do not enable async discard
A call to memblock_free() or memblock_phys_free() issued after memblock
data is discarded will result in use after free in
memblock_isolate_range().
When CONFIG_KASAN is enabled, this will cause a panic early in boot.
Without CONFIG_KASAN, there is a chance that memblock_isolate_range() might
scribble on memory that is now in use by somebody else.
Avoid those issues by making sure that memblock_discard points
memblock.reserved.regions back at the static buffer.
If memblock_free() or memblock_phys_free() is called after memblock memory
is discarded, that will print a warning in memblock_remove_region().
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmTB94cQHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kesHB/4rNvGFGEI8LFxooARLt8glcv0Hn7oJ+z3L
Xyczw1ZkglT3DEYsoY78bSriddWPqrV3wWkr+p2NYXPBJWgQZ6t3DRZviqzXcj2l
Ew2XwLAfT6Vay1eqEFfJJvkGg27QLhnmJPnjDzCWweiXUaR5xOESwKCBmZBWeXUU
t5EFJMIXLVEoBDLGW5kk+Q4RZDqhU/sJWDqf4ciWQ5vDS8OFTr56hfth7T8XoMxm
BPlC21+cEJUWrbb1gAJUMbIERTzvYg8odZqSAESlHyNyDEtYjyLce5W6HA6zHK+H
2gqiti+Pd1OyHbJUc1lN7iRTE8FJ7DQcBr6H9sk81Po5af02Ky7m
=FRx8
-----END PGP SIGNATURE-----
Merge tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"A call to memblock_free() or memblock_phys_free() issued after
memblock data is discarded will result in use after free in
memblock_isolate_range().
Avoid those issues by making sure that memblock_discard points
memblock.reserved.regions back at the static buffer"
* tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm,memblock: reset memblock.reserved to system init state to prevent UAF
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.
However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.
This means we can get UAF in the following scenario:
THREAD 1 THREAD 2
======== ========
<page fault>
lock_vma_under_rcu()
rcu_read_lock()
mas_walk()
check vma->anon_vma
mremap() syscall
move_vma()
vma_start_write()
unlink_anon_vmas()
<syscall end>
handle_mm_fault()
__handle_mm_fault()
handle_pte_fault()
do_pte_missing()
do_anonymous_page()
anon_vma_prepare()
__anon_vma_prepare()
find_mergeable_anon_vma()
mas_walk() [looks up VMA X]
munmap() syscall (deletes VMA X)
reusable_anon_vma() [called on freed VMA X]
This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.
This patch is based on some previous discussion with Linus Torvalds on
the security list.
Cc: stable@vger.kernel.org
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Industrial processor i3255 supports temperatures -40 deg celcius
to 105 deg Celcius. The current implementation of k10temp_read_temp
rounds off any negative temperatures to '0'. To fix this,
the following changes have been made.
A flag 'disp_negative' is added to struct k10temp_data to support
AMD i3255 processors. Flag 'disp_negative' is set if 3255 processor
is found during k10temp_probe. Flag 'disp_negative' is used to
determine whether to round off negative temperatures to '0' in
k10temp_read_temp.
Signed-off-by: Baskaran Kannan <Baski.Kannan@amd.com>
Link: https://lore.kernel.org/r/20230727162159.1056136-1-Baski.Kannan@amd.com
Fixes: aef17ca127 ("hwmon: (k10temp) Only apply temperature offset if result is positive")
Cc: stable@vger.kernel.org
[groeck: Fixed multi-line comment]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
syzkaller found a race where IOMMUFD_DESTROY increments the refcount:
obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY);
if (IS_ERR(obj))
return PTR_ERR(obj);
iommufd_ref_to_users(obj);
/* See iommufd_ref_to_users() */
if (!iommufd_object_destroy_user(ucmd->ictx, obj))
As part of the sequence to join the two existing primitives together.
Allowing the refcount the be elevated without holding the destroy_rwsem
violates the assumption that all temporary refcount elevations are
protected by destroy_rwsem. Racing IOMMUFD_DESTROY with
iommufd_object_destroy_user() will cause spurious failures:
WARNING: CPU: 0 PID: 3076 at drivers/iommu/iommufd/device.c:477 iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:478
Modules linked in:
CPU: 0 PID: 3076 Comm: syz-executor.0 Not tainted 6.3.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
RIP: 0010:iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:477
Code: e8 3d 4e 00 00 84 c0 74 01 c3 0f 0b c3 0f 1f 44 00 00 f3 0f 1e fa 48 89 fe 48 8b bf a8 00 00 00 e8 1d 4e 00 00 84 c0 74 01 c3 <0f> 0b c3 0f 1f 44 00 00 41 57 41 56 41 55 4c 8d ae d0 00 00 00 41
RSP: 0018:ffffc90003067e08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888109ea0300 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
RBP: 0000000000000004 R08: 0000000000000000 R09: ffff88810bbb3500
R10: ffff88810bbb3e48 R11: 0000000000000000 R12: ffffc90003067e88
R13: ffffc90003067ea8 R14: ffff888101249800 R15: 00000000fffffffe
FS: 00007ff7254fe6c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555557262da8 CR3: 000000010a6fd000 CR4: 0000000000350ef0
Call Trace:
<TASK>
iommufd_test_create_access drivers/iommu/iommufd/selftest.c:596 [inline]
iommufd_test+0x71c/0xcf0 drivers/iommu/iommufd/selftest.c:813
iommufd_fops_ioctl+0x10f/0x1b0 drivers/iommu/iommufd/main.c:337
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x84/0xc0 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The solution is to not increment the refcount on the IOMMUFD_DESTROY path
at all. Instead use the xa_lock to serialize everything. The refcount
check == 1 and xa_erase can be done under a single critical region. This
avoids the need for any refcount incrementing.
It has the downside that if userspace races destroy with other operations
it will get an EBUSY instead of waiting, but this is kind of racing is
already dangerous.
Fixes: 2ff4bed7fe ("iommufd: File descriptor, context, kconfig and makefiles")
Link: https://lore.kernel.org/r/2-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: syzbot+7574ebfe589049630608@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
If user interrupts wait_event_interruptible() in ublk_ctrl_del_dev(),
return -EINTR and let user know what happens.
Fixes: 0abe39dec0 ("block: ublk: improve handling device deletion")
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230726144502.566785-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In ublk_ctrl_end_recovery(), if wait_for_completion_interruptible() is
interrupted by signal, queues aren't setup successfully yet, so we
have to fail UBLK_CMD_END_USER_RECOVERY, otherwise kernel oops can be
triggered.
Fixes: c732a852b4 ("ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support")
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230726144502.566785-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In ublk_ctrl_start_dev(), if wait_for_completion_interruptible() is
interrupted by signal, queues aren't setup successfully yet, so we
have to fail UBLK_CMD_START_DEV, otherwise kernel oops can be triggered.
Reported by German when working on qemu-storage-deamon which requires
single thread ublk daemon.
Fixes: 71f28f3136 ("ublk_drv: add io_uring based userspace block driver")
Reported-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230726144502.566785-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A collection of device specific fixes, none particularly remarkable.
There's a set of repetitive fixes for the RealTek drivers fixing an
issue with suspend that was replicated in multiple drivers.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmTCYMAACgkQJNaLcl1U
h9Do+Af/USa8kLylJn0vzxbfkwpSu3rCbgQurw9KKCDa7lTB7jqZzpCAmPbs7txO
WEwKKz8YSka2YlmXm0rRzhqHIdTdkHlvJ3aircrolfpedeelRyqthhCjdgl6pJAj
3+Kpi7a2QaSqxc2Z45GX4vR86xOmlivWS4gOKZV4GuJt2FkmTIgbGYjtumU0GPla
DneK7yxQpNe68Z+AHxmGoAvKkXggqE49up1PGRiV2nlyioHeQLqDyUlvZsc4MP3Y
Qx/RKvvFoh20HVNKv+iXss7VxYebIzkHuAJLwRDFHkcQajFHcri+ZWEv9lVd/pak
Hiso2ryviIrUFIKfsCWKb9xHYbptCQ==
=HNYO
-----END PGP SIGNATURE-----
Merge tag 'asoc-fix-v6.5-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.5
A collection of device specific fixes, none particularly remarkable.
There's a set of repetitive fixes for the RealTek drivers fixing an
issue with suspend that was replicated in multiple drivers.
If tipc_link_bc_create() fails inside tipc_node_create() for a newly
allocated tipc node then we should stop its tipc crypto and free the
resources allocated with a call to tipc_crypto_start().
As the node ref is initialized to one to that point, just put the ref on
tipc_link_bc_create() error case that would lead to tipc_node_free() be
eventually executed and properly clean the node and its crypto resources.
Found by Linux Verification Center (linuxtesting.org).
Fixes: cb8092d70a ("tipc: move bc link creation back to tipc_node_create")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230725214628.25246-1-pchelkin@ispras.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
kernel test robot reported slab-out-of-bounds access in strlen(). [0]
Commit 06d4c8a808 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
removed unix_mkname_bsd() call in unix_bind_bsd().
If sunaddr->sun_path is not terminated by user and we don't enable
CONFIG_INIT_STACK_ALL_ZERO=y, strlen() will do the out-of-bounds access
during file creation.
Let's go back to strlen()-with-sockaddr_storage way and pack all 108
trickiness into unix_mkname_bsd() with bold comments.
[0]:
BUG: KASAN: slab-out-of-bounds in strlen (lib/string.c:?)
Read of size 1 at addr ffff000015492777 by task fortify_strlen_/168
CPU: 0 PID: 168 Comm: fortify_strlen_ Not tainted 6.5.0-rc1-00333-g3329b603ebba #16
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
show_stack (arch/arm64/kernel/stacktrace.c:242)
dump_stack_lvl (lib/dump_stack.c:107)
print_report (mm/kasan/report.c:365 mm/kasan/report.c:475)
kasan_report (mm/kasan/report.c:590)
__asan_report_load1_noabort (mm/kasan/report_generic.c:378)
strlen (lib/string.c:?)
getname_kernel (./include/linux/fortify-string.h:? fs/namei.c:226)
kern_path_create (fs/namei.c:3926)
unix_bind (net/unix/af_unix.c:1221 net/unix/af_unix.c:1324)
__sys_bind (net/socket.c:1792)
__arm64_sys_bind (net/socket.c:1801)
invoke_syscall (arch/arm64/kernel/syscall.c:? arch/arm64/kernel/syscall.c:52)
el0_svc_common (./include/linux/thread_info.h:127 arch/arm64/kernel/syscall.c:147)
do_el0_svc (arch/arm64/kernel/syscall.c:189)
el0_svc (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:648)
el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:?)
el0t_64_sync (arch/arm64/kernel/entry.S:591)
Allocated by task 168:
kasan_set_track (mm/kasan/common.c:45 mm/kasan/common.c:52)
kasan_save_alloc_info (mm/kasan/generic.c:512)
__kasan_kmalloc (mm/kasan/common.c:383)
__kmalloc (mm/slab_common.c:? mm/slab_common.c:998)
unix_bind (net/unix/af_unix.c:257 net/unix/af_unix.c:1213 net/unix/af_unix.c:1324)
__sys_bind (net/socket.c:1792)
__arm64_sys_bind (net/socket.c:1801)
invoke_syscall (arch/arm64/kernel/syscall.c:? arch/arm64/kernel/syscall.c:52)
el0_svc_common (./include/linux/thread_info.h:127 arch/arm64/kernel/syscall.c:147)
do_el0_svc (arch/arm64/kernel/syscall.c:189)
el0_svc (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:648)
el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:?)
el0t_64_sync (arch/arm64/kernel/entry.S:591)
The buggy address belongs to the object at ffff000015492700
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes to the right of
allocated 119-byte region [ffff000015492700, ffff000015492777)
The buggy address belongs to the physical page:
page:00000000aeab52ba refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x55492
anon flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
page_type: 0xffffffff()
raw: 03fffc0000000200 ffff0000084018c0 fffffc00003d0e00 0000000000000005
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff000015492600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff000015492680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff000015492700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 fc
^
ffff000015492780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff000015492800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 06d4c8a808 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202307262110.659e5e8-oliver.sang@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230726190828.47874-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
goto free_skb if an unexpected result is returned by pskb_tirm()
in tipc_crypto_rcv_complete().
Fixes: fc1b6d6de2 ("tipc: introduce TIPC encryption & authentication")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230725064810.5820-1-ruc_gongyuanjun@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A negative number from ret means the host controller had failed to send
usb message and 0 means succeed. Therefore, the if logic is wrong here
and this patch will fix it.
Fixes: f2b42379c5 ("usb: misc: ehset: Rework test mode entry")
Cc: stable <stable@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20230705095231.457860-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hardware based on the Bay Trail / BYT SoCs require an external ULPI phy for
USB device-mode. The phy chip usually has its 'reset' and 'chip select'
lines connected to GPIOs described by ACPI fwnodes in the DSDT table.
Because of hardware with missing ACPI resources for the 'reset' and 'chip
select' GPIOs commit 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table
on platforms without ACPI GPIO resources") introduced a fallback
gpiod_lookup_table with hard-coded mappings for Bay Trail devices.
However there are existing Bay Trail based devices, like the National
Instruments cRIO-903x series, where the phy chip has its 'reset' and
'chip-select' lines always asserted in hardware via resistor pull-ups. On
this hardware the phy chip is always enabled and the ACPI dsdt table is
missing information not only for the 'chip-select' and 'reset' lines but
also for the BYT GPIO controller itself "INT33FC".
With the introduction of the gpiod_lookup_table initializing the USB
device-mode on these hardware now errors out. The error comes from the
gpiod_get_optional() calls in dwc3_pci_quirks() which will now return an
-ENOENT error due to the missing ACPI entry for the INT33FC gpio controller
used in the aforementioned table.
This hardware used to work before because gpiod_get_optional() will return
NULL instead of -ENOENT if no GPIO has been assigned to the requested
function. The dwc3_pci_quirks() code for setting the 'cs' and 'reset' GPIOs
was then skipped (due to the NULL return). This is the correct behavior in
cases where the phy chip is hardwired and there are no GPIOs to control.
Since the gpiod_lookup_table relies on the presence of INT33FC fwnode
in ACPI tables only add the table if we know the entry for the INT33FC
gpio controller is present. This allows Bay Trail based devices with
hardwired dwc3 ULPI phys to continue working.
Fixes: 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table on platforms without ACPI GPIO resources")
Cc: stable <stable@kernel.org>
Signed-off-by: Gratian Crisan <gratian.crisan@ni.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230726184555.218091-2-gratian.crisan@ni.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
in be_lancer_xmit_workarounds(), it should go to label 'tx_drop'
if an unexpected value is returned by pskb_trim().
Fixes: 93040ae5cc ("be2net: Fix to trim skb for padded vlan packets to workaround an ASIC Bug")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230725032726.15002-1-ruc_gongyuanjun@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The "exc->key_len" is a u16 that comes from the user. If it's over
IW_ENCODING_TOKEN_MAX (64) that could lead to memory corruption.
Fixes: b121d84882 ("staging: ks7010: simplify calls to memcpy()")
Cc: stable <stable@kernel.org>
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/tencent_5153B668C0283CAA15AA518325346E026A09@qq.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using FBTFT_REGISTER_DRIVER resolves to a NULL struct spi_device_id. This
ultimately causes a warning when the module probes. Fixes it.
Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com>
Link: https://lore.kernel.org/r/20230718172024.67488-1-rgallaispou@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This laptop has CS35L41 amp connected via I2C.
With this patch speakers begin to work if the
missing _DSD properties are added to ACPI tables.
Signed-off-by: Pavel Asyutchenko <svenpavel@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230726223732.20775-1-svenpavel@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list. Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first. However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first. As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.
To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable. The default is still
10, but it can be overridden via a module parameter.
This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.
Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross@suse.com>
-----BEGIN PGP SIGNATURE-----
iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTBN7QNHGZ3QHN0cmxl
bi5kZQAKCRBwkajZrV/2AMg4D/wLs+Nm4XZmvz3ZOtgbHrw3xSbkLgJ563cCGwHw
1k4/726FrfbVHqMvOWgHNVdmsVCcw0jeLVddlIN3QmbMlxn2YUwQ16nRg6SUMeKZ
u/9KsI0wLr9zGKJxiUDWhe+7An3oY5G/7uFngJZ1M/vbGTxrl1GqOEU7bMvkm/G/
08z8/ho9Qv8CbPUfn4ZcYfxDCPKTB74O0UZiItHDAp+tQcD729xvTZ+AGOPe+632
YifWe7FzY+SUNu/sesr8kyeMU0UPsxETO7pnvgn5PJ3osLDQB1mxj+Kpa5YSuQ4H
3gO2S6T07iQZ74//aUTzexEzss4HNsdXKDPcTfXyyiZPJsfVkZRuKTcuA16bGt0l
zCVzCz2Aj5brZEjYhFlmgCnWzlnWDNGHJF7WBxF9UAHoEqXQa3h4im2RTj9/8RvZ
ZRXCZnmL+UIr5k3m5NzXYDM7vWxHvavNbKRl594XVq1fI6GdSGGQ9SOyPkKFUYQt
BYxDsbg9RY/piZt7vH+YQfmjuQ8sCkxqPEqJvSzy6U/TVtTOdsjfCCeijSrEIlvg
i/B+P24GI2dZW09trHfVVcn5YQc8/HkzCr029BcmfSu9+4+GeTmjMArSJfO1hThd
d3IpsNye9OgdhxsR75kZusCoZpRfuEtySKdkfdzvvAWLqPh8nNzhIzGx6WHmcTt6
TYYquA==
=4wB/
-----END PGP SIGNATURE-----
Merge tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter fixes for net
1. On-demand overlap detection in 'rbtree' set can cause memory leaks.
This is broken since 6.2.
2. An earlier fix in 6.4 to address an imbalance in refcounts during
transaction error unwinding was incomplete, from Pablo Neira.
3. Disallow adding a rule to a deleted chain, also from Pablo.
Broken since 5.9.
* tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
netfilter: nft_set_rbtree: fix overlap expiration walk
====================
Link: https://lore.kernel.org/r/20230726152524.26268-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A race were found where set_channels could be called after registering
but before virtnet_set_queues() in virtnet_probe(). Fixing this by
moving the virtnet_set_queues() before netdevice registering. While at
it, use _virtnet_set_queues() to avoid holding rtnl as the device is
not even registered at that time.
Cc: stable@vger.kernel.org
Fixes: a220871be6 ("virtio-net: correctly enable multiqueue")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230725072049.617289-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The nla_for_each_nested parsing in function mqprio_parse_nlattr() does
not check the length of the nested attribute. This can lead to an
out-of-attribute read and allow a malformed nlattr (e.g., length 0) to
be viewed as 8 byte integer and passed to priv->max_rate/min_rate.
This patch adds the check based on nla_len() when check the nla_type(),
which ensures that the length of these two attribute must equals
sizeof(u64).
Fixes: 4e8b86c062 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230725024227.426561-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
LTP sendfile07 [1], which expects sendfile() to return EAGAIN when
transferring data from regular file to a "full" O_NONBLOCK socket,
started failing after commit 2dc334f1a6 ("splice, net: Use
sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()").
sendfile() no longer immediately returns, but now blocks.
Removed sock_sendpage() handled this case by setting a MSG_DONTWAIT
flag, fix new splice_to_socket() to do the same for O_NONBLOCK sockets.
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/sendfile/sendfile07.c
Fixes: 2dc334f1a6 ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()")
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Link: https://lore.kernel.org/r/023c0e21e595e00b93903a813bc0bfb9a5d7e368.1690219914.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to the clarification [1] in the latest napi.rst, the tx
processing cannot call any XDP (or page pool) APIs if the "budget"
is 0. Because NAPI is called with the budget of 0 (such as netpoll)
indicates we may be in an IRQ context, however, we cannot use the
page pool from IRQ context.
[1] https://lore.kernel.org/all/20230720161323.2025379-1-kuba@kernel.org/
Fixes: 20f7973990 ("net: fec: recycle pages for transmitted XDP frames")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230725074148.2936402-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau says:
====================
mptcp: More fixes for 6.5
Patch 1: Better detection of ip6tables vs ip6tables-legacy tools for
self tests. Fix for 6.4 and newer.
Patch 2: Only generate "new listener" event if listen operation
succeeds. Fix for 6.2 and newer.
====================
Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-0-6f60fe7137a9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the mptcp code generate a "new listener" event even
if the actual listen() syscall fails. Address the issue moving
the event generation call under the successful branch.
Cc: stable@vger.kernel.org
Fixes: f8c9dfbd87 ("mptcp: add pm listener events")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-2-6f60fe7137a9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If 'iptables-legacy' is available, 'ip6tables-legacy' command will be
used instead of 'ip6tables'. So no need to look if 'ip6tables' is
available in this case.
Cc: stable@vger.kernel.org
Fixes: 0c4cd3f86a ("selftests: mptcp: join: use 'iptables-legacy' if available")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-1-6f60fe7137a9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>