Merge amd-pstate driver fixes for 6.12-rc4:
- Enable ACPI CPPC in amd_pstate_register_driver() after disabling
it in amd_pstate_unregister_driver() during driver operation mode
switch (Dhananjay Ugwekar).
- Make amd-pstate use nominal performance as the maximum performance
level when boost is disabled (Mario Limonciello).
* pm-cpufreq:
cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems
Including:
- ARM-SMMU fixes from Will Deacon:
- Clarify warning message when failing to disable the MMU-500
prefetcher
- Fix undefined behaviour in calculation of L1 stream-table index
when 32-bit StreamIDs are implemented
- Replace a rogue comma with a semicolon
- Intel VT-d fix from Lu Baolu:
- Fix incorrect pci_for_each_dma_alias() for non-PCI devices
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmcSIjEACgkQK/BELZcB
GuPeJg/5AT4AuqnEG9NMecJAN4zvVd+2OO144XrqWsIMELqIzpYgLwkjoo8+dDE9
EerRRs+HLddF1oV1WiC0UOyvIOdq4sIlAsdjLFQVd39qjmNQI5KM+ELCAlKuUkMj
GFrk64PvXuWG6AeP+iAiVLnOmbkl6JAQAQ7qg7p9JxmSRUlUhMZywLUBi7CjR/FN
uhgPsVzFfS13iQ4FXw5z6R+mG728chvc3DImGjmRlv0gIHdKif1MCZBp4zLo75Y5
vTdP6Cf1uzIkTpr+OwtbE2u/zsXzS3oi9WJieHBbGs5VPtys/Uvl5wEdsNyBsLVv
fUbNNhHaFinFoRGrctWDtsQ1CqdqIfQz7o9ob93lAVfQYSvN87qqmibvHUZllWhE
WyxUvjerpvKOcKtVGHmIEbDSydAlVCXhoRG3uJ9c4KUucluVFnwy3IGSV52xIvl9
WpEP95o0VUu+szkrsL0ACRMXjs4PXJQCX+LNBt8+Qkwf9K9ILk4NdoJ0MTxGTTPa
b/ZJ4/6jxizBK4FGW1keZ0JodPu2Do1knYZ7gEwsb3UoBklAgtfM2EPzgjqzAodG
9146w0XPJKED9k7m8fQuX67ZhihXs41RgUD1XqkrAhV9VifWBPfaS0pIXSC1l0ev
OnvdQ/+5pt5pzSG7ErT2zpUBTBS8Vyruis68gIrKJJECnhMh3gM=
=YT8V
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"ARM-SMMU fixes from Will Deacon:
- Clarify warning message when failing to disable the MMU-500
prefetcher
- Fix undefined behaviour in calculation of L1 stream-table index
when 32-bit StreamIDs are implemented
- Replace a rogue comma with a semicolon
Intel VT-d fix from Lu Baolu:
- Fix incorrect pci_for_each_dma_alias() for non-PCI devices"
* tag 'iommu-fixes-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/vt-d: Fix incorrect pci_for_each_dma_alias() for non-PCI devices
iommu/arm-smmu-v3: Convert comma to semicolon
iommu/arm-smmu-v3: Fix last_sid_idx calculation for sid_bits==32
iommu/arm-smmu: Clarify MMU-500 CPRE workaround
- To prevent possible memory leak, free "name" on error in opal_event_init()
Thanks to: Michael Ellerman, 2639161967.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmcR+7QACgkQpnEsdPSH
ZJQxjhAAgxgANS4GJ3i5kB0a6T7kOBynafDNmTsj1iEI9f9IgclH6zzUkmzdjWJ3
FHXavhqvI/xIvzDSyVV68De/lSgNwjEr13Igq/DlpnQuNDavdJytQRZVCOTXFyon
bI7gNdA367vR44sQtDWYnIeCqWKLA535G9ahFicXBMNfAIzT2hhW4h2zu3K7dcu/
jq/nY6iuI2TWtH1givntR3fHdfqKYXLd/eIt/J+7hXA4UrJHYDPmH3+hjz8gRG1p
dI7bP5sf0OPjlUgtrJ+P/QeYzrVABECqcONcMQuqyZjtmbt/8dSw2JLgzPrhP8Cc
jIJTlbwTawPUJuIifMZpLFnbSgkHadrCNfpWmk1m9NHB0RCjI9DOqgETaIxR+Dkz
utmoOXWFCCZF59/8vVF94Pi9dG7HrZBBvvWnGNh4BFocMPnNb62ZHBdmWAbG/G2m
XajtUXrpNDqXLxOJLYA6NQxh8hE109tnU35/X87gKcRI6BiF56ytN8bAdTdW7qan
5C1r0RrXtmySkb5siFVzsfvEl9sI3BslkDV4V9BEby4XYyqqYjwKh+oaeYcnGkdG
hY1k9Xbw7racsnUzqjbb1Z86NwK23FQ/MWNmi/nrSB6w2/pobE1WVMta7vej0VqV
+NpvZBUSIsGM0YiySzar8xZGaA09KjRP2283VdkK3Uwh0K6g/jo=
=6YWt
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Madhavan Srinivasan:
- To prevent possible memory leak, free "name" on error in
opal_event_init()
Thanks to Michael Ellerman and 2639161967.
* tag 'powerpc-6.12-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv: Free name on error in opal_event_init()
- Fix PCI error recovery by handling error events correctly
- Fix CCA crypto card behavior within protected execution environment
- Two KVM commits which fix virtual vs physical address handling bugs
in KVM pfault handling
- Fix return code handling in pckmo_key2protkey()
- Deactivate sclp console as late as possible so that outstanding
messages appear on the console instead of being dropped on reboot
- Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
as required by the vt220 specification
- Initialize also psw mask in perf_arch_fetch_caller_regs() to make
sure that user_mode(regs) will return false
- Update defconfigs
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmcSND8ACgkQIg7DeRsp
bsK4YA//St56dSqntwp8Lc/t6fcMvUU4kN7DFdLdn2weuNF/WqgraB0eBJuQSjvP
L8gJutE/clV3+bWvU5OG978snXLS6i1PDRGU4Jaox0HiDonRuYodOMI9q57FDebD
fVh3iLJNwgW507PXxjSdkdMNpwptzylYBKTykJcaU55MiWnNYZa/Yj+5UxzT5U8C
UYgX4RWUjNdLq9A9nMpfcsRGKa0YF+DNk7egPLaxmbI1JJu1i+rt6esAf2peOU3f
dEzL0fTb6j6l0N+Bz6Wx3WEwn3cRdDBkRDKF+IkMtPJ30lIJ6ULh3nRVhFBuDU2m
KmLpRMg1dkHil89r5JV9SAfrZfXCz4//9ssP5otsltSPVrV6E230LXuYwD0o1q8H
XQQ5Zag9dodY10VQp1VUzMvz5uehDM16tapzu8sUgItQ9PnmST/7XnFSe6/jotlZ
A6ALDEMmBG/F/s3N3x6cSoBJdCjJWKJQS5ft04s7AZ2rlmefdLSoqbgGVJcZXTaP
eiPhhRoKI+dT/4dc10I0923NXpKHE6a2ID6GhhJ+f0wjrNarlGnwQVqPeC68dUdv
8ayD70K4BSRIAhkbEIqhKiueAAMgmW0LYKBqkbs9AgdoordbqNO9QtxEOlvI7+/g
A4sffCRrjbESE8sZoA8ItnX1Cf9yiP8KzzkNzMpGiI/zC8tPq8c=
=3IRa
-----END PGP SIGNATURE-----
Merge tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix PCI error recovery by handling error events correctly
- Fix CCA crypto card behavior within protected execution environment
- Two KVM commits which fix virtual vs physical address handling bugs
in KVM pfault handling
- Fix return code handling in pckmo_key2protkey()
- Deactivate sclp console as late as possible so that outstanding
messages appear on the console instead of being dropped on reboot
- Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
as required by the vt220 specification
- Initialize also psw mask in perf_arch_fetch_caller_regs() to make
sure that user_mode(regs) will return false
- Update defconfigs
* tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Update defconfigs
s390: Initialize psw mask in perf_arch_fetch_caller_regs()
s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
s390/sclp: Deactivate sclp after all its users
s390/pkey_pckmo: Return with success for valid protected key types
KVM: s390: Change virtual to physical address access in diag 0x258 handler
KVM: s390: gaccess: Check if guest address is in memslot
s390/ap: Fix CCA crypto card behavior within protected execution environment
s390/pci: Handle PCI error codes other than 0x3a
rts5228, rts5261, rts5264 are supported by the rtsx_pci driver, but
they are not mentioned in the Kconfig help when the code was added.
List those models in the Kconfig help accordingly.
Signed-off-by: Yo-Jung Lin (Leo) <0xff07@gmail.com>
Link: https://lore.kernel.org/r/20241017144747.15966-1-0xff07@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove some entries due to various compliance requirements. They can come
back in the future if sufficient documentation is provided.
Link: https://lore.kernel.org/r/2024101835-tiptop-blip-09ed@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here are some new modem device ids.
Everything has been in linux-next over night with no reported issues.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCZxIyrQAKCRALxc3C7H1l
CCgGAQDkVANkAOTEq8zOfpaULkWkMJXMRzd28YrkCqlgJfbkYQEAjTWtz8F48Ne7
V9CYj/dbNULlQ0Q+kqKqGBcKdZrq2AI=
=/sNu
-----END PGP SIGNATURE-----
Merge tag 'usb-serial-6.12-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial device ids for 6.12-rc4
Here are some new modem device ids.
Everything has been in linux-next over night with no reported issues.
* tag 'usb-serial-6.12-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit FN920C04 MBIM compositions
USB: serial: option: add support for Quectel EG916Q-GL
Commit 2fae6bb7be ("xen/privcmd: Add new syscall to get gsi from dev")
adds a weak reverse dependency to the config XEN_PRIVCMD definition, that
dependency causes xen-privcmd can't be loaded on domU, because dependent
xen-pciback isn't always be loaded successfully on domU.
To solve above problem, remove that dependency, and do not call
pcistub_get_gsi_from_sbdf() directly, instead add a hook in
drivers/xen/apci.c, xen-pciback register the real call function, then in
privcmd_ioctl_pcidev_get_gsi call that hook.
Fixes: 2fae6bb7be ("xen/privcmd: Add new syscall to get gsi from dev")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241012084537.1543059-1-Jiqian.Chen@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
as part of IBPB. Make sure that happens by doing the flushing in
software on those generations
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmcI+6UACgkQEsHwGGHe
VUrYVw/+JmJHckfzI1jqc+FIGkG124X5l5ml3nUwExyL5anZk3KY6QEhvEjk8xgt
5pLYaHd76W21DWsP5AOXwsLAOBptsN8E7zwmG4Wg4H9EOkcQYDujlm0a8ne+zmqk
NQh/y7NzACruYJoDzo0S89Gcz2IUZ3C5HTKp9GUor4cUOw1wZsEm5RHAkl+SlK9j
amGq4ABO6xE6UjnrZMDW1uo253nCTZjH9DZvwzzLXULaAQjTvn6lowSPCJWZezNh
ue2Tdl/GYo6qbHyd7OYK4N4IxWNJujHLlcIXJ/mU3EPVKBh98f3SZakvoXMuWkBL
KS5xxHf86Un+8UM59ZYIK8263O8CmlgmOosk+wPV2DZfnomG/dxoYvaZ7x41X2I+
xdGMiHBP3SaQmqIxdvVCbtBIoLLd5MQ/JtAcDuLM4pbXBgLTxSfF7fDb4OtpuCwe
QybeQ33QNCAn63DT+3bbWKxQpzC9vpu2+t48XV9a/rgQpsnMBodFP6RSxXxBsH4I
zRDMoeyfn1mTiGRbbuhwNq52M01L1G8bkJ5sX0m/PB5XGhkg46998i93W8yAKKHY
5sF1sP53idK94CNcA0fs/Z8ZoKmkszoh8GDAn3Pb+eP+m9f7EDL9AVanBzerJhfJ
i69EfM9r0ESIkmmmjaycn2EVzwQ5Vtv1r/LgDI4up2bQzbPS6So=
=ovJa
-----END PGP SIGNATURE-----
Merge tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 IBPB fixes from Borislav Petkov:
"This fixes the IBPB implementation of older AMDs (< gen4) that do not
flush the RSB (Return Address Stack) so you can still do some leaking
when using a "=ibpb" mitigation for Retbleed or SRSO. Fix it by doing
the flushing in software on those generations.
IBPB is not the default setting so this is not likely to affect
anybody in practice"
* tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
x86/bugs: Skip RSB fill at VMEXIT
x86/entry: Have entry_ibpb() invalidate return predictions
x86/cpufeatures: Add a IBPB_NO_RET BUG flag
x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
The barrier_nospec() after the array bounds check is overkill and
painfully slow for arches which implement it.
Furthermore, most arches don't implement it, so they remain exposed to
Spectre v1 (which can affect pretty much any CPU with branch
prediction).
Instead, clamp the user pointer to a valid range so it's guaranteed to
be a valid array index even when the bounds check mispredicts.
Fixes: 8270cb10c0 ("cdrom: Fix spectre-v1 gadget")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/1d86f4d9d8fba68e5ca64cdeac2451b95a8bf872.1729202937.git.jpoimboe@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is the usual shower of unrelated singletons - please see the individual
changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZxGY5wAKCRDdBJ7gKXxA
js6RAQC16zQ7WRV091i79cEi1C5648NbZjMCU626hZjuyfbzKgEA2v8PYtjj9w2e
UGLxMY+PYZki2XNEh75Sikdkiyl9Vgg=
=xcWT
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"28 hotfixes. 13 are cc:stable. 23 are MM.
It is the usual shower of unrelated singletons - please see the
individual changelogs for details"
* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: add regression test for spanning store bug
maple_tree: correct tree corruption on spanning store
mm/mglru: only clear kswapd_failures if reclaimable
mm/swapfile: skip HugeTLB pages for unuse_vma
selftests: mm: fix the incorrect usage() info of khugepaged
MAINTAINERS: add Jann as memory mapping/VMA reviewer
mm: swap: prevent possible data-race in __try_to_reclaim_swap
mm: khugepaged: fix the incorrect statistics when collapsing large file folios
MAINTAINERS: kasan, kcov: add bugzilla links
mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
maple_tree: check for MA_STATE_BULK on setting wr_rebalance
mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
mm: remove unused stub for can_swapin_thp()
mailmap: add an entry for Andy Chiu
MAINTAINERS: add memory mapping/VMA co-maintainers
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
...
- Terminate the of_device_id table in the Samsung exynosautov920 clk
driver so that device matching logic doesn't run off the end of the
array into other memory and break matching for any kernel with this
driver loaded
- Properly limit the max clk ID in the Rockchip clk driver
- Use clk kunit helpers in the clk tests so that memory isn't leaked
after the test concludes
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmcRk/8RHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSVu4RAA2qKoXfSdElkgzpv5fW2Ydr0LbN5pS90B
4ey6QfVxco6imyvry/MVREIQVG0832otv1WMJ2D6kErCxvzo5bNf1vsXcg7h5Gci
VJELgtgKUPkVC2S3UgXXJe7cTobZhDUoin/mfCXgHh1yk6rg5t0gElP+qrxto1qH
Tan7cGpqW8CcK33M2BnYwS0LX1gSXQ3EN9R5opyIQGX6OFvBRXYk8GH8g3WTPBRt
OtJoR3u7PZ95U3FfHuSGgTddD2r9ZlI0lCY95+RpDkF0cL2yIkVMU8GZHZH5DDpH
iMGEFZ/QxC11BLjlleSPQsQJjPFJQ/lz7ZlJ8/c7zMFhmBusBu2Tk5aC8w7Dxa4o
B1rErFxg63sFVTY1iC1gyaPYnNtIkIZe287YyIeO+Sr6aW6hEsLbDXtvfydAYJL1
w4wLwrOSgv9ncoEzaEnKoz62pgZbg/o3wB3fLLVrtBtdCaTZMSWRettHV1DdSLOZ
gPQQtp6F0VgzW+ip7mPYYy+DKO5jErg1OtBmCszFgDvmKKDloHzaE9lP7j3YNFbm
GaAaD/dM8muOJ+gvdVja/K2WEyFsec9vnABOGlb5LTmZtfHajrJl9jyaJ+9ILPi1
1xcWK5FH6m4NJygC7TcEUq5KZBZeEoe+9Z2hgIflX5XmL9aLeZOuGPvz6r2moICw
ztsggsLsIYU=
=EKY/
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two clk driver fixes and a unit test fix:
- Terminate the of_device_id table in the Samsung exynosautov920 clk
driver so that device matching logic doesn't run off the end of the
array into other memory and break matching for any kernel with this
driver loaded
- Properly limit the max clk ID in the Rockchip clk driver
- Use clk kunit helpers in the clk tests so that memory isn't leaked
after the test concludes"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: test: Fix some memory leaks
clk: rockchip: fix finding of maximum clock ID
clk: samsung: Fix out-of-bound access of of_match_node()
Add a maintainers entry now that the PREEMPT_RT bits are merged. Steven
volunteered and asked for the list.
There are no files associated with this entry since it is spread over the
kernel. It serves as entry for people knowing what they look for. There is
a keyword added so if PREEMPT_RT is mentioned somewhere, then the entry
will be picked up.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Pavel Machek <pavel@denx.de>
Link: https://lore.kernel.org/all/20241015151132.Erx81G9f@linutronix.de
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai@intel.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
The current policy management makes it impossible to use IPE
in a general purpose distribution. In such cases the users are not
building the kernel, the distribution is, and access to the private
key included in the trusted keyring is, for obvious reason, not
available.
This means that users have no way to enable IPE, since there will
be no built-in generic policy, and no access to the key to sign
updates validated by the trusted keyring.
Just as we do for dm-verity, kernel modules and more, allow the
secondary and platform keyrings to also validate policies. This
allows users enrolling their own keys in UEFI db or MOK to also
sign policies, and enroll them. This makes it sensible to enable
IPE in general purpose distributions, as it becomes usable by
any user wishing to do so. Keys in these keyrings can already
load kernels and kernel modules, so there is no security
downgrade.
Add a kconfig each, like dm-verity does, but default to enabled if
the dependencies are available.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
[FW: fixed some style issues]
Signed-off-by: Fan Wu <wufan@kernel.org>
Currently IPE accepts an update that has the same version as the policy
being updated, but it doesn't make it a no-op nor it checks that the
old and new policyes are the same. So it is possible to change the
content of a policy, without changing its version. This is very
confusing from userspace when managing policies.
Instead change the update logic to reject updates that have the same
version with ESTALE, as that is much clearer and intuitive behaviour.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
When loading policies in userspace we want a recognizable error when an
update attempts to use an old policy, as that is an error that needs
to be treated differently from an invalid policy. Use -ESTALE as it is
clear enough for an update mechanism.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
We no more need acquiring ctrl->lock before accessing the
NVMe controller state and instead we can now use the helper
nvme_ctrl_state. So replace the use of ctrl->lock from
nvme_keep_alive_finish function with nvme_ctrl_state call.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The nvme keep-alive operation, which executes at a periodic interval,
could potentially sneak in while shutting down a fabric controller.
This may lead to a race between the fabric controller admin queue
destroy code path (invoked while shutting down controller) and hw/hctx
queue dispatcher called from the nvme keep-alive async request queuing
operation. This race could lead to the kernel crash shown below:
Call Trace:
autoremove_wake_function+0x0/0xbc (unreliable)
__blk_mq_sched_dispatch_requests+0x114/0x24c
blk_mq_sched_dispatch_requests+0x44/0x84
blk_mq_run_hw_queue+0x140/0x220
nvme_keep_alive_work+0xc8/0x19c [nvme_core]
process_one_work+0x200/0x4e0
worker_thread+0x340/0x504
kthread+0x138/0x140
start_kernel_thread+0x14/0x18
While shutting down fabric controller, if nvme keep-alive request sneaks
in then it would be flushed off. The nvme_keep_alive_end_io function is
then invoked to handle the end of the keep-alive operation which
decrements the admin->q_usage_counter and assuming this is the last/only
request in the admin queue then the admin->q_usage_counter becomes zero.
If that happens then blk-mq destroy queue operation (blk_mq_destroy_
queue()) which could be potentially running simultaneously on another
cpu (as this is the controller shutdown code path) would forward
progress and deletes the admin queue. So, now from this point onward
we are not supposed to access the admin queue resources. However the
issue here's that the nvme keep-alive thread running hw/hctx queue
dispatch operation hasn't yet finished its work and so it could still
potentially access the admin queue resource while the admin queue had
been already deleted and that causes the above crash.
This fix helps avoid the observed crash by implementing keep-alive as a
synchronous operation so that we decrement admin->q_usage_counter only
after keep-alive command finished its execution and returns the command
status back up to its caller (blk_execute_rq()). This would ensure that
fabric shutdown code path doesn't destroy the fabric admin queue until
keep-alive request finished execution and also keep-alive thread is not
running hw/hctx queue dispatch operation.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
While shutting down loop controller, we first quiesce the admin/IO queue,
delete the admin/IO tag-set and then at last destroy the admin/IO queue.
However it's quite possible that during the window between quiescing and
destroying of the admin/IO queue, some admin/IO request might sneak in
and if that happens then we could potentially encounter a hung task
because shutdown operation can't forward progress until any pending I/O
is flushed off.
This commit helps ensure that before destroying the admin/IO queue, we
unquiesce the admin/IO queue so that any outstanding requests, which are
added after the admin/IO queue is quiesced, are now flushed to its
completion.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
print_reg_state() should not consider adding reg->off to reg->var_off.value
when dumping scalars. Scalars can be produced with reg->off != 0 through
BPF_ADD_CONST, and thus as-is this can skew the register log dump.
Fixes: 98d7ca374b ("bpf: Track delta between "linked" registers.")
Reported-by: Nathaniel Theis <nathaniel.theis@nccgroup.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241016134913.32249-2-daniel@iogearbox.net
Nathaniel reported a bug in the linked scalar delta tracking, which can lead
to accepting a program with OOB access. The specific code is related to the
sync_linked_regs() function and the BPF_ADD_CONST flag, which signifies a
constant offset between two scalar registers tracked by the same register id.
The verifier attempts to track "similar" scalars in order to propagate bounds
information learned about one scalar to others. For instance, if r1 and r2
are known to contain the same value, then upon encountering 'if (r1 != 0x1234)
goto xyz', not only does it know that r1 is equal to 0x1234 on the path where
that conditional jump is not taken, it also knows that r2 is.
Additionally, with env->bpf_capable set, the verifier will track scalars
which should be a constant delta apart (if r1 is known to be one greater than
r2, then if r1 is known to be equal to 0x1234, r2 must be equal to 0x1233.)
The code path for the latter in adjust_reg_min_max_vals() is reached when
processing both 32 and 64-bit addition operations. While adjust_reg_min_max_vals()
knows whether dst_reg was produced by a 32 or a 64-bit addition (based on the
alu32 bool), the only information saved in dst_reg is the id of the source
register (reg->id, or'ed by BPF_ADD_CONST) and the value of the constant
offset (reg->off).
Later, the function sync_linked_regs() will attempt to use this information
to propagate bounds information from one register (known_reg) to others,
meaning, for all R in linked_regs, it copies known_reg range (and possibly
adjusting delta) into R for the case of R->id == known_reg->id.
For the delta adjustment, meaning, matching reg->id with BPF_ADD_CONST, the
verifier adjusts the register as reg = known_reg; reg += delta where delta
is computed as (s32)reg->off - (s32)known_reg->off and placed as a scalar
into a fake_reg to then simulate the addition of reg += fake_reg. This is
only correct, however, if the value in reg was created by a 64-bit addition.
When reg contains the result of a 32-bit addition operation, its upper 32
bits will always be zero. sync_linked_regs() on the other hand, may cause
the verifier to believe that the addition between fake_reg and reg overflows
into those upper bits. For example, if reg was generated by adding the
constant 1 to known_reg using a 32-bit alu operation, then reg->off is 1
and known_reg->off is 0. If known_reg is known to be the constant 0xFFFFFFFF,
sync_linked_regs() will tell the verifier that reg is equal to the constant
0x100000000. This is incorrect as the actual value of reg will be 0, as the
32-bit addition will wrap around.
Example:
0: (b7) r0 = 0; R0_w=0
1: (18) r1 = 0x80000001; R1_w=0x80000001
3: (37) r1 /= 1; R1_w=scalar()
4: (bf) r2 = r1; R1_w=scalar(id=1) R2_w=scalar(id=1)
5: (bf) r4 = r1; R1_w=scalar(id=1) R4_w=scalar(id=1)
6: (04) w2 += 2147483647; R2_w=scalar(id=1+2147483647,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
7: (04) w4 += 0 ; R4_w=scalar(id=1+0,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
8: (15) if r2 == 0x0 goto pc+1
10: R0=0 R1=0xffffffff80000001 R2=0x7fffffff R4=0xffffffff80000001 R10=fp0
What can be seen here is that r1 is copied to r2 and r4, such that {r1,r2,r4}.id
are all the same which later lets sync_linked_regs() to be invoked. Then, in
a next step constants are added with alu32 to r2 and r4, setting their ->off,
as well as id |= BPF_ADD_CONST. Next, the conditional will bind r2 and
propagate ranges to its linked registers. The verifier now believes the upper
32 bits of r4 are r4=0xffffffff80000001, while actually r4=r1=0x80000001.
One approach for a simple fix suitable also for stable is to limit the constant
delta tracking to only 64-bit alu addition. If necessary at some later point,
BPF_ADD_CONST could be split into BPF_ADD_CONST64 and BPF_ADD_CONST32 to avoid
mixing the two under the tradeoff to further complicate sync_linked_regs().
However, none of the added tests from dedf56d775 ("selftests/bpf: Add tests
for add_const") make this necessary at this point, meaning, BPF CI also passes
with just limiting tracking to 64-bit alu addition.
Fixes: 98d7ca374b ("bpf: Track delta between "linked" registers.")
Reported-by: Nathaniel Theis <nathaniel.theis@nccgroup.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20241016134913.32249-1-daniel@iogearbox.net
Previously test_task_tid was setting `linfo.task.tid`
to `getpid()` which is the same as `gettid()` for the
parent process. Instead create a new child thread
and set `linfo.task.tid` to `gettid()` to make sure
the tid filtering logic is working as expected.
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241016210048.1213935-2-linux@jordanrome.com
In userspace, you can add a tid filter by setting
the "task.tid" field for "bpf_iter_link_info".
However, `get_pid_task` when called for the
`BPF_TASK_ITER_TID` type should have been using
`PIDTYPE_PID` (tid) instead of `PIDTYPE_TGID` (pid).
Fixes: f0d74c4da1 ("bpf: Parameterize task iterators.")
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241016210048.1213935-1-linux@jordanrome.com
nvme_dev_disable() modifies the dev->online_queues field, therefore
nvme_pci_update_nr_queues() should avoid racing against it, otherwise
we could end up passing invalid values to blk_mq_update_nr_hw_queues().
WARNING: CPU: 39 PID: 61303 at drivers/pci/msi/api.c:347
pci_irq_get_affinity+0x187/0x210
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
RIP: 0010:pci_irq_get_affinity+0x187/0x210
Call Trace:
<TASK>
? blk_mq_pci_map_queues+0x87/0x3c0
? pci_irq_get_affinity+0x187/0x210
blk_mq_pci_map_queues+0x87/0x3c0
nvme_pci_map_queues+0x189/0x460 [nvme]
blk_mq_update_nr_hw_queues+0x2a/0x40
nvme_reset_work+0x1be/0x2a0 [nvme]
Fix the bug by locking the shutdown_lock mutex before using
dev->online_queues. Give up if nvme_dev_disable() is running or if
it has been executed already.
Fixes: 949928c1c7 ("NVMe: Fix possible queue use after freed")
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
- Disable software tag-based KASAN when compiling with GCC, as functions
are incorrectly instrumented leading to a crash early during boot.
- Fix pkey configuration for kernel threads when POE is enabled.
- Fix invalid memory accesses in uprobes when targetting load-literal
instructions.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmcPrzQQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNIr6B/wN+o1xI7Fv/QdlaTuKYLvOOg/XTl6sbUDj
YssxtjhpKuaFVG4zJHNsWvgUqO+YCM7m3F1L8LVPMF7l2xoKtRTIB1Ye315hTjYm
dW5Te6xBMVKF8SVxE8sBbZobdokIW1JNPBrvGvHO3d5ujmofzwHU8RNMXuTUItRw
z85Qy75FkEDTEbsWhS3VL5HOgEr+k0TYDRa8SXwKWVj7/rYna3tO39kIdS5dt9VX
wDJbnxtWJMhiHmDnevFFhBkSZrips12P1Rb6HUSmhpUJh0Rk4TAZntSl2f/lr+jA
PuboBbSG68UOCwAHoNmTcLdFhkiNaiyw4w2F7hk2A6aNRtme+bT0
=M/ug
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Disable software tag-based KASAN when compiling with GCC, as
functions are incorrectly instrumented leading to a crash early
during boot
- Fix pkey configuration for kernel threads when POE is enabled
- Fix invalid memory accesses in uprobes when targetting load-literal
instructions
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kasan: Disable Software Tag-Based KASAN with GCC
Documentation/protection-keys: add AArch64 to documentation
arm64: set POR_EL0 for kernel threads
arm64: probes: Fix uprobes for big-endian kernels
arm64: probes: Fix simulate_ldr*_literal()
arm64: probes: Remove broken LDR (literal) uprobe support
Most of the fixes this time are for platform specific drivers, addressing
issues found through build testing on freescale, ep93xx, starfive,
and npcm platforms, as as well as the ffa firmware.
The fixes for the scmi firmware driver address compatibility problems
found on broadcom machines.
There are only two devicetree fixes, addressing incorrect in configuration
on broadcom and marvell machines.
The changes to the Documentation and MAINTAINERS files are for
clarification only.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmcRMFQACgkQYKtH/8kJ
Uiflnw//QtSzQtwZpe9LQBq+1jS/q2y1dAsgNqP+CYB0nNsO6MSIEUDpjh8TRYpP
uuRg9oN3xdtT6xlFGGEMdsOc+DCcbOakW+CFoxgvvxi2AiFpvdtiZpHgse5DtVfl
HQYSSLQvttwtjDlNjPmw58F1GR+3FzXh0mJS8N03bP+k8yJxVrSff4TJTFEHBLrG
mrorC+qfBaB7djvmjBolPNt3qMB5pXVYio3ZyflHFdxxUHjnrBVkWpmLE2BQksJt
I5nbl8vFqfLhLFCsqyCm4gC0gDSdxsWhHuRpOzXYQJKHxWpg/uYu4TOsKxVAc/Nc
nZNOdeQ0C7pU6yiyDD1jqWW0l98itHVQOvz6dxE2wZpL1duOqQ7yx1DJfJw4V3PP
Cn66mcp9Vrh0nYpZxhGfOs3wibhbJ0NQYhC4jaddVrxfcGuJc3jTR+bnkK+K+b8b
nt7FlLe/vHFvaahGclgeg63wpTqmPAc0eSMF8YJDo1bTP8uF6Km6oXo/kaQQywBm
C3mAXJn2rtn+4sPx2N3tjGAhnpK3cZZj/9QA82WjN/Cm7vzXwRgrt0j4S3weGxGI
rPo2tNC8wJT+gXAPDWvRwyjqhOoCYbWgQ/9xkF5kQNe+cktvdU7gzGiejYhWl9Oz
3eLb6IfZbuMTsVWhQuNSJe4LAp+/18M0T0rpzbL3VIJ5XuP3n64=
=h897
-----END PGP SIGNATURE-----
Merge tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Most of the fixes this time are for platform specific drivers,
addressing issues found through build testing on freescale, ep93xx,
starfive, and npcm platforms, as as well as the ffa firmware.
The fixes for the scmi firmware driver address compatibility problems
found on broadcom machines.
There are only two devicetree fixes, addressing incorrect in
configuration on broadcom and marvell machines.
The changes to the Documentation and MAINTAINERS files are for
clarification only"
* tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
firmware: arm_ffa: Avoid string-fortify warning caused by memcpy()
firmware: arm_scmi: Queue in scmi layer for mailbox implementation
firmware: arm_ffa: Avoid string-fortify warning in export_uuid()
firmware: arm_scmi: Give SMC transport precedence over mailbox
firmware: arm_scmi: Fix the double free in scmi_debugfs_common_setup()
Documentation/process: maintainer-soc: clarify submitting patches
dmaengine: cirrus: check that output may be truncated
dmaengine: cirrus: ERR_CAST() ioremap error
MAINTAINERS: use the canonical soc mailing list address and mark it as L:
ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin
arm64: dts: marvell: cn9130-sr-som: fix cp0 mdio pin numbers
soc: fsl: cpm1: qmc: Fix unused data compilation warning
soc: fsl: cpm1: qmc: Do not use IS_ERR_VALUE() on error pointers
reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC
reset: npcm: convert comma to semicolon
A collection of small fixes, nothing really stands out.
- Usual HD-audio quirks / device-specific fixes
- Kconfig dependency fix for UM
- A series of minor fixes for SoundWire
- Updates of USB-audio LINE6 contact address
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmcRH54OHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE908g//Y+EXKVNOkt//gIR4lPput7L0xXuiw09YU6Hy
DPyuApYSS9z3VfVOE215dmJpgINw9lS1q7zHMEncLFA56oLEB8gxpNrha93GCCVQ
79Ov4oiIvOZpl7SGdN3ijs3Xf+uyGd3qEyAlyfRe2MHGw1WG7vDAKSKWU5ss5SGU
jai0t446Mxiyg5IkcHV0d0HP8+WlIayxergHihKrTgrQ1BwCdrcqt7KYF1vYPFZO
g87pnqOJ608bOwdkEQ47PraNv91futyQr+S0XYr53BTsCxgZ/m+kUPRGV7cB76wL
2UosHKwIBTnvXW6rBVNAzcs3KveuIyCSWTfvUJUGOe6mhAh5+t5BwiZbQJK9x2Vc
ragsZ2MIdXhymUJ7hLujQ4E/2JCzJJPdoZXXUZfWmp23yH/ei4ylrOr2D/LntZ5g
NVADlxCYPlYxiF3Fd+rv21rW0Zkk8kjMWsCxomzF1uE+wrMgJMlgBYvEVxZbI+P7
sGluv5v51g85X1RDrsofCbzt0IgKQd7RjLj1/dcgH3qEZVU7uuVV/tacHIayXCxf
GNHD09+Y93Uwt7CMSnUMCf0r9/64AHXLgeRQNsLb39qjGGTZZFzx5vIPJDKBdeoK
ail9LcynfbWi73tHFbAd4hDelQUIRo3H6CNR1nOvI7SfweqywDmngW/OWanbmGPG
VEBQ01Q=
=YI8n
-----END PGP SIGNATURE-----
Merge tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, nothing really stands out:
- Usual HD-audio quirks / device-specific fixes
- Kconfig dependency fix for UM
- A series of minor fixes for SoundWire
- Updates of USB-audio LINE6 contact address"
* tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2
ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read
ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read
ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller()
ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller
ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects
ALSA: scarlett2: Add error check after retrieving PEQ filter values
ALSA: hda/cs8409: Fix possible NULL dereference
sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML
ALSA: line6: update contact information
ALSA: usb-audio: Fix NULL pointer deref in snd_usb_power_domain_set()
ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2
ALSA: hda: Sound support for HP Spectre x360 16 inch model 2024
Current release - new code bugs:
- eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
Previous releases - regressions:
- ipv4: give an IPv4 dev to blackhole_netdev
- udp: compute L4 checksum as usual when not segmenting the skb
- tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
- eth: mlx5e: don't call cleanup on profile rollback failure
- eth: microchip: vcap api: fix memory leaks in vcap_api_encode_rule_test()
- eth: enetc: disable Tx BD rings after they are empty
- eth: macb: avoid 20s boot delay by skipping MDIO bus registration for fixed-link PHY
Previous releases - always broken:
- posix-clock: fix missing timespec64 check in pc_clock_settime()
- genetlink: hold RCU in genlmsg_mcast()
- mptcp: prevent MPC handshake on port-based signal endpoints
- eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
- eth: stmmac: dwmac-tegra: fix link bring-up sequence
- eth: bcmasp: fix potential memory leak in bcmasp_xmit()
Misc:
- add Andrew Lunn as a co-maintainer of all networking drivers
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmcRDoMSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOklXoP/2VKcUDYMfP03vB6a60riiqMcD+GVZzm
lLaa9xBiDdEsnL+4dcQLF7tecYbqpIqh+0ZeEe0aXgp0HjIm2Im1eiXv5cB7INxK
n1lxVibI/zD2j2M9cy2NDTeeYSI29GH98g0IkLrU9vnfsrp5jRuXFttrCmotzesZ
tYb200cVMR/nt9rrG3aNAxTUHjaykgpKh/zSvFC0MRStXFfUbIia88LzcQOQzEK3
jcWMj8jsJHkDLhUHS8LVZEV1JDYIb+QeEjGz5LxMpQV5/T5sP7Ki4ISPxpUbYNrZ
QWwSrFSg2ivaq8PkQ2LBTKAtiHyGlcEJtnlSf7Y50cpc9RNphClq8YMSBUCcGTEi
3W18eVIZ0aj1omTeHEQLbMkqT0soTwYxskDW0uCCFKf/SRCQm1ixrpQiA6PGP266
e0q7mMD7v4S9ZdO5VF+oYzf1fF0OhaOkZtUEsWjxHite6ujh8719EPbUoee4cqPL
ofoHYF338BlYl4YIZERZMff8HJD4+PR0R8c9SIBXQcGUMvf2mQdvaD9q2MGySSJd
4mlH6bE8Ckj4KVp9llxB8zyw/5OmJdMZ+ar4o7+CX2Z3Wrs1SMSTy7qAADcV2B2G
S9kvwNDffak+N94N/MqswQM4se5yo8dmyUnhqChBFYPFWc4N7v6wwKelROigqLL7
7Xhj2TXRBGKs
=ukTi
-----END PGP SIGNATURE-----
Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - new code bugs:
- eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
Previous releases - regressions:
- ipv4: give an IPv4 dev to blackhole_netdev
- udp: compute L4 checksum as usual when not segmenting the skb
- tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
- eth: mlx5e: don't call cleanup on profile rollback failure
- eth: microchip: vcap api: fix memory leaks in
vcap_api_encode_rule_test()
- eth: enetc: disable Tx BD rings after they are empty
- eth: macb: avoid 20s boot delay by skipping MDIO bus registration
for fixed-link PHY
Previous releases - always broken:
- posix-clock: fix missing timespec64 check in pc_clock_settime()
- genetlink: hold RCU in genlmsg_mcast()
- mptcp: prevent MPC handshake on port-based signal endpoints
- eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
- eth: stmmac: dwmac-tegra: fix link bring-up sequence
- eth: bcmasp: fix potential memory leak in bcmasp_xmit()
Misc:
- add Andrew Lunn as a co-maintainer of all networking drivers"
* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5e: Don't call cleanup on profile rollback failure
net/mlx5: Unregister notifier on eswitch init failure
net/mlx5: Fix command bitmask initialization
net/mlx5: Check for invalid vector index on EQ creation
net/mlx5: HWS, use lock classes for bwc locks
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
net/mlx5: HWS, fixed double free in error flow of definer layout
net/mlx5: HWS, removed wrong access to a number of rules variable
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
net: phy: mdio-bcm-unimac: Add BCM6846 support
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
udp: Compute L4 checksum as usual when not segmenting the skb
genetlink: hold RCU in genlmsg_mcast()
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
...
Not all virtual addresses have physical addresses, such as if they were
vmalloc'd. Just trace the virtual address instead of trying to trace a
physical address. This aligns with the API, and is good enough to
associate dma_alloc with dma_free.
Fixes: 038eb433dc ("dma-mapping: add tracing for dma-mapping API calls")
Reported-by: syzbot+b4bfacdec173efaa8567@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/670ebde5.050a0220.d9b66.0154.GAE@google.com/
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a regression test to assert that, when performing a spanning store
which consumes the entirety of the rightmost right leaf node does not
result in maple tree corruption when doing so.
This achieves this by building a test tree of 3 levels and establishing a
store which ultimately results in a spanned store of this nature.
Link: https://lkml.kernel.org/r/30cdc101a700d16e03ba2f9aa5d83f2efa894168.1728314403.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "maple_tree: correct tree corruption on spanning store", v3.
There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.
This bug seems far more likely to happen since commit f8d112a4e6
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.
We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.
The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the right-mode node.
This results in mas_wr_spanning_store() mitakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.
The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.
The fix patch commit message goes into great detail as to how this occurs.
This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.
Bert has kindly tested the fix and confirmed it resolved his issues. Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.
This patch (of 2):
There has been a subtle bug present in the maple tree implementation from
its inception.
This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.
A range may always ultimately span two leaf nodes. In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().
In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.
When a spanning store is required, this function returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.
When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.
We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf node's worth of data.
We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().
After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e.
r_mas.offset <= r_mas.end).
Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.
Consider the following tree:
Height
0 Root Node
/ \
pivot = 0xffff / \ pivot = ULONG_MAX
/ \
1 A [-----] ...
/ \
pivot = 0x4fff / \ pivot = 0xffff
/ \
2 (LEAVES) B [-----] [-----] C
^--- Last pivot 0xffff.
Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):
1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
determines that this is a spanning store across nodes B and C. The mas
state is set such that the current node from which we traverse further
is node A.
2. In mas_wr_spanning_store() we try to find elements to the right of pivot
0xffff by searching for an index of 0x10000:
- mas_wr_walk_index() invokes mas_wr_walk_descend() and
mas_wr_node_walk() in turn.
- mas_wr_node_walk() loops over entries in node A until EITHER it
finds an entry whose pivot equals or exceeds 0x10000 OR it
reaches the final entry.
- Since no entry has a pivot equal to or exceeding 0x10000, pivot
0xffff is selected, leading to node C.
- mas_wr_walk_traverse() resets the mas state to traverse node C. We
loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
in turn once again.
- Again, we reach the last entry in node C, which has a pivot of
0xffff.
3. We then copy the elements to the left of 0x4000 in node B to the big
node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
too.
4. We determine whether we have any entries to copy from the right of the
end of the range via - and with r_mas set up at the entry at pivot
0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
pivot 0xffff.
5. BUG! The maple tree is corrupted with a duplicate entry.
This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node.
spanning store across two leaf nodes with a range that ends at that shared
pivot.
A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.
Instead, this patch detects if the right hand node is populated, i.e. has
anything we need to copy.
We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().
The work performed in commit f8d112a4e6 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.
The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.
After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.
Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
According to the prototype formal BPF memory consistency model
discussed e.g. in [1] and following the ordering properties of
the C/in-kernel macro atomic_cmpxchg(), a BPF atomic operation
with the BPF_CMPXCHG modifier is fully ordered. However, the
current RISC-V JIT lowerings fail to meet such memory ordering
property. This is illustrated by the following litmus test:
BPF BPF__MP+success_cmpxchg+fence
{
0:r1=x; 0:r3=y; 0:r5=1;
1:r2=y; 1:r4=f; 1:r7=x;
}
P0 | P1 ;
*(u64 *)(r1 + 0) = 1 | r1 = *(u64 *)(r2 + 0) ;
r2 = cmpxchg_64 (r3 + 0, r4, r5) | r3 = atomic_fetch_add((u64 *)(r4 + 0), r5) ;
| r6 = *(u64 *)(r7 + 0) ;
exists (1:r1=1 /\ 1:r6=0)
whose "exists" clause is not satisfiable according to the BPF
memory model. Using the current RISC-V JIT lowerings, the test
can be mapped to the following RISC-V litmus test:
RISCV RISCV__MP+success_cmpxchg+fence
{
0:x1=x; 0:x3=y; 0:x5=1;
1:x2=y; 1:x4=f; 1:x7=x;
}
P0 | P1 ;
sd x5, 0(x1) | ld x1, 0(x2) ;
L00: | amoadd.d.aqrl x3, x5, 0(x4) ;
lr.d x2, 0(x3) | ld x6, 0(x7) ;
bne x2, x4, L01 | ;
sc.d x6, x5, 0(x3) | ;
bne x6, x4, L00 | ;
fence rw, rw | ;
L01: | ;
exists (1:x1=1 /\ 1:x6=0)
where the two stores in P0 can be reordered. Update the RISC-V
JIT lowerings/implementation of BPF_CMPXCHG to emit an SC with
RELEASE ("rl") annotation in order to meet the expected memory
ordering guarantees. The resulting RISC-V JIT lowerings of
BPF_CMPXCHG match the RISC-V lowerings of the C atomic_cmpxchg().
Other lowerings were fixed via 20a759df3b ("riscv, bpf: make
some atomic operations fully ordered").
Fixes: dd642ccb45 ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lpc.events/event/18/contributions/1949/attachments/1665/3441/bpfmemmodel.2024.09.19p.pdf [1]
Link: https://lore.kernel.org/bpf/20241017143628.2673894-1-parri.andrea@gmail.com
When the sqpoll is exiting and cancels pending work items, it may need
to run task_work. If this happens from within io_uring_cancel_generic(),
then it may be under waiting for the io_uring_task waitqueue. This
results in the below splat from the scheduler, as the ring mutex may be
attempted grabbed while in a TASK_INTERRUPTIBLE state.
Ensure that the task state is set appropriately for that, just like what
is done for the other cases in io_run_task_work().
do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000029387fd2>] prepare_to_wait+0x88/0x2fc
WARNING: CPU: 6 PID: 59939 at kernel/sched/core.c:8561 __might_sleep+0xf4/0x140
Modules linked in:
CPU: 6 UID: 0 PID: 59939 Comm: iou-sqp-59938 Not tainted 6.12.0-rc3-00113-g8d020023b155 #7456
Hardware name: linux,dummy-virt (DT)
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __might_sleep+0xf4/0x140
lr : __might_sleep+0xf4/0x140
sp : ffff80008c5e7830
x29: ffff80008c5e7830 x28: ffff0000d93088c0 x27: ffff60001c2d7230
x26: dfff800000000000 x25: ffff0000e16b9180 x24: ffff80008c5e7a50
x23: 1ffff000118bcf4a x22: ffff0000e16b9180 x21: ffff0000e16b9180
x20: 000000000000011b x19: ffff80008310fac0 x18: 1ffff000118bcd90
x17: 30303c5b20746120 x16: 74657320313d6574 x15: 0720072007200720
x14: 0720072007200720 x13: 0720072007200720 x12: ffff600036c64f0b
x11: 1fffe00036c64f0a x10: ffff600036c64f0a x9 : dfff800000000000
x8 : 00009fffc939b0f6 x7 : ffff0001b6327853 x6 : 0000000000000001
x5 : ffff0001b6327850 x4 : ffff600036c64f0b x3 : ffff8000803c35bc
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000e16b9180
Call trace:
__might_sleep+0xf4/0x140
mutex_lock+0x84/0x124
io_handle_tw_list+0xf4/0x260
tctx_task_work_run+0x94/0x340
io_run_task_work+0x1ec/0x3c0
io_uring_cancel_generic+0x364/0x524
io_sq_thread+0x820/0x124c
ret_from_fork+0x10/0x20
Cc: stable@vger.kernel.org
Fixes: af5d68f889 ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
vsock_bpf_prot is set up at runtime. Remove the superfluous init.
No functional change intended.
Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-4-d6577bbfe742@rbox.co
Dequeuing via vsock_transport::read_skb() left msg_count outdated, which
then confused SOCK_SEQPACKET recv(). Decrease the counter.
Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-3-d6577bbfe742@rbox.co
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt()
calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after
vsock_transport::read_skb().
While here, also inform the peer that we've freed up space and it has more
credit.
Failing to update rx_bytes after packet is dequeued leads to a warning on
SOCK_STREAM recv():
[ 233.396654] rx_queue is empty, but rx_bytes is non-zero
[ 233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589
Fixes: 634f1a7110 ("vsock: support sockmap")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-2-d6577bbfe742@rbox.co
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure
to immediately and visibly fail the forwarding of unsupported af_vsock
packets.
Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
Tariq Toukan says:
====================
mlx5 misc fixes 2024-10-15
This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
Series generated against:
commit 174714f0e5 ("selftests: drivers: net: fix name not defined")
====================
Link: https://patch.msgid.link/20241015093208.197603-1-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Command bitmask have a dedicated bit for MANAGE_PAGES command, this bit
isn't Initialize during command bitmask Initialization, only during
MANAGE_PAGES.
In addition, mlx5_cmd_trigger_completions() is trying to trigger
completion for MANAGE_PAGES command as well.
Hence, in case health error occurred before any MANAGE_PAGES command
have been invoke (for example, during mlx5_enable_hca()),
mlx5_cmd_trigger_completions() will try to trigger completion for
MANAGE_PAGES command, which will result in null-ptr-deref error.[1]
Fix it by Initialize command bitmask correctly.
While at it, re-write the code for better understanding.
[1]
BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
Write of size 4 at addr 0000000000000214 by task kworker/u96:2/12078
CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
kasan_report+0xb9/0xf0
kasan_check_range+0xec/0x190
mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
mlx5_cmd_flush+0x94/0x240 [mlx5_core]
enter_error_state+0x6c/0xd0 [mlx5_core]
mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core]
process_one_work+0x787/0x1490
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0xda0/0xda0
? assign_work+0x168/0x240
worker_thread+0x586/0xd30
? rescuer_thread+0xae0/0xae0
kthread+0x2df/0x3b0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x2d/0x70
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
Fixes: 9b98d395b8 ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>