KVM selftests changes for 6.12:
- Fix a goof that caused some Hyper-V tests to be skipped when run on bare
metal, i.e. NOT in a VM.
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
- Explicitly include one-off assets in .gitignore. Past Sean was completely
wrong about not being able to detect missing .gitignore entries.
- Verify userspace single-stepping works when KVM happens to handle a VM-Exit
in its fastpath.
- Misc cleanups
KVM x86 misc changes for 6.12
- Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
functionality that is on the horizon).
- Rework common MSR handling code to suppress errors on userspace accesses to
unsupported-but-advertised MSRs. This will allow removing (almost?) all of
KVM's exemptions for userspace access to MSRs that shouldn't exist based on
the vCPU model (the actual cleanup is non-trivial future work).
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
stores the entire 64-bit value a the ICR offset.
- Fix a bug where KVM would fail to exit to userspace if one was triggered by
a fastpath exit handler.
- Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
there's already a pending wake event at the time of the exit.
- Finally fix the RSM vs. nested VM-Enter WARN by forcing the vCPU out of
guest mode prior to signalling SHUTDOWN (architecturally, the SHUTDOWN is
supposed to hit L1, not L2).
KVK generic changes for 6.12:
- Fix a bug that results in KVM prematurely exiting to userspace for coalesced
MMIO/PIO in many cases, clean up the related code, and add a testcase.
- Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
the gpa+len crosses a page boundary, which thankfully is guaranteed to not
happen in the current code base. Add WARNs in more helpers that read/write
guest memory to detect similar bugs.
Today whenever a memslot is moved or deleted, KVM invalidates the entire
page tables and generates fresh ones based on the new memslot layout.
This behavior traditionally was kept because of a bug which was never
fully investigated and caused VM instability with assigned GeForce
GPUs. It generally does not have a huge overhead, because the old
MMU is able to reuse cached page tables and the new one is more
scalabale and can resolve EPT violations/nested page faults in parallel,
but it has worse performance if the guest frequently deletes and
adds small memslots, and it's entirely not viable for TDX. This is
because TDX requires re-accepting of private pages after page dropping.
For non-TDX VMs, this series therefore introduces the
KVM_X86_QUIRK_SLOT_ZAP_ALL quirk, enabling users to control the behavior
of memslot zapping when a memslot is moved/deleted. The quirk is turned
on by default, leading to the zapping of all SPTEs when a memslot is
moved/deleted; users however have the option to turn off the quirk,
which limits the zapping only to those SPTEs hat lie within the range
of memslot being moved/deleted.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A fairly big update at this time, both in core and driver sides.
The core received rewrites in PCM buffer allocation handling and
locking optimizations, PCM rate updates followed by lots of cleanups.
In ASoC side, the legacy Intel drivers have been deprecated by AVS
drivers which leaded to the significant amount of code reduction.
SoundWire driver updates and other cleanups contributed more code
reduction, too.
USB-audio driver received a large cleanup of its big quirk table, and
the old snd_print*() API usages in many legacy drivers are replaced
with the standard print API.
Here are some highlights:
Core:
- More optimized locking in ALSA control code
- Rewrites of memalloc helpers for better DMA API usage
- Drop of obsoleted vmalloc PCM buffer helper API
- Continued MIDI2 UMP updates
- Support of a new user-space driven timer instance
- Update for more PCM support rates and cleanups
- Xrun counter report in the proc files
ASoC:
- Continued simplification and cleanup works for ASoC
- Extensive cleanups and refactoring of the Soundwire drivers
- Removal of Intel machine support obsoleted by the AVS driver
- Lots of DT schema conversions
- Machine support for many AMD and Intel x86 platforms
- Support for AMD ACP 7.1, Mediatek MT6367 and MT8365, Realtek RTL1320
SoundWire and rev C, and Texas Instruments TAS2563
USB-audio:
- Add support of multiple control interfaces
- A large rewrite of quirk table with macros
- Support for RME Digiface USB
HD-audio:
- Cleanup of quirk code for Samsung Galaxy laptops
- Clean up of detection of Cirrus codecs
- C-Media CM9825 HD-audio codec support
Others:
- Rewrites to standard print API in a lot of legacy drivers
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmblvDMOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE823BAAktHgwGbgu+s/U4osgk5M+x1IAzbbRFDEEhuG
Pck6K1NikgUGXg/x/m6O0/M4CmLcGv7NeebD4ihJJPxdK7fpsEOcIeCiPoWfpumN
whtrzf6DP6gMxrE/ov4qUydItuCGVNWcEF/bWv7inEcoJ+qtqiRAWLGvpwQurrvn
NwO+9V/L8NSTWiZVX5ve1+hVVxpLoEQEhRpvMfrVyPXgX0zXgSexka9pwSdb+3xD
vkIKQ1ju1JD8HG6JLfsIOBQYndrz3KLYWhozzrPKh+hGz3vOkhUPrfhYz5hyoWO9
Ep95ZHF4ynAIV0pHlsQTH79BmkxmAJKVQImYHOnOWDvL4T6OVpoY6bzIMXzE9IHJ
p/5JkG422qguoqIEBhM1mkggdXXIjwARFEtqQs+NvUErAd2Pnckl38TSrBtswa1c
FcEjVq8MfIMFroDIPbEt6UY5K5GLWjwFG8rYFYbbEI4qIMLYSi4pbGtedpGxVZ4P
eZGbAlAL6cpzXhTh90maA+NXSyeZUl9Tg8aHF48WjkU8LsEi9fHW/YU8JYyMfyQ3
nYWAZocvXOlIpul8MOPVOg1vXpFKhSVXITKXolQQK1e/C3PirfWsrDxbdF8HduTi
tfVGPiHprwPw2PE0E7ZqjBO1nRLMGcCqv2Iz69lFisPprDJr75C4voPDK+rjo7We
YIhyUMU=
=HLUp
-----END PGP SIGNATURE-----
Merge tag 'sound-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"A fairly big update at this time, both in core and driver sides.
The core received rewrites in PCM buffer allocation handling and
locking optimizations, PCM rate updates followed by lots of cleanups.
In ASoC side, the legacy Intel drivers have been deprecated by AVS
drivers which leaded to the significant amount of code reduction.
SoundWire driver updates and other cleanups contributed more code
reduction, too.
USB-audio driver received a large cleanup of its big quirk table, and
the old snd_print*() API usages in many legacy drivers are replaced
with the standard print API.
Here are some highlights:
Core:
- More optimized locking in ALSA control code
- Rewrites of memalloc helpers for better DMA API usage
- Drop of obsoleted vmalloc PCM buffer helper API
- Continued MIDI2 UMP updates
- Support of a new user-space driven timer instance
- Update for more PCM support rates and cleanups
- Xrun counter report in the proc files
ASoC:
- Continued simplification and cleanup works for ASoC
- Extensive cleanups and refactoring of the Soundwire drivers
- Removal of Intel machine support obsoleted by the AVS driver
- Lots of DT schema conversions
- Machine support for many AMD and Intel x86 platforms
- Support for AMD ACP 7.1, Mediatek MT6367 and MT8365, Realtek
RTL1320 SoundWire and rev C, and Texas Instruments TAS2563
USB-audio:
- Add support of multiple control interfaces
- A large rewrite of quirk table with macros
- Support for RME Digiface USB
HD-audio:
- Cleanup of quirk code for Samsung Galaxy laptops
- Clean up of detection of Cirrus codecs
- C-Media CM9825 HD-audio codec support
Others:
- Rewrites to standard print API in a lot of legacy drivers"
* tag 'sound-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (410 commits)
ASoC: topology: Fix redundant logical jump
ASoC: tas2781: Add Calibration Kcontrols for Chromebook
ASoC: amd: acp: refactor SoundWire machine driver code
ASoC: sdw_utils/intel: move soundwire endpoint parsing helper functions
ASoC: sdw_util/intel: move soundwire endpoint and dai link structures
ASoC: intel: sof_sdw: rename soundwire parsing helper functions
ASoC: intel: sof_sdw: rename soundwire endpoint and dailink structures
ASoC: atmel: mchp-pdmc: Retain Non-Runtime Controls
ALSA: hda/realtek: Add support for Galaxy Book2 Pro (NP950XEE)
ASoC: mediatek: mt7986-afe-pcm: Remove redundant error message
ALSA: memalloc: Use proper DMA mapping API for x86 S/G buffer allocations
ALSA: memalloc: Use proper DMA mapping API for x86 WC buffer allocations
ALSA: usb-audio: Add logitech Audio profile quirk
ASoc: mediatek: mt8365: Remove unneeded assignment
ASoC: Intel: ARL: Add entry for HDMI-In capture support to non-I2S codec boards.
ASoC: Intel: sof_rt5682: Add HDMI-In capture with rt5682 support for ARL.
ASoC: SOF: Intel: hda: remove common_hdmi_codec_drv
ASoC: Intel: sof_pcm512x: do not check common_hdmi_codec_drv
ASoC: Intel: ehl_rt5660: do not check common_hdmi_codec_drv
ASoC: Intel: skl_hda_dsp_generic: use common module for DAI links
...
This kunit update for Linux 6.12-rc1 consists of:
-- a new int_pow test suite
-- documentation update to clarify filename best practices
-- kernel-doc fix for EXPORT_SYMBOL_IF_KUNIT
-- change to build compile_commands.json automatically instead
of requiring a manual build.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbo3WEACgkQCwJExA0N
Qxz1WxAAj+772NHxsJ4JnPqr/74doKnzKc1jM2V4g/F9Y+BT0tSKs1Cu5CyN9VsT
wvxVPWqYltyhumVm/H6SaUGb0yZ7CzJi/5FuT3p3QFUDidMSu1h9KnlLi79q3cDI
VuFKE8K4DDP0GfyFMpbSPZOGfYQp24FybhxRxreY+7q6uRVAnPh33Q1/Bonv6K6q
5329a0z9wWySgisa93ABmQNpF4UJSYunR2bsdUzZqHgyrTXSyK66fcmVKwbBUaIT
o16P1LBjDcIbfwswFb+xUmWD1IPGk7ulirEq8n69tErI6zKbkv1rojXHsoXuvOEN
a4i+sNyR+a7NVI1h/T8F25pSbegkL0XQs7cmehATqpInmEZNDeGR8PkaGZNXXrFy
kG/z7LlWh8zQUBrTsqOLU/iz4sRVrsPCuLIUzo8MiKpAskmj/7fqw5Cab9jmL5V3
6OLAfCQDrfcH7fM9V5U6Ury2dkcovFuw+ZhFcBuLnspB5z0Cj7Yqz6aDZdJ97qyR
PfZuyBU2ouykhpJ4P/sRJC3Gq1t0b+PoDq3qNdCqz4ETld1jaiDz0e75ypquJWyB
QdVMNJF6W7Nwnmpzp4GY9QZ6dtwOKGZyuvW5J0eleWKiD4gjHZaoupIzqT24fgYi
vdscbcOxMMU3/b9F4qDlgsLSPCLVF4HIXTAK2UdiznLdaxYVHQ0=
=rmqh
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- a new int_pow test suite
- documentation update to clarify filename best practices
- kernel-doc fix for EXPORT_SYMBOL_IF_KUNIT
- change to build compile_commands.json automatically instead of
requiring a manual build
* tag 'linux_kselftest-kunit-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
lib/math: Add int_pow test suite
kunit: tool: Build compile_commands.json
kunit: Fix kernel-doc for EXPORT_SYMBOL_IF_KUNIT
Documentation: KUnit: Update filename best practices
This kselftest update for Linux 6.12-rc1 consists of:
-- test coverage for dup_fd() failure handling in unshare_fd()
-- new selftest for the acct() syscall
-- basic uprobe testcase
-- several small fixes and cleanups to existing tests
-- user and strscpy removal as they became kunit tests
-- fixes to build failures and warnings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmboxGEACgkQCwJExA0N
Qxzq6xAAl5f8mW8acVT5DESKtKHJRpuA0bhNm+1sRRinBS+lTF42Pwd5BbYbtpZE
wjCxKtyo775HAS2F8pE/afZRZRx08EChE0W4GxEacH0nw5BCUiWM5aHxf+84NEKE
GEQoLlfXnT4F3V+dtwx0eC+kXUDJ0fZT6P+iI29Dj/IZ1WjEYZ1IF6R0PgCaR4RE
LH6d77AYx3HolwMDolDmoyXdpCbeYmhtWR0QzqaMaYLozitd92uN4Iwkf9LPPBXq
O8P8wYcOo/h8x7OVf8bLA1UqxOU09FA/TBb+Vnu9qMDyKgB6S6NXko7cMDVyCtbe
lHnLk2MFyDnCmZqa+sXXtUmDiEgjYSJqmAdP7ue4oFnyKAIoPKwdDutFi5pk+N2p
ZqHdWRAYOliz4ZNn2xaUXKc++u4a3ZcBzel/cNrvtBXrHZTgYFBIoycdIHw/e2mz
KsvjSxlz/DEC+U266C9MgNnp6S1x9nM0qyPmkxOiUwZO996LYcZJ90WF0PKIaI5M
bFDbidAbymkMF9Eh0uMIVzv1L8YTv55qjLdMtHGDBQEnsT5WlUC2HN24sWQUAzGS
RBQn33Uoo+sIO0hh0pujOZuYoV1fGlS9gGCpjs6XOKUiU+F1yLdhOLsoiWDfMXR+
MwemO56tQFlNo/2V9ecbav28RZgItVkq4XFXKMsdPkniNcSS06Q=
=bFgK
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-next-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
- test coverage for dup_fd() failure handling in unshare_fd()
- new selftest for the acct() syscall
- basic uprobe testcase
- several small fixes and cleanups to existing tests
- user and strscpy removal as they became kunit tests
- fixes to build failures and warnings
* tag 'linux_kselftest-next-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
selftests: kselftest: Use strerror() on nolibc
selftests/timers: Remove unused NSEC_PER_SEC macro
selftests:resctrl: Fix build failure on archs without __cpuid_count()
selftests/ftrace: Fix eventfs ownership testcase to find mount point
selftests: filesystems: fix warn_unused_result build warnings
selftests:core: test coverage for dup_fd() failure handling in unshare_fd()
selftests/ftrace: Fix test to handle both old and new kernels
kselftest: timers: Fix const correctness
selftests/ftrace: Add required dependency for kprobe tests
selftests: rust: config: disable GCC_PLUGINS
selftests: rust: config: add trailing newline
tracing/selftests: Run the ownership test twice
selftests/uprobes: Add a basic uprobe testcase
selftests: harness: rename __constructor_order for clarification
selftests: harness: remove unneeded __constructor_order_last()
selftest: acct: Add selftest for the acct() syscall
selftests: lib: remove strscpy test
selftests: user: remove user suite
kselftest: cpufreq: Add RTC wakeup alarm
selftests/exec: Fix grammar in an error message.
...
This nolibc update for Linux 6.12-rc1 consists of:
Highlights
----------
* Clang support (including LTO)
Other Changes
-------------
* stdbool.h support
* argc/argv/envp arguments for constructors
* Small #include ordering fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbovjQACgkQCwJExA0N
QxzUlQ//W82aSMj5UVFtTvIsezeN87JbS1kswzynqlm4bpNfDlwdF1Ui3WrhTWpt
PcRzJtTOq1jQL2snvC7yihbcgEnsKkxgdCVwYlc1RYFd4+baUjZg5409taQHfzo9
kWat4fsCK+Bev5oHlyMXxEysHhd2LqLwheHmqh+yfMNGFHrzlTwkgAXYU4PvJ2mG
IQto22xAuf5Y1S2vLTrz4DbM8/qa2gEk17U9rbXcGDCH0IaTYTBswLDCZAzoB/N5
BuERfa2CjXFvWlun8vSCNkPMKKYR37qPdoRdgGzvque9eUZTfzvbZ4IFE8uGolxn
P03S57KwNPBsq9/8VPKVJDFvrGl/wdNgNdsyKBtJA4yXAi60kma+q5D2UE+aU9fX
qBnkcyv6pUTvnJprVqaEy7w0u42/laDQfiIW9lnQEueThmYvaT028NihrNH3VFNp
nVt26v4JPFXz2uWDk6ZgO6EKmSlBxAAr7AD5vg979XgNyMVZuXzEuh97MTL2yeTZ
s0N49VW95URshjlQdjC1rTI6dV6bSslgbaEYqVofYTYBidZqTfKMVdn4qyn0scL/
5DPe3q7xkgRpeLxHqNbwtrhLBzHR6FYllRlXWuP4hdpNjMYIpIUGpMW8420Dj0KN
0WfMQteQovQwrtqEbOXUiJ853hEwCJVMBWLVOWxMwcOingk/VjQ=
=DswX
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-nolibc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull nolibc updates from Shuah Khan:
"Highlights:
- Clang support (including LTO)
Other Changes:
- stdbool.h support
- argc/argv/envp arguments for constructors
- Small #include ordering fix"
* tag 'linux_kselftest-nolibc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
tools/nolibc: x86_64: use local label in memcpy/memmove
tools/nolibc: stackprotector: mark implicitly used symbols as used
tools/nolibc: crt: mark _start_c() as used
selftests/nolibc: run-tests.sh: allow building through LLVM
selftests/nolibc: use correct clang target for s390/systemz
selftests/nolibc: don't use libgcc when building with clang
selftests/nolibc: run-tests.sh: avoid overwriting CFLAGS_EXTRA
selftests/nolibc: add cc-option compatible with clang cross builds
selftests/nolibc: add support for LLVM= parameter
selftests/nolibc: determine $(srctree) first
selftests/nolibc: avoid passing NULL to printf("%s")
selftests/nolibc: report failure if no testcase passed
tools/nolibc: compiler: use attribute((naked)) if available
tools/nolibc: move entrypoint specifics to compiler.h
tools/nolibc: compiler: introduce __nolibc_has_attribute()
tools/nolibc: powerpc: limit stack-protector workaround to GCC
tools/nolibc: mips: load current function to $t9
tools/nolibc: arm: use clang-compatible asm syntax
tools/nolibc: pass argc, argv and envp to constructors
tools/nolibc: add stdbool.h header
...
- Make LAM enablement safe vs. kernel threads using a process mm
temporarily as switching back to the process would not update CR3 and
therefore not enable LAM causing faults in user space when using tagged
pointers. Cure it by synchronizing LAM enablement via IPIs to all CPUs
which use the related mm.
- Cure a LAM harmless inconsistency between CR3 and the state during
context switch. It's both confusing and prone to lead to real bugs
- Handle alt stack handling for threads which run with a non-zero
protection key. The non-zero key prevents the kernel to access the
alternate stack. Cure it by temporarily enabling all protection keys for
the alternate stack setup/restore operations.
- Provide a EFI config table identity mapping for kexec kernel to prevent
kexec fails because the new kernel cannot access the config table array
- Use GB pages only when a full GB is mapped in the identity map as
otherwise the CPU can speculate into reserved areas after the end of
memory which causes malfunction on UV systems.
- Remove the noisy and pointless SRAT table dump during boot
- Use is_ioremap_addr() for iounmap() address range checks instead of
high_memory. is_ioremap_addr() is more precise.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpPpYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYddD/9HeH5/rpWS3JU4ZVC+huY28uJuwAFW
ER48zniRbmuz8y+dZZ6K8uvqoWB+ro+yNjA9Jhm9nHUzhs7kE5O8+bmkUi6HXViW
6zS6PW95+u80dmSGy1Gna0SU3158OyBf2X61SySJABLLek7WwrR7jakkgrDBVtL5
ILKS/dUwIrUPoVlszCh9uE0Kj6gdFquooE06sif5EIibnhSgSXfr2EbGj0Qq/YYf
FYfpggSSVpTXFSkZSB2VCEqK66jaGUfKzZ6v1DkSioChUCsky2OO6zD9pk0dMixO
a/0XvRUo3OhiXZbj1tPUtxaEBgJdigpsxke7xQSVxSl+DNNuapiybpgAzFM5Xh+m
yFcP66nIpJcHE10vjVR3jSUlTSb2zk+v9d1Ujj10G1h8RHLTfsTCRHgzs7P0/nkE
NJleWstYVRV5rFpPLoY0ryQmjW/PzYokkaqWKI12Lhxg4ojijZso3pS8WfOsk1/B
081tOZERWeGnJEOOJwwYE1wt0Qq8th4S9b2/fz3vk2fsEHIf42s4fKQwy1CxKopb
PyIrgnZyWx6ueX9QaIGIzGV1GsY4FKMgFJVOyVb0D0stMnr1ty2m3993eNs/nCXy
+rHPMwFteLcwiWp/C3hq5IQd7uEvmRt/mYJ5hdvCj5wCIkXI3JtgsXfLSVs3Ln4f
R6HvZehYmbJoNQ==
=VZcR
-----END PGP SIGNATURE-----
Merge tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 memory management updates from Thomas Gleixner:
- Make LAM enablement safe vs. kernel threads using a process mm
temporarily as switching back to the process would not update CR3 and
therefore not enable LAM causing faults in user space when using
tagged pointers. Cure it by synchronizing LAM enablement via IPIs to
all CPUs which use the related mm.
- Cure a LAM harmless inconsistency between CR3 and the state during
context switch. It's both confusing and prone to lead to real bugs
- Handle alt stack handling for threads which run with a non-zero
protection key. The non-zero key prevents the kernel to access the
alternate stack. Cure it by temporarily enabling all protection keys
for the alternate stack setup/restore operations.
- Provide a EFI config table identity mapping for kexec kernel to
prevent kexec fails because the new kernel cannot access the config
table array
- Use GB pages only when a full GB is mapped in the identity map as
otherwise the CPU can speculate into reserved areas after the end of
memory which causes malfunction on UV systems.
- Remove the noisy and pointless SRAT table dump during boot
- Use is_ioremap_addr() for iounmap() address range checks instead of
high_memory. is_ioremap_addr() is more precise.
* tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioremap: Improve iounmap() address range checks
x86/mm: Remove duplicate check from build_cr3()
x86/mm: Remove unused NX related declarations
x86/mm: Remove unused CR3_HW_ASID_BITS
x86/mm: Don't print out SRAT table information
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
x86/kexec: Add EFI config table identity mapping for kexec kernel
selftests/mm: Add new testcases for pkeys
x86/pkeys: Restore altstack access in sigreturn()
x86/pkeys: Update PKRU to enable all pkeys before XSAVE
x86/pkeys: Add helper functions to update PKRU on the sigframe
x86/pkeys: Add PKRU as a parameter in signal handling functions
x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
x86/mm: Fix LAM inconsistency during context switch
x86/mm: Use IPIs to synchronize LAM enablement
- Core:
- Overhaul of posix-timers in preparation of removing the
workaround for periodic timers which have signal delivery
ignored.
- Remove the historical extra jiffie in msleep()
msleep() adds an extra jiffie to the timeout value to ensure
minimal sleep time. The timer wheel ensures minimal sleep
time since the large rewrite to a non-cascading wheel, but the
extra jiffie in msleep() remained unnoticed. Remove it.
- Make the timer slack handling correct for realtime tasks.
The procfs interface is inconsistent and does neither reflect
reality nor conforms to the man page. Show the correct 0 slack
for real time tasks and enforce it at the core level instead of
having inconsistent individual checks in various timer setup
functions.
- The usual set of updates and enhancements all over the place.
- Drivers:
- Allow the ACPI PM timer to be turned off during suspend
- No new drivers
- The usual updates and enhancements in various drivers
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn7jQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobqnD/9COlU0nwsulABI/aNIrsh6iYvnCC9v
14CcNta7Qn+157Wfw9BWOyHdNhR1/fPCXE8jJ71zTyIOeW27HV2JyTtxTwe9ZcdK
ViHAaj7YcIjcVUEC3StCoRCPnvLslEw4qJA5AOQuDyMivdQn+YVa2c0baJxKaXZt
xk4HZdMj4NAS0jRKnoZSwtKW/+Oz6rR4GAWrZo+Zs1/8ur3HfqnQfi8lJ1hJtLLW
V7XDCVRvamVi6Ah3ocYPPp/1P6yeQDA1ge9aMddqaza5STWISXRtSnFMUmYP3rbS
FaL8TyL+ilfny8pkGB2WlG6nLuSbtvogtdEh1gG1k1RmZt44kAtk8ba/KiWFPBSb
zK9cjojRMBS71f9G4kmb5F4rnXoLsg1YbD1Nzhz3wq2Cs1Z90dc2QwMren0zoQ1x
Fn56ueRyAiagBlnrSaKyso/2RvqJTNoSdi3RkpjYeAph0UoDCqvTvKjGAf1mWiw1
T/1lUWSVqWHnzZbM7XXzzajIN9bl6A7bbqlcAJ2O9vZIDt7273DG+bQym9Vh6Why
0LTGGERHxzKBsG7WRg+2Gmvv6S18UPKRo8tLtlA758rHlFuPTZCShWrIriwSNl1K
Hxon+d4BparSnm1h9W/NHPKJA574UbWRCBjdk58IkAj8DxZZY4ORD9SMP+ggkV7G
F6p9cgoDNP9KFg==
=jE0N
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Core:
- Overhaul of posix-timers in preparation of removing the workaround
for periodic timers which have signal delivery ignored.
- Remove the historical extra jiffie in msleep()
msleep() adds an extra jiffie to the timeout value to ensure
minimal sleep time. The timer wheel ensures minimal sleep time
since the large rewrite to a non-cascading wheel, but the extra
jiffie in msleep() remained unnoticed. Remove it.
- Make the timer slack handling correct for realtime tasks.
The procfs interface is inconsistent and does neither reflect
reality nor conforms to the man page. Show the correct 0 slack for
real time tasks and enforce it at the core level instead of having
inconsistent individual checks in various timer setup functions.
- The usual set of updates and enhancements all over the place.
Drivers:
- Allow the ACPI PM timer to be turned off during suspend
- No new drivers
- The usual updates and enhancements in various drivers"
* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
ntp: Make sure RTC is synchronized when time goes backwards
treewide: Fix wrong singular form of jiffies in comments
cpu: Use already existing usleep_range()
timers: Rename next_expiry_recalc() to be unique
platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
clocksource/drivers/jcore: Use request_percpu_irq()
clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
clocksource: acpi_pm: Add external callback for suspend/resume
clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
dt-bindings: timer: rockchip: Add rk3576 compatible
timers: Annotate possible non critical data race of next_expiry
timers: Remove historical extra jiffie for timeout in msleep()
hrtimer: Use and report correct timerslack values for realtime tasks
hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
timers: Add sparse annotation for timer_sync_wait_running().
signal: Replace BUG_ON()s
...
Add a test to verify that the SIGURG signal created by an out-of-bound
message in UNIX sockets is well controlled by the file_send_sigiotask
hook.
Test coverage for security/landlock is 92.2% of 1046 lines according to
gcc/gcov-14.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/50daeed4d4f60d71e9564d0f24004a373fc5f7d5.1725657728.git.fahimitahera@gmail.com
[mic: Improve commit message and add test coverage, improve test with
four variants to fully cover the hook, use abstract unix socket to avoid
managing a file, use dedicated variable per process, add comments, avoid
negative ASSERT, move close calls]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Provide tests for the signal scoping. If the signal is 0, no signal
will be sent, but the permission of a process to send a signal will be
checked. Likewise, this test consider one signal for each signal
category: SIGTRAP, SIGURG, SIGHUP, and SIGTSTP.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/15dc202bb7f0a462ddeaa0c1cd630d2a7c6fa5c5.1725657728.git.fahimitahera@gmail.com
[mic: Fix commit message, use dedicated variables per process, properly
close FDs, extend send_sig_to_parent to make sure scoping works as
expected]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Currently, a sandbox process is not restricted to sending a signal (e.g.
SIGKILL) to a process outside the sandbox environment. The ability to
send a signal for a sandboxed process should be scoped the same way
abstract UNIX sockets are scoped. Therefore, we extend the "scoped"
field in a ruleset with LANDLOCK_SCOPE_SIGNAL to specify that a ruleset
will deny sending any signal from within a sandbox process to its parent
(i.e. any parent sandbox or non-sandboxed processes).
This patch adds file_set_fowner and file_free_security hooks to set and
release a pointer to the file owner's domain. This pointer, fown_domain
in landlock_file_security will be used in file_send_sigiotask to check
if the process can send a signal.
The ruleset_with_unknown_scope test is updated to support
LANDLOCK_SCOPE_SIGNAL.
This depends on two new changes:
- commit 1934b21261 ("file: reclaim 24 bytes from f_owner"): replace
container_of(fown, struct file, f_owner) with fown->file .
- commit 26f204380a ("fs: Fix file_set_fowner LSM hook
inconsistencies"): lock before calling the hook.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Closes: https://github.com/landlock-lsm/linux/issues/8
Link: https://lore.kernel.org/r/df2b4f880a2ed3042992689a793ea0951f6798a5.1725657727.git.fahimitahera@gmail.com
[mic: Update landlock_get_current_domain()'s return type, improve and
fix locking in hook_file_set_fowner(), simplify and fix sleepable call
and locking issue in hook_file_send_sigiotask() and rebase on the latest
VFS tree, simplify hook_task_kill() and quickly return when not
sandboxed, improve comments, rename LANDLOCK_SCOPED_SIGNAL]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
A socket can be shared between multiple processes, so it can connect and
send data to them. Provide a test scenario where a sandboxed process
inherits a socket's file descriptor. The process cannot connect or send
data to the inherited socket since the process is scoped.
Test coverage for security/landlock is 92.0% of 1013 lines according to
gcc/gcov-14.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/1428574deec13603b6ab2f2ed68ecbfa3b63bcb3.1725494372.git.fahimitahera@gmail.com
[mic: Remove negative ASSERT, fix potential race condition because of
closed connections, remove useless buffer, add test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Check the specific case where a scoped datagram socket is connected and
send(2) works, whereas sendto(2) is denied if the datagram socket is not
connected.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/c28c9cd8feef67dd25e115c401a2389a75f9983b.1725494372.git.fahimitahera@gmail.com
[mic: Use more EXPECT and avoid negative ASSERT, use variables dedicated
per process, remove useless buffer]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Expand abstract UNIX socket restriction tests by examining different
scenarios for UNIX sockets with pathname or unnamed address formats
connection with scoped domain.
The various_address_sockets tests ensure that UNIX sockets bound to a
filesystem pathname and unnamed sockets created by socketpair can still
connect to a socket outside of their scoped domain, meaning that even if
the domain is scoped with LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET, the
socket can connect to a socket outside the scoped domain.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/a9e8016aaa5846252623b158c8f1ce0d666944f4.1725494372.git.fahimitahera@gmail.com
[mic: Remove useless clang-format tags, fix unlink/rmdir calls, drop
capabilities, rename variables, remove useless mknod/unlink calls, clean
up fixture, test write/read on sockets, test sendto() on datagram
sockets, close sockets as soon as possible]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Add three tests that examine different scenarios for abstract UNIX
socket:
1) scoped_domains: Base tests of the abstract socket scoping mechanism
for a landlocked process, same as the ptrace test.
2) scoped_vs_unscoped: Generates three processes with different domains
and tests if a process with a non-scoped domain can connect to other
processes.
3) outside_socket: Since the socket's creator credentials are used
for scoping sockets, this test examines the cases where the socket's
credentials are different from the process using it.
Move protocol_variant, service_fixture, and sys_gettid() from net_test.c
to common.h, and factor out code into a new set_unix_address() helper.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/9321c3d3bcd9212ceb4b50693e29349f8d625e16.1725494372.git.fahimitahera@gmail.com
[mic: Fix commit message, remove useless clang-format tags, move
drop_caps() calls, move and rename variables, rename variants, use more
EXPECT, improve comments, simplify the outside_socket test]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Introduce a new "scoped" member to landlock_ruleset_attr that can
specify LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET to restrict connection to
abstract UNIX sockets from a process outside of the socket's domain.
Two hooks are implemented to enforce these restrictions:
unix_stream_connect and unix_may_send.
Closes: https://github.com/landlock-lsm/linux/issues/7
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/5f7ad85243b78427242275b93481cfc7c127764b.1725494372.git.fahimitahera@gmail.com
[mic: Fix commit message formatting, improve documentation, simplify
hook_unix_may_send(), and cosmetic fixes including rename of
LANDLOCK_SCOPED_ABSTRACT_UNIX_SOCKET]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a
zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k
upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig
Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL
qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L
kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k
Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6
NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB
+6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36
wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p
hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb
TAUJ1sDQAzZ8Fy0=
=HIAO
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Move the LSM framework to static calls
This transitions the vast majority of the LSM callbacks into static
calls. Those callbacks which haven't been converted were left as-is
due to the general ugliness of the changes required to support the
static call conversion; we can revisit those callbacks at a future
date.
- Add the Integrity Policy Enforcement (IPE) LSM
This adds a new LSM, Integrity Policy Enforcement (IPE). There is
plenty of documentation about IPE in this patches, so I'll refrain
from going into too much detail here, but the basic motivation behind
IPE is to provide a mechanism such that administrators can restrict
execution to only those binaries which come from integrity protected
storage, e.g. a dm-verity protected filesystem. You will notice that
IPE requires additional LSM hooks in the initramfs, dm-verity, and
fs-verity code, with the associated patches carrying ACK/review tags
from the associated maintainers. We couldn't find an obvious
maintainer for the initramfs code, but the IPE patchset has been
widely posted over several years.
Both Deven Bowers and Fan Wu have contributed to IPE's development
over the past several years, with Fan Wu agreeing to serve as the IPE
maintainer moving forward. Once IPE is accepted into your tree, I'll
start working with Fan to ensure he has the necessary accounts, keys,
etc. so that he can start submitting IPE pull requests to you
directly during the next merge window.
- Move the lifecycle management of the LSM blobs to the LSM framework
Management of the LSM blobs (the LSM state buffers attached to
various kernel structs, typically via a void pointer named "security"
or similar) has been mixed, some blobs were allocated/managed by
individual LSMs, others were managed by the LSM framework itself.
Starting with this pull we move management of all the LSM blobs,
minus the XFRM blob, into the framework itself, improving consistency
across LSMs, and reducing the amount of duplicated code across LSMs.
Due to some additional work required to migrate the XFRM blob, it has
been left as a todo item for a later date; from a practical
standpoint this omission should have little impact as only SELinux
provides a XFRM LSM implementation.
- Fix problems with the LSM's handling of F_SETOWN
The LSM hook for the fcntl(F_SETOWN) operation had a couple of
problems: it was racy with itself, and it was disconnected from the
associated DAC related logic in such a way that the LSM state could
be updated in cases where the DAC state would not. We fix both of
these problems by moving the security_file_set_fowner() hook into the
same section of code where the DAC attributes are updated. Not only
does this resolve the DAC/LSM synchronization issue, but as that code
block is protected by a lock, it also resolve the race condition.
- Fix potential problems with the security_inode_free() LSM hook
Due to use of RCU to protect inodes and the placement of the LSM hook
associated with freeing the inode, there is a bit of a challenge when
it comes to managing any LSM state associated with an inode. The VFS
folks are not open to relocating the LSM hook so we have to get
creative when it comes to releasing an inode's LSM state.
Traditionally we have used a single LSM callback within the hook that
is triggered when the inode is "marked for death", but not actually
released due to RCU.
Unfortunately, this causes problems for LSMs which want to take an
action when the inode's associated LSM state is actually released; so
we add an additional LSM callback, inode_free_security_rcu(), that is
called when the inode's LSM state is released in the RCU free
callback.
- Refactor two LSM hooks to better fit the LSM return value patterns
The vast majority of the LSM hooks follow the "return 0 on success,
negative values on failure" pattern, however, there are a small
handful that have unique return value behaviors which has caused
confusion in the past and makes it difficult for the BPF verifier to
properly vet BPF LSM programs. This includes patches to
convert two of these"special" LSM hooks to the common 0/-ERRNO pattern.
- Various cleanups and improvements
A handful of patches to remove redundant code, better leverage the
IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
minor style fixups.
* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
security: Update file_set_fowner documentation
fs: Fix file_set_fowner LSM hook inconsistencies
lsm: Use IS_ERR_OR_NULL() helper function
lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
ipe: Remove duplicated include in ipe.c
lsm: replace indirect LSM hook calls with static calls
lsm: count the LSMs enabled at compile time
kernel: Add helper macros for loop unrolling
init/main.c: Initialize early LSMs after arch code, static keys and calls.
MAINTAINERS: add IPE entry with Fan Wu as maintainer
documentation: add IPE documentation
ipe: kunit test for parser
scripts: add boot policy generation program
ipe: enable support for fs-verity as a trust provider
fsverity: expose verified fsverity built-in signatures to LSMs
lsm: add security_inode_setintegrity() hook
ipe: add support for dm-verity as a trust provider
dm-verity: expose root hash digest and signature data to LSMs
block,lsm: add LSM blob and new LSM hooks for block devices
ipe: add permissive toggle
...
Some archs -- arm64 and s390x -- implemented chacha using instructions
that are available most places, but aren't always available. The kernel
handles this just fine, but the selftest does not. Check the hwcaps
before running, and skip the test if the cpu doesn't support it. As
well, on s390x, always emit the fallback instructions of an alternative
block, to ensure maximum compatibility.
Co-developed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Add test case running code interacting with registers within a
ucontrol VM.
* Add uc_gprs test case
The test uses the same VM setup using the fixture and debug macros
introduced in earlier patches in this series.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20240807154512.316936-7-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Removed leftover comment line]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20240807154512.316936-7-schlameuss@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEGwAKCRCRxhvAZXjc
ojIuAQC433+hBkvjvmQ7H0r5rgZSjUuCTG3bSmdU7RJmPHUHhwEA85v/NGq53f+W
IhandK6t+Cf0JYpFZ3N0bT88hDYVhQQ=
=9zGL
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual pile of misc updates:
Features:
- Add F_CREATED_QUERY fcntl() that allows userspace to query whether
a file was actually created. Often userspace wants to know whether
an O_CREATE request did actually create a file without using
O_EXCL. The current logic is that to first attempts to open the
file without O_CREAT | O_EXCL and if ENOENT is returned userspace
tries again with both flags. If that succeeds all is well. If it
now reports EEXIST it retries.
That works fairly well but some corner cases make this more
involved. If this operates on a dangling symlink the first openat()
without O_CREAT | O_EXCL will return ENOENT but the second openat()
with O_CREAT | O_EXCL will fail with EEXIST.
The reason is that openat() without O_CREAT | O_EXCL follows the
symlink while O_CREAT | O_EXCL doesn't for security reasons. So
it's not something we can really change unless we add an explicit
opt-in via O_FOLLOW which seems really ugly.
All available workarounds are really nasty (fanotify, bpf lsm etc)
so add a simple fcntl().
- Try an opportunistic lookup for O_CREAT. Today, when opening a file
we'll typically do a fast lookup, but if O_CREAT is set, the kernel
always takes the exclusive inode lock. This was likely done with
the expectation that O_CREAT means that we always expect to do the
create, but that's often not the case. Many programs set O_CREAT
even in scenarios where the file already exists (see related
F_CREATED_QUERY patch motivation above).
The series contained in the pr rearranges the pathwalk-for-open
code to also attempt a fast_lookup in certain O_CREAT cases. If a
positive dentry is found, the inode_lock can be avoided altogether
and it can stay in rcuwalk mode for the last step_into.
- Expose the 64 bit mount id via name_to_handle_at()
Now that we provide a unique 64-bit mount ID interface in statx(2),
we can now provide a race-free way for name_to_handle_at(2) to
provide a file handle and corresponding mount without needing to
worry about racing with /proc/mountinfo parsing or having to open a
file just to do statx(2).
While this is not necessary if you are using AT_EMPTY_PATH and
don't care about an extra statx(2) call, users that pass full paths
into name_to_handle_at(2) need to know which mount the file handle
comes from (to make sure they don't try to open_by_handle_at a file
handle from a different filesystem) and switching to AT_EMPTY_PATH
would require allocating a file for every name_to_handle_at(2) call
- Add a per dentry expire timeout to autofs
There are two fairly well known automounter map formats, the autofs
format and the amd format (more or less System V and Berkley).
Some time ago Linux autofs added an amd map format parser that
implemented a fair amount of the amd functionality. This was done
within the autofs infrastructure and some functionality wasn't
implemented because it either didn't make sense or required extra
kernel changes. The idea was to restrict changes to be within the
existing autofs functionality as much as possible and leave changes
with a wider scope to be considered later.
One of these changes is implementing the amd options:
1) "unmount", expire this mount according to a timeout (same as
the current autofs default).
2) "nounmount", don't expire this mount (same as setting the
autofs timeout to 0 except only for this specific mount) .
3) "utimeout=<seconds>", expire this mount using the specified
timeout (again same as setting the autofs timeout but only for
this mount)
To implement these options per-dentry expire timeouts need to be
implemented for autofs indirect mounts. This is because all map
keys (mounts) for autofs indirect mounts use an expire timeout
stored in the autofs mount super block info. structure and all
indirect mounts use the same expire timeout.
Fixes:
- Fix missing fput for FSCONFIG_SET_FD in autofs
- Use param->file for FSCONFIG_SET_FD in coda
- Delete the 'fs/netfs' proc subtreee when netfs module exits
- Make sure that struct uid_gid_map fits into a single cacheline
- Don't flush in-flight wb switches for superblocks without cgroup
writeback
- Correcting the idmapping mount example in the idmapping
documentation
- Fix a race between evice_inodes() and find_inode() and iput()
- Refine the show_inode_state() macro definition in writeback code
- Prevent dump_mapping() from accessing invalid dentry.d_name.name
- Show actual source for debugfs in /proc/mounts
- Annotate data-race of busy_poll_usecs in eventpoll
- Don't WARN for racy path_noexec check in exec code
- Handle OOM on mnt_warn_timestamp_expiry()
- Fix some spelling in the iomap design documentation
- Fix typo in procfs comment
- Fix typo in fs/namespace.c comment
Cleanups:
- Add the VFS git tree to the MAINTAINERS file
- Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode
bit in struct file bringing us to 5 free f_mode bits
- Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify
the wait mechanism
- Remove the unused path_put_init() helper
- Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi
specific
- Replace the unsigned long i_state member with a u32 i_state member
in struct inode freeing up 4 bytes in struct inode. Instead of
using the bit based wait apis we're now using the var event apis
and using the individual bytes of the i_state member to wait on
state changes
- Explain how per-syscall AT_* flags should be allocated
- Use in_group_or_capable() helper to simplify the posix acl mode
update code
- Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code
- Removed comment about d_rcu_to_refcount() as that function doesn't
exist anymore
- Add kernel documentation for lookup_fast()
- Don't re-zero evenpoll fields
- Remove outdated comment after close_fd()
- Fix imprecise wording in comment about the pipe filesystem
- Drop GFP_NOFAIL mode from alloc_page_buffers
- Missing blank line warnings and struct declaration improved in
file_table
- Annotate struct poll_list with __counted_by()
- Remove the unused read parameter in percpu-rwsem
- Remove linux/prefetch.h include from direct-io code
- Use kmemdup_array instead of kmemdup for multiple allocation in
mnt_idmapping code
- Remove unused mnt_cursor_del() declaration
Performance tweaks:
- Dodge smp_mb in break_lease and break_deleg in the common case
- Only read fops once in fops_{get,put}()
- Use RCU in ilookup()
- Elide smp_mb in iversion handling in the common case
- Drop one lock trip in evict()"
* tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits)
uidgid: make sure we fit into one cacheline
proc: Fix typo in the comment
fs/pipe: Correct imprecise wording in comment
fhandle: expose u64 mount id to name_to_handle_at(2)
uapi: explain how per-syscall AT_* flags should be allocated
fs: drop GFP_NOFAIL mode from alloc_page_buffers
writeback: Refine the show_inode_state() macro definition
fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name
mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation
netfs: Delete subtree of 'fs/netfs' when netfs module exits
fs: use LIST_HEAD() to simplify code
inode: make i_state a u32
inode: port __I_LRU_ISOLATING to var event
vfs: fix race between evice_inodes() and find_inode()&iput()
inode: port __I_NEW to var event
inode: port __I_SYNC to var event
fs: reorder i_state bits
fs: add i_state helpers
MAINTAINERS: add the VFS git tree
fs: s/__u32/u32/ for s_fsnotify_mask
...
* New Stage-2 page table dumper, reusing the main ptdump infrastructure
* FP8 support
* Nested virtualization now supports the address translation (FEAT_ATS1A)
family of instructions
* Add selftest checks for a bunch of timer emulation corner cases
* Fix multiple cases where KVM/arm64 doesn't correctly handle the guest
trying to use a GICv3 that wasn't advertised
* Remove REG_HIDDEN_USER from the sysreg infrastructure, making
things little simpler
* Prevent MTE tags being restored by userspace if we are actively
logging writes, as that's a recipe for disaster
* Correct the refcount on a page that is not considered for MTE tag
copying (such as a device)
* When walking a page table to split block mappings, synchronize only
at the end the walk rather than on every store
* Fix boundary check when transfering memory using FFA
* Fix pKVM TLB invalidation, only affecting currently out of tree
code but worth addressing for peace of mind
LoongArch:
* Revert qspinlock to test-and-set simple lock on VM.
* Add Loongson Binary Translation extension support.
* Add PMU support for guest.
* Enable paravirt feature control from VMM.
* Implement function kvm_para_has_feature().
RISC-V:
* Fix sbiret init before forwarding to userspace
* Don't zero-out PMU snapshot area before freeing data
* Allow legacy PMU access from guest
* Fix to allow hpmcounter31 from the guest
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmbmghAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPFQgf+Ijeqlx90BGy96pyzo/NkYKPeEc8G
gKhlm8PdtdZYaRdJ53MVRLLpzbLuzqbwrn0ZX2tvoDRLzuAqTt2GTFoT6e2HtY5B
Sf7KQMFwHWGtGklC1EmZ1fXsCocswpuAcexCLKLRBoWUcKABlgwV3N3vJo5gx/Ag
8XXhYpcLTh+p7bjMdJShQy019pTwEDE68pPVnL2NPzla1G6Qox7ZJIdOEMZXuyJA
MJ4jbFWE/T8vLFUf/8MGQ/+bo+4140kzB8N9wkazNcBRoodY6Hx+Lm1LiZjNudO1
ilIdB4P3Ht+D8UuBv2DO5XTakfJz9T9YsoRcPlwrOWi/8xBRbt236gFB3Q==
=sHTI
-----END PGP SIGNATURE-----
Merge tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"These are the non-x86 changes (mostly ARM, as is usually the case).
The generic and x86 changes will come later"
ARM:
- New Stage-2 page table dumper, reusing the main ptdump
infrastructure
- FP8 support
- Nested virtualization now supports the address translation
(FEAT_ATS1A) family of instructions
- Add selftest checks for a bunch of timer emulation corner cases
- Fix multiple cases where KVM/arm64 doesn't correctly handle the
guest trying to use a GICv3 that wasn't advertised
- Remove REG_HIDDEN_USER from the sysreg infrastructure, making
things little simpler
- Prevent MTE tags being restored by userspace if we are actively
logging writes, as that's a recipe for disaster
- Correct the refcount on a page that is not considered for MTE tag
copying (such as a device)
- When walking a page table to split block mappings, synchronize only
at the end the walk rather than on every store
- Fix boundary check when transfering memory using FFA
- Fix pKVM TLB invalidation, only affecting currently out of tree
code but worth addressing for peace of mind
LoongArch:
- Revert qspinlock to test-and-set simple lock on VM.
- Add Loongson Binary Translation extension support.
- Add PMU support for guest.
- Enable paravirt feature control from VMM.
- Implement function kvm_para_has_feature().
RISC-V:
- Fix sbiret init before forwarding to userspace
- Don't zero-out PMU snapshot area before freeing data
- Allow legacy PMU access from guest
- Fix to allow hpmcounter31 from the guest"
* tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (64 commits)
LoongArch: KVM: Implement function kvm_para_has_feature()
LoongArch: KVM: Enable paravirt feature control from VMM
LoongArch: KVM: Add PMU support for guest
KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier
KVM: arm64: Simplify visibility handling of AArch32 SPSR_*
KVM: arm64: Simplify handling of CNTKCTL_EL12
LoongArch: KVM: Add vm migration support for LBT registers
LoongArch: KVM: Add Binary Translation extension support
LoongArch: KVM: Add VM feature detection function
LoongArch: Revert qspinlock to test-and-set simple lock on VM
KVM: arm64: Register ptdump with debugfs on guest creation
arm64: ptdump: Don't override the level when operating on the stage-2 tables
arm64: ptdump: Use the ptdump description from a local context
arm64: ptdump: Expose the attribute parsing functionality
KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer
KVM: arm64: Move pagetable definitions to common header
KVM: arm64: nv: Add support for FEAT_ATS1A
KVM: arm64: nv: Plumb handling of AT S1* traps from EL2
KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3
KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration
...
ACPI:
* Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms.
* Ensure arm64-specific IORT header is covered by MAINTAINERS.
CPU Errata:
* Enable workaround for hardware access/dirty issue on Ampere-1A cores.
Memory management:
* Define PHYSMEM_END to fix a crash in the amdgpu driver.
* Avoid tripping over invalid kernel mappings on the kexec() path.
* Userspace support for the Permission Overlay Extension (POE) using
protection keys.
Perf and PMUs:
* Add support for the "fixed instruction counter" extension in the CPU
PMU architecture.
* Extend and fix the event encodings for Apple's M1 CPU PMU.
* Allow LSM hooks to decide on SPE permissions for physical profiling.
* Add support for the CMN S3 and NI-700 PMUs.
Confidential Computing:
* Add support for booting an arm64 kernel as a protected guest under
Android's "Protected KVM" (pKVM) hypervisor.
Selftests:
* Fix vector length issues in the SVE/SME sigreturn tests
* Fix build warning in the ptrace tests.
Timers:
* Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
non-determinism arising from the architected counter.
Miscellaneous:
* Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
don't succeed.
* Minor fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmbkVNEQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNKeIB/9YtbN7JMgsXktM94GP03r3tlFF36Y1S51S
+zdDZclAVZCTCZN+PaFeAZ/+ah2EQYrY6rtDoHUSEMQdF9kH+ycuIPDTwaJ4Qkam
QKXMpAgtY/4yf2rX4lhDF8rEvkhLDsu7oGDhqUZQsA33GrMBHfgA3oqpYwlVjvGq
gkm7olTo9LdWAxkPpnjGrjB6Mv5Dq8dJRhW+0Q5AntI5zx3RdYGJZA9GUSzyYCCt
FIYOtMmWPkQ0kKxIVxOxAOm/ubhfyCs2sjSfkaa3vtvtt+Yjye1Xd81rFciIbPgP
QlK/Mes2kBZmjhkeus8guLI5Vi7tx3DQMkNqLXkHAAzOoC4oConE
=6osL
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights are support for Arm's "Permission Overlay Extension"
using memory protection keys, support for running as a protected guest
on Android as well as perf support for a bunch of new interconnect
PMUs.
Summary:
ACPI:
- Enable PMCG erratum workaround for HiSilicon HIP10 and 11
platforms.
- Ensure arm64-specific IORT header is covered by MAINTAINERS.
CPU Errata:
- Enable workaround for hardware access/dirty issue on Ampere-1A
cores.
Memory management:
- Define PHYSMEM_END to fix a crash in the amdgpu driver.
- Avoid tripping over invalid kernel mappings on the kexec() path.
- Userspace support for the Permission Overlay Extension (POE) using
protection keys.
Perf and PMUs:
- Add support for the "fixed instruction counter" extension in the
CPU PMU architecture.
- Extend and fix the event encodings for Apple's M1 CPU PMU.
- Allow LSM hooks to decide on SPE permissions for physical
profiling.
- Add support for the CMN S3 and NI-700 PMUs.
Confidential Computing:
- Add support for booting an arm64 kernel as a protected guest under
Android's "Protected KVM" (pKVM) hypervisor.
Selftests:
- Fix vector length issues in the SVE/SME sigreturn tests
- Fix build warning in the ptrace tests.
Timers:
- Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
non-determinism arising from the architected counter.
Miscellaneous:
- Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
don't succeed.
- Minor fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
perf: arm-ni: Fix an NULL vs IS_ERR() bug
arm64: hibernate: Fix warning for cast from restricted gfp_t
arm64: esr: Define ESR_ELx_EC_* constants as UL
arm64: pkeys: remove redundant WARN
perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
MAINTAINERS: List Arm interconnect PMUs as supported
perf: Add driver for Arm NI-700 interconnect PMU
dt-bindings/perf: Add Arm NI-700 PMU
perf/arm-cmn: Improve format attr printing
perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
arm64/mm: use lm_alias() with addresses passed to memblock_free()
mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
arm64: Expose the end of the linear map in PHYSMEM_END
arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
arm64/mm: Delete __init region from memblock.reserved
perf/arm-cmn: Support CMN S3
dt-bindings: perf: arm-cmn: Add CMN S3
perf/arm-cmn: Refactor DTC PMU register access
perf/arm-cmn: Make cycle counts less surprising
perf/arm-cmn: Improve build-time assertion
...
configurations and scenarios where the mitigations code is irrelevant
- Other small fixlets and improvements
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmbfDhUACgkQEsHwGGHe
VUrF9A//UkVKmIihXXak0GPqFhu8XrWeYlmwLxWe/uIy2hZCLp9L7n4pg0Ikxqz3
9D9hYk+Jykfu/jsv0sR6LH6OAUTlJi+P0w3x3VeL1sgFPUkwFtOaN2v/t5H3SW5r
l+VQpdUXPmLH6QbhvT84U6L/OQYr2cjhiYro47uwM9vO/SNao4HcbC/pdBr2dwxM
KzzA9sEDg3Le391phIhEOIogA1lPNV7KMScg2VjPTqQzEJ3NQVzyYmqjPO70sN9F
sAuksdF+rnPjc9K/W+qUcvlp8e9lDB8g0oPlyoOeubjXsnZU5YchriPdBbyAl0dJ
bjpftXIrBj8Vtmh7Tc0Jx2tlMFXNT5FrzcqdD4sviLnhrKEJSkwAoFgIMp5A+tN8
Kl8MrlABO8I8+zGRQB7TzhwaCC4AxCqUS3UEcYd4CBf5AWqT5i12ijbtIxPtdpG4
5itngIV4HT8casudpC8i8OTjOTggorMa7Pu/bQULhnZwagH8chlBdoOlKKQVkeVG
FUi+L/BljL9mASic7NRZI11tk44m9xWWkbbJOPlZaGJw9YzGrxD0YOfhbgcc9iaX
SOUMVJEhJVJMBISGiBUQDB6r51ee6B8RKJ3ByxzpAbwsUR9cXyfSYfUyE5reQJy9
3luj/iorL3guYU6EGEAtvbuTLGbKqybrV6zOB/QRXHWyhtUgrUA=
=GFld
-----END PGP SIGNATURE-----
Merge tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hw mitigation updates from Borislav Petkov:
- Add CONFIG_ option for every hw CPU mitigation. The intent is to
support configurations and scenarios where the mitigations code is
irrelevant
- Other small fixlets and improvements
* tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Fix handling when SRSO mitigation is disabled
x86/bugs: Add missing NO_SSB flag
Documentation/srso: Document a method for checking safe RET operates properly
x86/bugs: Add a separate config for GDS
x86/bugs: Remove GDS Force Kconfig option
x86/bugs: Add a separate config for SSB
x86/bugs: Add a separate config for Spectre V2
x86/bugs: Add a separate config for SRBDS
x86/bugs: Add a separate config for Spectre v1
x86/bugs: Add a separate config for RETBLEED
x86/bugs: Add a separate config for L1TF
x86/bugs: Add a separate config for MMIO Stable Data
x86/bugs: Add a separate config for TAA
x86/bugs: Add a separate config for MDS
LoongArch KVM changes for v6.12
1. Revert qspinlock to test-and-set simple lock on VM.
2. Add Loongson Binary Translation extension support.
3. Add PMU support for guest.
4. Enable paravirt feature control from VMM.
5. Implement function kvm_para_has_feature().
Test that locally generated traffic from a socket that specifies a DS
Field using the IP_TOS / IPV6_TCLASS socket options is correctly
redirected using a FIB rule that matches on DSCP. Add negative tests to
verify that the rule is not it when it should not. Test with both IPv4
and IPv6 and with both TCP and UDP sockets.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240911093748.3662015-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests for the new FIB rule DSCP selector. Test with both IPv4 and
IPv6 and with both input and output routes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240911093748.3662015-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a test which attempts to call bpf_check_mtu() and writes the MTU
into .rodata section of the BPF program, and for comparison this adds
test cases also for .bss and .data section again. The bpf_check_mtu()
is a bit more special in that the passed mtu argument is read and
written by the helper (instead of just written to). Assert that writes
into .rodata remain rejected by the verifier.
# ./vmtest.sh -- ./test_progs -t verifier_const
[...]
./test_progs -t verifier_const
[ 1.657367] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.657773] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#473/1 verifier_const/rodata/strtol: write rejected:OK
#473/2 verifier_const/bss/strtol: write accepted:OK
#473/3 verifier_const/data/strtol: write accepted:OK
#473/4 verifier_const/rodata/mtu: write rejected:OK
#473/5 verifier_const/bss/mtu: write accepted:OK
#473/6 verifier_const/data/mtu: write accepted:OK
#473 verifier_const:OK
[...]
Summary: 2/10 PASSED, 0 SKIPPED, 0 FAILED
For comparison, without the MEM_UNINIT on bpf_check_mtu's proto:
# ./vmtest.sh -- ./test_progs -t verifier_const
[...]
#473/3 verifier_const/data/strtol: write accepted:OK
run_subtest:PASS:obj_open_mem 0 nsec
run_subtest:FAIL:unexpected_load_success unexpected success: 0
#473/4 verifier_const/rodata/mtu: write rejected:FAIL
#473/5 verifier_const/bss/mtu: write accepted:OK
#473/6 verifier_const/data/mtu: write accepted:OK
#473 verifier_const:FAIL
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240913191754.13290-9-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The assumption of 'in privileged mode reads from uninitialized stack locations
are permitted' is not quite correct since the verifier was probing for read
access rather than write access. Both tests need to be annotated as __success
for privileged and unprivileged.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240913191754.13290-6-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Subtests are added to exercise the patched code which handles
- LLONG_MIN/-1
- INT_MIN/-1
- LLONG_MIN%-1
- INT_MIN%-1
where -1 could be an immediate or in a register.
Without the previous patch, all these cases will crash the kernel on
x86_64 platform.
Additional tests are added to use small values (e.g. -5/-1, 5%-1, etc.)
in order to exercise the additional logic with patched insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240913150332.1188102-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Provide the s390 specific vdso getrandom() architecture backend.
_vdso_rng_data required data is placed within the _vdso_data vvar page,
by using a hardcoded offset larger than vdso_data.
As required the chacha20 implementation does not write to the stack.
The implementation follows more or less the arm64 implementations and
makes use of vector instructions. It has a fallback to the getrandom()
system call for machines where the vector facility is not installed.
The check if the vector facility is installed, as well as an
optimization for machines with the vector-enhancements facility 2, is
implemented with alternatives, avoiding runtime checks.
Note that __kernel_getrandom() is implemented without the vdso user
wrapper which would setup a stack frame for odd cases (aka very old
glibc variants) where the caller has not done that. All callers of
__kernel_getrandom() are required to setup a stack frame, like the C ABI
requires it.
The vdso testcases vdso_test_getrandom and vdso_test_chacha pass.
Benchmark on a z16:
$ ./vdso_test_getrandom bench-single
vdso: 25000000 times in 0.493703559 seconds
syscall: 25000000 times in 6.584025337 seconds
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Running vdso_test_correctness on s390x (aka s390 64 bit) emits a warning:
Warning: failed to find clock_gettime64 in vDSO
This is caused by the "#elif defined (__s390__)" check in vdso_config.h
which the defines VDSO_32BIT.
If __s390x__ is defined also __s390__ is defined. Therefore the correct
check must make sure that only __s390__ is defined.
Therefore add the missing !defined(__s390x__). Also use common
__s390x__ define instead of __s390X__.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The vDSO self tests fail on s390x for a vDSO linked with the GNU linker
ld as follows:
# ./vdso_test_gettimeofday
Floating point exception (core dumped)
On s390x the ELF hash table entries are 64 bits instead of 32 bits in
size (see Glibc sysdeps/unix/sysv/linux/s390/bits/elfclass.h).
Fixes: 40723419f4 ("kselftest: Enable vDSO test on non x86 platforms")
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
To be consistent with other VDSO functions, the function is called
__kernel_getrandom()
__arch_chacha20_blocks_nostack() fonction is implemented basically
with 32 bits operations. It performs 4 QUARTERROUND operations in
parallele. There are enough registers to avoid using the stack:
On input:
r3: output bytes
r4: 32-byte key input
r5: 8-byte counter input/output
r6: number of 64-byte blocks to write to output
During operation:
stack: pointer to counter (r5) and non-volatile registers (r14-131)
r0: counter of blocks (initialised with r6)
r4: Value '4' after key has been read, used for indexing
r5-r12: key
r14-r15: block counter
r16-r31: chacha state
At the end:
r0, r6-r12: Zeroised
r5, r14-r31: Restored
Performance on powerpc 885 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 62.938002291 seconds
libc: 25000000 times in 535.581916866 seconds
syscall: 25000000 times in 531.525042806 seconds
Performance on powerpc 8321 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 16.899318858 seconds
libc: 25000000 times in 131.050596522 seconds
syscall: 25000000 times in 129.794790389 seconds
This first patch adds support for VDSO32. As selftests cannot easily
be generated only for VDSO32, and because the following patch brings
support for VDSO64 anyway, this patch opts out all code in
__arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
fail to compile and will not crash on PPC64/PPC64LE, allthough the
selftest itself will fail.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
It's not correct to use $(top_srcdir) for generated header files, for
builds that are done out of tree via O=, and $(objtree) isn't valid in
the selftests context. Instead, just obviate the need for these
generated header files by defining empty stubs in tools/include, which
is the same thing that's done for rwlock.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Hook up the generic vDSO implementation to the aarch64 vDSO data page.
The _vdso_rng_data required data is placed within the _vdso_data vvar
page, by using a offset larger than the vdso_data.
The vDSO function requires a ChaCha20 implementation that does not write
to the stack, and that can do an entire ChaCha20 permutation. The one
provided uses NEON on the permute operation, with a fallback to the
syscall for chips that do not support AdvSIMD.
This also passes the vdso_test_chacha test along with
vdso_test_getrandom. The vdso_test_getrandom bench-single result on
Neoverse-N1 shows:
vdso: 25000000 times in 0.783884250 seconds
libc: 25000000 times in 8.780275399 seconds
syscall: 25000000 times in 8.786581518 seconds
A small fixup to arch/arm64/include/asm/mman.h was required to avoid
pulling kernel code into the vDSO, similar to what's already done in
arch/arm64/include/asm/rwonce.h.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The chacha vDSO selftest doesn't check the way the counter is handled by
__arch_chacha20_blocks_nostack(). It indirectly checks that the counter
is writen on exit and read back on new entry, but it doesn't check that
the format is correct. When implementing this function on powerpc, I
missed a case where the counter was writen and read in wrong byte order.
Also, the counter uses two words, but the tests with a zero counter and
uses a small amount of blocks, so at the end the upper part of the
counter is always 0, so it is not checked.
Add a verification of counter's content in addition to the verification
of the output.
Also add two tests where the counter crosses the u32 upper limit. The
first test verifies that the function properly writes back the upper
word, the second test verifies that the function properly reads back the
upper word.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Without -O2, the generated code for testing chacha function is awful.
GCC even implements rol32() as a function of 20 instructions instead of
just using the rotlwi instruction.
~# time ./vdso_test_chacha
TAP version 13
1..1
ok 1 chacha: PASS
real 0m 37.16s
user 0m 36.89s
sys 0m 0.26s
Several other selftests directory add -O2, and the kernel is also
always built with optimisation active. Do the same for vDSO selftests.
With this patch the time is reduced by approximately 15%.
~# time ./vdso_test_chacha
TAP version 13
1..1
ok 1 chacha: PASS
real 0m 32.09s
user 0m 31.86s
sys 0m 0.22s
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Acked-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Unlike the check for the standalone x86 test, the check for building the
vDSO getrandom and chacaha tests looks at the architecture for the host
rather than the architecture for the target when deciding if they should
be built. Since the chacha test includes some assembler code this means
that cross building with x86 as either the target or host is broken.
There's also some additional complications, where ARCH can legitimately
be either x86_64 or x86, but the source code we need to compile lives in
a directory path containing arch/x86. The standard SRCARCH variable
handles that. And actually, all these variables and proper substitutions
are already described in tools/scripts/Makefile.arch, so just include
that to handle it.
Similarly, ARCH=x86 can actually describe ARCH=x86_64,
just with CONFIG_64BIT, so we can't rely on ARCH for selecting
non-32-bit tests. For that, check against $(ARCH)$(CONFIG_X86_32). This
won't help for people manually running this inside the vDSO selftest
directory (which isn't really supported anyway and has problems on
various archs), but it should work for builds of the kselftests, where
the CONFIG_* variables are defined. On x86_64 machines,
$(ARCH)$(CONFIG_X86_32) will evaluate to x86. On arm64 machines, it will
evaluate to arm64. On 32-bit x86 machines, it will evaluate to x86y,
which won't match the filter list.
Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Linking to libsodium makes building this test annoying in cross
compilation environments and is just way too much. Since this is just a
basic correctness test, simply open code a simple, unoptimized, dumb
chacha, rather than linking to libsodium.
This also fixes a correctness issue on big endian systems. The kernel's
random.c doesn't bother doing a le32_to_cpu operation on the random
bytes that are passed as the key, and consequently neither does
vgetrandom-chacha.S. However, libsodium's chacha _does_ do this, since
it takes the key as an array of bytes. This meant that the test was
broken on big endian systems, which this commit rectifies.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Having the prototype for __arch_chacha20_blocks_nostack in
arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large
doc comment were cloned by every architecture, which has been causing
unnecessary churn. Instead move it into include/vdso/getrandom.h, where
it can be shared by all archs implementing it.
As a side bonus, this then lets us use that prototype in the
vdso_test_chacha self test, to ensure that it matches the source, and
indeed doing so turned up some inconsistencies, which are rectified
here.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
After verifying that vDSO getrandom does work, which ensures that the
RNG is initialized, test to see if it also works inside of a time
namespace. This is important to test, because the vvar pages get
swizzled around there. If the arch code isn't careful, the RNG will
appear uninitialized inside of a time namespace.
Because broken code makes the RNG appear uninitialized, test that
everything works by issuing a call to vgetrandom from a fork in a time
namespace, and use ptrace to ensure that the actual syscall getrandom
doesn't get called. If it doesn't get called, then the test succeeds.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZuH9UQAKCRDbK58LschI
g0/zAP99WOcCBp1M/jSTUOba230+eiol7l5RirDEA6wu7TqY2QEAuvMG0KfCCpTI
I0WqStrK1QMbhwKPodJC1k+17jArKgw=
=jfMU
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-09-11
We've added 12 non-merge commits during the last 16 day(s) which contain
a total of 20 files changed, 228 insertions(+), 30 deletions(-).
There's a minor merge conflict in drivers/net/netkit.c:
00d066a4d4 ("netdev_features: convert NETIF_F_LLTX to dev->lltx")
d966087948 ("netkit: Disable netpoll support")
The main changes are:
1) Enable bpf_dynptr_from_skb for tp_btf such that this can be used
to easily parse skbs in BPF programs attached to tracepoints,
from Philo Lu.
2) Add a cond_resched() point in BPF's sock_hash_free() as there have
been several syzbot soft lockup reports recently, from Eric Dumazet.
3) Fix xsk_buff_can_alloc() to account for queue_empty_descs which
got noticed when zero copy ice driver started to use it,
from Maciej Fijalkowski.
4) Move the xdp:xdp_cpumap_kthread tracepoint before cpumap pushes skbs
up via netif_receive_skb_list() to better measure latencies,
from Daniel Xu.
5) Follow-up to disable netpoll support from netkit, from Daniel Borkmann.
6) Improve xsk selftests to not assume a fixed MAX_SKB_FRAGS of 17 but
instead gather the actual value via /proc/sys/net/core/max_skb_frags,
also from Maciej Fijalkowski.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
sock_map: Add a cond_resched() in sock_hash_free()
selftests/bpf: Expand skb dynptr selftests for tp_btf
bpf: Allow bpf_dynptr_from_skb() for tp_btf
tcp: Use skb__nullable in trace_tcp_send_reset
selftests/bpf: Add test for __nullable suffix in tp_btf
bpf: Support __nullable argument suffix for tp_btf
bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcv
selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knob
xsk: Bump xsk_queue::queue_empty_descs in xp_can_alloc()
tcp_bpf: Remove an unused parameter for bpf_tcp_ingress()
bpf, sockmap: Correct spelling skmsg.c
netkit: Disable netpoll support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://patch.msgid.link/20240911211525.13834-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Same import process as previous tests.
Also add CONFIG_NET_SCH_FQ to config, as one test uses that.
Same test process as previous tests. Both with and without debug mode.
Recording the steps once:
make mrproper
vng --build \
--config tools/testing/selftests/net/packetdrill/config \
--config kernel/configs/debug.config
vng -v --run . --user root --cpus 4 -- \
make -C tools/testing/selftests TARGETS=net/packetdrill run_tests
Link: https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#how-to-build
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240912005317.1253001-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Same as initial tests, import verbatim from
github.com/google/packetdrill, aside from:
- update `source ./defaults.sh` path to adjust for flat dir
- add SPDX headers
- remove author statements if any
- drop blank lines at EOF (new)
Also import set_sysctls.py, which many scripts depend on to set
sysctls and then restore them later. This is no longer strictly needed
for namespacified sysctl. But not all sysctls are namespacified, and
doesn't hurt if they are.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240912005317.1253001-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts (sort of) and no adjacent changes.
This merge reverts commit b3c9e65eb2 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*think* such fix will not land in the next week.
Including fixes from netfilter.
Current release - regressions:
- core: tighten bad gso csum offset check in virtio_net_hdr
- netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
- eth: ice: stop calling pci_disable_device() as we use pcim
- eth: fou: fix null-ptr-deref in GRO.
Current release - new code bugs:
- hsr: prevent NULL pointer dereference in hsr_proxy_announce()
Previous releases - regressions:
- hsr: remove seqnr_lock
- netfilter: nft_socket: fix sk refcount leaks
- mptcp: pm: fix uaf in __timer_delete_sync
- phy: dp83822: fix NULL pointer dereference on DP83825 devices
- eth: revert "virtio_net: rx enable premapped mode by default"
- eth: octeontx2-af: Modify SMQ flush sequence to drop packets
Previous releases - always broken:
- eth: mlx5: fix bridge mode operations when there are no VFs
- eth: igb: Always call igb_xdp_ring_update_tail() under Tx lock
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmbi8isSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkzNsQALiaGTGqOcFwVwchlWOfAuheiDeKtxM6
LihsxBFFi+s7p5p75yL+ko3mpxz8ZxO3joPNevh+wBy7cOXgEuikPbeNokUsHjeG
ofmz2B9+CHpf8PL0PgE+oyAi8kTZyn81oDrVLereBJqT50hKXjWbbip/s8niTJaY
tXsYiJPZgvTFdkkJjV6INrHRWAse/tXP1o+KqIkbuEw8aerlxjOaZ1dubtuzcj3C
rAwXNU9r6ojqmQLovfUx3Gw/RsMwAGxra0Ni9AdUxiKtqKD7r0dnfNvjUI76HWcf
S62zMwjKS+DZ/cqiTnsqTIK7HN4gQ/R+W+C7RWlVFhu5CQoX7zL+4pkpbJn3ULEM
mjFKnBBPoU0ET5hMOXY79iAkGIDiQSpMz5fNi85S0drPLG6ooJxIBBLgFdDspG2f
7TASuV5Dne3Toh9YMm4mZtgWZTCR85yF8Dzy8kA6/bTNtvVQmMMQh7RYW10SrB1I
ntDYGQcoxzPVzU42gWaDwa5ubDz6XTxLM6vsweK9mbjm9U1R1GEd6cx2cGvt88oD
gIgLcrP7szImDzdASb0Ce3kEdKc/g0xMften10MOjPFHJGcehvawwgvRJK8Oz720
g6Xa+WBwfGF5QrljWVk7V9sGiSK1ssAut81VBO+lBCEFwI8iMzjbjTC9kzCvO6nj
ZECYg5JZa0XR
=tbcX
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
There is a recently notified BT regression with no fix yet. I do not
think a fix will land in the next week.
Current release - regressions:
- core: tighten bad gso csum offset check in virtio_net_hdr
- netfilter: move nf flowtable bpf initialization in
nf_flow_table_module_init()
- eth: ice: stop calling pci_disable_device() as we use pcim
- eth: fou: fix null-ptr-deref in GRO.
Current release - new code bugs:
- hsr: prevent NULL pointer dereference in hsr_proxy_announce()
Previous releases - regressions:
- hsr: remove seqnr_lock
- netfilter: nft_socket: fix sk refcount leaks
- mptcp: pm: fix uaf in __timer_delete_sync
- phy: dp83822: fix NULL pointer dereference on DP83825 devices
- eth: revert "virtio_net: rx enable premapped mode by default"
- eth: octeontx2-af: Modify SMQ flush sequence to drop packets
Previous releases - always broken:
- eth: mlx5: fix bridge mode operations when there are no VFs
- eth: igb: Always call igb_xdp_ring_update_tail() under Tx lock"
* tag 'net-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
net: tighten bad gso csum offset check in virtio_net_hdr
netlink: specs: mptcp: fix port endianness
net: dpaa: Pad packets to ETH_ZLEN
mptcp: pm: Fix uaf in __timer_delete_sync
net: libwx: fix number of Rx and Tx descriptors
net: dsa: felix: ignore pending status of TAS module when it's disabled
net: hsr: prevent NULL pointer dereference in hsr_proxy_announce()
selftests: mptcp: include net_helper.sh file
selftests: mptcp: include lib.sh file
selftests: mptcp: join: restrict fullmesh endp on 1st sf
netfilter: nft_socket: make cgroupsv2 matching work with namespaces
netfilter: nft_socket: fix sk refcount leaks
MAINTAINERS: Add ethtool pse-pd to PSE NETWORK DRIVER
dt-bindings: net: tja11xx: fix the broken binding
selftests: net: csum: Fix checksums for packets with non-zero padding
net: phy: dp83822: Fix NULL pointer dereference on DP83825 devices
virtio_net: disable premapped mode by default
Revert "virtio_net: big mode skip the unmap check"
Revert "virtio_net: rx remove premapped failover code"
...
compile_commands.json is used by clangd[1] to provide code navigation
and completion functionality to editors. See [2] for an example
configuration that includes this functionality for VSCode.
It can currently be built manually when using kunit.py, by running:
./scripts/clang-tools/gen_compile_commands.py -d .kunit
With this change however, it's built automatically so you don't need to
manually keep it up to date.
Unlike the manual approach, having make build the compile_commands.json
means that it appears in the build output tree instead of at the root of
the source tree, so you'll need to add --compile-commands-dir=.kunit to
your clangd args for it to be found. This might turn out to be pretty
annoying, I'm not sure yet. If so maybe we can later add some hackery to
kunit.py to work around it.
[1] https://clangd.llvm.org/
[2] https://github.com/FlorentRevest/linux-kernel-vscode
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Create a new 'struct cxl_mailbox' and move all mailbox related bits to
it. This allows isolation of all CXL mailbox data in order to export
some of the calls to external kernel callers and avoid exporting of CXL
driver specific bits such has device states. The allocation of
'struct cxl_mailbox' is also split out with cxl_mailbox_init() so the
mailbox can be created independently.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20240905223711.1990186-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
* for-next/selftests:
kselftest/arm64: Fix build warnings for ptrace
kselftest/arm64: Actually test SME vector length changes via sigreturn
kselftest/arm64: signal: fix/refactor SVE vector length enumeration
ncdevmem is a devmem TCP netcat. It works similarly to netcat, but it
sends and receives data using the devmem TCP APIs. It uses udmabuf as
the dmabuf provider. It is compatible with a regular netcat running on
a peer, or a ncdevmem running on a peer.
In addition to normal netcat support, ncdevmem has a validation mode,
where it sends a specific pattern and validates this pattern on the
receiver side to ensure data integrity.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240910171458.219195-13-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to the previous commit, the net_helper.sh file from the parent
directory is used by the MPTCP selftests and it needs to be present when
running the tests.
This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:
make -C tools/testing/selftests \
TARGETS=net/mptcp \
install INSTALL_PATH=$KSFT_INSTALL_PATH
cd $KSFT_INSTALL_PATH
./run_kselftest.sh -c net/mptcp
Fixes: 1af3bc912e ("selftests: mptcp: lib: use wait_local_port_listen helper")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The lib.sh file from the parent directory is used by the MPTCP selftests
and it needs to be present when running the tests.
This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:
make -C tools/testing/selftests \
TARGETS=net/mptcp \
install INSTALL_PATH=$KSFT_INSTALL_PATH
cd $KSFT_INSTALL_PATH
./run_kselftest.sh -c net/mptcp
Fixes: f265d3119a ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new endpoint using the IP of the initial subflow has been recently
added to increase the code coverage. But it breaks the test when using
old kernels not having commit 86e39e0448 ("mptcp: keep track of local
endpoint still available for each msk"), e.g. on v5.15.
Similar to commit d4c81bbb86 ("selftests: mptcp: join: support local
endpoint being tracked or not"), it is possible to add the new endpoint
conditionally, by checking if "mptcp_pm_subflow_check_next" is present
in kallsyms: this is not directly linked to the commit introducing this
symbol but for the parent one which is linked anyway. So we can know in
advance what will be the expected behaviour, and add the new endpoint
only when it makes sense to do so.
Fixes: 4878f9f842 ("selftests: mptcp: join: validate fullmesh endp on 1st sf")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This test case checks the errno message when percpu map value size
exceeds PCPU_MIN_UNIT_SIZE.
root@debian:~# ./test_maps
...
test_map_percpu_stats_hash_of_maps:PASS
test_map_percpu_stats_map_value_size:PASS
test_sk_storage_map:PASS
Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-3-chen.dylane@gmail.com
llvm change [1] made a change such that __sync_fetch_and_{and,or,xor}()
will generate atomic_fetch_*() insns even if the return value is not used.
This is a deliberate choice to make sure barrier semantics are preserved
from source code to asm insn.
But the change in [1] caused arena_atomics selftest failure.
test_arena_atomics:PASS:arena atomics skeleton open 0 nsec
libbpf: prog 'and': BPF program load failed: Permission denied
libbpf: prog 'and': -- BEGIN PROG LOAD LOG --
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
0: R1=ctx() R10=fp0
; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87
0: (18) r1 = 0xffffc90000064000 ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4)
2: (61) r6 = *(u32 *)(r1 +0) ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,v
ar_off=(0x0; 0xffffffff))
3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
4: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
5: (5d) if r0 != r6 goto pc+11 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x)
; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91
6: (18) r1 = 0x100000000060 ; R1_w=scalar()
8: (bf) r1 = addr_space_cast(r1, 0, 1) ; R1_w=arena
9: (18) r2 = 0x1100000000 ; R2_w=0x1100000000
11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2)
BPF_ATOMIC stores into R1 arena is not allowed
processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'and': failed to load: -13
libbpf: failed to load object 'arena_atomics'
libbpf: failed to load BPF skeleton 'arena_atomics': -13
test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13)
#3 arena_atomics:FAIL
The reason of the failure is due to [2] where atomic{64,}_fetch_{and,or,xor}() are not
allowed by arena addresses.
Version 2 of the patch fixed the issue by using inline asm ([3]). But further discussion
suggested to find a way from source to generate locked insn which is more user
friendly. So in not-merged llvm patch ([4]), if relax memory ordering is used and
the return value is not used, locked insn could be generated.
So with llvm patch [4] to compile the bpf selftest, the following code
__c11_atomic_fetch_and(&and64_value, 0x011ull << 32, memory_order_relaxed);
is able to generate locked insn, hence fixing the selftest failure.
[1] https://github.com/llvm/llvm-project/pull/106494
[2] d503a04f8b ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")
[3] https://lore.kernel.org/bpf/20240803025928.4184433-1-yonghong.song@linux.dev/
[4] https://github.com/llvm/llvm-project/pull/107343
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240909223431.1666305-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new set of tests validating behavior of capturing stack traces
with build ID. We extend uprobe_multi target binary with ability to
trigger uprobe (so that we can capture stack traces from it), but also
we allow to force build ID data to be either resident or non-resident in
memory (see also a comment about quirks of MADV_PAGEOUT).
That way we can validate that in non-sleepable context we won't get
build ID (as expected), but with sleepable uprobes we will get that
build ID regardless of it being physically present in memory.
Also, we add a small add-on linker script which reorders
.note.gnu.build-id section and puts it after (big) .text section,
putting build ID data outside of the very first page of ELF file. This
will test all the relaxations we did in build ID parsing logic in kernel
thanks to freader abstraction.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240829174232.3133883-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Handle the case where the meta-page content is bigger than the system
page-size. This prepares the ground for extending features covered by
the meta-page.
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/20240910162335.2993310-3-vdonnefort@google.com
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Improve the ring-buffer meta-page test coverage by checking for the
entire padding region to be 0 instead of just looking at the first 4
bytes.
Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/20240910162335.2993310-2-vdonnefort@google.com
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Add 3 test cases for skb dynptr used in tp_btf:
- test_dynptr_skb_tp_btf: use skb dynptr in tp_btf and make sure it is
read-only.
- skb_invalid_ctx_fentry/skb_invalid_ctx_fexit: bpf_dynptr_from_skb
should fail in fentry/fexit.
In test_dynptr_skb_tp_btf, to trigger the tracepoint in kfree_skb,
test_pkt_access is used for its test_run, as in kfree_skb.c. Because the
test process is different from others, a new setup type is defined,
i.e., SETUP_SKB_PROG_TP.
The result is like:
$ ./test_progs -t 'dynptr/test_dynptr_skb_tp_btf'
#84/14 dynptr/test_dynptr_skb_tp_btf:OK
#84 dynptr:OK
#127 kfunc_dynptr_param:OK
Summary: 2/1 PASSED, 0 SKIPPED, 0 FAILED
$ ./test_progs -t 'dynptr/skb_invalid_ctx_f'
#84/85 dynptr/skb_invalid_ctx_fentry:OK
#84/86 dynptr/skb_invalid_ctx_fexit:OK
#84 dynptr:OK
#127 kfunc_dynptr_param:OK
Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED
Also fix two coding style nits (change spaces to tabs).
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240911033719.91468-6-lulie@linux.alibaba.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Nolibc gained an implementation of strerror() recently.
Use it and drop the ifndef.
Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Acked-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, xskxceiver assumes that MAX_SKB_FRAGS value is always 17
which is not true - since the introduction of BIG TCP this can now take
any value between 17 to 45 via CONFIG_MAX_SKB_FRAGS.
Adjust the TOO_MANY_FRAGS test case to read the currently configured
MAX_SKB_FRAGS value by reading it from /proc/sys/net/core/max_skb_frags.
If running system does not provide that sysctl file then let us try
running the test with a default value.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20240910124129.289874-1-maciej.fijalkowski@intel.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmbf6xAACgkQrB3Eaf9P
W7eZQA/9HuHTWBg0V43QDT1rjNnKult+uBKYpKrh045outqMs+cU8bsww5ZuIAKx
ktN66OCE67d7XeFttb9UAJUPqQ98RjwjVUOpjRJ5iRDtj2bmn/5VGSYuH7zx5so0
msFs5gkomo2ZZNjcMOSrDVGUoCdlHh1og5L2KN/FgztSA1smDdUBQOWNm1peezbI
eJFt2Q6KCNfzwPthmQte0dmDnK5gWPducereSx03tMuSyUmPML1zrzOFXBXSg09e
dAlDTxbAXZDrXS4Ii0y/FEM2Ugkjg9FXbE1kvM0i05GIc/SGnEBGEcdW5YbmRhOL
4JlLnpiLTmKTaIZ0GdpADv7XZMga6R01AalSPsJz+H7aNAHTKkK+SzQY4YXRucZy
SsASM39oRLzo9Bm4ZZ773Nw83cxBgO/ZixK4KVvCZI/1ftD+9zn72eqk+CeveSeE
ChaXGuWpRdfAOsgozFJNFx/ffK5qzxFKkIeN9KN0QYV/XJuZJ7nD6eQkH9ydgvTI
4cexY+cs4wgfdi9dDkVHPVhCR7mRlfi5r/VL8rtWWnWzR07okKF4rW6dgvx33m60
9MmF1/EdD2uh3CLcBMjNg6qXdC07VeDpFLqWs+utJvSHMuI43uE4FkRQui/J6T9N
RX7zzkFBsPvPpm5GHLx2u/wvnzX1co1Rk9xzbC+J6FEPlm2/0vI=
=ErGl
-----END PGP SIGNATURE-----
Merge tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-09-10
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcomming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test a few possible cases where we use SOF_TIMESTAMPING_OPT_RX_FILTER
with software or hardware report/generation flag.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Padding is not included in UDP and TCP checksums. Therefore, reduce the
length of the checksummed data to include only the data in the IP
payload. This fixes spurious reported checksum failures like
rx: pkt: sport=33000 len=26 csum=0xc850 verify=0xf9fe
pkt: bad csum
Technically it is possible for there to be trailing bytes after the UDP
data but before the Ethernet padding (e.g. if sizeof(ip) + sizeof(udp) +
udp.len < ip.len). However, we don't generate such packets.
Fixes: 91a7de8560 ("selftests/net: add csum offload test")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240906210743.627413-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In x86's debug_regs test, change the RDMSR(MISC_ENABLES) in the single-step
testcase to a WRMSR(TSC_DEADLINE) in order to verify that KVM honors
KVM_GUESTDBG_SINGLESTEP when handling a fastpath VM-Exit.
Note, the extra coverage is effectively Intel-only, as KVM only handles
TSC_DEADLINE in the fastpath when the timer is emulated via the hypervisor
timer, a.k.a. the VMX preemption timer.
Link: https://lore.kernel.org/r/20240830044448.130449-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aa ("selftests: netfilter: add packetdrill based
conntrack tests").
This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.
Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.
Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.
The path replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The Linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.
Implementation notes:
- restore alphabetical order when adding the new directory to
tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
except for
- update `source ./defaults.sh` path (to adjust for flat dir)
- add SPDX headers
- remove one author statement
- Acknowledgment: drop an e (checkpatch)
Tested:
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
run_tests
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
install INSTALL_PATH=$KSFT_INSTALL_PATH
# in virtme-ng
./run_kselftest.sh -c net/packetdrill
./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt
Link: https://lore.kernel.org/netdev/20240827193417.2792223-1-willemdebruijn.kernel@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support testcases that are themselves not executable, but need an
interpreter to run them.
If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run
./ksft_runner.sh ./$BASENAME_TEST
Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.
Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.
Link: https://lore.kernel.org/netdev/66d4d97a4cac_3df182941a@willemb.c.googlers.com.notmuch/T/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is nice to have a visual alignment in the test output to present the
different results, but it makes less sense in the TAP output that is
there for computers.
It sounds then better to remove the duplicated whitespaces in the TAP
output, also because it can cause some issues with TAP parsers expecting
only one space around the directive delimiter (#).
While at it, change the variable name (result_msg) to something more
explicit.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-5-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Just to slightly improve the precision of the duration of the first
test.
In mptcp_join.sh, the last append_prev_results is now done as soon as
the last test is over: this will add the last result in the list, and
get a more precise time for this last test.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-3-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is now added by the MPTCP lib automatically, see the parent commit.
The time in the TAP output might be slightly different from the one
displayed before, but that's OK.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240906-net-next-mptcp-ksft-subtest-time-v2-2-31d5ee4f3bdf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When I was trying to modify the tx timestamping feature, I found that
running "./txtimestamp -4 -C -L 127.0.0.1" didn't reflect the error:
I succeeded to generate timestamp stored in the skb but later failed
to report it to the userspace (which means failed to put css into cmsg).
It can happen when someone writes buggy codes in __sock_recv_timestamp(),
for example.
After adding the check so that running ./txtimestamp will reflect the
result correctly like this if there is a bug in the reporting phase:
protocol: TCP
payload: 10
server port: 9000
family: INET
test SND
USR: 1725458477 s 667997 us (seq=0, len=0)
Failed to report timestamps
USR: 1725458477 s 718128 us (seq=0, len=0)
Failed to report timestamps
USR: 1725458477 s 768273 us (seq=0, len=0)
Failed to report timestamps
USR: 1725458477 s 818416 us (seq=0, len=0)
Failed to report timestamps
...
In the future, it will help us detect whether the new coming patch has
bugs or not.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905160035.62407-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was recently observed at [1] that during the folio unmapping stage of
migration, when the PTEs are cleared, a racing thread faulting on that
folio may increase the refcount of the folio, sleep on the folio lock (the
migration path has the lock), and migration ultimately fails when
asserting the actual refcount against the expected. Thereby, the
migration selftest fails on shared-anon mappings. The above enforces the
fact that migration is a best-effort service, therefore, it is wrong to
fail the test for just a single failure; hence, fail the test after 100
consecutive failures (where 100 is still a subjective choice). Note that,
this has no effect on the execution time of the test since that is
controlled by a timeout.
[1] https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/
Link: https://lkml.kernel.org/r/20240830051609.4037834-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a THP is split, any subpage that is zero-filled will be mapped to the
shared zeropage, hence saving memory. Add selftest to verify this by
allocating zero-filled THP and comparing RssAnon before and after split.
Link: https://lkml.kernel.org/r/20240830100438.3623486-4-usamaarif642@gmail.com
Signed-off-by: Alexander Zhu <alexlzhu@fb.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuang Zhai <zhais@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Shuang Zhai <szhai2@cs.rochester.edu>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When "arg#%d expected pointer to ctx, but got %s" error is printed, both
template parts actually point to the type of the argument, therefore, it
will also say "but got PTR", regardless of what was the actual register
type.
Fix the message to print the register type in the second part of the
template, change the existing test to adapt to the new format, and add a
new test to test the case when arg is a pointer to context, but reg is a
scalar.
Fixes: 00b85860fe ("bpf: Rewrite kfunc argument handling")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20240909133909.1315460-1-maxim@isovalent.com
commit ce17ad0d54 ("cxl: Wait Memory_Info_Valid before access memory
related info") added another implementation, which is
cxl_dvsec_mem_range_valid(), of waiting for memory_info_valid without
realizing it duplicated wait_for_valid(). Remove wait_for_valid() and
retain cxl_dvsec_mem_range_valid() as the former is hardcoded to check
only the Memory_Info_Valid bit of DVSEC range 1, while the latter allows
for selection between DVSEC range 1 or 2 via parameter.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20240828084231.1378789-3-yanfei.xu@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Currently exec-target.c is linked statically with libc, which on Fedora
at least requires installing an additional package (glibc-static).
If that package is not installed the build fails with:
CC exec_target
/usr/bin/ld: cannot find -lc: No such file or directory
collect2: error: ld returned 1 exit status
All exec_target.c does is call sys_exit, which can be done easily enough
using inline assembly, and removes the requirement for a static libc to
be installed.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240812094152.418586-1-maddy@linux.ibm.com
* A revert for the mmap() change that ties the allocation range to the
hint adress, as what we tried to do ended up regressing on other
userspace workloads.
* A fix to avoid a kernel memory leak when emulating misaligned accesses
from userspace.
* A Kconfig fix for toolchain vector detection, which now correctly
detects vector support on toolchains where the V extension depends on
the M extension.
* A fix to avoid failing the linear mapping bootmem bounds check on
NOMMU systems.
* A fix for early alternatives on relocatable kernels.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmbbD44THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiRR/EACqW46mbTmGrbDzbk2YcKbkc05djuB2
+yorDaO6d188xmHM74zEvt1+X+Mxj18pMm+V02L+27JA7asv+JugXQVwfxtZ769w
/XMKGrJTUCMSvFpsbhszbse3vXjc1F9uQ5wNa9o44MHAc2twSkJHtdhZJwkJJ9ru
Od0m99VXWB1gbA1hvCpQBs2uMSzLoU5X2//AaAzVFK1pyskZ7HPqFX16eFcT0gpA
GDNYIKLPVF1pcwS2gkQM7LAwveCsxuEdnLufJs5Coz9BZ/kQJPd3sK/z8Z58ghy2
Db6XXtcYJs64Ndjv1MSowb4rIii/BN2vlMCCT95xHH+tuJR6flXuIZQPpI971V/A
XOCglNQQkmzjJuFKn1/9ZJcVZGITOqDX37iMPW/3bQ/OFG0emBeGqYXKMmScI6f1
TtqiByz2VXNEJBNkQVA37Cj42DVmRg3MCjwy0ACLbqBpMeSbGK7MRNUk258wOp4V
ucmhf50D3a0w8y/3miaAH1Pk+tZz/rtVFkdbibDW3M91cOfdNoAYKhSJPEEnhaGm
pVTvW+usKDdim3nqqTrlZTfFTNF7wFkvoDc11lStgYFK8VoZWuyoBcf1LQ2+ghv9
qP/A5LRnWU4nXCxZG6dKRoZ/VvoGtsKdI6Iatnak4cAbsvXI+7foelgLgWY6aFzk
/ZUtSmWDz1E21Q==
=xYax
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A revert for the mmap() change that ties the allocation range to the
hint adress, as what we tried to do ended up regressing on other
userspace workloads.
- A fix to avoid a kernel memory leak when emulating misaligned
accesses from userspace.
- A Kconfig fix for toolchain vector detection, which now correctly
detects vector support on toolchains where the V extension depends on
the M extension.
- A fix to avoid failing the linear mapping bootmem bounds check on
NOMMU systems.
- A fix for early alternatives on relocatable kernels.
* tag 'riscv-for-linus-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix RISCV_ALTERNATIVE_EARLY
riscv: Do not restrict memory size because of linear mapping on nommu
riscv: Fix toolchain vector detection
riscv: misaligned: Restrict user access to kernel memory
riscv: mm: Do not restrict mmap address based on hint
riscv: selftests: Remove mmap hint address checks
Revert "RISC-V: mm: Document mmap changes"
By reading the code, I found the macro NSEC_PER_SEC
is never referenced in the code. Just remove it.
Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: John Stultz <jstultz@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When resctrl is built on architectures without __cpuid_count()
support, build fails. resctrl uses __cpuid_count() defined in
kselftest.h.
Even though the problem is seen while building resctrl on aarch64,
this error can be seen on any platform that doesn't support CPUID.
CPUID is a x86/x86-64 feature and code paths with CPUID asm commands
will fail to build on all other architectures.
All others tests call __cpuid_count() do so from x86/x86_64 code paths
when _i386__ or __x86_64__ are defined. resctrl is an exception.
Fix the problem by defining __cpuid_count() only when __i386__ or
__x86_64__ are defined in kselftest.h and changing resctrl to call
__cpuid_count() only when __i386__ or __x86_64__ are defined.
In file included from resctrl.h:24,
from cat_test.c:11:
In function ‘arch_supports_noncont_cat’,
inlined from ‘noncont_cat_run_test’ at cat_test.c:326:6:
../kselftest.h:74:9: error: impossible constraint in ‘asm’
74 | __asm__ __volatile__ ("cpuid\n\t" \
| ^~~~~~~
cat_test.c:304:17: note: in expansion of macro ‘__cpuid_count’
304 | __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
| ^~~~~~~~~~~~~
../kselftest.h:74:9: error: impossible constraint in ‘asm’
74 | __asm__ __volatile__ ("cpuid\n\t" \
| ^~~~~~~
cat_test.c:306:17: note: in expansion of macro ‘__cpuid_count’
306 | __cpuid_count(0x10, 2, eax, ebx, ecx, edx);
Fixes: ae638551ab ("selftests/resctrl: Add non-contiguous CBMs CAT test")
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Closes: https://lore.kernel.org/lkml/20240809071059.265914-1-usama.anjum@collabora.com/
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbaYFMACgkQ6rmadz2v
bTq7JBAAipwHeOL3IYproQxGy+f0W3Uik9FNlavSQ3zpJHmTJcpf0ysXkqH23g2q
26CF0R44gmGMkdbZsxbk3HLI2qRmzxmznYCDH0g7d9qwzQMhFHIiY7TW7UD/XbKx
UHdHLb5PYrj+j94T1WGiQdvbZYDlpmdz5rFA9K/TBtBArqYp9mA4D/cIlTDBfFpk
cjhSGVl9x/BKbiHKApxSGcR7Fh/+ux9mVdlssWQNhRfm3V2tbRSAw1i1/ydTG+4c
bf/m0RSIDfPMxy1i7D0lNRbclzWVisTqNzDXHfQoRUJMuMDfsK4UZB/6gvh+2LKy
D60vT8AfN5ygjJbLdFbwFGnEymjfsXWguyqfQB0d9Hj/2/EsZ01rI2ikJv9J+qKl
wwZM3YeA3Q/V0mZ5wCONp2dn+s+82nga+fdvCRFz6SLkWQwgbW5BYHFF1c60V9MH
Pbd9Y5VfCOEZRzR6RxbmguPrnoU1+BUwQeIAp9L73bllrzhtmh/aL/b03uw8/wUh
I+peLxJ+DVp6wTudgvSMviMySWcztuz397G7TnFyG0V4nKe1+QxSaQWWw2HKvpy3
i+m98qoWqbuJqz49FpEtX6x/17gZZNA0LK648D77nrOfsGWOLTKOZUDbNWbTPw9a
Gojg5obJ8P82yO9UCYQLyGsAJxJrKZv3OEmqy0mRG1hrSMsozxg=
=5Quw
-----END PGP SIGNATURE-----
Merge tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix crash when btf_parse_base() returns an error (Martin Lau)
- Fix out of bounds access in btf_name_valid_section() (Jeongjun Park)
* tag 'bpf-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add a selftest to check for incorrect names
bpf: add check for invalid name in btf_name_valid_section()
bpf: Fix a crash when btf_parse_base() returns an error pointer
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during mcp251x_open
- eth: r8152: fix the firmware communication error due to use
of bulk write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmbaMbsACgkQMUZtbf5S
IrtpUg/+J6rNaZuGVTHJQAjdSlMx/HzpN3GIbhYyUSg+iHNclqtxJ706b2vyrG88
Rw5a+f8aQueONNsoFfa/ooU4cGsdO1oYlch0Wtuj5taCqy2SvtVqJAyiuDyNNjU0
BQ1Rf7aRLI7enmEpZJN2FFu106YTVccBcLqhPkx0CPcEjV+p5RvypfeQL72H6ZKx
+7/HzEl4bagHIQW3W1uJGNUdwNP7fP2/Kg7TrTJ1t629nLiJCxKL7LrsmebO5o9a
v+2NxAa1eujTZ1k7ITcM0wYlxKOaGNFF4sT+dA+GfMe+SFssdhGeZYyv1t0zm5VI
3apJ/pSHza1a/hXFa6PaOSw5M5LWn4bJOqeZLl/yIV0upu5xadWqmT0gVay8V9lY
+x/MURGr3seuNRSMsaToHDIq+Us45Dt/qkDDNO/P+9R/BsJKCW05Pfqx3Mr/OHzv
eeCPbXRh4YYBdrUicBWo04gSD+BUA53vW8FC3pxU5ieLOOcX4kOPeb8wNPHcXjMU
73D+kyO1ufsfsFMkd3VfgDI1mMz+xpEuZ6pxs33tJ/1Ny7DdG1Q49xlQVh4Wnobk
uQqUSzdoelOROeg1rwmsIbfwIvj5a5dVIyBu8TDjHlb/rk1QNTkyu+fFmQRWEotL
fQ7U62wXlpoCT8WchSMtiU32IDJ2+Lwhwecguy1Z7kLOLrtL8XU=
=ju86
-----END PGP SIGNATURE-----
Merge tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, bluetooth and wireless.
No known regressions at this point. Another calm week, but chances are
that has more to do with vacation season than the quality of our work.
Current release - new code bugs:
- smc: prevent NULL pointer dereference in txopt_get
- eth: ti: am65-cpsw: number of XDP-related fixes
Previous releases - regressions:
- Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over
BREDR/LE", it breaks existing user space
- Bluetooth: qca: if memdump doesn't work, re-enable IBS to avoid
later problems with suspend
- can: mcp251x: fix deadlock if an interrupt occurs during
mcp251x_open
- eth: r8152: fix the firmware communication error due to use of bulk
write
- ptp: ocp: fix serial port information export
- eth: igb: fix not clearing TimeSync interrupts for 82580
- Revert "wifi: ath11k: support hibernation", fix suspend on Lenovo
Previous releases - always broken:
- eth: intel: fix crashes and bugs when reconfiguration and resets
happening in parallel
- wifi: ath11k: fix NULL dereference in ath11k_mac_get_eirp_power()
Misc:
- docs: netdev: document guidance on cleanup.h"
* tag 'net-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
ila: call nf_unregister_net_hooks() sooner
tools/net/ynl: fix cli.py --subscribe feature
MAINTAINERS: fix ptp ocp driver maintainers address
selftests: net: enable bind tests
net: dsa: vsc73xx: fix possible subblocks range of CAPT block
sched: sch_cake: fix bulk flow accounting logic for host fairness
docs: netdev: document guidance on cleanup.h
net: xilinx: axienet: Fix race in axienet_stop
net: bridge: br_fdb_external_learn_add(): always set EXT_LEARN
r8152: fix the firmware doesn't work
fou: Fix null-ptr-deref in GRO.
bareudp: Fix device stats updates.
net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanup
bpf, net: Fix a potential race in do_sock_getsockopt()
net: dqs: Do not use extern for unused dql_group
sch/netem: fix use after free in netem_dequeue
usbnet: modern method to get random MAC
MAINTAINERS: wifi: cw1200: add net-cw1200.h
ice: do not bring the VSI up, if it was down before the XDP setup
ice: remove ICE_CFG_BUSY locking from AF_XDP code
...
bind_wildcard is compiled but not run, bind_timewait is not compiled.
These two tests complete in a very short time, use the test harness
properly, and seem reasonable to enable.
The author of the tests confirmed via email that these were
intended to be run.
Enable these two tests.
Fixes: 13715acf8a ("selftest: Add test for bind() conflicts.")
Fixes: 2c042e8e54 ("tcp: Add selftest for bind() and TIME_WAIT.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/5a009b26cf5fb1ad1512d89c61b37e2fac702323.1725430322.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds DENYLIST.riscv64 file for riscv64. It will help BPF CI
and local vmtest to mask failing and unsupported test cases.
We can use the following command to use deny list in local vmtest as
previously mentioned by Manu.
PLATFORM=riscv64 CROSS_COMPILE=riscv64-linux-gnu- vmtest.sh \
-l ./libbpf-vmtest-rootfs-2024.08.30-noble-riscv64.tar.zst -- \
./test_progs -d \
\"$(cat tools/testing/selftests/bpf/DENYLIST.riscv64 \
| cut -d'#' -f1 \
| sed -e 's/^[[:space:]]*//' \
-e 's/[[:space:]]*$//' \
| tr -s '\n' ','\
)\"
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-9-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add support cross platform testing for vmtest. The variable $ARCH in the
current script is platform semantics, not kernel semantics. Rename it to
$PLATFORM so that we can easily use $ARCH in cross-compilation. And drop
`set -u` unbound variable check as we will use CROSS_COMPILE env
variable. For now, Using PLATFORM= and CROSS_COMPILE= options will
enable cross platform testing:
PLATFORM=<platform> CROSS_COMPILE=<toolchain> vmtest.sh -- ./test_progs
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-7-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Support vmtest to use local rootfs image generated by [0] that is
consistent with BPF CI. Now we can specify the local rootfs image
through the `-l` parameter like as follows:
vmtest.sh -l ./libbpf-vmtest-rootfs-2024.08.22-noble-amd64.tar.zst -- ./test_progs
Meanwhile, some descriptions have been flushed.
Link: https://github.com/libbpf/ci/blob/main/rootfs/mkrootfs_debian.sh [0]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-6-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The URLS array is only valid in the download_rootfs function and does
not need to be parsed globally in advance. At the same time, the logic
of loading rootfs is refactored to prepare vmtest for supporting local
rootfs.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-5-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It is not always convenient to have LLVM libraries installed inside CI
rootfs images, thus request static libraries from llvm-config.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240905081401.1894789-4-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Recently, when compiling bpf selftests on RV64, the following
compilation failure occurred:
progs/bpf_dctcp.c:29:21: error: redefinition of 'fallback' as different kind of symbol
29 | volatile const char fallback[TCP_CA_NAME_MAX];
| ^
/workspace/tools/testing/selftests/bpf/tools/include/vmlinux.h:86812:15: note: previous definition is here
86812 | typedef u32 (*fallback)(u32, const unsigned char *, size_t);
The reason is that the `fallback` symbol has been defined in
arch/riscv/lib/crc32.c, which will cause symbol conflicts when vmlinux.h
is included in bpf_dctcp. Let we rename `fallback` string to
`fallback_cc` in bpf_dctcp to fix this compilation failure.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-3-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The $(let ...) function is only supported by GNU Make version 4.4 [0]
and above, otherwise the following exception file or directory will be
generated:
tools/testing/selftests/bpfFEATURE-DUMP.selftests
tools/testing/selftests/bpffeature/
Considering that the GNU Make version of most Linux distributions is
lower than 4.4, let us adapt the corresponding logic to it.
Link: https://lists.gnu.org/archive/html/info-gnu/2022-10/msg00008.html [0]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240905081401.1894789-2-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The idea is to run same test as for test_pid_filter_process, but instead
of standard fork-ed process we create the process with clone(CLONE_VM..)
to make sure the thread leader process filter works properly in this case.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-5-jolsa@kernel.org
The idea is to create and monitor 3 uprobes, each trigered in separate
process and make sure the bpf program gets executed just for the proper
PID specified via pid filter.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240905115124.1503998-4-jolsa@kernel.org
With uprobe_unregister() having grown a synchronize_srcu(), it becomes
fairly slow to call. Esp. since both users of this API call it in a
loop.
Peel off the sync_srcu() and do it once, after the loop.
We also need to add uprobe_unregister_sync() into uprobe_register()'s
error handling path, as we need to be careful about returning to the
caller before we have a guarantee that partially attached consumer won't
be called anymore. This is an unlikely slow path and this should be
totally fine to be slow in the case of a failed attach.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20240903174603.3554182-6-andrii@kernel.org
Filter out nodes that have one of its ancestors disabled as they aren't
expected to probe.
This removes the following false-positive failures on the
sc7180-trogdor-lazor-limozeen-nots-r5 platform:
/soc@0/geniqup@8c0000/i2c@894000/proximity@28
/soc@0/geniqup@ac0000/spi@a90000/ec@0
/soc@0/remoteproc@62400000/glink-edge/apr
/soc@0/remoteproc@62400000/glink-edge/apr/service@3
/soc@0/remoteproc@62400000/glink-edge/apr/service@4
/soc@0/remoteproc@62400000/glink-edge/apr/service@4/clock-controller
/soc@0/remoteproc@62400000/glink-edge/apr/service@4/dais
/soc@0/remoteproc@62400000/glink-edge/apr/service@7
/soc@0/remoteproc@62400000/glink-edge/apr/service@7/dais
/soc@0/remoteproc@62400000/glink-edge/apr/service@8
/soc@0/remoteproc@62400000/glink-edge/apr/service@8/routing
/soc@0/remoteproc@62400000/glink-edge/fastrpc
/soc@0/remoteproc@62400000/glink-edge/fastrpc/compute-cb@3
/soc@0/remoteproc@62400000/glink-edge/fastrpc/compute-cb@4
/soc@0/remoteproc@62400000/glink-edge/fastrpc/compute-cb@5
/soc@0/spmi@c440000/pmic@0/pon@800/pwrkey
Fixes: 14571ab1ad ("kselftest: Add new test for detecting unprobed Devicetree devices")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://lore.kernel.org/r/20240729-dt-kselftest-parent-disabled-v2-1-d7a001c4930d@collabora.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Considering that CO-RE direct read access to the first system call
argument is already available on s390 and arm64, let's enable
test_bpf_syscall_macro:syscall_arg1 on these architectures.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240831041934.1629216-4-pulehui@huaweicloud.com
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240904014441.1065753-1-nichen@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The core part of the selftest, i.e., the je <-> jmp cycle, mimics the
original sched-ext bpf program. The test will fail without the
previous patch.
I tried to create some cases for other potential cycles
(je <-> je, jmp <-> je and jmp <-> jmp) with similar pattern
to the test in this patch, but failed. So this patch
only contains one test for je <-> jmp cycle.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221256.37389-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull bpf/master to receive baebe9aaba ("bpf: allow passing struct
bpf_iter_<type> as kfunc arguments") and related changes in preparation for
the DSQ iterator patchset.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix eventfs ownership testcase to find mount point if stat -c "%m" failed.
This can happen on the system based on busybox. In this case, this will
try to use the current working directory, which should be a tracefs top
directory (and eventfs is mounted as a part of tracefs.)
If it does not work, the test is skipped as UNRESOLVED because of
the environmental problem.
Fixes: ee9793be08 ("tracing/selftests: Add ownership modification tests for eventfs")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add sched_ext_ops operations to init/exit cgroups, and track task migrations
and config changes. A BPF scheduler may not implement or implement only
subset of cgroup features. The implemented features can be indicated using
%SCX_OPS_HAS_CGOUP_* flags. If cgroup configuration makes use of features
that are not implemented, a warning is triggered.
While a BPF scheduler is being enabled and disabled, relevant cgroup
operations are locked out using scx_cgroup_rwsem. This avoids situations
like task prep taking place while the task is being moved across cgroups,
making things easier for BPF schedulers.
v7: - cgroup interface file visibility toggling is dropped in favor just
warning messages. Dynamically changing interface visiblity caused more
confusion than helping.
v6: - Updated to reflect the removal of SCX_KF_SLEEPABLE.
- Updated to use CONFIG_GROUP_SCHED_WEIGHT and fixes for
!CONFIG_FAIR_GROUP_SCHED && CONFIG_EXT_GROUP_SCHED.
v5: - Flipped the locking order between scx_cgroup_rwsem and
cpus_read_lock() to avoid locking order conflict w/ cpuset. Better
documentation around locking.
- sched_move_task() takes an early exit if the source and destination
are identical. This triggered the warning in scx_cgroup_can_attach()
as it left p->scx.cgrp_moving_from uncleared. Updated the cgroup
migration path so that ops.cgroup_prep_move() is skipped for identity
migrations so that its invocations always match ops.cgroup_move()
one-to-one.
v4: - Example schedulers moved into their own patches.
- Fix build failure when !CONFIG_CGROUP_SCHED, reported by Andrea Righi.
v3: - Make scx_example_pair switch all tasks by default.
- Convert to BPF inline iterators.
- scx_bpf_task_cgroup() is added to determine the current cgroup from
CPU controller's POV. This allows BPF schedulers to accurately track
CPU cgroup membership.
- scx_example_flatcg added. This demonstrates flattened hierarchy
implementation of CPU cgroup control and shows significant performance
improvement when cgroups which are nested multiple levels are under
competition.
v2: - Build fixes for different CONFIG combinations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrea Righi <andrea.righi@canonical.com>
Add selftest for cases where btf_name_valid_section() does not properly
check for certain types of names.
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Link: https://lore.kernel.org/r/20240831054742.364585-1-aha310510@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Some distros have grub2 config files with the lines
if [ x"${feature_menuentry_id}" = xy ]; then
menuentry_id_option="--id"
else
menuentry_id_option=""
fi
which match the skip regex defined for grub2 in get_grub_index():
$skip = '^\s*menuentry';
These false positives cause the grub number to be higher than it
should be, and the wrong kernel can end up booting.
Grub documents the menuentry command with whitespace between it and the
title, so make the skip regex reflect this.
Link: https://lore.kernel.org/20240904175530.84175-1-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: John 'Warthog9' Hawley (Tenstorrent) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If a warning happens at build, give a warning at the end:
Build time: 1 minute 40 seconds
Install time: 17 seconds
Reboot time: 25 seconds
*** WARNING found in build: 1 ***
*******************************************
*******************************************
KTEST RESULT: TEST 1 SUCCESS!!!! **
*******************************************
*******************************************
This way, even if the test isn't made to fail on warnings during the
build, a message is still displayed that warnings were found.
Link: https://lore.kernel.org/<20240819172028.3a7fae09@gandalf.local.home>
Acked-by: John 'Warthog9' Hawley (Tenstorrent) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the PROCMAP_QUERY is not defined, a compilation error occurs due to the
mismatch of the procmap_query()'s params, procmap_query() only be called in
the file where the function is defined, modify the params so they can match.
We get a warning when build samples/bpf:
trace_helpers.c:252:5: warning: no previous prototype for ‘procmap_query’ [-Wmissing-prototypes]
252 | int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
| ^~~~~~~~~~~~~
As this function is only used in the file, mark it as 'static'.
Fixes: 4e9e07603e ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available")
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Link: https://lore.kernel.org/r/20240903012839.3178-1-chenyuan_fl@163.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the SO_PEEK_OFF selftest for UDP. In this patch, I mainly do
three things:
1. rename tcp_so_peek_off.c
2. adjust for UDP protocol
3. add selftests into it
Suggested-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that we get signal context for POR_EL0 if and only if POE is present
on the system.
Copied from the TPIDR2 test.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-30-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Teach the signal frame parsing about the new POE frame, avoids warning when it
is generated.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-29-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Check that when POE is enabled, the POR_EL0 register is accessible.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-28-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The encoding of the pkey register differs on arm64, than on x86/ppc. On those
platforms, a bit in the register is used to disable permissions, for arm64, a
bit enabled in the register indicates that the permission is allowed.
This drops two asserts of the form:
assert(read_pkey_reg() <= orig_pkey_reg);
Because on arm64 this doesn't hold, due to the encoding.
The pkey must be reset to both access allow and write allow in the signal
handler. pkey_access_allow() works currently for PowerPC as the
PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE have overlapping bits set.
Access to the uc_mcontext is abstracted, as arm64 has a different structure.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-27-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
arm64's fpregs are not at a constant offset from sigcontext. Since this is
not an important part of the test, don't print the fpregs pointer on arm64.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20240822151113.1479789-26-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Put this function in the header so that it can be used by other tests, without
needing to link to testcases.c.
This will be used by selftest/mm/protection_keys.c
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822151113.1479789-25-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Some test scripts are missing executable permissions. It causes warnings
that make the test output unnecessarily verbose. Add executable
permissions.
Link: https://lkml.kernel.org/r/20240827030336.7930-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Python-based tests creates __pycache__/ directory. Remove it with 'make
clean' by defining it as EXTRA_CLEAN.
Link: https://lkml.kernel.org/r/20240827030336.7930-3-sj@kernel.org
Fixes: b5906f5f73 ("selftests/damon: add a test for update_schemes_tried_regions sysfs command")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "misc fixups for DAMON {self,kunit} tests".
This patchset is for minor fixups of DAMON selftests and kunit tests.
First three patches make DAMON selftests more cleanly maintained (patches
1 and 2) without unnecessary warnings (patch 3). Following six patches
remove unnecessary test case (patch 4), handle configs combinations that
can make tests fail (patches 5-7), reorganize the test files following the
new guideline (patch 8), and add reference kunitconfig for DAMON kunit
tests (patch 9).
This patch (of 9):
DAMON selftests build access_memory_even, but its not on the .gitignore
list. Add it to make 'git status' output cleaner.
Link: https://lkml.kernel.org/r/20240827030336.7930-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240827030336.7930-2-sj@kernel.org
Fixes: c94df805c7 ("selftests/damon: implement a program for even-numbered memory regions access")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In commit 714965ca82 ("mm/mmap: start distinguishing if vma can be
removed in mergeability test") we relaxed the VMA merge rules for VMAs
possessing a vm_ops->close() hook, permitting this operation in instances
where we wouldn't delete the VMA as part of the merge operation.
This was later corrected in commit fc0c8f9089 ("mm, mmap: fix
vma_merge() case 7 with vma_ops->close") to account for a subtle case that
the previous commit had not taken into account.
In both instances, we first rely on is_mergeable_vma() to determine
whether we might be dealing with a VMA that might be removed, taking
advantage of the fact that a 'previous' VMA will never be deleted, only
VMAs that follow it.
The second patch corrects the instance where a merge of the previous VMA
into a subsequent one did not correctly check whether the subsequent VMA
had a vm_ops->close() handler.
Both changes prevent merge cases that are actually permissible (for
instance a merge of a VMA into a following VMA with a vm_ops->close(), but
with no previous VMA, which would result in the next VMA being extended,
not deleted).
In addition, both changes fail to consider the case where a VMA that would
otherwise be merged with the previous and next VMA might have
vm_ops->close(), on the assumption that for this to be the case, all three
would have to have the same vma->vm_file to be mergeable and thus the same
vm_ops.
And in addition both changes operate at 50,000 feet, trying to guess
whether a VMA will be deleted.
As we have majorly refactored the VMA merge operation and de-duplicated
code to the point where we know precisely where deletions will occur, this
patch removes the aforementioned checks altogether and instead explicitly
checks whether a VMA will be deleted.
In cases where a reduced merge is still possible (where we merge both
previous and next VMA but the next VMA has a vm_ops->close hook, meaning
we could just merge the previous and current VMA), we do so, otherwise the
merge is not permitted.
We take advantage of our userland testing to assert that this functions
correctly - replacing the previous limited vm_ops->close() tests with
tests for every single case where we delete a VMA.
We also update all testing for both new and modified VMAs to set
vma->vm_ops->close() in every single instance where this would not prevent
the merge, to assert that we never do so.
Link: https://lkml.kernel.org/r/9f96b8cfeef3d14afabddac3d6144afdfbef2e22.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The existing vma_merge() function is no longer required to handle what
were previously referred to as cases 1-3 (i.e. the merging of a new VMA),
as this is now handled by vma_merge_new_vma().
Additionally, simplify the convoluted control flow of the original,
maintaining identical logic only expressed more clearly and doing away
with a complicated set of cases, rather logically examining each possible
outcome - merging of both the previous and subsequent VMA, merging of the
previous VMA and merging of the subsequent VMA alone.
We now utilise the previously implemented commit_merge() function to share
logic with vma_expand() de-duplicating code and providing less surface
area for bugs and confusion. In order to do so, we adjust this function
to accept parameters specific to merging existing ranges.
Link: https://lkml.kernel.org/r/2cf6016b7bfcc4965fc3cde10827560c42e4f12c.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
resultant function vma_merge_new_range() to be clear what the purpose of
this function is - a new VMA is desired in the specified range, and we
wish to see if it is possible to 'merge' surrounding VMAs into this range
rather than having to allocate a new VMA.
Note that this function uses vma_extend() exclusively, so adopts its
requirement that the iterator point at or before the gap. We add an
assert to this effect.
This is as opposed to vma_merge_existing_range(), which will be introduced
in a subsequent commit, and provide the same functionality for cases in
which we are modifying an existing VMA.
In mmap_region() and do_brk_flags() we open code scenarios where we prefer
to use vma_expand() rather than invoke a full vma_merge() operation.
Abstract this logic and eliminate all of the open-coding, and also use the
same logic for all cases where we add new VMAs to, rather than ultimately
use vma_merge(), rather use vma_expand().
Doing so removes duplication and simplifies VMA merging in all such cases,
laying the ground for us to eliminate the merging of new VMAs in
vma_merge() altogether.
Also add the ability for the vmg to track state, and able to report
errors, allowing for us to differentiate a failed merge from an inability
to allocate memory in callers.
This makes it far easier to understand what is happening in these cases
avoiding confusion, bugs and allowing for future optimisation.
Also introduce vma_iter_next_rewind() to allow for retrieval of the next,
and (optionally) the prev VMA, rewinding to the start of the previous gap.
Introduce are_anon_vmas_compatible() to abstract individual VMA anon_vma
comparison for the case of merging on both sides where the anon_vma of the
VMA being merged maybe compatible with prev and next, but prev and next's
anon_vma's may not be compatible with each other.
Finally also introduce can_vma_merge_left() / can_vma_merge_right() to
check adjacent VMA compatibility and that they are indeed adjacent.
Link: https://lkml.kernel.org/r/49d37c0769b6b9dc03b27fe4d059173832556392.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The purpose of the vmg is to thread merge state through functions and
avoid egregious parameter lists. We expand this to vma_expand(), which is
used for a number of merge cases.
Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to
use a vmg.
An added purpose of this change is the ability in a future commit to
perform all new VMA range merging using vma_expand().
Link: https://lkml.kernel.org/r/4bc8c9dbc9ca52452ef8e587b28fe555854ceb38.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rather than passing around huge numbers of parameters to numerous helper
functions, abstract them into a single struct that we thread through the
operation, the vma_merge_struct ('vmg').
Adjust vma_merge() and vma_modify() to accept this parameter, as well as
predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
vma_modify_...() helper functions.
Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
easy vmg declaration.
We additionally remove the requirement that vma_merge() is passed a VMA
object representing the candidate new VMA. Previously it used this to
obtain the mm_struct, file and anon_vma properties of the proposed range
(a rather confusing state of affairs), which are now provided by the vmg
directly.
We also remove the pgoff calculation previously performed vma_modify(),
and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
helper.
Link: https://lkml.kernel.org/r/a955aad09d81329f6fbeb636b2dd10cde7b73dab.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a variety of VMA merge unit tests to assert that the behaviour of VMA
merge is correct at an abstract level and VMAs are merged or not merged as
expected.
These are intentionally added _before_ we start refactoring vma_merge() in
order that we can continually assert correctness throughout the rest of
the series.
In order to reduce churn going forward, we backport the vma_merge_struct
data type to the test code which we introduce and use in a future commit,
and add wrappers around the merge new and existing VMA cases.
Link: https://lkml.kernel.org/r/1c7a0b43cfad2c511a6b1b52f3507696478ff51a.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: remove vma_merge()", v3.
The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.
It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.
The fact that the function requires there to be a numbering system to
cover each possible eventuality with references to each in the many
branches of its implementation as to which case you are looking at speaks
to all this.
Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.
However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.
Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and myself have additionally gone to lengths to try to
improve things further.
However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while
maintaining confidence in its correct operation.
An attempt was previously made by Vlastimil [0] to eliminate vma_merge(),
however it was rather - brutal - and an astute reader might refer to the
date of that patch for insight as to its intent.
This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.
Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.
By abstracting the actual final commit of changes to a VMA to its own
function, commit_merge() and writing a wrapper around vma_expand() for new
VMA cases vma_merge_new_range(), we can avoid having to use vma_merge()
for these instances altogether.
By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.
Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.
This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the
latter 2 being either partial or full merges).
The code is heavily annotated with ASCII diagrams and greatly simplified
in comparison to the existing vma_merge() function.
Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca82
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).
This refactor means we do not need to guess, so instead explicitly only
disallow merge in instances where a VMA with a vm_ops->close() hook would
be deleted (and try a smaller merge in cases where this is possible).
In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.
There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.
The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089 ("mm, mmap: fix vma_merge()
case 7 with vma_ops->close") was tested and confirmed passing.
[0]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@suse.cz/
This patch (of 10):
Have vma.o depend on its source dependencies explicitly, as previously
these were simply being ignored as existing object files were up to date.
This now correctly re-triggers the build if mm/ source is changed as well
as local source code.
Also set clean as a phony rule.
Link: https://lkml.kernel.org/r/cover.1725040657.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/e3ea58f08364ae5432c9a074de0195a7c7e0b04a.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ensure that zswap.writeback check goes up the cgroup tree, i.e. is
hierarchical. Create a subcgroup which has zswap.writeback set to 1, and
the upper hierarchy's restrictions shall apply.
Link: https://lkml.kernel.org/r/20240823162506.12117-2-me@yhndnzj.com
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently, running the charge_reserved_hugetlb.sh selftest we can
sometimes observe something like:
$ ./charge_reserved_hugetlb.sh -cgroup-v2
...
write_result is 0
After write:
hugetlb_usage=0
reserved_usage=10485760
killing write_to_hugetlbfs
Received 2.
Deleting the memory
Detach failure: Invalid argument
umount: /mnt/huge: target is busy.
Both cases are issues in the test.
While the unmount error seems to be racy, it will make the test fail:
$ ./run_vmtests.sh -t hugetlb
...
# [FAIL]
not ok 10 charge_reserved_hugetlb.sh -cgroup-v2 # exit=32
The issue is that we are not waiting for the write_to_hugetlbfs process to
quit. So it might still have a hugetlbfs file open, about which umount is
not happy. Fix that by making "killall" wait for the process to quit.
The other error ("Detach failure: Invalid argument") does not seem to
result in a test error, but is misleading. Turns out write_to_hugetlbfs.c
unconditionally tries to cleanup using shmdt(), even when we only
mmap()'ed a hugetlb file. Even worse, shmaddr is never even set for the
SHM case. Fix that as well.
With this change it seems to work as expected.
Link: https://lkml.kernel.org/r/20240821123115.2068812-1-david@redhat.com
Fixes: 29750f71a9 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add more mseal traversal tests across VMAs, where we could possibly screw
up sealing checks. These test more across-vma traversal for mprotect,
munmap and madvise. Particularly, we test for the case where a regular
VMA is followed by a sealed VMA.
[akpm@linux-foundation.org: remove incorrect comment, per review]
[akpm@linux-foundation.org: remove the correct comment, per Pedro]
[pedro.falcato@gmail.com: fix mseal's length]
Link: https://lkml.kernel.org/r/vc4czyuemmu3kylqb4ctaga6y5yvondlyabimx6jvljlw2fkea@djawlllf45xa
Link: https://lkml.kernel.org/r/20240817-mseal-depessimize-v3-7-d8d2e037df30@gmail.com
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add shmem mTHP collpase testing. Similar to the anonymous page, users can
use the '-s' parameter to specify the shmem mTHP size for testing.
Link: https://lkml.kernel.org/r/fa44bfa20ca5b9fd6f9163a048f3d3c1e53cd0a8.1724140601.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
IA64 has gone with commit cf8e865810 ("arch: Remove Itanium (IA-64)
architecture"), so remove unnecessary ia64 special mm code and comment in
selftests too.
Link: https://lkml.kernel.org/r/20240819130609.3386195-1-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The name of cxl_setup_parent_dport() function is not clear, the function
is used to initialize AER and RAS capabilities on a dport, therefore,
rename the function to cxl_dport_init_ras_reporting(), it is easier for
user to understand what the function does. Besides, adjust the order of
the function parameters, the subject of cxl_dport_init_ras_reporting()
is a cxl dport, so a struct cxl_dport as the first parameter of the
function should be better.
cxl_dport_map_regs() is used to map CXL RAS capability on a cxl dport,
using cxl_dport_map_ras() as the function name.
Signed-off-by: Li Ming <ming4.li@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240830061308.2327065-1-ming4.li@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
'MPTCP_PM_NAME' is defined in 'linux/mptcp_pm.h', included in
'linux/mptcp.h', no need to re-define it.
'MPTCP_PM_EVENTS' is not defined in 'linux/mptcp.h', but
'MPTCP_PM_EV_GRP_NAME' is, with the same value. We can then use the
latter, and drop the other one.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-11-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The four checksum tests are similar, only one line is different. So
a for-loop can be used to simplify these tests.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-10-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test is supposed to be killed before the end, which will likely
cause "Connection reset by peer" errors. It is confusing, especially
because in case of real transfer errors, the test will not be marked as
failed. But that's OK, there are many other tests checking that.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-9-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of displaying 'invert' when looking at some events like MP_FAIL,
MP_FASTCLOSE, MP_RESET, RM_ADDR, which is a bit vague because they are
not traditionnaly sent from one side, the host being checked is now
printed.
For the ADD_ADDR, only display the host when it is the client sending
it, which is more unusual.
Also before, the 'invert' message was printed after a few checks, but it
was not clear which ones exactly.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-8-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before, the check names had to be very short. It is no longer the case
now that these checks are printed on a dedicated line.
Then, it looks better to have more explicit names.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-7-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A few new MPJoinSynTx MIB counters have been added in a previous commit.
They are being validated here in mptcp_join.sh selftest, each time the
number of received MPJ are checked.
Most of the time, the number of sent SYN+MPJ is the same as the received
ones. But sometimes, there are more, because there are dropped, or there
are errors.
While at it, the "no MPC reuse with single endpoint" subtest has been
modified to force a bind() error.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-6-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Most tests are checking if the expected number of SYN/SYN+ACK/ACK JOINs
have been received, each of them on one line.
More Join related tests are going to be checked soon, no need to add 5
new lines per test in case of success, just one is enough. In case of
issue, the errors will still be reported like before.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-5-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
chk_join_nr() currently takes 9 positional parameters, 6 of them are
optional. It makes it hard to read:
chk_join_nr 1 1 1 1 0 1 1 0 4
Naming these vars helps to make it easier to read:
join_csum_ns1=1 join_csum_ns2=0 \
join_fail_nr=1 join_rst_nr=1 join_infi_nr=0 \
join_corrupted_pkts=4 \
chk_join_nr 1 1 1
It will then be easier to add new optional parameters.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-4-d3e0f3773b90@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use dev_is_platform() instead of checking bus type directly.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240827095123.168696-1-kunwu.chan@linux.dev
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Add return value checks for read & write calls in test_listmount_ns
function. This patch resolves below compilation warnings:
```
statmount_test_ns.c: In function ‘test_listmount_ns’:
statmount_test_ns.c:322:17: warning: ignoring return value of ‘write’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
statmount_test_ns.c:323:17: warning: ignoring return value of ‘read’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
```
Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The sctp selftest is very slow on debug kernels.
Its possible that the nf_queue listener program exits due to timeout
before first sctp packet is processed.
In this case socat hangs until script times out.
Fix this by removing the -t option where possible and kill the test
program once the file transfer/socat has exited.
-t sets SO_RCVTIMEO, its inteded for the 'ping' part of the selftest
where we want to make sure that packets get reinjected properly without
skipping a second queue request.
While at it, add a helper to compare the (binary) files instead of diff.
The 'diff' part was copied from a another sub-test that compares text.
Let helper dump file sizes on error so we can see the progress made.
Tested on an old 2010-ish box with a debug kernel and 100 iterations.
This is a followup to the earlier filesize reduction change.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240829080109.GB30766@breakpoint.cc/
Fixes: 0a8b08c554 ("selftests: netfilter: nft_queue.sh: reduce test file size for debug build")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20240830092254.8029-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Change the file permissions of tools/testing/fault-injection/failcmd.sh to
allow execution. This ensures the script can be run directly without
explicitly invoking a shell.
Link: https://lkml.kernel.org/r/20240729085215.3403417-1-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The failcmd.sh script in the fault-injection toolkit does not currently
validate whether the provided address is in hexadecimal format. This can
lead to silent failures if the address is sourced from places like
`/proc/kallsyms`, which omits the '0x' prefix, potentially causing users
to operate under incorrect assumptions.
Introduce a new function, `exit_if_not_hex`, which checks the format of
the provided address and exits with an error message if the address is not
a valid hexadecimal number.
This enhancement prevents users from running the command with improperly
formatted addresses, thus improving the robustness and usability of the
failcmd tool.
Link: https://lkml.kernel.org/r/20240729084512.3349928-1-leitao@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Separate call to mas_destroy() from mas_nomem() so we can check for no
memory errors without destroying the current maple state in
mas_store_gfp(). We then add calls to mas_destroy() to callers of
mas_nomem().
Link: https://lkml.kernel.org/r/20240814161944.55347-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Introduce mas_wr_store_type() which will set the correct store type based
on a walk of the tree. In mas_wr_node_store() the <= min_slots condition
is changed to < as if new_end is = to mt_min_slots then there is not
enough room.
mas_prealloc_calc() is also introduced to abstract the calculation used to
determine the number of nodes needed for a store operation.
In this change a call to mas_reset() is removed in the error case of
mas_prealloc(). This is only needed in the MA_STATE_REBALANCE case of
mas_destroy(). We can move the call to mas_reset() directly to
mas_destroy().
Also, add a test case to validate the order that we check the store type
in is correct. This test models a vma expanding and then shrinking which
is part of the boot process.
Link: https://lkml.kernel.org/r/20240814161944.55347-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add new callback fields to the userspace implementation of struct
kmem_cache. This allows for executing callback functions in order to
further test low memory scenarios where node allocation is retried.
This callback can help test race conditions by calling a function when a
low memory event is tested.
Link: https://lkml.kernel.org/r/20240812190543.71967-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make accept_memory() and range_contains_unaccepted_memory() take 'start'
and 'size' arguments instead of 'start' and 'end'.
Remove accept_page(), replacing it with direct calls to accept_memory().
The accept_page() name is going to be used for a different function.
Link: https://lkml.kernel.org/r/20240809114854.3745464-6-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
the syscall remap accepts following:
mremap(src, size, size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, dst)
when the src is sealed, the call will fail with error code:
EPERM
Previously, the test uses hard-coded 0xdeaddead as dst, and it
will fail on the system with newer glibc installed.
This patch removes test's dependency on glibc for mremap(), also
fix the test and remove the hardcoded address.
Link: https://lkml.kernel.org/r/20240807212320.2831848-1-jeffxu@chromium.org
Fixes: 4926c7a52d ("selftest mm/mseal memory sealing")
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reported-by: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add an mseal test for madvise() operations that aren't considered
"discard" (e.g purely advisory ops such as MADV_RANDOM).
[pedro.falcato@gmail.com: adjust the mseal test's plan]
Link: https://lkml.kernel.org/r/20240807203724.2686144-1-pedro.falcato@gmail.com
Link: https://lkml.kernel.org/r/20240807173336.2523757-3-pedro.falcato@gmail.com
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Tested-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Establish a new userland VMA unit testing implementation under
tools/testing which utilises existing logic providing maple tree support
in userland utilising the now-shared code previously exclusive to radix
tree testing.
This provides fundamental VMA operations whose API is defined in mm/vma.h,
while stubbing out superfluous functionality.
This exists as a proof-of-concept, with the test implementation functional
and sufficient to allow userland compilation of vma.c, but containing only
cursory tests to demonstrate basic functionality.
Link: https://lkml.kernel.org/r/533ffa2eec771cbe6b387dd049a7f128a53eb616.1722251717.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: SeongJae Park <sj@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The core components contained within the radix-tree tests which provide
shims for kernel headers and access to the maple tree are useful for
testing other things, so separate them out and make the radix tree tests
dependent on the shared components.
This lays the groundwork for us to add VMA tests of the newly introduced
vma.c file.
Link: https://lkml.kernel.org/r/1ee720c265808168e0d75608e687607d77c36719.1722251717.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rae Moar <rmoar@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Extend two existing tests to cover extracting memory usage through the
newly mutable memory.peak and memory.swap.peak handlers.
In particular, make sure to exercise adding and removing watchers with
overlapping lifetimes so the less-trivial logic gets tested.
The new/updated tests attempt to detect a lack of the write handler by
fstat'ing the memory.peak and memory.swap.peak files and skip the tests if
that's the case. Additionally, skip if the file doesn't exist at all.
[davidf@vimeo.com: update tests]
Link: https://lkml.kernel.org/r/20240730231304.761942-3-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-3-davidf@vimeo.com
Signed-off-by: David Finkel <davidf@vimeo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The __NR_mmap isn't found on armhf. The mmap() is commonly available
system call and its wrapper is present on all architectures. So it should
be used directly. It solves problem for armhf and doesn't create problem
for other architectures.
Remove sys_mmap() functions as they aren't doing anything else other than
calling mmap(). There is no need to set errno = 0 manually as glibc
always resets it.
For reference errors are as following:
CC seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
39 | sret = (void *) syscall(__NR_mmap, addr, len, prot,
| ^~~~~~~~~
mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
90 | sret = (void *) syscall(__NR_mmap, addr, len, prot,
| ^~~~~~~~~
Link: https://lkml.kernel.org/r/20240809082511.497266-1-usama.anjum@collabora.com
Fixes: 4926c7a52d ("selftest mm/mseal memory sealing")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is only hotplug test for cpuset v1, just add base read/write test
for cpuset v1.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Since isolated CPUs can be reserved at boot time via the "isolcpus"
boot command line option, these pre-isolated CPUs may interfere with
testing done by test_cpuset_prs.sh.
With the previous commit that incorporates those boot time isolated CPUs
into "cpuset.cpus.isolated", we can check for those before testing is
started to make sure that there will be no interference. Otherwise,
this test will be skipped if incorrect test failure can happen.
As "cpuset.cpus.isolated" is now available in a non cgroup_debug kernel,
we don't need to check for its existence anymore.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
%.bpf.o objects depend on vmlinux.h, which makes them transitively
dependent on unnecessary libbpf headers. However vmlinux.h doesn't
actually change as often.
When generating vmlinux.h, compare it to a previous version and update
it only if there are changes.
Example of build time improvement (after first clean build):
$ touch ../../../lib/bpf/bpf.h
$ time make -j8
Before: real 1m37.592s
After: real 0m27.310s
Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed.
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20240828174608.377204-2-ihor.solodrai@pm.me
Including:
- Fix a device-stall problem in bad io-page-fault setups (faults
received from devices with no supporting domain attached).
- Context flush fix for Intel VT-d.
- Do not allow non-read+non-write mapping through iommufd as most
implementations can not handle that.
- Fix a possible infinite-loop issue in map_pages() path.
- Add Jean-Philippe as reviewer for SMMUv3 SVA support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmbRvfEACgkQK/BELZcB
GuOB8w//WLapQpxMw9w+4l3Z3SqxB5gSPF6pdCJwYRrpFGBX1yNZ0vWtF2TpKtOC
NaMa/EC1C2FWjcArCP21uFtDvN04FgXSVl6sjFUHsUf+YALrUfljQk/XFI4SenTq
PtvPv8PVGbhqLtdJDXMlQWBN3RX0qK/PIFmuUX5ySBk7J7k5QyBi2HEuK2DbPM7j
+LMnyTHj5Aa2jRz/NSCDIRKbSFJKgvd8apval2VX0zljjpyqk5KmHHjkLtiOiTTI
G6ZJlRYCn98eTLU2ww8b7/y0vVYop7C1Q7Cyds/72xvW+a3jbSRIGf6yqtmdbMYd
faxRng5rWHWsq3XMZC+Ts9k2FA3pUIvOmfptCFfrQYYXvZI6dD6o7uMko6SF82n4
xEy+H6AEWZXF70xaJDp1cn1PpURJgJly/l/6qAIB746qNT7j/CcOOha1bpbCy81x
EIOl0B4wyJGjQnxjKsH01K9ec3uT6rugbpFEE9PL8l25khhyweBwuQWc2EVxRZgH
ICH4pCmvU9Wy6mpXL2R/SyzECWjgg0oJr+pq3Yxv7xufSGQswWJ/StFozSBHnH01
OGGA/2xMrNeRzlm4PZfRzdAiCfYX9kEodiF1jGLA4B1V5Tx/y1LSX7W/nCeZmlRz
/OhEC07DWZumeSCTe5I+BmZwiXh/DEAlUypDQkVKaaeGltlyvl8=
=8XuD
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Fix a device-stall problem in bad io-page-fault setups (faults
received from devices with no supporting domain attached).
- Context flush fix for Intel VT-d.
- Do not allow non-read+non-write mapping through iommufd as most
implementations can not handle that.
- Fix a possible infinite-loop issue in map_pages() path.
- Add Jean-Philippe as reviewer for SMMUv3 SVA support
* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
iommu: Do not return 0 from map_pages if it doesn't do anything
iommufd: Do not allow creating areas without READ or WRITE
iommu/vt-d: Fix incorrect domain ID in context flush helper
iommu: Handle iommu faults for a bad iopf setup
Create a BTF with endianness different from host, make a distilled
base/split BTF pair from it, dump as raw bytes, import again and
verify that endianness is preserved.
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240830173406.1581007-1-eddyz87@gmail.com
When building with clang, there's this warning:
vdso_test_getrandom.c:145:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
145 | static void *test_vdso_getrandom(void *)
| ^
vdso_test_getrandom.c:155:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
155 | static void *test_libc_getrandom(void *)
| ^
vdso_test_getrandom.c:165:43: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
165 | static void *test_syscall_getrandom(void *)
Add the named ctx parameter to quash it.
Reported-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
A "%s" is missing in ksft_exit_fail_msg(); instead, use the newly
introduced ksft_exit_fail_perror().
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240830052911.4040970-1-dev.jain@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
vdso_test_correctness test fails on powerpc:
~ # ./vdso_test_correctness
...
[RUN] Testing clock_gettime for clock CLOCK_REALTIME_ALARM (8)...
[FAIL] No such clock, but __vdso_clock_gettime returned 22
[RUN] Testing clock_gettime for clock CLOCK_BOOTTIME_ALARM (9)...
[FAIL] No such clock, but __vdso_clock_gettime returned 22
[RUN] Testing clock_gettime for clock CLOCK_SGI_CYCLE (10)...
[FAIL] No such clock, but __vdso_clock_gettime returned 22
...
[RUN] Testing clock_gettime for clock invalid (-1)...
[FAIL] No such clock, but __vdso_clock_gettime returned 22
[RUN] Testing clock_gettime for clock invalid (-2147483648)...
[FAIL] No such clock, but __vdso_clock_gettime returned 22
[RUN] Testing clock_gettime for clock invalid (2147483647)...
[FAIL] No such clock, but __vdso_clock_gettime returned 22
On powerpc, a call to a VDSO function is not an ordinary C function
call. Unlike several architectures which returns a negative error code
in case of an error, powerpc sets CR[SO] and returns the error code
as a positive value.
Define and use a macro called VDSO_CALL() which takes a pointer
to the function to call, the number of arguments and the arguments.
Also update ABI vdso documentation to reflect this subtlety.
Provide a specific version of VDSO_CALL() for powerpc that negates
the error code on return when CR[SO] is set.
Fixes: c7e5789b24 ("kselftest: Move test_vdso to the vDSO test suite")
Fixes: 2e9a972566 ("selftests: vdso: Add a selftest for vDSO getcpu()")
Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Fixes: b2f1c3db28 ("kselftest: Extend vdso correctness test to clock_gettime64")
Fixes: 4920a2590e ("selftests/vDSO: add tests for vgetrandom")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Running vdso_test_correctness on powerpc64 gives the following warning:
~ # ./vdso_test_correctness
Warning: failed to find clock_gettime64 in vDSO
This is because vdso_test_correctness was built with VDSO_32BIT defined.
__powerpc__ macro is defined on both powerpc32 and powerpc64 so
__powerpc64__ needs to be checked first in vdso_config.h
Fixes: 693f5ca08c ("kselftest: Extend vDSO selftest")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Following error occurs when running vdso_test_correctness on powerpc:
~ # ./vdso_test_correctness
[WARN] failed to find vDSO
[SKIP] No vDSO, so skipping clock_gettime() tests
[SKIP] No vDSO, so skipping clock_gettime64() tests
[RUN] Testing getcpu...
[OK] CPU 0: syscall: cpu 0, node 0
On powerpc, vDSO is neither called linux-vdso.so.1 nor linux-gate.so.1
but linux-vdso32.so.1 or linux-vdso64.so.1.
Also search those two names before giving up.
Fixes: c7e5789b24 ("kselftest: Move test_vdso to the vDSO test suite")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
If the getrandom test compiles for an arch, don't exit fatally if the
actual cpu it's running on is unsupported.
Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Building test_vdso_getrandom currently leads to following issue:
In file included from /home/xry111/git-repos/linux/tools/include/linux/compiler_types.h:36,
from /home/xry111/git-repos/linux/include/uapi/linux/stddef.h:5,
from /home/xry111/git-repos/linux/include/uapi/linux/posix_types.h:5,
from /usr/include/asm/sigcontext.h:12,
from /usr/include/bits/sigcontext.h:30,
from /usr/include/signal.h:301,
from vdso_test_getrandom.c:14:
/home/xry111/git-repos/linux/tools/include/linux/compiler-gcc.h:3:2: error: #error "Please don't include <linux/compiler-gcc.h> directly, include <linux/compiler.h> instead."
3 | #error "Please don't include <linux/compiler-gcc.h> directly, include <linux/compiler.h> instead."
| ^~~~~
It's because the compiler_types.h inclusion in
include/uapi/linux/stddef.h is expected to be removed by the
header_install.sh script, as compiler_types.h shouldn't be used from
user space.
Add KHDR_INCLUDES before the existing include/uapi inclusion so that
usr/include takes precedence if it's populated.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
CONFIG_FUNCTION_ALIGNMENT=0 is no longer necessary and BULID_VDSO wasn't
spelled right while BUILD_VDSO isn't necessary, so just remove these.
Reported-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
On systems that set -Wl,--as-needed, putting the -lsodium in the wrong
place on the command line means we get a linker error:
CC vdso_test_chacha
/usr/bin/ld: /tmp/ccKpjnSM.o: in function `main':
vdso_test_chacha.c:(.text+0x276): undefined reference to `crypto_stream_chacha20'
collect2: error: ld returned 1 exit status
Fix this by passing pkg-config's --libs output to the LDFLAGS field
instead of the CFLAGS field, as is customary.
Reported-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
When libsodium is installed into its own prefix, the --cflags output is
needed for the compiler to find libsodium headers.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Don't hard-code x86 specific names. Rather, use vdso_config definitions
to find the correct function matching the architecture.
Add random VDSO function names in names[][]. Remove the #ifdef
CONFIG_VDSO32, as having the name there all the time is harmless and
guaranties a steady index for following strings.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[Jason: add [6] to variable declaration rather than each usage site.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Building test_vdso_chacha currently leads to following issue:
In file included from /home/chleroy/linux-powerpc/include/linux/limits.h:7,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/local_lim.h:38,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/posix1_lim.h:161,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/limits.h:195,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/limits.h:203,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/syslimits.h:7,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/limits.h:34,
from /tmp/sodium/usr/local/include/sodium/export.h:7,
from /tmp/sodium/usr/local/include/sodium/crypto_stream_chacha20.h:14,
from vdso_test_chacha.c:6:
/opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:99:6: error: missing binary operator before token "("
99 | # if INT_MAX == 32767
| ^~~~~~~
/opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:102:7: error: missing binary operator before token "("
102 | # if INT_MAX == 2147483647
| ^~~~~~~
/opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:126:6: error: missing binary operator before token "("
126 | # if LONG_MAX == 2147483647
| ^~~~~~~~
This is due to kernel include/linux/limits.h being included instead of
libc's limits.h.
This is because directory include/ is added through option -isystem so
it goes prior to glibc's include directory.
Replace -isystem by -idirafter.
But this implies that now tools/include/linux/linkage.h is included
instead of include/linux/linkage.h, so define a stub for
SYM_FUNC_START() and SYM_FUNC_END().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Architectures use different location for vDSO sources:
arch/mips/vdso
arch/sparc/vdso
arch/arm64/kernel/vdso
arch/riscv/kernel/vdso
arch/csky/kernel/vdso
arch/x86/um/vdso
arch/x86/entry/vdso
arch/powerpc/kernel/vdso
arch/arm/vdso
arch/loongarch/vdso
Don't hard-code vdso sources location in selftest Makefile. Instead
create a vdso/ symbolic link in tools/arch/$arch/ and update Makefile
accordingly.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Rather than using pthread_get/set_specific, just use gcc's __thread
annotation, which is noticeably faster and makes the code more obvious.
Also, just have one simplified struct called vgrnd, instead of trying to
split things up semantically. Those divisions were useful when this code
was split across several commit *messages*, but doesn't make as much
sense within a single file. This should make the code more clear and
provide a better example for implementers.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The test case for SME vector length changes via sigreturn use a bit too
much cut'n'paste and only actually changed the SVE vector length in the
test itself. Andre's recent factoring out of the initialisation code caused
this to be exposed and the test to start failing. Fix the test to actually
cover the thing it's supposed to test.
Fixes: 4963aeb35a ("kselftest/arm64: signal: Add SME signal handling tests")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20240829-arm64-sme-signal-vl-change-test-v1-1-42d7534cb818@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>