mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-12-02 16:44:10 +08:00
17553ba8e1
3404 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
c150b809f7 |
RISC-V Patches for the 6.9 Merge Window
* Support for various vector-accelerated crypto routines. * Hibernation is now enabled for portable kernel builds. * mmap_rnd_bits_max is larger on systems with larger VAs. * Support for fast GUP. * Support for membarrier-based instruction cache synchronization. * Support for the Andes hart-level interrupt controller and PMU. * Some cleanups around unaligned access speed probing and Kconfig settings. * Support for ACPI LPI and CPPC. * Various cleanus related to barriers. * A handful of fixes. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmX9icgTHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYib+UD/4xyL6UMixx6A06BVBL9UT4vOrxRvNr JIihG5y5QNMjes9DHWL35mZTMqFtQ0tq94ViWFLmJWloV/8KRVM2C9R9KX7vplf3 M/OwvP106spxgvNHoeQbycgs42RU1t2mpqT7N1iK2hCjqieP3vLn6hsSLXWTAG0L 3gQbQw6XCLC3hPyLq+nbFY2i4faeCmpXWmixoy/IvQ5calZQrRU0LNlP6lcMBhVo uocjG0uGAhrahw2s81jxcMZcxa3AvUCiplapdD5H5v9rBM85SkYJj2Q9SqdSorkb xzuimRnKPI5s47yM3pTfZY0qnQUYHV7PXXuw4WujpCQVQdhaG+Ggq63UUZA61J9t IzZK2zdcfHqICrGTtXImUzRT3dcc3oq+IFq4tTY+rEJm29hrXkAtx+qBm5xtMvax fJz5feJ/iT0u7MDj4Oq24n+Kpl+Olm+MJaZX3m5Ovi/9V6a9iK9HXqxg9/Fs0fMO +J/0kTgd8Vu9CYH7KNWz3uztcO9eMAH3VyzuXuab4BGj1i1Y/9EjpALQi7rDN73S OsYQX6NnzMkBV4dvElJVLXiPlvNlMHZZwdak5CqPb48jaJu6iiIZAuvOrG6/naGP wnQSLVA2WWWoOkl3AJhxfpa11CLhbMl9E2gYm1VtNvASXoSFIxlAq1Yv3sG8yjty 4ZT0rYFJOstYiQ== =3dL5 -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various vector-accelerated crypto routines - Hibernation is now enabled for portable kernel builds - mmap_rnd_bits_max is larger on systems with larger VAs - Support for fast GUP - Support for membarrier-based instruction cache synchronization - Support for the Andes hart-level interrupt controller and PMU - Some cleanups around unaligned access speed probing and Kconfig settings - Support for ACPI LPI and CPPC - Various cleanus related to barriers - A handful of fixes * tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits) riscv: Fix syscall wrapper for >word-size arguments crypto: riscv - add vector crypto accelerated AES-CBC-CTS crypto: riscv - parallelize AES-CBC decryption riscv: Only flush the mm icache when setting an exec pte riscv: Use kcalloc() instead of kzalloc() riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro ... |
||
Linus Torvalds
|
1d35aae78f |
Kbuild updates for v6.9
- Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmX8HGIVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGYfIQAIl/zEFoNVSHGR4TIvO7SIwkT4MM VAm0W6XRFaXfIGw8HL/MXe+U9jAyeQ9yL9uUVv8PqFTO+LzBbW1X1X97tlmrlQsC 7mdxbA1KJXwkwt4wH/8/EZQMwHr327vtVH4AilSm+gAaWMXaSKAye3ulKQQ2gevz vP6aOcfbHIWOPdxA53cLdSl9LOGrYNczKySHXKV9O39T81F+ko7wPpdkiMWw5LWG ISRCV8bdXli8j10Pmg8jlbevSKl4Z5FG2BVw/Cl8rQ5tBBoCzFsUPnnp9A29G8QP OqRhbwxtkSm67BMJAYdHnhjp/l0AOEbmetTGpna+R06hirOuXhR3vc6YXZxhQjff LmKaqfG5YchRALS1fNDsRUNIkQxVJade+tOUG+V4WbxHQKWX7Ghu5EDlt2/x7P0p +XLPE48HoNQLQOJ+pgIOkaEDl7WLfGhoEtEgprZBuEP2h39xcdbYJyF10ZAAR4UZ FF6J9lDHbf7v1uqD2YnAQJQ6jJ06CvN6/s6SdiJnCWSs5cYRW0fnYigSIuwAgGHZ c/QFECoGEflXGGuqZDl5iXiIjhWKzH2nADSVEs7maP47vapcMWb9gA7VBNoOr5M0 IXuFo1khChF4V2pxqlDj3H5TkDlFENYT/Wjh+vvjx8XplKCRKaSh+LaZ39hja61V dWH7BPecS44h4KXx =tFdl -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig * tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits) kconfig: tests: test dependency after shuffling choices kconfig: tests: add a test for randconfig with dependent choices kconfig: tests: support KCONFIG_SEED for the randconfig runner kbuild: rpm-pkg: add dtb files in kernel rpm kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig() kconfig: check prompt for choice while parsing kconfig: lxdialog: remove unused dialog colors kconfig: lxdialog: fix button color for blackbg theme modpost: fix null pointer dereference kbuild: remove GCC's default -Wpacked-bitfield-compat flag kbuild: unexport abs_srctree and abs_objtree kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 kconfig: remove named choice support kconfig: use linked list in get_symbol_str() to iterate over menus kconfig: link menus to a symbol kbuild: fix inconsistent indentation in top Makefile kbuild: Use -fmin-function-alignment when available alpha: merge two entries for CONFIG_ALPHA_GAMMA alpha: merge two entries for CONFIG_ALPHA_EV4 kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj) ... |
||
Sami Tolvanen
|
a9ad73295c
|
riscv: Fix syscall wrapper for >word-size arguments
The current syscall wrapper macros break 64-bit arguments on
rv32 because they only guarantee the first N input registers are
passed to syscalls that accept N arguments. According to the
calling convention, values twice the word size reside in register
pairs and as a result, syscall arguments don't always have a
direct register mapping on rv32.
Instead of using `__MAP(x,__SC_LONG,__VA_ARGS__)` to declare the
type of the `__se(_compat)_sys_*` functions on rv32, change the
function declarations to accept `ulong` arguments and alias them
to the actual syscall implementations, similarly to the existing
macros in include/linux/syscalls.h. This matches previous
behavior and ensures registers are passed to syscalls as-is, no
matter which argument types they expect.
Fixes:
|
||
Palmer Dabbelt
|
cd6c916ccf
|
Merge patch series "riscv/barrier: tidying up barrier-related macro"
Eric Chan <ericchancf@google.com> says: This series makes barrier-related macro more neat and clear. This is a follow-up to [0-3], change to multiple patches, for readability, create new message thread. [0](v1/v2) https://lore.kernel.org/lkml/20240209125048.4078639-1-ericchancf@google.com/ [1] (v3) https://lore.kernel.org/lkml/20240213142856.2416073-1-ericchancf@google.com/ [2] (v4) https://lore.kernel.org/lkml/20240213200923.2547570-1-ericchancf@google.com/ [4] (v5) https://lore.kernel.org/lkml/20240213223810.2595804-1-ericchancf@google.com/ * b4-shazam-merge: riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} Link: https://lore.kernel.org/r/20240217131206.3667544-1-ericchancf@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Eric Biggers
|
c70dfa4a27
|
crypto: riscv - add vector crypto accelerated AES-CBC-CTS
Add an implementation of cts(cbc(aes)) accelerated using the Zvkned RISC-V vector crypto extension. This is mainly useful for fscrypt, where cts(cbc(aes)) is the "default" filenames encryption algorithm. In that use case, typically most messages are short and are block-aligned. The CBC-CTS variant implemented is CS3; this is the variant Linux uses. To perform well on short messages, the new implementation processes the full message in one call to the assembly function if the data is contiguous. Otherwise it falls back to CBC operations followed by CTS at the end. For decryption, to further improve performance on short messages, especially block-aligned messages, the CBC-CTS assembly function parallelizes the AES decryption of all full blocks. This improves on the arm64 implementation of cts(cbc(aes)), which always splits the CBC part(s) from the CTS part, doing the AES decryptions for the last two blocks serially and usually loading the round keys twice. Tested in QEMU with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240213055442.35954-1-ebiggers@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Eric Biggers
|
da215b089b
|
crypto: riscv - parallelize AES-CBC decryption
Since CBC decryption is parallelizable, make the RISC-V implementation of AES-CBC decryption process multiple blocks at a time, instead of processing the blocks one by one. This should improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240208060851.154129-1-ebiggers@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
028d1aee1f
|
Merge patch series "RISC-V: ACPI: Enable CPPC based cpufreq support"
Sunil V L <sunilvl@ventanamicro.com> says: This series enables the support for "Collaborative Processor Performance Control (CPPC) on ACPI based RISC-V platforms. It depends on the encoding of CPPC registers as defined in RISC-V FFH spec [2]. CPPC is described in the ACPI spec [1]. RISC-V FFH spec required to enable this, is available at [2]. [1] - https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#collaborative-processor-performance-control [2] - https://github.com/riscv-non-isa/riscv-acpi-ffh/releases/download/v1.0.0/riscv-ffh.pdf * b4-shazam-merge: RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver Link: https://lore.kernel.org/r/20240208034414.22579-1-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Alexandre Ghiti
|
01261e24cf
|
riscv: Only flush the mm icache when setting an exec pte
We used to emit a flush_icache_all() whenever a dirty executable mapping is set in the page table but we can instead call flush_icache_mm() which will only send IPIs to cores that currently run this mm and add a deferred icache flush to the others. The number of calls to sbi_remote_fence_i() (tested without IPI support): With a simple buildroot rootfs: * Before: ~5k * After : 4 (!) Tested on HW, the boot to login is ~4.5% faster. With an ubuntu rootfs: * Before: ~24k * After : ~13k Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240202124711.256146-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Erick Archer
|
28e4748e5e
|
riscv: Use kcalloc() instead of kzalloc()
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the purpose specific kcalloc() function instead of the argument count * size in the kzalloc() function. Also, it is preferred to use sizeof(*pointer) instead of sizeof(type) due to the type of the variable can change and one needs not change the former (unlike the latter). Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Erick Archer <erick.archer@gmx.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240120135400.4710-1-erick.archer@gmx.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
85ab6fdf37
|
Merge patch series "RISC-V: ACPI: Add LPI support"
Sunil V L <sunilvl@ventanamicro.com> says: This series adds support for Low Power Idle (LPI) on ACPI based platforms. LPI is described in the ACPI spec [1]. RISC-V FFH spec required to enable this is available at [2]. [1] - https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html#lpi-low-power-idle-states [2] - https://github.com/riscv-non-isa/riscv-acpi-ffh/releases/download/v/riscv-ffh.pdf * b4-shazam-merge: ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv Link: https://lore.kernel.org/r/20240118062930.245937-1-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
728e7ea2b5
|
Merge patch series "riscv: Introduce compat-mode helpers & improve arch_get_mmap_end()"
Leonardo Bras <leobras@redhat.com> says: I just saw the opportunity of optimizing the helper is_compat_task() by introducing a compile-time test, and it made possible to remove some #ifdef's without any loss of performance. I also saw the possibility of removing the direct check of task flags from general code, and concentrated it in asm/compat.h by creating a few more helpers, which in the end helped optimize code. arch_get_mmap_end() just got a simple improvement and some extra docs. * b4-shazam-merge: riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro Link: https://lore.kernel.org/r/20240103160024.70305-2-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Eric Chan
|
9133e6e690
|
riscv/barrier: Add missing space after ','
The past form of RISCV_FENCE would cause checkpatch.pl to issue error messages, the example is as follows: ERROR: space required after that ',' (ctx:VxV) 26: FILE: arch/riscv/include/asm/barrier.h:27: +#define __smp_mb() RISCV_FENCE(rw,rw) ^ fix the remaining of RISCV_FENCE. Signed-off-by: Eric Chan <ericchancf@google.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240217131328.3669364-1-ericchancf@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Eric Chan
|
c85688e2b0
|
riscv/barrier: Consolidate fence definitions
Disparate fence implementations are consolidated into fence.h. Also introduce RISCV_FENCE_ASM to make fence macro more reusable. Signed-off-by: Eric Chan <ericchancf@google.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240217131316.3668927-1-ericchancf@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Eric Chan
|
b3c8064ccc
|
riscv/barrier: Define RISCV_FULL_BARRIER
Introduce RISCV_FULL_BARRIER and use in arch_atomic* function. like RISCV_ACQUIRE_BARRIER and RISCV_RELEASE_BARRIER, the fence instruction can be eliminated When SMP is not enabled. Signed-off-by: Eric Chan <ericchancf@google.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240217131302.3668481-1-ericchancf@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Eric Chan
|
89f4fd7b1a
|
riscv/barrier: Define __{mb,rmb,wmb}
Introduce __{mb,rmb,wmb}, and rely on the generic definitions for {mb,rmb,wmb}. Although KCSAN is not supported yet, the definitions can be made more consistent with generic instrumentation. Also add a space to make the changes pass check by checkpatch.pl. Without the space, the error message is as below: ERROR: space required after that ',' (ctx:VxV) 26: FILE: arch/riscv/include/asm/barrier.h:23: +#define __mb() RISCV_FENCE(iorw,iorw) ^ Signed-off-by: Eric Chan <ericchancf@google.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240217131249.3668103-1-ericchancf@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Sunil V L
|
282b9df4e9
|
RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
CONFIG_ACPI_CPPC_CPUFREQ is required to enable CPPC for RISC-V. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20240208034414.22579-4-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Sunil V L
|
6649182a38
|
cpuidle: RISC-V: Move few functions to arch/riscv
To support ACPI Low Power Idle (LPI), few functions are required which are currently static functions in the DT based cpuidle driver. Hence, move them under arch/riscv so that ACPI driver also can use them. Since they are no longer static functions, append "riscv_" prefix to the function name. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20240118062930.245937-2-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Leonardo Bras
|
2a8986fc5e
|
riscv: Introduce set_compat_task() in asm/compat.h
In order to have all task compat bit access directly in compat.h, introduce set_compat_task() to set/reset those when needed. Also, since it's only used on an if/else scenario, simplify the macro using it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20240103160024.70305-7-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Leonardo Bras
|
5917ea17ad
|
riscv: Introduce is_compat_thread() into compat.h
task_user_regset_view() makes use of a function very similar to is_compat_task(), but pointing to a any thread. In arm64 asm/compat.h there is a function very similar to that: is_compat_thread(struct thread_info *thread) Copy this function to riscv asm/compat.h and make use of it into task_user_regset_view(). Also, introduce a compile-time test for CONFIG_COMPAT and simplify the function code by removing the #ifdef. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240103160024.70305-6-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Leonardo Bras
|
4c0b5a4516
|
riscv: add compile-time test into is_compat_task()
Currently several places will test for CONFIG_COMPAT before testing is_compat_task(), probably in order to avoid a run-time test into the task structure. Since is_compat_task() is an inlined function, it would be helpful to add a compile-time test of CONFIG_COMPAT, making sure it always returns zero when the option is not enabled during the kernel build. With this, the compiler is able to understand in build-time that is_compat_task() will always return 0, and optimize-out some of the extra code introduced by the option. This will also allow removing a lot #ifdefs that were introduced, and make the code more clean. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240103160024.70305-5-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Leonardo Bras
|
9dc3041924
|
riscv: Replace direct thread flag check with is_compat_task()
There is some code that detects compat mode into a task by checking the flag directly, and other code that check using the helper is_compat_task(). Since the helper already exists, use it instead of checking the flags directly. Signed-off-by: Leonardo Bras <leobras@redhat.com> Link: https://lore.kernel.org/r/20240103160024.70305-4-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Leonardo Bras
|
6be7ee4beb
|
riscv: Improve arch_get_mmap_end() macro
This macro caused me some confusion, which took some reviewer's time to make it clear, so I propose adding a short comment in code to avoid confusion in the future. Also, added some improvements to the macro, such as removing the assumption of VA_USER_SV57 being the largest address space. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20240103160024.70305-3-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Linus Torvalds
|
78c3925c04 |
ARM: late SoC changes for 6.9
These are changes that for some reason ended up not making it into the first four branches but that should still make it into 6.9: - A rework of the omap clock support that touches both drivers and device tree files - The reset controller branch changes that had a dependency on late bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the drivers branch - The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree changes that got delayed and needed some extra time in linux-next for wider testing. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmX5vYcACgkQYKtH/8kJ UiemkhAAu2lYNpttx+qVlEzQvPKyID5Y+E0cVRmM5e79/fOumNomSzFwtKztCbz2 PV1CHwmDYANKsI8tl91PAe8PzD+9Er+8xa6YYVSMG5bLC2aGdF4k5hzMnRmfhlDe uRT/9iNH0w+S1p44+wXI9Y++uZhxJtCqa6kytxybl6YrG2/l3Wm0PVcMAD/MWT1l OULRg5gv3+7qHLKE0ffd0J7I7zCvKA5cEqnieGSO8+k1jsOE3BvgLttfPUuUsi3x 8yWAJ2cEv293Cao8x8rw39TYIHQOznLMNzK/GCIemL4k9TafbGbuVPUGQZ6oX1SQ +/biiUV8CMLzanw2Ds7piQ/4J8EoJjh7jCf9pETORlHLaCMQaYUk4I2KnBWmjxuO QBy6Py68EkyT1zv7YFkpdxeABkwkrObMmVsjfyltd2lCF6oC+xbIw5IOVPgnUiTc WANL3y+hS5zv+ABmpkRhDPe9KrcoO95sJgGaoMPatwD1/2JkdV7EkvbXWdnipb1w REYk4xuRlJcAgyjc5nrQXR8FuPX63c08NFkOw+AInFV8ipyH+8nkesb0w54aegsR Tihhl0WUxk/e9FLFVlPiYRNdyqOb2HKteRwRxsA1LqqcWdpYjplBrkZhHb3+ESnP lQaQ7AtZRoIjwsImYen3M2W1cFS214BAqoonLLYSd0ponCB05Ng= =IzoE -----END PGP SIGNATURE----- Merge tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull more ARM SoC updates from Arnd Bergmann: "These are changes that for some reason ended up not making it into the first four branches but that should still make it into 6.9: - A rework of the omap clock support that touches both drivers and device tree files - The reset controller branch changes that had a dependency on late bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the drivers branch - The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree changes that got delayed and needed some extra time in linux-next for wider testing" * tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits) soc: fsl: dpio: fix kcalloc() argument order bus: ts-nbus: Improve error reporting bus: ts-nbus: Convert to atomic pwm API riscv: dts: starfive: jh7110: Add camera subsystem nodes ARM: bcm: stop selecing CONFIG_TICK_ONESHOT ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift clk: ti: Improve clksel clock bit parsing for reg property clk: ti: Handle possible address in the node name dt-bindings: pwm: opencores: Add compatible for StarFive JH8100 dt-bindings: riscv: cpus: reg matches hart ID reset: Instantiate reset GPIO controller for shared reset-gpios reset: gpio: Add GPIO-based reset controller cpufreq: do not open-code of_phandle_args_equal() of: Add of_phandle_args_equal() helper reset: simple: add support for Sophgo SG2042 dt-bindings: reset: sophgo: support SG2042 riscv: dts: microchip: add specific compatible for mpfs pdma riscv: dts: microchip: add missing CAN bus clocks ARM: brcmstb: Add debug UART entry for 74165 ... |
||
Linus Torvalds
|
4f712ee0cb |
S390:
* Changes to FPU handling came in via the main s390 pull request * Only deliver to the guest the SCLP events that userspace has requested. * More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same). * Fix selftests undefined behavior. x86: * Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec. * Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests). * Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized. * Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest. * Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit. * Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code. * Add a VMX flag in /proc/cpuinfo to report 5-level EPT support. * Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot. * Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels. * Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization. * Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives. * Fix the debugregs ABI for 32-bit KVM. * Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD. * Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work. * Cleanup the logic for checking if the currently loaded vCPU is in-kernel. * Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel. x86 Xen emulation: * Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same. * When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation. * Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior). * Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs. RISC-V: * Support exception and interrupt handling in selftests * New self test for RISC-V architectural timer (Sstc extension) * New extension support (Ztso, Zacas) * Support userspace emulation of random number seed CSRs. ARM: * Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers * Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it * Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path * Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register * Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: * Set reserved bits as zero in CPUCFG. * Start SW timer only when vcpu is blocking. * Do not restart SW timer when it is expired. * Remove unnecessary CSR register saving during enter guest. * Misc cleanups and fixes as usual. Generic: * cleanup Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else. * Factor common "select" statements in common code instead of requiring each architecture to specify it * Remove thoroughly obsolete APIs from the uapi headers. * Move architecture-dependent stuff to uapi/asm/kvm.h * Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded. * Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker. Selftests: * Reduce boilerplate especially when utilize selftest TAP infrastructure. * Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory. * Fix benign bugs where tests neglect to close() guest_memfd files. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP 6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y 5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA== =mqOV -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ... |
||
Song Shuai
|
700c2d9b1b
|
riscv: vector: Fix a typo of preempt_v
The term "preempt_v" represents the RISCV_PREEMPT_V field of riscv_v_flags and is used in lots of comments. Here corrects the miss-spelling "prempt_v". And s/acheived/achieved/. Reviewed-by: Andy Chiu <andybnac@gmail.com> Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Link: https://lore.kernel.org/r/20240221100252.3990445-1-songshuaishuai@tinylab.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
07f2c040fa
|
Merge patch series "riscv: mm: Extend mappable memory up to hint address"
Charlie Jenkins <charlie@rivosinc.com> says: On riscv, mmap currently returns an address from the largest address space that can fit entirely inside of the hint address. This makes it such that the hint address is almost never returned. This patch raises the mappable area up to and including the hint address. This allows mmap to often return the hint address, which allows a performance improvement over searching for a valid address as well as making the behavior more similar to other architectures. Note that a previous patch introduced stronger semantics compared to other architectures for riscv mmap. On riscv, mmap will not use bits in the upper bits of the virtual address depending on the hint address. On other architectures, a random address is returned in the address space requested. On all architectures the hint address will be returned if it is available. This allows riscv applications to configure how many bits in the virtual address should be left empty. This has the two benefits of being able to request address spaces that are smaller than the default and doesn't require the application to know the page table layout of riscv. * b4-shazam-merge: docs: riscv: Define behavior of mmap selftests: riscv: Generalize mm selftests riscv: mm: Use hint address in mmap if available Link: https://lore.kernel.org/r/20240130-use_mmap_hint_address-v3-0-8a655cfa8bcb@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
099dbac6e9
|
Merge patch series "riscv: Use Kconfig to set unaligned access speed"
Charlie Jenkins <charlie@rivosinc.com> says: If the hardware unaligned access speed is known at compile time, it is possible to avoid running the unaligned access speed probe to speedup boot-time. * b4-shazam-merge: riscv: Set unaligned access speed at compile time riscv: Decouple emulated unaligned accesses from access speed riscv: Only check online cpus for emulated accesses riscv: lib: Introduce has_fast_unaligned_access() Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-0-a388770ba0ce@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
0fd283cb64
|
Merge patch series "Support Andes PMU extension"
Yu Chien Peter Lin <peterlin@andestech.com> says: This patch series introduces the Andes PMU extension, which serves the same purpose as Sscofpmf and Smcntrpmf. Its non-standard local interrupt is assigned to bit 18 in the custom S-mode local interrupt enable and pending registers (slie/slip), while the interrupt cause is (256 + 18). * b4-shazam-merge: riscv: andes: Support specifying symbolic firmware and hardware raw events riscv: dts: renesas: Add Andes PMU extension for r9a07g043f dt-bindings: riscv: Add Andes PMU extension description perf: RISC-V: Introduce Andes PMU to support perf event sampling perf: RISC-V: Eliminate redundant interrupt enable/disable operations riscv: dts: renesas: r9a07g043f: Update compatible string to use Andes INTC dt-bindings: riscv: Add Andes interrupt controller compatible string riscv: errata: Rename defines for Andes Link: https://lore.kernel.org/r/20240222083946.3977135-1-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
3b6be8d235
|
Merge patch "riscv: Fix compilation error with FAST_GUP and rv32"
I'm picking this up on top of the broken patch for the merge window, as the offending patch breaks the rv32 build and was itself a fix so isn't on for-next. * b4-shazam-merge: riscv: Fix compilation error with FAST_GUP and rv32 riscv: Fix pte_leaf_size() for NAPOT Revert "riscv: mm: support Svnapot in huge vmap" Link: https://lore.kernel.org/r/20240304080247.387710-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Alexandre Ghiti
|
2bb7e0c493
|
riscv: Fix compilation error with FAST_GUP and rv32
By surrounding the definition of pte_leaf_size() with a ifdef napot as
it should have been.
Fixes:
|
||
Linus Torvalds
|
e5eb28f6d1 |
- Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfMnvgAKCRDdBJ7gKXxA jjKMAP4/Upq07D4wjkMVPb+QrkipbbLpdcgJ++q3z6rba4zhPQD+M3SFriIJk/Xh tKVmvihFxfAhdDthseXcIf1nBjMALwY= =8rVc -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ... |
||
Linus Torvalds
|
902861e34c |
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx TMNhHfyiHYDTx/GAV9NXW84tasJSDgA= =TG55 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ... |
||
Linus Torvalds
|
a3df5d5422 |
Pin control changes for the v6.9 kernel cycle:
No core changes this time around. New drivers: - New driver for Renesas R8A779H0 also known as R-Car V4M. - New driver for the Awinic AW9523/B I2C GPIO expander. I found this living out-of-tree in OpenWrt as an upstream attempt had stalled on the finishing line, so I picked it up and finished the job. Improvements: - The Nomadik pin control driver was for years re-used out of tree for the ST STA chips, and now the IP was re-used in a MIPS automotive SoC called MobilEyeq5, so it has been split in pin control and GPIO drivers so the latter can be reused by MobilEyeq5. (Along with a long list of cleanups.) - A lot of overall cleanup and tidying up. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmXyJa8ACgkQQRCzN7AZ XXMQQw//aq+HJt4vtX4s9v+g/aYBDoasxOgryz96/yq+CqTBBiYh0ib7MJvy/rFs bBufH4DEvlpjzgsvaN14L6H+3hFDXKeN6VnAs3DPeBkda4bTS3VOpqNzSoOTeqG9 4a1jTe3SHche80SNW1YskTAMT0lG6WqC4RvipQ8K14DqIioX0AmgF7nJ1cddKaTK tXCWkqKY3TW2Z4edi9ZJLL6vRSQ9tJP9AugjkPX3F91yqwGHb8PV7Ggb1EXzC/X0 SW7F7Ke0lIvCYidDUpr04TOKdPTmzl2j6zoZfGU1WbypxG0LVhTKAMVk2DL2/v7T E8NNimVtGJtqxDfCVSbLc6jRgdYaLRa4dLhMCRWmevfEqFfrA79XFY6BS084T3oP 619RqbOtxc21AzlJjy2Kp7v3YpmXi6pX5E23L/d8U+0poEiM5qPqdYZaTbGNcGsV Yg4xCtxtBfPUdE3SqUCbHDzpWdSDGFsT3LSAUionvYyoW4iF8AXg8SBV1QpfzLM8 KTWTmsWG/15lCZyf7tq83EJjjVgYhGL4SMnkE8fT+smiXm95Muj1tzUjgapspMiU VpnhwtKj1hQrButWlN7+HXKseMrJ8zQJszcP5e3QdWOm7vzZGP3nXpgWosqGCImq ThjtyOKvc4+/AqgbIojDCz2BfxD8a7OPOkYHkLGJ96fA5JscBoU= =Pt9Y -----END PGP SIGNATURE----- Merge tag 'pinctrl-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "No core changes this time around. New drivers: - New driver for Renesas R8A779H0 also known as R-Car V4M. - New driver for the Awinic AW9523/B I2C GPIO expander. I found this living out-of-tree in OpenWrt as an upstream attempt had stalled on the finishing line, so I picked it up and finished the job. Improvements: - The Nomadik pin control driver was for years re-used out of tree for the ST STA chips, and now the IP was re-used in a MIPS automotive SoC called MobilEyeq5, so it has been split in pin control and GPIO drivers so the latter can be reused by MobilEyeq5. (Along with a long list of cleanups) - A lot of overall cleanup and tidying up" * tag 'pinctrl-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (87 commits) drivers/gpio/nomadik: move dummy nmk_gpio_dbg_show_one() to header gpio: nomadik: remove BUG_ON() in nmk_gpio_populate_chip() dt-bindings: pinctrl: qcom: update compatible name for match with driver pinctrl: aw9523: Make the driver tristate pinctrl: nomadik: fix dereference of error pointer gpio: nomadik: Back out some managed resources pinctrl: aw9523: Add proper terminator pinctrl: core: comment that pinctrl_add_gpio_range() is deprecated pinctrl: pinmux: Suppress error message for -EPROBE_DEFER pinctrl: Add driver for Awinic AW9523/B I2C GPIO Expander dt-bindings: pinctrl: Add bindings for Awinic AW9523/AW9523B gpio: nomadik: Finish conversion to use firmware node APIs gpio: nomadik: fix Kconfig dependencies inbetween pinctrl & GPIO pinctrl: da9062: Add OF table dt-bindings: pinctrl: at91: add sam9x7 pinctrl: ocelot: remove redundant assignment to variable ret gpio: nomadik: grab optional reset control and deassert it at probe gpio: nomadik: support mobileye,eyeq5-gpio gpio: nomadik: handle variadic GPIO count gpio: nomadik: support shared GPIO IRQs ... |
||
Charlie Jenkins
|
b5b4287acc
|
riscv: mm: Use hint address in mmap if available
On riscv it is guaranteed that the address returned by mmap is less than the hint address. Allow mmap to return an address all the way up to addr, if provided, rather than just up to the lower address space. This provides a performance benefit as well, allowing mmap to exit after checking that the address is in range rather than searching for a valid address. It is possible to provide an address that uses at most the same number of bits, however it is significantly more computationally expensive to provide that number rather than setting the max to be the hint address. There is the instruction clz/clzw in Zbb that returns the highest set bit which could be used to performantly implement this, but it would still be slower than the current implementation. At worst case, half of the address would not be able to be allocated when a hint address is provided. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240130-use_mmap_hint_address-v3-1-8a655cfa8bcb@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Charlie Jenkins
|
f413aae96c
|
riscv: Set unaligned access speed at compile time
Introduce Kconfig options to set the kernel unaligned access support. These options provide a non-portable alternative to the runtime unaligned access probe. To support this, the unaligned access probing code is moved into it's own file and gated behind a new RISCV_PROBE_UNALIGNED_ACCESS_SUPPORT option. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-4-a388770ba0ce@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Charlie Jenkins
|
6e5ce7f2ea
|
riscv: Decouple emulated unaligned accesses from access speed
Detecting if a system traps into the kernel on an unaligned access can be performed separately from checking the speed of unaligned accesses. This decoupling will make it possible to selectively enable or disable each of these checks. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-3-a388770ba0ce@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Charlie Jenkins
|
313130c62c
|
riscv: Only check online cpus for emulated accesses
The unaligned access checker only sets valid values for online cpus.
Check for these values on online cpus rather than on present cpus.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Fixes:
|
||
Charlie Jenkins
|
5a83e7313e
|
riscv: lib: Introduce has_fast_unaligned_access()
Create has_fast_unaligned_access to avoid needing to explicitly check the fast_misaligned_access_speed_key static key. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240308-disable_misaligned_probe_config-v9-1-a388770ba0ce@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Linus Torvalds
|
9187210eee |
Networking changes for 6.9.
Core & protocols ---------------- - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc.) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code -------------------------------------------- - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter --------- - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF --- - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless -------- - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API ---------- - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc ---- - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers ------- - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXv0mgACgkQMUZtbf5S IrtgMxAAuRd+WJW++SENr4KxIWhYO1q6Xcxnai43wrNkan9swD24icG8TYALt4f3 yoT6idQvWReAb5JNlh9rUQz8R7E0nJXlvEFn5MtJwcthx2C6wFo/XkJlddlRrT+j c2xGILwLjRhW65LaC0MZ2ECbEERkFz8xcGfK2SWzUgh6KYvPjcRfKFxugpM7xOQK P/Wnqhs4fVRS/Mj/bCcXcO+yhwC121Q3qVeQVjGS0AzEC65hAW87a/kc2BfgcegD EyI9R7mf6criQwX+0awubjfoIdr4oW/8oDVNvUDczkJkbaEVaLMQk9P5x/0XnnVS UHUchWXyI80Q8Rj12uN1/I0h3WtwNQnCRBuLSmtm6GLfCAwbLvp2nGWDnaXiqryW DVKUIHGvqPKjkOOMOVfSvfB3LvkS3xsFVVYiQBQCn0YSs/gtu4CoF2Nty9CiLPbK tTuxUnLdPDZDxU//l0VArZmP8p2JM7XQGJ+JH8GFH4SBTyBR23e0iyPSoyaxjnYn RReDnHMVsrS1i7GPhbqDJWn+uqMSs7N149i0XmmyeqwQHUVSJN3J2BApP2nCaDfy H2lTuYly5FfEezt61NvCE4qr/VsWeEjm1fYlFQ9dFn4pGn+HghyCpw+xD1ZN56DN lujemau5B3kk1UTtAT4ypPqvuqjkRFqpNV2LzsJSk/Js+hApw8Y= =oY52 -----END PGP SIGNATURE----- Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ... |
||
Linus Torvalds
|
216532e147 |
hardening updates for v6.9-rc1
- string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmXvm5kWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJiQqD/4mM6SWZpYHKlR1nEiqIyz7Hqr9 g4oguuw6HIVNJXLyeBI5Hd43CTeHPA0e++EETqhUAt7HhErxfYJY+JB221nRYmu+ zhhQ7N/xbTMV/Je7AR03kQjhiMm8LyEcM2X4BNrsAcoCieQzmO3g0zSp8ISzLUE0 PEEmf1lOzMe3gK2KOFCPt5Hiz9sGWyN6at+BQubY18tQGtjEXYAQNXkpD5qhGn4a EF693r/17wmc8hvSsjf4AGaWy1k8crG0WfpMCZsaqftjj0BbvOC60IDyx4eFjpcy tGyAJKETq161AkCdNweIh2Q107fG3tm0fcvw2dv8Wt1eQCko6M8dUGCBinQs/thh TexjJFS/XbSz+IvxLqgU+C5qkOP23E0M9m1dbIbOFxJAya/5n16WOBlGr3ae2Wdq /+t8wVSJw3vZiku5emWdFYP1VsdIHUjVa5QizFaaRhzLGRwhxVV49SP4IQC/5oM5 3MAgNOFTP6yRQn9Y9wP+SZs+SsfaIE7yfKa9zOi4S+Ve+LI2v4YFhh8NCRiLkeWZ R1dhp8Pgtuq76f/v0qUaWcuuVeGfJ37M31KOGIhi1sI/3sr7UMrngL8D1+F8UZMi zcLu+x4GtfUZCHl6znx1rNUBqE5S/5ndVhLpOqfCXKaQ+RAm7lkOJ3jXE2VhNkhp yVEmeSOLnlCaQjZvXQ== =OP+o -----END PGP SIGNATURE----- Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ... |
||
Linus Torvalds
|
65d287c7eb |
asm-generic updates for 6.9
Just two small updates this time: - A series I did to unify the definition of PAGE_SIZE through Kconfig, intended to help with a vdso rework that needs the constant but cannot include the normal kernel headers when building the compat VDSO on arm64 and potentially others. - a patch from Yan Zhao to remove the pfn_to_virt() definitions from a couple of architectures after finding they were both incorrect and entirely unused. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXwEjQACgkQYKtH/8kJ UifwHxAAqXl6R4cZtjUKxHpQoX7TTtBgWyZ9OID8KYt8V/QN+Jme6EhuGV/5CJ1k 5n30PuDvSKPB9865HfCZgh0BDSzSFo2xtc/bDuqiPHO5deNhXUDKX5MowIs3Pf2J EM1OJYiXG/g9vR19uaHvWVA4I1eJk01+Pl5nZ3DA+n9ZYcnM35+HO7EQcH80FGwz jkjN1HizxDmuMDDKn24hrSt6mVoE54JWyeDvklbY4CbwZbtFbtBJiFv3NWTfaxSf MPR1fopgaAkT0aJzUXOh36qDodyqR2tz4M7ucpRKa6/YlOewDN59tFwgwtun0s74 lLJPBqQ6cT8no1VODNnKPb1M5Jh3uzsF1fuhnU6B06Z+1s7sxxqOli1Q0yrpivYY SCAh6WmiCMhHeP/sxfQHRhhrx9l0gOarXh7s4wRJFp+LAi59NuUTeJotoOfboX4M ozeFgW1Rlr+wORzUargRnQiXMLObC/RFdogLgiBJwa8XOI8bOPZg9JfAUPOwbfa2 37IFZRleu+V2NaBF8rS5wRGI8hVp99XSMjlskKLM/645doqNq1cyR9UO68jb1hhF d5X2+BEaEJTHJbXEQ9YtThpNWYzHXL5dFswVJfHDs+CW1FWi5GVqCufZGzr7xihy uNLlVqXLhjM+hU2dDoS4ZshygxN3b8f2qa+GtlIMBYrLcbcjxd4= =X4Cs -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Just two small updates this time: - A series I did to unify the definition of PAGE_SIZE through Kconfig, intended to help with a vdso rework that needs the constant but cannot include the normal kernel headers when building the compat VDSO on arm64 and potentially others - a patch from Yan Zhao to remove the pfn_to_virt() definitions from a couple of architectures after finding they were both incorrect and entirely unused" * tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: define CONFIG_PAGE_SIZE_*KB on all architectures arch: simplify architecture specific page size configuration arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc |
||
Linus Torvalds
|
306bee64b7 |
SoC: device tree updates for 6.9
There is very little going on with new SoC support this time, all the new chips are variations of others that we already support, and they are all based on ARMv8 cores: - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are networking SoCs designed to be used in wireless routers, similar to the already supported MT7986A (Filogic 830). - NXP i.MX8DXP is a variant of i.MX8QXP, with two CPU cores less. These are used in many embedded and industrial applications. - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M) are automotive SoCs. - TI J722S is another automotive variant of its K3 family, related to the AM62 series. There are a total of 7 new arm32 machines and 45 arm64 ones, including - Two Android phones based on the old Tegra30 chip - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and a SoM development board - A set-top box using Amlogic Meson G12A S905X2 - Eight embedded board using NXP i.MX6/8/9 - Three machines using Mediatek network router chips - Ten Chromebooks, all based on Mediatek MT8186 - One development board based on Mediatek MT8395 (Genio 1200) - Seven tablets and phones based on Qualcomm SoCs, most of them from Samsung. - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2) - Three variants of the "White Hawk" board for Renesas automotive SoCs - Ten Rockchips RK35xx based machines, including NAS, Tablet, Game console and industrial form factors. - Three evaluation boards for TI K3 based SoCs The other changes are mainly the usual feature additions for existing hardware, cleanups, and dtc compile time fixes. One notable change is the inclusion of PowerVR SGX GPU nodes on TI SoCs. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmXvLwQACgkQYKtH/8kJ Uidkhw/+LjDOIqF8f4+6TBCCS3pFAVSAZxKxlm7L4VhsVOOeGZdspOY57eKZJWqW bVqj+B22UjJSw/9LOrFBNApkV8vk+rR7UfJjzijXM34WB80DC8+s7DbenCHagqR8 fsKCB4tHKTYbBk6EefzyWy7fSA1SFu7hpTg5qWK8XONbGdHnkhbj1aQDbUe7p961 huKGM+2spO+bFs3ljHGymBWywFKtuMTmVzoq16mBZl/bnuIKobm7W2kF+n3NAo+h CMta6J9mBlinBT+VtIg2Xax+KvkjmoitevOmyURxp/33+14A64dafI+RLiSyeqb6 DfeAp9ptrBbVGzYZq2r07WYX9AIBdD2hvdkrtrjOy6JPqtJpWdfA4slYzWCzZfOz O08sV3l7ERggpNkMcTWiwBiuB/y5Hci7SYVeQm8N8bp5PydgNpoo6kNVpnc1e6ri Ug8t/jQYvpkCVHT3ld8PmgpWoZRinKIe6PNmqdg5jUu8aH+m4TNNmHyA2IjBcovj 006FBBGVKp4HlCrGz4t9/XsmKzt+cRxLaX06duoZ93FQknXSzs7j7UDkPhpR07kF yEHjETnfhziyONL2fHZ+ejBoK/9psTFtzbpgMreBJ0mFZM0yvL0c+gcMvDgDD8ho PCp2ohDYpKPoklrTqMLKM7Yjev5bTOdrAJeWoLDWCbgkzVDkyjw= =krkR -----END PGP SIGNATURE----- Merge tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC device tree updates from Arnd Bergmann: "There is very little going on with new SoC support this time, all the new chips are variations of others that we already support, and they are all based on ARMv8 cores: - Mediatek MT7981B (Filogic 820) and MT7988A (Filogic 880) are networking SoCs designed to be used in wireless routers, similar to the already supported MT7986A (Filogic 830). - NXP i.MX8DXP is a variant of i.MX8QXP, with two CPU cores less. These are used in many embedded and industrial applications. - Renesas R8A779G2 (R-Car V4H ES2.0) and R8A779H0 (R-Car V4M) are automotive SoCs. - TI J722S is another automotive variant of its K3 family, related to the AM62 series. There are a total of 7 new arm32 machines and 45 arm64 ones, including - Two Android phones based on the old Tegra30 chip - Two machines using Cortex-A53 SoCs from Allwinner, a mini PC and a SoM development board - A set-top box using Amlogic Meson G12A S905X2 - Eight embedded board using NXP i.MX6/8/9 - Three machines using Mediatek network router chips - Ten Chromebooks, all based on Mediatek MT8186 - One development board based on Mediatek MT8395 (Genio 1200) - Seven tablets and phones based on Qualcomm SoCs, most of them from Samsung. - A third development board for Qualcomm SM8550 (Snapdragon 8 Gen 2) - Three variants of the "White Hawk" board for Renesas automotive SoCs - Ten Rockchips RK35xx based machines, including NAS, Tablet, Game console and industrial form factors. - Three evaluation boards for TI K3 based SoCs The other changes are mainly the usual feature additions for existing hardware, cleanups, and dtc compile time fixes. One notable change is the inclusion of PowerVR SGX GPU nodes on TI SoCs" * tag 'soc-dt-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (824 commits) riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig riscv: dts: starfive: jh7100: fix root clock names ARM: dts: samsung: exynos4412: decrease memory to account for unusable region arm64: dts: qcom: sm8250-xiaomi-elish: set rotation arm64: dts: qcom: sm8650: Fix SPMI channels size arm64: dts: qcom: sm8550: Fix SPMI channels size arm64: dts: rockchip: Fix name for UART pin header on qnap-ts433 arm: dts: marvell: clearfog-gtr-l8: align port numbers with enclosure arm: dts: marvell: clearfog-gtr-l8: add support for second sfp connector dt-bindings: soc: renesas: renesas-soc: Add pattern for gray-hawk dtc: Enable dtc interrupt_provider check arm64: dts: st: add video encoder support to stm32mp255 arm64: dts: st: add video decoder support to stm32mp255 ARM: dts: stm32: enable crypto accelerator on stm32mp135f-dk ARM: dts: stm32: enable CRC on stm32mp135f-dk ARM: dts: stm32: add CRC on stm32mp131 ARM: dts: add stm32f769-disco-mb1166-reva09 ARM: dts: stm32: add display support on stm32f769-disco ARM: dts: stm32: rename mmc_vcard to vcc-3v3 on stm32f769-disco ARM: dts: stm32: add DSI support on stm32f769 ... |
||
Yu Chien Peter Lin
|
270fc77e7b
|
riscv: dts: renesas: Add Andes PMU extension for r9a07g043f
xandespmu stands for Andes Performance Monitor Unit extension. Based on the added Andes PMU ISA string, the SBI PMU driver will make use of the non-standard irq source. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240222083946.3977135-10-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Yu Chien Peter Lin
|
bc969d6cc9
|
perf: RISC-V: Introduce Andes PMU to support perf event sampling
Assign riscv_pmu_irq_num the value of (256 + 18) for the custome PMU and add SSCOUNTOVF and SIP alternatives to ALT_SBI_PMU_OVERFLOW() and ALT_SBI_PMU_OVF_CLEAR_PENDING() macros, respectively. To make use of Andes PMU extension, "xandespmu" needs to be appended to the riscv,isa-extensions for each cpu node in device-tree, and make sure CONFIG_ANDES_CUSTOM_PMU is enabled. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Charles Ci-Jyun Wu <dminus@andestech.com> Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com> Co-developed-by: Locus Wei-Han Chen <locus84@andestech.com> Signed-off-by: Locus Wei-Han Chen <locus84@andestech.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240222083946.3977135-8-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Yu Chien Peter Lin
|
95113bb705
|
riscv: dts: renesas: r9a07g043f: Update compatible string to use Andes INTC
The Andes hart-level interrupt controller (Andes INTC) allows AX45MP cores to handle custom local interrupts, such as the performance counter overflow interrupt. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240222083946.3977135-6-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Yu Chien Peter Lin
|
be5e8872b3
|
riscv: errata: Rename defines for Andes
Use "ANDES" rather than "ANDESTECH" to unify the naming convention with directory, file names, Kconfig options and other definitions. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Charles Ci-Jyun Wu <dminus@andestech.com> Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240222083946.3977135-2-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Linus Torvalds
|
fcc196579a |
Misc cleanups, including a large series from Thomas Gleixner to
cure Sparse warnings. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmXvAFQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hkDRAAwASVCQ88kiGqNQtHibXlK54mAFGsc0xv T8OPds15DUzoLg/y8lw0X0DHly6MdGXVmygybejNIw2BN4lhLjQ7f4Ria7rv7LDy FcI1jfvysEMyYRFHGRefb/GBFzuEfKoROwf+QylGmKz0ZK674gNMngsI9pwOBdbe wElq3IkHoNuTUfH9QA4BvqGam1n122nvVTop3g0PMHWzx9ky8hd/BEUjXFZhfINL zZk3fwUbER2QYbhHt+BN2GRbdf2BrKvqTkXpKxyXTdnpiqAo0CzBGKerZ62H82qG n737Nib1lrsfM5yDHySnau02aamRXaGvCJUd6gpac1ZmNpZMWhEOT/0Tr/Nj5ztF lUAvKqMZn/CwwQky1/XxD0LHegnve0G+syqQt/7x7o1ELdiwTzOWMCx016UeodzB yyHf3Xx9J8nt3snlrlZBaGEfegg9ePLu5Vir7iXjg3vrloUW8A+GZM62NVxF4HVV QWF80BfWf8zbLQ/OS1382t1shaioIe5pEXzIjcnyVIZCiiP2/5kP2O6P4XVbwVlo Ca5eEt8U1rtsLUZaCzI2ZRTQf/8SLMQWyaV+ZmkVwcVdFoARC31EgdE5wYYoZOf6 7Vl+rXd+rZCuTWk0ZgznCZEm75aaqukaQCBa2V8hIVociLFVzhg/Tjedv7s0CspA hNfxdN1LDZc= =0eJ7 -----END PGP SIGNATURE----- Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups, including a large series from Thomas Gleixner to cure sparse warnings" * tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Drop unused declaration of proc_nmi_enabled() x86/callthunks: Use EXPORT_PER_CPU_SYMBOL_GPL() for per CPU variables x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation x86/cpu: Use EXPORT_PER_CPU_SYMBOL_GPL() for x86_spec_ctrl_current x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address() x86/percpu: Cure per CPU madness on UP smp: Consolidate smp_prepare_boot_cpu() x86/msr: Add missing __percpu annotations x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h> perf/x86/amd/uncore: Fix __percpu annotation x86/nmi: Remove an unnecessary IS_ENABLED(CONFIG_SMP) x86/apm_32: Remove dead function apm_get_battery_status() x86/insn-eval: Fix function param name in get_eff_addr_sib() |
||
Jakub Kicinski
|
5f20e6ab1f |
for-netdev
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmXvm7IACgkQ6rmadz2v bTqdMA//VMHNHVLb4oROoXyQD9fw2mCmIUEKzP88RXfqcxsfEX7HF+k8B5ZTk0ro CHXTAnc79+Qqg0j24bkQKxup/fKBQVw9D+Ia4b3ytlm1I2MtyU/16xNEzVhAPU2D iKk6mVBsEdCbt/GjpWORy/VVnZlZpC7BOpZLxsbbxgXOndnCegyjXzSnLGJGxdvi zkrQTn2SrFzLi6aNpVLqrv6Nks6HJusfCKsIrtlbkQ85dulasHOtwK9s6GF60nte aaho+MPx3L+lWEgapsm8rR779pHaYIB/GbZUgEPxE/xUJ/V8BzDgFNLMzEiIBRMN a0zZam11BkBzCfcO9gkvDRByaei/dZz2jdqfU4GlHklFj1WFfz8Q7fRLEPINksvj WXLgJADGY5mtGbjG21FScThxzj+Ruqwx0a13ddlyI/W+P3y5yzSWsLwJG5F9p0oU 6nlkJ4U8yg+9E1ie5ae0TibqvRJzXPjfOERZGwYDSVvfQGzv1z+DGSOPMmgNcWYM dIaO+A/+NS3zdbk8+1PP2SBbhHPk6kWyCUByWc7wMzCPTiwriFGY/DD2sN+Fsufo zorzfikUQOlTfzzD5jbmT49U8hUQUf6QIWsu7BijSiHaaC7am4S8QB2O6ibJMqdv yNiwvuX+ThgVIY3QKrLLqL0KPGeKMR5mtfq6rrwSpfp/b4g27FE= =eFgA -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2024-03-11 We've added 59 non-merge commits during the last 9 day(s) which contain a total of 88 files changed, 4181 insertions(+), 590 deletions(-). The main changes are: 1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena, from Alexei. 2) Introduce bpf_arena which is sparse shared memory region between bpf program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and bpf programs, from Alexei and Andrii. 3) Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it, from Alexei. 4) Use IETF format for field definitions in the BPF standard document, from Dave. 5) Extend struct_ops libbpf APIs to allow specify version suffixes for stuct_ops map types, share the same BPF program between several map definitions, and other improvements, from Eduard. 6) Enable struct_ops support for more than one page in trampolines, from Kui-Feng. 7) Support kCFI + BPF on riscv64, from Puranjay. 8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay. 9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke. ==================== Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
d08c407f71 |
A large set of updates and features for timers and timekeeping:
- The hierarchical timer pull model When timer wheel timers are armed they are placed into the timer wheel of a CPU which is likely to be busy at the time of expiry. This is done to avoid wakeups on potentially idle CPUs. This is wrong in several aspects: 1) The heuristics to select the target CPU are wrong by definition as the chance to get the prediction right is close to zero. 2) Due to #1 it is possible that timers are accumulated on a single target CPU 3) The required computation in the enqueue path is just overhead for dubious value especially under the consideration that the vast majority of timer wheel timers are either canceled or rearmed before they expire. The timer pull model avoids the above by removing the target computation on enqueue and queueing timers always on the CPU on which they get armed. This is achieved by having separate wheels for CPU pinned timers and global timers which do not care about where they expire. As long as a CPU is busy it handles both the pinned and the global timers which are queued on the CPU local timer wheels. When a CPU goes idle it evaluates its own timer wheels: - If the first expiring timer is a pinned timer, then the global timers can be ignored as the CPU will wake up before they expire. - If the first expiring timer is a global timer, then the expiry time is propagated into the timer pull hierarchy and the CPU makes sure to wake up for the first pinned timer. The timer pull hierarchy organizes CPUs in groups of eight at the lowest level and at the next levels groups of eight groups up to the point where no further aggregation of groups is required, i.e. the number of levels is log8(NR_CPUS). The magic number of eight has been established by experimention, but can be adjusted if needed. In each group one busy CPU acts as the migrator. It's only one CPU to avoid lock contention on remote timer wheels. The migrator CPU checks in its own timer wheel handling whether there are other CPUs in the group which have gone idle and have global timers to expire. If there are global timers to expire, the migrator locks the remote CPU timer wheel and handles the expiry. Depending on the group level in the hierarchy this handling can require to walk the hierarchy downwards to the CPU level. Special care is taken when the last CPU goes idle. At this point the CPU is the systemwide migrator at the top of the hierarchy and it therefore cannot delegate to the hierarchy. It needs to arm its own timer device to expire either at the first expiring timer in the hierarchy or at the first CPU local timer, which ever expires first. This completely removes the overhead from the enqueue path, which is e.g. for networking a true hotpath and trades it for a slightly more complex idle path. This has been in development for a couple of years and the final series has been extensively tested by various teams from silicon vendors and ran through extensive CI. There have been slight performance improvements observed on network centric workloads and an Intel team confirmed that this allows them to power down a die completely on a mult-die socket for the first time in a mostly idle scenario. There is only one outstanding ~1.5% regression on a specific overloaded netperf test which is currently investigated, but the rest is either positive or neutral performance wise and positive on the power management side. - Fixes for the timekeeping interpolation code for cross-timestamps: cross-timestamps are used for PTP to get snapshots from hardware timers and interpolated them back to clock MONOTONIC. The changes address a few corner cases in the interpolation code which got the math and logic wrong. - Simplifcation of the clocksource watchdog retry logic to automatically adjust to handle larger systems correctly instead of having more incomprehensible command line parameters. - Treewide consolidation of the VDSO data structures. - The usual small improvements and cleanups all over the place. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmXuAN0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVKXEADIR45rjR1Xtz32js7B53Y65O4WNoOQ 6/ycWcswuGzg/h4QUpPSJ6gOGVmKSWwZi4n0P/VadCiXGSPPm0aUKsoRUt9DZsPY mtj2wjCSXKXiyhTl9OtrZME86ZAIGO1dQXa/sOHsiP5PCjgQkD0b5CYi1+B6eHDt 1/Uo2Tb9g8VAPppq20V5Uo93GrPf642oyi3FCFrR1M112Uuak5DmqHJYiDpreNcG D5SgI+ykSiaUaVyHifvqijoJk0rYXkqEC6evl02477lJ/X0vVo2/M8XPS95BxHST s5Iruo4rP+qeAy8QvhZpoPX59fO0m/AgA7cf77XXAtOpVdLH+bs4ILsEbouAIOtv lsmRkcYt+TpvrZFHPAxks+6g3afuROiDtxD5sXXpVWxvofi8FwWqubdlqdsbw9MP ZCTNyzNyKL47QeDwBfSynYUL1RSyqsphtIwk4oeQklH9rwMAnW21hi30z15hQ0pQ FOVkmcwi79JNvl/G+jRkDzw7r8/zcHshWdSjyUM04CDjjnCDjQOFWSIjEPwbQjjz S4HXpJKJW963dBgs9Z84/Ctw1GwoBk1qedDWDJE1257Qvmo/Wpe/7GddWcazOGnN RRFMzGPbOqBDbjtErOKGU+iCisgNEvz2XK+TI16uRjWde7DxZpiTVYgNDrZ+/Pyh rQ23UBms6ZRR+A== =iQlu -----END PGP SIGNATURE----- Merge tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A large set of updates and features for timers and timekeeping: - The hierarchical timer pull model When timer wheel timers are armed they are placed into the timer wheel of a CPU which is likely to be busy at the time of expiry. This is done to avoid wakeups on potentially idle CPUs. This is wrong in several aspects: 1) The heuristics to select the target CPU are wrong by definition as the chance to get the prediction right is close to zero. 2) Due to #1 it is possible that timers are accumulated on a single target CPU 3) The required computation in the enqueue path is just overhead for dubious value especially under the consideration that the vast majority of timer wheel timers are either canceled or rearmed before they expire. The timer pull model avoids the above by removing the target computation on enqueue and queueing timers always on the CPU on which they get armed. This is achieved by having separate wheels for CPU pinned timers and global timers which do not care about where they expire. As long as a CPU is busy it handles both the pinned and the global timers which are queued on the CPU local timer wheels. When a CPU goes idle it evaluates its own timer wheels: - If the first expiring timer is a pinned timer, then the global timers can be ignored as the CPU will wake up before they expire. - If the first expiring timer is a global timer, then the expiry time is propagated into the timer pull hierarchy and the CPU makes sure to wake up for the first pinned timer. The timer pull hierarchy organizes CPUs in groups of eight at the lowest level and at the next levels groups of eight groups up to the point where no further aggregation of groups is required, i.e. the number of levels is log8(NR_CPUS). The magic number of eight has been established by experimention, but can be adjusted if needed. In each group one busy CPU acts as the migrator. It's only one CPU to avoid lock contention on remote timer wheels. The migrator CPU checks in its own timer wheel handling whether there are other CPUs in the group which have gone idle and have global timers to expire. If there are global timers to expire, the migrator locks the remote CPU timer wheel and handles the expiry. Depending on the group level in the hierarchy this handling can require to walk the hierarchy downwards to the CPU level. Special care is taken when the last CPU goes idle. At this point the CPU is the systemwide migrator at the top of the hierarchy and it therefore cannot delegate to the hierarchy. It needs to arm its own timer device to expire either at the first expiring timer in the hierarchy or at the first CPU local timer, which ever expires first. This completely removes the overhead from the enqueue path, which is e.g. for networking a true hotpath and trades it for a slightly more complex idle path. This has been in development for a couple of years and the final series has been extensively tested by various teams from silicon vendors and ran through extensive CI. There have been slight performance improvements observed on network centric workloads and an Intel team confirmed that this allows them to power down a die completely on a mult-die socket for the first time in a mostly idle scenario. There is only one outstanding ~1.5% regression on a specific overloaded netperf test which is currently investigated, but the rest is either positive or neutral performance wise and positive on the power management side. - Fixes for the timekeeping interpolation code for cross-timestamps: cross-timestamps are used for PTP to get snapshots from hardware timers and interpolated them back to clock MONOTONIC. The changes address a few corner cases in the interpolation code which got the math and logic wrong. - Simplifcation of the clocksource watchdog retry logic to automatically adjust to handle larger systems correctly instead of having more incomprehensible command line parameters. - Treewide consolidation of the VDSO data structures. - The usual small improvements and cleanups all over the place" * tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) timer/migration: Fix quick check reporting late expiry tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n vdso/datapage: Quick fix - use asm/page-def.h for ARM64 timers: Assert no next dyntick timer look-up while CPU is offline tick: Assume timekeeping is correctly handed over upon last offline idle call tick: Shut down low-res tick from dying CPU tick: Split nohz and highres features from nohz_mode tick: Move individual bit features to debuggable mask accesses tick: Move got_idle_tick away from common flags tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING tick: Move tick cancellation up to CPUHP_AP_TICK_DYING tick: Start centralizing tick related CPU hotplug operations tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick() tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick() tick: Use IS_ENABLED() whenever possible tick/sched: Remove useless oneshot ifdeffery tick/nohz: Remove duplicate between lowres and highres handlers tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer() hrtimer: Select housekeeping CPU during migration ... |
||
Paolo Bonzini
|
f074158a0d |
KVM/riscv changes for 6.9
- Exception and interrupt handling for selftests - Sstc (aka arch_timer) selftest - Forward seed CSR access to KVM userspace - Ztso extension support for Guest/VM - Zacas extension support for Guest/VM -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmXpXfMACgkQrUjsVaLH LAfZoRAAgIJqgL9jMQ2dliTq8sk24dQQUPo1V2yP7yEYLeUyp/9EAR31ZhBDP1EE yTT0yvEbyn21fxhdHagVnqJXOepHz1MPmGGxEFx96RVib/m80zDhyDKogW6IgP4c e9yXW1Wo6m0R+7aQhn8YA9+47xbmq/cNDCwjlkIp0oL/SyktJdTcPjZUTr524jde dYTGLqSyoQyZMm+wBrJYTiME6nFK3RhKf7V9Mn77VTTFnhIk4J2upl+1kE2pLTAQ Zp3EwXCK1B2D2J1bdOuqclCApglw3H0CM4c81knDaEyB4w/l/OwpuCA+u58tSU9g z6DTO+vYvwlROmfeFjjmHKx1pl9uSktYlFlVqinelW7IG7Y3qDD1zPnbT27OpkLP rFIsF4Dm42MnmcC0sTxaMgKQtMvb56lgpaoa9XHL/DD76pAUvgKoWUnWay+32j1e 8Hhx/PEp16ALGvDfm+9Wo8AgbvrFGl37epe2LXFFr6+zOzmGN+6vATW2EqgJ3Ueo P5TNkcFSyKg61r0moEr/ZSKZNvv8MuVUKSe0EmPykIccqg6oYHAoAJ66FL5yaX1k n/aIVNnIavzwlN6DeaQyjAxZxfSg39Z7wOH5PyKmateYKg9yM8P3RY/DndiO6F3c me9Q0eshu+C7h4YBRXPVstdWkkhCSVa+TN1b/6hnlSP0HLtyHik= =cWpl -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.9-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.9 - Exception and interrupt handling for selftests - Sstc (aka arch_timer) selftest - Forward seed CSR access to KVM userspace - Ztso extension support for Guest/VM - Zacas extension support for Guest/VM |