Commit Graph

2341 Commits

Author SHA1 Message Date
Linus Torvalds
3a93e40326 RISC-V:
* Fix VM hang in case of timer delta being zero
 
 ARM:
 
 * Fixes for the MMU:
 
   * Read the MMU notifier seq before dropping the mmap lock to guard
     against reading a potentially stale VMA
 
   * Disable interrupts when walking user page tables to protect against
     the page table being freed
 
   * Read the MTE permissions for the VMA within the mmap lock critical
     section, avoiding the use of a potentally stale VMA pointer
 
 * Fixes for the vPMU:
 
   * Return the sum of the current perf event value and PMC snapshot for
     reads from userspace
 
   * Don't save the value of guest writes to PMCR_EL0.{C,P}, which could
     otherwise lead to userspace erroneously resetting the vPMU during VM
     save/restore
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmQh12wUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMPyAgAhe9P5efFlkg/N/BUoxCdiQjhTEGO
 BbNtZFbSgdqsTA0No5sjSsq8+dYuj4IpYHxvTRNXffxpn/6x736x9ff6fH3QwY0P
 n65q37m3XrBzbaUv2XlX6l4GJC/KHWpXsawVszmnQpw8wYUYglo9JWnWVvpo/vLV
 byWXcG9Rt0MQVnV13bYtjnpdNYCdVuhZuvjDBvbBa2I9BRKShvWyzcRcc48K1RNq
 UI5PCEb8c2NUOB/6b9lRkEaNYLssVLOxpI3abGJP+IVSl/WE+hgCbFpz7ZkYwt79
 AYm4wnZTSKBMQb5P2IOlgkfasvEyFiZ4Jj7x3RLXcXSWIB6g0J20iNasYg==
 =qX9k
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "RISC-V:

   - Fix VM hang in case of timer delta being zero

 ARM:

   - MMU fixes:

      - Read the MMU notifier seq before dropping the mmap lock to guard
        against reading a potentially stale VMA

      - Disable interrupts when walking user page tables to protect
        against the page table being freed

      - Read the MTE permissions for the VMA within the mmap lock
        critical section, avoiding the use of a potentally stale VMA
        pointer

   - vPMU fixes:

      - Return the sum of the current perf event value and PMC snapshot
        for reads from userspace

      - Don't save the value of guest writes to PMCR_EL0.{C,P}, which
        could otherwise lead to userspace erroneously resetting the vPMU
        during VM save/restore"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  riscv/kvm: Fix VM hang in case of timer delta being zero.
  KVM: arm64: Check for kvm_vma_mte_allowed in the critical section
  KVM: arm64: Disable interrupts while walking userspace PTs
  KVM: arm64: Retry fault if vma_lookup() results become invalid
  KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU
  KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value
2023-03-27 12:22:45 -07:00
Linus Torvalds
19a6b66ca5 RISC-V Fixes for 6.3-rc4
* A fix to match the CSR ASID masking rules when passing ASIDs to
   firmware.
 * Force GCC to use ISA 2.2, to avoid a host of compatibily issues
   between toolchains.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQdtzITHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiV/cD/oDpfixwripxGxnVXaRcQiNbQLrrY3U
 Ft2paTcBBlD0pOIRi9NlMWS0GTxYUqOvTrxZmgMUu6yq7lZ/CegOKjhwqeCnzAV2
 c0O1ZNQ5ljSJHflDQEeu9cM6VGpb9UWozWOKwS23MLnObGN1DvzhQ9zLR9vKd5bh
 ErqSAuFpeWS53UvXtrbAR72U5Hdy+hCz9syYQkubpjCxvKl9yA9n1c0eevcfjO/6
 epzABtW+UVRfm7zHRFEarqy9rkLn9e35Oo1nn5+owwglCtCVAikIXQm/eGZcvV2e
 al2DoMXazSK92hbP/+R3MNdUgJ0PPv9CDpL++5jYtolmYszj1qPff8O2lhaPzTH/
 947aVNdlmfeBpbCwNACTQHAQGw6KEta6a9bnr/FXF4ruSJTy/YTq0aa611MsJ532
 jCcGB76HHefk0WHsOFaZCG75TWsacM1uohwZ6wZ73x2xl+2BunWKGKnac7wb0hnQ
 JDikhiEOA6J2hAltRutQTHcKoKCdFOzGWpYmh+86AefMOzUbW5UKVEsT/cTi0nBb
 03m9M5FiHgDCFD9HJPpJwDYIuntbbmzXVHGZx5afgxCsfnE33Pi3vPI4GOZw9eY1
 HBKapZM2jfeC9ptNkePlA8Il51DtAwBi+McL9DpfbcSNLJRRaBJMedx8uZU92CWY
 fkbSwdhhEob1rQ==
 =13qD
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to match the CSR ASID masking rules when passing ASIDs to
   firmware

 - Force GCC to use ISA 2.2, to avoid a host of compatibily issues
   between toolchains

* tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Handle zicsr/zifencei issues between clang and binutils
  riscv: mm: Fix incorrect ASID argument when flushing TLB
2023-03-24 09:52:26 -07:00
Nathan Chancellor
e89c2e815e
riscv: Handle zicsr/zifencei issues between clang and binutils
There are two related issues that appear in certain combinations with
clang and GNU binutils.

The first occurs when a version of clang that supports zicsr or zifencei
via '-march=' [1] (i.e, >= 17.x) is used in combination with a version
of GNU binutils that do not recognize zicsr and zifencei in the
'-march=' value (i.e., < 2.36):

  riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
  riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o
  riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
  riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o

The second occurs when a version of clang that does not support zicsr or
zifencei via '-march=' (i.e., <= 16.x) is used in combination with a
version of GNU as that defaults to a newer ISA base spec, which requires
specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >=
2.38):

  ../arch/riscv/kernel/kexec_relocate.S: Assembler messages:
  ../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required
  clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)

This is the same issue addressed by commit 6df2a016c0 ("riscv: fix
build with binutils 2.38") (see [2] for additional information) but
older versions of clang miss out on it because the cc-option check
fails:

  clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
  clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'

To resolve the first issue, only attempt to add zicsr and zifencei to
the march string when using the GNU assembler 2.38 or newer, which is
when the default ISA spec was updated, requiring these extensions to be
specified explicitly. LLVM implements an older version of the base
specification for all currently released versions, so these instructions
are available as part of the 'i' extension. If LLVM's implementation is
updated in the future, a CONFIG_AS_IS_LLVM condition can be added to
CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI.

To resolve the second issue, use version 2.2 of the base ISA spec when
using an older version of clang that does not support zicsr or zifencei
via '-march=', as that is the spec version most compatible with the one
clang/LLVM implements and avoids the need to specify zicsr and zifencei
explicitly due to still being a part of 'i'.

[1]: 22e199e6af
[2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/

Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1808
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 12:52:49 -07:00
Dylan Jhong
9a801afd3e
riscv: mm: Fix incorrect ASID argument when flushing TLB
Currently, we pass the CONTEXTID instead of the ASID to the TLB flush
function. We should only take the ASID field to prevent from touching
the reserved bit field.

Fixes: 3f1e782998 ("riscv: add ASID-based tlbflushing methods")
Signed-off-by: Dylan Jhong <dylan@andestech.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Link: https://lore.kernel.org/r/20230313034906.2401730-1-dylan@andestech.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-21 15:55:19 -07:00
Linus Torvalds
cb80b960ce RISC-V Fixes for 6.3-rc3
* A pair of fixes to the ASID allocator to avoid leaking stale mappings
   between tasks.
 * A fix to the vmalloc fault handler to tolerate huge pages.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQUf1cTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiafvD/4ixaHUMYFBBsw0Vo2kXaILmBYNZOmz
 KoAHykqlg4TRZ0xtOK/iLcSsiDXVVbI91iBeKjrwOiJ2+Sk4gDm01JMhOK6eJh4I
 boQAoRNgUBJLiKp7ZlybJ3R8yXw4VkKK0lJKNd9zOko+76Z8cQitsiwliWQwnpJw
 jtKpzYZ8Plxki+0jUt7/21FUF0sy1UspgFTQdV6XfBGtIqVuVNgRLK4emjrKxl7s
 fpkvQfD9ZPCuCNqg42o9VULK8fQfQSi5jt9POrGVKg7EaEHb7NfxttWxu/VkMBoI
 cTa9zNSM4DYfmubOTqPoE4MxxmY294vii2JnoimQPDWlT9gGRD5Puf/rmm420cUE
 yhsl4HdurDBRw3608pIfXWl9pTBo/doFImrQfY/IuGlR6Jy632NFFdPXa0vA/RoM
 JBpAVJrUGRRo6w5B+GM5XVpxQNiBtMtGSVYNG2185Gtszlw6CebG31Da39kBPr2O
 G/QFTVaZJnlHVqEJwOm/7TuYM/8u+uT6eiuYiRBcHImOIleUJPGYnDfG+dav3nln
 E4DXBref4ikAZX794rEQnB6Ayt3Hl1E5lZ9HA+sezMNwv2zhT9rYAgF+oM8/A6FV
 3JxcBmkNj3lqKzwNK85YOHE7us/5u+PY7HPrUngC7iORvh2wSh+AVfiu7mXdhrWD
 e6NwgE4EoZOgqw==
 =A1sl
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - fixes to the ASID allocator to avoid leaking stale mappings between
   tasks

 - fix the vmalloc fault handler to tolerate huge pages

* tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: mm: Support huge page in vmalloc_fault()
  riscv: asid: Fixup stale TLB entry cause application crash
  Revert "riscv: mm: notify remote harts about mmu cache updates"
2023-03-17 10:33:33 -07:00
Rajnesh Kanwal
6eff380489 riscv/kvm: Fix VM hang in case of timer delta being zero.
In case when VCPU is blocked due to WFI, we schedule the timer
from `kvm_riscv_vcpu_timer_blocking()` to keep timer interrupt
ticking.

But in case when delta_ns comes to be zero, we never schedule
the timer and VCPU keeps sleeping indefinitely until any activity
is done with VM console.

This is easily reproduce-able using kvmtool.
./lkvm-static run -c1 --console virtio -p "earlycon root=/dev/vda" \
         -k ./Image -d rootfs.ext4

Also, just add a print in kvm_riscv_vcpu_vstimer_expired() to
check the interrupt delivery and run `top` or similar auto-upating
cmd from guest. Within sometime one can notice that print from
timer expiry routine stops and the `top` cmd output will stop
updating.

This change fixes this by making sure we schedule the timer even
with delta_ns being zero to bring the VCPU out of sleep immediately.

Fixes: 8f5cb44b1b ("RISC-V: KVM: Support sstc extension")
Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-03-17 13:32:54 +05:30
Dylan Jhong
47dd902aae
RISC-V: mm: Support huge page in vmalloc_fault()
Since RISC-V supports ioremap() with huge page (pud/pmd) mapping,
However, vmalloc_fault() assumes that the vmalloc range is limited
to pte mappings. To complete the vmalloc_fault() function by adding
huge page support.

Fixes: 310f541a02 ("riscv: Enable HAVE_ARCH_HUGE_VMAP for 64BIT")
Cc: stable@vger.kernel.org
Signed-off-by: Dylan Jhong <dylan@andestech.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230310075021.3919290-1-dylan@andestech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 19:15:34 -07:00
Linus Torvalds
55a21105ec RISC-V Fixes for 6.3-rc2
* RISC-V architecture-specific ELF attributes have been disabled in the
   kernel builds.
 * A fix for a locking failure while during errata patching that
   manifests on SiFive-based systems.
 * A fix for a KASAN failure during stack unwinding.
 * A fix for some lockdep failures during text patching.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQLXKsTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiTd6D/9QxHDNOEiyrY7bSLkbeLiRvASXyMyx
 /oKT8fmfsaikJngM/ZttQOiCEqPUDdeFfw0Mg3V9WxvrUsRtKDtfRN+pTUirFKpB
 yWMAHuQ69QaWUJqaxfRa6JhyjgbnOCp3t/8RM/TywU3O4QG2MWbPg8QENnxtDAUN
 X4zNis6xx19PCT2508irZNGysgIqlIRcDRJ/x85LQXIQBRREAnNTojgepwUxfDeZ
 E4iPjlPU/i0eFeqpZkd3vZjjpkAZ91VfqCqmJTKv7JGKg0xeKM4z+EGlhNiS8odF
 W9YS6Sf1wCa5smhq4vTF4PGEpineK+KrdJ5lPZXLKbbNtv/c1YNVSXBdi88DsWEn
 jYnptL5mKWXAAIdOKdP50LjmiwDQP3BCqR+Ck9HYvPEST65QRwYC2pvReszsX5Bu
 guiNSFuh0eEszu6VqJCFjrPKxLGODi0Ug2XeJd5NQ2sd7OkkrUP0wuJaYbn+xnUb
 RWJNZY5jpKf+euPuJd5VgPiiOiOE+7gfG0X3bOB27f2OJPS4BFT8Z5NTzIx3qUet
 I7hwuXHhNCCl+loczljNnwDZO1g4ktAvOjrRm42MZ2ERyAAvso9UiaI+zdciATP7
 UgWvxybQc8oe+XAPpsnr5UwBl3Hy9o1ELXBY1avV8b5XQy7tAIP5eQW4R7nNYxM7
 DvNBcO8rOEFx0w==
 =goS0
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - RISC-V architecture-specific ELF attributes have been disabled in the
   kernel builds

 - A fix for a locking failure while during errata patching that
   manifests on SiFive-based systems

 - A fix for a KASAN failure during stack unwinding

 - A fix for some lockdep failures during text patching

* tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Don't check text_mutex during stop_machine
  riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
  RISC-V: fix taking the text_mutex twice during sifive errata patching
  RISC-V: Stop emitting attributes
2023-03-10 09:19:30 -08:00
Palmer Dabbelt
9b7fef255c
Merge patch series "riscv: asid: switch to alternative way to fix stale TLB entries"
Sergey Matyukevich <geomatsi@gmail.com> says:

Some time ago two different patches have been posted to fix stale TLB
entries that caused applications crashes.

The patch [0] suggested 'aggregating' mm_cpumask, i.e. current cpu is not
cleared for the switched-out task in switch_mm function. For additional
explanations see the commit message by Guo Ren. The same approach is
used by arc architecture, so another good comment is for switch_mm
in arch/arc/include/asm/mmu_context.h.

The patch [1] attempted to reduce the number of TLB flushes by deferring
(and possibly avoiding) them for CPUs not running the task.

Patch [1] has been merged. However we already have two bug reports from
different vendors. So apparently something is missing in the approach
suggested in [1]. In both cases the patch [0] fixed the issue.

This patch series reverts [1] and replaces it by [0].

[0] https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
[1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/

* b4-shazam-merge:
  riscv: asid: Fixup stale TLB entry cause application crash
  Revert "riscv: mm: notify remote harts about mmu cache updates"

Link: https://lore.kernel.org/r/20230226150137.1919750-1-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 15:22:05 -08:00
Guo Ren
82dd33fde0
riscv: asid: Fixup stale TLB entry cause application crash
After use_asid_allocator is enabled, the userspace application will
crash by stale TLB entries. Because only using cpumask_clear_cpu without
local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
Then set_mm_asid would cause the user space application to get a stale
value by stale TLB entry, but set_mm_noasid is okay.

Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
   0x0000003fd6d22524 <+4>:     auipc   s0,0x70
   0x0000003fd6d22528 <+8>:     ld      s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>:    ld      a5,0(s0)
(gdb) i r s0
s0          0x8082ed1cc3198b21       0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490:   0xd80ac8a8      0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in TLB and got a wrong result from an incorrect physical address.

When the task ran on CPU0, which loaded/speculative-loaded the value of
address(0x3fd6d92490), then the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
asid), it happened to write a value on the address (0x3fd6d92490). It
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
switch_mm. So we only flushed the CPU1 TLB and set the second version
mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
still used a stale TLB mapping entry which contained a wrong target
physical address. It raised a bug when the task happened to read that
value.

   CPU0                               CPU1
   - switch 'task' in
   - read addr (Fill stale mapping
     entry into TLB)
   - switch 'task' out (no tlb_flush)
                                      - switch 'task' in (no tlb_flush)
                                      - write addr cause pagefault
                                        do_page_fault() (change to
                                        new addr mapping)
                                          wp_page_copy()
                                            ptep_clear_flush()
                                              ptep_get_and_clear()
                                              & flush_tlb_page()
                                        write new value into addr
                                      - switch 'task' out (no tlb_flush)
   - switch 'task' in (no tlb_flush)
   - read addr again (Use stale
     mapping entry in TLB)
     get wrong value from old phyical
     addr, BUG!

The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
which could guarantee to invalidate all stale TLB entries during TLB
flush.

Fixes: 65d4b9c530 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Zong Li <zong.li@sifive.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230226150137.1919750-3-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 15:22:02 -08:00
Sergey Matyukevich
e921050022
Revert "riscv: mm: notify remote harts about mmu cache updates"
This reverts the remaining bits of commit 4bd1d80efb ("riscv: mm:
notify remote harts harts about mmu cache updates").

According to bug reports, suggested approach to fix stale TLB entries
is not sufficient. It needs to be replaced by a more robust solution.

Fixes: 4bd1d80efb ("riscv: mm: notify remote harts about mmu cache updates")
Reported-by: Zong Li <zong.li@sifive.com>
Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: stable@vger.kernel.org
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230226150137.1919750-2-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 15:22:01 -08:00
Conor Dooley
2a8db5ec4a
RISC-V: Don't check text_mutex during stop_machine
We're currently using stop_machine() to update ftrace & kprobes, which
means that the thread that takes text_mutex during may not be the same
as the thread that eventually patches the code.  This isn't actually a
race because the lock is still held (preventing any other concurrent
accesses) and there is only one thread running during stop_machine(),
but it does trigger a lockdep failure.

This patch just elides the lockdep check during stop_machine.

Fixes: c15ac4fd60 ("riscv/ftrace: Add dynamic function tracer support")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 14:58:51 -08:00
Alexandre Ghiti
76950340cf
riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
When CONFIG_FRAME_POINTER is unset, the stack unwinding function
walk_stackframe randomly reads the stack and then, when KASAN is enabled,
it can lead to the following backtrace:

[    0.000000] ==================================================================
[    0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0xa6/0x11a
[    0.000000] Read of size 8 at addr ffffffff81807c40 by task swapper/0
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-12919-g24203e6db61f #43
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] Call Trace:
[    0.000000] [<ffffffff80007ba8>] walk_stackframe+0x0/0x11a
[    0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff80c49c80>] dump_stack_lvl+0x22/0x36
[    0.000000] [<ffffffff80c3783e>] print_report+0x198/0x4a8
[    0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff8015f68a>] kasan_report+0x9a/0xc8
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff8006e99c>] desc_make_final+0x80/0x84
[    0.000000] [<ffffffff8009a04e>] stack_trace_save+0x88/0xa6
[    0.000000] [<ffffffff80099fc2>] filter_irq_stacks+0x72/0x76
[    0.000000] [<ffffffff8006b95e>] devkmsg_read+0x32a/0x32e
[    0.000000] [<ffffffff8015ec16>] kasan_save_stack+0x28/0x52
[    0.000000] [<ffffffff8006e998>] desc_make_final+0x7c/0x84
[    0.000000] [<ffffffff8009a04a>] stack_trace_save+0x84/0xa6
[    0.000000] [<ffffffff8015ec52>] kasan_set_track+0x12/0x20
[    0.000000] [<ffffffff8015f22e>] __kasan_slab_alloc+0x58/0x5e
[    0.000000] [<ffffffff8015e7ea>] __kmem_cache_create+0x21e/0x39a
[    0.000000] [<ffffffff80e133ac>] create_boot_cache+0x70/0x9c
[    0.000000] [<ffffffff80e17ab2>] kmem_cache_init+0x6c/0x11e
[    0.000000] [<ffffffff80e00fd6>] mm_init+0xd8/0xfe
[    0.000000] [<ffffffff80e011d8>] start_kernel+0x190/0x3ca
[    0.000000]
[    0.000000] The buggy address belongs to stack of task swapper/0
[    0.000000]  and is located at offset 0 in frame:
[    0.000000]  stack_trace_save+0x0/0xa6
[    0.000000]
[    0.000000] This frame has 1 object:
[    0.000000]  [32, 56) 'c'
[    0.000000]
[    0.000000] The buggy address belongs to the physical page:
[    0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x81a07
[    0.000000] flags: 0x1000(reserved|zone=0)
[    0.000000] raw: 0000000000001000 ff600003f1e3d150 ff600003f1e3d150 0000000000000000
[    0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff
[    0.000000] page dumped because: kasan: bad access detected
[    0.000000]
[    0.000000] Memory state around the buggy address:
[    0.000000]  ffffffff81807b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000]  ffffffff81807b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000] >ffffffff81807c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3
[    0.000000]                                            ^
[    0.000000]  ffffffff81807c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000]  ffffffff81807d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000] ==================================================================

Fix that by using READ_ONCE_NOCHECK when reading the stack in imprecise
mode.

Fixes: 5d8544e2d0 ("RISC-V: Generic library routines and assembly")
Reported-by: Chathura Rajapaksha <chathura.abeyrathne.lk@gmail.com>
Link: https://lore.kernel.org/all/CAD7mqryDQCYyJ1gAmtMm8SASMWAQ4i103ptTb0f6Oda=tPY2=A@mail.gmail.com/
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230308091639.602024-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 14:50:35 -08:00
Linus Torvalds
44889ba56c Networking fixes for 6.3-rc2, including fixes from netfilter, bpf
Current release - regressions:
 
   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()
 
   - sched:
     - act_connmark: handle errno on tcf_idr_check_alloc
     - flower: fix fl_change() error recovery path
 
   - ieee802154: prevent user from crashing the host
 
 Current release - new code bugs:
 
   - eth: bnxt_en: fix the double free during device removal
 
   - tools: ynl:
     - fix enum-as-flags in the generic CLI
     - fully inherit attrs in subsets
     - re-license uniformly under GPL-2.0 or BSD-3-clause
 
 Previous releases - regressions:
 
   - core: use indirect calls helpers for sk_exit_memory_pressure()
 
   - tls:
     - fix return value for async crypto
     - avoid hanging tasks on the tx_lock
 
   - eth: ice: copy last block omitted in ice_get_module_eeprom()
 
 Previous releases - always broken:
 
   - core: avoid double iput when sock_alloc_file fails
 
   - af_unix: fix struct pid leaks in OOB support
 
   - tls:
     - fix possible race condition
     - fix device-offloaded sendpage straddling records
 
   - bpf:
     - sockmap: fix an infinite loop error
     - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
     - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
 
   - netfilter: tproxy: fix deadlock due to missing BH disable
 
   - phylib: get rid of unnecessary locking
 
   - eth: bgmac: fix *initial* chip reset to support BCM5358
 
   - eth: nfp: fix csum for ipsec offload
 
   - eth: mtk_eth_soc: fix RX data corruption issue
 
 Misc:
 
   - usb: qmi_wwan: add telit 0x1080 composition
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmQJzQISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOky5YP/04Dbsbeqpk0Q94axmjoaS0J/4rW49js
 RaA7v8ci7sL1omW8k5tILPXniAouN4YHNOCW1KbLBMR5O7lyn9qM1RteHOIpOmte
 TLzAw+6Wl7CyGiiqirv2GU96Wd/jZoZpPXFZz/gXP59GnkChSHzQcpexmz0nrmxI
 eCRSs+qm+re3wmDKTYm5C+g+420PNXu9JItPnTNf+nTkTBxpmOEMyry03I0taXKS
 wceQHB2q5E0sSWXDfkxG/pmUuYTj3AdRSQ+vo+FLevSs/LWeThs2I6pT5sn8XS76
 1S8Lh6FytfBhyalFmRtrpqIJYyGae5MwEXQ29ddfmF4OFFLedx3IH0+JFQxTE9So
 i4gaXmM5SUI7c5vhib097xUISoLxKqqXQVQQSQ1MPZRfXtVubbA2gCv+vh6fXVoj
 zQYatZOLM7KT9q4Pw8A+9bJPof/FV+ObC67pbGQbJJgBoy+oOixDuP+x5DYT384L
 /5XS+23OZiFe7bvQoE/0SQMeRk3lF2XkS5l9gSbdSnGQPiaOqKhDgkoCmdkn1jvg
 qtkBS6+tRRoOBNsjC4r4eFXBVOQ1+myyjZetBnEOaSp22FaTJFQh9qX3AMFIHbUy
 m0jDi9OJZSWHICd6KNWPm3JK43cMjiyZbGftYqOHhuY5HN30vQN6sl7DXIJ0rIcE
 myHMfizwqmGT
 =hSXM
 -----END PGP SIGNATURE-----

Merge tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()

   - sched:
      - act_connmark: handle errno on tcf_idr_check_alloc
      - flower: fix fl_change() error recovery path

   - ieee802154: prevent user from crashing the host

  Current release - new code bugs:

   - eth: bnxt_en: fix the double free during device removal

   - tools: ynl:
      - fix enum-as-flags in the generic CLI
      - fully inherit attrs in subsets
      - re-license uniformly under GPL-2.0 or BSD-3-clause

  Previous releases - regressions:

   - core: use indirect calls helpers for sk_exit_memory_pressure()

   - tls:
      - fix return value for async crypto
      - avoid hanging tasks on the tx_lock

   - eth: ice: copy last block omitted in ice_get_module_eeprom()

  Previous releases - always broken:

   - core: avoid double iput when sock_alloc_file fails

   - af_unix: fix struct pid leaks in OOB support

   - tls:
      - fix possible race condition
      - fix device-offloaded sendpage straddling records

   - bpf:
      - sockmap: fix an infinite loop error
      - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
      - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR

   - netfilter: tproxy: fix deadlock due to missing BH disable

   - phylib: get rid of unnecessary locking

   - eth: bgmac: fix *initial* chip reset to support BCM5358

   - eth: nfp: fix csum for ipsec offload

   - eth: mtk_eth_soc: fix RX data corruption issue

  Misc:

   - usb: qmi_wwan: add telit 0x1080 composition"

* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  tools: ynl: fix enum-as-flags in the generic CLI
  tools: ynl: move the enum classes to shared code
  net: avoid double iput when sock_alloc_file fails
  af_unix: fix struct pid leaks in OOB support
  eth: fealnx: bring back this old driver
  net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
  net: microchip: sparx5: fix deletion of existing DSCP mappings
  octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
  net/smc: fix fallback failed while sendmsg with fastopen
  ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
  mailmap: update entries for Stephen Hemminger
  mailmap: add entry for Maxim Mikityanskiy
  nfc: change order inside nfc_se_io error path
  ethernet: ice: avoid gcc-9 integer overflow warning
  ice: don't ignore return codes in VSI related code
  ice: Fix DSCP PFC TLV creation
  net: usb: qmi_wwan: add Telit 0x1080 composition
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
  netfilter: conntrack: adopt safer max chain length
  net: tls: fix device-offloaded sendpage straddling records
  ...
2023-03-09 10:56:58 -08:00
Conor Dooley
bf89b7ee52
RISC-V: fix taking the text_mutex twice during sifive errata patching
Chris pointed out that some bonehead, *cough* me *cough*, added two
mutex_locks() to the SiFive errata patching. The second was meant to
have been a mutex_unlock().

This results in errors such as

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
Oops [#1]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted
6.2.0-rc1-starlight-00079-g9493e6f3ce02 #229
Hardware name: BeagleV Starlight Beta (DT)
epc : __schedule+0x42/0x500
 ra : schedule+0x46/0xce
epc : ffffffff8065957c ra : ffffffff80659a80 sp : ffffffff81203c80
 gp : ffffffff812d50a0 tp : ffffffff8120db40 t0 : ffffffff81203d68
 t1 : 0000000000000001 t2 : 4c45203a76637369 s0 : ffffffff81203cf0
 s1 : ffffffff8120db40 a0 : 0000000000000000 a1 : ffffffff81213958
 a2 : ffffffff81213958 a3 : 0000000000000000 a4 : 0000000000000000
 a5 : ffffffff80a1bd00 a6 : 0000000000000000 a7 : 0000000052464e43
 s2 : ffffffff8120db41 s3 : ffffffff80a1ad00 s4 : 0000000000000000
 s5 : 0000000000000002 s6 : ffffffff81213938 s7 : 0000000000000000
 s8 : 0000000000000000 s9 : 0000000000000001 s10: ffffffff812d7204
 s11: ffffffff80d3c920 t3 : 0000000000000001 t4 : ffffffff812e6dd7
 t5 : ffffffff812e6dd8 t6 : ffffffff81203bb8
status: 0000000200000100 badaddr: 0000000000000030 cause: 000000000000000d
[<ffffffff80659a80>] schedule+0x46/0xce
[<ffffffff80659dce>] schedule_preempt_disabled+0x16/0x28
[<ffffffff8065ae0c>] __mutex_lock.constprop.0+0x3fe/0x652
[<ffffffff8065b138>] __mutex_lock_slowpath+0xe/0x16
[<ffffffff8065b182>] mutex_lock+0x42/0x4c
[<ffffffff8000ad94>] sifive_errata_patch_func+0xf6/0x18c
[<ffffffff80002b92>] _apply_alternatives+0x74/0x76
[<ffffffff80802ee8>] apply_boot_alternatives+0x3c/0xfa
[<ffffffff80803cb0>] setup_arch+0x60c/0x640
[<ffffffff80800926>] start_kernel+0x8e/0x99c
---[ end trace 0000000000000000 ]---

Reported-by: Chris Hofstaedtler <zeha@debian.org>
Fixes: 9493e6f3ce ("RISC-V: take text_mutex during alternative patching")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230302174154.970746-1-conor@kernel.org
[Palmer: pick up Geert's bug report from the thread]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-07 12:39:12 -08:00
Jakub Kicinski
757b56a6c7 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZZ1wAKCRDbK58LschI
 g4fcAQDYVsICeBDmhdBdZs7Kb91/s6SrU6B0jy4zs0gOIBBOhgD7B3jt3dMTD2tp
 rPLHlv6uUoYS7mbZsrZi/XjVw8UmewM=
 =VUnr
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-03-06

We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 9 files changed, 64 insertions(+), 18 deletions(-).

The main changes are:

1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier,
   that is, keep resolving such instances instead of bailing out,
   from Lorenz Bauer.

2) Fix BPF test framework with regards to xdp_frame info misplacement
   in the "live packet" code, from Alexander Lobakin.

3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX,
   from Liu Jian.

4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n,
   from Randy Dunlap.

5) Several BPF doc fixes with either broken links or external instead
   of internal doc links, from Bagas Sanjaya.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: check that modifier resolves after pointer
  btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
  bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
  bpf, doc: Link to submitting-patches.rst for general patch submission info
  bpf, doc: Do not link to docs.kernel.org for kselftest link
  bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
  riscv, bpf: Fix patch_text implicit declaration
  bpf, docs: Fix link to BTF doc
====================

Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06 20:28:00 -08:00
Palmer Dabbelt
e18048da9b
RISC-V: Stop emitting attributes
The RISC-V ELF attributes don't contain any useful information.  New
toolchains ignore them, but they frequently trip up various older/mixed
toolchains.  So just turn them off.

Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230223224605.6995-1-palmer@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-06 15:55:25 -08:00
Linus Torvalds
1a8d05a726 VM_FAULT_RETRY fixes
Some of the page fault handlers do not deal with the following case
 correctly:
 	* handle_mm_fault() has returned VM_FAULT_RETRY
 	* there is a pending fatal signal
 	* fault had happened in kernel mode
 Correct action in such case is not "return unconditionally" - fatal
 signals are handled only upon return to userland and something like
 copy_to_user() would end up retrying the faulting instruction and
 triggering the same fault again and again.
 
 What we need to do in such case is to make the caller to treat that
 as failed uaccess attempt - handle exception if there is an exception
 handler for faulting instruction or oops if there isn't one.
 
 Over the years some architectures had been fixed and now are handling
 that case properly; some still do not.  This series should fix the
 remaining ones.
 
 Status:
 	m68k, riscv, hexagon, parisc: tested/acked by maintainers.
 	alpha, sparc32, sparc64: tested locally - bug has been
 reproduced on the unpatched kernel and verified to be fixed by
 this series.
 	ia64, microblaze, nios2, openrisc: build, but otherwise
 completely untested.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZAP54AAKCRBZ7Krx/gZQ
 6/IjAP92oS8kDuodAtT7WorV+GETWr2HFKfvDiPSmEGiSnXIigD9Hj9svXyeeAgl
 /TqGR50Yrvn3IUQ0A0bUDpBAG1qyVwY=
 =9+Bd
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VM_FAULT_RETRY fixes from Al Viro:
 "Some of the page fault handlers do not deal with the following case
  correctly:

   - handle_mm_fault() has returned VM_FAULT_RETRY

   - there is a pending fatal signal

   - fault had happened in kernel mode

  Correct action in such case is not "return unconditionally" - fatal
  signals are handled only upon return to userland and something like
  copy_to_user() would end up retrying the faulting instruction and
  triggering the same fault again and again.

  What we need to do in such case is to make the caller to treat that as
  failed uaccess attempt - handle exception if there is an exception
  handler for faulting instruction or oops if there isn't one.

  Over the years some architectures had been fixed and now are handling
  that case properly; some still do not. This series should fix the
  remaining ones.

  Status:

   - m68k, riscv, hexagon, parisc: tested/acked by maintainers.

   - alpha, sparc32, sparc64: tested locally - bug has been reproduced
     on the unpatched kernel and verified to be fixed by this series.

   - ia64, microblaze, nios2, openrisc: build, but otherwise completely
     untested"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  openrisc: fix livelock in uaccess
  nios2: fix livelock in uaccess
  microblaze: fix livelock in uaccess
  ia64: fix livelock in uaccess
  sparc: fix livelock in uaccess
  alpha: fix livelock in uaccess
  parisc: fix livelock in uaccess
  hexagon: fix livelock in uaccess
  riscv: fix livelock in uaccess
  m68k: fix livelock in uaccess
2023-03-05 11:07:58 -08:00
Linus Torvalds
bf1a1bad82 RISC-V Patches for the 6.3 Merge Window, Part 2
* Some cleanups and fixes for the Zbb-optimized string routines.
 * Support for custom (vendor or implementation defined) perf events.
 * COMMAND_LINE_SIZE has been increased to 1024.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmQBuDMTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYie2nEACDnj2w35BYDZDrbOiodXw1bA60IpJG
 5Q3bZEIWimaoDluD52UxrLhAZgJ3imGHF2nLLevqqsqAWKT2aRxsiEtgqARN0GzW
 Am2+ZBdMuo1oCOHhIzCI9nk1xMG1fjva1ZAhGcVK876W3NuLUDvb1+LeDcwZB5bX
 8KUu1haILIlngd8xOZE88xJOw83uQaF1Oc1NLqpQabbCiWxFBAIzI3O/7ZyuVk4M
 LrdD5zH7YBUpksol0CxWaJ/jI59t23na2421NKNm34zALsvVFSKc2CZ0+r5DFy3D
 4bzMdzXmtDjWfSCTGicGk03acrxUwgpAkFH/y2eJzhQl1WWrO/FiUzlI2uKoHou9
 MO+fxPHcOamnWH5OF8jzP9+v5fkKIX2eYWJl5E3Jge6KU8D1Y2cPbE/k75cOq70Y
 BVF+BUnQ26UdqYH0POFqikizF7dvt6G+rNpUnFrufc6xs+BALxfyIHRQnTw8dQu4
 FbZRLX9YNn8/TpwcfnzIzFk1CH35n8QPBvcumIiV/344Gs7nt5vyDO7b//ExS4IL
 mrENDNdaQ/u8T6xkhGSWPPMwNAj3OFpLRi00BInYvRXx9/0WArD1Z5N9uMl42dqd
 kB060nUH1Ao7aBQJztqZVDOYwyFhREGM+xLRBLE+CiD8EnVvm30Fhovq2FrFfbif
 GPHt+ouKJtlkTw==
 =68ZK
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Some cleanups and fixes for the Zbb-optimized string routines

 - Support for custom (vendor or implementation defined) perf events

 - COMMAND_LINE_SIZE has been increased to 1024

* tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Bump COMMAND_LINE_SIZE value to 1024
  drivers/perf: RISC-V: Allow programming custom firmware events
  riscv, lib: Fix Zbb strncmp
  RISC-V: improve string-function assembly
2023-03-03 09:32:51 -08:00
Al Viro
d835eb3a57 riscv: fix livelock in uaccess
riscv equivalent of 26178ec11e "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables.  In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Tested-by: Björn Töpel <bjorn@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-03-02 12:30:15 -05:00
Alexandre Ghiti
61fc1ee8be
riscv: Bump COMMAND_LINE_SIZE value to 1024
Increase COMMAND_LINE_SIZE as the current default value is too low
for syzbot kernel command line.

There has been considerable discussion on this patch that has led to a
larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all
ports.  That's not quite done yet, but it's gotten far enough we're
confident this is not a uABI change so this is safe.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr
[Palmer: it's not uabi]
Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-01 18:35:43 -08:00
Björn Töpel
81a1dd10b0
riscv, lib: Fix Zbb strncmp
The Zbb optimized strncmp has two parts; a fast path that does XLEN/8B
per iteration, and a slow that does one byte per iteration.

The idea is to compare aligned XLEN chunks for most of strings, and do
the remainder tail in the slow path.

The Zbb strncmp has two issues in the fast path:

Incorrect remainder handling (wrong compare): Assume that the string
length is 9. On 64b systems, the fast path should do one iteration,
and one iteration in the slow path. Instead, both were done in the
fast path, which lead to incorrect results. An example:

  strncmp("/dev/vda", "/dev/", 5);

Correct by changing "bgt" to "bge".

Missing NULL checks in the second string: This could lead to incorrect
results for:

  strncmp("/dev/vda", "/dev/vda\0", 8);

Correct by adding an additional check.

Fixes: b6fcdb191e ("RISC-V: add zbb support to string functions")
Suggested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230228184211.1585641-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-28 18:42:38 -08:00
Heiko Stuebner
6934cf8a3e
RISC-V: improve string-function assembly
Adapt the suggestions for the assembly string functions that Andrew
suggested but that I didn't manage to include into the series that
got applied.

This includes improvements to two comments, removal of unneeded labels
and moving one instruction slightly higher to contradict an
explanatory comment.

Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-28 08:00:34 -08:00
Randy Dunlap
2d311f480b riscv, bpf: Fix patch_text implicit declaration
bpf_jit_comp64.c uses patch_text(), so add <asm/patch.h> to it
to prevent the build error:

../arch/riscv/net/bpf_jit_comp64.c: In function 'bpf_arch_text_poke':
../arch/riscv/net/bpf_jit_comp64.c:691:23: error: implicit declaration of function 'patch_text'; did you mean 'path_get'? [-Werror=implicit-function-declaration]
  691 |                 ret = patch_text(ip, new_insns, ninsns);
      |                       ^~~~~~~~~~

Fixes: 596f2e6f9c ("riscv, bpf: Add bpf_arch_text_poke support for RV64")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/202302271000.Aj4nMXbZ-lkp@intel.com
Link: https://lore.kernel.org/bpf/20230227072016.14618-1-rdunlap@infradead.org
2023-02-27 22:14:33 +01:00
Linus Torvalds
49d5759268 ARM:
- Provide a virtual cache topology to the guest to avoid
   inconsistencies with migration on heterogenous systems. Non secure
   software has no practical need to traverse the caches by set/way in
   the first place.
 
 - Add support for taking stage-2 access faults in parallel. This was an
   accidental omission in the original parallel faults implementation,
   but should provide a marginal improvement to machines w/o FEAT_HAFDBS
   (such as hardware from the fruit company).
 
 - A preamble to adding support for nested virtualization to KVM,
   including vEL2 register state, rudimentary nested exception handling
   and masking unsupported features for nested guests.
 
 - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
   resuming a CPU when running pKVM.
 
 - VGIC maintenance interrupt support for the AIC
 
 - Improvements to the arch timer emulation, primarily aimed at reducing
   the trap overhead of running nested.
 
 - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
   interest of CI systems.
 
 - Avoid VM-wide stop-the-world operations when a vCPU accesses its own
   redistributor.
 
 - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
   in the host.
 
 - Aesthetic and comment/kerneldoc fixes
 
 - Drop the vestiges of the old Columbia mailing list and add [Oliver]
   as co-maintainer
 
 This also drags in arm64's 'for-next/sme2' branch, because both it and
 the PSCI relay changes touch the EL2 initialization code.
 
 RISC-V:
 
 - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
 
 - Correctly place the guest in S-mode after redirecting a trap to the guest
 
 - Redirect illegal instruction traps to guest
 
 - SBI PMU support for guest
 
 s390:
 
 - Two patches sorting out confusion between virtual and physical
   addresses, which currently are the same on s390.
 
 - A new ioctl that performs cmpxchg on guest memory
 
 - A few fixes
 
 x86:
 
 - Change tdp_mmu to a read-only parameter
 
 - Separate TDP and shadow MMU page fault paths
 
 - Enable Hyper-V invariant TSC control
 
 - Fix a variety of APICv and AVIC bugs, some of them real-world,
   some of them affecting architecurally legal but unlikely to
   happen in practice
 
 - Mark APIC timer as expired if its in one-shot mode and the count
   underflows while the vCPU task was being migrated
 
 - Advertise support for Intel's new fast REP string features
 
 - Fix a double-shootdown issue in the emergency reboot code
 
 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
   similar treatment to VMX
 
 - Update Xen's TSC info CPUID sub-leaves as appropriate
 
 - Add support for Hyper-V's extended hypercalls, where "support" at this
   point is just forwarding the hypercalls to userspace
 
 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
   MSR filters
 
 - One-off fixes and cleanups
 
 - Fix and cleanup the range-based TLB flushing code, used when KVM is
   running on Hyper-V
 
 - Add support for filtering PMU events using a mask.  If userspace
   wants to restrict heavily what events the guest can use, it can now
   do so without needing an absurd number of filter entries
 
 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
   support is disabled
 
 - Add PEBS support for Intel Sapphire Rapids
 
 - Fix a mostly benign overflow bug in SEV's send|receive_update_data()
 
 - Move several SVM-specific flags into vcpu_svm
 
 x86 Intel:
 
 - Handle NMI VM-Exits before leaving the noinstr region
 
 - A few trivial cleanups in the VM-Enter flows
 
 - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
   EPTP switching (or any other VM function) for L1
 
 - Fix a crash when using eVMCS's enlighted MSR bitmaps
 
 Generic:
 
 - Clean up the hardware enable and initialization flow, which was
   scattered around multiple arch-specific hooks.  Instead, just
   let the arch code call into generic code.  Both x86 and ARM should
   benefit from not having to fight common KVM code's notion of how
   to do initialization.
 
 - Account allocations in generic kvm_arch_alloc_vm()
 
 - Fix a memory leak if coalesced MMIO unregistration fails
 
 selftests:
 
 - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
   the correct hypercall instruction instead of relying on KVM to patch
   in VMMCALL
 
 - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
 mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
 9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
 duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
 VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
 /WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
 =goe1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2023-02-25 11:30:21 -08:00
Linus Torvalds
01687e7c93 RISC-V Patches for the 6.3 Merge Window, Part 1
There's a bunch of fixes/cleanups throughout the tree as usual, but we
 also have a handful of new features.
 
 * Various improvements to the extension detection and alternative
   patching infrastructure.
 * Zbb-optimized string routines.
 * Support for cpu-capacity in the RISC-V DT bindings.
 * Zicbom no longer depends on toolchain support.
 * Some performance and code size improvements to ftrace.
 * Support for ARCH_WANT_LD_ORPHAN_WARN.
 * Oops now contain the faulting instruction.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmP49coTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiTFnEACuQtGJWSwzH+ORswVyItKzqHcBMU3t
 rWHTFxQ7MdWgQO8nrWUtSypGY4n0DFTCe9w4H3tQFRDaTXbI+ycFjidEDt3eJCMb
 n6WiGuZpdKVS81CQ0Es4dTWQ1i/28fe1851CGK/PkybXdrPPofdCJ9k3Wepxflb/
 2UYxRDyjKt3KbJ2OmN2oF8Ek1rrsGhIC/Dhbdb2JsGZhYF10ZYjquaOLs31WbHMG
 O+n/N/JfZRAif1MDQ71ygAm9KV0kGqe/wcRtsJGETwJ8U3I/cjn2mAGd8BRdy4iL
 9GFmTmi8q27ntUbakikNz3b4aE9xVnLDvRIyOciI3l8rQjrFAsfnQbuRwlaq6BVJ
 BF3e6nAjkcLj23FhbROTlfncEOzrklbNZ+uQIuvyffAUjDoePw9x7o0r+qj7FnOY
 WMfNecJMeE5OGVBqHSVFEcAMlN6uYu6wqbEipEpc+8sTg+w1LM0bUVNhV86/BrnL
 bh+4+7MPYtg45vy2Y8AuPUBFqR2uCekDpbxciCEGsaIzUYRas2zrt9UkWGjKs1VV
 q0qeLSNNA1wBq+q6FprTceipFQIqD5KnmI2GMucF6v4YFg5AzeSOpRc6aeqcs7Z2
 +ApShSOFPjjntZbcpTgkvhrPExr0Jel0xY7YSazUUqY0xOHUwGNBEh/E4rzsRLxr
 qvUpFAIZT60dfQ==
 =XgYl
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:
 "There's a bunch of fixes/cleanups throughout the tree as usual, but we
  also have a handful of new features:

   - Various improvements to the extension detection and alternative
     patching infrastructure

   - Zbb-optimized string routines

   - Support for cpu-capacity in the RISC-V DT bindings

   - Zicbom no longer depends on toolchain support

   - Some performance and code size improvements to ftrace

   - Support for ARCH_WANT_LD_ORPHAN_WARN

   - Oops now contain the faulting instruction"

* tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (67 commits)
  RISC-V: add a spin_shadow_stack declaration
  riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
  riscv: Add header include guards to insn.h
  riscv: alternative: proceed one more instruction for auipc/jalr pair
  riscv: Avoid enabling interrupts in die()
  riscv, mm: Perform BPF exhandler fixup on page fault
  RISC-V: take text_mutex during alternative patching
  riscv: hwcap: Don't alphabetize ISA extension IDs
  RISC-V: fix ordering of Zbb extension
  riscv: jump_label: Fixup unaligned arch_static_branch function
  RISC-V: Only provide the single-letter extensions in HWCAP
  riscv: mm: fix regression due to update_mmu_cache change
  scripts/decodecode: Add support for RISC-V
  riscv: Add instruction dump to RISC-V splats
  riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
  riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
  riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
  riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
  riscv: lds: define RUNTIME_DISCARD_EXIT
  RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  ...
2023-02-25 11:14:08 -08:00
Linus Torvalds
a93e884edf Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
 
 There's a lot of changes this development cycle, most of the work falls
 into two different categories:
   - fw_devlink fixes and updates.  This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.
   - driver core changes to work to make struct bus_type able to be moved
     into read-only memory (i.e. const)  The recent work with Rust has
     pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only making
     things safer overall.  This is the contuation of that work (started
     last release with kobject changes) in moving struct bus_type to be
     constant.  We didn't quite make it for this release, but the
     remaining patches will be finished up for the release after this
     one, but the groundwork has been laid for this effort.
 
 Other than that we have in here:
   - debugfs memory leak fixes in some subsystems
   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.
   - cacheinfo rework and fixes
   - Other tiny fixes, full details are in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
 6oeFOjD3JDju3cQsfGgd
 =Su6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
Linus Torvalds
17cd4d6f05 TTY/Serial driver updates for 6.3-rc1
Here is the big set of serial and tty driver updates for 6.3-rc1.
 
 Once again, Jiri and Ilpo have done a number of core vt and tty/serial
 layer cleanups that were much needed and appreciated.  Other than that,
 it's just a bunch of little tty/serial driver updates:
   - qcom-geni-serial driver updates
   - liteuart driver updates
   - hvcs driver cleanups
   - n_gsm updates and additions for new features
   - more 8250 device support added
   - fpga/dfl update and additions
   - imx serial driver updates
   - fsl_lpuart updates
   - other tiny fixes and updates for serial drivers
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/itAw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykJbQCfWv/J4ZElO108iHBU5mJCDagUDBgAnAtLLN6A
 SEAnnokGCDtA/BNIXeES
 =luRi
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial driver updates from Greg KH:
 "Here is the big set of serial and tty driver updates for 6.3-rc1.

  Once again, Jiri and Ilpo have done a number of core vt and tty/serial
  layer cleanups that were much needed and appreciated. Other than that,
  it's just a bunch of little tty/serial driver updates:

   - qcom-geni-serial driver updates

   - liteuart driver updates

   - hvcs driver cleanups

   - n_gsm updates and additions for new features

   - more 8250 device support added

   - fpga/dfl update and additions

   - imx serial driver updates

   - fsl_lpuart updates

   - other tiny fixes and updates for serial drivers

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'tty-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (143 commits)
  tty: n_gsm: add keep alive support
  serial: imx: remove a redundant check
  dt-bindings: serial: snps-dw-apb-uart: add dma & dma-names properties
  soc: qcom: geni-se: Move qcom-geni-se.h to linux/soc/qcom/geni-se.h
  tty: n_gsm: add TIOCMIWAIT support
  tty: n_gsm: add RING/CD control support
  tty: n_gsm: mark unusable ioctl structure fields accordingly
  serial: imx: get rid of registers shadowing
  serial: imx: refine local variables in rxint()
  serial: imx: stop using USR2 in FIFO reading loop
  serial: imx: remove redundant USR2 read from FIFO reading loop
  serial: imx: do not break from FIFO reading loop prematurely
  serial: imx: do not sysrq broken chars
  serial: imx: work-around for hardware RX flood
  serial: imx: factor-out common code to imx_uart_soft_reset()
  serial: 8250_pci1xxxx: Add power management functions to quad-uart driver
  serial: 8250_pci1xxxx: Add RS485 support to quad-uart driver
  serial: 8250_pci1xxxx: Add driver for quad-uart support
  serial: 8250_pci: Add serial8250_pci_setup_port definition in 8250_pcilib.c
  tty: pcn_uart: fix memory leak with using debugfs_lookup()
  ...
2023-02-24 12:17:14 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00
Conor Dooley
eb9be8310c
RISC-V: add a spin_shadow_stack declaration
The patchwork automation reported a sparse complaint that
spin_shadow_stack was not declared and should be static:
../arch/riscv/kernel/traps.c:335:15: warning: symbol 'spin_shadow_stack' was not declared. Should it be static?

However, this is used in entry.S and therefore shouldn't be static.
The same applies to the shadow_stack that this pseudo spinlock is
trying to protect, so do like its charge and add a declaration to
thread_info.h

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Fixes: 7e1864332f ("riscv: fix race when vmap stack overflow")
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230210185945.915806-1-conor@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-22 06:48:30 -08:00
Linus Torvalds
5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Guo Ren
a3c7d6b642
riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
Add HVO support for RISC-V; see commit 6be24bed9d ("mm: hugetlb:
introduce a new config HUGETLB_PAGE_FREE_VMEMMAP"). This patch is
similar to commit 1e63ac088f ("arm64: mm: hugetlb: enable
HUGETLB_PAGE_FREE_VMEMMAP for arm64"), and riscv's motivation is the
same as arm64. The current riscv was ready to enable HVO after fixup,
ref commit d33deda095 ("riscv/mm: hugepage's PG_dcache_clean flag
is only set in head page").

See Documentation/mm/vmemmap_dedup.rst for more details.

The HugeTLB VmemmapvOptimization (HVO) defaults to off in Kconfig.

Here is the riscv test log:
cat /proc/sys/vm/hugetlb_optimize_vmemmap
echo 8 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
mount -t hugetlbfs none test/ -o pagesize=2048k
<Try some simple hugetlb test in test dir, no problem found.>

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/linux-riscv/1F5AF29D-708A-483B-A29F-CAEE6F554866@linux.dev/
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230201015259.3222524-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:28:12 -08:00
Liao Chang
8ac6e619d9
riscv: Add header include guards to insn.h
Add header include guards to insn.h to prevent repeating declaration of
any identifiers in insn.h.

Fixes: edde5584c7 ("riscv: Add SW single-step support for KDB")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Fixes: c9c1af3f18 ("RISC-V: rename parse_asm.h to insn.h")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230129094242.282620-1-liaochang1@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:22:54 -08:00
Palmer Dabbelt
b19aa282c5
Merge patch series "riscv: Dump faulting instructions in oops handler"
Björn Töpel <bjorn@kernel.org> says:

From: Björn Töpel <bjorn@rivosinc.com>

RISC-V does not dump faulting instructions in the oops handler. This
series adds "Code:" dumps to the oops output together with
scripts/decodecode support.

* b4-shazam-merge:
  scripts/decodecode: Add support for RISC-V
  riscv: Add instruction dump to RISC-V splats

Link: https://lore.kernel.org/r/20230119074738.708301-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:51 -08:00
Jisheng Zhang
91612cfb17
riscv: alternative: proceed one more instruction for auipc/jalr pair
If we patched auipc + jalr pair, we'd better proceed one more
instruction. Andrew pointed out "There's not a problem now, since
we're only adding a fixup for jal, not jalr, but we should
future-proof this and there's no reason to revisit an already fixed-up
instruction anyway."

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230115162811.3146-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:50 -08:00
Palmer Dabbelt
f3af3b0039
Merge patch series "riscv: improve link and support ARCH_WANT_LD_ORPHAN_WARN"
Jisheng Zhang <jszhang@kernel.org> says:

This series tries to improve link time handling of riscv:
patch1 adds the missing RUNTIME_DISCARD_EXIT as suggested by Masahiro.

Similar as other architectures such as x86, arm64 and so on, enable
ARCH_WANT_LD_ORPHAN_WARN to enable linker orphan warnings to prevent
from missing any new sections in future. So the following two patches
are preparation ones, and the last patch finally selects
ARCH_WANT_LD_ORPHAN_WARN

* b4-shazam-merge:
  riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
  riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
  riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
  riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
  riscv: lds: define RUNTIME_DISCARD_EXIT

Link: https://lore.kernel.org/r/20230119155417.2600-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:49 -08:00
Mattias Nissler
130aee3fd9
riscv: Avoid enabling interrupts in die()
While working on something else, I noticed that the kernel would start
accepting interrupts again after crashing in an interrupt handler. Since
the kernel is already in inconsistent state, enabling interrupts is
dangerous and opens up risk of kernel state deteriorating further.
Interrupts do get enabled via what looks like an unintended side effect of
spin_unlock_irq, so switch to the more cautious
spin_lock_irqsave/spin_unlock_irqrestore instead.

Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:45 -08:00
Björn Töpel
416721ff05
riscv, mm: Perform BPF exhandler fixup on page fault
Commit 21855cac82 ("riscv/mm: Prevent kernel module to access user
memory without uaccess routines") added early exits/deaths for page
faults stemming from accesses to user-space without using proper
uaccess routines (where sstatus.SUM is set).

Unfortunatly, this is too strict for some BPF programs, which relies
on BPF exhandler fixups. These BPF programs loads "BTF pointers". A
BTF pointers could either be a valid kernel pointer or NULL, but not a
userspace address.

Resolve the problem by calling the fixup handler in the early exit
path.

Fixes: 21855cac82 ("riscv/mm: Prevent kernel module to access user memory without uaccess routines")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230214162515.184827-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:39 -08:00
Conor Dooley
9493e6f3ce
RISC-V: take text_mutex during alternative patching
Guenter reported a splat during boot, that Samuel pointed out was the
lockdep assertion failing in patch_insn_write():

WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6
epc : patch_insn_write+0x222/0x2f6
 ra : patch_insn_write+0x21e/0x2f6
epc : ffffffff800068c6 ra : ffffffff800068c2 sp : ffffffff81803df0
 gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : ffffffffffffe000
 t1 : 0000000000000001 t2 : 4c45203a76637369 s0 : ffffffff81803e40
 s1 : 0000000000000004 a0 : 0000000000000000 a1 : ffffffffffffffff
 a2 : 0000000000000004 a3 : 0000000000000000 a4 : 0000000000000001
 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
 s2 : ffffffff80b4889c s3 : 000000000000082c s4 : ffffffff80b48828
 s5 : 0000000000000828 s6 : ffffffff8131a0a0 s7 : 0000000000000fff
 s8 : 0000000008000200 s9 : ffffffff8131a520 s10: 0000000000000018
 s11: 000000000000000b t3 : 0000000000000001 t4 : 000000000000000d
 t5 : ffffffffd8180000 t6 : ffffffff81803bc8
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff800068c6>] patch_insn_write+0x222/0x2f6
[<ffffffff80006a36>] patch_text_nosync+0xc/0x2a
[<ffffffff80003b86>] riscv_cpufeature_patch_func+0x52/0x98
[<ffffffff80003348>] _apply_alternatives+0x46/0x86
[<ffffffff80c02d36>] apply_boot_alternatives+0x3c/0xfa
[<ffffffff80c03ad8>] setup_arch+0x584/0x5b8
[<ffffffff80c0075a>] start_kernel+0xa2/0x8f8

This issue was exposed by 702e64550b ("riscv: fpu: switch has_fpu() to
riscv_has_extension_likely()"), as it is the patching in has_fpu() that
triggers the splats in Guenter's report.

Take the text_mutex before doing any code patching to satisfy lockdep.

Fixes: ff689fd21c ("riscv: add RISC-V Svpbmt extension support")
Fixes: a35707c3d8 ("riscv: add memory-type errata for T-Head")
Fixes: 1a0e5dbd37 ("riscv: sifive: Add SiFive alternative ports")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/20230212154333.GA3760469@roeck-us.net/
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230212194735.491785-1-conor@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:33 -08:00
Andrew Jones
dac8bf14bb
riscv: hwcap: Don't alphabetize ISA extension IDs
While the comment above the ISA extension ID definitions says
"Entries are sorted alphabetically.", this stopped being good
advice with commit d8a3d8a752 ("riscv: hwcap: make ISA extension
ids can be used in asm"), as we now use macros instead of enums.
Reshuffling defines is error-prone, so, since they don't need to be
in any particular order, change the advice to just adding new
extensions at the bottom. Also, take the opportunity to change
spaces to tabs, merge three comments into one, and move the base
and max defines into more logical locations wrt the ID definitions.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230209123636.123537-1-ajones@ventanamicro.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:27 -08:00
Heiko Stuebner
1eac28201a
RISC-V: fix ordering of Zbb extension
As Andrew reported,
    Zb* comes after Zi* according 27.11 "Subset Naming Convention"
so fix the ordering accordingly.

Reported-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-2-heiko@sntech.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:21 -08:00
Andy Chiu
9ddfc3cd80
riscv: jump_label: Fixup unaligned arch_static_branch function
Runtime code patching must be done at a naturally aligned address, or we
may execute on a partial instruction.

We have encountered problems traced back to static jump functions during
the test. We switched the tracer randomly for every 1~5 seconds on a
dual-core QEMU setup and found the kernel sucking at a static branch
where it jumps to itself.

The reason is that the static branch was 2-byte but not 4-byte aligned.
Then, the kernel would patch the instruction, either J or NOP, with two
half-word stores if the machine does not have efficient unaligned
accesses. Thus, moments exist where half of the NOP mixes with the other
half of the J when transitioning the branch. In our particular case, on
a little-endian machine, the upper half of the NOP was mixed with the
lower part of the J when enabling the branch, resulting in a jump that
jumped to itself. Conversely, it would result in a HINT instruction when
disabling the branch, but it might not be observable.

ARM64 does not have this problem since all instructions must be 4-byte
aligned.

Fixes: ebc00dde8a ("riscv: Add jump-label implementation")
Link: https://lore.kernel.org/linux-riscv/20220913094252.3555240-6-andy.chiu@sifive.com/
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230206090440.1255001-1-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:16 -08:00
Palmer Dabbelt
2350bd192f
RISC-V: Only provide the single-letter extensions in HWCAP
The recent refactoring led to us leaking some HWCAP bits to userspace
that didn't make much sense.  With any luck we'll have a better scheme
soon, but for now just mask off those bits to avoid polluting userspace.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230202233832.11036-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:14 -08:00
Sergey Matyukevich
b49f700668
riscv: mm: fix regression due to update_mmu_cache change
This is a partial revert of the commit 4bd1d80efb ("riscv: mm: notify
remote harts about mmu cache updates"). Original commit included two
loosely related changes serving the same purpose of fixing stale TLB
entries causing user-space application crash:
- introduce deferred per-ASID TLB flush for CPUs not running the task
- switch to per-ASID TLB flush on all CPUs running the task in update_mmu_cache

According to report and discussion in [1], the second part caused a
regression on Renesas RZ/Five SoC. For now restore the old behavior
of the update_mmu_cache.

[1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/

Fixes: 4bd1d80efb ("riscv: mm: notify remote harts about mmu cache updates")
Reported-by: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Link: trailer, so that it can be parsed with git's trailer functionality?
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230129211818.686557-1-geomatsi@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:09 -08:00
Björn Töpel
eb165bfa8e
riscv: Add instruction dump to RISC-V splats
Add instruction dump (Code:) output to RISC-V splats. Dump 16b
parcels.

An example:
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  Oops [#1]
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc3-00302-g840ff44c571d-dirty #27
  Hardware name: riscv-virtio,qemu (DT)
  epc : kernel_init+0xc8/0x10e
   ra : kernel_init+0x70/0x10e
  epc : ffffffff80bd9a40 ra : ffffffff80bd99e8 sp : ff2000000060bec0
   gp : ffffffff81730b28 tp : ff6000007ff00000 t0 : 7974697275636573
   t1 : 0000000000000000 t2 : 3030303270393d6e s0 : ff2000000060bee0
   s1 : ffffffff81732028 a0 : 0000000000000000 a1 : ff60000080dd1780
   a2 : 0000000000000002 a3 : ffffffff8176a470 a4 : 0000000000000000
   a5 : 000000000000000a a6 : 0000000000000081 a7 : ff60000080dd1780
   s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
   s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
   s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
   s11: 0000000000000000 t3 : ffffffff81186018 t4 : 0000000000000022
   t5 : 000000000000003d t6 : 0000000000000000
  status: 0000000200000120 badaddr: 0000000000000000 cause: 000000000000000f
  [<ffffffff80003528>] ret_from_exception+0x0/0x16
  Code: 862a d179 608c a517 0069 0513 2be5 d0ef db2e 47a9 (c11c) a517
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  SMP: stopping secondary CPUs
  ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230119074738.708301-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:53:53 -08:00
Jisheng Zhang
f4b71bff8d
riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
Now, after that all the sections are explicitly described and
declared in vmlinux.lds.S, we can enable ld orphan warnings for
!XIP_KERNEL to prevent from missing any new sections in future.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230119155417.2600-6-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:07 -08:00
Jisheng Zhang
0ed0031b09
riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
When enabling linker orphan section warning, I got warnings similar as
below:
ld.lld: warning:
./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss)
is being placed in '.init.bss'

Catch the sections so that we can enable linker orphan section warning.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:06 -08:00
Jisheng Zhang
b13e64d941
riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.riscv.attributes' from
`init/main.o' being placed in section `.riscv.attributes'

While I don't see any usage of .riscv.attributes sections' in kernel
now, just catch the sections so that we can enable linker orphan
section warning.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:05 -08:00
Jisheng Zhang
e5973191a8
riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.rela.text' from
`init/main.o' being placed in section `.rela.dyn'

Use the approach similar as ARM64 does and declare it in vmlinux.lds.S

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:04 -08:00