Commit Graph

45213 Commits

Author SHA1 Message Date
Linus Torvalds
f5837722ff 11 hotfixes. 7 are cc:stable and the other 4 address post-6.6 issues or
are not considered backporting material.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZYys4AAKCRDdBJ7gKXxA
 jtmaAQC+o04Ia7IfB8MIqp1p7dNZQo64x/EnGA8YjUnQ8N6IwQD+ImU7dHl9g9Oo
 ROiiAbtMRBUfeJRsExX/Yzc1DV9E9QM=
 =ZGcs
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "11 hotfixes. 7 are cc:stable and the other 4 address post-6.6 issues
  or are not considered backporting material"

* tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: add an old address for Naoya Horiguchi
  mm/memory-failure: cast index to loff_t before shifting it
  mm/memory-failure: check the mapcount of the precise page
  mm/memory-failure: pass the folio and the page to collect_procs()
  selftests: secretmem: floor the memory size to the multiple of page_size
  mm: migrate high-order folios in swap cache correctly
  maple_tree: do not preallocate nodes for slot stores
  mm/filemap: avoid buffered read/write race to read inconsistent data
  kunit: kasan_test: disable fortify string checker on kmalloc_oob_memset
  kexec: select CRYPTO from KEXEC_FILE instead of depending on it
  kexec: fix KEXEC_FILE dependencies
2023-12-27 16:14:41 -08:00
Linus Torvalds
3f82f1c3a0 Misc fixes:
- Fix a secondary CPUs enumeration regression caused by creative
    MADT APIC table entries on certain systems.
 
  - Fix a race in the NOP-patcher that can spuriously trigger crashes
    on bootup.
 
  - Fix a bootup failure regression caused by the parallel bringup
    code, caused by firmware inconsistency between the APIC
    initialization states of the boot and secondary CPUs, on certain
    systems.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWG704RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j52w/+IfvoA9zX6+qsl0lIU76FsD50szR7C8IR
 4DsN68BbjI6sUNN/RWrjTY249UCbl8ykL8ozaek+eBaCXcTXIoGh96oPnojY/8vA
 HTYCgilmCvwNozawkjAeWedwbxdcafTr7l1bad5Y6LyYJv7BNUnsELq67bIQoOwK
 QGtnjDnTojVBoGy920KgZZ0zkqM05zWQ24MvM0Pn8T9lC2TokqCrbY4T8ags1AW0
 j+swsd34ON+jKM0tXqQdBgxbb4TRW05Yg2PC9hLMxtXRXsYFfXW8udS0E83JpWD8
 XL4z34IQqRuuiCqVboeBsP2gta4/oEM+ctbTXbmYs/ca7euZ7mnrUDcDVG2qyKyP
 F4hGgySyRv5wHx1IdTDIjozhsK+oYPLpqj87bBNr5N/aZImlKYTkICFt6EFhxrZn
 aoQCdE3ba4EoQqes05aVAvNABAt2ml1UNMyz4uu83diJG7aszVY0WOz4laX4edof
 L4EBpl2RrWqWwr7CnkrMhHfAcYVKv6k50/MK4ZrQrm1Xqgw6fbtvyxp72iw+ktJk
 XXnlDU6Ngc+F4dBXNY1Wr1q5kB4eYWy9DgmpHC56H6QPWTCV/Vk9wayeSXBrpoXa
 w5E91MKj9Oo68QbtmczbvlTmjKgWsWKGNvHHFF2ItmF2Tc/8BKFazNNZ+wzjsRl+
 oJnjZILBHfI=
 =V+LX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix a secondary CPUs enumeration regression caused by creative MADT
   APIC table entries on certain systems.

 - Fix a race in the NOP-patcher that can spuriously trigger crashes on
   bootup.

 - Fix a bootup failure regression caused by the parallel bringup code,
   caused by firmware inconsistency between the APIC initialization
   states of the boot and secondary CPUs, on certain systems.

* tag 'x86-urgent-2023-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi: Handle bogus MADT APIC tables gracefully
  x86/alternatives: Disable interrupts and sync when optimizing NOPs in place
  x86/alternatives: Sync core before enabling interrupts
  x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
2023-12-23 12:13:28 -08:00
Linus Torvalds
867583b399 RISC-V
- Fix a race condition in updating external interrupt for
   trap-n-emulated IMSIC swfile
 
 - Fix print_reg defaults in get-reg-list selftest
 
 ARM:
 
 - Ensure a vCPU's redistributor is unregistered from the MMIO bus
   if vCPU creation fails
 
 - Fix building KVM selftests for arm64 from the top-level Makefile
 
 x86:
 
 - Fix breakage for SEV-ES guests that use XSAVES.
 
 Selftests:
 
 - Fix bad use of strcat(), by not using strcat() at all
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWGFv0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPczAf/e6AgAnyPG1UItZqpLD+JDURcVaV1
 QyP3kc240e9dEjEkGidQ8vyekgAU9nGt2rFNPaU+5Y1E5Ky+SpZbbIzgS1cZypxT
 J1lsrVhZgNdCKEVRdrUMIzhkUEk0Kjd7OsFMQ9F6OuITSv/HCgZ1g6KobgBzUGCR
 0vcYqM74VnZiGGd5A4w8qP2F0FmF/7tf9k6iKWoYu6UpFe9z50jpIRq6dynrOHOc
 fmwsptmGzjgzuLK9sZTXYETOQvcpmXLqSZ65k1LQG224J5AYjS08Y5XLo1QS4rpV
 /g8QAgi+9ChGSzC47fqr/solAsoz/NzALPqydy+FH4u+O/O4SG5I4V8OmA==
 =4/NU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"RISC-V:

   - Fix a race condition in updating external interrupt for
     trap-n-emulated IMSIC swfile

   - Fix print_reg defaults in get-reg-list selftest

  ARM:

   - Ensure a vCPU's redistributor is unregistered from the MMIO bus if
     vCPU creation fails

   - Fix building KVM selftests for arm64 from the top-level Makefile

  x86:

   - Fix breakage for SEV-ES guests that use XSAVES

  Selftests:

   - Fix bad use of strcat(), by not using strcat() at all"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
  KVM: selftests: Fix dynamic generation of configuration names
  RISCV: KVM: update external interrupt atomically for IMSIC swfile
  KVM: riscv: selftests: Fix get-reg-list print_reg defaults
  KVM: selftests: Ensure sysreg-defs.h is generated at the expected path
  KVM: Convert comment into an assertion in kvm_io_bus_register_dev()
  KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs()
  KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy
  KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy()
  KVM: arm64: vgic: Simplify kvm_vgic_destroy()
2023-12-22 19:22:20 -08:00
Paolo Bonzini
ef5b28372c KVM/riscv fixes for 6.7, take #1
- Fix a race condition in updating external interrupt for
   trap-n-emulated IMSIC swfile
 - Fix print_reg defaults in get-reg-list selftest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmV8h2AACgkQrUjsVaLH
 LAeVKQ//TXVUf7I8y71MO7dtNu31oU1ud9HzD02iRecsOK/207OpGBrzPrOLO3sP
 Ttk0TJCk1o+BU8kS5FQGV2tiIAb3SAF4utVDNPUOzfAd4Leg3CBvkH6aQSDCmNyk
 91ZdJVmlt7LHKb9UlMIwldKDb2z51HtPotZOVVbhqtmoiWKCR1BBlgtTjQ8+P1ME
 PQorbjnRYVmjqpK/IbhXboQBiFZssnFdVuV0zftncBZECyFAI6TfZR/FTGFcRN6u
 VnW5cvyWET8DGhcpqt1BuQl8dPomDV32+HtKszyxkvUTrc1RqfFmoP02adppSU6r
 /8h5WcylzF66swDpm59ibH9c2k1f4T6fW1YQrWL4JwmL0IX5/u3ZrdbNcBtoD2fd
 81VDiJe+jjklH5EsTm1gwgNaw+qS2SNtXuQ+Phe9pSsuwtD4LL94ZcAY29dCEuPk
 GR1DKxgVR+nfwtUC3OaShjDLkto90kcpjCaiHVIXL8ZU4pMEvEbcBc7B6rxwDT98
 udziEoH42UPvf/0B2AcDJZHwPEvvY9uI/dsrEN0PZa5rhRL9Z7nBE673iRDgcp7e
 lad25lss+BbpOsGzmztEQG83rtwYTCLN5ah1eWGOzSq5xV5j+XrnK1qvWsF+OrGl
 q8CEzc7sUZ/9EVxWCAhCPV8WqMSVIsfuSmS+pt7Mo9G+oVF58uM=
 =0+bX
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-fixes-6.7-1' of https://github.com/kvm-riscv/linux into kvm-master

KVM/riscv fixes for 6.7, take #1

- Fix a race condition in updating external interrupt for
  trap-n-emulated IMSIC swfile
- Fix print_reg defaults in get-reg-list selftest
2023-12-22 18:05:07 -05:00
Linus Torvalds
b7bc7bce88 xen: branch for v6.7-rc7
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZYUtLwAKCRCAXGG7T9hj
 vvAkAQDJED+V3e3adOpkt1xjLyftY2kzHbX9i0koIg7ivbBYEwD/c3yfPz2L8SXK
 IgU0LgxIMgGnjMRrBOuW9kY5FtDGPQQ=
 =EUSt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.7a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
 "A single patch fixing a build issue for x86 32-bit configurations with
  CONFIG_XEN, which was introduced in the 6.7 development cycle"

* tag 'for-linus-6.7a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: add CPU dependencies for 32-bit build
2023-12-22 08:37:48 -08:00
Arnd Bergmann
93cd059764 x86/xen: add CPU dependencies for 32-bit build
Xen only supports modern CPUs even when running a 32-bit kernel, and it now
requires a kernel built for a 64 byte (or larger) cache line:

In file included from <command-line>:
In function 'xen_vcpu_setup',
    inlined from 'xen_vcpu_setup_restore' at arch/x86/xen/enlighten.c:111:3,
    inlined from 'xen_vcpu_restore' at arch/x86/xen/enlighten.c:141:3:
include/linux/compiler_types.h:435:45: error: call to '__compiletime_assert_287' declared with attribute error: BUILD_BUG_ON failed: sizeof(*vcpup) > SMP_CACHE_BYTES
arch/x86/xen/enlighten.c:166:9: note: in expansion of macro 'BUILD_BUG_ON'
  166 |         BUILD_BUG_ON(sizeof(*vcpup) > SMP_CACHE_BYTES);
      |         ^~~~~~~~~~~~

Enforce the dependency with a whitelist of CPU configurations. In normal
distro kernels, CONFIG_X86_GENERIC is enabled, and this works fine. When this
is not set, still allow Xen to be built on kernels that target a 64-bit
capable CPU.

Fixes: db2832309a ("x86/xen: fix percpu vcpu_info allocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Alyssa Ross <hi@alyssa.is>
Link: https://lore.kernel.org/r/20231204084722.3789473-1-arnd@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-12-21 08:30:25 +01:00
Linus Torvalds
a4aebe9365 posix-timers: Get rid of [COMPAT_]SYS_NI() uses
Only the posix timer system calls use this (when the posix timer support
is disabled, which does not actually happen in any normal case), because
they had debug code to print out a warning about missing system calls.

Get rid of that special case, and just use the standard COND_SYSCALL
interface that creates weak system call stubs that return -ENOSYS for
when the system call does not exist.

This fixes a kCFI issue with the SYS_NI() hackery:

  CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
  WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0

Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-20 21:30:27 -08:00
Arnd Bergmann
c1ad12ee0e kexec: fix KEXEC_FILE dependencies
The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the
'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which
causes a link failure when all the crypto support is in a loadable module
and kexec_file support is built-in:

x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load':
(.text+0x32e30a): undefined reference to `crypto_alloc_shash'
x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update'
x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final'

Both s390 and x86 have this problem, while ppc64 and riscv have the
correct dependency already.  On riscv, the dependency is only used for the
purgatory, not for the kexec_file code itself, which may be a bit
surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable
KEXEC_FILE but then the purgatory code is silently left out.

Move this into the common Kconfig.kexec file in a way that is correct
everywhere, using the dependency on CRYPTO_SHA256=y only when the
purgatory code is available.  This requires reversing the dependency
between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect
remains the same, other than making riscv behave like the other ones.

On s390, there is an additional dependency on CRYPTO_SHA256_S390, which
should technically not be required but gives better performance.  Remove
this dependency here, noting that it was not present in the initial
Kconfig code but was brought in without an explanation in commit
71406883fd ("s390/kexec_file: Add kexec_file_load system call").

[arnd@arndb.de: fix riscv build]
  Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com
Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org
Fixes: 6af5138083 ("x86/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 13:46:19 -08:00
Thomas Gleixner
d5a10b976e x86/acpi: Handle bogus MADT APIC tables gracefully
The recent fix to ignore invalid x2APIC entries inadvertently broke
systems with creative MADT APIC tables. The affected systems have APIC
MADT tables where all entries have invalid APIC IDs (0xFF), which means
they register exactly zero CPUs.

But the condition to ignore the entries of APIC IDs < 255 in the X2APIC
MADT table is solely based on the count of MADT APIC table entries.

As a consequence, the affected machines enumerate no secondary CPUs at
all because the APIC table has entries and therefore the X2APIC table
entries with APIC IDs < 255 are ignored.

Change the condition so that the APIC table preference for APIC IDs <
255 only becomes effective when the APIC table has valid APIC ID
entries.

IOW, an APIC table full of invalid APIC IDs is considered to be empty
which in consequence enables the X2APIC table entries with a APIC ID
< 255 and restores the expected behaviour.

Fixes: ec9aedb2aa ("x86/acpi: Ignore invalid x2APIC entries")
Reported-by: John Sperbeck <jsperbeck@google.com>
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/169953729188.3135.6804572126118798018.tip-bot2@tip-bot2
2023-12-18 14:21:44 +01:00
Linus Torvalds
a62aa88ba1 17 hotfixes. 8 are cc:stable and the other 9 pertain to post-6.6 issues.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZXxs8wAKCRDdBJ7gKXxA
 junbAQCdItfHHinkWziciOrb0387wW+5WZ1ohqRFW8pGYLuasQEArpKmw13bvX7z
 e+ec9K1Ek9MlIsO2RwORR4KHH4MAbwA=
 =YpZh
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "17 hotfixes. 8 are cc:stable and the other 9 pertain to post-6.6
  issues"

* tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mglru: reclaim offlined memcgs harder
  mm/mglru: respect min_ttl_ms with memcgs
  mm/mglru: try to stop at high watermarks
  mm/mglru: fix underprotected page cache
  mm/shmem: fix race in shmem_undo_range w/THP
  Revert "selftests: error out if kernel header files are not yet built"
  crash_core: fix the check for whether crashkernel is from high memory
  x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC
  sh, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC
  m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXEC
  loongarch, kexec: change dependency of object files
  mm/damon/core: make damon_start() waits until kdamond_fn() starts
  selftests/mm: cow: print ksft header before printing anything else
  mm: fix VMA heap bounds checking
  riscv: fix VMALLOC_START definition
  kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
2023-12-15 12:00:54 -08:00
Thomas Gleixner
2dc4196138 x86/alternatives: Disable interrupts and sync when optimizing NOPs in place
apply_alternatives() treats alternatives with the ALT_FLAG_NOT flag set
special as it optimizes the existing NOPs in place.

Unfortunately, this happens with interrupts enabled and does not provide any
form of core synchronization.

So an interrupt hitting in the middle of the update and using the affected code
path will observe a half updated NOP and crash and burn. The following
3 NOP sequence was observed to expose this crash halfway reliably under QEMU
  32bit:

   0x90 0x90 0x90

which is replaced by the optimized 3 byte NOP:

   0x8d 0x76 0x00

So an interrupt can observe:

   1) 0x90 0x90 0x90		nop nop nop
   2) 0x8d 0x90 0x90		undefined
   3) 0x8d 0x76 0x90		lea    -0x70(%esi),%esi
   4) 0x8d 0x76 0x00		lea     0x0(%esi),%esi

Where only #1 and #4 are true NOPs. The same problem exists for 64bit obviously.

Disable interrupts around this NOP optimization and invoke sync_core()
before re-enabling them.

Fixes: 270a69c448 ("x86/alternative: Support relocations in alternatives")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
2023-12-15 19:34:42 +01:00
Thomas Gleixner
3ea1704a92 x86/alternatives: Sync core before enabling interrupts
text_poke_early() does:

   local_irq_save(flags);
   memcpy(addr, opcode, len);
   local_irq_restore(flags);
   sync_core();

That's not really correct because the synchronization should happen before
interrupts are re-enabled to ensure that a pending interrupt observes the
complete update of the opcodes.

It's not entirely clear whether the interrupt entry provides enough
serialization already, but moving the sync_core() invocation into interrupt
disabled region does no harm and is obviously correct.

Fixes: 6fffacb303 ("x86/alternatives, jumplabel: Use text_poke_early() before mm_init()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
2023-12-15 19:34:42 +01:00
Thomas Gleixner
69a7386c1e x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
Chris reported that a Dell PowerEdge T340 system stopped to boot when upgrading
to a kernel which contains the parallel hotplug changes.  Disabling parallel
hotplug on the kernel command line makes it boot again.

It turns out that the Dell BIOS has x2APIC enabled and the boot CPU comes up in
X2APIC mode, but the APs come up inconsistently in xAPIC mode.

Parallel hotplug requires that the upcoming CPU reads out its APIC ID from the
local APIC in order to map it to the Linux CPU number.

In this particular case the readout on the APs uses the MMIO mapped registers
because the BIOS failed to enable x2APIC mode. That readout results in a page
fault because the kernel does not have the APIC MMIO space mapped when X2APIC
mode was enabled by the BIOS on the boot CPU and the kernel switched to X2APIC
mode early. That page fault can't be handled on the upcoming CPU that early and
results in a silent boot failure.

If parallel hotplug is disabled the system boots because in that case the APIC
ID read is not required as the Linux CPU number is provided to the AP in the
smpboot control word. When the kernel uses x2APIC mode then the APs are
switched to x2APIC mode too slightly later in the bringup process, but there is
no reason to do it that late.

Cure the BIOS bogosity by checking in the parallel bootup path whether the
kernel uses x2APIC mode and if so switching over the APs to x2APIC mode before
the APIC ID readout.

Fixes: 0c7ffa32db ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
Reported-by: Chris Lindee <chris.lindee@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Chris Lindee <chris.lindee@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/CA%2B2tU59853R49EaU_tyvOZuOTDdcU0RshGyydccp9R1NX9bEeQ@mail.gmail.com
2023-12-15 19:33:54 +01:00
Michael Roth
a26b7cd225 KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.

However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, only CET/shadowstack makes use
of this register on SEV-ES-capable systems currently, which isn't yet
widely used, but this may become more of an issue in the future.

Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.

Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.

Fixes: 376c6d2850 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <aik@amd.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20231016132819.1002933-4-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-13 12:46:07 -05:00
Baoquan He
69f8ca8d36 x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC
With the current ifdeffery CONFIG_KEXEC, get_cmdline_acpi_rsdp() is only
available when kexec_load interface is taken, while kexec_file_load
interface can't make use of it.

Now change it to CONFIG_KEXEC_CORE.

Link: https://lkml.kernel.org/r/20231208073036.7884-6-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Ignat Korchagin <ignat@cloudflare.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12 17:20:18 -08:00
Linus Torvalds
5412fed784 - Add a forgotten CPU vendor check in the AMD microcode post-loading
callback so that the callback runs only on AMD
 
 - Make sure SEV-ES protocol negotiation happens only once and on the BSP
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmV1l4MACgkQEsHwGGHe
 VUoESQ//WxIdMb/ATNtBRXEM3GRzd/X2he7kXRGVSUeJZMxEOnpVEmo4NtAzXwI7
 E2yOk4EJ6QlN+p6/eea5X2FJ5IkpiDKj1jVvU6NxTJiF2I5t1XUjdcXiv05Is+bP
 5Cx1ysaznMQYvZUler15NkbLCZvVX5/1V9FthnLpS1d1cvem0NKoH+hVUQHkdHbo
 /QNzEy+dyii8mp+t6D4QEfjq2Ab7bRbx1FufJcvHy9tFt3u8JsuWI/kHLm5Jk3VN
 n7Efynluw1wJC3l/18QzrxUCL5LVwrs2LxC0GH42dq9vBp0EJclgsBOh+43cm+Ey
 ffowP3uJxayngdigRA4ANpfbJL7GHjEtoYlxk/ooI4l2ccNtSdCXn86UVI9q3BiT
 60j/qV9r/o0Wl46/BVrT6SXhHlcKm4YchUa53NbOHcU3hBpUuM78WAcqs+63voMq
 kFqXQVihlJWYYytyFKw6Ng7O6SHR/SMC3uSygG0q2KebiUlVMZSheJin2Od+shhC
 skD0z2EUzUGuNICFHuxwA2uu6jr/KvkEwDfjczwdE2haUfZLCnOfTjxZmqBbuzJ6
 l7lfGjBCpGGtGettjrM8WAuHnL6MzNDI2jVDV3K6bzqQVzk9wjFQjIG9X7c8XjN/
 E2IyQxNuiIKVj3GfFOVM9lgJ4C8qwNDa2A46Kad2dKHioiFfMuM=
 =uIo2
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a forgotten CPU vendor check in the AMD microcode post-loading
   callback so that the callback runs only on AMD

 - Make sure SEV-ES protocol negotiation happens only once and on the
   BSP

* tag 'x86_urgent_for_v6.7_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Check vendor in the AMD microcode callback
  x86/sev: Fix kernel crash due to late update to read-only ghcb_version
2023-12-10 10:53:55 -08:00
Linus Torvalds
0aea22c7ab Generic:
* Set .owner for various KVM file_operations so that files refcount the
   KVM module until KVM is done executing _all_ code, including the last
   few instructions of kvm_put_kvm().  And then revert the misguided
   attempt to rely on "struct kvm" refcounts to pin KVM-the-module.
 
 ARM:
 
 * Do not redo the mapping of vLPIs, if they have already been mapped
 
 s390:
 
 * Do not leave bits behind in PTEs
 
 * Properly catch page invalidations that affect the prefix of a nested
   guest
 
 x86:
 
 * When checking if a _running_ vCPU is "in-kernel", i.e. running at CPL0,
   get the CPL directly instead of relying on preempted_in_kernel (which
   is valid if and only if the vCPU was preempted, i.e. NOT running).
 
 * Fix a benign "return void" that was recently introduced.
 
 Selftests:
 
 * Makefile tweak for dependency generation
 
 * -Wformat fix
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmVzhWMUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMd3QgArPrc1tHKb5RAv3/OGxUhgmmKcYFz
 AERbxrzcEv25MjCy6wfb+WzOTud/7KM113oVwHpkWbPaOdR/NteCeNrypCOiFnxB
 BzKOdm23A8UmjgFlwYpIlukmn51QT2iMWfIQDetDboRpRPlIZ6AOQpIHph99K0J1
 xMlXKbd4eaeIS+FoqJGD2wlKDkMZmrQD1SjndLp7xtPjRBZQzVTcMg/fKJNcDLAg
 xIfecCeeU4/lJ0fQlnnO5a4L3gCzJnSGLKYvLgBkX4sC1yFr959WAnqSn8mb+1wT
 jQtsmCg7+GCuNF5mn4nF/YnnuD7+xgbpGKVGlHhKCmmvmbi74NVm5SBNgA==
 =aZLl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Generic:

   - Set .owner for various KVM file_operations so that files refcount
     the KVM module until KVM is done executing _all_ code, including
     the last few instructions of kvm_put_kvm(). And then revert the
     misguided attempt to rely on "struct kvm" refcounts to pin
     KVM-the-module.

  ARM:

   - Do not redo the mapping of vLPIs, if they have already been mapped

  s390:

   - Do not leave bits behind in PTEs

   - Properly catch page invalidations that affect the prefix of a
     nested guest

  x86:

   - When checking if a _running_ vCPU is "in-kernel", i.e. running at
     CPL0, get the CPL directly instead of relying on
     preempted_in_kernel (which is valid if and only if the vCPU was
     preempted, i.e. NOT running).

   - Fix a benign "return void" that was recently introduced.

  Selftests:

   - Makefile tweak for dependency generation

   - '-Wformat' fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Update EFER software model on CR0 trap for SEV-ES
  KVM: selftests: add -MP to CFLAGS
  KVM: selftests: Actually print out magic token in NX hugepages skip message
  KVM: x86: Remove 'return void' expression for 'void function'
  Revert "KVM: Prevent module exit until all VMs are freed"
  KVM: Set file_operations.owner appropriately for all such structures
  KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode
  KVM: arm64: GICv4: Do not perform a map to a mapped vLPI
  KVM: s390/mm: Properly reset no-dat
  KVM: s390: vsie: fix wrong VIR 37 when MSO is used
2023-12-10 10:46:46 -08:00
Sean Christopherson
4cdf351d36 KVM: SVM: Update EFER software model on CR0 trap for SEV-ES
In general, activating long mode involves setting the EFER_LME bit in
the EFER register and then enabling the X86_CR0_PG bit in the CR0
register. At this point, the EFER_LMA bit will be set automatically by
hardware.

In the case of SVM/SEV guests where writes to CR0 are intercepted, it's
necessary for the host to set EFER_LMA on behalf of the guest since
hardware does not see the actual CR0 write.

In the case of SEV-ES guests where writes to CR0 are trapped instead of
intercepted, the hardware *does* see/record the write to CR0 before
exiting and passing the value on to the host, so as part of enabling
SEV-ES support commit f1c6366e30 ("KVM: SVM: Add required changes to
support intercepts under SEV-ES") dropped special handling of the
EFER_LMA bit with the understanding that it would be set automatically.

However, since the guest never explicitly sets the EFER_LMA bit, the
host never becomes aware that it has been set. This becomes problematic
when userspace tries to get/set the EFER values via
KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host
will be missing the EFER_LMA bit, and when userspace attempts to pass
the EFER value back via KVM_SET_SREGS it will fail a sanity check that
asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME
are set.

Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG
and EFER_LME, regardless of whether or not SEV-ES is enabled.

Fixes: f1c6366e30 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Reported-by: Peter Gonda <pgonda@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210507165947.2502412-2-seanjc@google.com>
[A two year old patch that was revived after we noticed the failure in
 KVM_SET_SREGS and a similar patch was posted by Michael Roth.  This is
 Sean's patch, but with Michael's more complete commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-08 13:37:05 -05:00
Paolo Bonzini
6254eebad4 KVM fixes for 6.7-rcN:
- When checking if a _running_ vCPU is "in-kernel", i.e. running at CPL0,
    get the CPL directly instead of relying on preempted_in_kernel, which
    is valid if and only if the vCPU was preempted, i.e. NOT running.
 
  - Set .owner for various KVM file_operations so that files refcount the
    KVM module until KVM is done executing _all_ code, including the last
    few instructions of kvm_put_kvm().  And then revert the misguided
    attempt to rely on "struct kvm" refcounts to pin KVM-the-module.
 
  - Fix a benign "return void" that was recently introduced.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmVyeV0SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5aJIP/izKZivi/kZjuuKp2c1W2XM+mBZlM+Yj
 qYcdV0rZygQJOZXTpMaEVg7iUtvbwAT495nm8sXr/IxXw+omMcP+qyLRCZ6JafYy
 B19buCnAt2DymlJOurFzlIEeWtunkxk/gFLMB/BnSrok88cKz5PMxAVFPPBXsTms
 ZqSFlDhzG0G4Mxhr8t0elyjd4HrCbNjCn1MhJg+uzFHKakfOvbST5jO02LkTeIM2
 VFrqWZo1C6uPDrA8TzWzik54qOrDFrodNv/XvIJ0szgVOc+7Iwxy80A/v7o7jBET
 igH+6F3cbST5uoKFrFn7pPJdTOfX2u18DXcpxiYIu+24ToKyqdE1Np9M5W4ZMX/9
 Im5ilykfylHpRYAL4tECD6Jzd/Q/xIvpe8Uk6HTfFAtb/UdMY35/1keBnnkI2oj8
 /4USM7AHNiqoAs4+OE4kZslrFG8ttv3vIOr7Mtk2UjGyGp8TH8sRFYPJKToXsQIJ
 Gs96rsbiU+oo/IDp3UiRhWtwpwfKGbkDLp4r/3X6UOx6Re5u1ITVIoM14qFQaw3W
 CKdHKN/MoreYLS5gasjaGRSyQNJPonaS10l8SqzWflrUBZYfyjCNKliihjKood2g
 JykH4p69IFTWADT2VbrGCQVKfY1GJCxGwpGePFmChTsPUiQ2P+AbHCmaefnIRRbK
 8UR/OmsDtFRZ
 =gWp0
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-fixes-6.7-rcN' of https://github.com/kvm-x86/linux into kvm-master

KVM fixes for 6.7-rcN:

 - When checking if a _running_ vCPU is "in-kernel", i.e. running at CPL0,
   get the CPL directly instead of relying on preempted_in_kernel, which
   is valid if and only if the vCPU was preempted, i.e. NOT running.

 - Set .owner for various KVM file_operations so that files refcount the
   KVM module until KVM is done executing _all_ code, including the last
   few instructions of kvm_put_kvm().  And then revert the misguided
   attempt to rely on "struct kvm" refcounts to pin KVM-the-module.

 - Fix a benign "return void" that was recently introduced.
2023-12-08 13:13:45 -05:00
Linus Torvalds
5e3f5b81de Including fixes from bpf and netfilter.
Current release - regressions:
 
  - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff
 
 Current release - new code bugs:
 
  - tcp: assorted fixes to the new Auth Option support
 
 Older releases - regressions:
 
  - tcp: fix mid stream window clamp
 
  - tls: fix incorrect splice handling
 
  - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit()
 
  - dsa: mv88e6xxx: restore USXGMII support for 6393X
 
  - arcnet: restore support for multiple Sohard Arcnet cards
 
 Older releases - always broken:
 
  - tcp: do not accept ACK of bytes we never sent
 
  - require admin privileges to receive packet traces via netlink
 
  - packet: move reference count in packet_sock to atomic_long_t
 
  - bpf:
    - fix incorrect branch offset comparison with cpu=v4
    - fix prog_array_map_poke_run map poke update
 
  - netfilter:
    - 3 fixes for crashes on bad admin commands
    - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref
    - nf_tables: fix 'exist' matching on bigendian arches
 
  - leds: netdev: fix RTNL handling to prevent potential deadlock
 
  - eth: tg3: prevent races in error/reset handling
 
  - eth: r8169: fix rtl8125b PAUSE storm when suspended
 
  - eth: r8152: improve reset and surprise removal handling
 
  - eth: hns: fix race between changing features and sending
 
  - eth: nfp: fix sleep in atomic for bonding offload
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmVyGxsACgkQMUZtbf5S
 IrvziA//XZQLEQ3OsZnnYuuGkH0lPnY6ABaK/hcjCHnk9xs8SfIKPVYpq1LaShEp
 TY6mBhLMIANbdNO+yPzaszWVTkBPyb0w8JNy43bhLhOL3m/6FS6qwsgN8SAL2qVv
 8rnDF9Gsb4yU27aMZ6+2m92WiuyPptf4HrWU2ISSv/oCYH9TWsPUrTwt+QuVUboN
 eSbvMzgIAkFIQVSbhMuinR9bOzAypSJPi18m1kkID5NsNUP/OToxPE7IFDEVS/oo
 f4P7Ru6g1Gw9pAJmVXy5c0528Hy2P4Pyyw3LD5i2FWZ7rhYJRADOC4EMs9lINzrn
 uscNUyztldaMHkKcZRqKbaXsnA3MPvuf3qycRH0wyHa1+OjL9N4A9P077FugtBln
 UlmgVokfONVlxRgwy7AqapQbZ30QmnUEOvWjFWV3dsCBS3ziq1h7ujCTaQkl6R/6
 i96xuiUPMrAnxAlbFOjoF8NeGvcvwujYCqs/q5JC43f+xZRGf52Pwf5U/AliOFym
 aBX1mF/mdMLjYIBlGwFABiybACRPMceT2RuCfvhfIdQiM01OHlydO933jS+R3I4O
 cB03ppK0QiNo5W4RlMqDGuXfVnBJ36pv/2tY8IUOZGXSR+jSQOxZHrhYrtzMM5F8
 sWjpEIrfzdtuz0ssEg9wwGBTffEf07uZyPttov3Pm+VnDrsmCMU=
 =bkyC
 -----END PGP SIGNATURE-----

Merge tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff

  Current release - new code bugs:

   - tcp: assorted fixes to the new Auth Option support

  Older releases - regressions:

   - tcp: fix mid stream window clamp

   - tls: fix incorrect splice handling

   - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit()

   - dsa: mv88e6xxx: restore USXGMII support for 6393X

   - arcnet: restore support for multiple Sohard Arcnet cards

  Older releases - always broken:

   - tcp: do not accept ACK of bytes we never sent

   - require admin privileges to receive packet traces via netlink

   - packet: move reference count in packet_sock to atomic_long_t

   - bpf:
      - fix incorrect branch offset comparison with cpu=v4
      - fix prog_array_map_poke_run map poke update

   - netfilter:
      - three fixes for crashes on bad admin commands
      - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref
      - nf_tables: fix 'exist' matching on bigendian arches

   - leds: netdev: fix RTNL handling to prevent potential deadlock

   - eth: tg3: prevent races in error/reset handling

   - eth: r8169: fix rtl8125b PAUSE storm when suspended

   - eth: r8152: improve reset and surprise removal handling

   - eth: hns: fix race between changing features and sending

   - eth: nfp: fix sleep in atomic for bonding offload"

* tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
  net/smc: fix missing byte order conversion in CLC handshake
  net: dsa: microchip: provide a list of valid protocols for xmit handler
  drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
  psample: Require 'CAP_NET_ADMIN' when joining "packets" group
  bpf: sockmap, updating the sg structure should also update curr
  net: tls, update curr on splice as well
  nfp: flower: fix for take a mutex lock in soft irq context and rcu lock
  net: dsa: mv88e6xxx: Restore USXGMII support for 6393X
  tcp: do not accept ACK of bytes we never sent
  selftests/bpf: Add test for early update in prog_array_map_poke_run
  bpf: Fix prog_array_map_poke_run map poke update
  netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
  netfilter: nf_tables: validate family when identifying table via handle
  netfilter: nf_tables: bail out on mismatching dynset and set expressions
  netfilter: nf_tables: fix 'exist' matching on bigendian arches
  netfilter: nft_set_pipapo: skip inactive elements during set walk
  netfilter: bpf: fix bad registration on nf_defrag
  leds: trigger: netdev: fix RTNL handling to prevent potential deadlock
  octeontx2-af: Update Tx link register range
  ...
2023-12-07 17:04:13 -08:00
Kirill A. Shutemov
f4116bfc44 x86/tdx: Allow 32-bit emulation by default
32-bit emulation was disabled on TDX to prevent a possible attack by
a VMM injecting an interrupt on vector 0x80.

Now that int80_emulation() has a check for external interrupts the
limitation can be lifted.

To distinguish software interrupts from external ones, int80_emulation()
checks the APIC ISR bit relevant to the 0x80 vector. For
software interrupts, this bit will be 0.

On TDX, the VAPIC state (including ISR) is protected and cannot be
manipulated by the VMM. The ISR bit is set by the microcode flow during
the handling of posted interrupts.

[ dhansen: more changelog tweaks ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07 09:51:29 -08:00
Thomas Gleixner
55617fb991 x86/entry: Do not allow external 0x80 interrupts
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
kernel expects to receive a software interrupt as a result of the INT
0x80 instruction. However, an external interrupt on the same vector
also triggers the same codepath.

An external interrupt on vector 0x80 will currently be interpreted as a
32-bit system call, and assuming that it was a user context.

Panic on external interrupts on the vector.

To distinguish software interrupts from external ones, the kernel checks
the APIC ISR bit relevant to the 0x80 vector. For software interrupts,
this bit will be 0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07 09:51:29 -08:00
Thomas Gleixner
be5341eb0d x86/entry: Convert INT 0x80 emulation to IDTENTRY
There is no real reason to have a separate ASM entry point implementation
for the legacy INT 0x80 syscall emulation on 64-bit.

IDTENTRY provides all the functionality needed with the only difference
that it does not:

  - save the syscall number (AX) into pt_regs::orig_ax
  - set pt_regs::ax to -ENOSYS

Both can be done safely in the C code of an IDTENTRY before invoking any of
the syscall related functions which depend on this convention.

Aside of ASM code reduction this prepares for detecting and handling a
local APIC injected vector 0x80.

[ kirill.shutemov: More verbose comments ]
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07 09:51:29 -08:00
Kirill A. Shutemov
b82a8dbd3d x86/coco: Disable 32-bit emulation by default on TDX and SEV
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
kernel expects to receive a software interrupt as a result of the INT
0x80 instruction. However, an external interrupt on the same vector
triggers the same handler.

The kernel interprets an external interrupt on vector 0x80 as a 32-bit
system call that came from userspace.

A VMM can inject external interrupts on any arbitrary vector at any
time.  This remains true even for TDX and SEV guests where the VMM is
untrusted.

Put together, this allows an untrusted VMM to trigger int80 syscall
handling at any given point. The content of the guest register file at
that moment defines what syscall is triggered and its arguments. It
opens the guest OS to manipulation from the VMM side.

Disable 32-bit emulation by default for TDX and SEV. User can override
it with the ia32_emulation=y command line option.

[ dhansen: reword the changelog ]

Reported-by: Supraja Sridhara <supraja.sridhara@inf.ethz.ch>
Reported-by: Benedict Schlüter <benedict.schlueter@inf.ethz.ch>
Reported-by: Mark Kuhne <mark.kuhne@inf.ethz.ch>
Reported-by: Andrin Bertschi <andrin.bertschi@inf.ethz.ch>
Reported-by: Shweta Shinde <shweta.shinde@inf.ethz.ch>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+: 1da5c9b x86: Introduce ia32_enabled()
Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07 09:51:10 -08:00
Jiri Olsa
4b7de80160 bpf: Fix prog_array_map_poke_run map poke update
Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.

There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.

The problem is that in such case we won't update the tail call jump
and cause imbalance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.

I'm hitting following race during the program load:

  CPU 0                             CPU 1

  bpf_prog_load
    bpf_check
      do_misc_fixups
        prog_array_map_poke_track

                                    map_update_elem
                                      bpf_fd_array_map_update_elem
                                        prog_array_map_poke_run

                                          bpf_arch_text_poke returns -EINVAL

    bpf_prog_kallsyms_add

After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.

Similar race exists on the program unload.

Fixing this by moving the update to bpf_arch_poke_desc_update function which
makes sure we call __bpf_arch_text_poke that skips the bpf address check.

Each architecture has slightly different approach wrt looking up bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding new
'checkip' argument in previous version, it seems best to move the whole
map_poke_run update as arch specific code.

  [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810

Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
2023-12-06 22:40:16 +01:00
Linus Torvalds
deb4b9dd3b xen: branch for v6.7-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZWraswAKCRCAXGG7T9hj
 vlV0AP9241p7vHlIW6PIdfNZt9/dZZpuFnKHz+cE99pTZDl5nwEAoYqXUm/kEb14
 VAy7x0XIVdEt+9l4bgO2Qggx+Y184Qs=
 =APgm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - A fix for the Xen event driver setting the correct return value when
   experiencing an allocation failure

 - A fix for allocating space for a struct in the percpu area to not
   cross page boundaries (this one is for x86, a similar one for Arm was
   already in the pull request for rc3)

* tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: fix error code in xen_bind_pirq_msi_to_irq()
  x86/xen: fix percpu vcpu_info allocation
2023-12-03 08:31:53 +09:00
Borislav Petkov (AMD)
9b8493dc43 x86/CPU/AMD: Check vendor in the AMD microcode callback
Commit in Fixes added an AMD-specific microcode callback. However, it
didn't check the CPU vendor the kernel runs on explicitly.

The only reason the Zenbleed check in it didn't run on other x86 vendors
hardware was pure coincidental luck:

  if (!cpu_has_amd_erratum(c, amd_zenbleed))
	  return;

gives true on other vendors because they don't have those families and
models.

However, with the removal of the cpu_has_amd_erratum() in

  05f5f73936 ("x86/CPU/AMD: Drop now unused CPU erratum checking function")

that coincidental condition is gone, leading to the zenbleed check
getting executed on other vendors too.

Add the explicit vendor check for the whole callback as it should've
been done in the first place.

Fixes: 522b1d6921 ("x86/cpu/amd: Add a Zenbleed fix")
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231201184226.16749-1-bp@alien8.de
2023-12-02 11:40:24 +01:00
Like Xu
ef8d89033c KVM: x86: Remove 'return void' expression for 'void function'
The requested info will be stored in 'guest_xsave->region' referenced by
the incoming pointer "struct kvm_xsave *guest_xsave", thus there is no need
to explicitly use return void expression for a void function "static void
kvm_vcpu_ioctl_x86_get_xsave(...)". The issue is caught with [-Wpedantic].

Fixes: 2d287ec65e79 ("x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer")
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20231007064019.17472-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 08:14:27 -08:00
Sean Christopherson
087e15206d KVM: Set file_operations.owner appropriately for all such structures
Set .owner for all KVM-owned filed types so that the KVM module is pinned
until any files with callbacks back into KVM are completely freed.  Using
"struct kvm" as a proxy for the module, i.e. keeping KVM-the-module alive
while there are active VMs, doesn't provide full protection.

Userspace can invoke delete_module() the instant the last reference to KVM
is put.  If KVM itself puts the last reference, e.g. via kvm_destroy_vm(),
then it's possible for KVM to be preempted and deleted/unloaded before KVM
fully exits, e.g. when the task running kvm_destroy_vm() is scheduled back
in, it will jump to a code page that is no longer mapped.

Note, file types that can call into sub-module code, e.g. kvm-intel.ko or
kvm-amd.ko on x86, must use the module pointer passed to kvm_init(), not
THIS_MODULE (which points at kvm.ko).  KVM assumes that if /dev/kvm is
reachable, e.g. VMs are active, then the vendor module is loaded.

To reduce the probability of forgetting to set .owner entirely, use
THIS_MODULE for stats files where KVM does not call back into vendor code.

This reverts commit 70375c2d8f, and fixes
several other file types that have been buggy since their introduction.

Fixes: 70375c2d8f ("Revert "KVM: set owner of cpu and vm file operations"")
Fixes: 3bcd0662d6 ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/all/20231010003746.GN800259@ZenIV
Link: https://lore.kernel.org/r/20231018204624.1905300-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-12-01 08:12:17 -08:00
Ashwin Dayanand Kamat
27d25348d4 x86/sev: Fix kernel crash due to late update to read-only ghcb_version
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to different CPU. setup_ghcb() is called again which
tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
is a read_only variable which is initialised during booting.

Trying to write it results in a pagefault:

  BUG: unable to handle page fault for address: ffffffffba556e70
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  [ ...]
  Call Trace:
   <TASK>
   ? __die_body.cold+0x1a/0x1f
   ? __die+0x2a/0x35
   ? page_fault_oops+0x10c/0x270
   ? setup_ghcb+0x71/0x100
   ? __x86_return_thunk+0x5/0x6
   ? search_exception_tables+0x60/0x70
   ? __x86_return_thunk+0x5/0x6
   ? fixup_exception+0x27/0x320
   ? kernelmode_fixup_or_oops+0xa2/0x120
   ? __bad_area_nosemaphore+0x16a/0x1b0
   ? kernel_exc_vmm_communication+0x60/0xb0
   ? bad_area_nosemaphore+0x16/0x20
   ? do_kern_addr_fault+0x7a/0x90
   ? exc_page_fault+0xbd/0x160
   ? asm_exc_page_fault+0x27/0x30
   ? setup_ghcb+0x71/0x100
   ? setup_ghcb+0xe/0x100
   cpu_init_exception_handling+0x1b9/0x1f0

The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.

[ mingo: Refined the changelog. ]

Fixes: 95d33bfaa3 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/1701254429-18250-1-git-send-email-kashwindayan@vmware.com
2023-11-30 10:23:12 +01:00
Like Xu
547c91929f KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode
When querying whether or not a vCPU "is" running in kernel mode, directly
get the CPL if the vCPU is the currently loaded vCPU.  In scenarios where
a guest is profiled via perf-kvm, querying vcpu->arch.preempted_in_kernel
from kvm_guest_state() is wrong if vCPU is actively running, i.e. isn't
scheduled out due to being preempted and so preempted_in_kernel is stale.

This affects perf/core's ability to accurately tag guest RIP with
PERF_RECORD_MISC_GUEST_{KERNEL|USER} and record it in the sample.  This
causes perf/tool to fail to connect the vCPU RIPs to the guest kernel
space symbols when parsing these samples due to incorrect PERF_RECORD_MISC
flags:

   Before (perf-report of a cpu-cycles sample):
      1.23%  :58945   [unknown]         [u] 0xffffffff818012e0

   After:
      1.35%  :60703   [kernel.vmlinux]  [g] asm_exc_page_fault

Note, checking preempted_in_kernel in kvm_arch_vcpu_in_kernel() is awful
as nothing in the API's suggests that it's safe to use if and only if the
vCPU was preempted.  That can be cleaned up in the future, for now just
fix the glaring correctness bug.

Note #2, checking vcpu->preempted is NOT safe, as getting the CPL on VMX
requires VMREAD, i.e. is correct if and only if the vCPU is loaded.  If
the target vCPU *was* preempted, then it can be scheduled back in after
the check on vcpu->preempted in kvm_vcpu_on_spin(), i.e. KVM could end up
trying to do VMREAD on a VMCS that isn't loaded on the current pCPU.

Signed-off-by: Like Xu <likexu@tencent.com>
Fixes: e1bfc24577 ("KVM: Move x86's perf guest info callbacks to generic KVM")
Link: https://lore.kernel.org/r/20231123075818.12521-1-likexu@tencent.com
[sean: massage changelong, add Fixes]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-29 10:22:55 -08:00
Juergen Gross
db2832309a x86/xen: fix percpu vcpu_info allocation
Today the percpu struct vcpu_info is allocated via DEFINE_PER_CPU(),
meaning that it could cross a page boundary. In this case registering
it with the hypervisor will fail, resulting in a panic().

This can easily be fixed by using DEFINE_PER_CPU_ALIGNED() instead,
as struct vcpu_info is guaranteed to have a size of 64 bytes, matching
the cache line size of x86 64-bit processors (Xen doesn't support
32-bit processors).

Fixes: 5ead97c84f ("xen: Core Xen implementation")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.con>
Link: https://lore.kernel.org/r/20231124074852.25161-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-28 12:47:11 +01:00
Linus Torvalds
4892711ace Fix/enhance x86 microcode version reporting: fix the bootup log spam,
and remove the driver version announcement to avoid version
 confusion when distros backport fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmVjFKgRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1h/7hAAs5hL1KvrfdW0VpAW91MbX6mtIe7Emc8T
 LCiBJtl9UngRdASUC9CGrcIZ5JIps0702gAq0qPVzk5zKxC22ySWsqMZybask+eF
 d6E7amMtF+KX0wiCZSuC66StCKA08JfrUXgxvYHnxDjNqERYFmVr1QabGL1IN5lZ
 KUrVUyvN8VOnzypOiQ98lXGWDJYwaV7t+IzMMh7mT5OUkoo09e6tFm7IF+NWg5xe
 NYCcvZqyo0Ipld7HOjlGHYG+blFkDxJpfTby5UevZybXsPd00cxBzDSR9zs1sYeG
 Kt6cgwDhfewfcM1QFVvNV/SDVsPp1BlVvMUa6Xa3vtsnWCit8zQqMbkYYWUaTcIh
 yUJvtzh/xZQtxaQ8Z8SbI7EhUBOFJXoHWV9JoEe3gWsWA5thu+4iOCh/P8C2ON3n
 6kLSgNQ4GAylH34MWoS84t2Jxv7XmNZljR/78ucRQrJ1JJIEA+r4sJ9hK9btxqf2
 0n86StHuwtNXSQwEhDcacqUpFPLZ65Za1Y9AXc69CDuiwj3DvTVBMwjEOBbnGTrZ
 dL9QOYG5gkklOx4o5ePj7RoLrzz/j6dj6idmu8FxZZ4q+QB9vvL2lRHusJnUEloE
 yxR7WWOB/kyUZT7FriLHRuEP7yQNRSvLs6U7b8uXCHiGcAb2mN0fFi2m/BcJGS4A
 hHA0t9WyNBM=
 =eYQ4
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode fixes from Ingo Molnar:
 "Fix/enhance x86 microcode version reporting: fix the bootup log spam,
  and remove the driver version announcement to avoid version confusion
  when distros backport fixes"

* tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Rework early revisions reporting
  x86/microcode: Remove the driver announcement and version
2023-11-26 08:42:42 -08:00
Linus Torvalds
e81fe50520 Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
code resulting in non-working events on those platforms.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmVjEt8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g54g/9EKugplQhVSfEXhzfBx38ICyodLxJuomL
 x96auVes+H3CEC6CYADzAY14GPDxymzOVSK/Sy8fWkzyGVWG23SZgB0i/hOTIsFL
 CVZLrVztidvkWWzJj5RnzNstZsUXp4QfNtcIw3VvCcsgDhk8WVen1xe5XbIuzamJ
 HbTrlrL2S3feAWaWZUh6c/3unHsLhCupLg8u0DTe97OFlRRlSj+wtmJCMCQoRRpL
 /LVItmmxfRbuMeyY8GGR6kb2qcb2+Zr6AUOXUPIoYOqARLuaV81F0kD2zwpPwB68
 rFPYBG2ld2IHkbNFgxfCfNqP2DuY5TnSpS8SJzcfTAp6saIA5Ua3QLwg8IumV/2X
 G3ZI/G54er2+hhddX7tUjJIiutwNHxGkiOKy3Bnxm8K8c2H0I8XZ97fB1OWQUnVy
 pnwMzz3tlUOfF9SV8K0zttu3Dar9dcoBybT8HL/Wxr4RWF+FuZBiQNjjd92k3L4u
 Ua8Q6nLD/IRXwQfiEWzDSsWIQqxuSDGxvGHBgYJm4d5DxblFAeKmKMkFnO6Z17Kh
 fm1M4nW/YawSQYlG7Jo5XG+dWESJPhM5IAVBw19o0oYkqHV2gfmUerG8hvJYR7DY
 CpLrw4S9UDOLzsYAtNoozB/GMQnXQvjRz8mILWiwNJxpbo1saeZUj43zHvzqr7ta
 Vj96qrQLuNU=
 =jOSq
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf event fix from Ingo Molnar:
 "Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
  code resulting in non-working events on those platforms"

* tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
2023-11-26 08:34:12 -08:00
Linus Torvalds
05c8c94ed4 hyperv-fixes for 6.7-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmVdgqYTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXhBsCACzUGLF3vOQdrmgTMymzaaOzfLJtvNW
 oQ34FwMJMOAyJ6FxM12IJPHA2j+azl9CPjQc5O6F2CBcF8hVj2mDIINQIi+4wpV5
 FQv445g2KFml/+AJr/1waz1GmhHtr1rfu7B7NX6tPUtOpxKN7AHAQXWYmHnwK8BJ
 5Mh2a/7Lphjin4M1FWCeBTj0JtqF1oVAW2L9jsjqogq1JV0a51DIFutROtaPSC/4
 ssTLM5Rqpnw8Z1GWVYD2PObIW4A+h1LV1tNGOIoGW6mX56mPU+KmVA7tTKr8Je/i
 z3Jk8bZXFyLvPW2+KNJacbldKNcfwAFpReffNz/FO3R16Stq9Ta1mcE2
 =wXju
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - One fix for the KVP daemon (Ani Sinha)

 - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)

 - Micro-optimization for hv_nmi_unknown() (Uros Bizjak)

* tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
  x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
  hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
2023-11-22 09:56:26 -08:00
Uros Bizjak
18286883e7 x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
the ZF flag, so this change saves a compare after CMPXCHG. The generated
asm code improves from:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	83 f8 ff             	cmp    $0xffffffff,%eax
  55:	0f 95 c0             	setne  %al

to:

  3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
  45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
  51:	00
  52:	0f 95 c0             	setne  %al

No functional change intended.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20231114170038.381634-1-ubizjak@gmail.com>
2023-11-22 03:47:44 +00:00
Borislav Petkov (AMD)
080990aa33 x86/microcode: Rework early revisions reporting
The AMD side of the loader issues the microcode revision for each
logical thread on the system, which can become really noisy on huge
machines. And doing that doesn't make a whole lot of sense - the
microcode revision is already in /proc/cpuinfo.

So in case one is interested in the theoretical support of mixed silicon
steppings on AMD, one can check there.

What is also missing on the AMD side - something which people have
requested before - is showing the microcode revision the CPU had
*before* the early update.

So abstract that up in the main code and have the BSP on each vendor
provide those revision numbers.

Then, dump them only once on driver init.

On Intel, do not dump the patch date - it is not needed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
2023-11-21 16:35:48 +01:00
Borislav Petkov (AMD)
2e569ada42 x86/microcode: Remove the driver announcement and version
First of all, the print is useless. The driver will either load and say
which microcode revision the machine has or issue an error.

Then, the version number is meaningless and actively confusing, as Yazen
mentioned recently: when a subset of patches are backported to a distro
kernel, one can't assume the driver version is the same as the upstream
one. And besides, the version number of the loader hasn't been used and
incremented for a long time. So drop it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231115210212.9981-2-bp@alien8.de
2023-11-21 16:20:49 +01:00
Dapeng Mi
e8df9d9f42 perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
When running perf-stat command on Intel hybrid platform, perf-stat
reports the following errors:

  sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1

  Opening: cpu/cycles/:HG
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    config                           0xa00000000
    disabled                         1
  ------------------------------------------------------------
  sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -16

   Performance counter stats for 'sleep 1':

       <not counted>      cpu_atom/instructions/

It looks the cpu_atom/instructions/ event can't be enabled on atom PMU
even when the process is pinned on atom core. Investigation shows that
exclusive_event_init() helper always returns -EBUSY error in the perf
event creation. That's strange since the atom PMU should not be an
exclusive PMU.

Further investigation shows the issue was introduced by commit:

  97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")

The commit originally intents to clear the bit PERF_PMU_CAP_AUX_OUTPUT
from PMU capabilities if intel_cap.pebs_output_pt_available is not set,
but it incorrectly uses 'or' operation and leads to all PMU capabilities
bits are set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT.

Testing this fix on Intel hybrid platforms, the observed issues
disappear.

Fixes: 97588df87b ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com
2023-11-21 13:44:36 +01:00
Linus Torvalds
cd557bc0a2 - Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
   use
 
 - A documentation fix
 
 - Add Kirill as TDX maintainer
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVaChkACgkQEsHwGGHe
 VUraNQ/+KyCyJgG6bdIB3tS9qKr0Z4REaXQ+UQ7DfAjlhrzw7C6f4VReNLp3ohEv
 RdxNjKLEueYFQAo+v8uKGkqYIT6H1ob9uW+RjtjN+OJqWN/3AfK7CTx8HI1bJsW5
 wKM+Ey81cID0iQDiNPAdzRnu7suKKjF5jLwztAw6EYOsTRfUnLZ8Ct84uHBWd58v
 kZ+WkEyeOyeJo+Vdx07d/LEcCJ+S9G6WfA0AnhHPOZxRZTn2RhqNsnJvqTeOvWUM
 PSN9NjxFk0ymidwnhR1urw1wHGgTT990vNsPIHLE72TwXrWEOM14Xkq1XNI4PfD1
 Bp74ySpF0YUQrvgBW4V3qXgBFls4DkKys1amd2kK5KQGEpcXZm7ZPnI5w2NKMsY4
 1Tk379W/1jPY8cyZjIqn92eFEkAjfID4eHICLj5IJhVMUusNEPmxgoycvKDqI8sK
 NihF1wUjyfRibh4ujYaurqKUBgxVHo2dyXPPo7UNzeaMfvqkFaxgwNJVF0gQ+MyI
 5BzeY71RCFb8ZKtCT6SVN6oUeWLg+QAZApoJVDDnhF9InG+wJj+D400T7pZnNHbo
 ag6L2gJFJ2+XsV8DJhiaII0gfbf9cUppn4G7RcvQfL2HivYnZV3q1dBKf6C35H44
 Kpz5w/eoJPOIcuZ48a6ph80zuRpuN6MSBigZ0G2Q7IwrmFx1Vcg=
 =PGYO
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Ignore invalid x2APIC entries in order to not waste per-CPU data

 - Fix a back-to-back signals handling scenario when shadow stack is in
   use

 - A documentation fix

 - Add Kirill as TDX maintainer

* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi: Ignore invalid x2APIC entries
  x86/shstk: Delay signal entry SSP write until after user accesses
  x86/Documentation: Indent 'note::' directive for protocol version number note
  MAINTAINERS: Add Intel TDX entry
2023-11-19 13:46:17 -08:00
Roger Pau Monne
bfa993b355 acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0
The Processor capability bits notify ACPI of the OS capabilities, and
so ACPI can adjust the return of other Processor methods taking the OS
capabilities into account.

When Linux is running as a Xen dom0, the hypervisor is the entity
in charge of processor power management, and hence Xen needs to make
sure the capabilities reported by _OSC/_PDC match the capabilities of
the driver in Xen.

Introduce a small helper to sanitize the buffer when running as Xen
dom0.

When Xen supports HWP, this serves as the equivalent of commit
a21211672c ("ACPI / processor: Request native thermal interrupt
handling via _OSC") to avoid SMM crashes.  Xen will set bit
ACPI_PROC_CAP_COLLAB_PROC_PERF (bit 12) in the capability bits and the
_OSC/_PDC call will apply it.

[ jandryuk: Mention Xen HWP's need.  Support _OSC & _PDC ]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20231108212517.72279-1-jandryuk@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-11-13 07:22:00 +01:00
Saurabh Sengar
7e8037b099 x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
A Gen2 VM doesn't support legacy PCI/PCIe, so both raw_pci_ops and
raw_pci_ext_ops are NULL, and pci_subsys_init() -> pcibios_init()
doesn't call pcibios_resource_survey() -> e820__reserve_resources_late();
as a result, any emulated persistent memory of E820_TYPE_PRAM (12) via
the kernel parameter memmap=nn[KMG]!ss is not added into iomem_resource
and hence can't be detected by register_e820_pmem().

Fix this by directly calling e820__reserve_resources_late() in
hv_pci_init(), which is called from arch_initcall(pci_arch_init).

It's ok to move a Gen2 VM's e820__reserve_resources_late() from
subsys_initcall(pci_subsys_init) to arch_initcall(pci_arch_init) because
the code in-between doesn't depend on the E820 resources.
e820__reserve_resources_late() depends on e820__reserve_resources(),
which has been called earlier from setup_arch().

For a Gen-2 VM, the new hv_pci_init() also adds any memory of
E820_TYPE_PMEM (7) into iomem_resource, and acpi_nfit_register_region() ->
acpi_nfit_insert_resource() -> region_intersects() returns
REGION_INTERSECTS, so the memory of E820_TYPE_PMEM won't get added twice.

Changed the local variable "int gen2vm" to "bool gen2vm".

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1699691867-9827-1-git-send-email-ssengar@linux.microsoft.com>
2023-11-12 22:50:30 +00:00
Arnd Bergmann
abc28463c8 kprobes: unify kprobes_exceptions_nofify() prototypes
Most architectures that support kprobes declare this function in their
own asm/kprobes.h header and provide an override, but some are missing
the prototype, which causes a warning for the __weak stub implementation:

kernel/kprobes.c:1865:12: error: no previous prototype for 'kprobe_exceptions_notify' [-Werror=missing-prototypes]
 1865 | int __weak kprobe_exceptions_notify(struct notifier_block *self,

Move the prototype into linux/kprobes.h so it is visible to all
the definitions.

Link: https://lore.kernel.org/all/20231108125843.3806765-4-arnd@kernel.org/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-11-10 19:59:05 +09:00
Zhang Rui
ec9aedb2aa x86/acpi: Ignore invalid x2APIC entries
Currently, the kernel enumerates the possible CPUs by parsing both ACPI
MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
even if they have duplicated APIC IDs in Local APIC and x2APIC, are always
enumerated.

Below is what ACPI MADT Local APIC and x2APIC describes on an
Ivebridge-EP system,

[02Ch 0044   1]                Subtable Type : 00 [Processor Local APIC]
[02Fh 0047   1]                Local Apic ID : 00
...
[164h 0356   1]                Subtable Type : 00 [Processor Local APIC]
[167h 0359   1]                Local Apic ID : 39
[16Ch 0364   1]                Subtable Type : 00 [Processor Local APIC]
[16Fh 0367   1]                Local Apic ID : FF
...
[3ECh 1004   1]                Subtable Type : 09 [Processor Local x2APIC]
[3F0h 1008   4]                Processor x2Apic ID : 00000000
...
[B5Ch 2908   1]                Subtable Type : 09 [Processor Local x2APIC]
[B60h 2912   4]                Processor x2Apic ID : 00000077

As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
And this wastes significant amount of memory for the per-cpu data.
Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
because __max_logical_packages is over-estimated by the APIC IDs in
the x2APIC entries.

According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure:

  "[Compatibility note] On some legacy OSes, Logical processors with APIC
   ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
   Processor Local APIC structure to convey their APIC information to OSPM,
   and those processors must be declared in the DSDT using the Processor()
   keyword. Logical processors with APIC ID values 255 and greater must use
   the Processor Local x2APIC structure and be declared using the Device()
   keyword."

Therefore prevent the registration of x2APIC entries with an APIC ID less
than 255 if the local APIC table enumerates valid APIC IDs.

[ tglx: Simplify the logic ]

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
2023-11-09 14:33:30 +01:00
Rick Edgecombe
31255e072b x86/shstk: Delay signal entry SSP write until after user accesses
When a signal is being delivered, the kernel needs to make accesses to
userspace. These accesses could encounter an access error, in which case
the signal delivery itself will trigger a segfault. Usually this would
result in the kernel killing the process. But in the case of a SEGV signal
handler being configured, the failure of the first signal delivery will
result in *another* signal getting delivered. The second signal may
succeed if another thread has resolved the issue that triggered the
segfault (i.e. a well timed mprotect()/mmap()), or the second signal is
being delivered to another stack (i.e. an alt stack).

On x86, in the non-shadow stack case, all the accesses to userspace are
done before changes to the registers (in pt_regs). The operation is
aborted when an access error occurs, so although there may be writes done
for the first signal, control flow changes for the signal (regs->ip,
regs->sp, etc) are not committed until all the accesses have already
completed successfully. This means that the second signal will be
delivered as if it happened at the time of the first signal. It will
effectively replace the first aborted signal, overwriting the half-written
frame of the aborted signal. So on sigreturn from the second signal,
control flow will resume happily from the point of control flow where the
original signal was delivered.

The problem is, when shadow stack is active, the shadow stack SSP
register/MSR is updated *before* some of the userspace accesses. This
means if the earlier accesses succeed and the later ones fail, the second
signal will not be delivered at the same spot on the shadow stack as the
first one. So on sigreturn from the second signal, the SSP will be
pointing to the wrong location on the shadow stack (off by a frame).

Pengfei privately reported that while using a shadow stack enabled glibc,
the “signal06” test in the LTP test-suite hung. It turns out it is
testing the above described double signal scenario. When this test was
compiled with shadow stack, the first signal pushed a shadow stack
sigframe, then the second pushed another. When the second signal was
handled, the SSP was at the first shadow stack signal frame instead of
the original location. The test then got stuck as the #CP from the twice
incremented SSP was incorrect and generated segfaults in a loop.

Fix this by adjusting the SSP register only after any userspace accesses,
such that there can be no failures after the SSP is adjusted. Do this by
moving the shadow stack sigframe push logic to happen after all other
userspace accesses.

Note, sigreturn (as opposed to the signal delivery dealt with in this
patch) has ordering behavior that could lead to similar failures. The
ordering issues there extend beyond shadow stack to include the alt stack
restoration. Fixing that would require cross-arch changes, and the
ordering today does not cause any known test or apps breakages. So leave
it as is, for now.

[ dhansen: minor changelog/subject tweak ]

Fixes: 05e36022c0 ("x86/shstk: Handle signals for shadow stack")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com
Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
2023-11-08 08:55:37 -08:00
Linus Torvalds
5e2cb28dd7 configfs-tsm for v6.7
- Introduce configfs-tsm as a shared ABI for confidential computing
   attestation reports
 
 - Convert sev-guest to additionally support configfs-tsm alongside its
   vendor specific ioctl()
 
 - Added signed attestation report retrieval to the tdx-guest driver
   forgoing a new vendor specific ioctl()
 
 - Misc. cleanups and a new __free() annotation for kvfree()
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZUQhiQAKCRDfioYZHlFs
 Z2gMAQCJdtP0f2kH+pvf3oxAkA1OubKBqJqWOppeyrhTsNMpDQEA9ljXH9h7eRB/
 2NQ6USrU6jqcdu3gB5Tzq8J/ZZabMQU=
 =1Eiv
 -----END PGP SIGNATURE-----

Merge tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux

Pull unified attestation reporting from Dan Williams:
 "In an ideal world there would be a cross-vendor standard attestation
  report format for confidential guests along with a common device
  definition to act as the transport.

  In the real world the situation ended up with multiple platform
  vendors inventing their own attestation report formats with the
  SEV-SNP implementation being a first mover to define a custom
  sev-guest character device and corresponding ioctl(). Later, this
  configfs-tsm proposal intercepted an attempt to add a tdx-guest
  character device and a corresponding new ioctl(). It also anticipated
  ARM and RISC-V showing up with more chardevs and more ioctls().

  The proposal takes for granted that Linux tolerates the vendor report
  format differentiation until a standard arrives. From talking with
  folks involved, it sounds like that standardization work is unlikely
  to resolve anytime soon. It also takes the position that kernfs ABIs
  are easier to maintain than ioctl(). The result is a shared configfs
  mechanism to return per-vendor report-blobs with the option to later
  support a standard when that arrives.

  Part of the goal here also is to get the community into the
  "uncomfortable, but beneficial to the long term maintainability of the
  kernel" state of talking to each other about their differentiation and
  opportunities to collaborate. Think of this like the device-driver
  equivalent of the common memory-management infrastructure for
  confidential-computing being built up in KVM.

  As for establishing an "upstream path for cross-vendor
  confidential-computing device driver infrastructure" this is something
  I want to discuss at Plumbers. At present, the multiple vendor
  proposals for assigning devices to confidential computing VMs likely
  needs a new dedicated repository and maintainer team, but that is a
  discussion for v6.8.

  For now, Greg and Thomas have acked this approach and this is passing
  is AMD, Intel, and Google tests.

  Summary:

   - Introduce configfs-tsm as a shared ABI for confidential computing
     attestation reports

   - Convert sev-guest to additionally support configfs-tsm alongside
     its vendor specific ioctl()

   - Added signed attestation report retrieval to the tdx-guest driver
     forgoing a new vendor specific ioctl()

   - Misc cleanups and a new __free() annotation for kvfree()"

* tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux:
  virt: tdx-guest: Add Quote generation support using TSM_REPORTS
  virt: sevguest: Add TSM_REPORTS support for SNP_GET_EXT_REPORT
  mm/slab: Add __free() support for kvfree
  virt: sevguest: Prep for kernel internal get_ext_report()
  configfs-tsm: Introduce a shared ABI for attestation reports
  virt: coco: Add a coco/Makefile and coco/Kconfig
  virt: sevguest: Fix passing a stack buffer as a scatterlist target
2023-11-04 15:58:13 -10:00
Linus Torvalds
0a23fb262d Major microcode loader restructuring, cleanup and improvements by Thomas
Gleixner:
 
 - Restructure the code needed for it and add a temporary initrd mapping
   on 32-bit so that the loader can access the microcode blobs. This in
   itself is a preparation for the next major improvement:
 
 - Do not load microcode on 32-bit before paging has been enabled.
   Handling this has caused an endless stream of headaches, issues, ugly
   code and unnecessary hacks in the past. And there really wasn't any
   sensible reason to do that in the first place. So switch the 32-bit
   loading to happen after paging has been enabled and turn the loader
   code "real purrty" again
 
 - Drop mixed microcode steppings loading on Intel - there, a single patch
   loaded on the whole system is sufficient
 
 - Rework late loading to track which CPUs have updated microcode
   successfully and which haven't, act accordingly
 
 - Move late microcode loading on Intel in NMI context in order to
   guarantee concurrent loading on all threads
 
 - Make the late loading CPU-hotplug-safe and have the offlined threads
   be woken up for the purpose of the update
 
 - Add support for a minimum revision which determines whether late
   microcode loading is safe on a machine and the microcode does not
   change software visible features which the machine cannot use anyway
   since feature detection has happened already. Roughly, the minimum
   revision is the smallest revision number which must be loaded
   currently on the system so that late updates can be allowed
 
 - Other nice leanups, fixess, etc all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmVE0xkACgkQEsHwGGHe
 VUrCuBAAhOqqwkYPiGXPWd2hvdn1zGtD5fvEdXn3Orzd+Lwc6YaQTsCxCjIO/0ws
 8inpPFuOeGz4TZcplzipi3G5oatPVc7ORDuW+/BvQQQljZOsSKfhiaC29t6dvS6z
 UG3sbCXKVwlJ5Kwv3Qe4eWur4Ex6GeFDZkIvBCmbaAdGPFlfu1i2uO1yBooNP1Rs
 GiUkp+dP1/KREWwR/dOIsHYL2QjWIWfHQEWit/9Bj46rxE9ERx/TWt3AeKPfKriO
 Wp0JKp6QY78jg6a0a2/JVmbT1BKz69Z9aPp6hl4P2MfbBYOnqijRhdezFW0NyqV2
 pn6nsuiLIiXbnSOEw0+Wdnw5Q0qhICs5B5eaBfQrwgfZ8pxPHv2Ir777GvUTV01E
 Dv0ZpYsHa+mHe17nlK8V3+4eajt0PetExcXAYNiIE+pCb7pLjjKkl8e+lcEvEsO0
 QSL3zG5i5RWUMPYUvaFRgepWy3k/GPIoDQjRcUD3P+1T0GmnogNN10MMNhmOzfWU
 pyafe4tJUOVsq0HJ7V/bxIHk2p+Q+5JLKh5xBm9janE4BpabmSQnvFWNblVfK4ig
 M9ohjI/yMtgXROC4xkNXgi8wE5jfDKBghT6FjTqKWSV45vknF1mNEjvuaY+aRZ3H
 MB4P3HCj+PKWJimWHRYnDshcytkgcgVcYDiim8va/4UDrw8O2ks=
 =JOZu
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loading updates from Borislac Petkov:
 "Major microcode loader restructuring, cleanup and improvements by
  Thomas Gleixner:

   - Restructure the code needed for it and add a temporary initrd
     mapping on 32-bit so that the loader can access the microcode
     blobs. This in itself is a preparation for the next major
     improvement:

   - Do not load microcode on 32-bit before paging has been enabled.

     Handling this has caused an endless stream of headaches, issues,
     ugly code and unnecessary hacks in the past. And there really
     wasn't any sensible reason to do that in the first place. So switch
     the 32-bit loading to happen after paging has been enabled and turn
     the loader code "real purrty" again

   - Drop mixed microcode steppings loading on Intel - there, a single
     patch loaded on the whole system is sufficient

   - Rework late loading to track which CPUs have updated microcode
     successfully and which haven't, act accordingly

   - Move late microcode loading on Intel in NMI context in order to
     guarantee concurrent loading on all threads

   - Make the late loading CPU-hotplug-safe and have the offlined
     threads be woken up for the purpose of the update

   - Add support for a minimum revision which determines whether late
     microcode loading is safe on a machine and the microcode does not
     change software visible features which the machine cannot use
     anyway since feature detection has happened already. Roughly, the
     minimum revision is the smallest revision number which must be
     loaded currently on the system so that late updates can be allowed

   - Other nice leanups, fixess, etc all over the place"

* tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  x86/microcode/intel: Add a minimum required revision for late loading
  x86/microcode: Prepare for minimal revision check
  x86/microcode: Handle "offline" CPUs correctly
  x86/apic: Provide apic_force_nmi_on_cpu()
  x86/microcode: Protect against instrumentation
  x86/microcode: Rendezvous and load in NMI
  x86/microcode: Replace the all-in-one rendevous handler
  x86/microcode: Provide new control functions
  x86/microcode: Add per CPU control field
  x86/microcode: Add per CPU result state
  x86/microcode: Sanitize __wait_for_cpus()
  x86/microcode: Clarify the late load logic
  x86/microcode: Handle "nosmt" correctly
  x86/microcode: Clean up mc_cpu_down_prep()
  x86/microcode: Get rid of the schedule work indirection
  x86/microcode: Mop up early loading leftovers
  x86/microcode/amd: Use cached microcode for AP load
  x86/microcode/amd: Cache builtin/initrd microcode early
  x86/microcode/amd: Cache builtin microcode too
  x86/microcode/amd: Use correct per CPU ucode_cpu_info
  ...
2023-11-04 08:46:37 -10:00
Linus Torvalds
5c5e048b24 Kbuild updates for v6.7
- Implement the binary search in modpost for faster symbol lookup
 
  - Respect HOSTCC when linking host programs written in Rust
 
  - Change the binrpm-pkg target to generate kernel-devel RPM package
 
  - Fix endianness issues for tee and ishtp MODULE_DEVICE_TABLE
 
  - Unify vdso_install rules
 
  - Remove unused __memexit* annotations
 
  - Eliminate stale whitelisting for __devinit/__devexit from modpost
 
  - Enable dummy-tools to handle the -fpatchable-function-entry flag
 
  - Add 'userldlibs' syntax
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmVFIZgVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGeKwP+wd2kCrxAgS4zPffOcO3cVHfZwJe
 AXOrTp/v73gzxb9eHXH6TmEDf1Rv7EwW3fmmGJosopJGD6itBqzJa4bNDrbq40rY
 XStmg0NRmTrIG20CHGgaGWxb8/7WMrYfu0rhFdUXJjmbny6XwJ3US9FvDPC0mZz7
 w9VCq5CZOqMsJcQyGkAR7uCHDRzNWiZ/Vnfbz3aa6abFzp7dsjhOgDy5SQ6qZgQz
 AwHHKNEN+G3HWmGDZqcbV9aDaCk4btnz64h843RAxjy2HNJF360Ohm2KOcdJr5lo
 DSSStkogBkZNSRQPtqtfknDjzITjeF4JAnUw5ivOtt8ERaO3JRUcr5gHjfw5iV/n
 o4pC1SXmFzdfoN4dogoYF9rz3j955mSFlT/DSbSbuQS/ELzQs0nsqERxhV4zNCsX
 KvYPUqKzZLW3i8pHNuhh7z7t4Nbz1zXqUa19FvaLNtFTCtS8/IA868a59S0uqT9I
 EAIqrNy9qAsk8UuQUxWVx0qf9f5wKGYxW62iMIF9F2lsFRWA8H588CFPUuSU9Bhk
 KAsvzq249MUGJd0RAjF92EWJgNz/nYzZfFTEL5HKAVauYY5UCyR3AVjrak761I8z
 ctVskA7eVkaW4eARfcp15Fna15FHVzxBJ3B26oKYIJBQfJLjzZcV8XeMtEcQjEGU
 jzl+oRqB/Q3oD7Nx
 =PeX7
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Implement the binary search in modpost for faster symbol lookup

 - Respect HOSTCC when linking host programs written in Rust

 - Change the binrpm-pkg target to generate kernel-devel RPM package

 - Fix endianness issues for tee and ishtp MODULE_DEVICE_TABLE

 - Unify vdso_install rules

 - Remove unused __memexit* annotations

 - Eliminate stale whitelisting for __devinit/__devexit from modpost

 - Enable dummy-tools to handle the -fpatchable-function-entry flag

 - Add 'userldlibs' syntax

* tag 'kbuild-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
  kbuild: support 'userldlibs' syntax
  kbuild: dummy-tools: pretend we understand -fpatchable-function-entry
  kbuild: Correct missing architecture-specific hyphens
  modpost: squash ALL_{INIT,EXIT}_TEXT_SECTIONS to ALL_TEXT_SECTIONS
  modpost: merge sectioncheck table entries regarding init/exit sections
  modpost: use ALL_INIT_SECTIONS for the section check from DATA_SECTIONS
  modpost: disallow the combination of EXPORT_SYMBOL and __meminit*
  modpost: remove EXIT_SECTIONS macro
  modpost: remove MEM_INIT_SECTIONS macro
  modpost: remove more symbol patterns from the section check whitelist
  modpost: disallow *driver to reference .meminit* sections
  linux/init: remove __memexit* annotations
  modpost: remove ALL_EXIT_DATA_SECTIONS macro
  kbuild: simplify cmd_ld_multi_m
  kbuild: avoid too many execution of scripts/pahole-flags.sh
  kbuild: remove ARCH_POSTLINK from module builds
  kbuild: unify no-compiler-targets and no-sync-config-targets
  kbuild: unify vdso_install rules
  docs: kbuild: add INSTALL_DTBS_PATH
  UML: remove unused cmd_vdso_install
  ...
2023-11-04 08:07:19 -10:00
Linus Torvalds
1f24458a10 TTY/Serial changes for 6.7-rc1
Here is the big set of tty/serial driver changes for 6.7-rc1.  Included
 in here are:
   - console/vgacon cleanups and removals from Arnd
   - tty core and n_tty cleanups from Jiri
   - lots of 8250 driver updates and cleanups
   - sc16is7xx serial driver updates
   - dt binding updates
   - first set of port lock wrapers from Thomas for the printk fixes
     coming in future releases
   - other small serial and tty core cleanups and updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZUTbaw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk9+gCeKdoRb8FDwGCO/GaoHwR4EzwQXhQAoKXZRmN5
 LTtw9sbfGIiBdOTtgLPb
 =6PJr
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty and serial updates from Greg KH:
 "Here is the big set of tty/serial driver changes for 6.7-rc1. Included
  in here are:

   - console/vgacon cleanups and removals from Arnd

   - tty core and n_tty cleanups from Jiri

   - lots of 8250 driver updates and cleanups

   - sc16is7xx serial driver updates

   - dt binding updates

   - first set of port lock wrapers from Thomas for the printk fixes
     coming in future releases

   - other small serial and tty core cleanups and updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (193 commits)
  serdev: Replace custom code with device_match_acpi_handle()
  serdev: Simplify devm_serdev_device_open() function
  serdev: Make use of device_set_node()
  tty: n_gsm: add copyright Siemens Mobility GmbH
  tty: n_gsm: fix race condition in status line change on dead connections
  serial: core: Fix runtime PM handling for pending tx
  vgacon: fix mips/sibyte build regression
  dt-bindings: serial: drop unsupported samsung bindings
  tty: serial: samsung: drop earlycon support for unsupported platforms
  tty: 8250: Add note for PX-835
  tty: 8250: Fix IS-200 PCI ID comment
  tty: 8250: Add Brainboxes Oxford Semiconductor-based quirks
  tty: 8250: Add support for Intashield IX cards
  tty: 8250: Add support for additional Brainboxes PX cards
  tty: 8250: Fix up PX-803/PX-857
  tty: 8250: Fix port count of PX-257
  tty: 8250: Add support for Intashield IS-100
  tty: 8250: Add support for Brainboxes UP cards
  tty: 8250: Add support for additional Brainboxes UC cards
  tty: 8250: Remove UC-257 and UC-431
  ...
2023-11-03 15:44:25 -10:00
Linus Torvalds
8f6f76a6a2 As usual, lots of singleton and doubleton patches all over the tree and
there's little I can say which isn't in the individual changelogs.
 
 The lengthier patch series are
 
 - "kdump: use generic functions to simplify crashkernel reservation in
   arch", from Baoquan He.  This is mainly cleanups and consolidation of
   the "crashkernel=" kernel parameter handling.
 
 - After much discussion, David Laight's "minmax: Relax type checks in
   min() and max()" is here.  Hopefully reduces some typecasting and the
   use of min_t() and max_t().
 
 - A group of patches from Oleg Nesterov which clean up and slightly fix
   our handling of reads from /proc/PID/task/...  and which remove
   task_struct.therad_group.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZUQP9wAKCRDdBJ7gKXxA
 jmOAAQDh8sxagQYocoVsSm28ICqXFeaY9Co1jzBIDdNesAvYVwD/c2DHRqJHEiS4
 63BNcG3+hM9nwGJHb5lyh5m79nBMRg0=
 =On4u
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00