linux/arch
Sean Christopherson cd0e615c49 KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required
Synthesize a triple fault if L2 guest state is invalid at the time of
VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
invalid guest state, since from L1's perspective, it's architecturally
impossible for L2 to have invalid state while L2 is running in hardware.
E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
or #GP.
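
The decision described above can be sketched as a small, self-contained
model.  This is an illustration only, not the code from the patch: the
type and function names (vcpu_model, run_action, decide_run_action) are
hypothetical, and the model assumes that invalid state outside of L2 is
still handled by emulation.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names exist in KVM. */
enum run_action {
	ACTION_ENTER_GUEST,        /* state is valid, run in hardware */
	ACTION_EMULATE,            /* not in L2, invalid state: emulate (assumed) */
	ACTION_SYNTH_TRIPLE_FAULT, /* in L2, invalid state: report shutdown to L1 */
};

struct vcpu_model {
	bool in_l2;              /* vCPU is currently running a nested guest */
	bool emulation_required; /* guest state cannot be run in hardware */
};

/*
 * Pick what to do at VM-Enter time.  L2 is never emulated with invalid
 * state; a triple fault is synthesized for L1 to handle instead.
 */
static enum run_action decide_run_action(const struct vcpu_model *v)
{
	if (!v->emulation_required)
		return ACTION_ENTER_GUEST;

	return v->in_l2 ? ACTION_SYNTH_TRIPLE_FAULT : ACTION_EMULATE;
}
```

In this model only the combination "in L2" plus "emulation required"
yields a triple fault; every other combination behaves as before.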

Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
can trigger this scenario, as nested VM-Enter correctly rejects any
attempt to enter L2 with invalid state.

RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
Following AMD's SMRAM layout is important as AMD's layout saves/restores
the descriptor cache information, including CS.RPL and SS.RPL, and also
defines all the fields relevant to invalid guest state as read-only, i.e.
so long as the vCPU had valid state before the SMI, which is guaranteed
for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
layout saves/restores only the selector, which means that scenarios where
the selector and cached RPL don't match, e.g. conforming code segments,
would yield invalid guest state.  Intel CPUs fudge around this issue by
stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
Treatment of RSM", paraphrasing for brevity:

  IF internal storage indicates that the [CPU was post-VMXON]
  THEN
     enter VMX operation (root or non-root);
     restore VMX-critical state as defined in Section 34.14.1;
     set to their fixed values any bits in CR0 and CR4 whose values must
     be fixed in VMX operation [unless coming from an unrestricted guest];
     IF RFLAGS.VM = 0 AND (in VMX root operation OR the
        “unrestricted guest” VM-execution control is 0)
     THEN
       CS.RPL := SS.DPL;
       SS.RPL := SS.DPL;
     FI;
     restore current VMCS pointer;
  FI;
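
The RPL stuffing in the pseudocode above can be modeled in a few lines of
C.  This is a sketch of the quoted SDM condition only, not KVM or
microcode behavior: the struct and function names (seg_model, rsm_model,
set_rpl, rsm_fixup_rpl) are hypothetical, and the only architectural fact
relied upon is that the RPL occupies bits 1:0 of a segment selector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state the quoted pseudocode consults. */
struct seg_model {
	uint16_t selector; /* RPL lives in bits 1:0 */
	uint8_t dpl;       /* DPL from the cached descriptor */
};

struct rsm_model {
	bool rflags_vm;          /* RFLAGS.VM */
	bool vmx_root;           /* returning to VMX root operation */
	bool unrestricted_guest; /* "unrestricted guest" control is 1 */
	struct seg_model cs, ss;
};

/* Replace the low two bits (RPL) of a selector. */
static uint16_t set_rpl(uint16_t selector, uint8_t rpl)
{
	return (uint16_t)((selector & ~0x3u) | (rpl & 0x3u));
}

/* Force CS.RPL and SS.RPL to SS.DPL when the quoted condition holds. */
static void rsm_fixup_rpl(struct rsm_model *s)
{
	if (!s->rflags_vm && (s->vmx_root || !s->unrestricted_guest)) {
		s->cs.selector = set_rpl(s->cs.selector, s->ss.dpl);
		s->ss.selector = set_rpl(s->ss.selector, s->ss.dpl);
	}
}
```

With the fixup applied, a selector whose RPL disagrees with SS.DPL is
brought back in line; when RFLAGS.VM is set, or for an unrestricted guest
in non-root operation, the selectors are left as loaded from SMRAM.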

Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
only way for CR0 and/or CR4 to have illegal values is if they were
modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
section states "modifying these registers will result in unpredictable
behavior".

KVM's ioctl() behavior is less straightforward.  Because KVM allows
ioctls() to be executed in any order, rejecting an ioctl() if it would
result in invalid L2 guest state is not an option as KVM cannot know if
a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
reject KVM_RUN if L2 contained invalid guest state, but that carries the
risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
exited to userspace.  Setting a flag/request to detect such a scenario is
undesirable because (a) it's extremely unlikely to add value to KVM as a
whole, and (b) KVM would need to consider ioctl() interactions with such
a flag, e.g. if userspace migrated the vCPU while the flag was set.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211207193006.120997-3-seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20 08:06:54 -05:00
alpha futex: Wireup futex_waitv syscall 2021-11-25 14:26:12 +01:00
arc Add linux/cacheflush.h 2021-11-17 10:36:15 -05:00
arm ARM: SoC fixes for v5.16, part 2 2021-11-25 10:31:37 -08:00
arm64 arm64 fixes for -rc4 2021-12-03 10:50:14 -08:00
csky asm-generic: asm/syscall.h cleanup 2021-11-10 11:22:03 -08:00
h8300 Kbuild updates for v5.16 2021-11-08 09:15:45 -08:00
hexagon hexagon: ignore vmlinux.lds 2021-11-20 10:35:54 -08:00
ia64 futex: Wireup futex_waitv syscall 2021-11-25 14:26:12 +01:00
m68k asm-generic: syscall table updates 2021-11-25 10:41:28 -08:00
microblaze futex: Wireup futex_waitv syscall 2021-11-25 14:26:12 +01:00
mips - build fix for ZSTD enabled configs 2021-11-27 09:50:31 -08:00
nds32 Add linux/cacheflush.h 2021-11-17 10:36:15 -05:00
nios2 Add linux/cacheflush.h 2021-11-17 10:36:15 -05:00
openrisc Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
parisc parisc: Mark cr16 CPU clocksource unstable on all SMP machines 2021-12-04 21:36:04 +01:00
powerpc powerpc fixes for 5.16 #3 2021-11-27 10:06:15 -08:00
riscv RISC-V: KVM: Fix incorrect KVM_MAX_VCPUS value 2021-11-22 10:36:19 +05:30
s390 s390: update defconfigs 2021-12-02 19:29:44 +01:00
sh asm-generic: syscall table updates 2021-11-25 10:41:28 -08:00
sparc asm-generic: syscall table updates 2021-11-25 10:41:28 -08:00
um Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
x86 KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required 2021-12-20 08:06:54 -05:00
xtensa asm-generic: syscall table updates 2021-11-25 10:41:28 -08:00
.gitignore
Kconfig arch: Add generic Kconfig option indicating page size smaller than 64k 2021-11-27 14:34:41 -08:00