linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-11 21:14:07 +08:00

Sean Christopherson 52017608da KVM: nVMX: add option to perform early consistency checks via H/W		2018-10-17 00:29:59 +02:00

KVM defers many VMX consistency checks to the CPU, ostensibly for performance reasons[1], including checks that result in VMFail (as opposed to VMExit). This behavior may be undesirable for some users since this means KVM detects certain classes of VMFail only after it has processed guest state, e.g. emulated MSR load-on-entry. Because there is a strict ordering between checks that cause VMFail and those that cause VMExit, i.e. all VMFail checks are performed before any checks that cause VMExit, we can detect (almost) all VMFail conditions via a dry run of sorts. The almost qualifier exists because some state in vmcs02 comes from L0, e.g. VPID, which means that hardware will never detect an invalid VPID in vmcs12 because it never sees said value. Software must (continue to) explicitly check such fields.

After preparing vmcs02 with all state needed to pass the VMFail consistency checks, optionally do a "test" VMEnter with an invalid GUEST_RFLAGS. If the VMEnter results in a VMExit (due to bad guest state), then we can safely say that the nested VMEnter should not VMFail, i.e. any VMFail encountered in nested_vmx_vmexit() must be due to an L0 bug.

GUEST_RFLAGS is used to induce VMExit as it is unconditionally loaded on all implementations of VMX, has an invalid value that is writable on a 32-bit system and its consistency check is performed relatively early in all implementations (the exact order of consistency checks is micro-architectural).

Unfortunately, since the "passing" case causes a VMExit, KVM must be extra diligent to ensure that host state is restored, e.g. DR7 and RFLAGS are reset on VMExit. Failure to restore RFLAGS.IF is particularly fatal.

And of course the extra VMEnter and VMExit impacts performance. The raw overhead of the early consistency checks is ~6% on modern hardware (though this could easily vary based on configuration), while the added latency observed from the L1 VMM is ~10%. The early consistency checks do not occur in a vacuum, e.g. spending more time in L0 can lead to more interrupts being serviced while emulating VMEnter, thereby increasing the latency observed by L1.

Add a module param, early_consistency_checks, to provide control over whether or not VMX performs the early consistency checks. In addition to standard on/off behavior, the param accepts a value of -1, which is essentially an "auto" setting whereby KVM does the early checks only when it thinks it's running on bare metal. When running nested, doing early checks is of dubious value since the resulting behavior is heavily dependent on L0. In the future, the "auto" setting could also be used to default to skipping the early hardware checks for certain configurations/platforms if KVM reaches a state where it has 100% coverage of VMFail conditions.

[1] To my knowledge no one has implemented and tested full software emulation of the VMFail consistency checks. Until that happens, one can only speculate about the actual performance overhead of doing all VMFail consistency checks in software. Obviously any code is slower than no code, but in the grand scheme of nested virtualization it's entirely possible the overhead is negligible.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
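
The tri-state behavior of early_consistency_checks and the interpretation of the dry run described in the commit message can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the enum names, want_early_checks(), vmfail_checks_passed(), and the running_on_bare_metal flag are all hypothetical stand-ins introduced here, and only the -1/0/1 values and the "VMExit means the VMFail checks passed" rule come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three values the commit message describes. */
enum early_checks_mode {
    EARLY_CHECKS_AUTO = -1, /* "auto": check only when on bare metal */
    EARLY_CHECKS_OFF  = 0,
    EARLY_CHECKS_ON   = 1,
};

static_assert(EARLY_CHECKS_AUTO == -1, "auto must match the documented -1");

/* Hypothetical outcome of the "test" VMEnter with an invalid GUEST_RFLAGS. */
enum dry_run_result {
    DRY_RUN_VMFAIL, /* hardware rejected a control/host-state field */
    DRY_RUN_VMEXIT, /* hardware reached the guest-state checks */
};

/*
 * Decide whether to perform the hardware dry run.
 * running_on_bare_metal is an assumption: it stands in for whatever
 * signal KVM uses to conclude that it is not itself running nested.
 */
static bool want_early_checks(int mode, bool running_on_bare_metal)
{
    if (mode == EARLY_CHECKS_AUTO)
        return running_on_bare_metal;
    return mode != EARLY_CHECKS_OFF;
}

/*
 * All VMFail checks run before any VMExit checks, so a VMExit caused by
 * the deliberately invalid GUEST_RFLAGS implies every hardware-visible
 * VMFail check passed. Fields hardware never sees (e.g. the vmcs12 VPID)
 * still need explicit software checks.
 */
static bool vmfail_checks_passed(enum dry_run_result result)
{
    return result == DRY_RUN_VMEXIT;
}
```

With these helpers, a caller would first ask want_early_checks() and, only if it returns true, perform the dry run and treat DRY_RUN_VMFAIL as a VMFail to report to L1.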
boot	Kbuild updates for v4.19 (2nd)	2018-08-25 13:40:38 -07:00
configs
crypto	crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2	2018-09-14 14:08:27 +08:00
entry	x86/vdso: Fix vDSO build if a retpoline is emitted	2018-08-20 18:04:41 +02:00
events	perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs	2018-09-10 10:03:01 +02:00
hyperv	x86/hyper-v: rename ipi_arg_{ex,non_ex} structures	2018-09-20 00:51:42 +02:00
ia32	syscalls/x86: auto-create compat_sys_*() prototypes	2018-04-02 20:16:18 +02:00
include	KVM: x86: hyperv: keep track of mismatched VP indexes	2018-10-17 00:29:45 +02:00
kernel	x86/APM: Fix build warning when PROC_FS is not enabled	2018-09-15 10:16:25 +02:00
kvm	KVM: nVMX: add option to perform early consistency checks via H/W	2018-10-17 00:29:59 +02:00
lib	x86/nmi: Fix NMI uaccess race against CR3 switching	2018-08-31 17:08:22 +02:00
math-emu
mm	x86/mm: Use WRITE_ONCE() when setting PTEs	2018-09-08 12:30:36 +02:00
net	bpf, x32: Fix regression caused by commit `24dea04767`	2018-07-26 02:51:12 +02:00
oprofile	x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()	2018-02-21 09:54:17 +01:00
pci	PCI: Make early dump functionality generic	2018-06-29 20:06:07 -05:00
platform	x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3	2018-09-12 21:53:34 +02:00
power	Power management updates for 4.19-rc1	2018-08-14 13:12:24 -07:00
purgatory	kbuild: move bin2c back to scripts/ from scripts/basic/	2018-07-18 01:18:05 +09:00
ras
realmode	x86-64/realmode: Add instruction suffix	2018-02-20 09:33:41 +01:00
tools	x86/relocs: Add __end_rodata_aligned to S_REL	2018-08-09 20:42:07 +02:00
um	Consolidation of Kconfig files by Christoph Hellwig.	2018-08-15 13:05:12 -07:00
video
xen	xen: fixes for 4.19-rc2	2018-08-31 08:45:16 -07:00
.gitignore	x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore	2018-02-13 14:10:29 +01:00
Kbuild
Kconfig	x86/Kconfig: Fix trivial typo	2018-08-27 10:29:14 +02:00
Kconfig.cpu	Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-03-25 07:36:02 -10:00
Kconfig.debug	Kconfig: consolidate the "Kernel hacking" menu	2018-08-02 08:06:48 +09:00
Makefile	x86: Allow generating user-space headers without a compiler	2018-08-31 17:08:22 +02:00
Makefile_32.cpu
Makefile.um	kbuild: rename LDFLAGS to KBUILD_LDFLAGS	2018-08-24 08:22:08 +09:00