2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 12:14:01 +08:00
linux-next/arch/x86
Michael Roth e79b91bb3c KVM: SVM: use vmsave/vmload for saving/restoring additional host state
Using a guest workload which simply issues 'hlt' in a tight loop to
generate VMEXITs, it was observed (on a recent EPYC processor) that a
significant amount of the VMEXIT overhead measured on the host was the
result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
perf:

  67.49%--kvm_arch_vcpu_ioctl_run
          |
          |--23.13%--vcpu_put
          |          kvm_arch_vcpu_put
          |          |
          |          |--21.31%--native_write_msr
          |          |
          |           --1.27%--svm_set_cr4
          |
          |--16.11%--vcpu_load
          |          |
          |           --15.58%--kvm_arch_vcpu_load
          |                     |
          |                     |--13.97%--svm_set_cr4
          |                     |          |
          |                     |          |--12.64%--native_read_msr

Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
can be saved/restored using 'vmsave'/'vmload' instructions rather than
explicit MSR reads/writes. In doing so there is a significant reduction
in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
workload:

  50.92%--kvm_arch_vcpu_ioctl_run
          |
          |--19.28%--disable_nmi_singlestep
          |
          |--13.68%--vcpu_load
          |          kvm_arch_vcpu_load
          |          |
          |          |--9.19%--svm_set_cr4
          |          |          |
          |          |           --6.44%--native_read_msr
          |          |
          |           --3.55%--native_write_msr
          |
          |--6.05%--kvm_inject_nmi
          |--2.80%--kvm_sev_es_mmio_read
          |--2.19%--vcpu_put
          |          |
          |           --1.25%--kvm_arch_vcpu_put
          |                     native_write_msr

Quantifying this further, if we look at the raw cycle counts for a
normal iteration of the above workload (according to 'rdtscp'),
kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
the current behavior. Using 'vmsave'/'vmload', this is reduced to
~2800 cycles, a savings of 39%.

While this approach doesn't seem to manifest in any noticeable
improvement for more realistic workloads like UnixBench, netperf, and
kernel builds, likely due to their exit paths generally involving IO
with comparatively high latencies, it does improve overall overhead
of KVM_RUN significantly, which may still be noticeable for certain
situations. It also simplifies some aspects of the code.

With this change, explicit save/restore is no longer needed for the
following host MSRs, since they are documented[1] as being part of the
VMCB State Save Area:

  MSR_STAR, MSR_LSTAR, MSR_CSTAR,
  MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
  MSR_IA32_SYSENTER_CS,
  MSR_IA32_SYSENTER_ESP,
  MSR_IA32_SYSENTER_EIP,
  MSR_FS_BASE, MSR_GS_BASE

and only the following MSR needs individual handling in
svm_vcpu_put/svm_vcpu_load:

  MSR_TSC_AUX

We could drop the host_save_user_msrs array/loop and instead handle
MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
a potential follow-up.

Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment
registers (and associated hidden state)[2], some of the code
previously used to handle this is no longer needed, so we drop it
as well.

The first public release of the SVM spec[3] also documents the same
handling for the host state in question, so we make these changes
unconditionally.

Also worth noting is that we 'vmsave' to the same page that is
subsequently used by 'vmrun' to record some host additional state. This
is okay, since, in accordance with the spec[2], the additional state
written to the page by 'vmrun' does not overwrite any fields written by
'vmsave'. This has also been confirmed through testing (for the above
CPU, at least).

[1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2
[2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD
[3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210202190126.2185715-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 05:27:34 -05:00
..
boot EFI updates collected by Ard Biesheuvel: 2020-12-24 12:40:07 -08:00
configs * A defconfig fix, from Daniel Díaz. 2020-09-20 15:06:43 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-12-14 12:18:19 -08:00
entry epoll: wire up syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
events Perf updates: 2020-12-14 17:34:12 -08:00
hyperv hyperv-fixes for 5.10-rc3 2020-11-05 11:32:03 -08:00
ia32 x86/ia32_signal: Propagate __user annotation properly 2020-12-11 19:44:31 +01:00
include KVM: VMX: Use the kernel's version of VMXOFF 2021-02-04 05:27:33 -05:00
kernel x86/reboot: Force all cpus to exit VMX root if VMX is supported 2021-02-04 05:27:31 -05:00
kvm KVM: SVM: use vmsave/vmload for saving/restoring additional host state 2021-02-04 05:27:34 -05:00
lib Scheduler updates: 2020-12-14 18:29:11 -08:00
math-emu treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mm KVM: SVM: Treat SVM as unsupported when running as an SEV guest 2021-02-03 04:30:37 -05:00
net bpf: x64: Do not emit sub/add 0, %rsp when !stack_depth 2020-09-29 16:47:39 -07:00
oprofile x86/oprofile: Avoid TIF_IA32 when checking 64bit mode 2020-10-26 13:46:46 +01:00
pci ARM: SoC drivers for v5.11 2020-12-16 16:38:41 -08:00
platform Yet another large set of x86 interrupt management updates: 2020-12-14 18:59:53 -08:00
power Kbuild updates for v5.9 2020-08-09 14:10:26 -07:00
purgatory crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
ras
realmode x86/head/64: Don't call verify_cpu() on starting APs 2020-09-09 11:33:20 +02:00
tools x86/insn: Make inat-tables.c suitable for pre-decompression code 2020-09-07 19:45:24 +02:00
um arch/um: partially revert the conversion to __section() macro 2020-10-26 15:39:37 -07:00
video
xen EFI updates collected by Ard Biesheuvel: 2020-12-24 12:40:07 -08:00
.gitignore
Kbuild
Kconfig fanotify: Fix sys_fanotify_mark() on native x86-32 2020-12-28 11:58:59 +01:00
Kconfig.assembler
Kconfig.cpu
Kconfig.debug x86, libnvdimm/test: Remove COPY_MC_TEST 2020-10-26 18:08:35 +01:00
Makefile - Fix the vmlinux size check on 64-bit along with adding useful clarifications on the topic 2020-12-14 13:54:50 -08:00
Makefile_32.cpu
Makefile.um