2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-15 09:03:59 +08:00
Commit Graph

25471 Commits

Author SHA1 Message Date
David Matlack
0115f9cbac KVM: nVMX: generate non-true VMX MSRs based on true versions
The "non-true" VMX capability MSRs can be generated from their "true"
counterparts, by OR-ing the default1 bits. The default1 bits are fixed
and defined in the SDM.

Since we can generate the non-true VMX MSRs from the true versions,
there's no need to store both in struct nested_vmx. This also lets
userspace avoid having to restore the non-true MSRs.

Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so,
we simply need to set all the default1 bits in the true MSRs (such that
the true MSRs and the generated non-true MSRs are equal).

Signed-off-by: David Matlack <dmatlack@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 15:31:06 +01:00
Kyle Huey
ea07e42dec KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.
The trap flag stays set until software clears it.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 15:31:06 +01:00
Kyle Huey
6affcbedca KVM: x86: Add kvm_skip_emulated_instruction and use it.
kvm_skip_emulated_instruction calls both
kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep,
skipping the emulated instruction and generating a trap if necessary.

Replacing skip_emulated_instruction calls with
kvm_skip_emulated_instruction is straightforward, except for:

- ICEBP, which is already inside a trap, so avoid triggering another trap.
- Instructions that can trigger exits to userspace, such as the IO insns,
  MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a
  KVM_GUESTDBG_SINGLESTEP exit, and the handling code for
  IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will
  take precedence. The singlestep will be triggered again on the next
  instruction, which is the current behavior.
- Task switch instructions which would require additional handling (e.g.
  the task switch bit) and are instead left alone.
- Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction,
  which do not trigger singlestep traps as mentioned previously.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 15:31:05 +01:00
Kyle Huey
eb27756217 KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12
We can't return both the pass/fail boolean for the vmcs and the upcoming
continue/exit-to-userspace boolean for skip_emulated_instruction out of
nested_vmx_check_vmcs, so move skip_emulated_instruction out of it instead.

Additionally, VMENTER/VMRESUME only trigger singlestep exceptions when
they advance the IP to the following instruction, not when they a) succeed,
b) fail MSR validation or c) throw an exception. Add a separate call to
skip_emulated_instruction that will later not be converted to the variant
that checks the singlestep flag.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 15:31:04 +01:00
Kyle Huey
09ca3f2049 KVM: VMX: Reorder some skip_emulated_instruction calls
The functions being moved ahead of skip_emulated_instruction here don't
need updated IPs, and skipping the emulated instruction at the end will
make it easier to return its value.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 15:31:04 +01:00
Kyle Huey
6a908b628c KVM: x86: Add a return value to kvm_emulate_cpuid
Once skipping the emulated instruction can potentially trigger an exit to
userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to
propagate a return value.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 15:31:03 +01:00
Tom Lendacky
8370c3d08b kvm: svm: Add kvm_fast_pio_in support
Update the I/O interception support to add the kvm_fast_pio_in function
to speed up the in instruction similar to the out instruction.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24 18:32:45 +01:00
Tom Lendacky
147277540b kvm: svm: Add support for additional SVM NPF error codes
AMD hardware adds two additional bits to aid in nested page fault handling.

Bit 32 - NPF occurred while translating the guest's final physical address
Bit 33 - NPF occurred while translating the guest page tables

The guest page tables fault indicator can be used as an aid for nested
virtualization. Using V0 for the host, V1 for the first level guest and
V2 for the second level guest, when both V1 and V2 are using nested paging
there are currently a number of unnecessary instruction emulations. When
V2 is launched shadow paging is used in V1 for the nested tables of V2. As
a result, KVM marks these pages as RO in the host nested page tables. When
V2 exits and we resume V1, these pages are still marked RO.

Every nested walk for a guest page table is treated as a user-level write
access and this causes a lot of NPFs because the V1 page tables are marked
RO in the V0 nested tables. While executing V1, when these NPFs occur KVM
sees a write to a read-only page, emulates the V1 instruction and unprotects
the page (marking it RW). This patch looks for cases where we get a NPF due
to a guest page table walk where the page was marked RO. It immediately
unprotects the page and resumes the guest, leading to far fewer instruction
emulations when nested virtualization is used.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-24 18:32:26 +01:00
Bandan Das
ae0f549951 kvm: x86: don't print warning messages for unimplemented msrs
Change unimplemented msrs messages to use pr_debug.
If CONFIG_DYNAMIC_DEBUG is set, then these messages can be
enabled at run time or else -DDEBUG can be used at compile
time to enable them. These messages will still be printed if
ignore_msrs=1.

Signed-off-by: Bandan Das <bsd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22 18:29:10 +01:00
Jan Dakinevich
bcdde302b8 KVM: nVMX: invvpid handling improvements
- Expose all invalidation types to the L1

 - Reject invvpid instruction, if L1 passed zero vpid value to single
   context invalidations

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Tested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22 17:26:42 +01:00
Jan Dakinevich
63f3ac4813 KVM: VMX: clean up declaration of VPID/EPT invalidation types
- Remove VMX_EPT_EXTENT_INDIVIDUAL_ADDR, since there is no such type of
   EPT invalidation

 - Add missing VPID types names

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
Tested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22 17:26:15 +01:00
Jim Mattson
c7dd15b337 kvm: x86: CPUID.01H:EDX.APIC[bit 9] should mirror IA32_APIC_BASE[11]
From the Intel SDM, volume 3, section 10.4.3, "Enabling or Disabling the
Local APIC,"

  When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent
  to an IA-32 processor without an on-chip APIC. The CPUID feature flag
  for the APIC (see Section 10.4.2, "Presence of the Local APIC") is
  also set to 0.

Signed-off-by: Jim Mattson <jmattson@google.com>
[Changed subject tag from nVMX to x86.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22 14:51:55 +01:00
Luwei Kang
4504b5c941 kvm: x86: Add AVX512_4VNNIW and AVX512_4FMAPS support
Add two new AVX512 subfeatures support for KVM guest.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
[Changed subject tags.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-16 22:20:02 +01:00
Radim Krčmář
283c95d0e3 KVM: x86: emulate FXSAVE and FXRSTOR
Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
Old Intels don't have unrestricted_guest, so we have to emulate them.

The patch takes advantage of the hardware implementation.

AMD and Intel differ in saving and restoring other fields in first 32
bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR
in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
and executed fxsave:

  Intel (Nehalem):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
    ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
  Intel (Haswell -- deprecated FPU CS and FPU DS):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
    ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
  AMD (Opteron 2300-series):
    7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
    ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00

fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
much to improve the situation.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-16 22:09:46 +01:00
Radim Krčmář
aabba3c6ab KVM: x86: add asm_safe wrapper
Move the existing exception handling for inline assembly into a macro
and switch its return values to X86EMUL type.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:45 +01:00
Radim Krčmář
4852018789 KVM: x86: save one bit in ctxt->d
Alignments are exclusive, so 5 modes can be expressed in 3 bits.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:45 +01:00
Radim Krčmář
d3fe959f81 KVM: x86: add Align16 instruction flag
Needed for FXSAVE and FXRSTOR.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:45 +01:00
Jiang Biao
695151964d kvm: x86: remove unused but set variable
The local variable *gpa_offset* is set but not used afterwards,
which make the compiler issue a warning with option
-Wunused-but-set-variable. Remove it to avoid the warning.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:44 +01:00
Jiang Biao
ecd8a8c2b4 kvm: x86: hyperv: make function static to avoid compiling warning
synic_set_irq is only used in hyperv.c, and should be static to
avoid compiling warning when with -Wmissing-prototypes option.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:44 +01:00
Jiang Biao
1e13175bd2 kvm: x86: cpuid: remove the unnecessary variable
The use of local variable *function* is not necessary here. Remove
it to avoid compiling warning with -Wunused-but-set-variable option.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:44 +01:00
Jiang Biao
ae6a237560 kvm: x86: make a function in x86.c static to avoid compiling warning
kvm_emulate_wbinvd_noskip is only used in x86.c, and should be
static to avoid compiling warning when with -Wmissing-prototypes
option.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:44 +01:00
Jiang Biao
33365e7a45 kvm: x86: make function static to avoid compiling warning
vmx_arm_hv_timer is only used in vmx.c, and should be static to
avoid compiling warning when with -Wmissing-prototypes option.

Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-16 22:09:43 +01:00
Radim Krčmář
813ae37e6a Merge branch 'x86/cpufeature' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into kvm/next
Topic branch for AVX512_4VNNIW and AVX512_4FMAPS support in KVM.
2016-11-16 22:07:36 +01:00
He Chen
47bdf3378d x86/cpuid: Provide get_scattered_cpuid_leaf()
Sparse populated CPUID leafs are collected in a software provided leaf to
avoid bloat of the x86_capability array, but there is no way to rebuild the
real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID
leaf from the CPU. While this is possible it is problematic as it does not
take software disabled features into account. If a feature is disabled on
the host it should not be exposed to a guest either.

Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered
cpuid table information and the active CPU features.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Piotr Luc <Piotr.Luc@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 11:13:09 +01:00
He Chen
47f10a3600 x86/cpuid: Cleanup cpuid_regs definitions
cpuid_regs is defined multiple times as structure and enum. Rename the enum
and move all of it to processor.h so we don't end up with more instances.

Rename the misnomed register enumeration from CR_* to the obvious CPUID_*.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Piotr Luc <Piotr.Luc@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1478856336-9388-2-git-send-email-he.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 11:13:09 +01:00
Arnd Bergmann
beae2c9eb5 crypto: aesni: shut up -Wmaybe-uninitialized warning
The rfc4106 encrypy/decrypt helper functions cause an annoying
false-positive warning in allmodconfig if we turn on
-Wmaybe-uninitialized warnings again:

  arch/x86/crypto/aesni-intel_glue.c: In function ‘helper_rfc4106_decrypt’:
  include/linux/scatterlist.h:67:31: warning: ‘dst_sg_walk.sg’ may be used uninitialized in this function [-Wmaybe-uninitialized]

The problem seems to be that the compiler doesn't track the state of the
'one_entry_in_sg' variable across the kernel_fpu_begin/kernel_fpu_end
section.

This takes the easy way out by adding a bogus initialization, which
should be harmless enough to get the patch into v4.9 so we can turn on
this warning again by default without producing useless output.  A
follow-up patch for v4.10 rearranges the code to make the warning go
away.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:45:08 -08:00
Arnd Bergmann
3a6d867612 x86: apm: avoid uninitialized data
apm_bios_call() can fail, and return a status in its argument structure.
If that status however is zero during a call from
apm_get_power_status(), we end up using data that may have never been
set, as reported by "gcc -Wmaybe-uninitialized":

  arch/x86/kernel/apm_32.c: In function ‘apm’:
  arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here
  arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here

This changes the function to return "APM_NO_ERROR" here, which makes the
code more robust to broken BIOS versions, and avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:45:08 -08:00
Paolo Bonzini
6314a17fec The three KVM patches that KVMGT needs.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJYIy78AAoJEL/70l94x66DzbQH/34Mgm52OdLzTe+Yf8Pcae5X
 A/z2Aicto+6M+dtYoWPn2DfoevPaSpVmymryRrRzAqk5QDPHiVHZ5iCW0RaIEU7M
 iiPudqSGVGa0oFYBkxsJxCBysAQsHX5sEXEszs4egbO1TtQ8LCAxYUVuLBGlx+Fa
 qj26Fi/p2ByHVp/RX55A2kF5T8J671KT4LWUvFjzTgGFWo8Kr1bk0q4hmRYB9OBc
 /pqczV3Mc6KcmzfIg3Rd6xt8UDlEGJ4YQhpNgY6nxrQ1py3AP7vNqBPAH4RXbHJB
 /OqdAjpqa8+rrwSpQ1f58U+7v/ZO1ZTg0IW9bf60qKjG/aV4fGN6y0/iXKGgKyQ=
 =cpog
 -----END PGP SIGNATURE-----

Merge tag 'tags/for-kvmgt' into HEAD

The three KVM patches that KVMGT needs.

Conflicts:
	arch/x86/include/asm/kvm_page_track.h
	arch/x86/kvm/mmu.c
2016-11-09 15:20:31 +01:00
Linus Torvalds
66cecb6789 One NULL pointer dereference, and two fixes for regressions introduced
during the merge window.  The rest are fixes for MIPS, s390 and nested VMX.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJYG2H5AAoJEL/70l94x66DK/cH/0jEQ3ynuLAd5CKux7JxI/EP
 msSJh1Xqr4+XhXZnuDpGQWrdsBlxoiqA6PsJrUTtyi4nQCDXlT8g+2MDuvqhWIHz
 7vw58j/EMJDCVQzYAbN5VDUfk13uB5aSWTo3M9Rf09v0hU1Ql7z8u4CtKEdLpN5Y
 LY9bT9fxUmXO7REKP7bdW6ZrDX/hUShYHgMqzXGFMyGBG3ym3a9bggXEzTCD6eNQ
 ioogQIWqg+icdhta0iLNAwFClPlcKB2/xo4IUuNgrPwGoHFGJN/8+qxT4+sVbp2B
 v8u1zOXlCFXBcskWE+yRRsGe72+mIzz6QScCyO+5HbhKYVfbE9H7KBlFX9rZZ2c=
 =IbKx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One NULL pointer dereference, and two fixes for regressions introduced
  during the merge window.

  The rest are fixes for MIPS, s390 and nested VMX"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: Check memopp before dereference (CVE-2016-8630)
  kvm: nVMX: VMCLEAR an active shadow VMCS after last use
  KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK
  KVM: x86: fix wbinvd_dirty_mask use-after-free
  kvm/x86: Show WRMSR data is in hex
  kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types
  KVM: document lock orders
  KVM: fix OOPS on flush_work
  KVM: s390: Fix STHYI buffer alignment for diag224
  KVM: MIPS: Precalculate MMIO load resume PC
  KVM: MIPS: Make ERET handle ERL before EXL
  KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-04 13:08:05 -07:00
Jike Song
871b7ef2a1 kvm/page_track: export symbols for external usage
Signed-off-by: Jike Song <jike.song@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-04 12:13:20 +01:00
Jike Song
d126363d8f kvm/page_track: call notifiers with kvm_page_track_notifier_node
The user of page_track might needs extra information, so pass
the kvm_page_track_notifier_node to callbacks.

Signed-off-by: Jike Song <jike.song@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-04 12:13:20 +01:00
Xiaoguang Chen
ae7cd87372 KVM: x86: add track_flush_slot page track notifier
When a memory slot is being moved or removed users of page track
can be notified. So users can drop write-protection for the pages
in that memory slot.

This notifier type is needed by KVMGT to sync up its shadow page
table when memory slot is being moved or removed.

Register the notifier type track_flush_slot to receive memslot move
and remove event.

Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
[Squashed commits to avoid bisection breakage and reworded the subject.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-04 12:13:19 +01:00
Paolo Bonzini
ad3610919e kvm: x86: avoid atomic operations on APICv vmentry
On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
posted interrupts turn out to be slower than interrupt injection via
KVM_REQ_EVENT.

This patch optimizes a bit the IRR update, avoiding expensive atomic
operations in the common case where PI.ON=0 at vmentry or the PIR vector
is mostly zero.  This saves at least 20 cycles (1%) per vmexit, as
measured by kvm-unit-tests' inl_from_qemu test (20 runs):

              | enable_apicv=1  |  enable_apicv=0
              | mean     stdev  |  mean     stdev
    ----------|-----------------|------------------
    before    | 5826     32.65  |  5765     47.09
    after     | 5809     43.42  |  5777     77.02

Of course, any change in the right column is just placebo effect. :)
The savings are bigger if interrupts are frequent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-03 12:27:51 +01:00
Paolo Bonzini
1b07304c58 KVM: nVMX: support descriptor table exits
These are never used by the host, but they can still be reflected to
the guest.

Tested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 21:32:17 +01:00
Paolo Bonzini
5587859fb1 KVM: x86: use ktime_get instead of seeking the hrtimer_clock_base
The base clock for the LAPIC timer is always CLOCK_MONOTONIC.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 21:32:17 +01:00
Wanpeng Li
8003c9ae20 KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support
Most windows guests still utilize APIC Timer periodic/oneshot mode
instead of tsc-deadline mode, and the APIC Timer periodic/oneshot
mode are still emulated by high overhead hrtimer on host. This patch
converts the expected expire time of the periodic/oneshot mode to
guest deadline tsc in order to leverage VMX preemption timer logic
for APIC Timer tsc-deadline mode. After each preemption timer vmexit
preemption timer is restarted to emulate LVTT current-count register
is automatically reloaded from the initial-count register when the
count reaches 0. This patch reduces ~5600 cycles for each APIC Timer
periodic mode operation virtualization.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Squashed with my fixes that were reviewed-by Paolo.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Wanpeng Li
7e810a38e6 KVM: LAPIC: rename start/cancel_hv_tscdeadline to start/cancel_hv_timer
Rename start/cancel_hv_tscdeadline to start/cancel_hv_timer since
they will handle both APIC Timer periodic/oneshot mode and tsc-deadline
mode.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Wanpeng Li
498f816219 KVM: LAPIC: introduce kvm_get_lapic_target_expiration_tsc()
Introdce kvm_get_lapic_target_expiration_tsc() to get APIC Timer target
deadline tsc.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Wanpeng Li
a10388e11f KVM: LAPIC: guarantee the timer is in tsc-deadline mode
Check apic_lvtt_tscdeadline() mode directly instead of apic_lvtt_oneshot()
and apic_lvtt_period() to guarantee the timer is in tsc-deadline mode when
rdmsr MSR_IA32_TSCDEADLINE.

Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Wanpeng Li
7d7f7da2f1 KVM: LAPIC: extract start_sw_period() to handle periodic/oneshot mode
Extract start_sw_period() to handle periodic/oneshot mode, it will be
used by later patch.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Longpeng(Mike)
868a32f327 kvm: x86: remove the misleading comment in vmx_handle_external_intr
Since Paolo has removed irq-enable-operation in vmx_handle_external_intr
(KVM: x86: use guest_exit_irqoff), the original comment about the IF bit
in rflags is incorrect and stale now, so remove it.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Xiaoguang Chen
b5f5fdca65 KVM: x86: add track_flush_slot page track notifier
When a memory slot is being moved or removed users of page track
can be notified. So users can drop write-protection for the pages
in that memory slot.

This notifier type is needed by KVMGT to sync up its shadow page
table when memory slot is being moved or removed.

Register the notifier type track_flush_slot to receive memslot move
and remove event.

Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
[Squashed commits to avoid bisection breakage and reworded the subject.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Radim Krčmář
2361133293 KVM: VMX: refactor setup of global page-sized bitmaps
We've had 10 page-sized bitmaps that were being allocated and freed one
by one when we could just use a cycle.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Radim Krčmář
2e69f86561 KVM: VMX: join functions that disable x2apic msr intercepts
vmx_disable_intercept_msr_read_x2apic() and
vmx_disable_intercept_msr_write_x2apic() differed only in the type.
Pass the type to a new function.

[Ordered and commented TPR intercept according to Paolo's suggestion.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Radim Krčmář
40d8338d09 KVM: VMX: remove functions that enable msr intercepts
All intercepts are enabled at the beginning, so they can only be used if
we disabled an intercept that we wanted to have enabled.
This was done for TMCCT to simplify a loop that disables all x2APIC MSR
intercepts, but just keeping TMCCT enabled yields better results.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Jim Mattson
83bafef1a1 kvm: nVMX: Update MSR load counts on a VMCS switch
When L0 establishes (or removes) an MSR entry in the VM-entry or VM-exit
MSR load lists, the change should affect the dormant VMCS as well as the
current VMCS. Moreover, the vmcs02 MSR-load addresses should be
initialized.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Jim Mattson
cf3215d939 kvm: nVMX: Fetch VM_INSTRUCTION_ERROR from vmcs02 on vmx->fail
When forwarding a hardware VM-entry failure to L1, fetch the
VM_INSTRUCTION_ERROR field from vmcs02 before loading vmcs01.

(Note that there is an implicit assumption that the VM-entry failure was
on the first VM-entry to vmcs02 after nested_vmx_run; otherwise, L1 is
going to be very confused.)

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Peter Feiner
66d73e12f2 KVM: X86: MMU: no mmu_notifier_seq++ in kvm_age_hva
The MMU notifier sequence number keeps GPA->HPA mappings in sync when
GPA->HPA lookups are done outside of the MMU lock (e.g., in
tdp_page_fault). Since kvm_age_hva doesn't change GPA->HPA, it's
unnecessary to increment the sequence number.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Wanpeng Li
c63e45635b KVM: VMX: Better name x2apic msr bitmaps
Renames x2apic_apicv_inactive msr_bitmaps to x2apic and original
x2apic bitmaps to x2apic_apicv.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02 21:32:17 +01:00
Owen Hofmann
d9092f52d7 kvm: x86: Check memopp before dereference (CVE-2016-8630)
Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.

Fixes: 41061cdb98
Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
Signed-off-by: Owen Hofmann <osh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 21:31:53 +01:00