linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-29 22:14:41 +08:00

Author	SHA1	Message	Date
Sean Christopherson	7cd138db5c	KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic Drop the pre-computed last_nonleaf_level, which is arguably wrong and at best confusing. Per the comment: Can have large pages at levels 2..last_nonleaf_level-1. the intent of the variable would appear to be to track what levels can _legally_ have large pages, but that intent doesn't align with reality. The computed value will be wrong for 5-level paging, or if 1gb pages are not supported. The flawed code is not a problem in practice, because except for 32-bit PSE paging, bit 7 is reserved if large pages aren't supported at the level. Take advantage of this invariant and simply omit the level magic math for 64-bit page tables (including PAE). For 32-bit paging (non-PAE), the adjustments are needed purely because bit 7 is ignored if PSE=0. Retain that logic as is, but make is_last_gpte() unique per PTTYPE so that the PSE check is avoided for PAE and EPT paging. In the spirit of avoiding branches, bump the "last nonleaf level" for 32-bit PSE paging by adding the PSE bit itself. Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are not PTEs and will blow up all the other gpte helpers. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-51-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:47 -04:00
Sean Christopherson	616007c866	KVM: x86: Enhance comments for MMU roles and nested transition trickiness Expand the comments for the MMU roles. The interactions with gfn_track PGD reuse in particular are hairy. Regarding PGD reuse, add comments in the nested virtualization flows to call out why kvm_init_mmu() is unconditionally called even when nested TDP is used. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-50-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:47 -04:00
Sean Christopherson	3b77daa5ef	KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE Replace make_spte()'s WARN on a collision with the magic MMIO value with a generic WARN on reserved bits being set (including EPT's reserved WX combination). Warning on any reserved bits covers MMIO, A/D tracking bits with PAE paging, and in theory any future goofs that are introduced. Opportunistically convert to ONCE behavior to avoid spamming the kernel log, odds are very good that if KVM screws up one SPTE, it will botch all SPTEs for the same MMU. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-49-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:46 -04:00
Sean Christopherson	961f84457c	KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU Extract the reserved SPTE check and print helpers in get_mmio_spte() to new helpers so that KVM can also WARN on reserved badness when making a SPTE. Tag the checking helper with __always_inline to improve the probability of the compiler generating optimal code for the checking loop, e.g. gcc appears to avoid using %rbp when the helper is tagged with a vanilla "inline". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-48-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:46 -04:00
Sean Christopherson	36f267871e	KVM: x86/mmu: Use MMU's role to determine PTTYPE Use the MMU's role instead of vCPU state or role_regs to determine the PTTYPE, i.e. which helpers to wire up. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-47-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:46 -04:00
Sean Christopherson	fe660f7244	KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers Skip paging32E_init_context() and paging64_init_context_common() and go directly to paging64_init_context() (was the common version) now that the relevant flows don't need to distinguish between 64-bit PAE and 32-bit PAE for other reasons. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-46-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:46 -04:00
Sean Christopherson	f4bd6f7376	KVM: x86/mmu: Add a helper to calculate root from role_regs Add a helper to calculate the level for non-EPT page tables from the MMU's role_regs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-45-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:45 -04:00
Sean Christopherson	533f9a4b38	KVM: x86/mmu: Add helper to update paging metadata Consolidate MMU guest metadata updates into a common helper for TDP, shadow, and nested MMUs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-44-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:45 -04:00
Sean Christopherson	af0eb17e99	KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 Don't bother updating the bitmasks and last-leaf information if paging is disabled as the metadata will never be used. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-43-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:45 -04:00
Sean Christopherson	fa4b558802	KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls Move calls to reset_rsvds_bits_mask() out of the various mode statements and under a more generic CR0.PG=1 check. This will allow for additional code consolidation in the future. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-42-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:45 -04:00
Sean Christopherson	87e99d7d70	KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper Get LA57 from the role_regs, which are initialized from the vCPU even though TDP is enabled, instead of pulling the value directly from the vCPU when computing the guest's root_level for TDP MMUs. Note, the check is inside an is_long_mode() statement, so that requirement is not lost. Use role_regs even though the MMU's role is available and arguably "better". A future commit will consolidate the guest root level logic, and it needs access to EFER.LMA, which is not tracked in the role (it can't be toggled on VM-Exit, unlike LA57). Drop is_la57_mode() as there are no remaining users, and to discourage pulling MMU state from the vCPU (in the future). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-41-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:45 -04:00
Sean Christopherson	5472fcd4c6	KVM: x86/mmu: Get nested MMU's root level from the MMU's role Initialize the MMU's (guest) root_level using its mmu_role instead of redoing the calculations. The role_regs used to calculate the mmu_role are initialized from the vCPU, i.e. this should be a complete nop. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-40-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:44 -04:00
Sean Christopherson	a4c93252fe	KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers Drop kvm_mmu.nx as there no consumers left. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-39-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:44 -04:00
Sean Christopherson	90599c2801	KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration Get the MMU's effective EFER.NX from its role instead of using the one-off, dedicated flag. This will allow dropping said flag in a future commit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-38-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:44 -04:00
Sean Christopherson	84a1622604	KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata Use the MMU's role and role_regs to calculate the MMU's guest root level and NX bit. For some flows, the vCPU state may not be correct (or relevant), e.g. EPT doesn't interact with EFER.NX and nested NPT will configure the guest_mmu with possibly-stale vCPU state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-37-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:44 -04:00
Sean Christopherson	cd628f0f1e	KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk Use the NX bit from the MMU's role instead of the MMU itself so that the redundant, dedicated "nx" flag can be dropped. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-36-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:43 -04:00
Sean Christopherson	b67a93a87e	KVM: x86/mmu: Use MMU's roles to compute last non-leaf level Use the MMU's role to get CR4.PSE when determining the last level at which the guest _cannot_ create a non-leaf PTE, i.e. cannot create a huge page. Note, the existing logic is arguably wrong when considering 5-level paging and the case where 1gb pages aren't supported. In practice, the logic is confusing but not broken, because except for 32-bit non-PAE paging, bit 7 (_PAGE_PSE) bit is reserved when a huge page isn't supported at that level. I.e. setting bit 7 will terminate the guest walk one way or another. Furthermore, last_nonleaf_level is only consulted after KVM has verified there are no reserved bits set. All that confusion will be addressed in a future patch by dropping last_nonleaf_level entirely. For now, massage the code to continue the march toward using mmu_role for (almost) all MMU computations. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-35-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:43 -04:00
Sean Christopherson	2e4c06618d	KVM: x86/mmu: Use MMU's role to compute PKRU bitmask Use the MMU's role to calculate the Protection Keys (Restrict Userspace) bitmask instead of pulling bits from current vCPU state. For some flows, the vCPU state may not be correct (or relevant), e.g. EPT doesn't interact with PKRU. Case in point, the "ept" param simply disappears. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-34-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:43 -04:00
Sean Christopherson	c596f1470a	KVM: x86/mmu: Use MMU's role to compute permission bitmask Use the MMU's role to generate the permission bitmasks for the MMU. For some flows, the vCPU state may not be correct (or relevant), e.g. the nested NPT MMU can be initialized with incoherent vCPU state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:43 -04:00
Sean Christopherson	b705a277b7	KVM: x86/mmu: Drop vCPU param from reserved bits calculator Drop the vCPU param from __reset_rsvds_bits_mask() as it's now unused, and ideally will remain unused in the future. Any information that's needed by the low level helper should be explicitly provided as it's used for both shadow/host MMUs and guest MMUs, i.e. vCPU state may be meaningless or simply wrong. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:42 -04:00
Sean Christopherson	4e9c0d80db	KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits Use the MMU's role to get CR4.PSE when calculating reserved bits for the guest's PTEs. Practically speaking, this is a glorified nop as the role always come from vCPU state for the relevant flows, but converting to the roles will provide consistency once everything else is converted, and will Just Work if the "always comes from vCPU" behavior were ever to change (unlikely). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:42 -04:00
Sean Christopherson	8c985b2d8e	KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits Unconditionally pass pse=false when calculating reserved bits for shadow PTEs. CR4.PSE is only relevant for 32-bit non-PAE paging, which KVM does not use for shadow paging (including nested NPT). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:42 -04:00
Sean Christopherson	18db1b1790	KVM: x86/mmu: Always set new mmu_role immediately after checking old role Refactor shadow MMU initialization to immediately set its new mmu_role after verifying it differs from the old role, and so that all flavors of MMU initialization share the same check-and-set pattern. Immediately setting the role will allow future commits to use mmu_role to configure the MMU without consuming stale state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-29-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:42 -04:00
Sean Christopherson	84c679f5f5	KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active Don't set cr4_pke or cr4_la57 in the MMU role if long mode isn't active, which is required for protection keys and 5-level paging to be fully enabled. Ignoring the bit avoids unnecessary reconfiguration on reuse, and also means consumers of mmu_role don't need to manually check for long mode. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-28-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:41 -04:00
Sean Christopherson	ca8d664f50	KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0 Don't set CR0/CR4/EFER bits in the MMU role if paging is disabled, paging modifiers are irrelevant if there is no paging in the first place. Somewhat arbitrarily clear gpte_is_8_bytes for shadow paging if paging is disabled in the guest. Again, there are no guest PTEs to process, so the size is meaningless. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-27-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:41 -04:00
Sean Christopherson	6066772455	KVM: x86/mmu: Add accessors to query mmu_role bits Add accessors via a builder macro for all mmu_role bits that track a CR0, CR4, or EFER bit, abstracting whether the bits are in the base or the extended role. Future commits will switch to using mmu_role instead of vCPU state to configure the MMU, i.e. there are about to be a large number of users. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-26-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:41 -04:00
Sean Christopherson	167f8a5cae	KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans Rename "nxe" to "efer_nx" so that future macro magic can use the pattern <reg>_<bit> for all CR0, CR4, and EFER bits that included in the role. Using "efer_nx" also makes it clear that the role bit reflects EFER.NX, not the NX bit in the corresponding PTE. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:41 -04:00
Sean Christopherson	8626c120ba	KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role Use the provided role_regs to calculate the mmu_role instead of pulling bits from current vCPU state. For some flows, e.g. nested TDP, the vCPU state may not be correct (or relevant). Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-24-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:40 -04:00
Sean Christopherson	cd6767c334	KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role Do not incorporate CR0/CR4 bits into the role for the nested EPT MMU, as EPT behavior is not influenced by CR0/CR4. Note, this is the guest_mmu, (L1's EPT), not nested_mmu (L2's IA32 paging); the nested_mmu does need CR0/CR4, and is initialized in a separate flow. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-23-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:40 -04:00
Sean Christopherson	af09897229	KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context() Consolidate the MMU metadata update calls to deduplicate code, and to prep for future cleanup. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:40 -04:00
Sean Christopherson	594e91a100	KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs Introduce "struct kvm_mmu_role_regs" to hold the register state that is incorporated into the mmu_role. For nested TDP, the register state that is factored into the MMU isn't vCPU state; the dedicated struct will be used to propagate the correct state throughout the flows without having to pass multiple params, and also provides helpers for the various flag accessors. Intentionally make the new helpers cumbersome/ugly by prepending four underscores. In the not-too-distant future, it will be preferable to use the mmu_role to query bits as the mmu_role can drop irrelevant bits without creating contradictions, e.g. clearing CR4 bits when CR0.PG=0. Reserve the clean helper names (no underscores) for the mmu_role. Add a helper for vCPU conversion, which is the common case. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:40 -04:00
Sean Christopherson	d555f7057e	KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs Use the mmu_role to initialize shadow root level instead of assuming the level of KVM's shadow root (host) is the same as that of the guest root, or in the case of 32-bit non-PAE paging where KVM forces PAE paging. For nested NPT, the shadow root level cannot be adapted to L1's NPT root level and is instead always the TDP root level because NPT uses the current host CR0/CR4/EFER, e.g. 64-bit KVM can't drop into 32-bit PAE to shadow L1's NPT. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:39 -04:00
Sean Christopherson	16be1d1292	KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper Move nested NPT's invocation of reset_shadow_zero_bits_mask() into the MMU proper and unexport said function. Aside from dropping an export, this is a baby step toward eliminating the call entirely by fixing the shadow_root_level confusion. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:39 -04:00
Sean Christopherson	20f632bd00	KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper Grab all CR0/CR4 MMU role bits from current vCPU state when initializing a non-nested shadow MMU. Extract the masks from kvm_post_set_cr{0,4}(), as the CR0/CR4 update masks must exactly match the mmu_role bits, with one exception (see below). The "full" CR0/CR4 will be used by future commits to initialize the MMU and its role, as opposed to the current approach of pulling everything from vCPU, which is incorrect for certain flows, e.g. nested NPT. CR4.LA57 is an exception, as it can be toggled on VM-Exit (for L1's MMU) but can't be toggled via MOV CR4 while long mode is active. I.e. LA57 needs to be in the mmu_role, but technically doesn't need to be checked by kvm_post_set_cr4(). However, the extra check is completely benign as the hardware restrictions simply mean LA57 will never be _the_ cause of a MMU reset during MOV CR4. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:39 -04:00
Sean Christopherson	18feaad3c6	KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs Drop the smep_andnot_wp role check from the "uses NX" calculation now that all non-nested shadow MMUs treat NX as used via the !TDP check. The shadow MMU for nested NPT, which shares the helper, does not need to deal with SMEP (or WP) as NPT walks are always "user" accesses and WP is explicitly noted as being ignored: Table walks for guest page tables are always treated as user writes at the nested page table level. A table walk for the guest page itself is always treated as a user access at the nested page table level The host hCR0.WP bit is ignored under nested paging. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:39 -04:00
Sean Christopherson	31e96bc636	KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state Add a comment in the nested NPT initialization flow to call out that it intentionally uses vmcb01 instead current vCPU state to get the effective hCR4 and hEFER for L1's NPT context. Note, despite nSVM's efforts to handle the case where vCPU state doesn't reflect L1 state, the MMU may still do the wrong thing due to pulling state from the vCPU instead of the passed in CR0/CR4/EFER values. This will be addressed in future commits. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:38 -04:00
Sean Christopherson	dbc4739b6b	KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER When configuring KVM's MMU, pass CR0 and CR4 as unsigned longs, and EFER as a u64 in various flows (mostly MMU). Passing the params as u32s is functionally ok since all of the affected registers reserve bits 63:32 to zero (enforced by KVM), but it's technically wrong. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:38 -04:00
Sean Christopherson	0337f585f5	KVM: x86/mmu: Rename unsync helper and update related comments Rename mmu_need_write_protect() to mmu_try_to_unsync_pages() and update a variety of related, stale comments. Add several new comments to call out subtle details, e.g. that upper-level shadow pages are write-tracked, and that can_unsync is false iff KVM is in the process of synchronizing pages. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:38 -04:00
Sean Christopherson	479a1efc81	KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page() Nove the kvm_unlink_unsync_page() call out of kvm_sync_page() and into it's sole caller, and fold __kvm_sync_page() into kvm_sync_page() since the latter becomes a pure pass-through. There really should be no reason for code to do a complete sync of a shadow page outside of the full kvm_mmu_sync_roots(), e.g. the one use case that creeped in turned out to be flawed and counter-productive. Drop the stale comment about @sp->gfn needing to be write-protected, as it directly contradicts the kvm_mmu_get_page() usage. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:38 -04:00
Sean Christopherson	07dc4f35a4	KVM: x86/mmu: comment on kvm_mmu_get_page's syncing of pages Explain the usage of sync_page() in kvm_mmu_get_page(), which is subtle in how and why it differs from mmu_sync_children(). Signed-off-by: Sean Christopherson <seanjc@google.com> [Split out of a different patch by Sean. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:37 -04:00
Sean Christopherson	2640b08653	KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches When synchronizing a shadow page, WARN and zap the page if its mmu role isn't compatible with the current MMU context, where "compatible" is an exact match sans the bits that have no meaning in the overall MMU context or will be explicitly overwritten during the sync. Many of the helpers used by sync_page() are specific to the current context, updating a SMM vs. non-SMM shadow page would use the wrong memslots, updating L1 vs. L2 PTEs might work but would be extremely bizaree, and so on and so forth. Drop the guard with respect to 8-byte vs. 4-byte PTEs in __kvm_sync_page(), it was made useless when kvm_mmu_get_page() stopped trying to sync shadow pages irrespective of the current MMU context. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:37 -04:00
Sean Christopherson	00a669780f	KVM: x86/mmu: Use MMU role to check for matching guest page sizes Originally, __kvm_sync_page used to check the cr4_pae bit in the role to avoid zapping 4-byte kvm_mmu_pages when guest page size are 8-byte or the other way round. However, in commit `47c42e6b41` ("KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it was observed that this did not work for nested EPT, where the page table size would be 8 bytes even if CR4.PAE=0. (Note that the check still has to be done for nested NPT, so it is not possible to use tdp_enabled or similar). Therefore, a hack was introduced to identify nested EPT shadow pages and unconditionally call __kvm_sync_page() on them. However, it is possible to do without the hack to identify nested EPT shadow pages: if EPT is active, there will be no shadow pages in non-EPT format, and all of them will have gpte_is_8_bytes set to true; we can just check the MMU role directly, and the test will always be true. Even for non-EPT shadow MMUs, this test should really always be true now that __kvm_sync_page() is called if and only if the role is an exact match (kvm_mmu_get_page()) or is part of the current MMU context (kvm_mmu_sync_roots()). A future commit will convert the likely-pointless check into a meaningful WARN to enforce that the mmu_roles of the current context and the shadow page are compatible. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:37 -04:00
Sean Christopherson	ddc16abbba	KVM: x86/mmu: Unconditionally zap unsync SPs when creating >4k SP at GFN When creating a new upper-level shadow page, zap unsync shadow pages at the same target gfn instead of attempting to sync the pages. This fixes a bug where an unsync shadow page could be sync'd with an incompatible context, e.g. wrong smm, is_guest, etc... flags. In practice, the bug is relatively benign as sync_page() is all but guaranteed to fail its check that the guest's desired gfn (for the to-be-sync'd page) matches the current gfn associated with the shadow page. I.e. kvm_sync_page() would end up zapping the page anyways. Alternatively, __kvm_sync_page() could be modified to explicitly verify the mmu_role of the unsync shadow page is compatible with the current MMU context. But, except for this specific case, __kvm_sync_page() is called iff the page is compatible, e.g. the transient sync in kvm_mmu_get_page() requires an exact role match, and the call from kvm_sync_mmu_roots() is only synchronizing shadow pages from the current MMU (which better be compatible or KVM has problems). And as described above, attempting to sync shadow pages when creating an upper-level shadow page is unlikely to succeed, e.g. zero successful syncs were observed when running Linux guests despite over a million attempts. Fixes: `9f1a122f97` ("KVM: MMU: allow more page become unsync at getting sp time") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-10-seanjc@google.com> [Remove WARN_ON after __kvm_sync_page. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:37 -04:00
Sean Christopherson	6c032f12dd	Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" Drop MAXPHYADDR from mmu_role now that all MMUs have their role invalidated after a CPUID update. Invalidating the role forces all MMUs to re-evaluate the guest's MAXPHYADDR, and the guest's MAXPHYADDR can only be changed only through a CPUID update. This reverts commit `de3ccd26fa`. Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:36 -04:00
Sean Christopherson	63f5a1909f	KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest instability. Initialize last_vmentry_cpu to -1 and use it to detect if the vCPU has been run at least once when its CPUID model is changed. KVM does not correctly handle changes to paging related settings in the guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc... KVM could theoretically zap all shadow pages, but actually making that happen is a mess due to lock inversion (vcpu->mutex is held). And even then, updating paging settings on the fly would only work if all vCPUs are stopped, updated in concert with identical settings, then restarted. To support running vCPUs with different vCPU models (that affect paging), KVM would need to track all relevant information in kvm_mmu_page_role. Note, that's the _page_ role, not the full mmu_role. Updating mmu_role isn't sufficient as a vCPU can reuse a shadow page translation that was created by a vCPU with different settings and thus completely skip the reserved bit checks (that are tied to CPUID). Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as it would require doubling gfn_track from a u16 to a u32, i.e. would increase KVM's memory footprint by 2 bytes for every 4kb of guest memory. E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT would all need to be tracked. In practice, there is no remotely sane use case for changing any paging related CPUID entries on the fly, so just sweep it under the rug (after yelling at userspace). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:36 -04:00
Sean Christopherson	49c6f8756c	KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified Invalidate all MMUs' roles after a CPUID update to force reinitizliation of the MMU context/helpers. Despite the efforts of commit `de3ccd26fa` ("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"), there are still a handful of CPUID-based properties that affect MMU behavior but are not incorporated into mmu_role. E.g. 1gb hugepage support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all factor into the guest's reserved PTE bits. The obvious alternative would be to add all such properties to mmu_role, but doing so provides no benefit over simply forcing a reinitialization on every CPUID update, as setting guest CPUID is a rare operation. Note, reinitializing all MMUs after a CPUID update does not fix all of KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID properties, which means that a vCPU can reuse shadow pages that should not exist for the new vCPU model, e.g. that map GPAs that are now illegal (due to MAXPHYADDR changes) or that set bits that are now reserved (PAGE_SIZE for 1gb pages), etc... Tracking the relevant CPUID properties in kvm_mmu_page_role would address the majority of problems, but fully tracking that much state in the shadow page role comes with an unpalatable cost as it would require a non-trivial increase in KVM's memory footprint. The GBPAGES case is even worse, as neither Intel nor AMD provides a way to disable 1gb hugepage support in the hardware page walker, i.e. it's a virtualization hole that can't be closed when using TDP. In other words, resetting the MMU after a CPUID update is largely a superficial fix. But, it will allow reverting the tracking of MAXPHYADDR in the mmu_role, and that case in particular needs to mostly work because KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging is supported. For cases where KVM botches guest behavior, the damage is limited to that guest. But for the shadow_root_level, a misconfigured MMU can cause KVM to incorrectly access memory, e.g. due to walking off the end of its shadow page tables. Fixes: `7dcd575520` ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:36 -04:00
Sean Christopherson	f71a53d118	Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested virtualization. When KVM (L0) is using TDP, CR4.LA57 is not reflected in mmu_role.base.level because that tracks the shadow root level, i.e. TDP level. Normally, this is not an issue because LA57 can't be toggled while long mode is active, i.e. the guest has to first disable paging, then toggle LA57, then re-enable paging, thus ensuring an MMU reinitialization. But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57 without having to bounce through an unpaged section. L1 can also load a new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a single entry PML5 is sufficient. Such shenanigans are only problematic if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode. Note, in the L2 case with nested TDP, even though L1 can switch between L2s with different LA57 settings, thus bypassing the paging requirement, in that case KVM's nested_mmu will track LA57 in base.level. This reverts commit `8053f924ca`. Fixes: `8053f924ca` ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:36 -04:00
Sean Christopherson	ef318b9edf	KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk Use the MMU's role to get its effective SMEP value when injecting a fault into the guest. When walking L1's (nested) NPT while L2 is active, vCPU state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0, CR4, EFER, etc... If L1 and L2 have different settings for SMEP and L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH when injecting #NPF. Fixes: `e57d4a356a` ("KVM: Add instruction fetch checking when walking guest page table") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:35 -04:00
Sean Christopherson	0aa1837533	KVM: x86: Properly reset MMU context at vCPU RESET/INIT Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG was set prior to INIT. Simply re-initializing the current MMU is not sufficient as the current root HPA may not be usable in the new context. E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode, KVM will fail to switch to the 32-bit pae_root and bomb on the next VM-Enter due to running with a 64-bit CR3 in 32-bit mode. This bug was papered over in both VMX and SVM, but still managed to rear its head in the MMU role on VMX. Because EFER.LMA=1 requires CR0.PG=1, kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first checking CR0.PG. VMX's RESET/INIT flow writes CR0 before EFER, and so an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to generate the wrong MMU role. In VMX, the INIT issue is specific to running without unrestricted guest since unrestricted guest is available if and only if EPT is enabled. Commit `8668a3c468` ("KVM: VMX: Reset mmu context when entering real mode") resolved the issue by forcing a reset when entering emulated real mode. In SVM, commit `ebae871a50` ("kvm: svm: reset mmu on VCPU reset") forced a MMU reset on every INIT to workaround the flaw in common x86. Note, at the time the bug was fixed, the SVM problem was exacerbated by a complete lack of a CR4 update. The vendor resets will be reverted in future patches, primarily to aid bisection in case there are non-INIT flows that rely on the existing VMX logic. Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU context reset. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:35 -04:00
Sean Christopherson	112022bdb5	KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs Mark NX as being used for all non-nested shadow MMUs, as KVM will set the NX bit for huge SPTEs if the iTLB mutli-hit mitigation is enabled. Checking the mitigation itself is not sufficient as it can be toggled on at any time and KVM doesn't reset MMU contexts when that happens. KVM could reset the contexts, but that would require purging all SPTEs in all MMUs, for no real benefit. And, KVM already forces EFER.NX=1 when TDP is disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved for shadow MMUs. Fixes: `b8e8c8303f` ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:35 -04:00
Sean Christopherson	f0d4379087	KVM: x86/mmu: Remove broken WARN that fires on 32-bit KVM w/ nested EPT Remove a misguided WARN that attempts to detect the scenario where using a special A/D tracking flag will set reserved bits on a non-MMIO spte. The WARN triggers false positives when using EPT with 32-bit KVM because of the !64-bit clause, which is just flat out wrong. The whole A/D tracking goo is specific to EPT, and one of the big selling points of EPT is that EPT is decoupled from the host's native paging mode. Drop the WARN instead of trying to salvage the check. Keeping a check specific to A/D tracking bits would essentially regurgitate the same code that led to KVM needed the tracking bits in the first place. A better approach would be to add a generic WARN on reserved bits being set, which would naturally cover the A/D tracking bits, work for all flavors of paging, and be self-documenting to some extent. Fixes: `8a406c8953` ("KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:35 -04:00
Jing Zhang	bc9e9e672d	KVM: debugfs: Reuse binary stats descriptors To remove code duplication, use the binary stats descriptors in the implementation of the debugfs interface for statistics. This unifies the definition of statistics for the binary and debugfs interfaces. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-8-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:29 -04:00
Jing Zhang	ce55c04945	KVM: stats: Support binary stats retrieval for a VCPU Add a VCPU ioctl to get a statistics file descriptor by which a read functionality is provided for userspace to read out VCPU stats header, descriptors and data. Define VCPU statistics descriptors and header for all architectures. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:19 -04:00
Jing Zhang	fcfe1baedd	KVM: stats: Support binary stats retrieval for a VM Add a VM ioctl to get a statistics file descriptor by which a read functionality is provided for userspace to read out VM stats header, descriptors and data. Define VM statistics descriptors and header for all architectures. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-4-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:10 -04:00
Jing Zhang	cb082bfab5	KVM: stats: Add fd-based API to read binary stats data This commit defines the API for userspace and prepare the common functionalities to support per VM/VCPU binary stats data readings. The KVM stats now is only accessible by debugfs, which has some shortcomings this change series are supposed to fix: 1. The current debugfs stats solution in KVM could be disabled when kernel Lockdown mode is enabled, which is a potential rick for production. 2. The current debugfs stats solution in KVM is organized as "one stats per file", it is good for debugging, but not efficient for production. 3. The stats read/clear in current debugfs solution in KVM are protected by the global kvm_lock. Besides that, there are some other benefits with this change: 1. All KVM VM/VCPU stats can be read out in a bulk by one copy to userspace. 2. A schema is used to describe KVM statistics. From userspace's perspective, the KVM statistics are self-describing. 3. With the fd-based solution, a separate telemetry would be able to read KVM stats in a less privileged environment. 4. After the initial setup by reading in stats descriptors, a telemetry only needs to read the stats data itself, no more parsing or setup is needed. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-3-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 11:47:57 -04:00
Jing Zhang	0193cc908b	KVM: stats: Separate generic stats from architecture specific ones Generic KVM stats are those collected in architecture independent code or those supported by all architectures; put all generic statistics in a separate structure. This ensures that they are defined the same way in the statistics API which is being added, removing duplication among different architectures in the declaration of the descriptors. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 11:47:56 -04:00
Sean Christopherson	6c6e166b2c	KVM: x86/mmu: Don't WARN on a NULL shadow page in TDP MMU check Treat a NULL shadow page in the "is a TDP MMU" check as valid, non-TDP root. KVM uses a "direct" PAE paging MMU when TDP is disabled and the guest is running with paging disabled. In that case, root_hpa points at the pae_root page (of which only 32 bytes are used), not a standard shadow page, and the WARN fires (a lot). Fixes: `0b873fd7fb` ("KVM: x86/mmu: Remove redundant is_tdp_mmu_enabled check") Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622072454.3449146-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 11:47:56 -04:00
Sean Christopherson	b33bb78a1f	KVM: nVMX: Handle split-lock #AC exceptions that happen in L2 Mark #ACs that won't be reinjected to the guest as wanted by L0 so that KVM handles split-lock #AC from L2 instead of forwarding the exception to L1. Split-lock #AC isn't yet virtualized, i.e. L1 will treat it like a regular #AC and do the wrong thing, e.g. reinject it into L2. Fixes: `e6f8b6c12f` ("KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest") Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622172244.3561540-1-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 04:31:16 -04:00
Colin Ian King	31c6565700	KVM: x86/mmu: Fix uninitialized boolean variable flush In the case where kvm_memslots_have_rmaps(kvm) is false the boolean variable flush is not set and is uninitialized. If is_tdp_mmu_enabled(kvm) is true then the call to kvm_tdp_mmu_zap_collapsible_sptes passes the uninitialized value of flush into the call. Fix this by initializing flush to false. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: `e2209710cc` ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622150912.23429-1-colin.king@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 04:31:16 -04:00
Jim Mattson	18f63b15b0	KVM: x86: Print CPU of last attempted VM-entry when dumping VMCS/VMCB Failed VM-entry is often due to a faulty core. To help identify bad cores, print the id of the last logical processor that attempted VM-entry whenever dumping a VMCS or VMCB. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210621221648.1833148-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 04:31:13 -04:00
Thomas Gleixner	72a6c08c44	x86/pkru: Remove xstate fiddling from write_pkru() The PKRU value of a task is stored in task->thread.pkru when the task is scheduled out. PKRU is restored on schedule in from there. So keeping the XSAVE buffer up to date is a pointless exercise. Remove the xstate fiddling and cleanup all related functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121456.897372712@linutronix.de	2021-06-23 19:55:51 +02:00
Dave Hansen	784a46618f	x86/pkeys: Move read_pkru() and write_pkru() write_pkru() was originally used just to write to the PKRU register. It was mercifully short and sweet and was not out of place in pgtable.h with some other pkey-related code. But, later work included a requirement to also modify the task XSAVE buffer when updating the register. This really is more related to the XSAVE architecture than to paging. Move the read/write_pkru() to asm/pkru.h. pgtable.h won't miss them. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.102647114@linutronix.de	2021-06-23 18:52:57 +02:00
Thomas Gleixner	1c61fada30	x86/fpu: Rename copy_kernel_to_fpregs() to restore_fpregs_from_fpstate() This is not a copy functionality. It restores the register state from the supplied kernel buffer. No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.716058365@linutronix.de	2021-06-23 18:36:42 +02:00
Thomas Gleixner	ebe7234b08	x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate() A copy is guaranteed to leave the source intact, which is not the case when FNSAVE is used as that reinitilizes the registers. Save does not make such guarantees and it matches what this is about, i.e. to save the state for a later restore. Rename it to save_fpregs_to_fpstate(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121454.508853062@linutronix.de	2021-06-23 18:26:43 +02:00
Dave Hansen	71ef453355	x86/kvm: Avoid looking up PKRU in XSAVE buffer PKRU is being removed from the kernel XSAVE/FPU buffers. This removal will probably include warnings for code that look up PKRU in those buffers. KVM currently looks up the location of PKRU but doesn't even use the pointer that it gets back. Rework the code to avoid calling get_xsave_addr() except in cases where its result is actually used. This makes the code more clear and also avoids the inevitable PKRU warnings. This is probably a good cleanup and could go upstream idependently of any PKRU rework. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121453.541037562@linutronix.de	2021-06-23 17:49:47 +02:00
Paolo Bonzini	c3ab0e28a4	Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes	2021-06-23 07:30:41 -04:00
Sean Christopherson	ba1f82456b	KVM: nVMX: Dynamically compute max VMCS index for vmcs12 Calculate the max VMCS index for vmcs12 by walking the array to find the actual max index. Hardcoding the index is prone to bitrot, and the calculation is only done on KVM bringup (albeit on every CPU, but there aren't _that_ many null entries in the array). Fixes: `3c0f99366e` ("KVM: nVMX: Add a TSC multiplier field in VMCS12") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210618214658.2700765-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-21 12:58:55 -04:00
Jim Mattson	5140bc7d6b	KVM: VMX: Skip #PF(RSVD) intercepts when emulating smaller maxphyaddr As part of smaller maxphyaddr emulation, kvm needs to intercept present page faults to see if it needs to add the RSVD flag (bit 3) to the error code. However, there is no need to intercept page faults that already have the RSVD flag set. When setting up the page fault intercept, add the RSVD flag into the #PF error code mask field (but not the #PF error code match field) to skip the intercept when the RSVD flag is already set. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210618235941.1041604-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-21 12:58:55 -04:00
David Matlack	0485cf8dbe	KVM: x86/mmu: Remove redundant root_hpa checks The root_hpa checks below the top-level check in kvm_mmu_page_fault are theoretically redundant since there is no longer a way for the root_hpa to be reset during a page fault. The details of why are described in commit `ddce620821` ("KVM: x86/mmu: Move root_hpa validity checks to top of page fault handler") __direct_map, kvm_tdp_mmu_map, and get_mmio_spte are all only reachable through kvm_mmu_page_fault, therefore their root_hpa checks are redundant. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:45:47 -04:00
David Matlack	63c0cac938	KVM: x86/mmu: Refactor is_tdp_mmu_root into is_tdp_mmu This change simplifies the call sites slightly and also abstracts away the implementation detail of looking at root_hpa as the mechanism for determining if the mmu is the TDP MMU. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:45:46 -04:00
David Matlack	0b873fd7fb	KVM: x86/mmu: Remove redundant is_tdp_mmu_enabled check This check is redundant because the root shadow page will only be a TDP MMU page if is_tdp_mmu_enabled() returns true, and is_tdp_mmu_enabled() never changes for the lifetime of a VM. It's possible that this check was added for performance reasons but it is unlikely that it is useful in practice since to_shadow_page() is cheap. That being said, this patch also caches the return value of is_tdp_mmu_root() in direct_page_fault() since there's no reason to duplicate the call so many times, so performance is not a concern. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:45:46 -04:00
David Matlack	aa23c0ad14	KVM: x86/mmu: Remove redundant is_tdp_mmu_root check The check for is_tdp_mmu_root in kvm_tdp_mmu_map is redundant because kvm_tdp_mmu_map's only caller (direct_page_fault) already checks is_tdp_mmu_root. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:45:46 -04:00
Paolo Bonzini	c62efff28b	KVM: x86: Stub out is_tdp_mmu_root on 32-bit hosts If is_tdp_mmu_root is not inlined, the elimination of TDP MMU calls as dead code might not work out. To avoid this, explicitly declare the stubbed is_tdp_mmu_root on 32-bit hosts. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:45:46 -04:00
Sean Christopherson	8bbed95d2c	KVM: x86: WARN and reject loading KVM if NX is supported but not enabled WARN if NX is reported as supported but not enabled in EFER. All flavors of the kernel, including non-PAE 32-bit kernels, set EFER.NX=1 if NX is supported, even if NX usage is disable via kernel command line. KVM relies on NX being enabled if it's supported, e.g. KVM will generate illegal NPT entries if nx_huge_pages is enabled and NX is supported but not enabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:24:50 -04:00
Sean Christopherson	b26a71a1a5	KVM: SVM: Refuse to load kvm_amd if NX support is not available Refuse to load KVM if NX support is not available. Shadow paging has assumed NX support since commit `9167ab7993` ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active"), and NPT has assumed NX support since commit `b8e8c8303f` ("kvm: mmu: ITLB_MULTIHIT mitigation"). While the NX huge pages mitigation should not be enabled by default for AMD CPUs, it can be turned on by userspace at will. Unlike Intel CPUs, AMD does not provide a way for firmware to disable NX support, and Linux always sets EFER.NX=1 if it is supported. Given that it's extremely unlikely that a CPU supports NPT but not NX, making NX a formal requirement is far simpler than adding requirements to the mitigation flow. Fixes: `9167ab7993` ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active") Fixes: `b8e8c8303f` ("kvm: mmu: ITLB_MULTIHIT mitigation") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:24:49 -04:00
Sean Christopherson	23f079c249	KVM: VMX: Refuse to load kvm_intel if EPT and NX are disabled Refuse to load KVM if NX support is not available and EPT is not enabled. Shadow paging has assumed NX support since commit `9167ab7993` ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active"), so for all intents and purposes this has been a de facto requirement for over a year. Do not require NX support if EPT is enabled purely because Intel CPUs let firmware disable NX support via MSR_IA32_MISC_ENABLES. If not for that, VMX (and KVM as a whole) could require NX support with minimal risk to breaking userspace. Fixes: `9167ab7993` ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-18 06:24:49 -04:00
Ingo Molnar	b2c0931a07	Merge branch 'sched/urgent' into sched/core, to resolve conflicts This commit in sched/urgent moved the cfs_rq_is_decayed() function: `a7b359fc6a`: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") and this fresh commit in sched/core modified it in the old location: `9e077b52d8`: ("sched/pelt: Check that _avg are null when _sum are") Merge the two variants. Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2021-06-18 11:31:25 +02:00
Linus Torvalds	fd0aa1a456	Miscellaneous bugfixes. The main interesting one is a NULL pointer dereference reported by syzkaller ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared"). -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDLldwUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPTOgf/XpAehLdWlx2877ulcBD0Kjt0tLvH OFHRD1Ir0d1Ay3DX8qmxLquXHB4CoDGZBwi1d7AI165kUP/XLmV0bY6TZ74inI/P CaD8Bsbm8+iBl5jrovEPc+223bK+3OFOvo2pk6M/MlsO/ExRikaPDtHOnkfousbl nLX8v2qd7ihWyJOdLJMU9pV8E2iczQoCuH9yWBHdCrxRxWtPzkEekPWb0sujByiF 4tD7sqiEA3ugbF1Wm5keQV63NLplfxx+Zun0FV+tbpjjxQWAGl81dP+zmqok0sM/ qQCyZevt6jLLkL+Fn6hI6PP9OTeYreX2fgwhWXs71d2js33yNg5Veqx5Bw== =Gs/y -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Miscellaneous bugfixes. The main interesting one is a NULL pointer dereference reported by syzkaller ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Fix kvm_check_cap() assertion KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU KVM: X86: Fix x86_emulator slab cache leak KVM: SVM: Call SEV Guest Decommission if ASID binding fails KVM: x86: Immediately reset the MMU context when the SMM flag is cleared KVM: x86: Fix fall-through warnings for Clang KVM: SVM: fix doc warnings KVM: selftests: Fix compiling errors when initializing the static structure kvm: LAPIC: Restore guard to prevent illegal APIC register access	2021-06-17 13:14:53 -07:00
Kai Huang	f1b8325508	KVM: x86/mmu: Fix TDP MMU page table level TDP MMU iterator's level is identical to page table's actual level. For instance, for the last level page table (whose entry points to one 4K page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging, the iter->level is mmu->shadow_root_level, which is 5. However, struct kvm_mmu_page's level currently is not set correctly when it is allocated in kvm_tdp_mmu_map(). When iterator hits non-present SPTE and needs to allocate a new child page table, currently iter->level, which is the level of the page table where the non-present SPTE belongs to, is used. This results in struct kvm_mmu_page's level always having its parent's level (excpet root table's level, which is initialized explicitly using mmu->shadow_root_level). This is kinda wrong, and not consistent with existing non TDP MMU code. Fortuantely sp->role.level is only used in handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave correctly. However to make it consistent with legacy MMU code (and fix the issue that both root page table and its child page table have shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <bcb6569b6e96cb78aaa7b50640e6e6b53291a74e.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 14:27:49 -04:00
Kai Huang	857f84743e	KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level() Currently pf_fixed is not increased when prefault is true. This is not correct, since prefault here really means "async page fault completed". In that case, the original page fault from the guest was morphed into as async page fault and pf_fixed was not increased. So when prefault indicates async page fault is completed, pf_fixed should be increased. Additionally, currently pf_fixed is also increased even when page fault is spurious, while legacy MMU increases pf_fixed when page fault returns RET_PF_EMULATE or RET_PF_FIXED. To fix above two issues, change to increase pf_fixed when return value is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by reaching here). More information: https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5 Fixes: `bb18842e21` ("kvm: x86/mmu: Add TDP MMU PF handler") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <2ea8b7f5d4f03c99b32bc56fc982e1e4e3d3fc6b.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 14:27:16 -04:00
Kai Huang	57a3e96d6d	KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() Currently tdp_mmu_map_handle_target_level() returns 0, which is RET_PF_RETRY, when page fault is actually fixed. This makes kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of RET_PF_FIXED. Fix by initializing ret to RET_PF_FIXED. Note that kvm_mmu_page_fault() resumes guest on both RET_PF_RETRY and RET_PF_FIXED, which means in practice returning the two won't make difference, so this fix alone won't be necessary for stable tree. Fixes: `bb18842e21` ("kvm: x86/mmu: Add TDP MMU PF handler") Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <f9e8956223a586cd28c090879a8ff40f5eb6d609.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 14:26:45 -04:00
Wanpeng Li	2735886c9e	KVM: LAPIC: Keep stored TMCCT register value 0 after KVM_SET_LAPIC KVM_GET_LAPIC stores the current value of TMCCT and KVM_SET_LAPIC's memcpy stores it in vcpu->arch.apic->regs, KVM_SET_LAPIC could store zero in vcpu->arch.apic->regs after it uses it, and then the stored value would always be zero. In addition, the TMCCT is always computed on-demand and never directly readable. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1623223000-18116-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 14:26:11 -04:00
Ashish Kalra	0dbb112304	KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 14:25:39 -04:00
Sean Christopherson	ade74e1433	KVM: x86/mmu: Grab nx_lpage_splits as an unsigned long before division Snapshot kvm->stats.nx_lpage_splits into a local unsigned long to avoid 64-bit division on 32-bit kernels. Casting to an unsigned long is safe because the maximum number of shadow pages, n_max_mmu_pages, is also an unsigned long, i.e. KVM will start recycling shadow pages before the number of splits can exceed a 32-bit value. ERROR: modpost: "__udivdi3" [arch/x86/kvm/kvm.ko] undefined! Fixes: 7ee093d4f3f5 ("KVM: switch per-VM stats to u64") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210615162905.2132937-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:10:18 -04:00
Vitaly Kuznetsov	bca66dbcd2	KVM: x86: Check for pending interrupts when APICv is getting disabled When APICv is active, interrupt injection doesn't raise KVM_REQ_EVENT request (see __apic_accept_irq()) as the required work is done by hardware. In case KVM_REQ_APICV_UPDATE collides with such injection, the interrupt may never get delivered. Currently, the described situation is hardly possible: all kvm_request_apicv_update() calls normally happen upon VM creation when no interrupts are pending. We are, however, going to move unconditional kvm_request_apicv_update() call from kvm_hv_activate_synic() to synic_update_vector() and without this fix 'hyperv_connections' test from kvm-unit-tests gets stuck on IPI delivery attempt right after configuring a SynIC route which triggers APICv disablement. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:55 -04:00
Sean Christopherson	c5ffd408cd	KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulation Drop the explicit check on EPTP switching being enabled. The EPTP switching check is handled in the generic VMFUNC function check, while the underlying VMFUNC enablement check is done by hardware and redone by generic VMFUNC emulation. The vmcs12 EPT check is handled by KVM at VM-Enter in the form of a consistency check, keep it but add a WARN. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:54 -04:00
Sean Christopherson	546e8398bc	KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occur WARN and inject #UD when emulating VMFUNC for L2 if the function is out-of-bounds or if VMFUNC is not enabled in vmcs12. Neither condition should occur in practice, as the CPU is supposed to prioritize the #UD over VM-Exit for out-of-bounds input and KVM is supposed to enable VMFUNC in vmcs02 if and only if it's enabled in vmcs12, but neither of those dependencies is obvious. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:54 -04:00
Sean Christopherson	c906066288	KVM: x86: Drop pointless @reset_roots from kvm_init_mmu() Remove the @reset_roots param from kvm_init_mmu(), the one user, kvm_mmu_reset_context() has already unloaded the MMU and thus freed and invalidated all roots. This also happens to be why the reset_roots=true paths doesn't leak roots; they're already invalid. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:54 -04:00
Sean Christopherson	e62f1aa8b9	KVM: x86: Defer MMU sync on PCID invalidation Defer the MMU sync on PCID invalidation so that multiple sync requests in a single VM-Exit are batched. This is a very minor optimization as checking for unsync'd children is quite cheap. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:54 -04:00
Sean Christopherson	39353ab579	KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH] Use __kvm_mmu_new_pgd() via kvm_init_shadow_ept_mmu() to emulate VMFUNC[EPTP_SWITCH] instead of nuking all MMUs. EPTP_SWITCH is the EPT equivalent of MOV to CR3, i.e. is a perfect fit for the common PGD flow, the only hiccup being that A/D enabling is buried in the EPTP. But, that is easily handled by bouncing through kvm_init_shadow_ept_mmu(). Explicitly request a guest TLB flush if VPID is disabled. Per Intel's SDM, if VPID is disabled, "an EPTP-switching VMFUNC invalidates combined mappings associated with VPID 0000H (for all PCIDs and for all EP4TA values, where EP4TA is the value of bits 51:12 of EPTP)". Note, this technically is a very bizarre bug fix of sorts if L2 is using PAE paging, as avoiding the full MMU reload also avoids incorrectly reloading the PDPTEs, which the SDM explicitly states are not touched: If PAE paging is in use, an EPTP-switching VMFUNC does not load the four page-directory-pointer-table entries (PDPTEs) from the guest-physical address in CR3. The logical processor continues to use the four guest-physical addresses already present in the PDPTEs. The guest-physical address in CR3 is not translated through the new EPT paging structures (until some operation that would load the PDPTEs). In addition to optimizing L2's MMU shenanigans, avoiding the full reload also optimizes L1's MMU as KVM_REQ_MMU_RELOAD wipes out all roots in both root_mmu and guest_mmu. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:53 -04:00
Sean Christopherson	28f28d453f	KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulation Use KVM_REQ_TLB_FLUSH_GUEST instead of KVM_REQ_MMU_RELOAD when emulating INVPCID of all contexts. In the current code, this is a glorified nop as TLB_FLUSH_GUEST becomes kvm_mmu_unload(), same as MMU_RELOAD, when TDP is disabled, which is the only time INVPCID is only intercepted+emulated. In the future, reusing TLB_FLUSH_GUEST will simplify optimizing paths that emulate a guest TLB flush, e.g. by synchronizing as needed instead of completely unloading all MMUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:53 -04:00
Sean Christopherson	25b62c6274	KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPT When emulating INVVPID for L1, free only L2+ roots, using the guest_mode tag in the MMU role to identify L2+ roots. From L1's perspective, its own TLB entries use VPID=0, and INVVPID is not requied to invalidate such entries. Per Intel's SDM, INVVPID _may_ invalidate entries with VPID=0, but it is not required to do so. Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:53 -04:00
Sean Christopherson	50a417962a	KVM: nVMX: Consolidate VM-Enter/VM-Exit TLB flush and MMU sync logic Drop the dedicated nested_vmx_transition_mmu_sync() now that the MMU sync is handled via KVM_REQ_TLB_FLUSH_GUEST, and fold that flush into the all-encompassing nested_vmx_transition_tlb_flush(). Opportunistically add a comment explaning why nested EPT never needs to sync the MMU on VM-Enter. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:53 -04:00
Sean Christopherson	b512910039	KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD" helpers Drop skip_mmu_sync and skip_tlb_flush from __kvm_mmu_new_pgd() now that all call sites unconditionally skip both the sync and flush. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:52 -04:00
Sean Christopherson	d2e5601907	KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated helper Introduce nested_svm_transition_tlb_flush() and use it force an MMU sync and TLB flush on nSVM VM-Enter and VM-Exit instead of sneaking the logic into the __kvm_mmu_new_pgd() call sites. Add a partial todo list to document issues that need to be addressed before the unconditional sync and flush can be modified to look more like nVMX's logic. In addition to making nSVM's forced flushing more overt (guess who keeps losing track of it), the new helper brings further convergence between nSVM and nVMX, and also sets the stage for dropping the "skip" params from __kvm_mmu_new_pgd(). Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:52 -04:00
Sean Christopherson	415b1a0105	KVM: x86: Uncondtionally skip MMU sync/TLB flush in MOV CR3's PGD switch Stop leveraging the MMU sync and TLB flush requested by the fast PGD switch helper now that kvm_set_cr3() manually handles the necessary sync, frees, and TLB flush. This will allow dropping the params from the fast PGD helpers since nested SVM is now the odd blob out. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:52 -04:00
Sean Christopherson	21823fbda5	KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush Flush and sync all PGDs for the current/target PCID on MOV CR3 with a TLB flush, i.e. without PCID_NOFLUSH set. Paraphrasing Intel's SDM regarding the behavior of MOV to CR3: - If CR4.PCIDE = 0, invalidates all TLB entries associated with PCID 000H and all entries in all paging-structure caches associated with PCID 000H. - If CR4.PCIDE = 1 and NOFLUSH=0, invalidates all TLB entries associated with the PCID specified in bits 11:0, and all entries in all paging-structure caches associated with that PCID. It is not required to invalidate entries in the TLBs and paging-structure caches that are associated with other PCIDs. - If CR4.PCIDE=1 and NOFLUSH=1, is not required to invalidate any TLB entries or entries in paging-structure caches. Extract and reuse the logic for INVPCID(single) which is effectively the same flow and works even if CR4.PCIDE=0, as the current PCID will be '0' in that case, thus honoring the requirement of flushing PCID=0. Continue passing skip_tlb_flush to kvm_mmu_new_pgd() even though it _should_ be redundant; the clean up will be done in a future patch. The overhead of an unnecessary nop sync is minimal (especially compared to the actual sync), and the TLB flush is handled via request. Avoiding the the negligible overhead is not worth the risk of breaking kernels that backport the fix. Fixes: `956bf3531f` ("kvm: x86: Skip shadow page resync on CR3 switch when indicated by guest") Cc: Junaid Shahid <junaids@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:52 -04:00
Sean Christopherson	272b0a998d	KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch Drop bogus logic that incorrectly clobbers the accessed/dirty enabling status of the nested MMU on an EPTP switch. When nested EPT is enabled, walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2. This is likely a benign bug, as mmu->ept_ad is never consumed (since the MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will never propagate into future shadow pages since the nested MMU isn't used to map anything, just to walk L2's page tables. Note, KVM also does a full MMU reload, i.e. the guest_mmu will be recreated using the new EPTP, and thus any change in A/D enabling will be properly recognized in the relevant MMU. Fixes: `41ab937274` ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:51 -04:00
Sean Christopherson	0e75225dfa	KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap Use BIT_ULL() instead of an open-coded shift to check whether or not a function is enabled in L1's VMFUNC bitmap. This is a benign bug as KVM supports only bit 0, and will fail VM-Enter if any other bits are set, i.e. bits 63:32 are guaranteed to be zero. Note, "function" is bounded by hardware as VMFUNC will #UD before taking a VM-Exit if the function is greater than 63. Before: if ((vmcs12->vm_function_control & (1 << function)) == 0) 0x000000000001a916 <+118>: mov $0x1,%eax 0x000000000001a91b <+123>: shl %cl,%eax 0x000000000001a91d <+125>: cltq 0x000000000001a91f <+127>: and 0x128(%rbx),%rax After: if (!(vmcs12->vm_function_control & BIT_ULL(function & 63))) 0x000000000001a955 <+117>: mov 0x128(%rbx),%rdx 0x000000000001a95c <+124>: bt %rax,%rdx Fixes: `27c42a1bb8` ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:51 -04:00
Sean Christopherson	07ffaf343e	KVM: nVMX: Sync all PGDs on nested transition with shadow paging Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1. Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush. Note, there is no change the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2. Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which requested is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future simplification by moving the logic into nested_vmx_transition_tlb_flush(). Fixes: `41fab65e7c` ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:51 -04:00
Vitaly Kuznetsov	8629b625e0	KVM: nVMX: Request to sync eVMCS from VMCS12 after migration VMCS12 is used to keep the authoritative state during nested state migration. In case 'need_vmcs12_to_shadow_sync' flag is set, we're in between L2->L1 vmexit and L1 guest run when actual sync to enlightened (or shadow) VMCS happens. Nested state, however, has no flag for 'need_vmcs12_to_shadow_sync' so vmx_set_nested_state()-> set_current_vmptr() always sets it. Enlightened vmptrld path, however, doesn't have the quirk so some VMCS12 changes may not get properly reflected to eVMCS and L1 will see an incorrect state. Note, during L2 execution or when need_vmcs12_to_shadow_sync is not set the change is effectively a nop: in the former case all changes will get reflected during the first L2->L1 vmexit and in the later case VMCS12 and eVMCS are already in sync (thanks to copy_enlightened_to_vmcs12() in vmx_get_nested_state()). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:50 -04:00
Vitaly Kuznetsov	dc31338552	KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02() When nested state migration happens during L1's execution, it is incorrect to modify eVMCS as it is L1 who 'owns' it at the moment. At least genuine Hyper-V seems to not be very happy when 'clean fields' data changes underneath it. 'Clean fields' data is used in KVM twice: by copy_enlightened_to_vmcs12() and prepare_vmcs02_rare() so we can reset it from prepare_vmcs02() instead. While at it, update a comment stating why exactly we need to reset 'hv_clean_fields' data from L0. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:50 -04:00
Vitaly Kuznetsov	b7685cfd5e	KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid() 'need_vmcs12_to_shadow_sync' is used for both shadow and enlightened VMCS sync when we exit to L1. The comment in nested_vmx_failValid() validly states why shadow vmcs sync can be omitted but this doesn't apply to enlightened VMCS as it 'shadows' all VMCS12 fields. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:50 -04:00
Vitaly Kuznetsov	d6bf71a18c	KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in vmx_get_nested_state() 'Clean fields' data from enlightened VMCS is only valid upon vmentry: L1 hypervisor is not obliged to keep it up-to-date while it is mangling L2's state, KVM_GET_NESTED_STATE request may come at a wrong moment when actual eVMCS changes are unsynchronized with 'hv_clean_fields'. As upon migration VMCS12 is used as a source of ultimate truth, we must make sure we pick all the changes to eVMCS and thus 'clean fields' data must be ignored. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:49 -04:00
Vitaly Kuznetsov	3b19b81acf	KVM: nVMX: Release enlightened VMCS on VMCLEAR Unlike VMREAD/VMWRITE/VMPTRLD, VMCLEAR is a valid instruction when enlightened VMCS is in use. TLFS has the following brief description: "The L1 hypervisor can execute a VMCLEAR instruction to transition an enlightened VMCS from the active to the non-active state". Normally, this change can be ignored as unmapping active eVMCS can be postponed until the next VMLAUNCH instruction but in case nested state is migrated with KVM_GET_NESTED_STATE/KVM_SET_NESTED_STATE, keeping eVMCS mapped may result in its synchronization with VMCS12 and this is incorrect: L1 hypervisor is free to reuse inactive eVMCS memory for something else. Inactive eVMCS after VMCLEAR can just be unmapped. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:49 -04:00
Vitaly Kuznetsov	278499686b	KVM: nVMX: Introduce 'EVMPTR_MAP_PENDING' post-migration state Unlike regular set_current_vmptr(), nested_vmx_handle_enlightened_vmptrld() can not be called directly from vmx_set_nested_state() as KVM may not have all the information yet (e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not be restored yet). Enlightened VMCS is mapped later while getting nested state pages. In the meantime, vmx->nested.hv_evmcs_vmptr remains 'EVMPTR_INVALID' and it's indistinguishable from 'evmcs is not in use' case. This leads to certain issues, in particular, if KVM_GET_NESTED_STATE is called right after KVM_SET_NESTED_STATE, KVM_STATE_NESTED_EVMCS flag in the resulting state will be unset (and such state will later fail to load). Introduce 'EVMPTR_MAP_PENDING' state to detect not-yet-mapped eVMCS after restore. With this, the 'is_guest_mode(vcpu)' hack in vmx_has_valid_vmcs12() is no longer needed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:49 -04:00
Vitaly Kuznetsov	25641cafab	KVM: nVMX: Make copy_vmcs12_to_enlightened()/copy_enlightened_to_vmcs12() return 'void' copy_vmcs12_to_enlightened()/copy_enlightened_to_vmcs12() don't return any result, make them return 'void'. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:49 -04:00
Vitaly Kuznetsov	0276171680	KVM: nVMX: Release eVMCS when enlightened VMENTRY was disabled In theory, L1 can try to disable enlightened VMENTRY in VP assist page and try to issue VMLAUNCH/VMRESUME. While nested_vmx_handle_enlightened_vmptrld() properly handles this as 'EVMPTRLD_DISABLED', previously mapped eVMCS remains mapped and thus all evmptr_is_valid() checks will still pass and nested_vmx_run() will proceed when it shouldn't. Release eVMCS immediately when we detect that enlightened vmentry was disabled by L1. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:48 -04:00
Vitaly Kuznetsov	6a789ca5d5	KVM: nVMX: Don't set 'dirty_vmcs12' flag on enlightened VMPTRLD 'dirty_vmcs12' is only checked in prepare_vmcs02_early()/prepare_vmcs02() and both checks look like: 'vmx->nested.dirty_vmcs12 \|\| evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)' so for eVMCS case the flag changes nothing. Drop the assignment to avoid the confusion. No functional change intended. Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:48 -04:00
Vitaly Kuznetsov	1e9dfbd748	KVM: nVMX: Use '-1' in 'hv_evmcs_vmptr' to indicate that eVMCS is not in use Instead of checking 'vmx->nested.hv_evmcs' use '-1' in 'vmx->nested.hv_evmcs_vmptr' to indicate 'evmcs is not in use' state. This matches how we check 'vmx->nested.current_vmptr'. Introduce EVMPTR_INVALID and evmptr_is_valid() and use it instead of raw '-1' check as a preparation to adding other 'special' values. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210526132026.270394-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:48 -04:00
Maxim Levitsky	158a48ecf7	KVM: x86: avoid loading PDPTRs after migration when possible if new KVM_*_SREGS2 ioctls are used, the PDPTRs are a part of the migration state and are correctly restored by those ioctls. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-9-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:48 -04:00
Maxim Levitsky	6dba940352	KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2 This is a new version of KVM_GET_SREGS / KVM_SET_SREGS. It has the following changes: * Has flags for future extensions * Has vcpu's PDPTRs, allowing to save/restore them on migration. * Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS) New capability, KVM_CAP_SREGS2 is added to signal the userspace of this ioctl. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:47 -04:00
Maxim Levitsky	329675dde9	KVM: x86: introduce kvm_register_clear_available Small refactoring that will be used in the next patch. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:47 -04:00
Maxim Levitsky	0f85722341	KVM: nVMX: delay loading of PDPTRs to KVM_REQ_GET_NESTED_STATE_PAGES Similar to the rest of guest page accesses after a migration, this access should be delayed to KVM_REQ_GET_NESTED_STATE_PAGES. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:47 -04:00
Maxim Levitsky	b222b0b881	KVM: nSVM: refactor the CR3 reload on migration Document the actual reason why we need to do it on migration and move the call to svm_set_nested_state to be closer to VMX code. To avoid loading the PDPTRs from possibly not up to date memory map, in nested_svm_load_cr3 after the move, move this code to .get_nested_state_pages. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:47 -04:00
Sean Christopherson	c7313155bf	KVM: x86: Always load PDPTRs on CR3 load for SVM w/o NPT and a PAE guest Kill off pdptrs_changed() and instead go through the full kvm_set_cr3() for PAE guest, even if the new CR3 is the same as the current CR3. For VMX, and SVM with NPT enabled, the PDPTRs are unconditionally marked as unavailable after VM-Exit, i.e. the optimization is dead code except for SVM without NPT. In the unlikely scenario that anyone cares about SVM without NPT _and_ a PAE guest, they've got bigger problems if their guest is loading the same CR3 so frequently that the performance of kvm_set_cr3() is notable, especially since KVM's fast PGD switching means reloading the same CR3 does not require a full rebuild. Given that PAE and PCID are mutually exclusive, i.e. a sync and flush are guaranteed in any case, the actual benefits of the pdptrs_changed() optimization are marginal at best. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210607090203.133058-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:46 -04:00
Sean Christopherson	a36dbec67e	KVM: nSVM: Drop pointless pdptrs_changed() check on nested transition Remove the "PDPTRs unchanged" check to skip PDPTR loading during nested SVM transitions as it's not at all an optimization. Reading guest memory to get the PDPTRs isn't magically cheaper by doing it in pdptrs_changed(), and if the PDPTRs did change, KVM will end up doing the read twice. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210607090203.133058-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:46 -04:00
Sean Christopherson	bcb72d0627	KVM: nVMX: Drop obsolete (and pointless) pdptrs_changed() check Remove the pdptrs_changed() check when loading L2's CR3. The set of available registers is always reset when switching VMCSes (see commit `e5d03de593`, "KVM: nVMX: Reset register cache (available and dirty masks) on VMCS switch"), thus the "are PDPTRs available" check will always fail. And even if it didn't fail, reading guest memory to check the PDPTRs is just as expensive as reading guest memory to load 'em. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210607090203.133058-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:46 -04:00
Vitaly Kuznetsov	445caed021	KVM: x86: hyper-v: Honor HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED bit Hypercalls which use extended processor masks are only available when HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED privilege bit is exposed (and 'RECOMMENDED' is rather a misnomer). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-28-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:44 -04:00
Vitaly Kuznetsov	d264eb3c14	KVM: x86: hyper-v: Honor HV_X64_CLUSTER_IPI_RECOMMENDED bit Hyper-V partition must possess 'HV_X64_CLUSTER_IPI_RECOMMENDED' privilege ('recommended' is rather a misnomer) to issue HVCALL_SEND_IPI hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-27-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:44 -04:00
Vitaly Kuznetsov	bb53ecb4d6	KVM: x86: hyper-v: Honor HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED bit Hyper-V partition must possess 'HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED' privilege ('recommended' is rather a misnomer) to issue HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST/SPACE hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-26-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:44 -04:00
Vitaly Kuznetsov	a921cf83cc	KVM: x86: hyper-v: Honor HV_DEBUGGING privilege bit Hyper-V partition must possess 'HV_DEBUGGING' privilege to issue HVCALL_POST_DEBUG_DATA/HVCALL_RETRIEVE_DEBUG_DATA/ HVCALL_RESET_DEBUG_SESSION hypercalls. Note, when SynDBG is disabled hv_check_hypercall_access() returns 'true' (like for any other unknown hypercall) so the result will be HV_STATUS_INVALID_HYPERCALL_CODE and not HV_STATUS_ACCESS_DENIED. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-25-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:44 -04:00
Vitaly Kuznetsov	a60b3c594e	KVM: x86: hyper-v: Honor HV_SIGNAL_EVENTS privilege bit Hyper-V partition must possess 'HV_SIGNAL_EVENTS' privilege to issue HVCALL_SIGNAL_EVENT hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-24-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:43 -04:00
Vitaly Kuznetsov	4f532b7f96	KVM: x86: hyper-v: Honor HV_POST_MESSAGES privilege bit Hyper-V partition must possess 'HV_POST_MESSAGES' privilege to issue HVCALL_POST_MESSAGE hypercalls. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-23-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:43 -04:00
Vitaly Kuznetsov	34ef7d7b9c	KVM: x86: hyper-v: Check access to HVCALL_NOTIFY_LONG_SPIN_WAIT hypercall TLFS6.0b states that partition issuing HVCALL_NOTIFY_LONG_SPIN_WAIT must posess 'UseHypercallForLongSpinWait' privilege but there's no corresponding feature bit. Instead, we have "Recommended number of attempts to retry a spinlock failure before notifying the hypervisor about the failures. 0xFFFFFFFF indicates never notify." Use this to check access to the hypercall. Also, check against zero as the corresponding CPUID must be set (and '0' attempts before re-try is weird anyway). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-22-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:43 -04:00
Vitaly Kuznetsov	4ad81a9111	KVM: x86: hyper-v: Prepare to check access to Hyper-V hypercalls Introduce hv_check_hypercallr_access() to check if the particular hypercall should be available to guest, this will be used with KVM_CAP_HYPERV_ENFORCE_CPUID mode. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-21-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:43 -04:00
Vitaly Kuznetsov	1aa8a4184d	KVM: x86: hyper-v: Honor HV_STIMER_DIRECT_MODE_AVAILABLE privilege bit Synthetic timers can only be configured in 'direct' mode when HV_STIMER_DIRECT_MODE_AVAILABLE bit was exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-20-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:42 -04:00
Vitaly Kuznetsov	d66bfa36f9	KVM: x86: hyper-v: Inverse the default in hv_check_msr_access() Access to all MSRs is now properly checked. To avoid 'forgetting' to properly check access to new MSRs in the future change the default to 'false' meaning 'no access'. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-19-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:42 -04:00
Vitaly Kuznetsov	17b6d51771	KVM: x86: hyper-v: Honor HV_FEATURE_DEBUG_MSRS_AVAILABLE privilege bit Synthetic debugging MSRs (HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS, HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER, HV_X64_MSR_SYNDBG_PENDING_BUFFER, HV_X64_MSR_SYNDBG_OPTIONS) are only available to guest when HV_FEATURE_DEBUG_MSRS_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-18-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:42 -04:00
Vitaly Kuznetsov	0a19c8992d	KVM: x86: hyper-v: Honor HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE privilege bit HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL are only available to guest when HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-17-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:42 -04:00
Vitaly Kuznetsov	234d01baec	KVM: x86: hyper-v: Honor HV_ACCESS_REENLIGHTENMENT privilege bit HV_X64_MSR_REENLIGHTENMENT_CONTROL/HV_X64_MSR_TSC_EMULATION_CONTROL/ HV_X64_MSR_TSC_EMULATION_STATUS are only available to guest when HV_ACCESS_REENLIGHTENMENT bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-16-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:41 -04:00
Vitaly Kuznetsov	9442f3bd90	KVM: x86: hyper-v: Honor HV_ACCESS_FREQUENCY_MSRS privilege bit HV_X64_MSR_TSC_FREQUENCY/HV_X64_MSR_APIC_FREQUENCY are only available to guest when HV_ACCESS_FREQUENCY_MSRS bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-15-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:41 -04:00
Vitaly Kuznetsov	978b57475c	KVM: x86: hyper-v: Honor HV_MSR_APIC_ACCESS_AVAILABLE privilege bit HV_X64_MSR_EOI, HV_X64_MSR_ICR, HV_X64_MSR_TPR, and HV_X64_MSR_VP_ASSIST_PAGE are only available to guest when HV_MSR_APIC_ACCESS_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-14-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:41 -04:00
Vitaly Kuznetsov	eba60ddae7	KVM: x86: hyper-v: Honor HV_MSR_SYNTIMER_AVAILABLE privilege bit Synthetic timers MSRs (HV_X64_MSR_STIMER[0-3]_CONFIG, HV_X64_MSR_STIMER[0-3]_COUNT) are only available to guest when HV_MSR_SYNTIMER_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:41 -04:00
Vitaly Kuznetsov	9e2715ca20	KVM: x86: hyper-v: Honor HV_MSR_SYNIC_AVAILABLE privilege bit SynIC MSRs (HV_X64_MSR_SCONTROL, HV_X64_MSR_SVERSION, HV_X64_MSR_SIEFP, HV_X64_MSR_SIMP, HV_X64_MSR_EOM, HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15) are only available to guest when HV_MSR_SYNIC_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:40 -04:00
Vitaly Kuznetsov	a1ec661c3f	KVM: x86: hyper-v: Honor HV_MSR_REFERENCE_TSC_AVAILABLE privilege bit HV_X64_MSR_REFERENCE_TSC is only available to guest when HV_MSR_REFERENCE_TSC_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:40 -04:00
Vitaly Kuznetsov	679008e4bb	KVM: x86: hyper-v: Honor HV_MSR_RESET_AVAILABLE privilege bit HV_X64_MSR_RESET is only available to guest when HV_MSR_RESET_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:40 -04:00
Vitaly Kuznetsov	d2ac25d419	KVM: x86: hyper-v: Honor HV_MSR_VP_INDEX_AVAILABLE privilege bit HV_X64_MSR_VP_INDEX is only available to guest when HV_MSR_VP_INDEX_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:40 -04:00
Vitaly Kuznetsov	c2b32867f2	KVM: x86: hyper-v: Honor HV_MSR_TIME_REF_COUNT_AVAILABLE privilege bit HV_X64_MSR_TIME_REF_COUNT is only available to guest when HV_MSR_TIME_REF_COUNT_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:39 -04:00
Vitaly Kuznetsov	b80a92ff81	KVM: x86: hyper-v: Honor HV_MSR_VP_RUNTIME_AVAILABLE privilege bit HV_X64_MSR_VP_RUNTIME is only available to guest when HV_MSR_VP_RUNTIME_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:39 -04:00
Vitaly Kuznetsov	1561c2cb87	KVM: x86: hyper-v: Honor HV_MSR_HYPERCALL_AVAILABLE privilege bit HV_X64_MSR_GUEST_OS_ID/HV_X64_MSR_HYPERCALL are only available to guest when HV_MSR_HYPERCALL_AVAILABLE bit is exposed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:39 -04:00
Vitaly Kuznetsov	b4128000e2	KVM: x86: hyper-v: Prepare to check access to Hyper-V MSRs Introduce hv_check_msr_access() to check if the particular MSR should be accessible by guest, this will be used with KVM_CAP_HYPERV_ENFORCE_CPUID mode. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:39 -04:00
Vitaly Kuznetsov	10d7bf1e46	KVM: x86: hyper-v: Cache guest CPUID leaves determining features availability Limiting exposed Hyper-V features requires a fast way to check if the particular feature is exposed in guest visible CPUIDs or not. To aboid looping through all CPUID entries on every hypercall/MSR access cache the required leaves on CPUID update. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:38 -04:00
Vitaly Kuznetsov	644f706719	KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows for limiting Hyper-V features to those exposed to the guest in Hyper-V CPUIDs (0x40000003, 0x40000004, ...). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:38 -04:00
Vineeth Pillai	1183646a67	KVM: SVM: hyper-v: Direct Virtual Flush support From Hyper-V TLFS: "The hypervisor exposes hypercalls (HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressList, and HvFlushVirtualAddressListEx) that allow operating systems to more efficiently manage the virtual TLB. The L1 hypervisor can choose to allow its guest to use those hypercalls and delegate the responsibility to handle them to the L0 hypervisor. This requires the use of a partition assist page." Add the Direct Virtual Flush support for SVM. Related VMX changes: commit `6f6a657c99` ("KVM/Hyper-V/VMX: Add direct tlb flush support") Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <fc8d24d8eb7017266bb961e39a171b0caf298d7f.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:38 -04:00
Vineeth Pillai	c4327f15df	KVM: SVM: hyper-v: Enlightened MSR-Bitmap support Enlightened MSR-Bitmap as per TLFS: "The L1 hypervisor may collaborate with the L0 hypervisor to make MSR accesses more efficient. It can enable enlightened MSR bitmaps by setting the corresponding field in the enlightened VMCS to 1. When enabled, L0 hypervisor does not monitor the MSR bitmaps for changes. Instead, the L1 hypervisor must invalidate the corresponding clean field after making changes to one of the MSR bitmaps." Enable this for SVM. Related VMX changes: commit `ceef7d10df` ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support") Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <87df0710f95d28b91cc4ea014fc4d71056eebbee.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:37 -04:00
Vineeth Pillai	1e0c7d4075	KVM: SVM: hyper-v: Remote TLB flush for SVM Enable remote TLB flush for SVM. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <1ee364e397e142aed662d2920d198cd03772f1a5.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:37 -04:00
Vineeth Pillai	59d21d67f3	KVM: SVM: Software reserved fields SVM added support for certain reserved fields to be used by software or hypervisor. Add the following reserved fields: - VMCB offset 0x3e0 - 0x3ff - Clean bit 31 - SVM intercept exit code 0xf0000000 Later patches will make use of this for supporting Hyper-V nested virtualization enhancements. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <a1f17a43a8e9e751a1a9cc0281649d71bdbf721b.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:37 -04:00
Vineeth Pillai	3c86c0d3db	KVM: x86: hyper-v: Move the remote TLB flush logic out of vmx Currently the remote TLB flush logic is specific to VMX. Move it to a common place so that SVM can use it as well. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <4f4e4ca19778437dae502f44363a38e99e3ef5d1.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:36 -04:00
Krish Sadhukhan	d5a0483f9f	KVM: nVMX: nSVM: Add a new VCPU statistic to show if VCPU is in guest mode Add the following per-VCPU statistic to KVM debugfs to show if a given VCPU is in guest mode: guest_mode Also add this as a per-VM statistic to KVM debugfs to show the total number of VCPUs that are in guest mode in a given VM. Signed-off-by: Krish Sadhukhan <Krish.Sadhukhan@oracle.com> Message-Id: <20210609180340.104248-3-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-17 13:09:36 -04:00

1 2 3 4 5 ...

7538 Commits