linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-03 12:24:45 +08:00

Author	SHA1	Message	Date
Sean Christopherson	5f8a7cf25a	KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled. Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210222024522.1751719-2-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-22 13:13:30 -05:00
Sean Christopherson	96ad91ae4e	KVM: x86/mmu: Remove a variety of unnecessary exports Remove several exports from the MMU that are no longer necessary. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:35 -05:00
Sean Christopherson	a1419f8b5b	KVM: x86: Fold "write-protect large" use case into generic write-protect Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:35 -05:00
Sean Christopherson	b6e16ae5d9	KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled. This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when a dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:35 -05:00
Sean Christopherson	a018eba538	KVM: x86: Move MMU's PML logic to common code Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation, bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:34 -05:00
Sean Christopherson	6dd03800b1	KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function Store the vendor-specific dirty log size in a variable, there's no need to wrap it in a function since the value is constant after hardware_setup() runs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:33 -05:00
Sean Christopherson	2855f98265	KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect() Expand the comment about need to use write-protection for nested EPT when PML is enabled to clarify that the tagging is a nop when PML is _not_ enabled. Without the clarification, omitting the PML check looks wrong at first^Wfifth glance. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:33 -05:00
Sean Christopherson	9eba50f8d7	KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs When zapping SPTEs in order to rebuild them as huge pages, use the new helper that computes the max mapping level to detect whether or not a SPTE should be zapped. Doing so avoids zapping SPTEs that can't possibly be rebuilt as huge pages, e.g. due to hardware constraints, memslot alignment, etc... This also avoids zapping SPTEs that are still large, e.g. if migration was canceled before write-protected huge pages were shattered to enable dirty logging. Note, such pages are still write-protected at this time, i.e. a page fault VM-Exit will still occur. This will hopefully be addressed in a future patch. Sadly, TDP MMU loses its const on the memslot, but that's a pervasive problem that's been around for quite some time. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:28 -05:00
Sean Christopherson	0a234f5dd0	KVM: x86/mmu: Pass the memslot to the rmap callbacks Pass the memslot to the rmap callbacks, it will be used when zapping collapsible SPTEs to verify the memslot is compatible with hugepages before zapping its SPTEs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:12 -05:00
Sean Christopherson	1b6d9d9ed5	KVM: x86/mmu: Split out max mapping level calculation to helper Factor out the logic for determining the maximum mapping level given a memslot and a gpa. The helper will be used when zapping collapsible SPTEs when disabling dirty logging, e.g. to avoid zapping SPTEs that can't possibly be rebuilt as hugepages. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:08:11 -05:00
Sean Christopherson	c060c72ffe	KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages Zap SPTEs that are backed by ZONE_DEVICE pages when zappings SPTEs to rebuild them as huge pages in the TDP MMU. ZONE_DEVICE huge pages are managed differently than "regular" pages and are not compound pages. Likewise, PageTransCompoundMap() will not detect HugeTLB, so switch to PageCompound(). This matches the similar check in kvm_mmu_zap_collapsible_spte. Cc: Ben Gardon <bgardon@google.com> Fixes: `1488199856` ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-19 03:07:16 -05:00
Maciej S. Szmigiero	8f5c44f953	KVM: x86/mmu: Make HVA handler retpoline-friendly When retpolines are enabled they have high overhead in the inner loop inside kvm_handle_hva_range() that iterates over the provided memory area. Let's mark this function and its TDP MMU equivalent __always_inline so compiler will be able to change the call to the actual handler function inside each of them into a direct one. This significantly improves performance on the unmap test on the existing kernel memslot code (tested on a Xeon 8167M machine): 30 slots in use: Test Before After Improvement Unmap 0.0353s 0.0334s 5% Unmap 2M 0.00104s 0.000407s 61% 509 slots in use: Test Before After Improvement Unmap 0.0742s 0.0740s None Unmap 2M 0.00221s 0.00159s 28% Looks like having an indirect call in these functions (and, so, a retpoline) might have interfered with unrolling of the whole loop in the CPU. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-09 08:42:09 -05:00
Paolo Bonzini	897218ff7c	KVM: x86: compile out TDP MMU on 32-bit systems The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs. Rather than just disabling it, compile it out completely so that it is possible to use for example 64-bit xchg. To limit the number of stubs, wrap all accesses to tdp_mmu_enabled or tdp_mmu_page with a function. Calls to all other functions in tdp_mmu.c are eliminated and do not even reach the linker. Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 14:49:01 -05:00
Sean Christopherson	6f8e65a601	KVM: x86/mmu: Add helper to generate mask of reserved HPA bits Add a helper to generate the mask of reserved PA bits in the host. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:29 -05:00
Sean Christopherson	5b7f575ccd	KVM: x86: Use reserved_gpa_bits to calculate reserved PxE bits Use reserved_gpa_bits, which accounts for exceptions to the maxphyaddr rule, e.g. SEV's C-bit, for the page {table,directory,etc...} entry (PxE) reserved bits checks. For SEV, the C-bit is ignored by hardware when walking pages tables, e.g. the APM states: Note that while the guest may choose to set the C-bit explicitly on instruction pages and page table addresses, the value of this bit is a don't-care in such situations as hardware always performs these as private accesses. Such behavior is expected to hold true for other features that repurpose GPA bits, e.g. KVM could theoretically emulate SME or MKTME, which both allow non-zero repurposed bits in the page tables. Conceptually, KVM should apply reserved GPA checks universally, and any features that do not adhere to the basic rule should be explicitly handled, i.e. if a GPA bit is repurposed but not allowed in page tables for whatever reason. Refactor __reset_rsvds_bits_mask() to take the pre-generated reserved bits mask, and opportunistically clean up its code, e.g. to align lines and comments. Practically speaking, this is change is a likely a glorified nop given the current KVM code base. SEV's C-bit is the only repurposed GPA bit, and KVM doesn't support shadowing encrypted page tables (which is theoretically possible via SEV debug APIs). Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 09:27:29 -05:00
Ben Gardon	a2855afc7e	KVM: x86/mmu: Allow parallel page faults for the TDP MMU Make the last few changes necessary to enable the TDP MMU to handle page faults in parallel while holding the mmu_lock in read mode. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-24-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:45 -05:00
Ben Gardon	e25f0e0cd5	KVM: x86/mmu: Mark SPTEs in disconnected pages as removed When clearing TDP MMU pages what have been disconnected from the paging structure root, set the SPTEs to a special non-present value which will not be overwritten by other threads. This is needed to prevent races in which a thread is clearing a disconnected page table, but another thread has already acquired a pointer to that memory and installs a mapping in an already cleared entry. This can lead to memory leaks and accounting errors. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-23-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:45 -05:00
Ben Gardon	08f07c800e	KVM: x86/mmu: Flush TLBs after zap in TDP MMU PF handler When the TDP MMU is allowed to handle page faults in parallel there is the possiblity of a race where an SPTE is cleared and then imediately replaced with a present SPTE pointing to a different PFN, before the TLBs can be flushed. This race would violate architectural specs. Ensure that the TLBs are flushed properly before other threads are allowed to install any present value for the SPTE. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-22-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:44 -05:00
Ben Gardon	9a77daacc8	KVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map To prepare for handling page faults in parallel, change the TDP MMU page fault handler to use atomic operations to set SPTEs so that changes are not lost if multiple threads attempt to modify the same SPTE. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-21-bgardon@google.com> [Document new locking rules. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:44 -05:00
Ben Gardon	a9442f5941	KVM: x86/mmu: Factor out functions to add/remove TDP MMU pages Move the work of adding and removing TDP MMU pages to/from "secondary" data structures to helper functions. These functions will be built on in future commits to enable MMU operations to proceed (mostly) in parallel. No functional change expected. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-20-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:44 -05:00
Ben Gardon	531810caa9	KVM: x86/mmu: Use an rwlock for the x86 MMU Add a read / write lock to be used in place of the MMU spinlock on x86. The rwlock will enable the TDP MMU to handle page faults, and other operations in parallel in future commits. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-19-bgardon@google.com> [Introduce virt/kvm/mmu_lock.h - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:43 -05:00
Ben Gardon	7cca2d0b7e	KVM: x86/mmu: Protect TDP MMU page table memory with RCU In order to enable concurrent modifications to the paging structures in the TDP MMU, threads must be able to safely remove pages of page table memory while other threads are traversing the same memory. To ensure threads do not access PT memory after it is freed, protect PT memory with RCU. Protecting concurrent accesses to page table memory from use-after-free bugs could also have been acomplished using walk_shadow_page_lockless_begin/end() and READING_SHADOW_PAGE_TABLES, coupling with the barriers in a TLB flush. The use of RCU for this case has several distinct advantages over that approach. 1. Disabling interrupts for long running operations is not desirable. Future commits will allow operations besides page faults to operate without the exclusive protection of the MMU lock and those operations are too long to disable iterrupts for their duration. 2. The use of RCU here avoids long blocking / spinning operations in perfromance critical paths. By freeing memory with an asynchronous RCU API we avoid the longer wait times TLB flushes experience when overlapping with a thread in walk_shadow_page_lockless_begin/end(). 3. RCU provides a separation of concerns when removing memory from the paging structure. Because the RCU callback to free memory can be scheduled immediately after a TLB flush, there's no need for the thread to manually free a queue of pages later, as commit_zap_pages does. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Reviewed-by: Peter Feiner <pfeiner@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-18-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:42 -05:00
Ben Gardon	f1b3b06a05	KVM: x86/mmu: Clear dirtied pages mask bit before early break In clear_dirty_pt_masked, the loop is intended to exit early after processing each of the GFNs with corresponding bits set in mask. This does not work as intended if another thread has already cleared the dirty bit or writable bit on the SPTE. In that case, the loop would proceed to the next iteration early and the bit in mask would not be cleared. As a result the loop could not exit early and would proceed uselessly. Move the unsetting of the mask bit before the check for a no-op SPTE change. Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-17-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:41 -05:00
Ben Gardon	0f99ee2c7a	KVM: x86/mmu: Skip no-op changes in TDP MMU functions Skip setting SPTEs if no change is expected. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-16-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:41 -05:00
Ben Gardon	1af4a96025	KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed Given certain conditions, some TDP MMU functions may not yield reliably / frequently enough. For example, if a paging structure was very large but had few, if any writable entries, wrprot_gfn_range could traverse many entries before finding a writable entry and yielding because the check for yielding only happens after an SPTE is modified. Fix this issue by moving the yield to the beginning of the loop. Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-15-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:41 -05:00
Ben Gardon	ed5e484b79	KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter In some functions the TDP iter risks not making forward progress if two threads livelock yielding to one another. This is possible if two threads are trying to execute wrprot_gfn_range. Each could write protect an entry and then yield. This would reset the tdp_iter's walk over the paging structure and the loop would end up repeating the same entry over and over, preventing either thread from making forward progress. Fix this issue by only yielding if the loop has made forward progress since the last yield. Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-14-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:40 -05:00
Ben Gardon	74953d3530	KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn The goal_gfn field in tdp_iter can be misleading as it implies that it is the iterator's final goal. It is really a target for the lowest gfn mapped by the leaf level SPTE the iterator will traverse towards. Change the field's name to be more precise. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-13-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:40 -05:00
Ben Gardon	e139a34ef9	KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched The flushing and non-flushing variants of tdp_mmu_iter_cond_resched have almost identical implementations. Merge the two functions and add a flush parameter. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-12-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:40 -05:00
Ben Gardon	8d1a182ea7	KVM: x86/mmu: Fix braces in kvm_recover_nx_lpages No functional change intended. Fixes: `29cf0f5007` ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-10-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:39 -05:00
Ben Gardon	a066e61f13	KVM: x86/mmu: Factor out handling of removed page tables Factor out the code to handle a disconnected subtree of the TDP paging structure from the code to handle the change to an individual SPTE. Future commits will build on this to allow asynchronous page freeing. No functional change intended. Reviewed-by: Peter Feiner <pfeiner@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:39 -05:00
Ben Gardon	734e45b329	KVM: x86/mmu: Don't redundantly clear TDP MMU pt memory The KVM MMU caches already guarantee that shadow page table memory will be zeroed, so there is no reason to re-zero the page in the TDP MMU page fault handler. No functional change intended. Reviewed-by: Peter Feiner <pfeiner@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:39 -05:00
Ben Gardon	3a9a4aa565	KVM: x86/mmu: Add lockdep when setting a TDP MMU SPTE Add lockdep to __tdp_mmu_set_spte to ensure that SPTEs are only modified under the MMU lock. No functional change intended. Reviewed-by: Peter Feiner <pfeiner@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-4-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:38 -05:00
Ben Gardon	fe43fa2f40	KVM: x86/mmu: Add comment on __tdp_mmu_set_spte __tdp_mmu_set_spte is a very important function in the TDP MMU which already accepts several arguments and will take more in future commits. To offset this complexity, add a comment to the function describing each of the arguemnts. No functional change intended. Reviewed-by: Peter Feiner <pfeiner@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:38 -05:00
Ben Gardon	e28a436ca4	KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched Currently the TDP MMU yield / cond_resched functions either return nothing or return true if the TLBs were not flushed. These are confusing semantics, especially when making control flow decisions in calling functions. To clean things up, change both functions to have the same return value semantics as cond_resched: true if the thread yielded, false if it did not. If the function yielded in the _flush_ version, then the TLBs will have been flushed. Reviewed-by: Peter Feiner <pfeiner@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:38 -05:00
Stephen Zhang	805a0f8390	KVM: x86/mmu: Add '__func__' in rmap_printk() Given the common pattern: rmap_printk("%s:"..., __func__,...) we could improve this by adding '__func__' in rmap_printk(). Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com> Message-Id: <1611713325-3591-1-git-send-email-stephenzhangzsd@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:36 -05:00
Jason Baron	b3646477d4	KVM: x86: use static calls to reduce kvm_x86_ops overhead Convert kvm_x86_ops to use static calls. Note that all kvm_x86_ops are covered here except for 'pmu_ops and 'nested ops'. Here are some numbers running cpuid in a loop of 1 million calls averaged over 5 runs, measured in the vm (lower is better). Intel Xeon 3000MHz: \|default \|mitigations=off ------------------------------------- vanilla \|.671s \|.486s static call\|.573s(-15%)\|.458s(-6%) AMD EPYC 2500MHz: \|default \|mitigations=off ------------------------------------- vanilla \|.710s \|.609s static call\|.664s(-6%) \|.609s(0%) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <e057bf1b8a7ad15652df6eeba3f907ae758d3399.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:30 -05:00
Cun Li	6e4e3b4df4	KVM: Stop using deprecated jump label APIs The use of 'struct static_key' and 'static_key_false' is deprecated. Use the new API. Signed-off-by: Cun Li <cun.jia.li@gmail.com> Message-Id: <20210111152435.50275-1-cun.jia.li@gmail.com> [Make it compile. While at it, rename kvm_no_apic_vcpu to kvm_has_noapic_vcpu; the former reads too much like "true if no vCPU has an APIC". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:29 -05:00
Sean Christopherson	c5e2184d15	KVM: x86/mmu: Remove the defunct update_pte() paging hook Remove the update_pte() shadow paging logic, which was obsoleted by commit `4731d4c7a0` ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will. The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped. Reported-by: Yu Zhang <yu.c.zhang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210115004051.4099250-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:17 -05:00
Sean Christopherson	8fc517267f	KVM: x86: Zap the oldest MMU pages, not the newest Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages(). The list is FIFO, meaning new pages are inserted at the head and thus the oldest pages are at the tail. Using a "forward" iterator causes KVM to zap MMU pages that were just added, which obliterates guest performance once the max number of shadow MMU pages is reached. Fixes: `6b82ef2c9c` ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages") Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210113205030.3481307-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:15 -05:00
Sean Christopherson	15e6a7e532	KVM: x86/mmu: Use boolean returns for (S)PTE accessors Return a 'bool' instead of an 'int' for various PTE accessors that are boolean in nature, e.g. is_shadow_present_pte(). Returning an int is goofy and potentially dangerous, e.g. if a flag being checked is moved into the upper 32 bits of a SPTE, then the compiler may silently squash the entire check since casting to an int is guaranteed to yield a return value of '0'. Opportunistically refactor is_last_spte() so that it naturally returns a bool value instead of letting it implicitly cast 0/1 to false/true. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123003003.3137525-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 05:27:15 -05:00
Ben Gardon	87aa9ec939	KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs There is a bug in the TDP MMU function to zap SPTEs which could be replaced with a larger mapping which prevents the function from doing anything. Fix this by correctly zapping the last level SPTEs. Cc: stable@vger.kernel.org Fixes: `1488199856` ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-04 04:38:53 -05:00
Ben Gardon	a889ea54b3	KVM: x86/mmu: Ensure TDP MMU roots are freed after yield Many TDP MMU functions which need to perform some action on all TDP MMU roots hold a reference on that root so that they can safely drop the MMU lock in order to yield to other threads. However, when releasing the reference on the root, there is a bug: the root will not be freed even if its reference count (root_count) is reduced to 0. To simplify acquiring and releasing references on TDP MMU root pages, and to ensure that these roots are properly freed, move the get/put operations into another TDP MMU root iterator macro. Moving the get/put operations into an iterator macro also helps simplify control flow when a root does need to be freed. Note that using the list_for_each_entry_safe macro would not have been appropriate in this situation because it could keep a pointer to the next root across an MMU lock release + reacquire, during which time that root could be freed. Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: `faaf05b00a` ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: `063afacd87` ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU") Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Fixes: `1488199856` ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210107001935.3732070-1-bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:11:31 -05:00
Paolo Bonzini	bc351f0726	Merge branch 'kvm-master' into kvm-next Fixes to get_mmio_spte, destined to 5.10 stable branch.	2021-01-07 18:06:52 -05:00
Sean Christopherson	9aa418792f	KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte() Check only the terminal leaf for a "!PRESENT \|\| MMIO" SPTE when looking for reserved bits on valid, non-MMIO SPTEs. The get_walk() helpers terminate their walks if a not-present or MMIO SPTE is encountered, i.e. the non-terminal SPTEs have already been verified to be regular SPTEs. This eliminates an extra check-and-branch in a relatively hot loop. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:27 -05:00
Sean Christopherson	dde81f9477	KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array Bump the size of the sptes array by one and use the raw level of the SPTE to index into the sptes array. Using the SPTE level directly improves readability by eliminating the need to reason out why the level is being adjusted when indexing the array. The array is on the stack and is not explicitly initialized; bumping its size is nothing more than a superficial adjustment to the stack frame. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-4-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:26 -05:00
Sean Christopherson	39b4d43e60	KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE Get the so called "root" level from the low level shadow page table walkers instead of manually attempting to calculate it higher up the stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging, the starting level of the walk, from the callers perspective, is not the CR3 root but rather the PDPTR "root". Checking for reserved bits from the CR3 root causes get_mmio_spte() to consume uninitialized stack data due to indexing into sptes[] for a level that was not filled by get_walk(). This can result in false positives and/or negatives depending on what garbage happens to be on the stack. Opportunistically nuke a few extra newlines. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Reported-by: Richard Herbert <rherbert@sympatico.ca> Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:24 -05:00
Sean Christopherson	2aa078932f	KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() Return -1 from the get_walk() helpers if the shadow walk doesn't fill at least one spte, which can theoretically happen if the walk hits a not-present PDPTR. Returning the root level in such a case will cause get_mmio_spte() to return garbage (uninitialized stack data). In practice, such a scenario should be impossible as KVM shouldn't get a reserved-bit page fault with a not-present PDPTR. Note, using mmu->root_level in get_walk() is wrong for other reasons, too, but that's now a moot point. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-07 18:00:23 -05:00
Maciej S. Szmigiero	34c0f6f269	KVM: mmu: Fix SPTE encoding of MMIO generation upper half Commit `cae7ed3c2c` ("KVM: x86: Refactor the MMIO SPTE generation handling") cleaned up the computation of MMIO generation SPTE masks, however it introduced a bug how the upper part was encoded: SPTE bits 52-61 were supposed to contain bits 10-19 of the current generation number, however a missing shift encoded bits 1-10 there instead (mostly duplicating the lower part of the encoded generation number that then consisted of bits 1-9). In the meantime, the upper part was shrunk by one bit and moved by subsequent commits to become an upper half of the encoded generation number (bits 9-17 of bits 0-17 encoded in a SPTE). In addition to the above, commit `56871d444b` ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation") has changed the SPTE bit range assigned to encode the generation number and the total number of bits encoded but did not update them in the comment attached to their defines, nor in the KVM MMU doc. Let's do it here, too, since it is too trivial thing to warrant a separate commit. Fixes: `cae7ed3c2c` ("KVM: x86: Refactor the MMIO SPTE generation handling") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com> Cc: stable@vger.kernel.org [Reorganize macros so that everything is computed from the bit ranges. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-11 19:18:43 -05:00
Rick Edgecombe	339f5a7fb2	kvm: x86/mmu: Use cpuid to determine max gfn In the TDP MMU, use shadow_phys_bits to dermine the maximum possible GFN mapped in the guest for zapping operations. boot_cpu_data.x86_phys_bits may be reduced in the case of HW features that steal HPA bits for other purposes. However, this doesn't necessarily reduce GPA space that can be accessed via TDP. So zap based on a maximum gfn calculated with MAXPHYADDR retrieved from CPUID. This is already stored in shadow_phys_bits, so use it instead of x86_phys_bits. Fixes: `faaf05b00a` ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-Id: <20201203231120.27307-1-rick.p.edgecombe@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-04 03:48:33 -05:00
Vitaly Kuznetsov	9a2a0d3ca1	kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT Commit `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused the following WARNING on an Intel Ice Lake CPU: get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy: ------ spte 0xb80a0 level 5. ------ spte 0xfcd210107 level 4. ------ spte 0x1004c40107 level 3. ------ spte 0x1004c41107 level 2. ------ spte 0x1db00000000b83b6 level 1. WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm] ... Call Trace: ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm] vcpu_enter_guest+0xaa1/0x16a0 [kvm] ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel] ? skip_emulated_instruction+0xaa/0x150 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm] The guest triggering this crashes. Note, this happens with the traditional MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out, there was a subtle change in the above mentioned commit. Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level' which is returned by shadow_walk_init() and this equals to 'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to 'int root = vcpu->arch.mmu->root_level'. The difference between 'root_level' and 'shadow_root_level' on CPUs supporting 5-level page tables is that in some case we don't want to use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48' kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used, the corresponding SPTE will fail '__is_rsvd_bits_set()' check. Revert to using 'shadow_root_level'. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-11-27 11:14:27 -05:00

1 2 3 4 5 ...

252 Commits