korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-05 21:35:04 +08:00

Author	SHA1	Message	Date
Sean Christopherson	03ca4589fa	KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging Disallow loading KVM SVM if 5-level paging is supported. In theory, NPT for L1 should simply work, but there are unknowns with respect to how the guest's MAXPHYADDR will be handled by hardware. Nested NPT is more problematic, as running an L1 VMM that is using 2-level page tables requires stacking single-entry PDP and PML4 tables in KVM's NPT for L2, as there are no equivalent entries in L1's NPT to shadow. Barring hardware magic, for 5-level paging, KVM would need to stack another layer to handle PML5. Opportunistically rename the lm_root pointer, which is used for the aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to call out that it's specifically for PML4. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210505204221.1934471-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-07 06:06:21 -04:00
Kai Huang	7f6231a391	KVM: x86/mmu: Fix kdoc of __handle_changed_spte The function name of kdoc of __handle_changed_spte() should be itself, rather than handle_changed_spte(). Fix the typo. Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <20210503042446.154695-1-kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-03 11:25:39 -04:00
Shahin, Md Shahadat Hossain	1699f65c8b	kvm/x86: Fix 'lpages' kvm stat for TDP MMU Large pages not being created properly may result in increased memory access time. The 'lpages' kvm stat used to keep track of the current number of large pages in the system, but with TDP MMU enabled the stat is not showing the correct number. This patch extends the lpages counter to cover the TDP case. Signed-off-by: Md Shahadat Hossain Shahin <shahinmd@amazon.de> Cc: Bartosz Szczepanek <bsz@amazon.de> Message-Id: <1619783551459.35424@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-03 11:25:34 -04:00
Kai Huang	ff76d50603	KVM: x86/mmu: Avoid unnecessary page table allocation in kvm_tdp_mmu_map() In kvm_tdp_mmu_map(), while iterating over TDP MMU page table entries, an SPTE may already have been frozen by another thread that has not yet finished its update, for instance when that thread is still in the middle of zapping a large page. In this case, the !is_shadow_present_pte() check on the old SPTE in tdp_mmu_for_each_pte() may evaluate to true, yet allocating a new page table is unnecessary because tdp_mmu_set_spte_atomic() will later return false and the page table will need to be freed. Add an is_removed_spte() check before allocating a new page table to avoid this. Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <20210429041226.50279-1-kai.huang@intel.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-03 11:25:33 -04:00
Linus Torvalds	152d32aa84	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disable dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...	2021-05-01 10:14:08 -07:00
Linus Torvalds	ea5bc7b977	Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place" * tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove me from IDE/ATAPI section x86/pat: Do not compile stubbed functions when X86_PAT is off x86/asm: Ensure asm/proto.h can be included stand-alone x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files x86/msr: Make locally used functions static x86/cacheinfo: Remove unneeded dead-store initialization x86/process/64: Move cpu_current_top_of_stack out of TSS tools/turbostat: Unmark non-kernel-doc comment x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL() x86/fpu/math-emu: Fix function cast warning x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes x86: Fix various typos in comments, take #2 x86: Remove unusual Unicode characters from comments x86/kaslr: Return boolean values from a function returning bool x86: Fix various typos in comments x86/setup: Remove unused RESERVE_BRK_ARRAY() stacktrace: Move documentation for arch_stack_walk_reliable() to header x86: Remove duplicate TSC DEADLINE MSR definitions	2021-04-26 09:25:47 -07:00
Ben Gardon	4c6654bd16	KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns To avoid saddling a vCPU thread with the work of tearing down an entire paging structure, take a reference on each root before they become obsolete, so that the thread initiating the fast invalidation can tear down the paging structure and (most likely) release the last reference. As a bonus, this teardown can happen under the MMU lock in read mode so as not to block the progress of vCPU threads. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-14-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 18:04:45 -04:00
Ben Gardon	b7cccd397f	KVM: x86/mmu: Fast invalidation for TDP MMU Provide a real mechanism for fast invalidation by marking roots as invalid so that their reference count will quickly fall to zero and they will be torn down. One negative side effect of this approach is that a vCPU thread will likely drop the last reference to a root and be saddled with the work of tearing down an entire paging structure. This issue will be resolved in a later commit. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-13-bgardon@google.com> [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 18:04:35 -04:00
Ben Gardon	24ae4cfaaa	KVM: x86/mmu: Allow enabling/disabling dirty logging under MMU read lock To reduce lock contention and interference with page fault handlers, allow the TDP MMU functions which enable and disable dirty logging to operate under the MMU read lock. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-12-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:06:04 -04:00
Ben Gardon	2db6f772b5	KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU read lock To reduce the impact of disabling dirty logging, change the TDP MMU function which zaps collapsible SPTEs to run under the MMU read lock. This way, page faults on zapped SPTEs can proceed in parallel with kvm_mmu_zap_collapsible_sptes. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:06:04 -04:00
Ben Gardon	6103bc0740	KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock To reduce lock contention and interference with page fault handlers, allow the TDP MMU function to zap a GFN range to operate under the MMU read lock. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-10-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:06:04 -04:00
Ben Gardon	c0e64238ac	KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU Protect the contents of the TDP MMU roots list with RCU in preparation for a future patch which will allow the iterator macro to be used under the MMU lock in read mode. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-9-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:06:01 -04:00
Ben Gardon	fb10129335	KVM: x86/mmu: handle cmpxchg failure in kvm_tdp_mmu_get_root To reduce dependence on the MMU write lock, don't rely on the assumption that the atomic operation in kvm_tdp_mmu_get_root will always succeed. By not relying on that assumption, threads do not need to hold the MMU lock in write mode in order to take a reference on a TDP MMU root. In the root iterator, this change means that some roots might have to be skipped if they are found to have a zero refcount. This will still never happen as of this patch, but a future patch will need that flexibility to make the root iterator safe under the MMU read lock. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-8-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:25 -04:00
Ben Gardon	11cccf5c04	KVM: x86/mmu: Make TDP MMU root refcount atomic In order to parallelize more operations for the TDP MMU, make the refcount on TDP MMU roots atomic, so that a future patch can allow multiple threads to take a reference on the root concurrently, while holding the MMU lock in read mode. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-7-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:25 -04:00
Ben Gardon	cfc109979b	KVM: x86/mmu: Refactor yield safe root iterator Refactor the yield safe TDP MMU root iterator to be more amenable to changes in future commits which will allow it to be used under the MMU lock in read mode. Currently the iterator requires a complicated dance between the helper functions and different parts of the for loop which makes it hard to reason about. Moving all the logic into a single function simplifies the iterator substantially. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:24 -04:00
Ben Gardon	2bdb3d84ce	KVM: x86/mmu: Merge TDP MMU put and free root kvm_tdp_mmu_put_root and kvm_tdp_mmu_free_root are always called together, so merge the functions to simplify TDP MMU root refcounting / freeing. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:24 -04:00
Ben Gardon	4bba36d72b	KVM: x86/mmu: use tdp_mmu_free_sp to free roots Minor cleanup to deduplicate the code used to free a struct kvm_mmu_page in the TDP MMU. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-4-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:24 -04:00
Ben Gardon	76eb54e7e7	KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU The TDP MMU is almost the only user of kvm_mmu_get_root and kvm_mmu_put_root. There is only one use of put_root in mmu.c for the legacy / shadow MMU. Open code that one use and move the get / put functions to the TDP MMU so they can be extended in future commits. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:24 -04:00
Ben Gardon	8ca6f063b7	KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes kvm_tdp_mmu_zap_collapsible_sptes unnecessarily removes the const qualifier from its memslot argument, leading to a compiler warning. Add the const annotation and pass it to subsequent functions. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:23 -04:00
Sean Christopherson	e1eed5847b	KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible Let the TDP MMU yield when unmapping a range in response to a MMU notification, if yielding is allowed by said notification. There is no reason to disallow yielding in this case, and in theory the range being invalidated could be quite large. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-19 09:05:23 -04:00
Sean Christopherson	3039bcc744	KVM: Move x86's MMU notifier memslot walkers to generic code Move the hva->gfn lookup for MMU notifiers into common code. Every arch does a similar lookup, and some arch code is all but identical across multiple architectures. In addition to consolidating code, this will allow introducing optimizations that will benefit all architectures without incurring multiple walks of the memslots, e.g. by taking mmu_lock if and only if a relevant range exists in the memslots. The use of __always_inline to avoid indirect call retpolines, as done by x86, may also benefit other architectures. Consolidating the lookups also fixes a wart in x86, where the legacy MMU and TDP MMU each do their own memslot walks. Lastly, future enhancements to the memslot implementation, e.g. to add an interval tree to track host address, will need to touch far less arch specific code. MIPS, PPC, and arm64 will be converted one at a time in future patches. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:06 -04:00
Paolo Bonzini	6c9dd6d262	KVM: constify kvm_arch_flush_remote_tlbs_memslot memslots are stored in RCU and there should be no need to change them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:04 -04:00
Paolo Bonzini	dbb6964e4c	KVM: MMU: protect TDP MMU pages only down to required level When using manual protection of dirty pages, it is not necessary to protect nested page tables down to the 4K level; instead KVM can protect only hugepages in order to split them lazily, and delay write protection at 4K-granularity until KVM_CLEAR_DIRTY_LOG. This was overlooked in the TDP MMU, so do it there as well. Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:04 -04:00
Sean Christopherson	8f8f52a45d	KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMU Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs, as opposed to the fancy ffs()+clear_bit() logic that was copied from the legacy MMU. The legacy MMU uses clear_bit() because it is operating on the SPTE itself, i.e. clearing needs to be atomic. The TDP MMU operates on a local variable that it later writes to the SPTE, and so doesn't need to be atomic or even resident in memory. Opportunistically drop unnecessary initialization of new_spte, it's guaranteed to be written before being accessed. Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from: 0x0000000000058be6 <+134>: test %rax,%rax 0x0000000000058be9 <+137>: je 0x58bf4 <age_gfn_range+148> 0x0000000000058beb <+139>: test %rax,%rdi 0x0000000000058bee <+142>: je 0x58cdc <age_gfn_range+380> 0x0000000000058bf4 <+148>: mov %rdi,0x8(%rsp) 0x0000000000058bf9 <+153>: mov $0xffffffff,%edx 0x0000000000058bfe <+158>: bsf %eax,%edx 0x0000000000058c01 <+161>: movslq %edx,%rdx 0x0000000000058c04 <+164>: lock btr %rdx,0x8(%rsp) 0x0000000000058c0b <+171>: mov 0x8(%rsp),%r15 to: 0x0000000000058bdd <+125>: test %rax,%rax 0x0000000000058be0 <+128>: je 0x58beb <age_gfn_range+139> 0x0000000000058be2 <+130>: test %rax,%r8 0x0000000000058be5 <+133>: je 0x58cc0 <age_gfn_range+352> 0x0000000000058beb <+139>: not %rax 0x0000000000058bee <+142>: and %r8,%rax 0x0000000000058bf1 <+145>: mov %rax,%r15 thus eliminating several memory accesses, including a locked access. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331004942.2444916-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:57 -04:00
Sean Christopherson	6d9aafb96d	KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTE Don't clear the dirty bit when aging a TDP MMU SPTE (in response to a MMU notifier event). Prematurely clearing the dirty bit could cause spurious PML updates if aging a page happened to coincide with dirty logging. Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(), so the host PFN will be marked dirty, i.e. there is no potential for data corruption. Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331004942.2444916-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:56 -04:00
Sean Christopherson	6dfbd6b5d5	KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint Remove x86's trace_kvm_age_page() tracepoint. It's mostly redundant with the common trace_kvm_age_hva() tracepoint, and if there is a need for the extra details, e.g. gfn, referenced, etc... those details should be added to the common tracepoint so that all architectures and MMUs benefit from the info. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:56 -04:00
Sean Christopherson	aaaac889cf	KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE Use the leaf-only TDP iterator when changing the SPTE in reaction to a MMU notifier. Practically speaking, this is a nop since the guts of the loop explicitly looks for 4k SPTEs, which are always leaf SPTEs. Switch the iterator to match age_gfn_range() and test_age_gfn() so that a future patch can consolidate the core iterating logic. No real functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:55 -04:00
Sean Christopherson	a3f15bda46	KVM: x86/mmu: Pass address space ID to TDP MMU root walkers Move the address space ID check that is performed when iterating over roots into the macro helpers to consolidate code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:55 -04:00
Sean Christopherson	2b9663d8a1	KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range() Pass the address space ID to TDP MMU's primary "zap gfn range" helper to allow the MMU notifier paths to iterate over memslots exactly once. Currently, both the legacy MMU and TDP MMU iterate over memslots when looking for an overlapping hva range, which can be quite costly if there are a large number of memslots. Add a "flush" parameter so that iterating over multiple address spaces in the caller will continue to do the right thing when yielding while a flush is pending from a previous address space. Note, this also has a functional change in the form of coalescing TLB flushes across multiple address spaces in kvm_zap_gfn_range(), and also optimizes the TDP MMU to utilize range-based flushing when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-6-seanjc@google.com> [Keep separate for loops to prepare for other incoming patches. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:54 -04:00
Sean Christopherson	1a61b7db7a	KVM: x86/mmu: Coalesce TLB flushes across address spaces for gfn range zap Gather pending TLB flushes across both address spaces when zapping a given gfn range. This requires feeding "flush" back into subsequent calls, but on the plus side sets the stage for further batching between the legacy MMU and TDP MMU. It also allows refactoring the address space iteration to cover the legacy and TDP MMUs without introducing truly ugly code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:54 -04:00
Sean Christopherson	142ccde1f7	KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs Gather pending TLB flushes across both the legacy and TDP MMUs when zapping collapsible SPTEs to avoid multiple flushes if both the legacy MMU (for nested guests) and TDP MMU have mappings for the memslot. Note, this also optimizes the TDP MMU to flush only the relevant range when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:54 -04:00
Sean Christopherson	302695a574	KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMU Place the onus on the caller of slot_handle_*() to flush the TLB, rather than handling the flush in the helper, and rename parameters accordingly. This will allow future patches to coalesce flushes between address spaces and between the legacy and TDP MMUs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:53 -04:00
Sean Christopherson	af95b53e56	KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs When zapping collapsible SPTEs across multiple roots, gather pending flushes and perform a single remote TLB flush at the end, as opposed to flushing after processing every root. Note, flush may be cleared by the result of zap_collapsible_spte_range(). This is intended and correct, e.g. yielding may have serviced a prior pending flush. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:53 -04:00
Paolo Bonzini	4a38162ee9	KVM: MMU: load PDPTRs outside mmu_lock On SVM, reading PDPTRs might access guest memory, which might fault and thus might sleep. On the other hand, it is not possible to release the lock after make_mmu_pages_available has been called. Therefore, push the call to make_mmu_pages_available and the mmu_lock critical section within mmu_alloc_direct_roots and mmu_alloc_shadow_roots. Reported-by: Wanpeng Li <wanpengli@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:52 -04:00
Paolo Bonzini	315f02c60d	KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp Right now, if a call to kvm_tdp_mmu_zap_sp returns false, the caller will skip the TLB flush, which is wrong. There are two ways to fix it: - since kvm_tdp_mmu_zap_sp will not yield and therefore will not flush the TLB itself, we could change the call to kvm_tdp_mmu_zap_sp to use "flush |= ..." - or we can chain the flush argument through kvm_tdp_mmu_zap_sp down to __kvm_tdp_mmu_zap_gfn_range. Note that kvm_tdp_mmu_zap_sp will neither yield nor flush, so flush would never go from true to false. This patch does the former to simplify application to stable kernels, and to make it even clearer that kvm_tdp_mmu_zap_sp will not flush. Cc: seanjc@google.com Fixes: `048f49809c` ("KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping") Cc: <stable@vger.kernel.org> # 5.10.x: `048f49809c`: KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping Cc: <stable@vger.kernel.org> # 5.10.x: `33a3164161`: KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages Cc: <stable@vger.kernel.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-08 07:48:18 -04:00
Paolo Bonzini	657f1d86a3	Merge branch 'kvm-tdp-fix-rcu' into HEAD	2021-04-02 07:25:32 -04:00
Paolo Bonzini	57e45ea487	Merge branch 'kvm-tdp-fix-flushes' into HEAD	2021-04-02 07:24:54 -04:00
Paolo Bonzini	825e34d3c9	Merge commit 'kvm-tdp-fix-flushes' into kvm-master	2021-03-31 07:45:41 -04:00
Sean Christopherson	33a3164161	KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages Prevent the TDP MMU from yielding when zapping a gfn range during NX page recovery. If a flush is pending from a previous invocation of the zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU has not accumulated a flush for the current invocation, then yielding will release mmu_lock with stale TLB entries. That being said, this isn't technically a bug fix in the current code, as the TDP MMU will never yield in this case. tdp_mmu_iter_cond_resched() will yield if and only if it has made forward progress, as defined by the current gfn vs. the last yielded (or starting) gfn. Because zapping a single shadow page is guaranteed to (a) find that page and (b) step sideways at the level of the shadow page, the TDP iter will break its loop before getting a chance to yield. But that is all very, very subtle, and will break at the slightest sneeze, e.g. zapping while holding mmu_lock for read would break as the TDP MMU wouldn't be guaranteed to see the present shadow page, and thus could step sideways at a lower level. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-4-seanjc@google.com> [Add lockdep assertion. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:19:56 -04:00
Sean Christopherson	048f49809c	KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which does the flush itself if and only if it yields (which it will never do in this particular scenario), and otherwise expects the caller to do the flush. If pages are zapped from the TDP MMU but not the legacy MMU, then no flush will occur. Fixes: `29cf0f5007` ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-3-seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:19:55 -04:00
Sean Christopherson	a835429cda	KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap When flushing a range of GFNs across multiple roots, ensure any pending flush from a previous root is honored before yielding while walking the tables of the current root. Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local "flush" with the result to avoid redundant flushes. zap_gfn_range() preserves and returns the incoming "flush", unless of course the flush was performed prior to yielding and no new flush was triggered. Fixes: `1af4a96025` ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Cc: stable@vger.kernel.org Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210325200119.1359384-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-30 13:19:55 -04:00
Ingo Molnar	ca8778c45e	Merge branch 'linus' into x86/cleanups, to resolve conflict Conflicts: arch/x86/kernel/kprobes/ftrace.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2021-03-21 22:16:08 +01:00
Ingo Molnar	d9f6e12fb0	x86: Fix various typos in comments Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org	2021-03-18 15:31:53 +01:00
Sean Christopherson	08889894cc	KVM: x86/mmu: Store the address space ID in the TDP iterator Store the address space ID in the TDP iterator so that it can be retrieved without having to bounce through the root shadow page. This streamlines the code and fixes a Sparse warning about not properly using rcu_dereference() when grabbing the ID from the root on the fly. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210315233803.2706477-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-16 14:16:34 -04:00
Ben Gardon	b601c3bc9d	KVM: x86/mmu: Factor out tdp_iter_return_to_root In tdp_mmu_iter_cond_resched there is a call to tdp_iter_start which causes the iterator to continue its walk over the paging structure from the root. This is needed after a yield as paging structure could have been freed in the interim. The tdp_iter_start call is not very clear and something of a hack. It requires exposing tdp_iter fields not used elsewhere in tdp_mmu.c and the effect is not obvious from the function name. Factor a more aptly named function out of tdp_iter_start and call it from tdp_mmu_iter_cond_resched and tdp_iter_start. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210315233803.2706477-4-bgardon@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-16 14:16:17 -04:00
Ben Gardon	14f6fec2e8	KVM: x86/mmu: Fix RCU usage when atomically zapping SPTEs Fix a missing rcu_dereference in tdp_mmu_zap_spte_atomic. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210315233803.2706477-3-bgardon@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-16 14:15:24 -04:00
Ben Gardon	70fb3e41a9	KVM: x86/mmu: Fix RCU usage in handle_removed_tdp_mmu_page The pt passed into handle_removed_tdp_mmu_page does not need RCU protection, as it is not at any risk of being freed by another thread at that point. However, the implicit cast from tdp_sptep_t to u64 * dropped the __rcu annotation without a proper rcu_dereference. Fix this by passing the pt as a tdp_ptep_t and then rcu_dereferencing it in the function. Suggested-by: Sean Christopherson <seanjc@google.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210315233803.2706477-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-16 14:14:59 -04:00
Sean Christopherson	4a98623d5d	KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging Set the PAE roots used as decrypted to play nice with SME when KVM is using shadow paging. Explicitly skip setting the C-bit when loading CR3 for PAE shadow paging, even though it's completely ignored by the CPU. The extra documentation is nice to have. Note, there are several subtleties at play with NPT. In addition to legacy shadow paging, the PAE roots are used for SVM's NPT when either KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And 64-bit KVM can happily set the C-bit in CR3. This also means that keeping __sme_set(root) for 32-bit KVM when NPT is enabled is conceptually wrong, but functionally ok since SME is 64-bit only. Leave it as is to avoid unnecessary pollution. Fixes: `d0ec49d4de` ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210309224207.1218275-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:44:08 -04:00
Sean Christopherson	c834e5e44f	KVM: x86/mmu: Use '0' as the one and only value for an invalid PAE root Use '0' to denote an invalid pae_root instead of '0' or INVALID_PAGE. Unlike root_hpa, the pae_roots hold permission bits and thus are guaranteed to be non-zero. Having to deal with both values leads to bugs, e.g. failing to set back to INVALID_PAGE, warning on the wrong value, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210309224207.1218275-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:44:07 -04:00
Sean Christopherson	bb4cdf3af9	KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE Debugging unexpected reserved bit page faults sucks. Dump the reserved bits that (likely) caused the page fault to make debugging suck a little less. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-15 04:43:55 -04:00

345 Commits