linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-21 03:33:59 +08:00

Author	SHA1	Message	Date
Jon Doron	f7d31e6536	x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit The problem the patch is trying to address is the fact that 'struct kvm_hyperv_exit' has different layout on when compiling in 32 and 64 bit modes. In 64-bit mode the default alignment boundary is 64 bits thus forcing extra gaps after 'type' and 'msr' but in 32-bit mode the boundary is at 32 bits thus no extra gaps. This is an issue as even when the kernel is 64 bit, the userspace using the interface can be both 32 and 64 bit but the same 32 bit userspace has to work with 32 bit kernel. The issue is fixed by forcing the 64 bit layout, this leads to ABI change for 32 bit builds and while we are obviously breaking '32 bit userspace with 32 bit kernel' case, we're fixing the '32 bit userspace with 64 bit kernel' one. As the interface has no (known) users and 32 bit KVM is rather baroque nowadays, this seems like a reasonable decision. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200424113746.3473563-2-arilou@gmail.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:09 -04:00
Like Xu	27461da310	KVM: x86/pmu: Support full width counting Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0) for GP counters that allows writing the full counter width. Enable this range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]). The guest would query CPUID to get the counter width, and sign extends the counter values as needed. The traditional MSRs always limit to 32bit, even though the counter internally is larger (48 or 57 bits). When the new capability is set, use the alternative range which do not have these restrictions. This lowers the overhead of perf stat slightly because it has to do less interrupts to accumulate the counter value. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:09 -04:00
Wei Wang	cbd717585b	KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in Change kvm_pmu_get_msr() to get the msr_data struct, as the host_initiated field from the struct could be used by get_msr. This also makes this API consistent with kvm_pmu_set_msr. No functional changes. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20200529074347.124619-2-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:08 -04:00
Vitaly Kuznetsov	72de5fa4c1	KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Introduce new capability to indicate that KVM supports interrupt based delivery of 'page ready' APF events. This includes support for both MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:08 -04:00
Vitaly Kuznetsov	557a961abb	KVM: x86: acknowledgment mechanism for async pf page ready notifications If two page ready notifications happen back to back the second one is not delivered and the only mechanism we currently have is kvm_check_async_pf_completion() check in vcpu_run() loop. The check will only be performed with the next vmexit when it happens and in some cases it may take a while. With interrupt based page ready notification delivery the situation is even worse: unlike exceptions, interrupts are not handled immediately so we must check if the slot is empty. This is slow and unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate the fact that the slot is free and host should check its notification queue. Mandate using it for interrupt based 'page ready' APF event delivery. As kvm_check_async_pf_completion() is going away from vcpu_run() we need a way to communicate the fact that vcpu->async_pf.done queue has transitioned from empty to non-empty state. Introduce kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:08 -04:00
Vitaly Kuznetsov	2635b5c4a0	KVM: x86: interrupt based APF 'page ready' event delivery Concerns were expressed around APF delivery via synthetic #PF exception as in some cases such delivery may collide with real page fault. For 'page ready' notifications we can easily switch to using an interrupt instead. Introduce new MSR_KVM_ASYNC_PF_INT mechanism and deprecate the legacy one. One notable difference between the two mechanisms is that interrupt may not get handled immediately so whenever we would like to deliver next event (regardless of its type) we must be sure the guest had read and cleared previous event in the slot. While on it, get rid on 'type 1/type 2' names for APF events in the documentation as they are causing confusion. Use 'page not present' and 'page ready' everywhere instead. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:07 -04:00
Vitaly Kuznetsov	0958f0cefe	KVM: introduce kvm_read_guest_offset_cached() We already have kvm_write_guest_offset_cached(), introduce read analogue. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:07 -04:00
Vitaly Kuznetsov	7c0ade6c90	KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() An innocent reader of the following x86 KVM code: bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu) { if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) return true; ... may get very confused: if APF mechanism is not enabled, why do we report that we 'can inject async page present'? In reality, upon injection kvm_arch_async_page_present() will check the same condition again and, in case APF is disabled, will just drop the item. This is fine as the guest which deliberately disabled APF doesn't expect to get any APF notifications. Rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() to make it clear what we are checking: if the item can be dequeued (meaning either injected or just dropped). On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so the rename doesn't matter much. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:07 -04:00
Vitaly Kuznetsov	68fd66f100	KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Currently, APF mechanism relies on the #PF abuse where the token is being passed through CR2. If we switch to using interrupts to deliver page-ready notifications we need a different way to pass the data. Extent the existing 'struct kvm_vcpu_pv_apf_data' with token information for page-ready notifications. While on it, rename 'reason' to 'flags'. This doesn't change the semantics as we only have reasons '1' and '2' and these can be treated as bit flags but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery making 'reason' name misleading. The newly introduced apf_put_user_ready() temporary puts both flags and token information, this will be changed to put token only when we switch to interrupt based notifications. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:06 -04:00
Vitaly Kuznetsov	84b09f33a5	Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Commit `9a6e7c3981` (""KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously") added a protection against 'page ready' notification coming before 'page not present' is delivered. This situation seems to be impossible since commit `2a266f2355` ("KVM MMU: check pending exception before injecting APF) which added 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf. On x86, kvm_arch_async_page_present() has only one call site: kvm_check_async_pf_completion() loop and we only enter the loop when kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr is enabled, translates into kvm_can_do_async_pf(). There is also one problem with the cancellation mechanism. We don't seem to check that the 'page not present' notification we're canceling matches the 'page ready' notification so in theory, we may erroneously drop two valid events. Revert the commit. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:06 -04:00
Gustavo A. R. Silva	f4a9fdd5f1	KVM: VMX: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Message-Id: <20200507185618.GA14831@embeddedor> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:05 -04:00
Paolo Bonzini	a8387d0b47	Revert "KVM: No need to retry for hva_to_pfn_remapped()" This reverts commit `5b494aea13`. If unlocked==true then the vma pointer could be invalidated, so the 2nd follow_pfn() is potentially racy: we do need to get out and redo find_vma_intersection(). Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:05 -04:00
Paolo Bonzini	cc440cdad5	KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE Similar to VMX, the state that is captured through the currently available IOCTLs is a mix of L1 and L2 state, dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. In particular, the SVM-specific state for nested virtualization includes the L1 saved state (including the interrupt flag), the cached L2 controls, and the GIF. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:05 -04:00
Vitaly Kuznetsov	8ec107c89b	selftests: kvm: fix smm test on SVM KVM_CAP_NESTED_STATE is now supported for AMD too but smm test acts like it is still Intel only. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200529130407.57176-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:04 -04:00
Paolo Bonzini	10b910cb7e	selftests: kvm: add a SVM version of state-test The test is similar to the existing one for VMX, but simpler because we don't have to test shadow VMCS or vmptrld/vmptrst/vmclear. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:04 -04:00
Vitaly Kuznetsov	ed88129733	selftests: kvm: introduce cpu_has_svm() check Many tests will want to check if the CPU is Intel or AMD in guest code, add cpu_has_svm() and put it as static inline to svm_util.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200529130407.57176-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:04 -04:00
Paolo Bonzini	929d1cfaa6	KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu This allows fetching the registers from the hsave area when setting up the NPT shadow MMU, and is needed for KVM_SET_NESTED_STATE (which runs long after the CR0, CR4 and EFER values in vcpu have been switched to hold L2 guest state). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:03 -04:00
Paolo Bonzini	c513f484c5	KVM: nSVM: leave guest mode when clearing EFER.SVME According to the AMD manual, the effect of turning off EFER.SVME while a guest is running is undefined. We make it leave guest mode immediately, similar to the effect of clearing the VMX bit in MSR_IA32_FEAT_CTL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:03 -04:00
Paolo Bonzini	ca46d739e3	KVM: nSVM: split nested_vmcb_check_controls The authoritative state does not come from the VMCB once in guest mode, but KVM_SET_NESTED_STATE can still perform checks on L1's provided SVM controls because we get them from userspace. Therefore, split out a function to do them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:03 -04:00
Paolo Bonzini	08245e6d2e	KVM: nSVM: remove HF_HIF_MASK The L1 flags can be found in the save area of svm->nested.hsave, fish it from there so that there is one fewer thing to migrate. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:02 -04:00
Paolo Bonzini	e9fd761a46	KVM: nSVM: remove HF_VINTR_MASK Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can use it instead of vcpu->arch.hflags to check whether L2 is running in V_INTR_MASKING mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:02 -04:00
Paolo Bonzini	36e2e98363	KVM: nSVM: synthesize correct EXITINTINFO on vmexit This bit was added to nested VMX right when nested_run_pending was introduced, but it is not yet there in nSVM. Since we can have pending events that L0 injected directly into L2 on vmentry, we have to transfer them into L1's queue. For this to work, one important change is required: svm_complete_interrupts (which clears the "injected" fields from the previous VMRUN, and updates them from svm->vmcb's EXITINTINFO) must be placed before we inject the vmexit. This is not too scary though; VMX even does it in vmx_vcpu_run. While at it, the nested_vmexit_inject tracepoint is moved towards the end of nested_svm_vmexit. This ensures that the synthesized EXITINTINFO is visible in the trace. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:02 -04:00
Paolo Bonzini	91b7130cb6	KVM: SVM: preserve VGIF across VMCB switch There is only one GIF flag for the whole processor, so make sure it is not clobbered when switching to L2 (in which case we also have to include the V_GIF_ENABLE_MASK, lest we confuse enable_gif/disable_gif/gif_set). When going back, L1 could in theory have entered L2 without issuing a CLGI so make sure the svm_set_gif is done last, after svm->vmcb->control.int_ctl has been copied back from hsave. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:01 -04:00
Paolo Bonzini	ffdf7f9e80	KVM: nSVM: extract svm_set_gif Extract the code that is needed to implement CLGI and STGI, so that we can run it from VMRUN and vmexit (and in the future, KVM_SET_NESTED_STATE). Skip the request for KVM_REQ_EVENT unless needed, subsuming the evaluate_pending_interrupts optimization that is found in enter_svm_guest_mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:01 -04:00
Paolo Bonzini	31031098fe	KVM: nSVM: remove unnecessary if kvm_vcpu_apicv_active must be false when nested virtualization is enabled, so there is no need to check it in clgi_interception. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:01 -04:00
Paolo Bonzini	2d8a42be0e	KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit The control state changes on every L2->L0 vmexit, and we will have to serialize it in the nested state. So keep it up to date in svm->nested.ctl and just copy them back to the nested VMCB in nested_svm_vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:00 -04:00
Paolo Bonzini	d8e4e58f4b	KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTR Restore the INT_CTL value from the guest's VMCB once we've stopped using it, so that virtual interrupts can be injected as requested by L1. V_TPR is up-to-date however, and it can change if the guest writes to CR8, so keep it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:00 -04:00
Paolo Bonzini	e670bf68f4	KVM: nSVM: save all control fields in svm->nested In preparation for nested SVM save/restore, store all data that matters from the VMCB control area into svm->nested. It will then become part of the nested SVM state that is saved by KVM_SET_NESTED_STATE and restored by KVM_GET_NESTED_STATE, just like the cached vmcs12 for nVMX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:26:00 -04:00
Paolo Bonzini	7923ef4f6e	KVM: nSVM: remove trailing padding for struct vmcb_control_area Allow placing the VMCB structs on the stack or in other structs without wasting too much space. Add BUILD_BUG_ON as a quick safeguard against typos. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:25:59 -04:00
Paolo Bonzini	2f675917ef	KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area This will come in handy when we put a struct vmcb_control_area in svm->nested. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:25:59 -04:00
Paolo Bonzini	18fc6c55d1	KVM: nSVM: clean up tsc_offset update Use l1_tsc_offset to compute svm->vcpu.arch.tsc_offset and svm->vmcb->control.tsc_offset, instead of relying on hsave. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:25:59 -04:00
Paolo Bonzini	69cb877487	KVM: nSVM: move MMU setup to nested_prepare_vmcb_control Everything that is needed during nested state restore is now part of nested_prepare_vmcb_control. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:25:58 -04:00
Paolo Bonzini	f241d711b2	KVM: nSVM: extract preparation of VMCB for nested run Split out filling svm->vmcb.save and svm->vmcb.control before VMRUN. Only the latter will be useful when restoring nested SVM state. This patch introduces no semantic change, so the MMU setup is still done in nested_prepare_vmcb_save. The next patch will clean up things. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:25:58 -04:00
Paolo Bonzini	3e06f0163f	KVM: nSVM: extract load_nested_vmcb_control When restoring SVM nested state, the control state cache in svm->nested will have to be filled, but the save state will not have to be moved into svm->vmcb. Therefore, pull the code that handles the control area into a separate function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:25:58 -04:00
Paolo Bonzini	69c9dfa24b	KVM: nSVM: move map argument out of enter_svm_guest_mode Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart, since the map argument is not used elsewhere in the function. There are just two callers, and those are also the place where kvm_vcpu_map is called, so it is cleaner to unmap there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-01 04:24:32 -04:00
Paolo Bonzini	df7e0681dd	KVM: nVMX: always update CR3 in VMCS vmx_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as an optimization, but this is only correct before the nested vmentry. If userspace is modifying CR3 with KVM_SET_SREGS after the VM has already been put in guest mode, the value of CR3 will not be updated. Remove the optimization, which almost never triggers anyway. Fixes: `04f11ef458` ("KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-28 11:46:18 -04:00
Paolo Bonzini	978ce5837c	KVM: SVM: always update CR3 in VMCB svm_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as an optimization, but this is only correct before the nested vmentry. If userspace is modifying CR3 with KVM_SET_SREGS after the VM has already been put in guest mode, the value of CR3 will not be updated. Remove the optimization, which almost never triggers anyway. This was was added in commit `689f3bf216` ("KVM: x86: unify callbacks to load paging root", 2020-03-16) just to keep the two vendor-specific modules closer, but we'll fix VMX too. Fixes: `689f3bf216` ("KVM: x86: unify callbacks to load paging root") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-28 11:46:18 -04:00
Paolo Bonzini	5b67240866	KVM: nSVM: correctly inject INIT vmexits The usual drill at this point, except there is no code to remove because this case was not handled at all. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-28 11:46:18 -04:00
Paolo Bonzini	bd279629f7	KVM: nSVM: remove exit_required All events now inject vmexits before vmentry rather than after vmexit. Therefore, exit_required is not set anymore and we can remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-28 11:46:17 -04:00
Paolo Bonzini	7c86663b68	KVM: nSVM: inject exceptions via svm_check_nested_events This allows exceptions injected by the emulator to be properly delivered as vmexits. The code also becomes simpler, because we can just let all L0-intercepted exceptions go through the usual path. In particular, our emulation of the VMX #DB exit qualification is very much simplified, because the vmexit injection path can use kvm_deliver_exception_payload to update DR6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-28 11:46:17 -04:00
Paolo Bonzini	c9d40913ac	KVM: x86: enable event window in inject_pending_event In case an interrupt arrives after nested.check_events but before the call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt window even if the interrupt is actually going to be a vmexit. This is useless rather than harmful, but it really complicates reasoning about SVM's handling of the VINTR intercept. We'd like to never bother with the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in that case there is no interrupt window and we can just exit the nested guest whenever we want. This patch moves the opening of the interrupt window inside inject_pending_event. This consolidates the check for pending interrupt/NMI/SMI in one place, and makes KVM's usage of immediate exits more consistent, extending it beyond just nested virtualization. There are two functional changes here. They only affect corner cases, but overall they simplify the inject_pending_event. - re-injection of still-pending events will also use req_immediate_exit instead of using interrupt-window intercepts. This should have no impact on performance on Intel since it simply replaces an interrupt-window or NMI-window exit for a preemption-timer exit. On AMD, which has no equivalent of the preemption time, it may incur some overhead but an actual effect on performance should only be visible in pathological cases. - kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true if an interrupt, NMI or SMI is blocked by nested_run_pending. This makes sense because entering the VM will allow it to make progress and deliver the event. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-28 11:41:46 -04:00
Paolo Bonzini	c6b22f59d6	KVM: x86: track manually whether an event has been injected Instead of calling kvm_event_needs_reinjection, track its future return value in a variable. This will be useful in the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:12 -04:00
Vitaly Kuznetsov	b6162e82ae	KVM: nSVM: Preserve registers modifications done before nested_svm_vmexit() L2 guest hang is observed after 'exit_required' was dropped and nSVM switched to check_nested_events() completely. The hang is a busy loop when e.g. KVM is emulating an instruction (e.g. L2 is accessing MMIO space and we drop to userspace). After nested_svm_vmexit() and when L1 is doing VMRUN nested guest's RIP is not advanced so KVM goes into emulating the same instruction which caused nested_svm_vmexit() and the loop continues. nested_svm_vmexit() is not new, however, with check_nested_events() we're now calling it later than before. In case by that time KVM has modified register state we may pick stale values from VMCB when trying to save nested guest state to nested VMCB. nVMX code handles this case correctly: sync_vmcs02_to_vmcs12() called from nested_vmx_vmexit() does e.g 'vmcs12->guest_rip = kvm_rip_read(vcpu)' and this ensures KVM-made modifications are preserved. Do the same for nSVM. Generally, nested_vmx_vmexit()/nested_svm_vmexit() need to pick up all nested guest state modifications done by KVM after vmexit. It would be great to find a way to express this in a way which would not require to manually track these changes, e.g. nested_{vmcb,vmcs}_get_field(). Co-debugged-with: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200527090102.220647-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:12 -04:00
Sean Christopherson	7d2e8748af	KVM: x86: Initialize tdp_level during vCPU creation Initialize vcpu->arch.tdp_level during vCPU creation to avoid consuming garbage if userspace calls KVM_RUN without first calling KVM_SET_CPUID. Fixes: `e93fd3b3e8` ("KVM: x86/mmu: Capture TDP level when updating CPUID") Reported-by: syzbot+904752567107eefb728c@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200527085400.23759-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:11 -04:00
Paolo Bonzini	6c0238c4a6	KVM: nSVM: leave ASID aside in copy_vmcb_control_area Restoring the ASID from the hsave area on VMEXIT is wrong, because its value depends on the handling of TLB flushes. Just skipping the field in copy_vmcb_control_area will do. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:11 -04:00
Paolo Bonzini	a3535be731	KVM: nSVM: fix condition for filtering async PF Async page faults have to be trapped in the host (L1 in this case), since the APF reason was passed from L0 to L1 and stored in the L1 APF data page. This was completely reversed: the page faults were passed to the guest, a L2 hypervisor. Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:10 -04:00
彭浩(Richard)	88197e6ab3	kvm/x86: Remove redundant function implementations pic_in_kernel(), ioapic_in_kernel() and irqchip_kernel() have the same implementation. Signed-off-by: Peng Hao <richard.peng@oppo.com> Message-Id: <HKAPR02MB4291D5926EA10B8BFE9EA0D3E0B70@HKAPR02MB4291.apcprd02.prod.outlook.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:10 -04:00
Haiwei Li	80bc97f2d8	KVM: Fix the indentation to match coding style There is a bad indentation in next&queue branch. The patch looks like fixes nothing though it fixes the indentation. Before fixing: if (!handle_fastpath_set_x2apic_icr_irqoff(vcpu, data)) { kvm_skip_emulated_instruction(vcpu); ret = EXIT_FASTPATH_EXIT_HANDLED; } break; case MSR_IA32_TSCDEADLINE: After fixing: if (!handle_fastpath_set_x2apic_icr_irqoff(vcpu, data)) { kvm_skip_emulated_instruction(vcpu); ret = EXIT_FASTPATH_EXIT_HANDLED; } break; case MSR_IA32_TSCDEADLINE: Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <2f78457e-f3a7-3bc9-e237-3132ee87f71e@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:09 -04:00
Miaohe Lin	a8cfbae592	KVM: VMX: replace "fall through" with "return" to indicate different case The second "/* fall through */" in rmode_exception() makes code harder to read. Replace it with "return" to indicate they are different cases, only the #DB and #BP check vcpu->guest_debug, while others don't care. And this also improves the readability. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Message-Id: <1582080348-20827-1-git-send-email-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:09 -04:00
Sean Christopherson	cb97c2d680	KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s index Take a u32 for the index in has_emulated_msr() to match hardware, which treats MSR indices as unsigned 32-bit values. Functionally, taking a signed int doesn't cause problems with the current code base, but could theoretically cause problems with 32-bit KVM, e.g. if the index were checked via a less-than statement, which would evaluate incorrectly for MSR indices with bit 31 set. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-05-27 13:11:08 -04:00

1 2 3 4 5 ...

916987 Commits