* Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
* Documentation improvements
* Prevent module exit until all VMs are freed
* PMU Virtualization fixes
* Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
* Other miscellaneous bugfixes

-----BEGIN PGP SIGNATURE-----

iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJIGV8UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO5FQgAhls4+Nu+NqId/yvvyNxr3vXq0dHI
hLlHtvzgGzZisZ7y2bNeyIpJVBDT5LCbrptPD/5eTvchVswDh0+kCVC0Uni5ugGT
tLT/Pv9Oq9e0X7aGdHRyuHIivIFDC20zIZO2DV48Lrj/+r6DafB2Fghq2XQLlBxN
p8KislvuqAAos543BPC1+Lk3dhOLuZ8qcFD8wGRlcCwjNwYaitrQ16rO04cLfUur
OwIks1I6TdI2JpLBhm6oWYVG/YnRsoo4bQE8cjdQ6yNSbwWtRpV33q7X6onw8x8K
BEeESoTnMqfaxIF/6mPl6bnDblVHFp6Xhld/vJcgeWQTdajFtuFE/K4sCA==
=xnQ6
-----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
 - Documentation improvements
 - Prevent module exit until all VMs are freed
 - PMU Virtualization fixes
 - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
 - Other miscellaneous bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: fix sending PV IPI
  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
  KVM: x86: Remove redundant vm_entry_controls_clearbit() call
  KVM: x86: cleanup enter_rmode()
  KVM: x86: SVM: fix tsc scaling when the host doesn't support it
  kvm: x86: SVM: remove unused defines
  KVM: x86: SVM: move tsc ratio definitions to svm.h
  KVM: x86: SVM: fix avic spec based definitions again
  KVM: MIPS: remove reference to trap&emulate virtualization
  KVM: x86: document limitations of MSR filtering
  KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
  KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
  KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  KVM: x86: Trace all APICv inhibit changes and capture overall status
  KVM: x86: Add wrappers for setting/clearing APICv inhibits
  KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
  KVM: X86: Handle implicit supervisor access with SMAP
  KVM: X86: Rename variable smap to not_smap in permission_fault()
  ...

commit 38904911e8
@@ -151,12 +151,6 @@ In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).

To use hardware assisted virtualization on MIPS (VZ ASE) rather than
the default trap & emulate implementation (which changes the virtual
memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
flag KVM_VM_MIPS_VZ.

On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
@@ -4081,6 +4075,11 @@ x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
register.

.. warning::
   MSR accesses coming from nested vmentry/vmexit are not filtered.
   This includes both writes to individual VMCS fields and reads/writes
   through the MSR lists pointed to by the VMCS.

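As an aside, a minimal userspace sketch of installing such a filter may help; it assumes
the standard ``KVM_X86_SET_MSR_FILTER`` uapi and a ``vm_fd`` obtained from
``KVM_CREATE_VM``, and is an illustration rather than part of this patch (error
handling omitted)::

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Deny guest rdmsr/wrmsr for MSRs 0xc0010000-0xc0011fff, allow the rest. */
  static void deny_amd_msr_range(int vm_fd)
  {
          /* One bit per MSR in the range; a zero bit means "deny". */
          static __u8 bitmap[0x2000 / 8];   /* zero-initialised */
          struct kvm_msr_filter filter = {
                  .flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
                  .ranges[0] = {
                          .flags  = KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE,
                          .base   = 0xc0010000,
                          .nmsrs  = 0x2000,
                          .bitmap = bitmap,
                  },
          };

          ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
  }
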
If a bit is within one of the defined ranges, read and write accesses are
guarded by the bitmap's value for the MSR index if the kind of access
is included in the ``struct kvm_msr_filter_range`` flags. If no range
@@ -5293,6 +5292,10 @@ type values:

KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
  Sets the guest physical address of the vcpu_info for a given vCPU.
  As with the shared_info page for the VM, the corresponding page may be
  dirtied at any time if event channel interrupt delivery is enabled, so
  userspace should always assume that the page is dirty without relying
  on dirty logging.

KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
  Sets the guest physical address of an additional pvclock structure
@@ -7719,3 +7722,49 @@ only be invoked on a VM prior to the creation of VCPUs.

At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting
this capability will disable PMU virtualization for that VM. Usermode
should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
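A minimal sketch of what "setting this capability" looks like from userspace,
assuming the usual ``KVM_ENABLE_CAP`` ioctl on the VM file descriptor
(illustration only, not taken from this diff)::

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Must be issued before any vCPU is created, as noted above. */
  static int disable_pmu_virtualization(int vm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap     = KVM_CAP_PMU_CAPABILITY,
                  .args[0] = KVM_PMU_CAP_DISABLE,
          };

          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }
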
9. Known KVM API problems
=========================

In some cases, KVM's API has some inconsistencies or common pitfalls
that userspace needs to be aware of. This section details some of
these issues.

Most of them are architecture specific, so the section is split by
architecture.

9.1. x86
--------

``KVM_GET_SUPPORTED_CPUID`` issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In general, ``KVM_GET_SUPPORTED_CPUID`` is designed so that it is possible
to take its result and pass it directly to ``KVM_SET_CPUID2``. This section
documents some cases in which that requires some care.
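As a rough illustration of that intended flow (a hedged sketch using the standard
uapi structures; the fixed ``nent`` bound and the omitted error handling are
simplifications, not guidance from this patch)::

  #include <linux/kvm.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>

  /* Copy the host-supported CPUID straight into a vCPU. */
  static int copy_supported_cpuid(int kvm_fd, int vcpu_fd)
  {
          int nent = 128;        /* assumed to be large enough */
          struct kvm_cpuid2 *cpuid;
          int ret;

          cpuid = calloc(1, sizeof(*cpuid) + nent * sizeof(cpuid->entries[0]));
          cpuid->nent = nent;

          ret = ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid);
          if (!ret)
                  ret = ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);

          free(cpuid);
          return ret;
  }
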
Local APIC features
~~~~~~~~~~~~~~~~~~~

CPU[EAX=1]:ECX[21] (X2APIC) is reported by ``KVM_GET_SUPPORTED_CPUID``,
but it can only be enabled if ``KVM_CREATE_IRQCHIP`` or
``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)`` is used to enable in-kernel emulation of
the local APIC.

The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.

CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``.
It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
has enabled in-kernel emulation of the local APIC.

Obsolete ioctls and capabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

KVM_CAP_DISABLE_QUIRKS does not let userspace know which quirks are actually
available. Use ``KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2)`` instead if
available.

Ordering of KVM_GET_*/KVM_SET_* ioctls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TBD
@@ -8,25 +8,13 @@ KVM
   :maxdepth: 2

   api
   amd-memory-encryption
   cpuid
   halt-polling
   hypercalls
   locking
   mmu
   msr
   nested-vmx
   ppc-pv
   s390-diag
   s390-pv
   s390-pv-boot
   timekeeping
   vcpu-requests

   review-checklist

   arm/index

   devices/index

   running-nested-guests
   arm/index
   s390/index
   ppc-pv
   x86/index

   locking
   vcpu-requests
   review-checklist
@@ -210,32 +210,47 @@ time it will be set using the Dirty tracking mechanism described above.
3. Reference
------------

:Name: kvm_lock
``kvm_lock``
^^^^^^^^^^^^

:Type: mutex
:Arch: any
:Protects: - vm_list

:Name: kvm_count_lock
``kvm_count_lock``
^^^^^^^^^^^^^^^^^^

:Type: raw_spinlock_t
:Arch: any
:Protects: - hardware virtualization enable/disable
:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
          migration.

:Name: kvm_arch::tsc_write_lock
:Type: raw_spinlock
``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type: spinlock_t
:Arch: any
:Protects: mn_active_invalidate_count, mn_memslots_update_rcuwait

``kvm_arch::tsc_write_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:Type: raw_spinlock_t
:Arch: x86
:Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
           - tsc offset in vmcb
:Comment: 'raw' because updating the tsc offsets must not be preempted.

:Name: kvm->mmu_lock
:Type: spinlock_t
``kvm->mmu_lock``
^^^^^^^^^^^^^^^^^
:Type: spinlock_t or rwlock_t
:Arch: any
:Protects: -shadow page/shadow tlb entry
:Comment: it is a spinlock since it is used in mmu notifier.

:Name: kvm->srcu
``kvm->srcu``
^^^^^^^^^^^^^
:Type: srcu lock
:Arch: any
:Protects: - kvm->memslots

@@ -246,10 +261,20 @@ time it will be set using the Dirty tracking mechanism described above.
	The srcu index can be stored in kvm_vcpu->srcu_idx per vcpu
	if it is needed by multiple functions.
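For orientation, the usual read-side pattern for ``kvm->srcu`` looks roughly like
this (a sketch of the in-kernel idiom, not code from this patch)::

  int idx;

  idx = srcu_read_lock(&kvm->srcu);
  /* ... access kvm->memslots / kvm->buses via srcu-protected pointers ... */
  srcu_read_unlock(&kvm->srcu, idx);
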
:Name: blocked_vcpu_on_cpu_lock
``kvm->slots_arch_lock``
^^^^^^^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: any (only needed on x86 though)
:Protects: any arch-specific fields of memslots that have to be modified
           in a ``kvm->srcu`` read-side critical section.
:Comment: must be held before reading the pointer to the current memslots,
          until after all changes to the memslots are complete

``wakeup_vcpus_on_cpu_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:Type: spinlock_t
:Arch: x86
:Protects: blocked_vcpu_on_cpu
:Protects: wakeup_vcpus_on_cpu
:Comment: This is a per-CPU lock and it is used for VT-d posted-interrupts.
          When VT-d posted-interrupts is supported and the VM has assigned
          devices, we put the blocked vCPU on the list blocked_vcpu_on_cpu
Documentation/virt/kvm/s390/index.rst (new file, 12 lines)
@@ -0,0 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0

====================
KVM for s390 systems
====================

.. toctree::
   :maxdepth: 2

   s390-diag
   s390-pv
   s390-pv-boot
@@ -135,6 +135,16 @@ KVM_REQ_UNHALT
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.

KVM_REQ_OUTSIDE_GUEST_MODE

  This "request" ensures the target vCPU has exited guest mode prior to the
  sender of the request continuing on. No action needs to be taken by the target,
  and so no request is actually logged for the target. This request is similar
  to a "kick", but unlike a kick it guarantees the vCPU has actually exited
  guest mode. A kick only guarantees the vCPU will exit at some point in the
  future, e.g. a previous kick may have started the process, but there's no
  guarantee the to-be-kicked vCPU has fully exited guest mode.
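A hedged sketch of how such a request is typically issued from the kernel side
(illustrative; the call below is the generic request helper, not code added by
this diff)::

  /*
   * Make sure every vCPU has left guest mode before the caller proceeds;
   * no per-vCPU request flag is left behind for the targets to clear.
   */
  kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
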
KVM_REQUEST_MASK
----------------
Documentation/virt/kvm/x86/errata.rst (new file, 39 lines)
@@ -0,0 +1,39 @@

=======================================
Known limitations of CPU virtualization
=======================================

Whenever perfect emulation of a CPU feature is impossible or too hard, KVM
has to choose between not implementing the feature at all or introducing
behavioral differences between virtual machines and bare metal systems.

This file documents some of the known limitations that KVM has in
virtualizing CPU features.

x86
===

``KVM_GET_SUPPORTED_CPUID`` issues
----------------------------------

x87 features
~~~~~~~~~~~~

Unlike most other CPUID feature bits, CPUID[EAX=7,ECX=0]:EBX[6]
(FDP_EXCPTN_ONLY) and CPUID[EAX=7,ECX=0]:EBX[13] (ZERO_FCS_FDS) are
clear if the features are present and set if the features are not present.

Clearing these bits in CPUID has no effect on the operation of the guest;
if these bits are set on hardware, the features will not be present on
any virtual machine that runs on that hardware.

**Workaround:** It is recommended to always set these bits in guest CPUID.
Note however that any software (e.g. ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.

Nested virtualization features
------------------------------

TBD
Documentation/virt/kvm/x86/index.rst (new file, 19 lines)
@@ -0,0 +1,19 @@
.. SPDX-License-Identifier: GPL-2.0

===================
KVM for x86 systems
===================

.. toctree::
   :maxdepth: 2

   amd-memory-encryption
   cpuid
   errata
   halt-polling
   hypercalls
   mmu
   msr
   nested-vmx
   running-nested-guests
   timekeeping
@@ -3462,7 +3462,7 @@ void exit_sie(struct kvm_vcpu *vcpu)
/* Kick a guest cpu out of SIE to process a request synchronously */
void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu)
{
	kvm_make_request(req, vcpu);
	__kvm_make_request(req, vcpu);
	kvm_s390_vcpu_request(vcpu);
}
@ -249,6 +249,7 @@ enum x86_intercept_stage;
|
||||
#define PFERR_SGX_BIT 15
|
||||
#define PFERR_GUEST_FINAL_BIT 32
|
||||
#define PFERR_GUEST_PAGE_BIT 33
|
||||
#define PFERR_IMPLICIT_ACCESS_BIT 48
|
||||
|
||||
#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
|
||||
#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
|
||||
@ -259,6 +260,7 @@ enum x86_intercept_stage;
|
||||
#define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
|
||||
#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
|
||||
#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
|
||||
#define PFERR_IMPLICIT_ACCESS (1ULL << PFERR_IMPLICIT_ACCESS_BIT)
|
||||
|
||||
#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
|
||||
PFERR_WRITE_MASK | \
|
||||
@ -430,7 +432,7 @@ struct kvm_mmu {
|
||||
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
|
||||
struct x86_exception *fault);
|
||||
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
gpa_t gva_or_gpa, u32 access,
|
||||
gpa_t gva_or_gpa, u64 access,
|
||||
struct x86_exception *exception);
|
||||
int (*sync_page)(struct kvm_vcpu *vcpu,
|
||||
struct kvm_mmu_page *sp);
|
||||
@ -512,6 +514,7 @@ struct kvm_pmu {
|
||||
u64 global_ctrl_mask;
|
||||
u64 global_ovf_ctrl_mask;
|
||||
u64 reserved_bits;
|
||||
u64 raw_event_mask;
|
||||
u8 version;
|
||||
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
|
||||
struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
|
||||
@ -1040,14 +1043,16 @@ struct kvm_x86_msr_filter {
|
||||
struct msr_bitmap_range ranges[16];
|
||||
};
|
||||
|
||||
#define APICV_INHIBIT_REASON_DISABLE 0
|
||||
#define APICV_INHIBIT_REASON_HYPERV 1
|
||||
#define APICV_INHIBIT_REASON_NESTED 2
|
||||
#define APICV_INHIBIT_REASON_IRQWIN 3
|
||||
#define APICV_INHIBIT_REASON_PIT_REINJ 4
|
||||
#define APICV_INHIBIT_REASON_X2APIC 5
|
||||
#define APICV_INHIBIT_REASON_BLOCKIRQ 6
|
||||
#define APICV_INHIBIT_REASON_ABSENT 7
|
||||
enum kvm_apicv_inhibit {
|
||||
APICV_INHIBIT_REASON_DISABLE,
|
||||
APICV_INHIBIT_REASON_HYPERV,
|
||||
APICV_INHIBIT_REASON_NESTED,
|
||||
APICV_INHIBIT_REASON_IRQWIN,
|
||||
APICV_INHIBIT_REASON_PIT_REINJ,
|
||||
APICV_INHIBIT_REASON_X2APIC,
|
||||
APICV_INHIBIT_REASON_BLOCKIRQ,
|
||||
APICV_INHIBIT_REASON_ABSENT,
|
||||
};
|
||||
|
||||
struct kvm_arch {
|
||||
unsigned long n_used_mmu_pages;
|
||||
@ -1401,7 +1406,7 @@ struct kvm_x86_ops {
|
||||
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
|
||||
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
|
||||
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
|
||||
bool (*check_apicv_inhibit_reasons)(ulong bit);
|
||||
bool (*check_apicv_inhibit_reasons)(enum kvm_apicv_inhibit reason);
|
||||
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
|
||||
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
|
||||
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
|
||||
@ -1585,7 +1590,7 @@ void kvm_mmu_module_exit(void);
|
||||
|
||||
void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
|
||||
int kvm_mmu_create(struct kvm_vcpu *vcpu);
|
||||
void kvm_mmu_init_vm(struct kvm *kvm);
|
||||
int kvm_mmu_init_vm(struct kvm *kvm);
|
||||
void kvm_mmu_uninit_vm(struct kvm *kvm);
|
||||
|
||||
void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
|
||||
@ -1795,11 +1800,22 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
|
||||
|
||||
bool kvm_apicv_activated(struct kvm *kvm);
|
||||
void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
|
||||
void kvm_request_apicv_update(struct kvm *kvm, bool activate,
|
||||
unsigned long bit);
|
||||
void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
|
||||
enum kvm_apicv_inhibit reason, bool set);
|
||||
void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
|
||||
enum kvm_apicv_inhibit reason, bool set);
|
||||
|
||||
void __kvm_request_apicv_update(struct kvm *kvm, bool activate,
|
||||
unsigned long bit);
|
||||
static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
|
||||
enum kvm_apicv_inhibit reason)
|
||||
{
|
||||
kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
|
||||
}
|
||||
|
||||
static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
|
||||
enum kvm_apicv_inhibit reason)
|
||||
{
|
||||
kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
|
||||
}
|
||||
|
||||
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
|
||||
|
||||
|
@ -221,8 +221,14 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
|
||||
#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
|
||||
|
||||
|
||||
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
|
||||
#define SVM_TSC_RATIO_MIN 0x0000000000000001ULL
|
||||
#define SVM_TSC_RATIO_MAX 0x000000ffffffffffULL
|
||||
#define SVM_TSC_RATIO_DEFAULT 0x0100000000ULL
|
||||
|
||||
|
||||
/* AVIC */
|
||||
#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFF)
|
||||
#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFFULL)
|
||||
#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT 31
|
||||
#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
|
||||
|
||||
@ -230,9 +236,11 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK (0xFFFFFFFFFFULL << 12)
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62)
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
|
||||
#define AVIC_PHYSICAL_ID_TABLE_SIZE_MASK (0xFF)
|
||||
#define AVIC_PHYSICAL_ID_TABLE_SIZE_MASK (0xFFULL)
|
||||
|
||||
#define AVIC_DOORBELL_PHYSICAL_ID_MASK (0xFF)
|
||||
#define AVIC_DOORBELL_PHYSICAL_ID_MASK GENMASK_ULL(11, 0)
|
||||
|
||||
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL
|
||||
|
||||
#define AVIC_UNACCEL_ACCESS_WRITE_MASK 1
|
||||
#define AVIC_UNACCEL_ACCESS_OFFSET_MASK 0xFF0
|
||||
|
@ -517,7 +517,7 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
|
||||
} else if (apic_id < min && max - apic_id < KVM_IPI_CLUSTER_SIZE) {
|
||||
ipi_bitmap <<= min - apic_id;
|
||||
min = apic_id;
|
||||
} else if (apic_id < min + KVM_IPI_CLUSTER_SIZE) {
|
||||
} else if (apic_id > min && apic_id < min + KVM_IPI_CLUSTER_SIZE) {
|
||||
max = apic_id < max ? max : apic_id;
|
||||
} else {
|
||||
ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
|
||||
|
@ -735,6 +735,7 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
|
||||
if (function > READ_ONCE(max_cpuid_80000000))
|
||||
return entry;
|
||||
}
|
||||
break;
|
||||
|
||||
default:
|
||||
break;
|
||||
|
@ -3540,8 +3540,10 @@ static int em_rdpid(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
u64 tsc_aux = 0;
|
||||
|
||||
if (ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux))
|
||||
if (!ctxt->ops->guest_has_rdpid(ctxt))
|
||||
return emulate_ud(ctxt);
|
||||
|
||||
ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux);
|
||||
ctxt->dst.val = tsc_aux;
|
||||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
@ -3642,7 +3644,7 @@ static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
|
||||
|
||||
msr_data = (u32)reg_read(ctxt, VCPU_REGS_RAX)
|
||||
| ((u64)reg_read(ctxt, VCPU_REGS_RDX) << 32);
|
||||
r = ctxt->ops->set_msr(ctxt, msr_index, msr_data);
|
||||
r = ctxt->ops->set_msr_with_filter(ctxt, msr_index, msr_data);
|
||||
|
||||
if (r == X86EMUL_IO_NEEDED)
|
||||
return r;
|
||||
@ -3659,7 +3661,7 @@ static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
|
||||
u64 msr_data;
|
||||
int r;
|
||||
|
||||
r = ctxt->ops->get_msr(ctxt, msr_index, &msr_data);
|
||||
r = ctxt->ops->get_msr_with_filter(ctxt, msr_index, &msr_data);
|
||||
|
||||
if (r == X86EMUL_IO_NEEDED)
|
||||
return r;
|
||||
|
@ -122,9 +122,13 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
|
||||
else
|
||||
hv->synic_auto_eoi_used--;
|
||||
|
||||
__kvm_request_apicv_update(vcpu->kvm,
|
||||
!hv->synic_auto_eoi_used,
|
||||
APICV_INHIBIT_REASON_HYPERV);
|
||||
/*
|
||||
* Inhibit APICv if any vCPU is using SynIC's AutoEOI, which relies on
|
||||
* the hypervisor to manually inject IRQs.
|
||||
*/
|
||||
__kvm_set_or_clear_apicv_inhibit(vcpu->kvm,
|
||||
APICV_INHIBIT_REASON_HYPERV,
|
||||
!!hv->synic_auto_eoi_used);
|
||||
|
||||
up_write(&vcpu->kvm->arch.apicv_update_lock);
|
||||
}
|
||||
@ -239,7 +243,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
|
||||
struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
|
||||
int ret;
|
||||
|
||||
if (!synic->active && !host)
|
||||
if (!synic->active && (!host || data))
|
||||
return 1;
|
||||
|
||||
trace_kvm_hv_synic_set_msr(vcpu->vcpu_id, msr, data, host);
|
||||
@ -285,6 +289,9 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
|
||||
case HV_X64_MSR_EOM: {
|
||||
int i;
|
||||
|
||||
if (!synic->active)
|
||||
break;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(synic->sint); i++)
|
||||
kvm_hv_notify_acked_sint(vcpu, i);
|
||||
break;
|
||||
@ -449,6 +456,9 @@ static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
|
||||
struct kvm_lapic_irq irq;
|
||||
int ret, vector;
|
||||
|
||||
if (KVM_BUG_ON(!lapic_in_kernel(vcpu), vcpu->kvm))
|
||||
return -EINVAL;
|
||||
|
||||
if (sint >= ARRAY_SIZE(synic->sint))
|
||||
return -EINVAL;
|
||||
|
||||
@ -661,7 +671,7 @@ static int stimer_set_config(struct kvm_vcpu_hv_stimer *stimer, u64 config,
|
||||
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
|
||||
struct kvm_vcpu_hv_synic *synic = to_hv_synic(vcpu);
|
||||
|
||||
if (!synic->active && !host)
|
||||
if (!synic->active && (!host || config))
|
||||
return 1;
|
||||
|
||||
if (unlikely(!host && hv_vcpu->enforce_cpuid && new_config.direct_mode &&
|
||||
@ -690,7 +700,7 @@ static int stimer_set_count(struct kvm_vcpu_hv_stimer *stimer, u64 count,
|
||||
struct kvm_vcpu *vcpu = hv_stimer_to_vcpu(stimer);
|
||||
struct kvm_vcpu_hv_synic *synic = to_hv_synic(vcpu);
|
||||
|
||||
if (!synic->active && !host)
|
||||
if (!synic->active && (!host || count))
|
||||
return 1;
|
||||
|
||||
trace_kvm_hv_stimer_set_count(hv_stimer_to_vcpu(stimer)->vcpu_id,
|
||||
|
@ -305,15 +305,13 @@ void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
|
||||
* So, deactivate APICv when PIT is in reinject mode.
|
||||
*/
|
||||
if (reinject) {
|
||||
kvm_request_apicv_update(kvm, false,
|
||||
APICV_INHIBIT_REASON_PIT_REINJ);
|
||||
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PIT_REINJ);
|
||||
/* The initial state is preserved while ps->reinject == 0. */
|
||||
kvm_pit_reset_reinject(pit);
|
||||
kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
|
||||
kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
|
||||
} else {
|
||||
kvm_request_apicv_update(kvm, true,
|
||||
APICV_INHIBIT_REASON_PIT_REINJ);
|
||||
kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PIT_REINJ);
|
||||
kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
|
||||
kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
|
||||
}
|
||||
|
@ -210,6 +210,8 @@ struct x86_emulate_ops {
|
||||
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
|
||||
u64 (*get_smbase)(struct x86_emulate_ctxt *ctxt);
|
||||
void (*set_smbase)(struct x86_emulate_ctxt *ctxt, u64 smbase);
|
||||
int (*set_msr_with_filter)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
|
||||
int (*get_msr_with_filter)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
|
||||
int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
|
||||
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
|
||||
int (*check_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc);
|
||||
@ -226,6 +228,7 @@ struct x86_emulate_ops {
|
||||
bool (*guest_has_long_mode)(struct x86_emulate_ctxt *ctxt);
|
||||
bool (*guest_has_movbe)(struct x86_emulate_ctxt *ctxt);
|
||||
bool (*guest_has_fxsr)(struct x86_emulate_ctxt *ctxt);
|
||||
bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);
|
||||
|
||||
void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
|
||||
|
||||
|
@ -1024,6 +1024,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
|
||||
*r = -1;
|
||||
|
||||
if (irq->shorthand == APIC_DEST_SELF) {
|
||||
if (KVM_BUG_ON(!src, kvm)) {
|
||||
*r = 0;
|
||||
return true;
|
||||
}
|
||||
*r = kvm_apic_set_irq(src->vcpu, irq, dest_map);
|
||||
return true;
|
||||
}
|
||||
|
@@ -214,27 +214,27 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 */
static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
				  unsigned pte_access, unsigned pte_pkey,
				  unsigned pfec)
				  u64 access)
{
	int cpl = static_call(kvm_x86_get_cpl)(vcpu);
	/* strip nested paging fault error codes */
	unsigned int pfec = access;
	unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);

	/*
	 * If CPL < 3, SMAP prevention are disabled if EFLAGS.AC = 1.
	 * For explicit supervisor accesses, SMAP is disabled if EFLAGS.AC = 1.
	 * For implicit supervisor accesses, SMAP cannot be overridden.
	 *
	 * If CPL = 3, SMAP applies to all supervisor-mode data accesses
	 * (these are implicit supervisor accesses) regardless of the value
	 * of EFLAGS.AC.
	 * SMAP works on supervisor accesses only, and not_smap can
	 * be set or not set when user access with neither has any bearing
	 * on the result.
	 *
	 * This computes (cpl < 3) && (rflags & X86_EFLAGS_AC), leaving
	 * the result in X86_EFLAGS_AC. We then insert it in place of
	 * the PFERR_RSVD_MASK bit; this bit will always be zero in pfec,
	 * but it will be one in index if SMAP checks are being overridden.
	 * It is important to keep this branchless.
	 * We put the SMAP checking bit in place of the PFERR_RSVD_MASK bit;
	 * this bit will always be zero in pfec, but it will be one in index
	 * if SMAP checks are being disabled.
	 */
	unsigned long smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
	int index = (pfec >> 1) +
		    (smap >> (X86_EFLAGS_AC_BIT - PFERR_RSVD_BIT + 1));
	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
	bool fault = (mmu->permissions[index] >> pte_access) & 1;
	u32 errcode = PFERR_PRESENT_MASK;
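To make the branchless SMAP computation above concrete, the following self-contained
sketch reproduces the index calculation with the relevant constants spelled out; the
constant values are transcribed from the usual KVM/x86 definitions and are assumptions
of this illustration, not part of the diff:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define X86_EFLAGS_AC           (1UL << 18)     /* assumed: EFLAGS.AC is bit 18 */
#define PFERR_RSVD_BIT          3               /* assumed value */
#define PFERR_IMPLICIT_ACCESS   (1ULL << 48)    /* matches the new bit above */

/* Mirrors the reworked permission_fault() handling of the SMAP-override bit. */
static int smap_index(uint64_t access, unsigned long rflags)
{
	unsigned int pfec = access;     /* upper (nested) bits stripped by truncation */
	uint64_t implicit = access & PFERR_IMPLICIT_ACCESS;
	/* true only for explicit supervisor accesses with EFLAGS.AC set */
	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit) == X86_EFLAGS_AC;

	return (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
}

int main(void)
{
	/* explicit access, AC=1: the SMAP check is overridden (RSVD bit set in index) */
	printf("%d\n", smap_index(0x4, X86_EFLAGS_AC));
	/* implicit supervisor access, AC=1: SMAP still applies */
	printf("%d\n", smap_index(0x4 | PFERR_IMPLICIT_ACCESS, X86_EFLAGS_AC));
	return 0;
}
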
@ -317,12 +317,12 @@ static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
|
||||
atomic64_add(count, &kvm->stat.pages[level - 1]);
|
||||
}
|
||||
|
||||
gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
|
||||
gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access,
|
||||
struct x86_exception *exception);
|
||||
|
||||
static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
|
||||
struct kvm_mmu *mmu,
|
||||
gpa_t gpa, u32 access,
|
||||
gpa_t gpa, u64 access,
|
||||
struct x86_exception *exception)
|
||||
{
|
||||
if (mmu != &vcpu->arch.nested_mmu)
|
||||
|
@ -2696,8 +2696,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
|
||||
if (*sptep == spte) {
|
||||
ret = RET_PF_SPURIOUS;
|
||||
} else {
|
||||
trace_kvm_mmu_set_spte(level, gfn, sptep);
|
||||
flush |= mmu_spte_update(sptep, spte);
|
||||
trace_kvm_mmu_set_spte(level, gfn, sptep);
|
||||
}
|
||||
|
||||
if (wrprot) {
|
||||
@ -3703,7 +3703,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
|
||||
static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
gpa_t vaddr, u32 access,
|
||||
gpa_t vaddr, u64 access,
|
||||
struct x86_exception *exception)
|
||||
{
|
||||
if (exception)
|
||||
@ -4591,11 +4591,11 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
|
||||
* - X86_CR4_SMAP is set in CR4
|
||||
* - A user page is accessed
|
||||
* - The access is not a fetch
|
||||
* - Page fault in kernel mode
|
||||
* - if CPL = 3 or X86_EFLAGS_AC is clear
|
||||
* - The access is supervisor mode
|
||||
* - If implicit supervisor access or X86_EFLAGS_AC is clear
|
||||
*
|
||||
* Here, we cover the first three conditions.
|
||||
* The fourth is computed dynamically in permission_fault();
|
||||
* Here, we cover the first four conditions.
|
||||
* The fifth is computed dynamically in permission_fault();
|
||||
* PFERR_RSVD_MASK bit will be set in PFEC if the access is
|
||||
* *not* subject to SMAP restrictions.
|
||||
*/
|
||||
@ -5768,17 +5768,24 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
|
||||
kvm_mmu_zap_all_fast(kvm);
|
||||
}
|
||||
|
||||
void kvm_mmu_init_vm(struct kvm *kvm)
|
||||
int kvm_mmu_init_vm(struct kvm *kvm)
|
||||
{
|
||||
struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
|
||||
int r;
|
||||
|
||||
INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
|
||||
INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
|
||||
INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
|
||||
spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
|
||||
|
||||
kvm_mmu_init_tdp_mmu(kvm);
|
||||
r = kvm_mmu_init_tdp_mmu(kvm);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
node->track_write = kvm_mmu_pte_write;
|
||||
node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
|
||||
kvm_page_track_register_notifier(kvm, node);
|
||||
return 0;
|
||||
}
|
||||
|
||||
void kvm_mmu_uninit_vm(struct kvm *kvm)
|
||||
@ -5842,8 +5849,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
|
||||
|
||||
if (is_tdp_mmu_enabled(kvm)) {
|
||||
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
|
||||
flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
|
||||
gfn_end, flush);
|
||||
flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
|
||||
gfn_end, true, flush);
|
||||
}
|
||||
|
||||
if (flush)
|
||||
|
@ -34,9 +34,8 @@
|
||||
#define PT_HAVE_ACCESSED_DIRTY(mmu) true
|
||||
#ifdef CONFIG_X86_64
|
||||
#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
|
||||
#define CMPXCHG cmpxchg
|
||||
#define CMPXCHG "cmpxchgq"
|
||||
#else
|
||||
#define CMPXCHG cmpxchg64
|
||||
#define PT_MAX_FULL_LEVELS 2
|
||||
#endif
|
||||
#elif PTTYPE == 32
|
||||
@ -52,7 +51,7 @@
|
||||
#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
|
||||
#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
|
||||
#define PT_HAVE_ACCESSED_DIRTY(mmu) true
|
||||
#define CMPXCHG cmpxchg
|
||||
#define CMPXCHG "cmpxchgl"
|
||||
#elif PTTYPE == PTTYPE_EPT
|
||||
#define pt_element_t u64
|
||||
#define guest_walker guest_walkerEPT
|
||||
@ -65,7 +64,9 @@
|
||||
#define PT_GUEST_DIRTY_SHIFT 9
|
||||
#define PT_GUEST_ACCESSED_SHIFT 8
|
||||
#define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
|
||||
#define CMPXCHG cmpxchg64
|
||||
#ifdef CONFIG_X86_64
|
||||
#define CMPXCHG "cmpxchgq"
|
||||
#endif
|
||||
#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
|
||||
#else
|
||||
#error Invalid PTTYPE value
|
||||
@ -147,43 +148,36 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
pt_element_t __user *ptep_user, unsigned index,
|
||||
pt_element_t orig_pte, pt_element_t new_pte)
|
||||
{
|
||||
int npages;
|
||||
pt_element_t ret;
|
||||
pt_element_t *table;
|
||||
struct page *page;
|
||||
signed char r;
|
||||
|
||||
npages = get_user_pages_fast((unsigned long)ptep_user, 1, FOLL_WRITE, &page);
|
||||
if (likely(npages == 1)) {
|
||||
table = kmap_atomic(page);
|
||||
ret = CMPXCHG(&table[index], orig_pte, new_pte);
|
||||
kunmap_atomic(table);
|
||||
if (!user_access_begin(ptep_user, sizeof(pt_element_t)))
|
||||
return -EFAULT;
|
||||
|
||||
kvm_release_page_dirty(page);
|
||||
} else {
|
||||
struct vm_area_struct *vma;
|
||||
unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
|
||||
unsigned long pfn;
|
||||
unsigned long paddr;
|
||||
#ifdef CMPXCHG
|
||||
asm volatile("1:" LOCK_PREFIX CMPXCHG " %[new], %[ptr]\n"
|
||||
"setnz %b[r]\n"
|
||||
"2:"
|
||||
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %k[r])
|
||||
: [ptr] "+m" (*ptep_user),
|
||||
[old] "+a" (orig_pte),
|
||||
[r] "=q" (r)
|
||||
: [new] "r" (new_pte)
|
||||
: "memory");
|
||||
#else
|
||||
asm volatile("1:" LOCK_PREFIX "cmpxchg8b %[ptr]\n"
|
||||
"setnz %b[r]\n"
|
||||
"2:"
|
||||
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %k[r])
|
||||
: [ptr] "+m" (*ptep_user),
|
||||
[old] "+A" (orig_pte),
|
||||
[r] "=q" (r)
|
||||
: [new_lo] "b" ((u32)new_pte),
|
||||
[new_hi] "c" ((u32)(new_pte >> 32))
|
||||
: "memory");
|
||||
#endif
|
||||
|
||||
mmap_read_lock(current->mm);
|
||||
vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE);
|
||||
if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
|
||||
mmap_read_unlock(current->mm);
|
||||
return -EFAULT;
|
||||
}
|
||||
pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
|
||||
paddr = pfn << PAGE_SHIFT;
|
||||
table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
|
||||
if (!table) {
|
||||
mmap_read_unlock(current->mm);
|
||||
return -EFAULT;
|
||||
}
|
||||
ret = CMPXCHG(&table[index], orig_pte, new_pte);
|
||||
memunmap(table);
|
||||
mmap_read_unlock(current->mm);
|
||||
}
|
||||
|
||||
return (ret != orig_pte);
|
||||
user_access_end();
|
||||
return r;
|
||||
}
|
||||
|
||||
static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
|
||||
@ -339,7 +333,7 @@ static inline bool FNAME(is_last_gpte)(struct kvm_mmu *mmu,
|
||||
*/
|
||||
static int FNAME(walk_addr_generic)(struct guest_walker *walker,
|
||||
struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
gpa_t addr, u32 access)
|
||||
gpa_t addr, u64 access)
|
||||
{
|
||||
int ret;
|
||||
pt_element_t pte;
|
||||
@ -347,7 +341,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
|
||||
gfn_t table_gfn;
|
||||
u64 pt_access, pte_access;
|
||||
unsigned index, accessed_dirty, pte_pkey;
|
||||
unsigned nested_access;
|
||||
u64 nested_access;
|
||||
gpa_t pte_gpa;
|
||||
bool have_ad;
|
||||
int offset;
|
||||
@ -540,7 +534,7 @@ error:
|
||||
}
|
||||
|
||||
static int FNAME(walk_addr)(struct guest_walker *walker,
|
||||
struct kvm_vcpu *vcpu, gpa_t addr, u32 access)
|
||||
struct kvm_vcpu *vcpu, gpa_t addr, u64 access)
|
||||
{
|
||||
return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr,
|
||||
access);
|
||||
@ -988,7 +982,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
|
||||
|
||||
/* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */
|
||||
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
||||
gpa_t addr, u32 access,
|
||||
gpa_t addr, u64 access,
|
||||
struct x86_exception *exception)
|
||||
{
|
||||
struct guest_walker walker;
|
||||
|
@ -14,21 +14,24 @@ static bool __read_mostly tdp_mmu_enabled = true;
|
||||
module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
|
||||
|
||||
/* Initializes the TDP MMU for the VM, if enabled. */
|
||||
bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
|
||||
int kvm_mmu_init_tdp_mmu(struct kvm *kvm)
|
||||
{
|
||||
struct workqueue_struct *wq;
|
||||
|
||||
if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
|
||||
return false;
|
||||
return 0;
|
||||
|
||||
wq = alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE, 0);
|
||||
if (!wq)
|
||||
return -ENOMEM;
|
||||
|
||||
/* This should not be changed for the lifetime of the VM. */
|
||||
kvm->arch.tdp_mmu_enabled = true;
|
||||
|
||||
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
|
||||
spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
|
||||
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
|
||||
kvm->arch.tdp_mmu_zap_wq =
|
||||
alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE, 0);
|
||||
|
||||
return true;
|
||||
kvm->arch.tdp_mmu_zap_wq = wq;
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Arbitrarily returns true so that this may be used in if statements. */
|
||||
@ -906,10 +909,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
|
||||
}
|
||||
|
||||
/*
|
||||
* Tears down the mappings for the range of gfns, [start, end), and frees the
|
||||
* non-root pages mapping GFNs strictly within that range. Returns true if
|
||||
* SPTEs have been cleared and a TLB flush is needed before releasing the
|
||||
* MMU lock.
|
||||
* Zap leafs SPTEs for the range of gfns, [start, end). Returns true if SPTEs
|
||||
* have been cleared and a TLB flush is needed before releasing the MMU lock.
|
||||
*
|
||||
* If can_yield is true, will release the MMU lock and reschedule if the
|
||||
* scheduler needs the CPU or there is contention on the MMU lock. If this
|
||||
@ -917,42 +918,25 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
|
||||
* the caller must ensure it does not supply too large a GFN range, or the
|
||||
* operation can cause a soft lockup.
|
||||
*/
|
||||
static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
|
||||
gfn_t start, gfn_t end, bool can_yield, bool flush)
|
||||
static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
|
||||
gfn_t start, gfn_t end, bool can_yield, bool flush)
|
||||
{
|
||||
bool zap_all = (start == 0 && end >= tdp_mmu_max_gfn_host());
|
||||
struct tdp_iter iter;
|
||||
|
||||
/*
|
||||
* No need to try to step down in the iterator when zapping all SPTEs,
|
||||
* zapping the top-level non-leaf SPTEs will recurse on their children.
|
||||
*/
|
||||
int min_level = zap_all ? root->role.level : PG_LEVEL_4K;
|
||||
|
||||
end = min(end, tdp_mmu_max_gfn_host());
|
||||
|
||||
lockdep_assert_held_write(&kvm->mmu_lock);
|
||||
|
||||
rcu_read_lock();
|
||||
|
||||
for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
|
||||
for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
|
||||
if (can_yield &&
|
||||
tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
|
||||
flush = false;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (!is_shadow_present_pte(iter.old_spte))
|
||||
continue;
|
||||
|
||||
/*
|
||||
* If this is a non-last-level SPTE that covers a larger range
|
||||
* than should be zapped, continue, and zap the mappings at a
|
||||
* lower level, except when zapping all SPTEs.
|
||||
*/
|
||||
if (!zap_all &&
|
||||
(iter.gfn < start ||
|
||||
iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
|
||||
if (!is_shadow_present_pte(iter.old_spte) ||
|
||||
!is_last_spte(iter.old_spte, iter.level))
|
||||
continue;
|
||||
|
||||
@ -960,17 +944,13 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
|
||||
flush = true;
|
||||
}
|
||||
|
||||
/*
|
||||
* Need to flush before releasing RCU. TODO: do it only if intermediate
|
||||
* page tables were zapped; there is no need to flush under RCU protection
|
||||
* if no 'struct kvm_mmu_page' is freed.
|
||||
*/
|
||||
if (flush)
|
||||
kvm_flush_remote_tlbs_with_address(kvm, start, end - start);
|
||||
|
||||
rcu_read_unlock();
|
||||
|
||||
return false;
|
||||
/*
|
||||
* Because this flow zaps _only_ leaf SPTEs, the caller doesn't need
|
||||
* to provide RCU protection as no 'struct kvm_mmu_page' will be freed.
|
||||
*/
|
||||
return flush;
|
||||
}
|
||||
|
||||
/*
|
||||
@ -979,13 +959,13 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
|
||||
* SPTEs have been cleared and a TLB flush is needed before releasing the
|
||||
* MMU lock.
|
||||
*/
|
||||
bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
|
||||
gfn_t end, bool can_yield, bool flush)
|
||||
bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
|
||||
bool can_yield, bool flush)
|
||||
{
|
||||
struct kvm_mmu_page *root;
|
||||
|
||||
for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
|
||||
flush = zap_gfn_range(kvm, root, start, end, can_yield, flush);
|
||||
flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
|
||||
|
||||
return flush;
|
||||
}
|
||||
@ -1233,8 +1213,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
|
||||
bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
|
||||
bool flush)
|
||||
{
|
||||
return __kvm_tdp_mmu_zap_gfn_range(kvm, range->slot->as_id, range->start,
|
||||
range->end, range->may_block, flush);
|
||||
return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start,
|
||||
range->end, range->may_block, flush);
|
||||
}
|
||||
|
||||
typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
|
||||
|
@ -15,14 +15,8 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
|
||||
void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
|
||||
bool shared);
|
||||
|
||||
bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
|
||||
bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start,
|
||||
gfn_t end, bool can_yield, bool flush);
|
||||
static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
|
||||
gfn_t start, gfn_t end, bool flush)
|
||||
{
|
||||
return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush);
|
||||
}
|
||||
|
||||
bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
|
||||
void kvm_tdp_mmu_zap_all(struct kvm *kvm);
|
||||
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
|
||||
@ -72,7 +66,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr,
|
||||
u64 *spte);
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
bool kvm_mmu_init_tdp_mmu(struct kvm *kvm);
|
||||
int kvm_mmu_init_tdp_mmu(struct kvm *kvm);
|
||||
void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
|
||||
static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
|
||||
|
||||
@ -93,7 +87,7 @@ static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
|
||||
return sp && is_tdp_mmu_page(sp) && sp->root_count;
|
||||
}
|
||||
#else
|
||||
static inline bool kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return false; }
|
||||
static inline int kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return 0; }
|
||||
static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
|
||||
static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
|
||||
static inline bool is_tdp_mmu(struct kvm_mmu *mmu) { return false; }
|
||||
|
@ -96,8 +96,7 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
|
||||
|
||||
static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
|
||||
u64 config, bool exclude_user,
|
||||
bool exclude_kernel, bool intr,
|
||||
bool in_tx, bool in_tx_cp)
|
||||
bool exclude_kernel, bool intr)
|
||||
{
|
||||
struct perf_event *event;
|
||||
struct perf_event_attr attr = {
|
||||
@ -116,16 +115,14 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
|
||||
|
||||
attr.sample_period = get_sample_period(pmc, pmc->counter);
|
||||
|
||||
if (in_tx)
|
||||
attr.config |= HSW_IN_TX;
|
||||
if (in_tx_cp) {
|
||||
if ((attr.config & HSW_IN_TX_CHECKPOINTED) &&
|
||||
guest_cpuid_is_intel(pmc->vcpu)) {
|
||||
/*
|
||||
* HSW_IN_TX_CHECKPOINTED is not supported with nonzero
|
||||
* period. Just clear the sample period so at least
|
||||
* allocating the counter doesn't fail.
|
||||
*/
|
||||
attr.sample_period = 0;
|
||||
attr.config |= HSW_IN_TX_CHECKPOINTED;
|
||||
}
|
||||
|
||||
event = perf_event_create_kernel_counter(&attr, -1, current,
|
||||
@ -185,6 +182,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
|
||||
u32 type = PERF_TYPE_RAW;
|
||||
struct kvm *kvm = pmc->vcpu->kvm;
|
||||
struct kvm_pmu_event_filter *filter;
|
||||
struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu);
|
||||
bool allow_event = true;
|
||||
|
||||
if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
|
||||
@ -221,7 +219,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
|
||||
}
|
||||
|
||||
if (type == PERF_TYPE_RAW)
|
||||
config = eventsel & AMD64_RAW_EVENT_MASK;
|
||||
config = eventsel & pmu->raw_event_mask;
|
||||
|
||||
if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
|
||||
return;
|
||||
@ -232,9 +230,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
|
||||
pmc_reprogram_counter(pmc, type, config,
|
||||
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
|
||||
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
|
||||
eventsel & ARCH_PERFMON_EVENTSEL_INT,
|
||||
(eventsel & HSW_IN_TX),
|
||||
(eventsel & HSW_IN_TX_CHECKPOINTED));
|
||||
eventsel & ARCH_PERFMON_EVENTSEL_INT);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(reprogram_gp_counter);
|
||||
|
||||
@ -270,7 +266,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
|
||||
kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc),
|
||||
!(en_field & 0x2), /* exclude user */
|
||||
!(en_field & 0x1), /* exclude kernel */
|
||||
pmi, false, false);
|
||||
pmi);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
|
||||
|
||||
|
@ -726,7 +726,7 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
|
||||
{
|
||||
struct kvm_kernel_irq_routing_entry *e;
|
||||
struct kvm_irq_routing_table *irq_rt;
|
||||
int idx, ret = -EINVAL;
|
||||
int idx, ret = 0;
|
||||
|
||||
if (!kvm_arch_has_assigned_device(kvm) ||
|
||||
!irq_remapping_cap(IRQ_POSTING_CAP))
|
||||
@ -737,7 +737,13 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
|
||||
|
||||
idx = srcu_read_lock(&kvm->irq_srcu);
|
||||
irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
|
||||
WARN_ON(guest_irq >= irq_rt->nr_rt_entries);
|
||||
|
||||
if (guest_irq >= irq_rt->nr_rt_entries ||
|
||||
hlist_empty(&irq_rt->map[guest_irq])) {
|
||||
pr_warn_once("no route for guest_irq %u/%u (broken user space?)\n",
|
||||
guest_irq, irq_rt->nr_rt_entries);
|
||||
goto out;
|
||||
}
|
||||
|
||||
hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
|
||||
struct vcpu_data vcpu_info;
|
||||
@ -822,7 +828,7 @@ out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
bool avic_check_apicv_inhibit_reasons(ulong bit)
|
||||
bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
|
||||
{
|
||||
ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
|
||||
BIT(APICV_INHIBIT_REASON_ABSENT) |
|
||||
@ -833,7 +839,7 @@ bool avic_check_apicv_inhibit_reasons(ulong bit)
|
||||
BIT(APICV_INHIBIT_REASON_X2APIC) |
|
||||
BIT(APICV_INHIBIT_REASON_BLOCKIRQ);
|
||||
|
||||
return supported & BIT(bit);
|
||||
return supported & BIT(reason);
|
||||
}
|
||||
|
||||
|
||||
|
@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
/* MSR_EVNTSELn */
|
||||
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
|
||||
if (pmc) {
|
||||
if (data == pmc->eventsel)
|
||||
return 0;
|
||||
if (!(data & pmu->reserved_bits)) {
|
||||
data &= ~pmu->reserved_bits;
|
||||
if (data != pmc->eventsel)
|
||||
reprogram_gp_counter(pmc, data);
|
||||
return 0;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
return 1;
|
||||
@ -284,6 +282,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
|
||||
|
||||
pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
|
||||
pmu->reserved_bits = 0xfffffff000280000ull;
|
||||
pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
|
||||
pmu->version = 1;
|
||||
/* not applicable to AMD; but clean them to prevent any fall out */
|
||||
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
|
||||
|
@ -62,20 +62,8 @@ MODULE_DEVICE_TABLE(x86cpu, svm_cpu_id);
|
||||
#define SEG_TYPE_LDT 2
|
||||
#define SEG_TYPE_BUSY_TSS16 3
|
||||
|
||||
#define SVM_FEATURE_LBRV (1 << 1)
|
||||
#define SVM_FEATURE_SVML (1 << 2)
|
||||
#define SVM_FEATURE_TSC_RATE (1 << 4)
|
||||
#define SVM_FEATURE_VMCB_CLEAN (1 << 5)
|
||||
#define SVM_FEATURE_FLUSH_ASID (1 << 6)
|
||||
#define SVM_FEATURE_DECODE_ASSIST (1 << 7)
|
||||
#define SVM_FEATURE_PAUSE_FILTER (1 << 10)
|
||||
|
||||
#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
|
||||
|
||||
#define TSC_RATIO_RSVD 0xffffff0000000000ULL
|
||||
#define TSC_RATIO_MIN 0x0000000000000001ULL
|
||||
#define TSC_RATIO_MAX 0x000000ffffffffffULL
|
||||
|
||||
static bool erratum_383_found __read_mostly;
|
||||
|
||||
u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
|
||||
@ -87,7 +75,6 @@ u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
|
||||
static uint64_t osvw_len = 4, osvw_status;
|
||||
|
||||
static DEFINE_PER_CPU(u64, current_tsc_ratio);
|
||||
#define TSC_RATIO_DEFAULT 0x0100000000ULL
|
||||
|
||||
static const struct svm_direct_access_msrs {
|
||||
u32 index; /* Index of the MSR */
|
||||
@ -480,7 +467,7 @@ static void svm_hardware_disable(void)
|
||||
{
|
||||
/* Make sure we clean up behind us */
|
||||
if (tsc_scaling)
|
||||
wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
|
||||
wrmsrl(MSR_AMD64_TSC_RATIO, SVM_TSC_RATIO_DEFAULT);
|
||||
|
||||
cpu_svm_disable();
|
||||
|
||||
@ -526,8 +513,8 @@ static int svm_hardware_enable(void)
|
||||
* Set the default value, even if we don't use TSC scaling
|
||||
* to avoid having stale value in the msr
|
||||
*/
|
||||
wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
|
||||
__this_cpu_write(current_tsc_ratio, TSC_RATIO_DEFAULT);
|
||||
wrmsrl(MSR_AMD64_TSC_RATIO, SVM_TSC_RATIO_DEFAULT);
|
||||
__this_cpu_write(current_tsc_ratio, SVM_TSC_RATIO_DEFAULT);
|
||||
}
|
||||
|
||||
|
||||
@ -2723,7 +2710,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
|
||||
break;
|
||||
}
|
||||
|
||||
if (data & TSC_RATIO_RSVD)
|
||||
if (data & SVM_TSC_RATIO_RSVD)
|
||||
return 1;
|
||||
|
||||
svm->tsc_ratio_msr = data;
|
||||
@ -2918,7 +2905,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
|
||||
* In this case AVIC was temporarily disabled for
|
||||
* requesting the IRQ window and we have to re-enable it.
|
||||
*/
|
||||
kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
|
||||
kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
|
||||
|
||||
++vcpu->stat.irq_window_exits;
|
||||
return 1;
|
||||
@ -3516,7 +3503,7 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
|
||||
* via AVIC. In such case, we need to temporarily disable AVIC,
|
||||
* and fallback to injecting IRQ via V_IRQ.
|
||||
*/
|
||||
kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
|
||||
kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
|
||||
svm_set_vintr(svm);
|
||||
}
|
||||
}
|
||||
@ -3948,6 +3935,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_svm *svm = to_svm(vcpu);
|
||||
struct kvm_cpuid_entry2 *best;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
|
||||
vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
|
||||
boot_cpu_has(X86_FEATURE_XSAVE) &&
|
||||
@ -3974,16 +3962,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
|
||||
* is exposed to the guest, disable AVIC.
|
||||
*/
|
||||
if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
|
||||
kvm_request_apicv_update(vcpu->kvm, false,
|
||||
APICV_INHIBIT_REASON_X2APIC);
|
||||
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_X2APIC);
|
||||
|
||||
/*
|
||||
* Currently, AVIC does not work with nested virtualization.
|
||||
* So, we disable AVIC when cpuid for SVM is set in the L1 guest.
|
||||
*/
|
||||
if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
|
||||
kvm_request_apicv_update(vcpu->kvm, false,
|
||||
APICV_INHIBIT_REASON_NESTED);
|
||||
kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_NESTED);
|
||||
}
|
||||
init_vmcb_after_set_cpuid(vcpu);
|
||||
}
|
||||
@ -4766,10 +4752,10 @@ static __init int svm_hardware_setup(void)
|
||||
} else {
|
||||
pr_info("TSC scaling supported\n");
|
||||
kvm_has_tsc_control = true;
|
||||
kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
|
||||
kvm_tsc_scaling_ratio_frac_bits = 32;
|
||||
}
|
||||
}
|
||||
kvm_max_tsc_scaling_ratio = SVM_TSC_RATIO_MAX;
|
||||
kvm_tsc_scaling_ratio_frac_bits = 32;
|
||||
|
||||
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
|
||||
|
||||
|
@ -22,6 +22,8 @@
|
||||
#include <asm/svm.h>
|
||||
#include <asm/sev-common.h>
|
||||
|
||||
#include "kvm_cache_regs.h"
|
||||
|
||||
#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
|
||||
|
||||
#define IOPM_SIZE PAGE_SIZE * 3
|
||||
@ -569,17 +571,6 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
|
||||
|
||||
/* avic.c */
|
||||
|
||||
#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFF)
|
||||
#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT 31
|
||||
#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
|
||||
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK GENMASK_ULL(11, 0)
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK (0xFFFFFFFFFFULL << 12)
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62)
|
||||
#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
|
||||
|
||||
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL
|
||||
|
||||
int avic_ga_log_notifier(u32 ga_tag);
|
||||
void avic_vm_destroy(struct kvm *kvm);
|
||||
int avic_vm_init(struct kvm *kvm);
|
||||
@ -592,7 +583,7 @@ void __avic_vcpu_put(struct kvm_vcpu *vcpu);
|
||||
void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
|
||||
void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
|
||||
void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
|
||||
bool avic_check_apicv_inhibit_reasons(ulong bit);
|
||||
bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
|
||||
void avic_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
|
||||
void avic_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
|
||||
bool avic_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu);
|
||||
|
@ -4,7 +4,6 @@
|
||||
*/
|
||||
|
||||
#include <linux/kvm_host.h>
|
||||
#include "kvm_cache_regs.h"
|
||||
|
||||
#include <asm/mshyperv.h>
|
||||
|
||||
|
@ -1339,23 +1339,25 @@ TRACE_EVENT(kvm_hv_stimer_cleanup,
|
||||
__entry->vcpu_id, __entry->timer_index)
|
||||
);
|
||||
|
||||
TRACE_EVENT(kvm_apicv_update_request,
|
||||
TP_PROTO(bool activate, unsigned long bit),
|
||||
TP_ARGS(activate, bit),
|
||||
TRACE_EVENT(kvm_apicv_inhibit_changed,
|
||||
TP_PROTO(int reason, bool set, unsigned long inhibits),
|
||||
TP_ARGS(reason, set, inhibits),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(bool, activate)
|
||||
__field(unsigned long, bit)
|
||||
__field(int, reason)
|
||||
__field(bool, set)
|
||||
__field(unsigned long, inhibits)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->activate = activate;
|
||||
__entry->bit = bit;
|
||||
__entry->reason = reason;
|
||||
__entry->set = set;
|
||||
__entry->inhibits = inhibits;
|
||||
),
|
||||
|
||||
TP_printk("%s bit=%lu",
|
||||
__entry->activate ? "activate" : "deactivate",
|
||||
__entry->bit)
|
||||
TP_printk("%s reason=%u, inhibits=0x%lx",
|
||||
__entry->set ? "set" : "cleared",
|
||||
__entry->reason, __entry->inhibits)
|
||||
);
|
||||
|
||||
TRACE_EVENT(kvm_apicv_accept_irq,
|
||||
|
@ -389,6 +389,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
struct kvm_pmc *pmc;
|
||||
u32 msr = msr_info->index;
|
||||
u64 data = msr_info->data;
|
||||
u64 reserved_bits;
|
||||
|
||||
switch (msr) {
|
||||
case MSR_CORE_PERF_FIXED_CTR_CTRL:
|
||||
@ -443,7 +444,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
} else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) {
|
||||
if (data == pmc->eventsel)
|
||||
return 0;
|
||||
if (!(data & pmu->reserved_bits)) {
|
||||
reserved_bits = pmu->reserved_bits;
|
||||
if ((pmc->idx == 2) &&
|
||||
(pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED))
|
||||
reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
|
||||
if (!(data & reserved_bits)) {
|
||||
reprogram_gp_counter(pmc, data);
|
||||
return 0;
|
||||
}
|
||||
@ -485,6 +490,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
|
||||
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
|
||||
pmu->version = 0;
|
||||
pmu->reserved_bits = 0xffffffff00200000ull;
|
||||
pmu->raw_event_mask = X86_RAW_EVENT_MASK;
|
||||
|
||||
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
|
||||
if (!entry || !vcpu->kvm->arch.enable_pmu)
|
||||
@ -533,8 +539,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
|
||||
entry = kvm_find_cpuid_entry(vcpu, 7, 0);
|
||||
if (entry &&
|
||||
(boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
|
||||
(entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
|
||||
pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
|
||||
(entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) {
|
||||
pmu->reserved_bits ^= HSW_IN_TX;
|
||||
pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
|
||||
}
|
||||
|
||||
bitmap_set(pmu->all_valid_pmc_idx,
|
||||
0, pmu->nr_arch_gp_counters);
|
||||
|
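The TSX fix above splits the two Haswell TSX bits: HSW_IN_TX is removed from the reserved-bit mask only when the guest advertises HLE/RTM, while HSW_IN_TX_CHECKPOINTED is tolerated solely on GP counter 2 and only when the raw event mask allows it. A hedged sketch of the resulting eventsel validity check (the standalone helper and its name are illustrative; the bit handling follows the hunk above):

static bool example_gp_eventsel_is_valid(struct kvm_pmu *pmu,
					 struct kvm_pmc *pmc, u64 data)
{
	u64 reserved_bits = pmu->reserved_bits;

	/* IN_TXCP is only legal on PMC2, and only if TSX events are exposed. */
	if (pmc->idx == 2 && (pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED))
		reserved_bits ^= HSW_IN_TX_CHECKPOINTED;

	return !(data & reserved_bits);
}
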
@ -2866,21 +2866,17 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmx_uret_msr *msr = vmx_find_uret_msr(vmx, MSR_EFER);

/* Nothing to do if hardware doesn't support EFER. */
if (!msr)
if (!vmx_find_uret_msr(vmx, MSR_EFER))
return 0;

vcpu->arch.efer = efer;
if (efer & EFER_LMA) {
vm_entry_controls_setbit(to_vmx(vcpu), VM_ENTRY_IA32E_MODE);
msr->data = efer;
} else {
vm_entry_controls_clearbit(to_vmx(vcpu), VM_ENTRY_IA32E_MODE);
if (efer & EFER_LMA)
vm_entry_controls_setbit(vmx, VM_ENTRY_IA32E_MODE);
else
vm_entry_controls_clearbit(vmx, VM_ENTRY_IA32E_MODE);

msr->data = efer & ~EFER_LME;
}
vmx_setup_uret_msrs(vmx);
return 0;
}
@ -2906,7 +2902,6 @@ static void enter_lmode(struct kvm_vcpu *vcpu)

static void exit_lmode(struct kvm_vcpu *vcpu)
{
vm_entry_controls_clearbit(to_vmx(vcpu), VM_ENTRY_IA32E_MODE);
vmx_set_efer(vcpu, vcpu->arch.efer & ~EFER_LMA);
}

@ -7705,14 +7700,14 @@ static void vmx_hardware_unsetup(void)
free_kvm_area();
}

static bool vmx_check_apicv_inhibit_reasons(ulong bit)
static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
{
ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
BIT(APICV_INHIBIT_REASON_ABSENT) |
BIT(APICV_INHIBIT_REASON_HYPERV) |
BIT(APICV_INHIBIT_REASON_BLOCKIRQ);

return supported & BIT(bit);
return supported & BIT(reason);
}

static struct kvm_x86_ops vmx_x86_ops __initdata = {
@ -7980,12 +7975,11 @@ static __init int hardware_setup(void)
if (!enable_apicv)
vmx_x86_ops.sync_pir_to_irr = NULL;

if (cpu_has_vmx_tsc_scaling()) {
if (cpu_has_vmx_tsc_scaling())
kvm_has_tsc_control = true;
kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
kvm_tsc_scaling_ratio_frac_bits = 48;
}

kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
kvm_tsc_scaling_ratio_frac_bits = 48;
kvm_has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();

set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */

@ -1748,9 +1748,6 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
{
struct msr_data msr;

if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
return KVM_MSR_RET_FILTERED;

switch (index) {
case MSR_FS_BASE:
case MSR_GS_BASE:
@ -1832,9 +1829,6 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
struct msr_data msr;
int ret;

if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
return KVM_MSR_RET_FILTERED;

switch (index) {
case MSR_TSC_AUX:
if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
@ -1871,6 +1865,20 @@ static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
return ret;
}

static int kvm_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 *data)
{
if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
return KVM_MSR_RET_FILTERED;
return kvm_get_msr_ignored_check(vcpu, index, data, false);
}

static int kvm_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 index, u64 data)
{
if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
return KVM_MSR_RET_FILTERED;
return kvm_set_msr_ignored_check(vcpu, index, data, false);
}

int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
{
return kvm_get_msr_ignored_check(vcpu, index, data, false);
@ -1953,7 +1961,7 @@ int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
u64 data;
int r;

r = kvm_get_msr(vcpu, ecx, &data);
r = kvm_get_msr_with_filter(vcpu, ecx, &data);

if (!r) {
trace_kvm_msr_read(ecx, data);
@ -1978,7 +1986,7 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
u64 data = kvm_read_edx_eax(vcpu);
int r;

r = kvm_set_msr(vcpu, ecx, data);
r = kvm_set_msr_with_filter(vcpu, ecx, data);

if (!r) {
trace_kvm_msr_write(ecx, data);
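These hunks introduce filtered MSR accessors and switch the RDMSR/WRMSR exit handlers over to them, so the userspace MSR filter is consulted only for accesses the guest itself performs; KVM-internal reads and writes keep using kvm_get_msr()/kvm_set_msr() and can never fail with KVM_MSR_RET_FILTERED. A hedged sketch of the resulting split (the helper and its guest_initiated flag are illustrative only):

static int example_read_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
			    bool guest_initiated)
{
	/* Guest rdmsr: may be rejected by the userspace MSR filter. */
	if (guest_initiated)
		return kvm_get_msr_with_filter(vcpu, index, data);

	/* KVM-internal access: never filtered. */
	return kvm_get_msr(vcpu, index, data);
}
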
@ -5938,7 +5946,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
smp_wmb();
kvm->arch.irqchip_mode = KVM_IRQCHIP_SPLIT;
kvm->arch.nr_reserved_ioapic_pins = cap->args[0];
kvm_request_apicv_update(kvm, true, APICV_INHIBIT_REASON_ABSENT);
kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_ABSENT);
r = 0;
split_irqchip_unlock:
mutex_unlock(&kvm->lock);
@ -6335,7 +6343,7 @@ set_identity_unlock:
/* Write kvm->irq_routing before enabling irqchip_in_kernel. */
smp_wmb();
kvm->arch.irqchip_mode = KVM_IRQCHIP_KERNEL;
kvm_request_apicv_update(kvm, true, APICV_INHIBIT_REASON_ABSENT);
kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_ABSENT);
create_irqchip_unlock:
mutex_unlock(&kvm->lock);
break;
@ -6726,7 +6734,7 @@ void kvm_get_segment(struct kvm_vcpu *vcpu,
static_call(kvm_x86_get_segment)(vcpu, var, seg);
}

gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access,
struct x86_exception *exception)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
@ -6746,7 +6754,7 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
u64 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
}
EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
@ -6756,7 +6764,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
u64 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
access |= PFERR_FETCH_MASK;
return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
}
@ -6766,7 +6774,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
u64 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
access |= PFERR_WRITE_MASK;
return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
}
@ -6782,7 +6790,7 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
}

static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
struct kvm_vcpu *vcpu, u32 access,
struct kvm_vcpu *vcpu, u64 access,
struct x86_exception *exception)
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
@ -6819,7 +6827,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
u64 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
unsigned offset;
int ret;

@ -6844,7 +6852,7 @@ int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception)
{
u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
u64 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;

/*
* FIXME: this should call handle_emulation_failure if X86EMUL_IO_NEEDED
@ -6863,9 +6871,11 @@ static int emulator_read_std(struct x86_emulate_ctxt *ctxt,
struct x86_exception *exception, bool system)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
u32 access = 0;
u64 access = 0;

if (!system && static_call(kvm_x86_get_cpl)(vcpu) == 3)
if (system)
access |= PFERR_IMPLICIT_ACCESS;
else if (static_call(kvm_x86_get_cpl)(vcpu) == 3)
access |= PFERR_USER_MASK;

return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, exception);
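emulator_read_std() (and emulator_write_std() below) now tag accesses the CPU performs implicitly on the guest's behalf, such as descriptor-table reads during emulation, with PFERR_IMPLICIT_ACCESS instead of merely leaving the user bit clear, so the SMAP logic can treat them as supervisor accesses. The widening of the access variables from u32 to u64 throughout these hunks exists because that flag is a KVM-internal error-code bit above bit 31; the exact position below is an assumption for illustration, not taken from this diff:

/* Assumed for illustration: a software-defined page-fault error-code bit,
 * outside the architectural low 32 bits, marking implicit supervisor
 * accesses so SMAP checks can tell them apart from explicit user accesses. */
#define PFERR_IMPLICIT_ACCESS_EXAMPLE	BIT_ULL(48)
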
@ -6881,7 +6891,7 @@ static int kvm_read_guest_phys_system(struct x86_emulate_ctxt *ctxt,
}

static int kvm_write_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
struct kvm_vcpu *vcpu, u32 access,
struct kvm_vcpu *vcpu, u64 access,
struct x86_exception *exception)
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
@ -6915,9 +6925,11 @@ static int emulator_write_std(struct x86_emulate_ctxt *ctxt, gva_t addr, void *v
bool system)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
u32 access = PFERR_WRITE_MASK;
u64 access = PFERR_WRITE_MASK;

if (!system && static_call(kvm_x86_get_cpl)(vcpu) == 3)
if (system)
access |= PFERR_IMPLICIT_ACCESS;
else if (static_call(kvm_x86_get_cpl)(vcpu) == 3)
access |= PFERR_USER_MASK;

return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
@ -6984,7 +6996,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
bool write)
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
u32 access = ((static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0)
u64 access = ((static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0)
| (write ? PFERR_WRITE_MASK : 0);

/*
@ -7627,13 +7639,13 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
return;
}

static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
static int emulator_get_msr_with_filter(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
int r;

r = kvm_get_msr(vcpu, msr_index, pdata);
r = kvm_get_msr_with_filter(vcpu, msr_index, pdata);

if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
complete_emulated_rdmsr, r)) {
@ -7644,13 +7656,13 @@ static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
return r;
}

static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 data)
static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 data)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
int r;

r = kvm_set_msr(vcpu, msr_index, data);
r = kvm_set_msr_with_filter(vcpu, msr_index, data);

if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
complete_emulated_msr_access, r)) {
@ -7661,6 +7673,18 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
return r;
}

static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
}

static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 data)
{
return kvm_set_msr(emul_to_vcpu(ctxt), msr_index, data);
}

static u64 emulator_get_smbase(struct x86_emulate_ctxt *ctxt)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
@ -7724,6 +7748,11 @@ static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt)
return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
}

static bool emulator_guest_has_rdpid(struct x86_emulate_ctxt *ctxt)
{
return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
}

static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)
{
return kvm_register_read_raw(emul_to_vcpu(ctxt), reg);
@ -7794,6 +7823,8 @@ static const struct x86_emulate_ops emulate_ops = {
.set_dr = emulator_set_dr,
.get_smbase = emulator_get_smbase,
.set_smbase = emulator_set_smbase,
.set_msr_with_filter = emulator_set_msr_with_filter,
.get_msr_with_filter = emulator_get_msr_with_filter,
.set_msr = emulator_set_msr,
.get_msr = emulator_get_msr,
.check_pmc = emulator_check_pmc,
@ -7806,6 +7837,7 @@ static const struct x86_emulate_ops emulate_ops = {
.guest_has_long_mode = emulator_guest_has_long_mode,
.guest_has_movbe = emulator_guest_has_movbe,
.guest_has_fxsr = emulator_guest_has_fxsr,
.guest_has_rdpid = emulator_guest_has_rdpid,
.set_nmi_mask = emulator_set_nmi_mask,
.get_hflags = emulator_get_hflags,
.exiting_smm = emulator_exiting_smm,
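The emulator now carries both filtered hooks (backing emulated RDMSR/WRMSR) and plain ->get_msr()/->set_msr() hooks for MSR accesses it performs internally, plus a ->guest_has_rdpid() capability check. Together these let RDPID emulation read MSR_TSC_AUX without being subject to the userspace MSR filter, and reject the instruction when the guest's CPUID lacks RDPID. A hedged sketch of such an emulator-side user (simplified; not the exact emulate.c function):

static int example_em_rdpid(struct x86_emulate_ctxt *ctxt)
{
	u64 tsc_aux = 0;

	if (!ctxt->ops->guest_has_rdpid(ctxt))
		return emulate_ud(ctxt);

	/* Internal access: uses the unfiltered ->get_msr() hook. */
	if (ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux))
		return emulate_ud(ctxt);

	ctxt->dst.val = tsc_aux;
	return X86EMUL_CONTINUE;
}
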
@ -9058,15 +9090,29 @@ bool kvm_apicv_activated(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(kvm_apicv_activated);

static void set_or_clear_apicv_inhibit(unsigned long *inhibits,
enum kvm_apicv_inhibit reason, bool set)
{
if (set)
__set_bit(reason, inhibits);
else
__clear_bit(reason, inhibits);

trace_kvm_apicv_inhibit_changed(reason, set, *inhibits);
}

static void kvm_apicv_init(struct kvm *kvm)
{
unsigned long *inhibits = &kvm->arch.apicv_inhibit_reasons;

init_rwsem(&kvm->arch.apicv_update_lock);

set_bit(APICV_INHIBIT_REASON_ABSENT,
&kvm->arch.apicv_inhibit_reasons);
set_or_clear_apicv_inhibit(inhibits, APICV_INHIBIT_REASON_ABSENT, true);

if (!enable_apicv)
set_bit(APICV_INHIBIT_REASON_DISABLE,
&kvm->arch.apicv_inhibit_reasons);
set_or_clear_apicv_inhibit(inhibits,
APICV_INHIBIT_REASON_ABSENT, true);
}

static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
@ -9740,24 +9786,21 @@ out:
}
EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);

void __kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason, bool set)
{
unsigned long old, new;

lockdep_assert_held_write(&kvm->arch.apicv_update_lock);

if (!static_call(kvm_x86_check_apicv_inhibit_reasons)(bit))
if (!static_call(kvm_x86_check_apicv_inhibit_reasons)(reason))
return;

old = new = kvm->arch.apicv_inhibit_reasons;

if (activate)
__clear_bit(bit, &new);
else
__set_bit(bit, &new);
set_or_clear_apicv_inhibit(&new, reason, set);

if (!!old != !!new) {
trace_kvm_apicv_update_request(activate, bit);
/*
* Kick all vCPUs before setting apicv_inhibit_reasons to avoid
* false positives in the sanity check WARN in svm_vcpu_run().
@ -9776,20 +9819,22 @@ void __kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
kvm_zap_gfn_range(kvm, gfn, gfn+1);
}
} else
} else {
kvm->arch.apicv_inhibit_reasons = new;
}
}

void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason, bool set)
{
if (!enable_apicv)
return;

down_write(&kvm->arch.apicv_update_lock);
__kvm_request_apicv_update(kvm, activate, bit);
__kvm_set_or_clear_apicv_inhibit(kvm, reason, set);
up_write(&kvm->arch.apicv_update_lock);
}
EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
EXPORT_SYMBOL_GPL(kvm_set_or_clear_apicv_inhibit);

static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
{
|
||||
|
||||
static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm *kvm)
|
||||
{
|
||||
bool inhibit = false;
|
||||
bool set = false;
|
||||
struct kvm_vcpu *vcpu;
|
||||
unsigned long i;
|
||||
|
||||
@ -10945,11 +10990,11 @@ static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm *kvm)
|
||||
|
||||
kvm_for_each_vcpu(i, vcpu, kvm) {
|
||||
if (vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ) {
|
||||
inhibit = true;
|
||||
set = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
__kvm_request_apicv_update(kvm, !inhibit, APICV_INHIBIT_REASON_BLOCKIRQ);
|
||||
__kvm_set_or_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_BLOCKIRQ, set);
|
||||
up_write(&kvm->arch.apicv_update_lock);
|
||||
}
|
||||
|
||||
@ -11557,10 +11602,8 @@ int kvm_arch_hardware_setup(void *opaque)
|
||||
u64 max = min(0x7fffffffULL,
|
||||
__scale_tsc(kvm_max_tsc_scaling_ratio, tsc_khz));
|
||||
kvm_max_guest_tsc_khz = max;
|
||||
|
||||
kvm_default_tsc_scaling_ratio = 1ULL << kvm_tsc_scaling_ratio_frac_bits;
|
||||
}
|
||||
|
||||
kvm_default_tsc_scaling_ratio = 1ULL << kvm_tsc_scaling_ratio_frac_bits;
|
||||
kvm_init_msr_list();
|
||||
return 0;
|
||||
}
|
||||
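Moving the kvm_default_tsc_scaling_ratio assignment out of the TSC-control branch (together with the matching VMX hardware_setup() change earlier) ensures the fixed-point fraction width is always initialized, even when the host cannot scale the TSC. For context, a hedged sketch of how a guest scaling ratio is derived from that fraction width (mirrors the existing kvm_set_tsc_khz() math, simplified):

static u64 example_tsc_scaling_ratio(u64 guest_tsc_khz, u64 host_tsc_khz)
{
	/* Fixed-point ratio: guest_khz / host_khz scaled by 2^frac_bits. */
	return mult_frac(guest_tsc_khz,
			 1ULL << kvm_tsc_scaling_ratio_frac_bits,
			 host_tsc_khz);
}
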
@ -11629,12 +11672,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

ret = kvm_page_track_init(kvm);
if (ret)
return ret;
goto out;

ret = kvm_mmu_init_vm(kvm);
if (ret)
goto out_page_track;

INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
atomic_set(&kvm->arch.noncoherent_dma_count, 0);

@ -11666,10 +11710,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

kvm_apicv_init(kvm);
kvm_hv_init_vm(kvm);
kvm_mmu_init_vm(kvm);
kvm_xen_init_vm(kvm);

return static_call(kvm_x86_vm_init)(kvm);

out_page_track:
kvm_page_track_cleanup(kvm);
out:
return ret;
}

int kvm_arch_post_init_vm(struct kvm *kvm)
@ -12593,7 +12641,7 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
struct x86_exception fault;
u32 access = error_code &
u64 access = error_code &
(PFERR_WRITE_MASK | PFERR_FETCH_MASK | PFERR_USER_MASK);

if (!(error_code & PFERR_PRESENT_MASK) ||
@ -12933,7 +12981,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_accept_irq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);

@ -39,8 +39,8 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
}

do {
ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, false, true,
gpa, PAGE_SIZE, false);
ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, KVM_HOST_USES_PFN,
gpa, PAGE_SIZE);
if (ret)
goto out;

@ -1025,8 +1025,7 @@ static int evtchn_set_fn(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm
break;

idx = srcu_read_lock(&kvm->srcu);
rc = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa,
PAGE_SIZE, false);
rc = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE);
srcu_read_unlock(&kvm->srcu, idx);
} while(!rc);

@ -148,6 +148,7 @@ static inline bool is_error_page(struct page *page)
#define KVM_REQUEST_MASK GENMASK(7,0)
#define KVM_REQUEST_NO_WAKEUP BIT(8)
#define KVM_REQUEST_WAIT BIT(9)
#define KVM_REQUEST_NO_ACTION BIT(10)
/*
* Architecture-independent vcpu->requests bit members
* Bits 4-7 are reserved for more arch-independent bits.
@ -156,9 +157,18 @@ static inline bool is_error_page(struct page *page)
#define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UNBLOCK 2
#define KVM_REQ_UNHALT 3
#define KVM_REQ_GPC_INVALIDATE (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQUEST_ARCH_BASE 8

/*
* KVM_REQ_OUTSIDE_GUEST_MODE exists is purely as way to force the vCPU to
* OUTSIDE_GUEST_MODE. KVM_REQ_OUTSIDE_GUEST_MODE differs from a vCPU "kick"
* in that it ensures the vCPU has reached OUTSIDE_GUEST_MODE before continuing
* on. A kick only guarantees that the vCPU is on its way out, e.g. a previous
* kick may have set vcpu->mode to EXITING_GUEST_MODE, and so there's no
* guarantee the vCPU received an IPI and has actually exited guest mode.
*/
#define KVM_REQ_OUTSIDE_GUEST_MODE (KVM_REQUEST_NO_ACTION | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

#define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
BUILD_BUG_ON((unsigned)(nr) >= (sizeof_field(struct kvm_vcpu, requests) * 8) - KVM_REQUEST_ARCH_BASE); \
(unsigned)(((nr) + KVM_REQUEST_ARCH_BASE) | (flags)); \
@ -1221,27 +1231,27 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
* @gpc: struct gfn_to_pfn_cache object.
* @vcpu: vCPU to be used for marking pages dirty and to be woken on
* invalidation.
* @guest_uses_pa: indicates that the resulting host physical PFN is used while
* @vcpu is IN_GUEST_MODE so invalidations should wake it.
* @kernel_map: requests a kernel virtual mapping (kmap / memremap).
* @usage: indicates if the resulting host physical PFN is used while
* the @vcpu is IN_GUEST_MODE (in which case invalidation of
* the cache from MMU notifiers---but not for KVM memslot
* changes!---will also force @vcpu to exit the guest and
* refresh the cache); and/or if the PFN used directly
* by KVM (and thus needs a kernel virtual mapping).
* @gpa: guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
* @dirty: mark the cache dirty immediately.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
* -EFAULT for an untranslatable guest physical address.
*
* This primes a gfn_to_pfn_cache and links it into the @kvm's list for
* invalidations to be processed. Invalidation callbacks to @vcpu using
* %KVM_REQ_GPC_INVALIDATE will occur only for MMU notifiers, not for KVM
* memslot changes. Callers are required to use kvm_gfn_to_pfn_cache_check()
* to ensure that the cache is valid before accessing the target page.
* invalidations to be processed. Callers are required to use
* kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before
* accessing the target page.
*/
int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
struct kvm_vcpu *vcpu, bool guest_uses_pa,
bool kernel_map, gpa_t gpa, unsigned long len,
bool dirty);
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
gpa_t gpa, unsigned long len);

/**
* kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache.
@ -1250,7 +1260,6 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: current guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
* @dirty: mark the cache dirty immediately.
*
* @return: %true if the cache is still valid and the address matches.
* %false if the cache is not valid.
@ -1272,7 +1281,6 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: updated guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
* @dirty: mark the cache dirty immediately.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
@ -1285,7 +1293,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* with the lock still held to permit access.
*/
int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpa_t gpa, unsigned long len, bool dirty);
gpa_t gpa, unsigned long len);

/**
* kvm_gfn_to_pfn_cache_unmap - temporarily unmap a gfn_to_pfn_cache.
@ -1293,10 +1301,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
*
* This unmaps the referenced page and marks it dirty, if appropriate. The
* cache is left in the invalid state but at least the mapping from GPA to
* userspace HVA will remain cached and can be reused on a subsequent
* refresh.
* This unmaps the referenced page. The cache is left in the invalid state
* but at least the mapping from GPA to userspace HVA will remain cached
* and can be reused on a subsequent refresh.
*/
void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);

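The reworked kerneldoc drops the @dirty parameters and keeps the core contract: callers must validate the cache under its lock and refresh it when the check fails. A hedged sketch of that read-side pattern for a KVM_HOST_USES_PFN cache (the surrounding function and error handling are illustrative; the check/refresh loop mirrors the existing Xen shared-info users):

static int example_access_cached_page(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
{
	read_lock(&gpc->lock);
	while (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpc->gpa, PAGE_SIZE)) {
		read_unlock(&gpc->lock);

		if (kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpc->gpa, PAGE_SIZE))
			return -EFAULT;

		read_lock(&gpc->lock);
	}

	/* gpc->khva is a valid kernel mapping of the guest page here. */

	read_unlock(&gpc->lock);
	return 0;
}
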
@ -1984,7 +1991,7 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)

void kvm_arch_irq_routing_update(struct kvm *kvm);

static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
static inline void __kvm_make_request(int req, struct kvm_vcpu *vcpu)
{
/*
* Ensure the rest of the request is published to kvm_check_request's
@ -1994,6 +2001,19 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
set_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
}

static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
{
/*
* Request that don't require vCPU action should never be logged in
* vcpu->requests. The vCPU won't clear the request, so it will stay
* logged indefinitely and prevent the vCPU from entering the guest.
*/
BUILD_BUG_ON(!__builtin_constant_p(req) ||
(req & KVM_REQUEST_NO_ACTION));

__kvm_make_request(req, vcpu);
}

static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
{
return READ_ONCE(vcpu->requests);

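KVM_REQ_OUTSIDE_GUEST_MODE is a no-action request: kvm_make_request() rejects it at compile time via the BUILD_BUG_ON above, so it can only be raised through __kvm_make_request(), which kvm_make_vcpu_request() in the next file does after checking KVM_REQUEST_NO_ACTION. A hedged sketch of a caller that forces every vCPU out of guest mode and waits (the helper is illustrative; kvm_make_all_cpus_request() is pre-existing infrastructure):

static void example_flush_all_vcpus(struct kvm *kvm)
{
	/* Kicks every vCPU and, thanks to KVM_REQUEST_WAIT, returns only once
	 * each one has reached OUTSIDE_GUEST_MODE; no stale bit is left in
	 * vcpu->requests because the request is KVM_REQUEST_NO_ACTION. */
	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
}
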
@ -18,6 +18,7 @@ struct kvm_memslots;

enum kvm_mr_change;

#include <linux/bits.h>
#include <linux/types.h>
#include <linux/spinlock_types.h>

@ -46,6 +47,12 @@ typedef u64 hfn_t;

typedef hfn_t kvm_pfn_t;

enum pfn_cache_usage {
KVM_GUEST_USES_PFN = BIT(0),
KVM_HOST_USES_PFN = BIT(1),
KVM_GUEST_AND_HOST_USE_PFN = KVM_GUEST_USES_PFN | KVM_HOST_USES_PFN,
};

struct gfn_to_hva_cache {
u64 generation;
gpa_t gpa;
@ -64,11 +71,9 @@ struct gfn_to_pfn_cache {
rwlock_t lock;
void *khva;
kvm_pfn_t pfn;
enum pfn_cache_usage usage;
bool active;
bool valid;
bool dirty;
bool kernel_map;
bool guest_uses_pa;
};

#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE

@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);

static const struct file_operations stat_fops_per_vm;

static struct file_operations kvm_chardev_ops;

static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
unsigned long arg);
#ifdef CONFIG_KVM_COMPAT
@ -251,7 +253,8 @@ static void kvm_make_vcpu_request(struct kvm_vcpu *vcpu, unsigned int req,
{
int cpu;

kvm_make_request(req, vcpu);
if (likely(!(req & KVM_REQUEST_NO_ACTION)))
__kvm_make_request(req, vcpu);

if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
return;
@ -1131,6 +1134,16 @@ static struct kvm *kvm_create_vm(unsigned long type)
preempt_notifier_inc();
kvm_init_pm_notifier(kvm);

/*
* When the fd passed to this ioctl() is opened it pins the module,
* but try_module_get() also prevents getting a reference if the module
* is in MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait").
*/
if (!try_module_get(kvm_chardev_ops.owner)) {
r = -ENODEV;
goto out_err;
}

return kvm;

out_err:
@ -1220,6 +1233,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
preempt_notifier_dec();
hardware_disable_all();
mmdrop(mm);
module_put(kvm_chardev_ops.owner);
}

void kvm_get_kvm(struct kvm *kvm)
@ -3663,7 +3677,7 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
return 0;
}

static struct file_operations kvm_vcpu_fops = {
static const struct file_operations kvm_vcpu_fops = {
.release = kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
.mmap = kvm_vcpu_mmap,
@ -4714,7 +4728,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
}
#endif

static struct file_operations kvm_vm_fops = {
static const struct file_operations kvm_vm_fops = {
.release = kvm_vm_release,
.unlocked_ioctl = kvm_vm_ioctl,
.llseek = noop_llseek,
@ -5721,8 +5735,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
goto out_free_5;

kvm_chardev_ops.owner = module;
kvm_vm_fops.owner = module;
kvm_vcpu_fops.owner = module;

r = misc_register(&kvm_dev);
if (r) {

@ -27,7 +27,7 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
{
DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
struct gfn_to_pfn_cache *gpc;
bool wake_vcpus = false;
bool evict_vcpus = false;

spin_lock(&kvm->gpc_lock);
list_for_each_entry(gpc, &kvm->gpc_list, list) {
@ -40,41 +40,32 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,

/*
* If a guest vCPU could be using the physical address,
* it needs to be woken.
* it needs to be forced out of guest mode.
*/
if (gpc->guest_uses_pa) {
if (!wake_vcpus) {
wake_vcpus = true;
if (gpc->usage & KVM_GUEST_USES_PFN) {
if (!evict_vcpus) {
evict_vcpus = true;
bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
}
__set_bit(gpc->vcpu->vcpu_idx, vcpu_bitmap);
}

/*
* We cannot call mark_page_dirty() from here because
* this physical CPU might not have an active vCPU
* with which to do the KVM dirty tracking.
*
* Neither is there any point in telling the kernel MM
* that the underlying page is dirty. A vCPU in guest
* mode might still be writing to it up to the point
* where we wake them a few lines further down anyway.
*
* So all the dirty marking happens on the unmap.
*/
}
write_unlock_irq(&gpc->lock);
}
spin_unlock(&kvm->gpc_lock);

if (wake_vcpus) {
unsigned int req = KVM_REQ_GPC_INVALIDATE;
if (evict_vcpus) {
/*
* KVM needs to ensure the vCPU is fully out of guest context
* before allowing the invalidation to continue.
*/
unsigned int req = KVM_REQ_OUTSIDE_GUEST_MODE;
bool called;

/*
* If the OOM reaper is active, then all vCPUs should have
* been stopped already, so perform the request without
* KVM_REQUEST_WAIT and be sad if any needed to be woken.
* KVM_REQUEST_WAIT and be sad if any needed to be IPI'd.
*/
if (!may_block)
req &= ~KVM_REQUEST_WAIT;
@ -104,8 +95,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
}
EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);

static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva,
gpa_t gpa, bool dirty)
static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
{
/* Unmap the old page if it was mapped before, and release it */
if (!is_error_noslot_pfn(pfn)) {
@ -118,9 +108,7 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva,
#endif
}

kvm_release_pfn(pfn, dirty);
if (dirty)
mark_page_dirty(kvm, gpa);
kvm_release_pfn(pfn, false);
}
}

@ -152,7 +140,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
}

int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpa_t gpa, unsigned long len, bool dirty)
gpa_t gpa, unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(kvm);
unsigned long page_offset = gpa & ~PAGE_MASK;
@ -160,7 +148,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
unsigned long old_uhva;
gpa_t old_gpa;
void *old_khva;
bool old_valid, old_dirty;
bool old_valid;
int ret = 0;

/*
@ -177,20 +165,19 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
old_khva = gpc->khva - offset_in_page(gpc->khva);
old_uhva = gpc->uhva;
old_valid = gpc->valid;
old_dirty = gpc->dirty;

/* If the userspace HVA is invalid, refresh that first */
if (gpc->gpa != gpa || gpc->generation != slots->generation ||
kvm_is_error_hva(gpc->uhva)) {
gfn_t gfn = gpa_to_gfn(gpa);

gpc->dirty = false;
gpc->gpa = gpa;
gpc->generation = slots->generation;
gpc->memslot = __gfn_to_memslot(slots, gfn);
gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);

if (kvm_is_error_hva(gpc->uhva)) {
gpc->pfn = KVM_PFN_ERR_FAULT;
ret = -EFAULT;
goto out;
}
@ -219,7 +206,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
goto map_done;
}

if (gpc->kernel_map) {
if (gpc->usage & KVM_HOST_USES_PFN) {
if (new_pfn == old_pfn) {
new_khva = old_khva;
old_pfn = KVM_PFN_ERR_FAULT;
@ -255,14 +242,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
}

out:
if (ret)
gpc->dirty = false;
else
gpc->dirty = dirty;

write_unlock_irq(&gpc->lock);

__release_gpc(kvm, old_pfn, old_khva, old_gpa, old_dirty);
__release_gpc(kvm, old_pfn, old_khva, old_gpa);

return ret;
}
@ -272,7 +254,6 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
{
void *old_khva;
kvm_pfn_t old_pfn;
bool old_dirty;
gpa_t old_gpa;

write_lock_irq(&gpc->lock);
@ -280,7 +261,6 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
gpc->valid = false;

old_khva = gpc->khva - offset_in_page(gpc->khva);
old_dirty = gpc->dirty;
old_gpa = gpc->gpa;
old_pfn = gpc->pfn;

@ -293,16 +273,17 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)

write_unlock_irq(&gpc->lock);

__release_gpc(kvm, old_pfn, old_khva, old_gpa, old_dirty);
__release_gpc(kvm, old_pfn, old_khva, old_gpa);
}
EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);

int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
struct kvm_vcpu *vcpu, bool guest_uses_pa,
bool kernel_map, gpa_t gpa, unsigned long len,
bool dirty)
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage,
gpa_t gpa, unsigned long len)
{
WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);

if (!gpc->active) {
rwlock_init(&gpc->lock);

@ -310,8 +291,7 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpc->pfn = KVM_PFN_ERR_FAULT;
gpc->uhva = KVM_HVA_ERR_BAD;
gpc->vcpu = vcpu;
gpc->kernel_map = kernel_map;
gpc->guest_uses_pa = guest_uses_pa;
gpc->usage = usage;
gpc->valid = false;
gpc->active = true;

@ -319,7 +299,7 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
list_add(&gpc->list, &kvm->gpc_list);
spin_unlock(&kvm->gpc_lock);
}
return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len, dirty);
return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len);
}
EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_init);