Commit Graph

257 Commits

Author SHA1 Message Date
Marc Zyngier
8794b4f510 Merge branch kvm-arm64/per-vcpu-host-pmu-data into kvmarm-master/next
* kvm-arm64/per-vcpu-host-pmu-data:
  : .
  : Pass the host PMU state in the vcpu to avoid the use of additional
  : shared memory between EL1 and EL2 (this obviously only applies
  : to nVHE and Protected setups).
  :
  : Patches courtesy of Fuad Tabba.
  : .
  KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected
  KVM: arm64: Reenable pmu in Protected Mode
  KVM: arm64: Pass pmu events to hyp via vcpu
  KVM: arm64: Repack struct kvm_pmu to reduce size
  KVM: arm64: Wrapper for getting pmu_events

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:48:36 +01:00
Marc Zyngier
ec2cff6cbd Merge branch kvm-arm64/vgic-invlpir into kvmarm-master/next
* kvm-arm64/vgic-invlpir:
  : .
  : Implement MMIO-based LPI invalidation for vGICv3.
  : .
  KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision
  KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation
  KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIs
  irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES}

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:48:35 +01:00
Marc Zyngier
0586e28aaa Merge branch kvm-arm64/hcall-selection into kvmarm-master/next
* kvm-arm64/hcall-selection:
  : .
  : Introduce a new set of virtual sysregs for userspace to
  : select the hypercalls it wants to see exposed to the guest.
  :
  : Patches courtesy of Raghavendra and Oliver.
  : .
  KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
  KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
  Documentation: Fix index.rst after psci.rst renaming
  selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
  selftests: KVM: aarch64: Introduce hypercall ABI test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  tools: Import ARM SMCCC definitions
  Docs: KVM: Add doc for the bitmap firmware registers
  Docs: KVM: Rename psci.rst to hypercalls.rst
  KVM: arm64: Add vendor hypervisor firmware register
  KVM: arm64: Add standard hypervisor firmware register
  KVM: arm64: Setup a framework for hypercall bitmap firmware registers
  KVM: arm64: Factor out firmware register handling from psci.c

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 17:47:03 +01:00
Marc Zyngier
20492a62b9 KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected
Moving kvm_pmu_events into the vcpu (and refering to it) broke the
somewhat unusual case where the kernel has no support for a PMU
at all.

In order to solve this, move things around a bit so that we can
easily avoid refering to the pmu structure outside of PMU-aware
code. As a bonus, pmu.c isn't compiled in when HW_PERF_EVENTS
isn't selected.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/202205161814.KQHpOzsJ-lkp@intel.com
2022-05-16 13:42:41 +01:00
Fuad Tabba
84d751a019 KVM: arm64: Pass pmu events to hyp via vcpu
Instead of the host accessing hyp data directly, pass the pmu
events of the current cpu to hyp via the vcpu.

This adds 64 bits (in two fields) to the vcpu that need to be
synced before every vcpu run in nvhe and protected modes.
However, it isolates the hypervisor from the host, which allows
us to use pmu in protected mode in a subsequent patch.

No visible side effects in behavior intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220510095710.148178-4-tabba@google.com
2022-05-15 11:26:41 +01:00
Fuad Tabba
e987a4c60f KVM: arm64: Repack struct kvm_pmu to reduce size
struct kvm_pmu has 2 holes using 10 bytes. This is instantiated
in all vcpus, so it adds up. Repack the structures to remove the
holes.

No functional change intended.

Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220510095710.148178-3-tabba@google.com
2022-05-15 11:24:17 +01:00
Marc Zyngier
49a1a2c70a KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revision
Since adversising GICR_CTLR.{IC,CES} is directly observable from
a guest, we need to make it selectable from userspace.

For that, bump the default GICD_IIDR revision and let userspace
downgrade it to the previous default. For GICv2, the two distributor
revisions are strictly equivalent.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220405182327.205520-5-maz@kernel.org
2022-05-04 14:09:53 +01:00
Marc Zyngier
4645d11f4a KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation
Since GICv4.1, it has become legal for an implementation to advertise
GICR_{INVLPIR,INVALLR,SYNCR} while having an ITS, allowing for a more
efficient invalidation scheme (no guest command queue contention when
multiple CPUs are generating invalidations).

Provide the invalidation registers as a primitive to their ITS
counterpart. Note that we don't advertise them to the guest yet
(the architecture allows an implementation to do this).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oupton@google.com>
Link: https://lore.kernel.org/r/20220405182327.205520-4-maz@kernel.org
2022-05-04 14:09:53 +01:00
Marc Zyngier
94828468a6 KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIs
When disabling LPIs, a guest needs to poll GICR_CTLR.RWP in order
to be sure that the write has taken effect. We so far reported it
as 0, as we didn't advertise that LPIs could be turned off the
first place.

Start tracking this state during which LPIs are being disabled,
and expose the 'in progress' state via the RWP bit.

We also take this opportunity to disallow enabling LPIs and programming
GICR_{PEND,PROP}BASER while LPI disabling is in progress, as allowed by
the architecture (UNPRED behaviour).

We don't advertise the feature to the guest yet (which is allowed by
the architecture).

Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220405182327.205520-3-maz@kernel.org
2022-05-04 14:09:53 +01:00
Raghavendra Rao Ananta
05714cab7d KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM regularly introduces new hypercall services to the guests without
any consent from the userspace. This means, the guests can observe
hypercall services in and out as they migrate across various host
kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realizes that it's no longer available. Depending
on how the guest handles the change, there's a potential chance that
the guest would just panic.

As a result, there's a need for the userspace to elect the services
that it wishes the guest to discover. It can elect these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
for all the hypercall services available.

These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.

During the VM initialization, the registers are set to upper-limit of
the features supported by the corresponding registers. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.

Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
making it available without userspace's consent, they need to
be explicitly added to the case-list in
kvm_hvc_call_default_allowed(). Any function-id that's not enabled
via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
be returned as SMCCC_RET_NOT_SUPPORTED to the guest.

Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.

In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for ARM True Random Number Generator (TRNG) service, with bit-0 of the
register representing mandatory features of v1.0. Other services are
momentarily added in the upcoming patches.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: reduced the scope of some helpers, tidy-up bitmap max values,
 dropped error-only fast path]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
2022-05-03 21:30:19 +01:00
Raghavendra Rao Ananta
85fbe08e4d KVM: arm64: Factor out firmware register handling from psci.c
Common hypercall firmware register handing is currently employed
by psci.c. Since the upcoming patches add more of these registers,
it's better to move the generic handling to hypercall.c for a
cleaner presentation.

While we are at it, collect all the firmware registers under
fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and
kvm_arm_copy_fw_reg_indices() in a generic way. Also, define
KVM_REG_FEATURE_LEVEL_MASK using a GENMASK instead.

No functional change intended.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: fixed KVM_REG_FEATURE_LEVEL_MASK]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-2-rananta@google.com
2022-05-03 14:48:54 +01:00
Marc Zyngier
b57de4ffd7 KVM: arm64: Simplify kvm_cpu_has_pending_timer()
kvm_cpu_has_pending_timer() ends up checking all the possible
timers for a wake-up cause. However, we already check for
pending interrupts whenever we try to wake-up a vcpu, including
the timer interrupts.

Obviously, doing the same work twice is once too many. Reduce
this helper to almost nothing, but keep it around, as we are
going to make use of it soon.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220419182755.601427-4-maz@kernel.org
2022-04-20 13:24:44 +01:00
Marc Zyngier
1a48ce9264 Merge branch kvm-arm64/psci-1.1 into kvmarm-master/next
* kvm-arm64/psci-1.1:
  : .
  : Limited PSCI-1.1 support from Will Deacon:
  :
  : This small series exposes the PSCI SYSTEM_RESET2 call to guests, which
  : allows the propagation of a "reset_type" and a "cookie" back to the VMM.
  : Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"),
  : the vendor-defined range can be used by a bootloader to provide additional
  : information about the reset, such as an error code.
  : .
  KVM: arm64: Remove unneeded semicolons
  KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
  KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest
  KVM: arm64: Bump guest PSCI version to 1.1

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-25 13:49:48 +00:00
Will Deacon
512865d83f KVM: arm64: Bump guest PSCI version to 1.1
Expose PSCI version v1.1 to the guest by default. The only difference
for now is that an updated version number is reported by PSCI_VERSION.

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-2-will@kernel.org
2022-02-21 16:02:55 +00:00
Marc Zyngier
00e6dae00e Merge branch kvm-arm64/pmu-bl into kvmarm-master/next
* kvm-arm64/pmu-bl:
  : .
  : Improve PMU support on heterogeneous systems, courtesy of Alexandru Elisei
  : .
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Keep a per-VM pointer to the default PMU
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Do not change the PMU event filter after a VCPU has run

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-08 17:54:41 +00:00
Alexandru Elisei
db858060b1 KVM: arm64: Keep a list of probed PMUs
The ARM PMU driver calls kvm_host_pmu_init() after probing to tell KVM that
a hardware PMU is available for guest emulation. Heterogeneous systems can
have more than one PMU present, and the callback gets called multiple
times, once for each of them. Keep track of all the PMUs available to KVM,
as they're going to be needed later.

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-5-alexandru.elisei@arm.com
2022-02-08 17:51:21 +00:00
Oliver Upton
dfefa04a90 KVM: arm64: Drop unused param from kvm_psci_version()
kvm_psci_version() consumes a pointer to struct kvm in addition to a
vcpu pointer. Drop the kvm pointer as it is unused. While the comment
suggests the explicit kvm pointer was useful for calling from hyp, there
exist no such callsite in hyp.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208012705.640444-1-oupton@google.com
2022-02-08 15:25:34 +00:00
Linus Torvalds
79e06c4c49 RISCV:
- Use common KVM implementation of MMU memory caches
 
 - SBI v0.2 support for Guest
 
 - Initial KVM selftests support
 
 - Fix to avoid spurious virtual interrupts after clearing hideleg CSR
 
 - Update email address for Anup and Atish
 
 ARM:
 - Simplification of the 'vcpu first run' by integrating it into
   KVM's 'pid change' flow
 
 - Refactoring of the FP and SVE state tracking, also leading to
   a simpler state and less shared data between EL1 and EL2 in
   the nVHE case
 
 - Tidy up the header file usage for the nvhe hyp object
 
 - New HYP unsharing mechanism, finally allowing pages to be
   unmapped from the Stage-1 EL2 page-tables
 
 - Various pKVM cleanups around refcounting and sharing
 
 - A couple of vgic fixes for bugs that would trigger once
   the vcpu xarray rework is merged, but not sooner
 
 - Add minimal support for ARMv8.7's PMU extension
 
 - Rework kvm_pgtable initialisation ahead of the NV work
 
 - New selftest for IRQ injection
 
 - Teach selftests about the lack of default IPA space and
   page sizes
 
 - Expand sysreg selftest to deal with Pointer Authentication
 
 - The usual bunch of cleanups and doc update
 
 s390:
 - fix sigp sense/start/stop/inconsistency
 
 - cleanups
 
 x86:
 - Clean up some function prototypes more
 
 - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation
 
 - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery
 
 - completely remove potential TOC/TOU races in nested SVM consistency checks
 
 - update some PMCs on emulated instructions
 
 - Intel AMX support (joint work between Thomas and Intel)
 
 - large MMU cleanups
 
 - module parameter to disable PMU virtualization
 
 - cleanup register cache
 
 - first part of halt handling cleanups
 
 - Hyper-V enlightened MSR bitmap support for nested hypervisors
 
 Generic:
 - clean up Makefiles
 
 - introduce CONFIG_HAVE_KVM_DIRTY_RING
 
 - optimize memslot lookup using a tree
 
 - optimize vCPU array usage by converting to xarray
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHhxvsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPZkAf+Nz92UL/5nNGcdHtE4m7AToMmitE9
 bYkesf9BMQvAe5wjkABLuoHGi6ay4jabo4fiGzbdkiK7lO5YgfsWiMB3/MT5fl4E
 jRPzaVQabp3YZLM8UYCBmfUVuRj524S967SfSRe0AvYjDEH8y7klPf4+7sCsFT0/
 Px9Vf2KGuOlf0eM78yKg4rGaF0jS22eLgXm6FfNMY8/e29ZAo/jyUmqBY+Z2xxZG
 aWhceDtSheW1jwLHLj3nOlQJvHTn8LVGXBE/R8Gda3ZjrBV2rKaDi4Fh+HD+dz86
 2zVXwzQ7uck2CMW73GMoXMTWoKSHMyvlBOs1BdvBm4UsnGcXR+q8IFCeuQ==
 =s73m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "RISCV:

   - Use common KVM implementation of MMU memory caches

   - SBI v0.2 support for Guest

   - Initial KVM selftests support

   - Fix to avoid spurious virtual interrupts after clearing hideleg CSR

   - Update email address for Anup and Atish

  ARM:

   - Simplification of the 'vcpu first run' by integrating it into KVM's
     'pid change' flow

   - Refactoring of the FP and SVE state tracking, also leading to a
     simpler state and less shared data between EL1 and EL2 in the nVHE
     case

   - Tidy up the header file usage for the nvhe hyp object

   - New HYP unsharing mechanism, finally allowing pages to be unmapped
     from the Stage-1 EL2 page-tables

   - Various pKVM cleanups around refcounting and sharing

   - A couple of vgic fixes for bugs that would trigger once the vcpu
     xarray rework is merged, but not sooner

   - Add minimal support for ARMv8.7's PMU extension

   - Rework kvm_pgtable initialisation ahead of the NV work

   - New selftest for IRQ injection

   - Teach selftests about the lack of default IPA space and page sizes

   - Expand sysreg selftest to deal with Pointer Authentication

   - The usual bunch of cleanups and doc update

  s390:

   - fix sigp sense/start/stop/inconsistency

   - cleanups

  x86:

   - Clean up some function prototypes more

   - improved gfn_to_pfn_cache with proper invalidation, used by Xen
     emulation

   - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery

   - completely remove potential TOC/TOU races in nested SVM consistency
     checks

   - update some PMCs on emulated instructions

   - Intel AMX support (joint work between Thomas and Intel)

   - large MMU cleanups

   - module parameter to disable PMU virtualization

   - cleanup register cache

   - first part of halt handling cleanups

   - Hyper-V enlightened MSR bitmap support for nested hypervisors

  Generic:

   - clean up Makefiles

   - introduce CONFIG_HAVE_KVM_DIRTY_RING

   - optimize memslot lookup using a tree

   - optimize vCPU array usage by converting to xarray"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits)
  x86/fpu: Fix inline prefix warnings
  selftest: kvm: Add amx selftest
  selftest: kvm: Move struct kvm_x86_state to header
  selftest: kvm: Reorder vcpu_load_state steps for AMX
  kvm: x86: Disable interception for IA32_XFD on demand
  x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
  kvm: selftests: Add support for KVM_CAP_XSAVE2
  kvm: x86: Add support for getting/setting expanded xstate buffer
  x86/fpu: Add uabi_size to guest_fpu
  kvm: x86: Add CPUID support for Intel AMX
  kvm: x86: Add XCR0 support for Intel AMX
  kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
  kvm: x86: Emulate IA32_XFD_ERR for guest
  kvm: x86: Intercept #NM for saving IA32_XFD_ERR
  x86/fpu: Prepare xfd_err in struct fpu_guest
  kvm: x86: Add emulation for IA32_XFD
  x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
  kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
  x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
  x86/fpu: Add guest support to xfd_enable_feature()
  ...
2022-01-16 16:15:14 +02:00
Andy Shevchenko
6c9eeb5f4a KVM: arm64: vgic: Replace kernel.h with the necessary inclusions
arm_vgic.h does not require all the stuff that kernel.h provides.
Replace kernel.h inclusion with the list of what is really being used.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220104151940.55399-1-andriy.shevchenko@linux.intel.com
2022-01-04 17:11:47 +00:00
Sean Christopherson
be399d824b KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y
Move the definition of kvm_arm_pmu_available to pmu-emul.c and, out of
"necessity", hide it behind CONFIG_HW_PERF_EVENTS.  Provide a stub for
the key's wrapper, kvm_arm_support_pmu_v3().  Moving the key's definition
out of perf.c will allow a future commit to delete perf.c entirely.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-16-seanjc@google.com
2021-11-17 14:49:11 +01:00
Marc Zyngier
e840f42a49 KVM: arm64: Fix PMU probe ordering
Russell reported that since 5.13, KVM's probing of the PMU has
started to fail on his HW. As it turns out, there is an implicit
ordering dependency between the architectural PMU probing code and
and KVM's own probing. If, due to probe ordering reasons, KVM probes
before the PMU driver, it will fail to detect the PMU and prevent it
from being advertised to guests as well as the VMM.

Obviously, this is one probing too many, and we should be able to
deal with any ordering.

Add a callback from the PMU code into KVM to advertise the registration
of a host CPU PMU, allowing for any probing order.

Fixes: 5421db1be3 ("KVM: arm64: Divorce the perf code from oprofile helpers")
Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@shell.armlinux.org.uk
Cc: stable@vger.kernel.org
2021-09-20 12:43:34 +01:00
Marc Zyngier
354920e794 KVM: arm64: vgic: Implement SW-driven deactivation
In order to deal with these systems that do not offer HW-based
deactivation of interrupts, let implement a SW-based approach:

- When the irq is queued into a LR, treat it as a pure virtual
  interrupt and set the EOI flag in the LR.

- When the interrupt state is read back from the LR, force a
  deactivation when the state is invalid (neither active nor
  pending)

Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:46:00 +01:00
Marc Zyngier
db75f1a33f KVM: arm64: vgic: move irq->get_input_level into an ops structure
We already have the option to attach a callback to an interrupt
to retrieve its pending state. As we are planning to expand this
facility, move this callback into its own data structure.

This will limit the size of individual interrupts as the ops
structures can be shared across multiple interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:59 +01:00
Marc Zyngier
f6c3e24fb7 KVM: arm64: vgic: Let an interrupt controller advertise lack of HW deactivation
The vGIC, as architected by ARM, allows a virtual interrupt to
trigger the deactivation of a physical interrupt. This allows
the following interrupt to be delivered without requiring an exit.

However, some implementations have choosen not to implement this,
meaning that we will need some unsavoury workarounds to deal with this.

On detecting such a case, taint the kernel and spit a nastygram.
We'll deal with this in later patches.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:59 +01:00
Marc Zyngier
9a8aae605b Merge branch 'kvm-arm64/kill_oprofile_dependency' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-22 13:41:49 +01:00
Marc Zyngier
5421db1be3 KVM: arm64: Divorce the perf code from oprofile helpers
KVM/arm64 is the sole user of perf_num_counters(), and really
could do without it. Stop using the obsolete API by relying on
the existing probing code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210414134409.1266357-2-maz@kernel.org
2021-04-22 13:32:39 +01:00
Eric Auger
28e9d4bce3 KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace
Commit 23bde34771 ("KVM: arm64: vgic-v3: Drop the
reporting of GICR_TYPER.Last for userspace") temporarily fixed
a bug identified when attempting to access the GICR_TYPER
register before the redistributor region setting, but dropped
the support of the LAST bit.

Emulating the GICR_TYPER.Last bit still makes sense for
architecture compliance though. This patch restores its support
(if the redistributor region was set) while keeping the code safe.

We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
computes whether a redistributor is the highest one of a series
of redistributor contributor pages.

With this new implementation we do not need to have a uaccess
read accessor anymore.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com
2021-04-06 14:51:38 +01:00
Marc Zyngier
6b5b368fcc KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
We currently find out about the presence of a HW PMU (or the handling
of that PMU by perf, which amounts to the same thing) in a fairly
roundabout way, by checking the number of counters available to perf.
That's good enough for now, but we will soon need to find about about
that on paths where perf is out of reach (in the world switch).

Instead, let's turn kvm_arm_support_pmu_v3() into a static key.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210209114844.3278746-2-maz@kernel.org
Message-Id: <20210305185254.3730990-5-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:40 -05:00
Marc Zyngier
8cbebc4118 KVM: arm64: Replace KVM_ARM_PMU with HW_PERF_EVENTS
KVM_ARM_PMU only existed for the benefit of 32bit ARM hosts,
and makes no sense now that we are 64bit only. Get rid of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-01-04 16:50:16 +00:00
Marc Zyngier
17f84520cb Merge remote-tracking branch 'origin/kvm-arm64/misc-5.11' into kvmarm-master/queue
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-12-04 10:12:55 +00:00
Shenming Lu
57e3cebd02 KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit
In order to reduce the impact of the VPT parsing happening on the GIC,
we can split the vcpu reseidency in two phases:

- programming GICR_VPENDBASER: this still happens in vcpu_load()
- checking for the VPT parsing to be complete: this can happen
  on vcpu entry (in kvm_vgic_flush_hwstate())

This allows the GIC and the CPU to work in parallel, rewmoving some
of the entry overhead.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201128141857.983-3-lushenming@huawei.com
2020-11-30 11:18:29 +00:00
Marc Zyngier
7521c3a9e6 KVM: arm64: Get rid of the PMU ready state
The PMU ready state has no user left. Goodbye.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-27 11:41:24 +00:00
Linus Torvalds
f9a705ad1c ARM:
- New page table code for both hypervisor and guest stage-2
 - Introduction of a new EL2-private host context
 - Allow EL2 to have its own private per-CPU variables
 - Support of PMU event filtering
 - Complete rework of the Spectre mitigation
 
 PPC:
 - Fix for running nested guests with in-kernel IRQ chip
 - Fix race condition causing occasional host hard lockup
 - Minor cleanups and bugfixes
 
 x86:
 - allow trapping unknown MSRs to userspace
 - allow userspace to force #GP on specific MSRs
 - INVPCID support on AMD
 - nested AMD cleanup, on demand allocation of nested SVM state
 - hide PV MSRs and hypercalls for features not enabled in CPUID
 - new test for MSR_IA32_TSC writes from host and guest
 - cleanups: MMU, CPUID, shared MSRs
 - LAPIC latency optimizations ad bugfixes
 
 For x86, also included in this pull request is a new alternative and
 (in the future) more scalable implementation of extended page tables
 that does not need a reverse map from guest physical addresses to
 host physical addresses.  For now it is disabled by default because
 it is still lacking a few of the existing MMU's bells and whistles.
 However it is a very solid piece of work and it is already available
 for people to hammer on it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+S8dsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM40Af+M46NJmuS5rcwFfybvK/c42KT6svX
 Co1NrZDwzSQ2mMy3WQzH9qeLvb+nbY4sT3n5BPNPNsT+aIDPOTDt//qJ2/Ip9UUs
 tRNea0MAR96JWLE7MSeeRxnTaQIrw/AAZC0RXFzZvxcgytXwdqBExugw4im+b+dn
 Dcz8QxX1EkwT+4lTm5HC0hKZAuo4apnK1QkqCq4SdD2QVJ1YE6+z7pgj4wX7xitr
 STKD6q/Yt/0ndwqS0GSGbyg0jy6mE620SN6isFRkJYwqfwLJci6KnqvEK67EcNMu
 qeE017K+d93yIVC46/6TfVHzLR/D1FpQ8LZ16Yl6S13OuGIfAWBkQZtPRg==
 =AD6a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "For x86, there is a new alternative and (in the future) more scalable
  implementation of extended page tables that does not need a reverse
  map from guest physical addresses to host physical addresses.

  For now it is disabled by default because it is still lacking a few of
  the existing MMU's bells and whistles. However it is a very solid
  piece of work and it is already available for people to hammer on it.

  Other updates:

  ARM:
   - New page table code for both hypervisor and guest stage-2
   - Introduction of a new EL2-private host context
   - Allow EL2 to have its own private per-CPU variables
   - Support of PMU event filtering
   - Complete rework of the Spectre mitigation

  PPC:
   - Fix for running nested guests with in-kernel IRQ chip
   - Fix race condition causing occasional host hard lockup
   - Minor cleanups and bugfixes

  x86:
   - allow trapping unknown MSRs to userspace
   - allow userspace to force #GP on specific MSRs
   - INVPCID support on AMD
   - nested AMD cleanup, on demand allocation of nested SVM state
   - hide PV MSRs and hypercalls for features not enabled in CPUID
   - new test for MSR_IA32_TSC writes from host and guest
   - cleanups: MMU, CPUID, shared MSRs
   - LAPIC latency optimizations ad bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
  kvm: x86/mmu: NX largepage recovery for TDP MMU
  kvm: x86/mmu: Don't clear write flooding count for direct roots
  kvm: x86/mmu: Support MMIO in the TDP MMU
  kvm: x86/mmu: Support write protection for nesting in tdp MMU
  kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
  kvm: x86/mmu: Support dirty logging for the TDP MMU
  kvm: x86/mmu: Support changed pte notifier in tdp MMU
  kvm: x86/mmu: Add access tracking for tdp_mmu
  kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
  kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
  kvm: x86/mmu: Add TDP MMU PF handler
  kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
  kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
  KVM: Cache as_id in kvm_memory_slot
  kvm: x86/mmu: Add functions to handle changed TDP SPTEs
  kvm: x86/mmu: Allocate and free TDP MMU roots
  kvm: x86/mmu: Init / Uninit the TDP MMU
  kvm: x86/mmu: Introduce tdp_iter
  KVM: mmu: extract spte.h and spte.c
  KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
  ...
2020-10-23 11:17:56 -07:00
Marc Zyngier
88865beca9 KVM: arm64: Mask out filtered events in PCMEID{0,1}_EL1
As we can now hide events from the guest, let's also adjust its view of
PCMEID{0,1}_EL1 so that it can figure out why some common events are not
counting as they should.

The astute user can still look into the TRM for their CPU and find out
they've been cheated, though. Nobody's perfect.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-29 14:19:39 +01:00
Julien Thierry
95e92e45a4 KVM: arm64: pmu: Make overflow handler NMI safe
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
NMI context, defer waking the vcpu to an irq_work queue.

A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
running the irq_work for a non-existent vcpu by calling irq_work_sync() on
the PMU destroy path.

[Alexandru E.: Added irq_work_sync()]

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28 19:00:17 +01:00
Marc Zyngier
41ce82f63c KVM: arm64: timers: Move timer registers to the sys_regs file
Move the timer gsisters to the sysreg file. This will further help when
they are directly changed by a nesting hypervisor in the VNCR page.

This requires moving the initialisation of the timer struct so that some
of the helpers (such as arch_timer_ctx_index) can work correctly at an
early stage.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Marc Zyngier
3c5ff0c60f KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user
kvm_timer_sync_hwstate() has nothing to do with the timer HW state,
but more to do with the state of a userspace interrupt controller.
Change the suffix from _hwstate to_user, in keeping with the rest
of the code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-07 09:28:38 +01:00
Christoffer Dall
fc5d1f1a42 KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu
If we move the used_lrs field to the version-specific cpu interface
structure, the following functions only operate on the struct
vgic_v3_cpu_if and not the full vcpu:

  __vgic_v3_save_state
  __vgic_v3_restore_state
  __vgic_v3_activate_traps
  __vgic_v3_deactivate_traps
  __vgic_v3_save_aprs
  __vgic_v3_restore_aprs

This is going to be very useful for nested virt, so move the used_lrs
field and change the prototypes and implementations of these functions to
take the cpu_if parameter directly.

No functional change.

Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-05-28 11:57:10 +01:00
Marc Zyngier
bacf2c6054 KVM: arm64: GICv4.1: Allow SGIs to switch between HW and SW interrupts
In order to let a guest buy in the new, active-less SGIs, we
need to be able to switch between the two modes.

Handle this by stopping all guest activity, transfer the state
from one mode to the other, and resume the guest. Nothing calls
this code so far, but a later patch will plug it into the MMIO
emulation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-20-maz@kernel.org
2020-03-24 12:15:51 +00:00
Marc Zyngier
ae699ad348 irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer
In order to hide some of the differences between v4.0 and v4.1, move
the doorbell management out of the KVM code, and into the GICv4-specific
layer. This allows the calling code to ask for the doorbell when blocking,
and otherwise to leave the doorbell permanently disabled.

This matches the v4.1 code perfectly, and only results in a minor
refactoring of the v4.0 code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org
2020-03-24 12:15:51 +00:00
Marc Zyngier
cd7056ae34 Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/next 2019-11-08 11:27:29 +00:00
Zenghui Yu
bad36e4e8c KVM: arm/arm64: vgic: Fix some comments typo
Fix various comments, including wrong function names, grammar mistakes
and specification references.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com
2019-10-29 13:47:32 +00:00
Zenghui Yu
9ff624cdbf KVM: arm/arm64: vgic: Remove the declaration of kvm_send_userspace_msi()
The callsite of kvm_send_userspace_msi() is currently arch agnostic.
There seems no reason to keep an extra declaration of it in arm_vgic.h
(we already have one in include/linux/kvm_host.h).

Remove it.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20191029071919.177-2-yuzenghui@huawei.com
2019-10-29 13:45:46 +00:00
Marc Zyngier
8e01d9a396 KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put
When the VHE code was reworked, a lot of the vgic stuff was moved around,
but the GICv4 residency code did stay untouched, meaning that we come
in and out of residency on each flush/sync, which is obviously suboptimal.

To address this, let's move things around a bit:

- Residency entry (flush) moves to vcpu_load
- Residency exit (sync) moves to vcpu_put
- On blocking (entry to WFI), we "put"
- On unblocking (exit from WFI), we "load"

Because these can nest (load/block/put/load/unblock/put, for example),
we now have per-VPE tracking of the residency state.

Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
gets set to true when blocking because of a WFI. This allows a finer
control of the doorbell, which now also gets disabled as soon as
it gets signaled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-28 16:20:58 +00:00
Christoffer Dall
55009c6ed2 KVM: arm/arm64: Factor out hypercall handling from PSCI code
We currently intertwine the KVM PSCI implementation with the general
dispatch of hypercall handling, which makes perfect sense because PSCI
is the only category of hypercalls we support.

However, as we are about to support additional hypercalls, factor out
this functionality into a separate hypercall handler file.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[steven.price@arm.com: rebased]
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21 19:20:26 +01:00
Eric Auger
3109741a8d KVM: arm/arm64: vgic: Use a single IO device per redistributor
At the moment we use 2 IO devices per GICv3 redistributor: one
one for the RD_base frame and one for the SGI_base frame.

Instead we can use a single IO device per redistributor (the 2
frames are contiguous). This saves slots on the KVM_MMIO_BUS
which is currently limited to NR_IOBUS_DEVS (1000).

This change allows to instantiate up to 512 redistributors and may
speed the guest boot with a large number of VCPUs.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-25 11:02:52 +01:00
Marc Zyngier
24cab82c34 KVM: arm/arm64: vgic: Add LPI translation cache definition
Add the basic data structure that expresses an MSI to LPI
translation as well as the allocation/release hooks.

The size of the cache is arbitrarily defined as 16*nr_vcpus.

Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18 18:38:35 +01:00
Marc Zyngier
5eeaf10eec KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Since commit commit 328e566479 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.

There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point were we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR is not visible in memory until we do a vcpu_put().

Things go really south if the guest does the following:

	mov x0, #0	// or any small value masking interrupts
	msr ICC_PMR_EL1, x0

	[vcpu preempted, then rescheduled, VMCR sampled]

	mov x0, #ff	// allow all interrupts
	msr ICC_PMR_EL1, x0
	wfi		// traps to EL2, so samping of VMCR

	[interrupt arrives just after WFI]

Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.

To avoid this unfortuante situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.

This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.

Cc: stable@vger.kernel.org # 4.12
Fixes: 328e566479 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05 15:36:46 +01:00
Zenghui Yu
bca031e2c8 KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
We use "pmc->idx" and the "chained" bitmap to determine if the pmc is
chained, in kvm_pmu_pmc_is_chained().  But idx might be uninitialized
(and random) when we doing this decision, through a KVM_ARM_VCPU_INIT
ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random
idx will potentially hit a KASAN BUG [1].

In general, idx is the static property of a PMU counter that is not
expected to be modified across resets, as suggested by Julien.  It
looks more reasonable if we can setup the PMU counter idx for a vcpu
in its creation time. Introduce a new function - kvm_pmu_vcpu_init()
for this basic setup. Oh, and the KASAN BUG will get fixed this way.

[1] https://www.spinics.net/lists/kvm-arm/msg36700.html

Fixes: 80f393a23b ("KVM: arm/arm64: Support chained PMU counters")
Suggested-by: Andrew Murray <andrew.murray@arm.com>
Suggested-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-23 09:23:44 +01:00
Andrew Murray
80f393a23b KVM: arm/arm64: Support chained PMU counters
ARMv8 provides support for chained PMU counters, where an event type
of 0x001E is set for odd-numbered counters, the event counter will
increment by one for each overflow of the preceding even-numbered
counter. Let's emulate this in KVM by creating a 64 bit perf counter
when a user chains two emulated counters together.

For chained events we only support generating an overflow interrupt
on the high counter. We use the attributes of the low counter to
determine the attributes of the perf event.

Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-07-05 13:56:22 +01:00