Commit Graph

1043945 Commits

Author SHA1 Message Date
Fuad Tabba
16dd1fbb12 KVM: arm64: Simplify masking out MTE in feature id reg
Simplify code for hiding MTE support in feature id register when
MTE is not enabled/supported by KVM.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-7-tabba@google.com
2021-10-11 14:57:29 +01:00
Fuad Tabba
5386839077 KVM: arm64: Add missing field descriptor for MDCR_EL2
It's not currently used. Added for completeness.

No functional change intended.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-6-tabba@google.com
2021-10-11 14:57:28 +01:00
Fuad Tabba
3b1a690eda KVM: arm64: Pass struct kvm to per-EC handlers
We need struct kvm to check for protected VMs to be able to pick
the right handlers for them in subsequent patches.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010145636.1950948-5-tabba@google.com
2021-10-11 14:57:28 +01:00
Marc Zyngier
8fb2046180 KVM: arm64: Move early handlers to per-EC handlers
Simplify the early exception handling by slicing the gigantic decoding
tree into a more manageable set of functions, similar to what we have
in handle_exit.c.

This will also make the structure reusable for pKVM's own early exit
handling.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211010145636.1950948-4-tabba@google.com
2021-10-11 14:57:28 +01:00
Marc Zyngier
cc1e6fdfa9 KVM: arm64: Don't include switch.h into nvhe/kvm-main.c
hyp-main.c includes switch.h while it only requires adjust-pc.h.
Fix it to remove an unnecessary dependency.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211010145636.1950948-3-tabba@google.com
2021-10-11 14:57:27 +01:00
Marc Zyngier
7dd9b5a157 KVM: arm64: Move __get_fault_info() and co into their own include file
In order to avoid including the whole of the switching helpers
in unrelated files, move the __get_fault_info() and related helpers
into their own include file.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211010145636.1950948-2-tabba@google.com
2021-10-11 14:57:27 +01:00
Marc Zyngier
1eb07f4b68 Merge branch kvm-arm64/raz-sysregs into kvmarm-master/next
* kvm-arm64/raz-sysregs:
  : .
  : Simplify the handling of RAZ register, removing pointless indirections.
  : .
  KVM: arm64: Replace get_raz_id_reg() with get_raz_reg()
  KVM: arm64: Use get_raz_reg() for userspace reads of PMSWINC_EL0
  KVM: arm64: Return early from read_id_reg() if register is RAZ

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-11 14:15:53 +01:00
Alexandru Elisei
ebf6aa8c04 KVM: arm64: Replace get_raz_id_reg() with get_raz_reg()
Reading a RAZ ID register isn't different from reading any other RAZ
register, so get rid of get_raz_id_reg() and replace it with get_raz_reg(),
which does the same thing, but does it without going through two layers of
indirection.

No functional change.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011105840.155815-4-alexandru.elisei@arm.com
2021-10-11 14:13:59 +01:00
Alexandru Elisei
5a43097623 KVM: arm64: Use get_raz_reg() for userspace reads of PMSWINC_EL0
PMSWINC_EL0 is a write-only register and was initially part of the VCPU
register state, but was later removed in commit 7a3ba3095a ("KVM:
arm64: Remove PMSWINC_EL0 shadow register"). To prevent regressions, the
register was kept accessible from userspace as Read-As-Zero (RAZ).

The read function that is used to handle userspace reads of this
register is get_raz_id_reg(), which, while technically correct, as it
returns 0, it is not semantically correct, as PMSWINC_EL0 is not an ID
register as the function name suggests.

Add a new function, get_raz_reg(), to use it as the accessor for
PMSWINC_EL0, as to not conflate get_raz_id_reg() to handle other types
of registers.

No functional change intended.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011105840.155815-3-alexandru.elisei@arm.com
2021-10-11 14:13:59 +01:00
Alexandru Elisei
00d5101b25 KVM: arm64: Return early from read_id_reg() if register is RAZ
If read_id_reg() is called for an ID register which is Read-As-Zero (RAZ),
it initializes the return value to zero, then goes through a list of
registers which require special handling before returning the final value.

By not returning as soon as it checks that the register should be RAZ, the
function creates the opportunity for bugs, if, for example, a patch changes
a register to RAZ (like has happened with PMSWINC_EL0 in commit
11663111cd), but doesn't remove the special handling from read_id_reg();
or if a register is RAZ in certain situations, but readable in others.

Return early to make it impossible for a RAZ register to be anything other
than zero.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211011105840.155815-2-alexandru.elisei@arm.com
2021-10-11 14:13:58 +01:00
Marc Zyngier
a049cf7e63 Merge branch kvm-arm64/misc-5.16 into kvmarm-master/next
* kvm-arm64/misc-5.16:
  : .
  : - Allow KVM to be disabled from the command-line
  : - Clean up CONFIG_KVM vs CONFIG_HAVE_KVM
  : .
  KVM: arm64: Depend on HAVE_KVM instead of OF
  KVM: arm64: Unconditionally include generic KVM's Kconfig
  KVM: arm64: Allow KVM to be disabled from the command line

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-11 10:14:38 +01:00
Sean Christopherson
e26bb75aa2 KVM: arm64: Depend on HAVE_KVM instead of OF
Select HAVE_KVM at all times on arm64, as the OF requirement is
always there (even in the case of an ACPI system, we still depend
on some of the OF infrastructure), and won't fo away.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Will Deacon <will@kernel.org>
[maz: Drop the "HAVE_KVM if OF" dependency, as OF is always there on arm64,
 new commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210921222231.518092-3-seanjc@google.com
2021-10-11 10:08:50 +01:00
Sean Christopherson
c8f1e96734 KVM: arm64: Unconditionally include generic KVM's Kconfig
Unconditionally "source" the generic KVM Kconfig instead of wrapping it
with KVM=y.  A future patch will select HAVE_KVM so that referencing
HAVE_KVM in common kernel code doesn't break, and because KVM=y and
HAVE_KVM=n is weird.  Source the generic KVM Kconfig unconditionally so
that HAVE_KVM and KVM don't end up with a circular dependency.

Note, all but one of generic KVM's "configs" are of the HAVE_XYZ nature,
and the one outlier correctly takes a dependency on CONFIG_KVM, i.e. the
generic Kconfig is intended to be included unconditionally.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
[maz: made NVHE_EL2_DEBUG depend on KVM]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210921222231.518092-2-seanjc@google.com
2021-10-11 10:04:59 +01:00
Marc Zyngier
b6a68b97af KVM: arm64: Allow KVM to be disabled from the command line
Although KVM can be compiled out of the kernel, it cannot be disabled
at runtime. Allow this possibility by introducing a new mode that
will prevent KVM from initialising.

This is useful in the (limited) circumstances where you don't want
KVM to be available (what is wrong with you?), or when you want
to install another hypervisor instead (good luck with that).

Reviewed-by: David Brazdil <dbrazdil@google.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Scull <ascull@google.com>
Link: https://lore.kernel.org/r/20211001170553.3062988-1-maz@kernel.org
2021-10-11 09:48:47 +01:00
Marc Zyngier
15f9017c28 Merge branch kvm-arm64/vgic-ipa-checks into kvmarm-master/next
* kvm-arm64/vgic-ipa-checks:
  : .
  : Add extra checks to prevent ther various GIC regions to land
  : outside of the IPA space (and tests to verify that it works).
  : .
  KVM: arm64: selftests: Add init ITS device test
  KVM: arm64: selftests: Add test for legacy GICv3 REDIST base partially above IPA range
  KVM: arm64: selftests: Add tests for GIC redist/cpuif partially above IPA range
  KVM: arm64: selftests: Add some tests for GICv2 in vgic_init
  KVM: arm64: selftests: Make vgic_init/vm_gic_create version agnostic
  KVM: arm64: selftests: Make vgic_init gic version agnostic
  KVM: arm64: vgic: Drop vgic_check_ioaddr()
  KVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size
  KVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size
  KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size
  kvm: arm64: vgic: Introduce vgic_check_iorange

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-11 09:43:06 +01:00
Ricardo Koller
3e197f17b2 KVM: arm64: selftests: Add init ITS device test
Add some ITS device init tests: general KVM device tests (address not
defined already, address aligned) and tests for the ITS region being
within the addressable IPA range.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-12-ricarkol@google.com
2021-10-11 09:31:43 +01:00
Ricardo Koller
1883458638 KVM: arm64: selftests: Add test for legacy GICv3 REDIST base partially above IPA range
Add a new test into vgic_init which checks that the first vcpu fails to
run if there is not sufficient REDIST space below the addressable IPA
range.  This only applies to the KVM_VGIC_V3_ADDR_TYPE_REDIST legacy API
as the required REDIST space is not know when setting the DIST region.

Note that using the REDIST_REGION API results in a different check at
first vcpu run: that the number of redist regions is enough for all
vcpus. And there is already a test for that case in, the first step of
test_v3_new_redist_regions.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-11-ricarkol@google.com
2021-10-11 09:31:43 +01:00
Ricardo Koller
2dcd9aa1c3 KVM: arm64: selftests: Add tests for GIC redist/cpuif partially above IPA range
Add tests for checking that KVM returns the right error when trying to
set GICv2 CPU interfaces or GICv3 Redistributors partially above the
addressable IPA range. Also tighten the IPA range by replacing
KVM_CAP_ARM_VM_IPA_SIZE with the IPA range currently configured for the
guest (i.e., the default).

The check for the GICv3 redistributor created using the REDIST legacy
API is not sufficient as this new test only checks the check done using
vcpus already created when setting the base. The next commit will add
the missing test which verifies that the KVM check is done at first vcpu
run.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-10-ricarkol@google.com
2021-10-11 09:31:42 +01:00
Ricardo Koller
c44df5f9ff KVM: arm64: selftests: Add some tests for GICv2 in vgic_init
Add some GICv2 tests: general KVM device tests and DIST/CPUIF overlap
tests.  Do this by making test_vcpus_then_vgic and test_vgic_then_vcpus
in vgic_init GIC version agnostic.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-9-ricarkol@google.com
2021-10-11 09:31:42 +01:00
Ricardo Koller
46fb941bc0 KVM: arm64: selftests: Make vgic_init/vm_gic_create version agnostic
Make vm_gic_create GIC version agnostic in the vgic_init test. Also
add a nr_vcpus arg into it instead of defaulting to NR_VCPUS.

No functional change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-8-ricarkol@google.com
2021-10-11 09:31:42 +01:00
Ricardo Koller
3f4db37e20 KVM: arm64: selftests: Make vgic_init gic version agnostic
As a preparation for the next commits which will add some tests for
GICv2, make aarch64/vgic_init GIC version agnostic. Add a new generic
run_tests function(gic_dev_type) that starts all applicable tests using
GICv3 or GICv2. GICv2 tests are attempted if GICv3 is not available in
the system. There are currently no GICv2 tests, but the test passes now
in GICv2 systems.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-7-ricarkol@google.com
2021-10-11 09:31:42 +01:00
Ricardo Koller
96e9038969 KVM: arm64: vgic: Drop vgic_check_ioaddr()
There are no more users of vgic_check_ioaddr(). Move its checks to
vgic_check_iorange() and then remove it.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-6-ricarkol@google.com
2021-10-11 09:31:42 +01:00
Ricardo Koller
2ec02f6c64 KVM: arm64: vgic-v3: Check ITS region is not above the VM IPA size
Verify that the ITS region does not extend beyond the VM-specified IPA
range (phys_size).

  base + size > phys_size AND base < phys_size

Add the missing check into vgic_its_set_attr() which is called when
setting the region.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-5-ricarkol@google.com
2021-10-11 09:31:42 +01:00
Ricardo Koller
c56a87da0a KVM: arm64: vgic-v2: Check cpu interface region is not above the VM IPA size
Verify that the GICv2 CPU interface does not extend beyond the
VM-specified IPA range (phys_size).

  base + size > phys_size AND base < phys_size

Add the missing check into kvm_vgic_addr() which is called when setting
the region. This patch also enables some superfluous checks for the
distributor (vgic_check_ioaddr was enough as alignment == size for the
distributors).

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-4-ricarkol@google.com
2021-10-11 09:31:41 +01:00
Ricardo Koller
4612d98f58 KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size
Verify that the redistributor regions do not extend beyond the
VM-specified IPA range (phys_size). This can happen when using
KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
with:

  base + size > phys_size AND base < phys_size

Add the missing check into vgic_v3_alloc_redist_region() which is called
when setting the regions, and into vgic_v3_check_base() which is called
when attempting the first vcpu-run. The vcpu-run check does not apply to
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known
before the first vcpu-run. Note that using the REDIST_REGIONS API
results in a different check, which already exists, at first vcpu run:
that the number of redist regions is enough for all vcpus.

Finally, this patch also enables some extra tests in
vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
redist api: like checking that the REDIST region can fit all the already
created vcpus.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-3-ricarkol@google.com
2021-10-11 09:31:41 +01:00
Ricardo Koller
f25c5e4daf kvm: arm64: vgic: Introduce vgic_check_iorange
Add the new vgic_check_iorange helper that checks that an iorange is
sane: the start address and size have valid alignments, the range is
within the addressable PA range, start+size doesn't overflow, and the
start wasn't already defined.

No functional change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-2-ricarkol@google.com
2021-10-11 09:31:41 +01:00
Marc Zyngier
3864d17f17 Merge branch kvm-arm64/pkvm/restrict-hypercalls into kvmarm-master/next
* kvm-arm64/pkvm/restrict-hypercalls:
  : .
  : Restrict the use of some hypercalls as well as kexec once
  : the protected KVM mode has been initialised.
  : .
  KVM: arm64: Disable privileged hypercalls after pKVM finalisation
  KVM: arm64: Prevent re-finalisation of pKVM for a given CPU
  KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall
  KVM: arm64: Reject stub hypercalls after pKVM has been initialised
  arm64: Prevent kexec and hibernation if is_protected_kvm_enabled()
  KVM: arm64: Turn __KVM_HOST_SMCCC_FUNC_* into an enum (mostly)

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-11 09:21:37 +01:00
Will Deacon
057bed206f KVM: arm64: Disable privileged hypercalls after pKVM finalisation
After pKVM has been 'finalised' using the __pkvm_prot_finalize hypercall,
the calling CPU will have a Stage-2 translation enabled to prevent access
to memory pages owned by EL2.

Although this forms a significant part of the process to deprivilege the
host kernel, we also need to ensure that the hypercall interface is
reduced so that the EL2 code cannot, for example, be re-initialised using
a new set of vectors.

Re-order the hypercalls so that only a suffix remains available after
finalisation of pKVM.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-7-will@kernel.org
2021-10-11 09:07:29 +01:00
Will Deacon
07036cffe1 KVM: arm64: Prevent re-finalisation of pKVM for a given CPU
__pkvm_prot_finalize() completes the deprivilege of the host when pKVM
is in use by installing a stage-2 translation table for the calling CPU.

Issuing the hypercall multiple times for a given CPU makes little sense,
but in such a case just return early with -EPERM rather than go through
the whole page-table dance again.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-6-will@kernel.org
2021-10-11 09:07:29 +01:00
Will Deacon
2f2e1a5069 KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall
If the __pkvm_prot_finalize hypercall returns an error, we WARN but fail
to propagate the failure code back to kvm_arch_init().

Pass a pointer to a zero-initialised return variable so that failure
to finalise the pKVM protections on a host CPU can be reported back to
KVM.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-5-will@kernel.org
2021-10-11 09:07:29 +01:00
Will Deacon
8579a185ba KVM: arm64: Reject stub hypercalls after pKVM has been initialised
The stub hypercalls provide mechanisms to reset and replace the EL2 code,
so uninstall them once pKVM has been initialised in order to ensure the
integrity of the hypervisor code.

To ensure pKVM initialisation remains functional, split cpu_hyp_reinit()
into two helper functions to separate usage of the stub from usage of
pkvm hypercalls either side of __pkvm_init on the boot CPU.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-4-will@kernel.org
2021-10-11 09:07:28 +01:00
Will Deacon
8f4566f18d arm64: Prevent kexec and hibernation if is_protected_kvm_enabled()
When pKVM is enabled, the hypervisor code at EL2 and its data structures
are inaccessible to the host kernel and cannot be torn down or replaced
as this would defeat the integrity properies which pKVM aims to provide.
Furthermore, the ABI between the host and EL2 is flexible and private to
whatever the current implementation of KVM requires and so booting a new
kernel with an old EL2 component is very likely to end in disaster.

In preparation for uninstalling the hyp stub calls which are relied upon
to reset EL2, disable kexec and hibernation in the host when protected
KVM is enabled.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-3-will@kernel.org
2021-10-11 09:07:28 +01:00
Marc Zyngier
a78738ed1d KVM: arm64: Turn __KVM_HOST_SMCCC_FUNC_* into an enum (mostly)
__KVM_HOST_SMCCC_FUNC_* is a royal pain, as there is a fair amount
of churn around these #defines, and we avoid making it an enum
only for the sake of the early init, low level code that requires
__KVM_HOST_SMCCC_FUNC___kvm_hyp_init to be usable from assembly.

Let's be brave and turn everything but this symbol into an enum,
using a bit of arithmetic to avoid any overlap.

Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/877depq9gw.wl-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211008135839.1193-2-will@kernel.org
2021-10-11 09:07:28 +01:00
Quentin Perret
6e6a8ef088 KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
VM_SHARED mappings are currently forbidden in a memslot with MTE to
prevent two VMs racing to sanitise the same page. However, this check
is performed while holding current->mm's mmap_lock, but fails to release
it. Fix this by releasing the lock when needed.

Fixes: ea7fc1bb1c ("KVM: arm64: Introduce MTE VM feature")
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005122031.809857-1-qperret@google.com
2021-10-05 13:22:45 +01:00
Quentin Perret
7615c2a514 KVM: arm64: Report corrupted refcount at EL2
Some of the refcount manipulation helpers used at EL2 are instrumented
to catch a corrupted state, but not all of them are treated equally. Let's
make things more consistent by instrumenting hyp_page_ref_dec_and_test()
as well.

Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-6-qperret@google.com
2021-10-05 13:02:54 +01:00
Quentin Perret
1d58a17ef5 KVM: arm64: Fix host stage-2 PGD refcount
The KVM page-table library refcounts the pages of concatenated stage-2
PGDs individually. However, when running KVM in protected mode, the
host's stage-2 PGD is currently managed by EL2 as a single high-order
compound page, which can cause the refcount of the tail pages to reach 0
when they shouldn't, hence corrupting the page-table.

Fix this by introducing a new hyp_split_page() helper in the EL2 page
allocator (matching the kernel's split_page() function), and make use of
it from host_s2_zalloc_pages_exact().

Fixes: 1025c8c0c6 ("KVM: arm64: Wrap the host with a stage 2")
Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-5-qperret@google.com
2021-10-05 13:02:54 +01:00
Paolo Bonzini
542a2640a2 Initial KVM RISC-V support
Following features are supported by the initial KVM RISC-V support:
 1. No RISC-V specific KVM IOCTL
 2. Loadable KVM RISC-V module
 3. Minimal possible KVM world-switch which touches only GPRs and few CSRs
 4. Works on both RV64 and RV32 host
 5. Full Guest/VM switch via vcpu_get/vcpu_put infrastructure
 6. KVM ONE_REG interface for VCPU register access from KVM user-space
 7. Interrupt controller emulation in KVM user-space
 8. Timer and IPI emuation in kernel
 9. Both Sv39x4 and Sv48x4 supported for RV64 host
 10. MMU notifiers supported
 11. Generic dirty log supported
 12. FP lazy save/restore supported
 13. SBI v0.1 emulation for Guest/VM
 14. Forward unhandled SBI calls to KVM user-space
 15. Hugepage support for Guest/VM
 16. IOEVENTFD support for Vhost
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmFb7+kACgkQrUjsVaLH
 LAfvBRAAiD/sUCbrGNk4IAOpI3drxBUTJAC3cCClOAlffHjgUgSf3Qlz+9NMNfuV
 6h3Jk9cheIVwrsQLPG4Zawnk2bujFpaJ4rn+kO3aA1ArT+ECGR2XVie2CpBuJ5g7
 ysRlrhcr9YVXXlt8ZczUBwYSjDkDaM1gaDQCt3ep65RCm8NU++Y/TlJOK4xFqC2Z
 Km2jlX2FAxKQLjNJstCPFGiXHhvRK9tu9TK18Fl9Ibk+oO7/H+tZ6VwCivzjzdiW
 kYAa6xdX4NT80H6UGeR/R72b2CekD2RJDnyaN4LE2ticIzuysZpxcI3806gOazDm
 8zVXB9627LRzHzJt4JjdKw76BghuKE3DW8V8hg4WuTpVa6Nx6EHxgpOpGA2sFZDr
 QZPv4gWQeEGBsTxQJpVJ8D6sIEsQdNQQERUemQToAzX9Hl/kkKaSQSfocSOYqgVP
 iMNY4LsJfNg/vDEDzMub/kF9qH0jPH8pkQuNX/X0A8nAtJkznjxVN9NaTxDD25Tp
 udw5/easWD1qDGE6i4Kk/JyDb9NbVAK0ZK6CM5JjtqDtOAnVXCXy+rXgEPsNs3Td
 kZVa34hKAlTpVhMfI0yj0s5jX0d/2w5KuQX147C/IWi3pBdIYZLvcqAb/1cxSoff
 uyW4dryfU7cQQdOfbIKHgaLU3b221/q5wYmEi3zqtVbixhIUtNE=
 =MMFv
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-5.16-1' of git://github.com/kvm-riscv/linux into HEAD

Initial KVM RISC-V support

Following features are supported by the initial KVM RISC-V support:
1. No RISC-V specific KVM IOCTL
2. Loadable KVM RISC-V module
3. Minimal possible KVM world-switch which touches only GPRs and few CSRs
4. Works on both RV64 and RV32 host
5. Full Guest/VM switch via vcpu_get/vcpu_put infrastructure
6. KVM ONE_REG interface for VCPU register access from KVM user-space
7. Interrupt controller emulation in KVM user-space
8. Timer and IPI emuation in kernel
9. Both Sv39x4 and Sv48x4 supported for RV64 host
10. MMU notifiers supported
11. Generic dirty log supported
12. FP lazy save/restore supported
13. SBI v0.1 emulation for Guest/VM
14. Forward unhandled SBI calls to KVM user-space
15. Hugepage support for Guest/VM
16. IOEVENTFD support for Vhost
2021-10-05 04:19:24 -04:00
Anup Patel
24b699d12c RISC-V: KVM: Add MAINTAINERS entry
Add myself as maintainer for KVM RISC-V and Atish as designated reviewer.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:14:10 +05:30
Anup Patel
da40d85805 RISC-V: KVM: Document RISC-V specific parts of KVM API
Document RISC-V specific parts of the KVM API, such as:
 - The interrupt numbers passed to the KVM_INTERRUPT ioctl.
 - The states supported by the KVM_{GET,SET}_MP_STATE ioctls.
 - The registers supported by the KVM_{GET,SET}_ONE_REG interface
   and the encoding of those register ids.
 - The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to
   userspace tool.

CC: Jonathan Corbet <corbet@lwn.net>
CC: linux-doc@vger.kernel.org
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:12:34 +05:30
Atish Patra
dea8ee31a0 RISC-V: KVM: Add SBI v0.1 support
The KVM host kernel is running in HS-mode needs so we need to handle
the SBI calls coming from guest kernel running in VS-mode.

This patch adds SBI v0.1 support in KVM RISC-V. Almost all SBI v0.1
calls are implemented in KVM kernel module except GETCHAR and PUTCHART
calls which are forwarded to user space because these calls cannot be
implemented in kernel space. In future, when we implement SBI v0.2 for
Guest, we will forward SBI v0.2 experimental and vendor extension calls
to user space.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:11:30 +05:30
Atish Patra
4d9c5c072f RISC-V: KVM: Implement ONE REG interface for FP registers
Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for floating
point registers such as F0-F31 and FCSR. This support is added for
both 'F' and 'D' extensions.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:09:32 +05:30
Atish Patra
5de52d4a23 RISC-V: KVM: FP lazy save/restore
This patch adds floating point (F and D extension) context save/restore
for guest VCPUs. The FP context is saved and restored lazily only when
kernel enter/exits the in-kernel run loop and not during the KVM world
switch. This way FP save/restore has minimal impact on KVM performance.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:08:23 +05:30
Atish Patra
3a9f66cb25 RISC-V: KVM: Add timer functionality
The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

This patch adds guest VCPU timer implementation along with ONE_REG
interface to access VCPU timer state from user space.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:07:16 +05:30
Anup Patel
9955371cc0 RISC-V: KVM: Implement MMU notifiers
This patch implements MMU notifiers for KVM RISC-V so that Guest
physical address space is in-sync with Host physical address space.

This will allow swapping, page migration, etc to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:03:39 +05:30
Anup Patel
9d05c1fee8 RISC-V: KVM: Implement stage2 page table programming
This patch implements all required functions for programming
the stage2 page table for each Guest/VM.

At high-level, the flow of stage2 related functions is similar
from KVM ARM/ARM64 implementation but the stage2 page table
format is quite different for KVM RISC-V.

[jiangyifei: stage2 dirty log support]
Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:02:19 +05:30
Anup Patel
fd7bb4a251 RISC-V: KVM: Implement VMID allocator
We implement a simple VMID allocator for Guests/VMs which:
1. Detects number of VMID bits at boot-time
2. Uses atomic number to track VMID version and increments
   VMID version whenever we run-out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run-out
   of VMIDs
4. Force updates HW Stage2 VMID for each Guest VCPU whenever
   VMID changes using VCPU request KVM_REQ_UPDATE_HGATP

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 16:01:04 +05:30
Anup Patel
5a5d79acd7 RISC-V: KVM: Handle WFI exits for VCPU
We get illegal instruction trap whenever Guest/VM executes WFI
instruction.

This patch handles WFI trap by blocking the trapped VCPU using
kvm_vcpu_block() API. The blocked VCPU will be automatically
resumed whenever a VCPU interrupt is injected from user-space
or from in-kernel IRQCHIP emulation.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:55:01 +05:30
Anup Patel
9f70132651 RISC-V: KVM: Handle MMIO exits for VCPU
We will get stage2 page faults whenever Guest/VM access SW emulated
MMIO device or unmapped Guest RAM.

This patch implements MMIO read/write emulation by extracting MMIO
details from the trapped load/store instruction and forwarding the
MMIO read/write to user-space. The actual MMIO emulation will happen
in user-space and KVM kernel module will only take care of register
updates before resuming the trapped VCPU.

The handling for stage2 page faults for unmapped Guest RAM will be
implemeted by a separate patch later.

[jiangyifei: ioeventfd and in-kernel mmio device support]
Signed-off-by: Yifei Jiang <jiangyifei@huawei.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:51:47 +05:30
Anup Patel
34bde9d8b9 RISC-V: KVM: Implement VCPU world-switch
This patch implements the VCPU world-switch for KVM RISC-V.

The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly
switches general purpose registers, SSTATUS, STVEC, SSCRATCH and
HSTATUS CSRs. Other CSRs are switched via vcpu_load() and vcpu_put()
interface in kvm_arch_vcpu_load() and kvm_arch_vcpu_put() functions
respectively.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:49:57 +05:30
Anup Patel
92ad82002c RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
VCPU config and registers from user-space.

We have three types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE   - these are VCPU general purpose registers
3. CSR    - these are VCPU control and status registers

The CONFIG register available to user-space is ISA. The ISA register is
a read and write register where user-space can only write the desired
VCPU ISA capabilities before running the VCPU.

The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
PC and MODE. The PC register represents program counter whereas the MODE
register represent VCPU privilege mode (i.e. S/U-mode).

The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.

In future, more VCPU register types will be added (such as FP) for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04 15:46:22 +05:30