Add support for cache flushing a range of the stage-2 address space to
the generic page-table code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200911132529.19844-15-will@kernel.org
Convert stage2_wp_range() to call the kvm_pgtable_stage2_wrprotect()
function of the generic page-table code instead of walking the page-table
directly.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200911132529.19844-14-will@kernel.org
Convert the page-aging functions and access fault handler to use the
generic page-table code instead of walking the page-table directly.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-12-will@kernel.org
Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
of walking the page-table directly.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-10-will@kernel.org
Convert kvm_set_spte_hva() to use kvm_pgtable_stage2_map() instead
of stage2_set_pte().
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-9-will@kernel.org
Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead
of stage2_set_pte().
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-8-will@kernel.org
Introduce alloc() and free() functions to the generic page-table code
for guest stage-2 page-tables and plumb these into the existing KVM
page-table allocator. Subsequent patches will convert other operations
within the KVM allocator over to the generic code.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-6-will@kernel.org
Now that we have a shiny new page-table allocator, replace the hyp
page-table code with calls into the new API. This also allows us to
remove the extended idmap code, as we can now simply ensure that the
VA size is large enough to map everything we need.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-5-will@kernel.org
The generic page-table walker is pretty useless as it stands, because it
doesn't understand enough to allocate anything. Teach it about stage-1
page-tables, and hook up an API for allocating these for the hypervisor
at EL2.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-4-will@kernel.org
The KVM page-table code is intricately tied into the kernel page-table
code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
to reduce code duplication. Unfortunately, the reality is that there is
an awful lot of code required to make this work, and at the end of the
day you're limited to creating page-tables with the same configuration
as the host kernel. Furthermore, lifting the page-table code to run
directly at EL2 on a non-VHE system (as we plan to to do in future
patches) is practically impossible due to the number of dependencies it
has on the core kernel.
Introduce a framework for walking Armv8 page-tables configured
independently from the host kernel.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-3-will@kernel.org
kvm_mmu_free_memory_caches() is only called by kvm_arch_vcpu_destroy(),
so inline the implementation and get rid of the extra function.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-2-will@kernel.org
see the -Wa,-march option.
- Fix "make vdso_install" when COMPAT_VDSO is disabled.
- Make KVM more robust if the AT S1E1R instruction triggers an exception
(architecture corner cases).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl9JPWEACgkQa9axLQDI
XvFLuBAAjTz6SgaLVk5vtoNlNXR+zx/AcwG1hFthWaPqwRLjEVogwZ76Hx7qOStb
M2+rEsr7q8BKAsI7nU1OJGfDMbyozqDxCIq89NmOgCm3TTze/BiZx7KIL+l5aQea
5qiPIt3pwhPaFGAnQLDbdBJ7Iz34VbB8bqxLi9tz5RkbfFFEIkNgobrljVj71ZLp
7xDn8+w54iVqnMrhSTQtPtbdIpgpBO0HL6PuX6jBY+sGfkwpaZCKMdgU4HVkhW8t
kgmj3S/orMtPvZvQXDZflFdDn+dS0c0dyJlzTu7qyJjL/zgma5RYwLSaWSH2kcib
lsna1Xoge1Iqzj7QKT8uzsfCHkZ+ANr17oB8YakQtu1HmVDgvOiDX5v44+aLKdJd
mRwa+UtJT7cVl7I/3r7rOZyb+ApcmjD5Wft7Hi6lOQSfNp+kBRcBCaOcKdh0Gk4A
KFlZYBnXywo1Xy06HkUSIL3k+qvrHMHC5g6S2XmIL6BYvj08poq6BUTlqSAIuzp4
GzIzEusqPX80V8MQeRvJ8XmIPtzqgiA4AVCshAwrSiUcEgYWpsWb+yPTpKpnygpd
UyuuUmfxR7I9ctNw25C4jebdi+gLQoCwCQqRqHR/0Fj4KvkQnAKIWDa52Dcl85Qp
nedLKsEDc+Tb7ePfp9VzgJ3OmVQUL4hiYDIR3YSVeQrG3R9O72A=
=wzsB
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix kernel build with the integrated LLVM assembler which doesn't see
the -Wa,-march option.
- Fix "make vdso_install" when COMPAT_VDSO is disabled.
- Make KVM more robust if the AT S1E1R instruction triggers an
exception (architecture corner cases).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
KVM: arm64: Survive synchronous exceptions caused by AT instructions
KVM: arm64: Add kvm_extable for vaxorcism code
arm64: vdso32: make vdso32 install conditional
arm64: use a common .arch preamble for inline assembly
KVM doesn't expect any synchronous exceptions when executing, any such
exception leads to a panic(). AT instructions access the guest page
tables, and can cause a synchronous external abort to be taken.
The arm-arm is unclear on what should happen if the guest has configured
the hardware update of the access-flag, and a memory type in TCR_EL1 that
does not support atomic operations. B2.2.6 "Possible implementation
restrictions on using atomic instructions" from DDI0487F.a lists
synchronous external abort as a possible behaviour of atomic instructions
that target memory that isn't writeback cacheable, but the page table
walker may behave differently.
Make KVM robust to synchronous exceptions caused by AT instructions.
Add a get_user() style helper for AT instructions that returns -EFAULT
if an exception was generated.
While KVM's version of the exception table mixes synchronous and
asynchronous exceptions, only one of these can occur at each location.
Re-enter the guest when the AT instructions take an exception on the
assumption the guest will take the same exception. This isn't guaranteed
to make forward progress, as the AT instructions may always walk the page
tables, but guest execution may use the translation cached in the TLB.
This isn't a problem, as since commit 5dcd0fdbb4 ("KVM: arm64: Defer guest
entry when an asynchronous exception is pending"), KVM will return to the
host to process IRQs allowing the rest of the system to keep running.
Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdbb4 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending")
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM has a one instruction window where it will allow an SError exception
to be consumed by the hypervisor without treating it as a hypervisor bug.
This is used to consume asynchronous external abort that were caused by
the guest.
As we are about to add another location that survives unexpected exceptions,
generalise this code to make it behave like the host's extable.
KVM's version has to be mapped to EL2 to be accessible on nVHE systems.
The SError vaxorcism code is a one instruction window, so has two entries
in the extable. Because the KVM code is copied for VHE and nVHE, we end up
with four entries, half of which correspond with code that isn't mapped.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(currently they are parked if none of the early CPUs are affected by
this erratum).
- Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
vdso_install' installs the 32-bit compat vdso when it is compiled.
- Print a warning that untrusted guests without a CPU erratum workaround
(Cortex-A57 832075) may deadlock the affected system.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl9BOb8ACgkQa9axLQDI
XvFnixAAladNjLswpC8gm0PSv2fXD+OJU7XiJKl1McG0UcnkRp1brgruOi7WMyQg
I0cbY6Kjfb6mFJaHfEA9uuQ0P44FGmI+gz+3Injl+a0qJgdu0QLU1uJQG/Rae+zG
kdoimf+/CnLnJTiIA5YXsdrFhSQsn2lVConJx26QrSJO0SB0TROy86aRSrRSMyYy
mXrK4xCm/cx4LqJQJrFmShpgs/IjuK8T/LZBInjgB43e3y++SHwGTSA2kf3NOldz
Rx4HqQVJ37IeROZ8B2v8MggZsTk/5C+DaxJu6QBk7REDKPnI9+RAr3BhPUAAVWDn
BONYtsjBLgf/Q9bmXNsHlGAhjIIeOAgaIIr10oVhSFScCHsjEU15hqyNP2jfgLSC
Q4cgU8bA0sm36CHNI0vd5phAnMBN6HJZtmSzu2xb/GJKYW4yiuXupYiOWbBAhhiu
uAFjMRW+dwfOXY/59kwpmedBZ0WvHod2mPhp3n4FRzo3X0NfRgHzgg1kDRwqVAzv
eh8ynUYMjuy5H3siNR0279M6epIK/hTtqSvnzIqkCTBkftsYX+32v4UzEardRGiv
+/gKJ1XQGmWD3OCISuMZwaRK+tH5V68eT+3P0gGmssJLYMtRJFzzVtuv3mz1yXRD
USsww/SY2etlGU+N8r+66NSqU0KPRdQ0tkK6eCX8DTyl3MPLQfQ=
=VkR3
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Allow booting of late secondary CPUs affected by erratum 1418040
(currently they are parked if none of the early CPUs are affected by
this erratum).
- Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
vdso_install' installs the 32-bit compat vdso when it is compiled.
- Print a warning that untrusted guests without a CPU erratum
workaround (Cortex-A57 832075) may deadlock the affected system.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ARM64: vdso32: Install vdso32 from vdso_install
KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
arm64: Allow booting of late CPUs affected by erratum 1418040
arm64: Move handling of erratum 1418040 into C code
When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():
| BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
| in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
| INFO: lockdep is turned off.
| CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Call trace:
| dump_backtrace+0x0/0x284
| show_stack+0x1c/0x28
| dump_stack+0xf0/0x1a4
| ___might_sleep+0x2bc/0x2cc
| unmap_stage2_range+0x160/0x1ac
| kvm_unmap_hva_range+0x1a0/0x1c8
| kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
| __mmu_notifier_invalidate_range_start+0x218/0x31c
| mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
| __oom_reap_task_mm+0x128/0x268
| oom_reap_task+0xac/0x298
| oom_reaper+0x178/0x17c
| kthread+0x1e4/0x1fc
| ret_from_fork+0x10/0x30
Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.
Cc: <stable@vger.kernel.org>
Fixes: 8b3405e345 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.
Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If guests don't have certain CPU erratum workarounds implemented, then
there is a possibility a guest can deadlock the system. IOW, only trusted
guests should be used on systems with the erratum.
This is the case for Cortex-A57 erratum 832075.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20200803193127.3012242-2-robh@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Split the VHE and nVHE hypervisor code bases, build the EL2 code
separately, allowing for the VHE code to now be built with instrumentation
- Level-based TLB invalidation support
- Restructure of the vcpu register storage to accomodate the NV code
- Pointer Authentication available for guests on nVHE hosts
- Simplification of the system register table parsing
- MMU cleanups and fixes
- A number of post-32bit cleanups and other fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl8q5DEPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDQFAP/jtscnC5OxEOoGNW1gvg/1QI/BuU4zLvqQL1
OEW72fUQlil7tmF/CbLLKnsBpxKmzO02C3wDdg3oaRi884bRtTXdok0nsFuCvrZD
u/wrlMnP0zTjjk1uwIFfZJTx+nnUiT0jC6ffvGxB/jnTJk/8atvOUFL7ODFEfixz
mS5g1jwwJkRmWKESFg7KGSghKuwXTvo4HVWCfME+t1rQwAa03stXFV8H5tkU6+cG
BRIssxo7BkAV2AozwL7hgl/M6wd6QvbOrYJqgb67+sQ8qts0YNne96NN3InMedb1
RENyDssXlA+VI0HoYyEbYnPtFy1Hoj1lOGDZLEZAEH1qcmWrV+hApnoSXSmuofvn
QlfOWCyd92CZySu21MALRUVXbrKkA3zT2b9R93A5z7iEBPY+Wk0ryJCO6IxdZzF8
48LNjtzb/Kd0SMU/issJlw+u6fJvLbpnSzXNsYYhiiTMUE9cbu2SEkq0SkonH0a4
d3V8UifZyeffXsOfOAG0DJZOu/fWZp1/I3tfzujtG9rCb+jTQueJ4E1cFYrwSO6b
sFNyiI1AzlwcCippG08zSUX61nGfKXBuMXuhIlMRk7GeiF95DmSXuxEgYndZX9I+
E6zJr1iQk/1lrip41svDIIOBHuMbIeD/w1bsOKi7Zoa270MxB4r2Z3IqRMgosoE5
l4YO9pl1
=Ukr4
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next-5.6
KVM/arm64 updates for Linux 5.9:
- Split the VHE and nVHE hypervisor code bases, build the EL2 code
separately, allowing for the VHE code to now be built with instrumentation
- Level-based TLB invalidation support
- Restructure of the vcpu register storage to accomodate the NV code
- Pointer Authentication available for guests on nVHE hosts
- Simplification of the system register table parsing
- MMU cleanups and fixes
- A number of post-32bit cleanups and other fixes
x86:
* Report last CPU for debugging
* Emulate smaller MAXPHYADDR in the guest than in the host
* .noinstr and tracing fixes from Thomas
* nested SVM page table switching optimization and fixes
Generic:
* Unify shadow MMU cache data structures across architectures
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8pC+oUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcOwgAjomqtEqQNlp7DdZT7VyyklzbxX1/
ud7v+oOJ8K4sFlf64lSthjPo3N9rzZCcw+yOXmuyuITngXOGc3tzIwXpCzpLtuQ1
WO1Ql3B/2dCi3lP5OMmsO1UAZqy9pKLg1dfeYUPk48P5+p7d/NPmk+Em5kIYzKm5
JsaHfCp2EEXomwmljNJ8PQ1vTjIQSSzlgYUBZxmCkaaX7zbEUMtxAQCStHmt8B84
33LczwXBm3viSWrzsoBV37I70+tseugiSGsCfUyupXOvq55d6D9FCqtCb45Hn4Vh
Ik8ggKdalsk/reiGEwNw1/3nr6mRMkHSbl+Mhc4waOIFf9dn0urgQgOaDg==
=YVx0
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"s390:
- implement diag318
x86:
- Report last CPU for debugging
- Emulate smaller MAXPHYADDR in the guest than in the host
- .noinstr and tracing fixes from Thomas
- nested SVM page table switching optimization and fixes
Generic:
- Unify shadow MMU cache data structures across architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
KVM: SVM: Fix sev_pin_memory() error handling
KVM: LAPIC: Set the TDCR settable bits
KVM: x86: Specify max TDP level via kvm_configure_mmu()
KVM: x86/mmu: Rename max_page_level to max_huge_page_level
KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
KVM: VMX: Make vmx_load_mmu_pgd() static
KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
KVM: VMX: Drop a duplicate declaration of construct_eptp()
KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
KVM: Using macros instead of magic values
MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
KVM: VMX: Add guest physical address check in EPT violation and misconfig
KVM: VMX: introduce vmx_need_pf_intercept
KVM: x86: update exception bitmap on CPUID changes
KVM: x86: rename update_bp_intercept to update_exception_bitmap
...
- Removal of the tremendously unpopular read_barrier_depends() barrier,
which is a NOP on all architectures apart from Alpha, in favour of
allowing architectures to override READ_ONCE() and do whatever dance
they need to do to ensure address dependencies provide LOAD ->
LOAD/STORE ordering. This work also offers a potential solution if
compilers are shown to convert LOAD -> LOAD address dependencies into
control dependencies (e.g. under LTO), as weakly ordered architectures
will effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic, augment
the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the device
ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions and
kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl8oTcsACgkQa9axLQDI
XvEj6hAAkn39mO5xrR/Vhpg3DyFPk63ZlMSX9SsOeVyaLbovT6stTs1XAZXPpnkt
rV3gwACyGSrqH6+uey9pHgHJuPF2TdrGEVK08yVKo9KGW/6yXSIncdKFE4jUJ/WJ
wF5j7eMET2aGzcpm5AlzMmq6HOrKB8nZac9H8/x6H+Ox2WdgJkEjOkDvyqACUyum
N3FsTZkWj2pIkTXHNgDZ8KjxVLO8HlFaB2hkxFDl9NPlX2UTCQJ8Tg1KiPLafKaK
gUvH4usQDFdb5RU/UWogre37J4emO0ZTApZOyju+U+PMMWlWVHjZ4isUIS9zz/AE
JNZ23dnKZX2HrYa5p8HZx175zwj/vXUqUHCZPLvQXaAudCEhF8BVljPiG0e80FV5
GHFUgUbylKspp01I/9L+2JvsG96Mr0e+P3Sx7L2HTI42cmtoSa14+MpoSRj7zlft
Qcl8hfrVOjCjUnFRHa/1y1cGvnD9GbgnKJR7zgVxl9bD/Jd48r1HUtwRORZCzWFr
mRPVbPS72fWxMzMV9DZYJm02jJY9kLX2BMl49njbB8MhAhzOvrMVzoVVtMMeRFLR
XHeJpmg36W09FiRGe7LRXlkXIhCQzQG2bJfiphuupCfhjRAitPoq8I925G6Pig60
c8RWaXGU7PrEsdMNrL83vekvGKgqrkoFkRVtsCoQ2X6Hvu/XdYI=
=mh79
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 and cross-arch updates from Catalin Marinas:
"Here's a slightly wider-spread set of updates for 5.9.
Going outside the usual arch/arm64/ area is the removal of
read_barrier_depends() series from Will and the MSI/IOMMU ID
translation series from Lorenzo.
The notable arm64 updates include ARMv8.4 TLBI range operations and
translation level hint, time namespace support, and perf.
Summary:
- Removal of the tremendously unpopular read_barrier_depends()
barrier, which is a NOP on all architectures apart from Alpha, in
favour of allowing architectures to override READ_ONCE() and do
whatever dance they need to do to ensure address dependencies
provide LOAD -> LOAD/STORE ordering.
This work also offers a potential solution if compilers are shown
to convert LOAD -> LOAD address dependencies into control
dependencies (e.g. under LTO), as weakly ordered architectures will
effectively be able to upgrade READ_ONCE() to smp_load_acquire().
The latter case is not used yet, but will be discussed further at
LPC.
- Make the MSI/IOMMU input/output ID translation PCI agnostic,
augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID
bus-specific parameter and apply the resulting changes to the
device ID space provided by the Freescale FSL bus.
- arm64 support for TLBI range operations and translation table level
hints (part of the ARMv8.4 architecture version).
- Time namespace support for arm64.
- Export the virtual and physical address sizes in vmcoreinfo for
makedumpfile and crash utilities.
- CPU feature handling cleanups and checks for programmer errors
(overlapping bit-fields).
- ACPI updates for arm64: disallow AML accesses to EFI code regions
and kernel memory.
- perf updates for arm64.
- Miscellaneous fixes and cleanups, most notably PLT counting
optimisation for module loading, recordmcount fix to ignore
relocations other than R_AARCH64_CALL26, CMA areas reserved for
gigantic pages on 16K and 64K configurations.
- Trivial typos, duplicate words"
Link: http://lkml.kernel.org/r/20200710165203.31284-1-will@kernel.org
Link: http://lkml.kernel.org/r/20200619082013.13661-1-lorenzo.pieralisi@arm.com
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits)
arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack
arm64/mm: save memory access in check_and_switch_context() fast switch path
arm64: sigcontext.h: delete duplicated word
arm64: ptrace.h: delete duplicated word
arm64: pgtable-hwdef.h: delete duplicated words
bus: fsl-mc: Add ACPI support for fsl-mc
bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
of/irq: Make of_msi_map_rid() PCI bus agnostic
of/irq: make of_msi_map_get_device_domain() bus agnostic
dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
of/device: Add input id to of_dma_configure()
of/iommu: Make of_map_rid() PCI agnostic
ACPI/IORT: Add an input ID to acpi_dma_configure()
ACPI/IORT: Remove useless PCI bus walk
ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
arm64: enable time namespace support
arm64/vdso: Restrict splitting VVAR VMA
arm64/vdso: Handle faults on timens page
...
* for-next/read-barrier-depends:
: Allow architectures to override __READ_ONCE()
arm64: Reduce the number of header files pulled into vmlinux.lds.S
compiler.h: Move compiletime_assert() macros into compiler_types.h
checkpatch: Remove checks relating to [smp_]read_barrier_depends()
include/linux: Remove smp_read_barrier_depends() from comments
tools/memory-model: Remove smp_read_barrier_depends() from informal doc
Documentation/barriers/kokr: Remove references to [smp_]read_barrier_depends()
Documentation/barriers: Remove references to [smp_]read_barrier_depends()
locking/barriers: Remove definitions for [smp_]read_barrier_depends()
alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
vhost: Remove redundant use of read_barrier_depends() barrier
asm/rwonce: Don't pull <asm/barrier.h> into 'asm-generic/rwonce.h'
asm/rwonce: Remove smp_read_barrier_depends() invocation
alpha: Override READ_ONCE() with barriered implementation
asm/rwonce: Allow __READ_ONCE to be overridden by the architecture
compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
tools: bpf: Use local copy of headers including uapi/linux/filter.h
To allow for re-injection of stage-2 faults on stage-1 page-table walks
due to either a missing or read-only memslot, move the triage logic out
of io_mem_abort() and into kvm_handle_guest_abort(), where these aborts
can be handled before anything else.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-5-will@kernel.org
If a guest performs cache maintenance on a read-only memslot, we should
inform userspace rather than skip the instruction altogether.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-4-will@kernel.org
If the guest generates a synchronous external abort which is not handled
by the host, we inject it back into the guest as a virtual SError, but
only if the original fault was reported on the data side. Instruction
faults are reported as "Unsupported FSC", causing the vCPU run loop to
bail with -EFAULT.
Although synchronous external aborts from a guest are pretty unusual,
treat them the same regardless of whether they are taken as data or
instruction aborts by EL2.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-3-will@kernel.org
kvm_vcpu_dabt_isextabt() is not specific to data aborts and, unlike
kvm_vcpu_dabt_issext(), has nothing to do with sign extension.
Rename it to 'kvm_vcpu_abt_issea()'.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-2-will@kernel.org
Some compilers may put a subset of generated functions into '.text.*'
ELF sections and the linker may leverage this division to optimize ELF
layout. Unfortunately, the recently introduced HYPCOPY command assumes
that all executable code (with the exception of specialized sections
such as '.hyp.idmap.text') is in the '.text' section. If this
assumption is broken, code in '.text.*' will be merged into kernel
proper '.text' instead of the '.hyp.text' that is mapped in EL2.
To ensure that this cannot happen, insert an OBJDUMP assertion into
HYPCOPY. The command dumps a list of ELF sections in the input object
file and greps for '.text.'. If found, compilation fails. Tested with
both binutils' and LLVM's objdump (the output format is different).
GCC offers '-fno-reorder-functions' to disable this behaviour. Select
the flag if it is available. From inspection of GCC source (latest
Git in July 2020), this flag does force all code into '.text'.
By default, GCC uses profile data, heuristics and attributes to select
a subsection.
LLVM/Clang currently does not have a similar optimization pass. It can
place static constructors into '.text.startup' and it's optimizer can
be provided with profile data to reorder hot/cold functions. Neither
of these is applicable to nVHE hyp code. If this changes in the future,
the OBJDUMP assertion should alert users to the problem.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200730132519.48787-1-dbrazdil@google.com
The HARDEN_EL2_VECTORS config maps vectors at a fixed location on cores which
are susceptible to Spector variant 3a (A57, A72) to prevent defeating hyp
layout randomization by leaking the value of VBAR_EL2.
Since this feature is only applicable when EL2 layout randomization is enabled,
unify both behind the same RANDOMIZE_BASE Kconfig. Majority of code remains
conditional on a capability selected for the affected cores.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200721094445.82184-3-dbrazdil@google.com
If there are spare bits in non-VHE hyp VA, KVM unconditionally replaces them
with a random tag chosen at init. Disable this if the kernel is built without
RANDOMIZE_BASE to align with kernel behavior.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200721094445.82184-2-dbrazdil@google.com
If a stage-2 page-table contains an executable, read-only mapping at the
pte level (e.g. due to dirty logging being enabled), a subsequent write
fault to the same page which tries to install a larger block mapping
(e.g. due to dirty logging having been disabled) will erroneously inherit
the exec permission and consequently skip I-cache invalidation for the
rest of the block.
Ensure that exec permission is only inherited by write faults when the
new mapping is of the same size as the existing one. A subsequent
instruction abort will result in I-cache invalidation for the entire
block mapping.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Quentin Perret <qperret@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200723101714.15873-1-will@kernel.org
Activity Monitor Event Type Registers are named as AMEVTYPER{0,1}<n>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200721091259.102756-1-vladimir.murzin@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Although vmlinux.lds.S smells like an assembly file and is compiled
with __ASSEMBLY__ defined, it's actually just fed to the preprocessor to
create our linker script. This means that any assembly macros defined
by headers that it includes will result in a helpful link error:
| aarch64-linux-gnu-ld:./arch/arm64/kernel/vmlinux.lds:1: syntax error
In preparation for an arm64-private asm/rwonce.h implementation, which
will end up pulling assembly macros into linux/compiler.h, reduce the
number of headers we include directly and transitively in vmlinux.lds.S
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
structure. For historical reasons, many kvm-related function parameters
retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
patch does a unified cleanup of these remaining redundant parameters.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200623131418.31473-3-tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move to the common MMU memory cache implementation now that the common
code and arm64's existing code are semantically compatible.
No functional change intended.
Cc: Marc Zyngier <maz@kernel.org>
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-19-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make
the struct and its usage compatible with the common 'struct
kvm_mmu_memory_cache' in linux/kvm_host.h. This will minimize code
churn when arm64 moves to the common implementation in a future patch, at
the cost of temporarily having somewhat silly code.
Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO:
#define GFP_PGTABLE_USER (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
|
-> #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
== GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO
versus
#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
with __GFP_ZERO explicitly OR'd in
== GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO
No functional change intended.
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-18-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the @max param in mmu_topup_memory_cache() and instead use
ARRAY_SIZE() to terminate the loop to fill the cache. This removes a
BUG_ON() and sets the stage for moving arm64 to the common memory cache
implementation.
No functional change intended.
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-17-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the timer gsisters to the sysreg file. This will further help when
they are directly changed by a nesting hypervisor in the VNCR page.
This requires moving the initialisation of the timer struct so that some
of the helpers (such as arch_timer_ctx_index) can work correctly at an
early stage.
Signed-off-by: Marc Zyngier <maz@kernel.org>
kvm_timer_sync_hwstate() has nothing to do with the timer HW state,
but more to do with the state of a userspace interrupt controller.
Change the suffix from _hwstate to_user, in keeping with the rest
of the code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
SPSR_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to
the sysregs array and update the accessors.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
As we're about to move SPSR_EL1 into the VNCR page, we need to
disassociate it from the rest of the 32bit cruft. Let's break
the array into individual fields.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
SP_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to the
system register array and update the accessors.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>