guest_enter and guest_exit must be called with interrupts disabled,
since they take the vtime_seqlock with write_seq{lock,unlock}.
Therefore, it is not necessary to check for exceptions, nor to
save/restore the IRQ state, when context tracking functions are
called by guest_enter and guest_exit.
Split the body of context_tracking_entry and context_tracking_exit
out to __-prefixed functions, and use them from KVM.
Rik van Riel has measured this to speed up a tight vmentry/vmexit
loop by about 2%.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Tested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All calls to context_tracking_enter and context_tracking_exit
are already checking context_tracking_is_enabled, except the
context_tracking_user_enter and context_tracking_user_exit
functions left in for the benefit of assembly calls.
Pull the check up to those functions, by making them simple
wrappers around the user_enter and user_exit inline functions.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Tested-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch enhances dump_vmcs() to dump the value of TSC multiplier
field in VMCS.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch makes kvm-intel to return a scaled host TSC plus the TSC
offset when handling guest readings to MSR_IA32_TSC.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch makes kvm-intel module to load TSC scaling ratio into TSC
multiplier field of VMCS when a vcpu is loaded, so that TSC scaling
ratio can take effect if VMX TSC scaling is enabled.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch exhances kvm-intel module to enable VMX TSC scaling and
collects information of TSC scaling ratio during initialization.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr. Remove
BUG_ON. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch moves the field of TSC scaling ratio from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The number of bits of the fractional part of the 64-bit TSC scaling
ratio in VMX and SVM is different. This patch makes the architecture
code to collect the number of fractional bits and other related
information into variables that can be accessed in the common code.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON. Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I found PML was broken since below commit:
commit feda805fe7
Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Date: Wed Sep 9 14:05:55 2015 +0800
KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update
Unify the update in vmx_cpuid_update()
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Rewrite to use vmcs_set_secondary_exec_control. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The reason is in above commit vmx_cpuid_update calls vmx_secondary_exec_control,
in which currently SECONDARY_EXEC_ENABLE_PML bit is cleared unconditionally (as
PML is enabled in creating vcpu). Therefore if vcpu_cpuid_update is called after
vcpu is created, PML will be disabled unexpectedly while log-dirty code still
thinks PML is used.
Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control
only when PML is not supported or not enabled (!enable_pml). This is more
reasonable as PML is currently either always enabled or disabled. With this
explicit updating SECONDARY_EXEC_ENABLE_PML in vmx_enable{disable}_pml is not
needed so also rename vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer.
Fixes: feda805fe7
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
[While at it, change a wrong ASSERT to an "if". The condition can happen
if creating the VCPU fails with ENOMEM. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit b18d5431ac ("KVM: x86: fix CR0.CD virtualization") was
technically correct, but it broke OVMF guests by slowing down various
parts of the firmware.
Commit fb279950ba ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the
first function modified by b18d5431ac, vmx_get_mt_mask(), for OVMF's
sake. This restored the speed of the OVMF code that runs before
PlatformPei (including the memory intensive LZMA decompression in SEC).
This patch extends the quirk to the second function modified by
b18d5431ac, kvm_set_cr0(). It eliminates the intrusive slowdown that
hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's
UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up
all APs at once for initialization, in order to count them.
We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from
the other half of the original commit b18d5431ac.
Fixes: b18d5431ac
Cc: stable@vger.kernel.org
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: Janusz Mocek <januszmk6@gmail.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>#
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The SDM says that exiting system management mode from 64-bit mode
is invalid, but that would be too good to be true. But actually,
most of the code is already there to support exiting from compat
mode (EFER.LME=1, EFER.LMA=0). Getting all the way from 64-bit
mode to real mode only requires clearing CS.L and CR4.PCIDE.
Cc: stable@vger.kernel.org
Fixes: 660a5d517a
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The comment in code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.
Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function is not used outside device assignment, and
kvm_arch_set_irq_inatomic has a different prototype. Move it here and
make it static to avoid confusion.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We do not want to do too much work in atomic context, in particular
not walking all the VCPUs of the virtual machine. So we want
to distinguish the architecture-specific injection function for irqfd
from kvm_set_msi. Since it's still empty, reuse the newly added
kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
BSP doesn't get INIT so its apic_arb_prio isn't zeroed after reboot.
BSP won't get lowest priority interrupts until other VCPUs get enough
interrupts to match their pre-reboot apic_arb_prio.
That behavior doesn't fit into KVM's round-robin-like interpretation of
lowest priority delivery ... userspace should KVM_SET_LAPIC on reset, so
just zero apic_arb_prio there.
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
GET_SMSTATE depends on real mode to ensure that smbase+offset is treated
as a physical address, which has already caused a bug after shuffling
the code. Enforce physical addressing.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We want to read the physical memory when emulating RSM.
X86EMUL_IO_NEEDED is returned on all errors for consistency with other
helpers.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
removing unused variables, found by coccinelle
Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
=GMNk
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.
Conflicts:
arch/x86/include/asm/kvm_host.h
Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used
together. Merge them into one function, saving from second vgic_ops
dereferencing every time.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
1. Remove unnecessary 'irq' argument, because irq number can be retrieved
from the LR.
2. Since cff9211eb1
("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts ")
LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it
clears vlr.state itself. Therefore, we remove the same, now duplicated,
check with all accompanying bit manipulations from vgic_unqueue_irqs().
3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since
it already does more than just clearing the LR, move
vgic_irq_clear_queued() inside of it.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and lr_used bitmap in order to track which LRs are used or free.
vgic_irq_lr_map is actually used only for piggy-back optimization, and
can be easily replaced by iteration over lr_used. This is good because in
future, when LPI support is introduced, number of IRQs will grow up to at
least 16384, while numbers from 1024 to 8192 are never going to be used.
This would be a huge memory waste.
In its turn, lr_used is also completely redundant since
ae705930fc ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows to easily replace lr_used with elrsr, inverting all
conditions (because in elrsr '1' means 'free').
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
There is one important bug fix for a potential memory corruption
and/or guest errors for guests with 63 or 64 vCPUs. This fix would
qualify for 4.3 but is some days too late giving that we are
about to release 4.3.
Given that this patch is cc stable >= 3.15 anyway, we can handle
it via 4.4. merge window.
This pull reuqest also contains two cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJWMjS5AAoJEBF7vIC1phx80FMP/Rbm+DOrPfAQJteIxTvm5TmM
LjZjdZ6xCuUH+css6mtziQmA3GX2n4wPgrh5oU7ZYewvvNf6KXfzSj8lh0Y40xvE
LELMrleip1A2E+lix3jcYrUnCtuGUVvbAlEbh3B51GeMuS55NdMbTJjDz+YCZS77
TVwvzXiTiV81SEE0g5F6ydAIHoa+qTwNtf9LOqP33DNonjqOZVIbxJpU7c1gvm/N
2GXMqds7SU/iI59cQCKdeoPcUfuq/tyQOaSQShfbwZ34yw10IBd+24roJp77jYo7
txBrlJEP1P7OjALrzT4PHieZ3rSUgxFPhdUvdgbgNI5YSIcSOWbKKFGXkQ0QJYJ2
dZh58qY12LFLkC5PJbJzEv9cz/wI+O27DT2dFluhvTNSgI09yeQ7yuMqYhwqBsSS
NPQdUQ+D/TF1oZh1UVEEIk4UVURB9wRhqrPMeeNTNcawEEqxYGtpBw19AdeacMCk
diSdPVgN3qQCoR7jR3LR0S28M1i24PyayQWeKwLWitm7wGp663bfjkpnJCOn9jXD
/LBPYJDQDCdzBd1ynoAj+BK7Uau++LiJUeozKf/ohek3c+N+N/bcFVY9liGnFFub
DVHIfGEztf3t6TXpqnJ4uxGjafonx/WsfPqGRdOnB6OZ1WVB7MZIAUJBAmLtFHvv
N0FkHKfrRSRTtsUb6g5e
=w01Y
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-20151028' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Bugfix and cleanups
There is one important bug fix for a potential memory corruption
and/or guest errors for guests with 63 or 64 vCPUs. This fix would
qualify for 4.3 but is some days too late giving that we are
about to release 4.3.
Given that this patch is cc stable >= 3.15 anyway, we can handle
it via 4.4. merge window.
This pull request also contains two cleanups.
We currently do some magic shifting (by exploiting that exit codes
are always a multiple of 4) and a table lookup to jump into the
exit handlers. This causes some calculations and checks, just to
do an potentially expensive function call.
Changing that to a switch statement gives the compiler the chance
to inline and dynamically decide between jump tables or inline
compare and branches. In addition it makes the code more readable.
bloat-o-meter gives me a small reduction in code size:
add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348)
function old new delta
kvm_handle_sie_intercept 72 1058 +986
handle_prog 704 696 -8
handle_noop 54 - -54
handle_partial_execution 60 - -60
intercept_funcs 120 - -120
handle_instruction 198 - -198
handle_validity 210 - -210
handle_stop 316 - -316
handle_external_interrupt 368 - -368
Right now my gcc does conditional branches instead of jump tables.
The inlining seems to give us enough cycles as some micro-benchmarking
shows minimal improvements, but still in noise.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
the s390 debug feature does not need newlines. In fact it will
result in empty lines. Get rid of 4 leftovers.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We seemed to have missed a few corner cases in commit f6c137ff00
("KVM: s390: randomize sca address").
The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
some unlucky numbers, we exceed the page.
0x7c0 (1984) -> Fits exactly
0x7d0 (2000) -> 16 bytes out
0x7e0 (2016) -> 32 bytes out
0x7f0 (2032) -> 48 bytes out
One VCPU entry is 32 bytes long.
For the last two cases, we actually write data to the other page.
1. The address of the VCPU.
2. Injection/delivery/clearing of SIGP externall calls via SIGP IF.
Especially the 2. happens regularly. So this could produce two problems:
1. The guest losing/getting external calls.
2. Random memory overwrites in the host.
So this problem happens on every 127 + 128 created VM with 64 VCPUs.
Cc: stable@vger.kernel.org # v3.15+
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Besides being a coding style issue, it confuses make tags:
ctags: Warning: include/kvm/arm_vgic.h:307: null expansion of name pattern "\1"
ctags: Warning: include/kvm/arm_vgic.h:308: null expansion of name pattern "\1"
ctags: Warning: include/kvm/arm_vgic.h:309: null expansion of name pattern "\1"
ctags: Warning: include/kvm/arm_vgic.h:317: null expansion of name pattern "\1"
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
If we panic in hyp mode, we inject a call to panic() into the EL1N host
kernel. If a guest context is active, we first attempt to restore the
minimal amount of state necessary to execute the host kernel with
restore_sysregs.
However, the SP is restored as part of restore_common_regs, and so we
may return to the host's panic() function with the SP of the guest. Any
calculations based on the SP will be bogus, and any attempt to access
the stack will result in recursive data aborts.
When running Linux as a guest, the guest's EL1N SP is like to be some
valid kernel address. In this case, the host kernel may use that region
as a stack for panic(), corrupting it in the process.
Avoid the problem by restoring the host SP prior to returning to the
host. To prevent misleading backtraces in the host, the FP is zeroed at
the same time. We don't need any of the other "common" registers in
order to panic successfully.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: <kvmarm@lists.cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints
or tracepoint infrastructure defined. Rewriting some of the timer code
handling showed me how much we need this, so let's add these simple
trace points once and for all and we can easily expand with additional
trace points in these files as we go along.
Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The ARM architecture only saves the exit class to the HSR (ESR_EL2 for
arm64) on synchronous exceptions, not on asynchronous exceptions like an
IRQ. However, we only report the exception class on kvm_exit, which is
confusing because an IRQ looks like it exited at some PC with the same
reason as the previous exit. Add a lookup table for the exception index
and prepend the kvm_exit tracepoint text with the exception type to
clarify this situation.
Also resolve the exception class (EC) to a human-friendly text version
so the trace output becomes immediately usable for debugging this code.
Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Correct some old mistakes in the API documentation:
1. VCPU is identified by index (using kvm_get_vcpu() function), but
"cpu id" can be mistaken for affinity ID.
2. Some error codes are wrong.
[ Slightly tweaked some grammer and did some s/CPU index/vcpu_index/
in the descriptions. -Christoffer ]
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We introduce kvm_arm_halt_guest and resume functions. They
will be used for IRQ forward state change.
Halt is synchronous and prevents the guest from being re-entered.
We use the same mechanism put in place for PSCI former pause,
now renamed power_off. A new flag is introduced in arch vcpu state,
pause, only meant to be used by those functions.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
In case a vcpu off PSCI call is called just after we executed the
vcpu_sleep check, we can enter the guest although power_off
is set. Let's check the power_off state in the critical section,
just before entering the guest.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
kvm_arch_vcpu_runnable now also checks whether the power_off
flag is set.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The kvm_vcpu_arch pause field is renamed into power_off to prepare
for the introduction of a new pause field. Also vcpu_pause is renamed
into vcpu_sleep since we will sleep until both power_off and pause are
false.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
vhost drivers provide guest VMs with better I/O performance and lower
CPU utilization. This patch allows users to select vhost devices under
KVM configuration menu on ARM. This makes vhost support on arm/arm64
on a par with other architectures (e.g. x86, ppc).
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We mark edge-triggered interrupts with the HW bit set as queued to
prevent the VGIC code from injecting LRs with both the Active and
Pending bits set at the same time while also setting the HW bit,
because the hardware does not support this.
However, this means that we must also clear the queued flag when we sync
back a LR where the state on the physical distributor went from active
to inactive because the guest deactivated the interrupt. At this point
we must also check if the interrupt is pending on the distributor, and
tell the VGIC to queue it again if it is.
Since these actions on the sync path are extremely close to those for
level-triggered interrupts, rename process_level_irq to
process_queued_irq, allowing it to cater for both cases.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic. This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts. Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.
This patch fixes this shortcoming through the following series of
changes.
First, we change the flow of the timer/vgic sync/flush operations. Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output. This has the implication that we
move the timer operations in to non-preempible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.
Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes. Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.
Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever the input to the GIC is asserted and conversely clear the
physical active state when the input to the GIC is deasserted.
Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>