The goal is to sleep qemu whenever the guest clock
is in advance compared to the host clock (we use
the monotonic clocks). The amount of time to sleep
is calculated in the execution loop in cpu_exec.
At first, we tried to approximate at each for loop the real time elapsed
while searching for a TB (generating or retrieving from cache) and
executing it. We would then approximate the virtual time corresponding
to the number of virtual instructions executed. The difference between
these 2 values would allow us to know if the guest is in advance or delayed.
However, the function used for measuring the real time
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive.
We had an added overhead of 13% of the total run time.
Therefore, we modified the algorithm and only take into account the
difference between the 2 clocks at the begining of the cpu_exec function.
During the for loop we try to reduce the advance of the guest only by
computing the virtual time elapsed and sleeping if necessary. The overhead
is thus reduced to 3%. Even though this method still has a noticeable
overhead, it no longer is a bottleneck in trying to achieve a better
guest frequency for which the guest clock is faster than the host one.
As for the the alignement of the 2 clocks, with the first algorithm
the guest clock was oscillating between -1 and 1ms compared to the host clock.
Using the second algorithm we notice that the guest is 5ms behind the host, which
is still acceptable for our use case.
The tests where conducted using fio and stress. The host machine in an i5 CPU at
3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb
built with buildroot.
Currently, on our test machine, the lowest icount we can achieve that is suitable for
aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are
slower than the cpu tests (using stress).
Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The align option is used for activating the align algorithm
in order to synchronise the host clock and the guest clock.
Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make icount parameter use QemuOpts style options in order
to easily add other suboptions.
Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Tested-by: Camille Bégué <camille.begue@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When using the icount option on ARM, the virtual
clock starts counting at realtime clock but it
should start at 0.
The reason why the virtual clock starts at realtime clock
is because the first time we call qemu_clock_warp (which
calls icount_warp_rt) in tcg_exec_all, qemu_icount_bias
(which is part of the virtual time computation mechanism)
will increment by realtime - vm_clock_warp_start, with
vm_clock_warp_start being 0 (see icount_warp_rt in cpus.c).
By changing the value of vm_clock_warp_start from 0 to -1,
the first time we call qemu_clock_warp which calls
icount_warp_rt, we will return immediatly because
icount_warp_rt first checks if vm_clock_warp_start is -1
and if it's the case it returns. Therefore, qemu_icount_bias
will first be incremented by the value of a virtual timer
deadline when the virtual cpu goes from active to inactive.
The virtual time will start at 0 and increment based
on the instruction counter when the vcpu is active or
the qemu_icount_bias value when inactive.
Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds cpu_icount_to_ns function which is needed for reverse execution.
It returns the time for a specific instruction.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes a bug where qemu_icount and qemu_icount_bias are not migrated.
It adds a subsection "timer/icount" to vmstate_timers so icount is migrated only
when needed.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This puts qemu_icount and qemu_icount_bias into TimerState structure to allow
them to be migrated.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There patch protects vmstop_requested with a lock and introduces
qemu_system_vmstop_request_prepare.
Together with the new call to qemu_vmstop_requested in vm_start,
qemu_system_vmstop_request_prepare avoids a race where the VM could remain
stopped even though the iostatus of a block device has already been set
(for example).
qemu_system_vmstop_request_prepare however also lets the caller thread
delay observation of the state change until it has itself communicated
that change to the user. This delay avoids any possibility of a wrong
reordering of the BLOCK_IO_ERROR event and the subsequent STOP event.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
MST: comment tweaks
Use dedicated qemu_soonest_timeout() instead of MIN().
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After previous Peter patch, they are redundant. This way we don't
assign them except when needed. Once there, there were lots of case
where the ".fields" indentation was wrong:
.fields = (VMStateField []) {
and
.fields = (VMStateField []) {
Change all the combinations to:
.fields = (VMStateField[]){
The biggest problem (appart from aesthetics) was that checkpatch complained
when we copy&pasted the code from one place to another.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
These functions don't need type casts (as does cpu_physical_memory_rw)
and also make the code better readable.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Default to false.
Tidy variable naming and inline cast uses while at it.
Tested-by: Jia Liu <proljc@gmail.com> (or32)
Signed-off-by: Andreas Färber <afaerber@suse.de>
If enabled, set the thread name at creation (on GNU systems with
pthread_set_np)
Fix up all the callers with a thread name
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
This motion is preparing for refactoring vCPU APIC subsequently.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Stop/cont commands are broken with -icount due to a deadlock. The
real problem is that the computation of timers_state.cpu_ticks_offset
makes no sense with -icount enabled: we set it to an icount clock value
in cpu_disable_ticks, and subtract a TSC (or similar, whatever
cpu_get_real_ticks happens to return) value in cpu_enable_ticks.
The fix is simple. timers_state.cpu_ticks_offset is only used
together with cpu_get_real_ticks, so we can use cpu_get_real_ticks
in cpu_disable_ticks. There is no need to update cpu_ticks_prev
at the time cpu_disable_ticks is called; instead, we can do it
the next time cpu_get_ticks is called.
The change to cpu_disable_ticks is the important part of the patch.
The rest modifies the code to always check timers_state.cpu_ticks_prev,
even when the ticks are not advancing (i.e. the VM is stopped). It also
makes a similar change to cpu_get_clock_locked, so that the code remains
similar for cpu_get_ticks and cpu_get_clock_locked.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1382977938-13844-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
When we translate the virtual address to physical check for error.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Computing the deadline of all vm_clocks is somewhat expensive and calls
out to qemu-timer.c; two reasons not to do it in the seqlock's write-side
critical section. This however opens the door for races in setting and
reading vm_clock_warp_start.
To plug them, we need to cover the case where a new deadline slips in
between the call to qemu_clock_deadline_ns_all and the actual modification
of the icount_warp_timer. Restrict changes to vm_clock_warp_start and
the icount_warp_timer's expiration time, to only move them back (which
would simply cause an early wakeup).
If a vm_clock timer is cancelled while CPUs are idle, this might cause the
icount_warp_timer to fire unnecessarily. This is not a problem, after it
fires the timer becomes inactive and the next call to timer_mod_anticipate
will be precise.
In addition to this, we must deactivate the icount_warp_timer _before_
checking whether CPUs are idle. This way, if the "last" CPU becomes idle
during the call to timer_del we will still set up the icount_warp_timer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To prepare for future code changes, move the increment of qemu_icount_bias
outside the "if" statement.
Also, hoist outside the if the check for timers that expired due to the
"warping". The check is redundant when !runstate_is_running(), but
doing it this way helps because the code that increments qemu_icount_bias
will be a critical section.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will help later when we will have to place these calls in
a critical section, and thus call a version of cpu_get_icount()
that does not take the lock.
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU_CLOCK_VIRTUAL may be read outside BQL. This will make its
foundation, i.e. cpu_clock_offset exposed to race condition.
Using private lock to protect it.
After this patch, reading QEMU_CLOCK_VIRTUAL is thread safe
unless use_icount is true, in which case the existing callers
still rely on the BQL.
Lock rule: private lock innermost, ie BQL->"this lock"
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was introduced to loop over CPUs from target-independent code, but
since commit 182735efaf target-independent
CPUState is used.
A loop can be considered more efficient than function calls in a loop,
and CPU_FOREACH() hides implementation details just as well, so use that
instead.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
There is the 'nmi' command that is used to trigger a guest dump via kdump feature on x86.
s390 uses RESTART interrupt to trigger kdump.
So, this patch provides a mean to use 'nmi' command on s390 to raise RESTART interrupt.
The CPU to receive the RESTART interrupt is the "default" one.
There is an infrastructure to select the "default" CPU using 'cpu' command.
The 'info cpus' command can be used to see which one is the "default".
In order to wire up the RESTART to 'nmi' command we had to:
1. implement the kvm_s390_cpu_restart function by exporting the existing code
2. implement s390_cpu_restart function as kvm-aware wrapper
3. modify the qmp_inject_nmi function to enable (for s390) the scan for
"default" CPU and call s390_cpu_restart for it;
3. fix some messages.
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Rearrange timer.h so it is in order by function type.
Make legacy functions call non-legacy functions rather than vice-versa.
Convert cpus.c to use new API.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Notify all timerlists derived from vm_clock in icount warp
calculations.
When calculating timer delay based on vm_clock deadline, use
all timerlists.
For compatibility, maintain an apparent bug where when using
icount, if no vm_clock timer was set, qemu_clock_deadline
would return INT32_MAX and always set an icount clock expiry
about 2 seconds ahead.
NB: thread safety - when different timerlists sit on different
threads, this will need some locking.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It makes more sense and will make things simpler later.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Prepares for changing cpu_single_step() argument to CPUState.
Acked-by: Michael Walle <michael@walle.cc> (for lm32)
Signed-off-by: Andreas Färber <afaerber@suse.de>
Even if the VM is already stopped, we cannot assume that all data has
already been successfully flushed to disk. The flush during the previous
vm_stop() could have failed.
Run bdrv_flush_all() unconditionally so that we get an error each time
if the block device isn't really flushed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# By Chegu Vinod
# Via Juan Quintela
* quintela/migration.next:
Force auto-convegence of live migration
Add 'auto-converge' migration capability
Introduce async_run_on_cpu()
Message-id: 1373664508-5404-1-git-send-email-quintela@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If flushing the block devices fails, return an error. The VM is stopped
anyway.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduce an asynchronous version of run_on_cpu() i.e. the caller
doesn't have to block till the call back routine finishes execution
on the target vcpu.
Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Move next_cpu from CPU_COMMON to CPUState.
Move first_cpu variable to qom/cpu.h.
gdbstub needs to use CPUState::env_ptr for now.
cpu_copy() no longer needs to save and restore cpu_next.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[AF: Rebased, simplified cpu_copy()]
Signed-off-by: Andreas Färber <afaerber@suse.de>
On PPC, we don't support MP state. So far it's not necessary and I'm
not convinced yet that we really need to support it ever.
However, the current idle logic in QEMU assumes that an in-kernel PIC
also means we support MP state. This assumption is not true anymore.
Let's split up the two cases into two different variables. That way
PPC can expose an in-kernel PIC, while not implementing MP state.
Signed-off-by: Alexander Graf <agraf@suse.de>
CC: Jan Kiszka <jan.kiszka@siemens.com>
This allows to move the call into CPUState's realizefn.
Therefore move the stub into libqemustub.a.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Pass it to qemu_dummy_cpu_thread_fn().
Use CPUState::env_ptr for cpu_single_env.
Prepares for changing qemu_init_vcpu() argument to CPUState.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Pass it on to qemu_kvm_cpu_thread_fn().
Prepares for changing qemu_init_vcpu() argument to CPUState.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
CPUArchState is no longer needed.
Prepares for changing qemu_kvm_cpu_thread_fn() opaque to CPUState.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Use CPUState::env_ptr for now.
Prepares for changing cpu_handle_guest_debug() argument to CPUState.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
It no longer uses CPUArchState.
Prepares for changing qemu_kvm_cpu_thread_fn() opaque to CPUState.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>