As a program status word is also invalid (and thus generates an
specification exception) if the instruction address is not even,
we should test this in is_valid_psw(), too. This patch also exports
the function so that it becomes available for other parts of the
S390 KVM code as well.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
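The even-address rule described above can be sketched as follows. This is an illustration only: struct sketch_psw and sketch_psw_addr_is_valid() are hypothetical names, and the real is_valid_psw() also validates the PSW mask bits, which are left out here.

```c
#include <stdint.h>

/* Hypothetical, simplified PSW holding only what this sketch needs. */
struct sketch_psw {
	uint64_t mask;
	uint64_t addr;
};

/*
 * Return 1 if the instruction address is even, 0 otherwise.
 * An odd address makes the PSW invalid (specification exception).
 */
static int sketch_psw_addr_is_valid(const struct sketch_psw *psw)
{
	return (psw->addr & 1) == 0;
}
```

A caller would run this test next to the existing mask checks before accepting a new PSW.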
Use the mm semaphore to serialize multiple invocations of s390_enable_skey.
The second CPU faulting on a storage key operation needs to wait for the
completion of the page table update. Taking the mm semaphore for writing
has the positive side effect of preventing any host faults, which would
otherwise have implications for keys vs PGSTE.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Regression of 346874c9: PAE is set in long mode, but that does not mean
we have valid PDPTRs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.
Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.
Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It seems that it's easy to implement the EOI assist
on top of the PV EOI feature: simply convert the
page address to the format expected by PV EOI.
Notes:
- "No EOI required" is set only if the injected interrupt
  is edge triggered; this is true because level interrupts go
  through the IOAPIC, which disables PV EOI.
  In any case, if the guest triggers EOI the bit will get cleared on exit.
- For migration, setting HV_X64_MSR_APIC_ASSIST_PAGE sets
  KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE
  seems sufficient.
  In any case, the bit is cleared on exit, so in the worst case it's
  never re-enabled.
- No handling of PV EOI data is performed at HV_X64_MSR_EOI write;
  HV_X64_MSR_EOI is a separate optimization: it's an X2APIC
  replacement that lets you do EOI with an MSR and not IO.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are
supported by the CPU. Currently the bit is considered by KVM as always
reserved.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
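A minimal sketch of the rule above, assuming a simplified reserved-bit mask that only covers bit 7 and the bits between the physical address width and bit 51. The helper names are hypothetical and are not KVM's own; the real mask contains further bits that are not modeled here.

```c
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; empty when s > e. */
static uint64_t sketch_rsvd_bits(int s, int e)
{
	if (s > e)
		return 0;
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Simplified reserved-bit mask for a long-mode PDPTE.
 * maxphyaddr:  guest physical address width in bits
 * has_gbpages: nonzero if the guest CPU supports 1GB pages
 */
static uint64_t sketch_pdpte_rsvd_mask(int maxphyaddr, int has_gbpages)
{
	uint64_t mask = sketch_rsvd_bits(maxphyaddr, 51);

	if (!has_gbpages)
		mask |= 1ULL << 7;	/* bit 7 is reserved without 1GB pages */
	return mask;
}
```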
The RSP register is not automatically cached, causing a mov DR instruction
with RSP to fail. Instead, the regular register access interface should be used.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some checks are common to all, and moreover,
according to the spec, the check for whether any bits
beyond the physical address width are set is also
applicable to all of them.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
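The shared pointer check could look roughly like the sketch below. The function name is hypothetical, and treating 4KB alignment as one of the "common" checks is an assumption made for illustration; only the address-width test is stated in the message above.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Return 1 if a VMX region pointer passes the checks shared by the
 * VMX pointer instructions in this sketch: 4KB alignment (assumed)
 * and no bits set at or above the physical address width.
 */
static int sketch_vmptr_is_valid(uint64_t vmptr, int maxphyaddr)
{
	if (vmptr & (SKETCH_PAGE_SIZE - 1))
		return 0;
	if (maxphyaddr < 64 && (vmptr >> maxphyaddr))
		return 0;
	return 1;
}
```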
The spec mandates that if the vmptrld or vmclear
address is equal to the vmxon region pointer, the
instruction should fail with error "VMPTRLD with
VMXON pointer" or "VMCLEAR with VMXON pointer"
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
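A sketch of the comparison described above. All names are hypothetical. The numeric error values are believed to match the VM-instruction error table in the Intel SDM, but they should be verified against the SDM before being relied upon.

```c
#include <stdint.h>

/* VM-instruction error numbers (assumed to follow the Intel SDM table). */
enum sketch_vmx_error {
	SKETCH_VMX_OK = 0,
	SKETCH_VMXERR_VMCLEAR_VMXON_POINTER = 3,
	SKETCH_VMXERR_VMPTRLD_VMXON_POINTER = 10,
};

enum sketch_vmx_insn { SKETCH_VMCLEAR, SKETCH_VMPTRLD };

/* Fail the instruction if its operand aliases the vmxon region pointer. */
static enum sketch_vmx_error
sketch_check_vmxon_alias(enum sketch_vmx_insn insn, uint64_t vmptr,
			 uint64_t vmxon_ptr)
{
	if (vmptr != vmxon_ptr)
		return SKETCH_VMX_OK;
	return insn == SKETCH_VMCLEAR ? SKETCH_VMXERR_VMCLEAR_VMXON_POINTER
				      : SKETCH_VMXERR_VMPTRLD_VMXON_POINTER;
}
```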
Currently, the vmxon region isn't used in the nested case.
However, according to the spec, the vmxon instruction performs
additional sanity checks on this region and the associated
pointer. Modify emulated vmxon to better adhere to the spec
requirements
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Our common function for vmptr checks (in 2/4) needs to fetch
the memory address
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'kvm-s390-20140506' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
1. Fixes an error return code for the breakpoint setup
2. External interrupt fixes
2.1. Some interrupt conditions like cpu timer or clock comparator
stay pending even after the interrupt is injected. If the external
new PSW is enabled for interrupts this will result in an endless
loop. Usually this indicates a programming error in the guest OS.
Let's detect such situations and go to userspace. We will provide
a QEMU patch that sets the guest in panicked/crashed state to avoid
wasting CPU cycles.
2.2 Resend external interrupts back to the guest if the HW could
not do it.
-
The external interrupt interception can only occur in rare cases, e.g.
when the PSW of the interrupt handler has a bad value. The old handler
for this interception simply ignored these events (except for increasing
the exit_external_interrupt counter), but for proper operation we either
have to inject the interrupts manually or we should drop to userspace in
case of errors.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add an interface to inject clock comparator and CPU timer interrupts
into the guest. This is needed for handling the external interrupt
interception.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When copy_from_user() fails, this code returns the number of bytes
remaining instead of a negative error code. The positive number is
returned to the user but otherwise it is harmless.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
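The difference between the two return conventions can be shown with a userspace stand-in. sketch_copy_from_user() and its fail_after parameter are hypothetical and only mimic the "bytes not copied" return value of copy_from_user(); they are not kernel APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-in for copy_from_user(): returns the number of bytes
 * that could NOT be copied (0 on success). 'fail_after' is a
 * hypothetical knob that simulates a fault after that many bytes.
 */
static size_t sketch_copy_from_user(void *to, const void *from, size_t n,
				    size_t fail_after)
{
	size_t ok = n < fail_after ? n : fail_after;

	memcpy(to, from, ok);
	return n - ok;
}

/* Buggy pattern: hands the positive "bytes remaining" count to the caller. */
static long sketch_setup_buggy(void *dst, const void *src, size_t n,
			       size_t fail_after)
{
	return (long)sketch_copy_from_user(dst, src, n, fail_after);
}

/* Fixed pattern: any short copy is reported as -EFAULT. */
static long sketch_setup_fixed(void *dst, const void *src, size_t n,
			       size_t fail_after)
{
	if (sketch_copy_from_user(dst, src, n, fail_after))
		return -EFAULT;
	return 0;
}
```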
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated()
and emulator_pio_out_emulated(), and it adds an argument (a pointer to
the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched
from 'pio_data' (depending on 'size'), and the value is included in the
trace record ('val'). If 'count' is greater than one, this is indicated
by the string "(...)" in the trace output.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
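The value selection described above can be sketched in plain C as below. The function names are hypothetical, and memcpy() is used here only to keep the sketch free of alignment assumptions; the tracepoint itself may read the data differently.

```c
#include <stdint.h>
#include <string.h>

/* First data item of a PIO buffer, widened to 32 bits according to 'size'. */
static uint32_t sketch_pio_first_val(const void *pio_data, int size)
{
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;

	if (size == 1) {
		memcpy(&v8, pio_data, sizeof(v8));
		return v8;
	}
	if (size == 2) {
		memcpy(&v16, pio_data, sizeof(v16));
		return v16;
	}
	memcpy(&v32, pio_data, sizeof(v32));
	return v32;
}

/* Marker appended to the trace line when more than one item is transferred. */
static const char *sketch_pio_suffix(unsigned int count)
{
	return count > 1 ? "(...)" : "";
}
```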
When starting lots of dataplane devices the bootup takes very long on
Christian's s390 with irqfd patches. With larger setups he is even
able to trigger some timeouts in some components. Turns out that the
KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
when having multiple CPUs. This is caused by the synchronize_rcu and
the HZ=100 of s390. By changing the code to use a private srcu we can
speed things up. This patch reduces the boot time till mounting root
from 8 to 2 seconds on my s390 guest with 100 disks.
Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
uses rcu_dereference_raw rather than rcu_dereference, and write-sides
do not do rcu lockdep at all).
Note that we're hardly relying on the "sleepable" part of srcu. We just
want SRCU's faster detection of grace periods.
Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
and RR. The difference between results "before" and "after" the patch
has mean -0.2% and standard deviation 0.6%. Using a paired t-test on the
data points says that there is a 2.5% probability that the patch is the
cause of the performance difference (rather than a random fluctuation).
(Restricting the t-test to RR, which is the most likely to be affected,
changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
probability that the numbers actually say something about the patch.
The probability increases mostly because there are fewer data points).
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'kvm-s390-20140429' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
1. Guest handling fixes
The handling of MVPG, PFMF and Test Block is fixed to better follow
the architecture. None of these fixes is critical for any current
Linux guests, but let's play safe.
2. Optimization for single CPU guests
We can enable the IBS facility if only one VCPU is running (!STOPPED
state). We also enable this optimization for guests with more than one
VCPU as soon as all but one VCPU are in the stopped state. This will help
guests that have tools like cpuplugd (from s390-utils) that do dynamic
offline/online of CPUs.
3. NOTES
There is one non-s390 change in include/linux/kvm_host.h that
introduces 2 defines for VCPU requests:
define KVM_REQ_ENABLE_IBS 23
define KVM_REQ_DISABLE_IBS 24
Invariant TSC is a property of TSC, no additional
support code necessary.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch enables the IBS facility when a single VCPU is running.
The facility is dynamically turned on/off as soon as other VCPUs
enter/leave the stopped state.
When this facility is operating, some instructions can be executed
faster for single-cpu guests.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
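The "exactly one running VCPU" condition can be sketched as below. The array-of-flags representation and the function name are assumptions made for illustration; the patch itself tracks the state through VCPU requests.

```c
#include <stddef.h>

/*
 * Return the index of the only VCPU that is not stopped, or -1 if
 * zero or more than one VCPU is running. In this sketch, IBS would be
 * enabled exactly when the result is >= 0.
 */
static int sketch_ibs_candidate(const int *stopped, size_t nr_vcpus)
{
	int candidate = -1;
	size_t i;

	for (i = 0; i < nr_vcpus; i++) {
		if (stopped[i])
			continue;
		if (candidate >= 0)
			return -1;	/* second running VCPU: no IBS */
		candidate = (int)i;
	}
	return candidate;
}
```

Re-evaluating this condition whenever a VCPU enters or leaves the stopped state gives the dynamic on/off behavior described above.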
This patch introduces two new functions to set/clear the CPUSTAT_STOPPED bit and
makes use of them at all applicable places. These functions prepare for executing
additional code when starting/stopping a vcpu.
The CPUSTAT_STOPPED bit should not be touched outside of these functions.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
TEST BLOCK is also subject to the low-address protection, so we need
to check the destination address in our handler.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add a check for low-address protection to the PFMF handler and
convert real-addresses to absolute if necessary, as it is defined
in the Principles of Operations specification.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The s390 architecture has a special protection mechanism that can
be used to prevent write access to the vital data in the low-core
memory area. This patch adds a new helper function that can be used
to check for such write accesses and in case of protection, it also
sets up the exception data accordingly.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
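A minimal sketch of such a check, assuming the protected ranges are 0..511 and 4096..4607 and that the control is CR0 bit 35 in IBM bit numbering. The names are hypothetical, the exception setup is omitted, and the interaction with DAT address spaces is not modeled.

```c
#include <stdint.h>

/*
 * Low-address protection control: CR0 bit 35 in IBM numbering, where
 * bit 0 is the most significant bit of the 64-bit register.
 */
#define SKETCH_CR0_LAP (1ULL << 28)

/* True for guest addresses 0..511 and 4096..4607. */
static int sketch_is_low_address(uint64_t ga)
{
	return (ga & ~0x11ffULL) == 0;
}

/* True if a write to 'ga' would hit low-address protection. */
static int sketch_lap_violation(uint64_t cr0, uint64_t ga)
{
	return (cr0 & SKETCH_CR0_LAP) && sketch_is_low_address(ga);
}
```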
When the guest executes the MVPG instruction with DAT disabled,
and the source or destination page is not mapped in the host,
the so-called partial execution interception occurs. We need to
handle this event by setting up a mapping for the corresponding
user pages.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
async_pf_execute() passes tsk == current to gup(). This doesn't
hurt, but it is unnecessary and misleading: "tsk" is only used to account
the number of faults, and current is a random workqueue thread.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
async_pf_execute() has no reason to adopt apf->mm; gup(current, mm)
should work just fine even if current has another or NULL ->mm.
Recently kvm_async_page_present_sync() was added inside the "use_mm"
section, but it seems that it doesn't need current->mm either.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we can flush all the TLBs outside of the mmu lock without TLB corruption
when write-protecting the sptes, because:
- we mark large sptes readonly instead of dropping them, which means we
  only change the spte from writable to readonly. So we only need to care about
  the case of changing an spte from present to present (changing the spte from
  present to nonpresent will flush all the TLBs immediately); in other words,
  the only case we need to care about is mmu_spte_update()
- in mmu_spte_update(), we check
  SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, which
  means it does not depend on PT_WRITABLE_MASK anymore
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Relax the tlb flush condition since we will write-protect the spte outside of
the mmu lock. Note that lockless write-protection only changes a writable spte
to readonly, and the spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (this is tested by spte_is_locklessly_modifiable()).
This patch is used to avoid this kind of race:
      VCPU 0                         VCPU 1
lockless write protection:
      set spte.w = 0
                                 lock mmu-lock
                                 write-protect the spte to sync shadow page,
                                 see spte.w = 0, then skip the tlb flush
                                 unlock mmu-lock
                                 !!! At this point, the shadow page can still be
                                     writable due to the corrupt tlb entry
      Flush all TLB
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
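The relaxed flush condition can be sketched as follows. The bit positions are illustrative assumptions rather than the authoritative values from the KVM x86 MMU code, and sketch_need_tlb_flush() is a hypothetical name for the decision made on an spte update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; the real definitions live in the KVM MMU. */
#define SKETCH_PT_WRITABLE_MASK    (1ULL << 1)
#define SKETCH_SPTE_HOST_WRITEABLE (1ULL << 10)
#define SKETCH_SPTE_MMU_WRITEABLE  (1ULL << 11)

/* An spte may be made writable locklessly only if both bits are set. */
static bool sketch_spte_is_locklessly_modifiable(uint64_t spte)
{
	const uint64_t both = SKETCH_SPTE_HOST_WRITEABLE |
			      SKETCH_SPTE_MMU_WRITEABLE;

	return (spte & both) == both;
}

/*
 * Relaxed condition: flush when the old spte could have been writable
 * through the lockless path and the new spte is read-only, even if the
 * W bit of the old spte had already been cleared by another VCPU.
 */
static bool sketch_need_tlb_flush(uint64_t old_spte, uint64_t new_spte)
{
	return sketch_spte_is_locklessly_modifiable(old_spte) &&
	       !(new_spte & SKETCH_PT_WRITABLE_MASK);
}
```

In the race above, VCPU 1 sees spte.w == 0 but both writeable flags still set, so this condition still requests the flush.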
Currently, kvm zaps the large spte if write protection is needed, so a later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making it non-present, so that the page fault caused by a read
access is avoided.
The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter. This removes the need for the return value.
This version fixes the issue reported in 6b73a9606. The cause of that
issue is that fast_page_fault() directly sets the readonly large spte to
writable but only marks the first page dirty in the dirty-bitmap, which means
the other pages are missed. Fix it by allowing only the normal sptes (on the
PT_PAGE_TABLE_LEVEL level) to be fast fixed.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Use sp->role.level instead of @level since @level is not obtained from the
page table hierarchy.
There is no issue in the current code since the fast page fault currently only
fixes faults caused by dirty-log, which are always on the last level
(level = 1).
This patch makes the code more readable and avoids potential issues in
future development.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This reverts commit 5befdc385d, since a later patch will allow
flushing the tlb outside of the mmu-lock.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If EFER.LMA is off, cs.l does not determine execution mode.
Currently, the emulation engine assumes differently.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
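The corrected mode decision can be sketched as below. The function name is hypothetical; EFER.LMA is bit 10 of the EFER MSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EFER_LMA (1ULL << 10)	/* Long Mode Active */

/* cs.l selects 64-bit code only while long mode is active. */
static bool sketch_is_64bit_code(uint64_t efer, bool cs_l)
{
	return (efer & SKETCH_EFER_LMA) && cs_l;
}
```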
The IN instruction is not affected by the REP prefix as INS is. Therefore, the
emulation should ignore the REP prefix as well. The current emulator
implementation tries to perform writeback when an IN instruction with a REP
prefix is emulated. This causes it to perform a wrong memory write or to
inject a spurious #GP exception into the guest.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
According to Intel specifications, PAE and non-PAE modes do not have any
reserved bits. In long mode, regardless of PCIDE, only the high bits (above the
physical address width) are reserved.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If a guest enables a performance counter but does not enable PMI, the
hypervisor currently does not reprogram the performance counter once it
overflows. As a result the host performance counter is kept with the original
sampling period which was configured according to the value of the guest's
counter when the counter was enabled.
Such behaviour can have very bad consequences. The most disturbing one can
cause the guest not to make any progress at all, and to keep exiting due to a
host PMI before any guest instruction is executed. This situation occurs when
the performance counter holds a very high value when the guest enables the
performance counter. As a result the host's sampling period is configured to be
very short. The host then never reconfigures the sampling period and gets stuck
in an entry->PMI->exit loop. We encountered such a scenario in our experiments.
The solution is to reprogram the counter even if the guest does not use PMI.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
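To see why a high counter value yields a very short sampling period, consider the sketch below. It assumes the period is the distance from the current counter value to overflow, modulo the counter width; the function name and the 48-bit width used in the examples are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Events left until a 'width'-bit counter overflows, used here as the
 * host sampling period (assumed model, see the note above).
 */
static uint64_t sketch_sample_period(uint64_t counter, unsigned int width)
{
	uint64_t mask = width >= 64 ? ~0ULL : (1ULL << width) - 1;

	return (0 - counter) & mask;
}
```

With a 48-bit counter at 0xFFFFFFFFFFF0 the period is only 16 events. If the period is never recomputed after the overflow, every following interval stays that short, which is the entry->PMI->exit loop described above.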
Some Type 1 hypervisors such as XEN won't enable VMX without it present
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This feature emulates the "Acknowledge interrupt on exit" behavior.
We can safely emulate it for L1 to run L2 even if L0 itself has it
disabled (to run L1).
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For single context invalidation, we fall through to global
invalidation in handle_invept() except for one case - when
the operand supplied by L1 is different from what we have in
vmcs12. However, typically hypervisors will only call invept
for the currently loaded eptp, so the condition will
never be true.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When entering an exception after an ICEBP, the saved instruction
pointer should point to after the instruction.
This fixes the bug here: https://bugs.launchpad.net/qemu/+bug/1119686
Signed-off-by: Huw Davies <huw@codeweavers.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Merge tag 'kvm-s390-20140422' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into queue
Lazy storage key handling
-------------------------
Linux does not use the ACC and F bits of the storage key. Newer Linux
versions also do not use the storage keys for dirty and reference
tracking. We can optimize the guest handling for those guests for faults
as well as page-in and page-out by simply not caring about the guest
visible storage key. We trap guest storage key instruction to enable
those keys only on demand.
Migration bitmap
Until now s390 never provided a proper dirty bitmap. Let's provide a
proper migration bitmap for s390. We also change the user dirty tracking
to a fault based mechanism. This makes the host completely independent
from the storage keys. Long term this will allow us to back guest memory
with large pages.
per-VM device attributes
------------------------
To avoid the introduction of new ioctls, let's provide the
attribute semantics also on the VM-"device".
Userspace controlled CMMA
-------------------------
The CMMA assist is changed from "always on" to "on if requested" via
per-VM device attributes. In addition a callback to reset all usage
states is provided.
Proper guest DAT handling for intercepts
----------------------------------------
While instructions handled by SIE take care of all addressing aspects,
KVM/s390 currently does not care about guest address translation of
intercepts. This worked out fine, because
- the s390 Linux kernel has a 1:1 mapping between kernel virtual<->real
for all pages up to memory size
- intercepts happen only for a small amount of cases
- all of these intercepts happen to be in the kernel text for current
distros
Of course we need to be better for other intercepts, kernel modules etc.
We provide the infrastructure and rework all in-kernel intercepts to work
on logical addresses (paging etc) instead of real ones. The code has
been running internally for several months now, so it is time for going
public.
GDB support
-----------
We provide breakpoints, single stepping and watchpoints.
Fixes/Cleanups
--------------
- Improve program check delivery
- Factor out the handling of transactional memory on program checks
- Use the existing define __LC_PGM_TDB
- Several cleanups in the lowcore structure
- Documentation
NOTES
-----
- All patches touching base s390 are either ACKed or written by the s390
maintainers
- One base KVM patch "KVM: add kvm_is_error_gpa() helper"
- One patch introduces the notion of VM device attributes
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Conflicts:
include/uapi/linux/kvm.h
Factor out the new function handle_itdb(), which copies the ITDB into
guest lowcore to fully handle a TX abort.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The generically assembled low core labels already contain the
address for the TDB.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
On hard exits (abort, sigkill) we have some kvm_s390_interrupt_info
structures hanging around. Delete those on exit to avoid memory leaks.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
When a guest is single-stepped, we want to disable timer interrupts. Otherwise,
the guest will continuously execute the external interrupt handler and make
debugging of code where timer interrupts are enabled almost impossible.
The delivery of timer interrupts can be enforced in such sections by setting a
breakpoint and continuing execution.
In order to disable timer interrupts, they are disabled in the control register
of the guest just before SIE entry and are suppressed in the interrupt
check/delivery methods.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch moves the checks for enabled timer (clock-comparator) interrupts and
pending timer interrupts into their own functions, making the code more readable
and easier to maintain.
The method kvm_cpu_has_pending_timer is now actually implemented.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Added documentation for diag 501, stating that no subfunctions are provided and
no parameters are used.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch adds support to debug the guest using the PER facility on s390.
Single-stepping, hardware breakpoints and hardware watchpoints are supported. In
order to use the PER facility of the guest without the guest noticing, the
control registers of the guest have to be patched and access to them has to be
intercepted (stctl, stctg, lctl, lctlg).
All PER program interrupts have to be intercepted and only the relevant PER
interrupts for the guest have to be given back. Special care has to be taken
about repeated exits on the same hardware breakpoint. The intervention of the
host in the guest's PER configuration is not fully transparent. PER instruction
nullification cannot be used by the guest, and too many storage alteration
events may be reported to the guest (if it is activated for special address
ranges only) when the host is concurrently debugging it.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch adds the structs to the kernel headers needed to pass information
from/to userspace in order to debug a guest on s390 with hardware support.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduce the methods to emulate the stctl and stctg instruction. Added tracing
code.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>