[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
/*
|
|
|
|
* Kernel-based Virtual Machine driver for Linux
|
|
|
|
*
|
|
|
|
* This module enables machines with Intel VT-x extensions to run virtual
|
|
|
|
* machines without emulation or binary translation.
|
|
|
|
*
|
|
|
|
* Copyright (C) 2006 Qumranet, Inc.
|
2010-10-06 20:23:22 +08:00
|
|
|
* Copyright 2010 Red Hat, Inc. and/or its affiliates.
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Avi Kivity <avi@qumranet.com>
|
|
|
|
* Yaniv Kamay <yaniv@qumranet.com>
|
|
|
|
*
|
|
|
|
* This work is licensed under the terms of the GNU GPL, version 2. See
|
|
|
|
* the COPYING file in the top-level directory.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2015-03-26 22:39:29 +08:00
|
|
|
#include <kvm/iodev.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-12-16 17:02:48 +08:00
|
|
|
#include <linux/kvm_host.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
#include <linux/kvm.h>
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/percpu.h>
|
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/miscdevice.h>
|
|
|
|
#include <linux/vmalloc.h>
|
|
|
|
#include <linux/reboot.h>
|
|
|
|
#include <linux/debugfs.h>
|
|
|
|
#include <linux/highmem.h>
|
|
|
|
#include <linux/file.h>
|
2011-03-24 05:16:23 +08:00
|
|
|
#include <linux/syscore_ops.h>
|
2007-02-12 16:54:47 +08:00
|
|
|
#include <linux/cpu.h>
|
2017-02-03 02:15:33 +08:00
|
|
|
#include <linux/sched/signal.h>
|
2017-02-09 01:51:29 +08:00
|
|
|
#include <linux/sched/mm.h>
|
2017-02-09 01:51:35 +08:00
|
|
|
#include <linux/sched/stat.h>
|
2007-06-08 00:18:30 +08:00
|
|
|
#include <linux/cpumask.h>
|
|
|
|
#include <linux/smp.h>
|
2007-06-28 20:38:16 +08:00
|
|
|
#include <linux/anon_inodes.h>
|
2007-09-10 23:10:54 +08:00
|
|
|
#include <linux/profile.h>
|
2007-09-18 03:57:50 +08:00
|
|
|
#include <linux/kvm_para.h>
|
2007-10-10 01:20:39 +08:00
|
|
|
#include <linux/pagemap.h>
|
2007-10-18 22:59:34 +08:00
|
|
|
#include <linux/mman.h>
|
2008-04-03 03:46:56 +08:00
|
|
|
#include <linux/swap.h>
|
2009-03-12 21:45:39 +08:00
|
|
|
#include <linux/bitops.h>
|
2009-05-08 04:55:13 +08:00
|
|
|
#include <linux/spinlock.h>
|
2009-10-22 20:19:27 +08:00
|
|
|
#include <linux/compat.h>
|
2009-12-24 00:35:21 +08:00
|
|
|
#include <linux/srcu.h>
|
2010-01-28 19:37:56 +08:00
|
|
|
#include <linux/hugetlb.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
|
|
|
#include <linux/slab.h>
|
2011-07-27 21:00:48 +08:00
|
|
|
#include <linux/sort.h>
|
|
|
|
#include <linux/bsearch.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-06-29 02:15:57 +08:00
|
|
|
#include <asm/processor.h>
|
|
|
|
#include <asm/io.h>
|
2014-09-20 07:03:25 +08:00
|
|
|
#include <asm/ioctl.h>
|
2016-12-25 03:46:01 +08:00
|
|
|
#include <linux/uaccess.h>
|
2007-11-19 17:16:57 +08:00
|
|
|
#include <asm/pgtable.h>
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2008-05-30 22:05:54 +08:00
|
|
|
#include "coalesced_mmio.h"
|
2010-10-14 17:22:46 +08:00
|
|
|
#include "async_pf.h"
|
2014-09-24 19:02:46 +08:00
|
|
|
#include "vfio.h"
|
2008-05-30 22:05:54 +08:00
|
|
|
|
2009-06-17 20:22:14 +08:00
|
|
|
#define CREATE_TRACE_POINTS
|
|
|
|
#include <trace/events/kvm.h>
|
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
/* Worst case buffer size needed for holding an integer. */
|
|
|
|
#define ITOA_MAX_LEN 12
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
MODULE_AUTHOR("Qumranet");
|
|
|
|
MODULE_LICENSE("GPL");
|
|
|
|
|
2015-09-18 18:34:53 +08:00
|
|
|
/* Architectures should define their poll value according to the halt latency */
|
2016-10-14 08:53:19 +08:00
|
|
|
unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
|
2017-06-27 17:51:18 +08:00
|
|
|
module_param(halt_poll_ns, uint, 0644);
|
2016-10-14 08:53:19 +08:00
|
|
|
EXPORT_SYMBOL_GPL(halt_poll_ns);
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
|
2015-09-03 22:07:38 +08:00
|
|
|
/* Default doubles per-vcpu halt_poll_ns. */
|
2016-10-14 08:53:19 +08:00
|
|
|
unsigned int halt_poll_ns_grow = 2;
|
2017-06-27 17:51:18 +08:00
|
|
|
module_param(halt_poll_ns_grow, uint, 0644);
|
2016-10-14 08:53:19 +08:00
|
|
|
EXPORT_SYMBOL_GPL(halt_poll_ns_grow);
|
2015-09-03 22:07:38 +08:00
|
|
|
|
2019-01-27 18:17:15 +08:00
|
|
|
/* The start value to grow halt_poll_ns from */
|
|
|
|
unsigned int halt_poll_ns_grow_start = 10000; /* 10us */
|
|
|
|
module_param(halt_poll_ns_grow_start, uint, 0644);
|
|
|
|
EXPORT_SYMBOL_GPL(halt_poll_ns_grow_start);
|
|
|
|
|
2015-09-03 22:07:38 +08:00
|
|
|
/* Default resets per-vcpu halt_poll_ns . */
|
2016-10-14 08:53:19 +08:00
|
|
|
unsigned int halt_poll_ns_shrink;
|
2017-06-27 17:51:18 +08:00
|
|
|
module_param(halt_poll_ns_shrink, uint, 0644);
|
2016-10-14 08:53:19 +08:00
|
|
|
EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
|
2015-09-03 22:07:38 +08:00
|
|
|
|
2009-06-05 02:08:24 +08:00
|
|
|
/*
|
|
|
|
* Ordering of locks:
|
|
|
|
*
|
2015-02-26 14:58:24 +08:00
|
|
|
* kvm->lock --> kvm->slots_lock --> kvm->irq_lock
|
2009-06-05 02:08:24 +08:00
|
|
|
*/
|
|
|
|
|
2013-09-25 19:53:07 +08:00
|
|
|
DEFINE_SPINLOCK(kvm_lock);
|
2013-09-10 18:58:35 +08:00
|
|
|
static DEFINE_RAW_SPINLOCK(kvm_count_lock);
|
2007-11-14 20:38:21 +08:00
|
|
|
LIST_HEAD(vm_list);
|
2007-02-12 16:54:44 +08:00
|
|
|
|
2008-12-07 18:55:45 +08:00
|
|
|
static cpumask_var_t cpus_hardware_enabled;
|
2015-02-26 14:58:21 +08:00
|
|
|
static int kvm_usage_count;
|
2009-09-15 17:37:46 +08:00
|
|
|
static atomic_t hardware_enable_failed;
|
2007-05-24 18:03:52 +08:00
|
|
|
|
2007-07-30 19:12:19 +08:00
|
|
|
struct kmem_cache *kvm_vcpu_cache;
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
|
2007-04-19 22:27:43 +08:00
|
|
|
|
2007-07-11 23:17:21 +08:00
|
|
|
static __read_mostly struct preempt_ops kvm_preempt_ops;
|
|
|
|
|
2008-04-16 05:05:42 +08:00
|
|
|
struct dentry *kvm_debugfs_dir;
|
2015-03-28 11:21:01 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
static int kvm_debugfs_num_entries;
|
|
|
|
static const struct file_operations *stat_fops_per_vm[];
|
|
|
|
|
2007-02-22 00:04:26 +08:00
|
|
|
static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
|
|
|
|
unsigned long arg);
|
2015-02-03 16:35:15 +08:00
|
|
|
#ifdef CONFIG_KVM_COMPAT
|
2011-06-08 08:45:37 +08:00
|
|
|
static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
|
|
|
|
unsigned long arg);
|
2018-06-17 17:16:21 +08:00
|
|
|
#define KVM_COMPAT(c) .compat_ioctl = (c)
|
|
|
|
#else
|
|
|
|
static long kvm_no_compat_ioctl(struct file *file, unsigned int ioctl,
|
|
|
|
unsigned long arg) { return -EINVAL; }
|
|
|
|
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl
|
2011-06-08 08:45:37 +08:00
|
|
|
#endif
|
2009-09-15 17:37:46 +08:00
|
|
|
static int hardware_enable_all(void);
|
|
|
|
static void hardware_disable_all(void);
|
2007-02-22 00:04:26 +08:00
|
|
|
|
2009-12-24 00:35:24 +08:00
|
|
|
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
|
2013-12-30 04:12:29 +08:00
|
|
|
|
2015-05-26 18:43:41 +08:00
|
|
|
static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
|
2009-12-24 00:35:24 +08:00
|
|
|
|
2014-02-08 15:51:57 +08:00
|
|
|
__visible bool kvm_rebooting;
|
2010-12-02 23:52:50 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_rebooting);
|
2008-05-13 18:23:38 +08:00
|
|
|
|
2009-06-11 23:07:44 +08:00
|
|
|
static bool largepages_enabled = true;
|
|
|
|
|
2017-07-12 23:56:44 +08:00
|
|
|
#define KVM_EVENT_CREATE_VM 0
|
|
|
|
#define KVM_EVENT_DESTROY_VM 1
|
|
|
|
static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
|
|
|
|
static unsigned long long kvm_createvm_count;
|
|
|
|
static unsigned long long kvm_active_vms;
|
|
|
|
|
2018-08-22 12:52:33 +08:00
|
|
|
__weak int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
|
|
|
|
unsigned long start, unsigned long end, bool blockable)
|
2017-12-01 02:05:45 +08:00
|
|
|
{
|
2018-08-22 12:52:33 +08:00
|
|
|
return 0;
|
2017-12-01 02:05:45 +08:00
|
|
|
}
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
|
2008-07-29 00:26:24 +08:00
|
|
|
{
|
2013-07-25 09:04:38 +08:00
|
|
|
if (pfn_valid(pfn))
|
2014-11-10 16:33:56 +08:00
|
|
|
return PageReserved(pfn_to_page(pfn));
|
2008-07-29 00:26:24 +08:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2007-02-22 00:04:26 +08:00
|
|
|
/*
|
|
|
|
* Switches to specified vcpu, until a matching vcpu_put()
|
|
|
|
*/
|
2017-12-05 04:35:23 +08:00
|
|
|
void vcpu_load(struct kvm_vcpu *vcpu)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2017-12-05 04:35:23 +08:00
|
|
|
int cpu = get_cpu();
|
2007-07-11 23:17:21 +08:00
|
|
|
preempt_notifier_register(&vcpu->preempt_notifier);
|
KVM: Portability: split kvm_vcpu_ioctl
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-12 01:16:52 +08:00
|
|
|
kvm_arch_vcpu_load(vcpu, cpu);
|
2007-07-11 23:17:21 +08:00
|
|
|
put_cpu();
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
2016-07-09 06:36:06 +08:00
|
|
|
EXPORT_SYMBOL_GPL(vcpu_load);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
KVM: Portability: split kvm_vcpu_ioctl
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-12 01:16:52 +08:00
|
|
|
void vcpu_put(struct kvm_vcpu *vcpu)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2007-07-11 23:17:21 +08:00
|
|
|
preempt_disable();
|
KVM: Portability: split kvm_vcpu_ioctl
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-12 01:16:52 +08:00
|
|
|
kvm_arch_vcpu_put(vcpu);
|
2007-07-11 23:17:21 +08:00
|
|
|
preempt_notifier_unregister(&vcpu->preempt_notifier);
|
|
|
|
preempt_enable();
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
2016-07-09 06:36:06 +08:00
|
|
|
EXPORT_SYMBOL_GPL(vcpu_put);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2017-04-27 20:33:43 +08:00
|
|
|
/* TODO: merge with kvm_arch_vcpu_should_kick */
|
|
|
|
static bool kvm_request_needs_ipi(struct kvm_vcpu *vcpu, unsigned req)
|
|
|
|
{
|
|
|
|
int mode = kvm_vcpu_exiting_guest_mode(vcpu);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We need to wait for the VCPU to reenable interrupts and get out of
|
|
|
|
* READING_SHADOW_PAGE_TABLES mode.
|
|
|
|
*/
|
|
|
|
if (req & KVM_REQUEST_WAIT)
|
|
|
|
return mode != OUTSIDE_GUEST_MODE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Need to kick a running VCPU, but otherwise there is nothing to do.
|
|
|
|
*/
|
|
|
|
return mode == IN_GUEST_MODE;
|
|
|
|
}
|
|
|
|
|
2007-06-08 00:18:30 +08:00
|
|
|
static void ack_flush(void *_completed)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2017-06-30 19:25:45 +08:00
|
|
|
static inline bool kvm_kick_many_cpus(const struct cpumask *cpus, bool wait)
|
|
|
|
{
|
|
|
|
if (unlikely(!cpus))
|
|
|
|
cpus = cpu_online_mask;
|
|
|
|
|
|
|
|
if (cpumask_empty(cpus))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
smp_call_function_many(cpus, ack_flush, NULL, wait);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2018-05-16 23:21:28 +08:00
|
|
|
bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
|
|
|
|
unsigned long *vcpu_bitmap, cpumask_var_t tmp)
|
2007-06-08 00:18:30 +08:00
|
|
|
{
|
2008-07-20 19:24:22 +08:00
|
|
|
int i, cpu, me;
|
2007-06-08 00:18:30 +08:00
|
|
|
struct kvm_vcpu *vcpu;
|
2018-05-16 23:21:28 +08:00
|
|
|
bool called;
|
2008-12-08 17:58:04 +08:00
|
|
|
|
2011-01-12 15:41:22 +08:00
|
|
|
me = get_cpu();
|
2018-05-16 23:21:28 +08:00
|
|
|
|
2009-06-09 20:56:29 +08:00
|
|
|
kvm_for_each_vcpu(i, vcpu, kvm) {
|
2018-08-22 18:18:29 +08:00
|
|
|
if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
|
2018-05-16 23:21:28 +08:00
|
|
|
continue;
|
|
|
|
|
2011-01-12 15:41:22 +08:00
|
|
|
kvm_make_request(req, vcpu);
|
2007-06-08 00:18:30 +08:00
|
|
|
cpu = vcpu->cpu;
|
2011-01-12 15:40:31 +08:00
|
|
|
|
2017-04-27 04:32:26 +08:00
|
|
|
if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
|
|
|
|
continue;
|
2017-04-27 04:32:23 +08:00
|
|
|
|
2018-05-16 23:21:28 +08:00
|
|
|
if (tmp != NULL && cpu != -1 && cpu != me &&
|
2017-04-27 20:33:43 +08:00
|
|
|
kvm_request_needs_ipi(vcpu, req))
|
2018-05-16 23:21:28 +08:00
|
|
|
__cpumask_set_cpu(cpu, tmp);
|
2008-12-08 17:56:24 +08:00
|
|
|
}
|
2018-05-16 23:21:28 +08:00
|
|
|
|
|
|
|
called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT));
|
2011-01-12 15:41:22 +08:00
|
|
|
put_cpu();
|
2018-05-16 23:21:28 +08:00
|
|
|
|
|
|
|
return called;
|
|
|
|
}
|
|
|
|
|
|
|
|
bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
|
|
|
|
{
|
|
|
|
cpumask_var_t cpus;
|
|
|
|
bool called;
|
|
|
|
|
|
|
|
zalloc_cpumask_var(&cpus, GFP_ATOMIC);
|
|
|
|
|
2018-08-22 18:18:29 +08:00
|
|
|
called = kvm_make_vcpus_request_mask(kvm, req, NULL, cpus);
|
2018-05-16 23:21:28 +08:00
|
|
|
|
2008-12-08 17:58:04 +08:00
|
|
|
free_cpumask_var(cpus);
|
2008-12-08 17:56:24 +08:00
|
|
|
return called;
|
2007-06-08 00:18:30 +08:00
|
|
|
}
|
|
|
|
|
2015-01-16 07:58:52 +08:00
|
|
|
#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
|
2008-12-08 17:56:24 +08:00
|
|
|
void kvm_flush_remote_tlbs(struct kvm *kvm)
|
2008-02-21 03:47:24 +08:00
|
|
|
{
|
2016-03-13 11:10:28 +08:00
|
|
|
/*
|
|
|
|
* Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
|
|
|
|
* kvm_make_all_cpus_request.
|
|
|
|
*/
|
|
|
|
long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We want to publish modifications to the page tables before reading
|
|
|
|
* mode. Pairs with a memory barrier in arch-specific code.
|
|
|
|
* - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest
|
|
|
|
* and smp_mb in walk_shadow_page_lockless_begin/end.
|
|
|
|
* - powerpc: smp_mb in kvmppc_prepare_to_enter.
|
|
|
|
*
|
|
|
|
* There is already an smp_mb__after_atomic() before
|
|
|
|
* kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
|
|
|
|
* barrier here.
|
|
|
|
*/
|
2018-07-19 16:40:17 +08:00
|
|
|
if (!kvm_arch_flush_remote_tlb(kvm)
|
|
|
|
|| kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
|
2008-12-08 17:56:24 +08:00
|
|
|
++kvm->stat.remote_tlb_flush;
|
2014-04-17 17:06:12 +08:00
|
|
|
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
|
2008-02-21 03:47:24 +08:00
|
|
|
}
|
2013-10-08 00:47:59 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
|
2015-01-16 07:58:52 +08:00
|
|
|
#endif
|
2008-02-21 03:47:24 +08:00
|
|
|
|
2008-12-08 17:56:24 +08:00
|
|
|
void kvm_reload_remote_mmus(struct kvm *kvm)
|
|
|
|
{
|
2014-09-24 15:57:55 +08:00
|
|
|
kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
|
2008-12-08 17:56:24 +08:00
|
|
|
}
|
2008-02-21 03:47:24 +08:00
|
|
|
|
2007-07-27 15:16:56 +08:00
|
|
|
int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
|
|
|
|
{
|
|
|
|
struct page *page;
|
|
|
|
int r;
|
|
|
|
|
|
|
|
mutex_init(&vcpu->mutex);
|
|
|
|
vcpu->cpu = -1;
|
|
|
|
vcpu->kvm = kvm;
|
|
|
|
vcpu->vcpu_id = id;
|
2011-02-01 22:52:41 +08:00
|
|
|
vcpu->pid = NULL;
|
2016-02-19 16:46:39 +08:00
|
|
|
init_swait_queue_head(&vcpu->wq);
|
2010-10-14 17:22:46 +08:00
|
|
|
kvm_async_pf_vcpu_init(vcpu);
|
2007-07-27 15:16:56 +08:00
|
|
|
|
2015-09-18 22:29:55 +08:00
|
|
|
vcpu->pre_pcpu = -1;
|
|
|
|
INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
|
|
|
|
|
2007-07-27 15:16:56 +08:00
|
|
|
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
|
|
|
|
if (!page) {
|
|
|
|
r = -ENOMEM;
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
vcpu->run = page_address(page);
|
|
|
|
|
2012-07-18 21:37:46 +08:00
|
|
|
kvm_vcpu_set_in_spin_loop(vcpu, false);
|
|
|
|
kvm_vcpu_set_dy_eligible(vcpu, false);
|
2013-03-05 02:02:07 +08:00
|
|
|
vcpu->preempted = false;
|
2012-07-18 21:37:46 +08:00
|
|
|
|
2007-11-14 20:38:21 +08:00
|
|
|
r = kvm_arch_vcpu_init(vcpu);
|
2007-07-27 15:16:56 +08:00
|
|
|
if (r < 0)
|
2007-11-14 20:38:21 +08:00
|
|
|
goto fail_free_run;
|
2007-07-27 15:16:56 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
fail_free_run:
|
|
|
|
free_page((unsigned long)vcpu->run);
|
|
|
|
fail:
|
2007-10-08 08:50:48 +08:00
|
|
|
return r;
|
2007-07-27 15:16:56 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_init);
|
|
|
|
|
|
|
|
void kvm_vcpu_uninit(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2017-07-06 20:44:28 +08:00
|
|
|
/*
|
|
|
|
* no need for rcu_read_lock as VCPU_RUN is the only place that
|
|
|
|
* will change the vcpu->pid pointer and on uninit all file
|
|
|
|
* descriptors are already gone.
|
|
|
|
*/
|
|
|
|
put_pid(rcu_dereference_protected(vcpu->pid, 1));
|
2007-11-14 20:38:21 +08:00
|
|
|
kvm_arch_vcpu_uninit(vcpu);
|
2007-07-27 15:16:56 +08:00
|
|
|
free_page((unsigned long)vcpu->run);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_uninit);
|
|
|
|
|
2008-07-25 22:24:52 +08:00
|
|
|
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
|
|
|
|
static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
|
|
|
|
{
|
|
|
|
return container_of(mn, struct kvm, mmu_notifier);
|
|
|
|
}
|
|
|
|
|
2009-09-24 02:47:18 +08:00
|
|
|
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
|
|
|
|
struct mm_struct *mm,
|
|
|
|
unsigned long address,
|
|
|
|
pte_t pte)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
2009-12-24 00:35:21 +08:00
|
|
|
int idx;
|
2009-09-24 02:47:18 +08:00
|
|
|
|
2009-12-24 00:35:21 +08:00
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
2009-09-24 02:47:18 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
kvm->mmu_notifier_seq++;
|
2018-12-06 21:21:11 +08:00
|
|
|
|
|
|
|
if (kvm_set_spte_hva(kvm, address, pte))
|
|
|
|
kvm_flush_remote_tlbs(kvm);
|
|
|
|
|
2009-09-24 02:47:18 +08:00
|
|
|
spin_unlock(&kvm->mmu_lock);
|
2009-12-24 00:35:21 +08:00
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
2009-09-24 02:47:18 +08:00
|
|
|
}
|
|
|
|
|
2018-08-22 12:52:33 +08:00
|
|
|
static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
|
2018-12-28 16:38:05 +08:00
|
|
|
const struct mmu_notifier_range *range)
|
2008-07-25 22:24:52 +08:00
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
2009-12-24 00:35:21 +08:00
|
|
|
int need_tlb_flush = 0, idx;
|
2018-08-22 12:52:33 +08:00
|
|
|
int ret;
|
2008-07-25 22:24:52 +08:00
|
|
|
|
2009-12-24 00:35:21 +08:00
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
2008-07-25 22:24:52 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
/*
|
|
|
|
* The count increase must become visible at unlock time as no
|
|
|
|
* spte can be established without taking the mmu_lock and
|
|
|
|
* count is also read inside the mmu_lock critical section.
|
|
|
|
*/
|
|
|
|
kvm->mmu_notifier_count++;
|
2018-12-28 16:38:05 +08:00
|
|
|
need_tlb_flush = kvm_unmap_hva_range(kvm, range->start, range->end);
|
2010-11-23 11:13:00 +08:00
|
|
|
need_tlb_flush |= kvm->tlbs_dirty;
|
2008-07-25 22:24:52 +08:00
|
|
|
/* we've to flush the tlb before the pages can be freed */
|
|
|
|
if (need_tlb_flush)
|
|
|
|
kvm_flush_remote_tlbs(kvm);
|
2012-02-10 14:28:31 +08:00
|
|
|
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
2017-12-01 02:05:45 +08:00
|
|
|
|
2018-12-28 16:38:05 +08:00
|
|
|
ret = kvm_arch_mmu_notifier_invalidate_range(kvm, range->start,
|
|
|
|
range->end, range->blockable);
|
2017-12-01 02:05:45 +08:00
|
|
|
|
2012-02-10 14:28:31 +08:00
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
2018-08-22 12:52:33 +08:00
|
|
|
|
|
|
|
return ret;
|
2008-07-25 22:24:52 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
|
2018-12-28 16:38:05 +08:00
|
|
|
const struct mmu_notifier_range *range)
|
2008-07-25 22:24:52 +08:00
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
|
|
|
|
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
/*
|
|
|
|
* This sequence increase will notify the kvm page fault that
|
|
|
|
* the page that is going to be mapped in the spte could have
|
|
|
|
* been freed.
|
|
|
|
*/
|
|
|
|
kvm->mmu_notifier_seq++;
|
2011-12-12 20:37:21 +08:00
|
|
|
smp_wmb();
|
2008-07-25 22:24:52 +08:00
|
|
|
/*
|
|
|
|
* The above sequence increase must be visible before the
|
2011-12-12 20:37:21 +08:00
|
|
|
* below count decrease, which is ensured by the smp_wmb above
|
|
|
|
* in conjunction with the smp_rmb in mmu_notifier_retry().
|
2008-07-25 22:24:52 +08:00
|
|
|
*/
|
|
|
|
kvm->mmu_notifier_count--;
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
|
|
|
|
|
|
|
BUG_ON(kvm->mmu_notifier_count < 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
|
|
|
|
struct mm_struct *mm,
|
2014-09-23 05:54:42 +08:00
|
|
|
unsigned long start,
|
|
|
|
unsigned long end)
|
2008-07-25 22:24:52 +08:00
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
2009-12-24 00:35:21 +08:00
|
|
|
int young, idx;
|
2008-07-25 22:24:52 +08:00
|
|
|
|
2009-12-24 00:35:21 +08:00
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
2008-07-25 22:24:52 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
|
2014-09-23 05:54:42 +08:00
|
|
|
young = kvm_age_hva(kvm, start, end);
|
2008-07-25 22:24:52 +08:00
|
|
|
if (young)
|
|
|
|
kvm_flush_remote_tlbs(kvm);
|
|
|
|
|
2012-02-10 14:28:31 +08:00
|
|
|
spin_unlock(&kvm->mmu_lock);
|
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
|
|
|
|
2008-07-25 22:24:52 +08:00
|
|
|
return young;
|
|
|
|
}
|
|
|
|
|
2015-09-10 06:35:41 +08:00
|
|
|
static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
|
|
|
|
struct mm_struct *mm,
|
|
|
|
unsigned long start,
|
|
|
|
unsigned long end)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
|
|
|
int young, idx;
|
|
|
|
|
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
/*
|
|
|
|
* Even though we do not flush TLB, this will still adversely
|
|
|
|
* affect performance on pre-Haswell Intel EPT, where there is
|
|
|
|
* no EPT Access Bit to clear so that we have to tear down EPT
|
|
|
|
* tables instead. If we find this unacceptable, we can always
|
|
|
|
* add a parameter to kvm_age_hva so that it effectively doesn't
|
|
|
|
* do anything on clear_young.
|
|
|
|
*
|
|
|
|
* Also note that currently we never issue secondary TLB flushes
|
|
|
|
* from clear_young, leaving this job up to the regular system
|
|
|
|
* cadence. If we find this inaccurate, we might come up with a
|
|
|
|
* more sophisticated heuristic later.
|
|
|
|
*/
|
|
|
|
young = kvm_age_hva(kvm, start, end);
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
|
|
|
|
|
|
|
return young;
|
|
|
|
}
|
|
|
|
|
2011-01-14 07:47:10 +08:00
|
|
|
static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
|
|
|
|
struct mm_struct *mm,
|
|
|
|
unsigned long address)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
|
|
|
int young, idx;
|
|
|
|
|
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
young = kvm_test_age_hva(kvm, address);
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
|
|
|
|
|
|
|
return young;
|
|
|
|
}
|
|
|
|
|
2008-12-11 04:23:26 +08:00
|
|
|
static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
|
|
|
|
struct mm_struct *mm)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = mmu_notifier_to_kvm(mn);
|
2010-04-20 14:29:29 +08:00
|
|
|
int idx;
|
|
|
|
|
|
|
|
idx = srcu_read_lock(&kvm->srcu);
|
2012-08-25 02:54:57 +08:00
|
|
|
kvm_arch_flush_shadow_all(kvm);
|
2010-04-20 14:29:29 +08:00
|
|
|
srcu_read_unlock(&kvm->srcu, idx);
|
2008-12-11 04:23:26 +08:00
|
|
|
}
|
|
|
|
|
2008-07-25 22:24:52 +08:00
|
|
|
static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
|
|
|
|
.invalidate_range_start = kvm_mmu_notifier_invalidate_range_start,
|
|
|
|
.invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
|
|
|
|
.clear_flush_young = kvm_mmu_notifier_clear_flush_young,
|
2015-09-10 06:35:41 +08:00
|
|
|
.clear_young = kvm_mmu_notifier_clear_young,
|
2011-01-14 07:47:10 +08:00
|
|
|
.test_young = kvm_mmu_notifier_test_young,
|
2009-09-24 02:47:18 +08:00
|
|
|
.change_pte = kvm_mmu_notifier_change_pte,
|
2008-12-11 04:23:26 +08:00
|
|
|
.release = kvm_mmu_notifier_release,
|
2008-07-25 22:24:52 +08:00
|
|
|
};
|
2009-12-20 20:54:04 +08:00
|
|
|
|
|
|
|
static int kvm_init_mmu_notifier(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
kvm->mmu_notifier.ops = &kvm_mmu_notifier_ops;
|
|
|
|
return mmu_notifier_register(&kvm->mmu_notifier, current->mm);
|
|
|
|
}
|
|
|
|
|
|
|
|
#else /* !(CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER) */
|
|
|
|
|
|
|
|
static int kvm_init_mmu_notifier(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-07-25 22:24:52 +08:00
|
|
|
#endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
|
|
|
|
|
2015-05-17 17:41:37 +08:00
|
|
|
static struct kvm_memslots *kvm_alloc_memslots(void)
|
2011-11-24 17:40:57 +08:00
|
|
|
{
|
|
|
|
int i;
|
2015-05-17 17:41:37 +08:00
|
|
|
struct kvm_memslots *slots;
|
2011-11-24 17:40:57 +08:00
|
|
|
|
2019-02-12 03:02:49 +08:00
|
|
|
slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
|
2015-05-17 17:41:37 +08:00
|
|
|
if (!slots)
|
|
|
|
return NULL;
|
|
|
|
|
2011-11-24 17:40:57 +08:00
|
|
|
for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
|
2011-11-24 17:41:54 +08:00
|
|
|
slots->id_to_index[i] = slots->memslots[i].id = i;
|
2015-05-17 17:41:37 +08:00
|
|
|
|
|
|
|
return slots;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
|
|
|
|
{
|
|
|
|
if (!memslot->dirty_bitmap)
|
|
|
|
return;
|
|
|
|
|
|
|
|
kvfree(memslot->dirty_bitmap);
|
|
|
|
memslot->dirty_bitmap = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Free any memory in @free but not in @dont.
|
|
|
|
*/
|
|
|
|
static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
|
|
|
|
struct kvm_memory_slot *dont)
|
|
|
|
{
|
|
|
|
if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
|
|
|
|
kvm_destroy_dirty_bitmap(free);
|
|
|
|
|
|
|
|
kvm_arch_free_memslot(kvm, free, dont);
|
|
|
|
|
|
|
|
free->npages = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *memslot;
|
|
|
|
|
|
|
|
if (!slots)
|
|
|
|
return;
|
|
|
|
|
|
|
|
kvm_for_each_memslot(memslot, slots)
|
|
|
|
kvm_free_memslot(kvm, memslot, NULL);
|
|
|
|
|
|
|
|
kvfree(slots);
|
2011-11-24 17:40:57 +08:00
|
|
|
}
|
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
static void kvm_destroy_vm_debugfs(struct kvm *kvm)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
if (!kvm->debugfs_dentry)
|
|
|
|
return;
|
|
|
|
|
|
|
|
debugfs_remove_recursive(kvm->debugfs_dentry);
|
|
|
|
|
2016-09-08 02:47:21 +08:00
|
|
|
if (kvm->debugfs_stat_data) {
|
|
|
|
for (i = 0; i < kvm_debugfs_num_entries; i++)
|
|
|
|
kfree(kvm->debugfs_stat_data[i]);
|
|
|
|
kfree(kvm->debugfs_stat_data);
|
|
|
|
}
|
2016-05-18 19:26:23 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
|
|
|
|
{
|
|
|
|
char dir_name[ITOA_MAX_LEN * 2];
|
|
|
|
struct kvm_stat_data *stat_data;
|
|
|
|
struct kvm_stats_debugfs_item *p;
|
|
|
|
|
|
|
|
if (!debugfs_initialized())
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
|
2018-05-30 00:22:04 +08:00
|
|
|
kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
|
2016-05-18 19:26:23 +08:00
|
|
|
|
|
|
|
kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
|
|
|
|
sizeof(*kvm->debugfs_stat_data),
|
2019-02-12 03:02:49 +08:00
|
|
|
GFP_KERNEL_ACCOUNT);
|
2016-05-18 19:26:23 +08:00
|
|
|
if (!kvm->debugfs_stat_data)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
for (p = debugfs_entries; p->name; p++) {
|
2019-02-12 03:02:49 +08:00
|
|
|
stat_data = kzalloc(sizeof(*stat_data), GFP_KERNEL_ACCOUNT);
|
2016-05-18 19:26:23 +08:00
|
|
|
if (!stat_data)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
stat_data->kvm = kvm;
|
|
|
|
stat_data->offset = p->offset;
|
|
|
|
kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
|
2018-05-30 00:22:04 +08:00
|
|
|
debugfs_create_file(p->name, 0644, kvm->debugfs_dentry,
|
|
|
|
stat_data, stat_fops_per_vm[p->kind]);
|
2016-05-18 19:26:23 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-01-04 17:25:20 +08:00
|
|
|
static struct kvm *kvm_create_vm(unsigned long type)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2010-11-10 00:02:49 +08:00
|
|
|
int r, i;
|
|
|
|
struct kvm *kvm = kvm_arch_alloc_vm();
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2010-11-10 00:02:49 +08:00
|
|
|
if (!kvm)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
2016-03-21 17:15:25 +08:00
|
|
|
spin_lock_init(&kvm->mmu_lock);
|
2017-02-28 06:30:07 +08:00
|
|
|
mmgrab(current->mm);
|
2016-03-21 17:15:25 +08:00
|
|
|
kvm->mm = current->mm;
|
|
|
|
kvm_eventfd_init(kvm);
|
|
|
|
mutex_init(&kvm->lock);
|
|
|
|
mutex_init(&kvm->irq_lock);
|
|
|
|
mutex_init(&kvm->slots_lock);
|
2017-02-20 19:06:21 +08:00
|
|
|
refcount_set(&kvm->users_count, 1);
|
2016-03-21 17:15:25 +08:00
|
|
|
INIT_LIST_HEAD(&kvm->devices);
|
|
|
|
|
2012-01-04 17:25:20 +08:00
|
|
|
r = kvm_arch_init_vm(kvm, type);
|
2010-11-10 00:02:49 +08:00
|
|
|
if (r)
|
2014-01-16 20:44:20 +08:00
|
|
|
goto out_err_no_disable;
|
2009-09-15 17:37:46 +08:00
|
|
|
|
|
|
|
r = hardware_enable_all();
|
|
|
|
if (r)
|
2014-01-16 20:44:20 +08:00
|
|
|
goto out_err_no_disable;
|
2009-09-15 17:37:46 +08:00
|
|
|
|
2014-08-06 20:24:45 +08:00
|
|
|
#ifdef CONFIG_HAVE_KVM_IRQFD
|
2009-08-24 16:54:23 +08:00
|
|
|
INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
|
2009-01-04 23:10:50 +08:00
|
|
|
#endif
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2012-12-11 01:33:32 +08:00
|
|
|
BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
|
|
|
|
|
2009-12-24 00:35:16 +08:00
|
|
|
r = -ENOMEM;
|
2015-05-17 23:30:37 +08:00
|
|
|
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
|
2017-02-04 12:44:51 +08:00
|
|
|
struct kvm_memslots *slots = kvm_alloc_memslots();
|
|
|
|
if (!slots)
|
2015-05-17 23:30:37 +08:00
|
|
|
goto out_err_no_srcu;
|
KVM: Remove the hack to trigger memslot generation wraparound
x86 captures a subset of the memslot generation (19 bits) in its MMIO
sptes so that it can expedite emulated MMIO handling by checking only
the releveant spte, i.e. doesn't need to do a full page fault walk.
Because the MMIO sptes capture only 19 bits (due to limited space in
the sptes), there is a non-zero probability that the MMIO generation
could wrap, e.g. after 500k memslot updates. Since normal usage is
extremely unlikely to result in 500k memslot updates, a hack was added
by commit 69c9ea93eaea ("KVM: MMU: init kvm generation close to mmio
wrap-around value") to offset the MMIO generation in order to trigger
a wraparound, e.g. after 150 memslot updates.
When separate memslot generation sequences were assigned to each
address space, commit 00f034a12fdd ("KVM: do not bias the generation
number in kvm_current_mmio_generation") moved the offset logic into the
initialization of the memslot generation itself so that the per-address
space bit(s) were not dropped/corrupted by the MMIO shenanigans.
Remove the offset hack for three reasons:
- While it does exercise x86's kvm_mmu_invalidate_mmio_sptes(), simply
wrapping the generation doesn't actually test the interesting case
of having stale MMIO sptes with the new generation number, e.g. old
sptes with a generation number of 0.
- Triggering kvm_mmu_invalidate_mmio_sptes() prematurely makes its
performance rather important since the probability of invalidating
MMIO sptes jumps from "effectively never" to "fairly likely". This
limits what can be done in future patches, e.g. to simplify the
invalidation code, as doing so without proper caution could lead to
a noticeable performance regression.
- Forcing the memslots generation, which is a 64-bit number, to wrap
prevents KVM from assuming the memslots generation will never wrap.
This in turn prevents KVM from using an arbitrary bit for the
"update in-progress" flag, e.g. using bit 63 would immediately
collide with using a large value as the starting generation number.
The "update in-progress" flag is effectively forced into bit 0 so
that it's (subtly) taken into account when incrementing the
generation.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-06 05:01:17 +08:00
|
|
|
/* Generations must be different for each address space. */
|
2019-02-06 05:01:18 +08:00
|
|
|
slots->generation = i;
|
2017-02-04 12:44:51 +08:00
|
|
|
rcu_assign_pointer(kvm->memslots[i], slots);
|
2015-05-17 23:30:37 +08:00
|
|
|
}
|
2014-08-20 20:29:21 +08:00
|
|
|
|
2009-12-24 00:35:21 +08:00
|
|
|
if (init_srcu_struct(&kvm->srcu))
|
2014-01-16 20:44:20 +08:00
|
|
|
goto out_err_no_srcu;
|
|
|
|
if (init_srcu_struct(&kvm->irq_srcu))
|
|
|
|
goto out_err_no_irq_srcu;
|
2009-12-24 00:35:24 +08:00
|
|
|
for (i = 0; i < KVM_NR_BUSES; i++) {
|
2017-07-07 16:51:38 +08:00
|
|
|
rcu_assign_pointer(kvm->buses[i],
|
2019-02-12 03:02:49 +08:00
|
|
|
kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
|
2010-11-09 19:42:12 +08:00
|
|
|
if (!kvm->buses[i])
|
2009-12-24 00:35:24 +08:00
|
|
|
goto out_err;
|
|
|
|
}
|
2008-07-25 22:24:52 +08:00
|
|
|
|
2011-06-04 04:04:53 +08:00
|
|
|
r = kvm_init_mmu_notifier(kvm);
|
|
|
|
if (r)
|
|
|
|
goto out_err;
|
|
|
|
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_lock(&kvm_lock);
|
2007-07-23 15:08:21 +08:00
|
|
|
list_add(&kvm->vm_list, &vm_list);
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_unlock(&kvm_lock);
|
2010-11-10 00:02:49 +08:00
|
|
|
|
2015-07-04 00:53:58 +08:00
|
|
|
preempt_notifier_inc();
|
|
|
|
|
2007-02-22 01:28:04 +08:00
|
|
|
return kvm;
|
2009-09-15 17:37:46 +08:00
|
|
|
|
|
|
|
out_err:
|
2014-01-16 20:44:20 +08:00
|
|
|
cleanup_srcu_struct(&kvm->irq_srcu);
|
|
|
|
out_err_no_irq_srcu:
|
2010-11-09 19:42:12 +08:00
|
|
|
cleanup_srcu_struct(&kvm->srcu);
|
2014-01-16 20:44:20 +08:00
|
|
|
out_err_no_srcu:
|
2009-09-15 17:37:46 +08:00
|
|
|
hardware_disable_all();
|
2014-01-16 20:44:20 +08:00
|
|
|
out_err_no_disable:
|
2017-09-13 20:17:22 +08:00
|
|
|
refcount_set(&kvm->users_count, 0);
|
2009-12-24 00:35:24 +08:00
|
|
|
for (i = 0; i < KVM_NR_BUSES; i++)
|
2017-08-02 23:55:54 +08:00
|
|
|
kfree(kvm_get_bus(kvm, i));
|
2015-05-17 23:30:37 +08:00
|
|
|
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
|
2017-08-02 23:55:54 +08:00
|
|
|
kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
|
2010-11-10 00:02:49 +08:00
|
|
|
kvm_arch_free_vm(kvm);
|
2016-03-21 17:15:25 +08:00
|
|
|
mmdrop(current->mm);
|
2009-09-15 17:37:46 +08:00
|
|
|
return ERR_PTR(r);
|
2007-02-22 01:28:04 +08:00
|
|
|
}
|
|
|
|
|
2013-04-25 22:11:23 +08:00
|
|
|
static void kvm_destroy_devices(struct kvm *kvm)
|
|
|
|
{
|
2016-01-01 19:47:12 +08:00
|
|
|
struct kvm_device *dev, *tmp;
|
2013-04-25 22:11:23 +08:00
|
|
|
|
2016-08-10 01:13:01 +08:00
|
|
|
/*
|
|
|
|
* We do not need to take the kvm->lock here, because nobody else
|
|
|
|
* has a reference to the struct kvm at this point and therefore
|
|
|
|
* cannot access the devices list anyhow.
|
|
|
|
*/
|
2016-01-01 19:47:12 +08:00
|
|
|
list_for_each_entry_safe(dev, tmp, &kvm->devices, vm_node) {
|
|
|
|
list_del(&dev->vm_node);
|
2013-04-25 22:11:23 +08:00
|
|
|
dev->ops->destroy(dev);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-02-22 01:28:04 +08:00
|
|
|
static void kvm_destroy_vm(struct kvm *kvm)
|
|
|
|
{
|
2009-12-24 00:35:24 +08:00
|
|
|
int i;
|
2007-11-21 22:41:05 +08:00
|
|
|
struct mm_struct *mm = kvm->mm;
|
|
|
|
|
2017-07-12 23:56:44 +08:00
|
|
|
kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
|
2016-05-18 19:26:23 +08:00
|
|
|
kvm_destroy_vm_debugfs(kvm);
|
2009-01-06 10:03:02 +08:00
|
|
|
kvm_arch_sync_events(kvm);
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_lock(&kvm_lock);
|
2007-02-12 16:54:44 +08:00
|
|
|
list_del(&kvm->vm_list);
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_unlock(&kvm_lock);
|
2008-11-19 19:58:46 +08:00
|
|
|
kvm_free_irq_routing(kvm);
|
2017-03-15 16:01:17 +08:00
|
|
|
for (i = 0; i < KVM_NR_BUSES; i++) {
|
2017-08-02 23:55:54 +08:00
|
|
|
struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
|
2017-07-07 16:51:38 +08:00
|
|
|
|
|
|
|
if (bus)
|
|
|
|
kvm_io_bus_destroy(bus);
|
2017-03-15 16:01:17 +08:00
|
|
|
kvm->buses[i] = NULL;
|
|
|
|
}
|
2009-12-20 21:13:43 +08:00
|
|
|
kvm_coalesced_mmio_free(kvm);
|
2008-07-25 22:24:52 +08:00
|
|
|
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
|
|
|
|
mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
|
2009-03-19 18:20:36 +08:00
|
|
|
#else
|
2012-08-25 02:54:57 +08:00
|
|
|
kvm_arch_flush_shadow_all(kvm);
|
2008-05-30 22:05:54 +08:00
|
|
|
#endif
|
2007-11-18 18:43:45 +08:00
|
|
|
kvm_arch_destroy_vm(kvm);
|
2013-04-25 22:11:23 +08:00
|
|
|
kvm_destroy_devices(kvm);
|
2015-05-17 23:30:37 +08:00
|
|
|
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
|
2017-08-02 23:55:54 +08:00
|
|
|
kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
|
2014-06-03 19:44:17 +08:00
|
|
|
cleanup_srcu_struct(&kvm->irq_srcu);
|
2010-11-10 00:02:49 +08:00
|
|
|
cleanup_srcu_struct(&kvm->srcu);
|
|
|
|
kvm_arch_free_vm(kvm);
|
2015-07-04 00:53:58 +08:00
|
|
|
preempt_notifier_dec();
|
2009-09-15 17:37:46 +08:00
|
|
|
hardware_disable_all();
|
2007-11-21 22:41:05 +08:00
|
|
|
mmdrop(mm);
|
2007-02-22 01:28:04 +08:00
|
|
|
}
|
|
|
|
|
2008-03-30 21:01:25 +08:00
|
|
|
void kvm_get_kvm(struct kvm *kvm)
|
|
|
|
{
|
2017-02-20 19:06:21 +08:00
|
|
|
refcount_inc(&kvm->users_count);
|
2008-03-30 21:01:25 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_get_kvm);
|
|
|
|
|
|
|
|
void kvm_put_kvm(struct kvm *kvm)
|
|
|
|
{
|
2017-02-20 19:06:21 +08:00
|
|
|
if (refcount_dec_and_test(&kvm->users_count))
|
2008-03-30 21:01:25 +08:00
|
|
|
kvm_destroy_vm(kvm);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_put_kvm);
|
|
|
|
|
|
|
|
|
2007-02-22 01:28:04 +08:00
|
|
|
static int kvm_vm_release(struct inode *inode, struct file *filp)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = filp->private_data;
|
|
|
|
|
2009-05-20 22:30:49 +08:00
|
|
|
kvm_irqfd_release(kvm);
|
|
|
|
|
2008-03-30 21:01:25 +08:00
|
|
|
kvm_put_kvm(kvm);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-10-27 17:23:54 +08:00
|
|
|
/*
|
|
|
|
* Allocation size is twice as large as the actual dirty bitmap size.
|
2012-03-01 18:34:45 +08:00
|
|
|
* See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
|
2010-10-27 17:23:54 +08:00
|
|
|
*/
|
2010-10-27 17:22:19 +08:00
|
|
|
static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
|
|
|
|
{
|
2010-10-27 17:23:54 +08:00
|
|
|
unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
|
2010-10-27 17:22:19 +08:00
|
|
|
|
2019-02-12 03:02:49 +08:00
|
|
|
memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL_ACCOUNT);
|
2010-10-27 17:22:19 +08:00
|
|
|
if (!memslot->dirty_bitmap)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-11-24 17:40:57 +08:00
|
|
|
/*
|
2014-12-02 01:29:26 +08:00
|
|
|
* Insert memslot and re-sort memslots based on their GFN,
|
|
|
|
* so binary search could be used to lookup GFN.
|
|
|
|
* Sorting algorithm takes advantage of having initially
|
|
|
|
* sorted array and known changed memslot position.
|
2011-11-24 17:40:57 +08:00
|
|
|
*/
|
2014-11-14 17:55:31 +08:00
|
|
|
static void update_memslots(struct kvm_memslots *slots,
|
2018-08-22 21:57:11 +08:00
|
|
|
struct kvm_memory_slot *new,
|
|
|
|
enum kvm_mr_change change)
|
2011-11-24 17:40:57 +08:00
|
|
|
{
|
2014-11-14 17:22:07 +08:00
|
|
|
int id = new->id;
|
|
|
|
int i = slots->id_to_index[id];
|
2014-11-14 07:00:13 +08:00
|
|
|
struct kvm_memory_slot *mslots = slots->memslots;
|
2011-11-24 17:41:54 +08:00
|
|
|
|
2014-11-14 17:22:07 +08:00
|
|
|
WARN_ON(mslots[i].id != id);
|
2018-08-22 21:57:11 +08:00
|
|
|
switch (change) {
|
|
|
|
case KVM_MR_CREATE:
|
|
|
|
slots->used_slots++;
|
|
|
|
WARN_ON(mslots[i].npages || !new->npages);
|
|
|
|
break;
|
|
|
|
case KVM_MR_DELETE:
|
|
|
|
slots->used_slots--;
|
|
|
|
WARN_ON(new->npages || !mslots[i].npages);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
2014-12-02 01:29:27 +08:00
|
|
|
}
|
2014-12-02 01:29:26 +08:00
|
|
|
|
2014-12-02 01:29:24 +08:00
|
|
|
while (i < KVM_MEM_SLOTS_NUM - 1 &&
|
2014-12-02 01:29:26 +08:00
|
|
|
new->base_gfn <= mslots[i + 1].base_gfn) {
|
|
|
|
if (!mslots[i + 1].npages)
|
|
|
|
break;
|
2014-12-02 01:29:24 +08:00
|
|
|
mslots[i] = mslots[i + 1];
|
|
|
|
slots->id_to_index[mslots[i].id] = i;
|
|
|
|
i++;
|
|
|
|
}
|
2014-12-28 01:01:00 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The ">=" is needed when creating a slot with base_gfn == 0,
|
|
|
|
* so that it moves before all those with base_gfn == npages == 0.
|
|
|
|
*
|
|
|
|
* On the other hand, if new->npages is zero, the above loop has
|
|
|
|
* already left i pointing to the beginning of the empty part of
|
|
|
|
* mslots, and the ">=" would move the hole backwards in this
|
|
|
|
* case---which is wrong. So skip the loop when deleting a slot.
|
|
|
|
*/
|
|
|
|
if (new->npages) {
|
|
|
|
while (i > 0 &&
|
|
|
|
new->base_gfn >= mslots[i - 1].base_gfn) {
|
|
|
|
mslots[i] = mslots[i - 1];
|
|
|
|
slots->id_to_index[mslots[i].id] = i;
|
|
|
|
i--;
|
|
|
|
}
|
2014-12-28 04:08:16 +08:00
|
|
|
} else
|
|
|
|
WARN_ON_ONCE(i != slots->used_slots);
|
2011-11-24 17:41:54 +08:00
|
|
|
|
2014-11-14 17:22:07 +08:00
|
|
|
mslots[i] = *new;
|
|
|
|
slots->id_to_index[mslots[i].id] = i;
|
2011-11-24 17:40:57 +08:00
|
|
|
}
|
|
|
|
|
2015-05-18 19:59:39 +08:00
|
|
|
static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
|
2012-08-21 10:58:13 +08:00
|
|
|
{
|
2012-08-21 11:02:51 +08:00
|
|
|
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
|
|
|
|
|
2014-08-26 20:00:37 +08:00
|
|
|
#ifdef __KVM_HAVE_READONLY_MEM
|
2012-08-21 11:02:51 +08:00
|
|
|
valid_flags |= KVM_MEM_READONLY;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
if (mem->flags & ~valid_flags)
|
2012-08-21 10:58:13 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-12-24 23:49:30 +08:00
|
|
|
static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
|
2015-05-17 23:30:37 +08:00
|
|
|
int as_id, struct kvm_memslots *slots)
|
2012-12-24 23:49:30 +08:00
|
|
|
{
|
2015-05-17 23:30:37 +08:00
|
|
|
struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id);
|
KVM: Explicitly define the "memslot update in-progress" bit
KVM uses bit 0 of the memslots generation as an "update in-progress"
flag, which is used by x86 to prevent caching MMIO access while the
memslots are changing. Although the intended behavior is flag-like,
e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid
caching data from in-flux memslots, the implementation oftentimes treats
the bit as part of the generation number itself, e.g. incrementing the
generation increments twice, once to set the flag and once to clear it.
Prior to commit 4bd518f1598d ("KVM: use separate generations for
each address space"), incorporating the "update in-progress" bit into
the generation number largely made sense, e.g. "real" generations are
even, "bogus" generations are odd, most code doesn't need to be aware of
the bit, etc...
Now that unique memslots generation numbers are assigned to each address
space, stealthing the in-progress status into the generation number
results in a wide variety of subtle code, e.g. kvm_create_vm() jumps
over bit 0 when initializing the memslots generation without any hint as
to why.
Explicitly define the flag and convert as much code as possible (which
isn't much) to actually treat it like a flag. This paves the way for
eventually using a different bit for "update in-progress" so that it can
be a flag in truth instead of a awkward extension to the generation
number.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-06 05:01:14 +08:00
|
|
|
u64 gen = old_memslots->generation;
|
2012-12-24 23:49:30 +08:00
|
|
|
|
KVM: Explicitly define the "memslot update in-progress" bit
KVM uses bit 0 of the memslots generation as an "update in-progress"
flag, which is used by x86 to prevent caching MMIO access while the
memslots are changing. Although the intended behavior is flag-like,
e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid
caching data from in-flux memslots, the implementation oftentimes treats
the bit as part of the generation number itself, e.g. incrementing the
generation increments twice, once to set the flag and once to clear it.
Prior to commit 4bd518f1598d ("KVM: use separate generations for
each address space"), incorporating the "update in-progress" bit into
the generation number largely made sense, e.g. "real" generations are
even, "bogus" generations are odd, most code doesn't need to be aware of
the bit, etc...
Now that unique memslots generation numbers are assigned to each address
space, stealthing the in-progress status into the generation number
results in a wide variety of subtle code, e.g. kvm_create_vm() jumps
over bit 0 when initializing the memslots generation without any hint as
to why.
Explicitly define the flag and convert as much code as possible (which
isn't much) to actually treat it like a flag. This paves the way for
eventually using a different bit for "update in-progress" so that it can
be a flag in truth instead of a awkward extension to the generation
number.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-06 05:01:14 +08:00
|
|
|
WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
|
|
|
|
slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
|
2014-08-19 06:46:06 +08:00
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
rcu_assign_pointer(kvm->memslots[as_id], slots);
|
2012-12-24 23:49:30 +08:00
|
|
|
synchronize_srcu_expedited(&kvm->srcu);
|
2013-07-04 12:40:29 +08:00
|
|
|
|
2014-08-19 06:46:06 +08:00
|
|
|
/*
|
KVM: Explicitly define the "memslot update in-progress" bit
KVM uses bit 0 of the memslots generation as an "update in-progress"
flag, which is used by x86 to prevent caching MMIO access while the
memslots are changing. Although the intended behavior is flag-like,
e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid
caching data from in-flux memslots, the implementation oftentimes treats
the bit as part of the generation number itself, e.g. incrementing the
generation increments twice, once to set the flag and once to clear it.
Prior to commit 4bd518f1598d ("KVM: use separate generations for
each address space"), incorporating the "update in-progress" bit into
the generation number largely made sense, e.g. "real" generations are
even, "bogus" generations are odd, most code doesn't need to be aware of
the bit, etc...
Now that unique memslots generation numbers are assigned to each address
space, stealthing the in-progress status into the generation number
results in a wide variety of subtle code, e.g. kvm_create_vm() jumps
over bit 0 when initializing the memslots generation without any hint as
to why.
Explicitly define the flag and convert as much code as possible (which
isn't much) to actually treat it like a flag. This paves the way for
eventually using a different bit for "update in-progress" so that it can
be a flag in truth instead of a awkward extension to the generation
number.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-06 05:01:14 +08:00
|
|
|
* Increment the new memslot generation a second time, dropping the
|
|
|
|
* update in-progress flag and incrementing then generation based on
|
|
|
|
* the number of address spaces. This provides a unique and easily
|
|
|
|
* identifiable generation number while the memslots are in flux.
|
|
|
|
*/
|
|
|
|
gen = slots->generation & ~KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
|
|
|
|
|
|
|
|
/*
|
2017-02-04 12:44:51 +08:00
|
|
|
* Generations must be unique even across address spaces. We do not need
|
|
|
|
* a global counter for that, instead the generation space is evenly split
|
|
|
|
* across address spaces. For example, with two address spaces, address
|
2019-02-06 05:01:18 +08:00
|
|
|
* space 0 will use generations 0, 2, 4, ... while address space 1 will
|
|
|
|
* use generations 1, 3, 5, ...
|
2014-08-19 06:46:06 +08:00
|
|
|
*/
|
2019-02-06 05:01:18 +08:00
|
|
|
gen += KVM_ADDRESS_SPACE_NUM;
|
2014-08-19 06:46:06 +08:00
|
|
|
|
2019-02-06 04:54:17 +08:00
|
|
|
kvm_arch_memslots_updated(kvm, gen);
|
2014-08-19 06:46:06 +08:00
|
|
|
|
2019-02-06 04:54:17 +08:00
|
|
|
slots->generation = gen;
|
2013-07-04 12:40:29 +08:00
|
|
|
|
|
|
|
return old_memslots;
|
2012-12-24 23:49:30 +08:00
|
|
|
}
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
/*
|
|
|
|
* Allocate some memory and give it an address in the guest physical address
|
|
|
|
* space.
|
|
|
|
*
|
|
|
|
* Discontiguous memory is allowed, mostly for framebuffers.
|
2007-10-29 09:40:42 +08:00
|
|
|
*
|
2014-10-27 23:22:56 +08:00
|
|
|
* Must be called holding kvm->slots_lock for write.
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
*/
|
2007-10-29 09:40:42 +08:00
|
|
|
int __kvm_set_memory_region(struct kvm *kvm,
|
2015-05-18 19:59:39 +08:00
|
|
|
const struct kvm_userspace_memory_region *mem)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2010-12-27 18:08:45 +08:00
|
|
|
int r;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
gfn_t base_gfn;
|
2009-09-03 23:35:35 +08:00
|
|
|
unsigned long npages;
|
2013-01-11 17:27:43 +08:00
|
|
|
struct kvm_memory_slot *slot;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
struct kvm_memory_slot old, new;
|
2012-12-11 01:33:03 +08:00
|
|
|
struct kvm_memslots *slots = NULL, *old_memslots;
|
2015-05-17 23:30:37 +08:00
|
|
|
int as_id, id;
|
2013-01-29 10:00:07 +08:00
|
|
|
enum kvm_mr_change change;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2012-08-21 10:58:13 +08:00
|
|
|
r = check_memory_region_flags(mem);
|
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
r = -EINVAL;
|
2015-05-17 23:30:37 +08:00
|
|
|
as_id = mem->slot >> 16;
|
|
|
|
id = (u16)mem->slot;
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
/* General sanity checks */
|
|
|
|
if (mem->memory_size & (PAGE_SIZE - 1))
|
|
|
|
goto out;
|
|
|
|
if (mem->guest_phys_addr & (PAGE_SIZE - 1))
|
|
|
|
goto out;
|
2011-05-07 15:35:38 +08:00
|
|
|
/* We can read the guest memory with __xxx_user() later on. */
|
2015-05-17 23:30:37 +08:00
|
|
|
if ((id < KVM_USER_MEM_SLOTS) &&
|
2011-05-07 15:35:38 +08:00
|
|
|
((mem->userspace_addr & (PAGE_SIZE - 1)) ||
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
!access_ok((void __user *)(unsigned long)mem->userspace_addr,
|
2011-05-24 13:51:27 +08:00
|
|
|
mem->memory_size)))
|
2008-11-08 03:32:12 +08:00
|
|
|
goto out;
|
2015-05-17 23:30:37 +08:00
|
|
|
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
|
|
|
if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
|
|
|
|
goto out;
|
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
|
|
|
|
npages = mem->memory_size >> PAGE_SHIFT;
|
|
|
|
|
2010-04-13 21:47:24 +08:00
|
|
|
if (npages > KVM_MEM_MAX_NR_PAGES)
|
|
|
|
goto out;
|
|
|
|
|
2013-01-11 17:27:43 +08:00
|
|
|
new = old = *slot;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
new.id = id;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
new.base_gfn = base_gfn;
|
|
|
|
new.npages = npages;
|
|
|
|
new.flags = mem->flags;
|
|
|
|
|
2013-01-29 10:00:07 +08:00
|
|
|
if (npages) {
|
|
|
|
if (!old.npages)
|
|
|
|
change = KVM_MR_CREATE;
|
|
|
|
else { /* Modify an existing slot. */
|
|
|
|
if ((mem->userspace_addr != old.userspace_addr) ||
|
2013-01-30 18:40:41 +08:00
|
|
|
(npages != old.npages) ||
|
|
|
|
((new.flags ^ old.flags) & KVM_MEM_READONLY))
|
2013-01-29 10:00:07 +08:00
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (base_gfn != old.base_gfn)
|
|
|
|
change = KVM_MR_MOVE;
|
|
|
|
else if (new.flags != old.flags)
|
|
|
|
change = KVM_MR_FLAGS_ONLY;
|
|
|
|
else { /* Nothing to change. */
|
|
|
|
r = 0;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
2015-05-18 19:59:39 +08:00
|
|
|
} else {
|
|
|
|
if (!old.npages)
|
|
|
|
goto out;
|
|
|
|
|
2013-01-29 10:00:07 +08:00
|
|
|
change = KVM_MR_DELETE;
|
2015-05-18 19:59:39 +08:00
|
|
|
new.base_gfn = 0;
|
|
|
|
new.flags = 0;
|
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2013-01-29 10:00:07 +08:00
|
|
|
if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
|
2013-01-11 17:26:55 +08:00
|
|
|
/* Check for overlaps */
|
|
|
|
r = -EEXIST;
|
2015-05-17 23:30:37 +08:00
|
|
|
kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
|
KVM: mmu: Fix overlap between public and private memslots
Reported by syzkaller:
pte_list_remove: ffff9714eb1f8078 0->BUG
------------[ cut here ]------------
kernel BUG at arch/x86/kvm/mmu.c:1157!
invalid opcode: 0000 [#1] SMP
RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
Call Trace:
drop_spte+0x83/0xb0 [kvm]
mmu_page_zap_pte+0xcc/0xe0 [kvm]
kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
__mmu_notifier_release+0x79/0x110
? __mmu_notifier_release+0x5/0x110
exit_mmap+0x15a/0x170
? do_exit+0x281/0xcb0
mmput+0x66/0x160
do_exit+0x2c9/0xcb0
? __context_tracking_exit.part.5+0x4a/0x150
do_group_exit+0x50/0xd0
SyS_exit_group+0x14/0x20
do_syscall_64+0x73/0x1f0
entry_SYSCALL64_slow_path+0x25/0x25
The reason is that when creates new memslot, there is no guarantee for new
memslot not overlap with private memslots. This can be triggered by the
following program:
#include <fcntl.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/kvm.h>
long r[16];
int main()
{
void *p = valloc(0x4000);
r[2] = open("/dev/kvm", 0);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
uint64_t addr = 0xf000;
ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
ioctl(r[6], KVM_RUN, 0);
ioctl(r[6], KVM_RUN, 0);
struct kvm_userspace_memory_region mr = {
.slot = 0,
.flags = KVM_MEM_LOG_DIRTY_PAGES,
.guest_phys_addr = 0xf000,
.memory_size = 0x4000,
.userspace_addr = (uintptr_t) p
};
ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
return 0;
}
This patch fixes the bug by not adding a new memslot even if it
overlaps with private memslots.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
virt/kvm/kvm_main.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
2018-02-13 22:36:00 +08:00
|
|
|
if (slot->id == id)
|
2013-01-11 17:26:55 +08:00
|
|
|
continue;
|
|
|
|
if (!((base_gfn + npages <= slot->base_gfn) ||
|
|
|
|
(base_gfn >= slot->base_gfn + slot->npages)))
|
|
|
|
goto out;
|
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Free page dirty bitmap if unneeded */
|
|
|
|
if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
|
2007-02-10 00:38:40 +08:00
|
|
|
new.dirty_bitmap = NULL;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
|
|
|
r = -ENOMEM;
|
2013-01-29 10:00:07 +08:00
|
|
|
if (change == KVM_MR_CREATE) {
|
2012-02-08 12:01:09 +08:00
|
|
|
new.userspace_addr = mem->userspace_addr;
|
2012-08-01 17:03:28 +08:00
|
|
|
|
2013-10-08 00:48:00 +08:00
|
|
|
if (kvm_arch_create_memslot(kvm, &new, npages))
|
2012-02-08 12:02:18 +08:00
|
|
|
goto out_free;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
2009-06-19 21:16:23 +08:00
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
/* Allocate page dirty bitmap if needed */
|
|
|
|
if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
|
2010-10-27 17:22:19 +08:00
|
|
|
if (kvm_create_dirty_bitmap(&new) < 0)
|
2007-10-29 09:40:42 +08:00
|
|
|
goto out_free;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2019-02-12 03:02:49 +08:00
|
|
|
slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
|
2014-11-14 17:46:45 +08:00
|
|
|
if (!slots)
|
|
|
|
goto out_free;
|
2015-05-17 23:30:37 +08:00
|
|
|
memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
|
2014-11-14 17:46:45 +08:00
|
|
|
|
2013-01-29 10:00:07 +08:00
|
|
|
if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
|
2015-05-17 23:30:37 +08:00
|
|
|
slot = id_to_memslot(slots, id);
|
2011-11-24 19:04:35 +08:00
|
|
|
slot->flags |= KVM_MEMSLOT_INVALID;
|
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
old_memslots = install_new_memslots(kvm, as_id, slots);
|
2009-12-24 00:35:21 +08:00
|
|
|
|
2012-08-25 02:54:58 +08:00
|
|
|
/* From this point no new shadow pages pointing to a deleted,
|
|
|
|
* or moved, memslot will be created.
|
2009-12-24 00:35:21 +08:00
|
|
|
*
|
|
|
|
* validation of sp->gfn happens in:
|
2015-02-26 14:58:24 +08:00
|
|
|
* - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
|
|
|
|
* - kvm_is_visible_gfn (mmu_check_roots)
|
2009-12-24 00:35:21 +08:00
|
|
|
*/
|
2012-08-25 02:54:57 +08:00
|
|
|
kvm_arch_flush_shadow_memslot(kvm, slot);
|
2014-11-14 17:46:45 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We can re-use the old_memslots from above, the only difference
|
|
|
|
* from the currently installed memslots is the invalid flag. This
|
|
|
|
* will get overwritten by update_memslots anyway.
|
|
|
|
*/
|
2012-12-11 01:33:03 +08:00
|
|
|
slots = old_memslots;
|
2009-12-24 00:35:21 +08:00
|
|
|
}
|
2008-07-11 07:49:31 +08:00
|
|
|
|
2013-02-27 18:44:34 +08:00
|
|
|
r = kvm_arch_prepare_memory_region(kvm, &new, mem, change);
|
2009-12-24 00:35:18 +08:00
|
|
|
if (r)
|
2012-12-11 01:33:03 +08:00
|
|
|
goto out_slots;
|
2009-12-24 00:35:18 +08:00
|
|
|
|
2015-05-17 17:41:37 +08:00
|
|
|
/* actual memory is freed via old in kvm_free_memslot below */
|
2013-01-29 10:00:07 +08:00
|
|
|
if (change == KVM_MR_DELETE) {
|
2009-12-24 00:35:21 +08:00
|
|
|
new.dirty_bitmap = NULL;
|
2012-02-08 12:02:18 +08:00
|
|
|
memset(&new.arch, 0, sizeof(new.arch));
|
2009-12-24 00:35:21 +08:00
|
|
|
}
|
|
|
|
|
2018-08-22 21:57:11 +08:00
|
|
|
update_memslots(slots, &new, change);
|
2015-05-17 23:30:37 +08:00
|
|
|
old_memslots = install_new_memslots(kvm, as_id, slots);
|
2007-11-20 13:11:38 +08:00
|
|
|
|
2015-05-18 19:20:23 +08:00
|
|
|
kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);
|
2007-10-03 00:52:55 +08:00
|
|
|
|
2015-05-17 17:41:37 +08:00
|
|
|
kvm_free_memslot(kvm, &old, &new);
|
2015-03-20 20:21:37 +08:00
|
|
|
kvfree(old_memslots);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
return 0;
|
|
|
|
|
2012-12-11 01:32:57 +08:00
|
|
|
out_slots:
|
2015-03-20 20:21:37 +08:00
|
|
|
kvfree(slots);
|
2007-10-29 09:40:42 +08:00
|
|
|
out_free:
|
2015-05-17 17:41:37 +08:00
|
|
|
kvm_free_memslot(kvm, &new, &old);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
out:
|
|
|
|
return r;
|
2007-10-25 05:52:57 +08:00
|
|
|
}
|
2007-10-29 09:40:42 +08:00
|
|
|
EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
|
|
|
|
|
|
|
|
int kvm_set_memory_region(struct kvm *kvm,
|
2015-05-18 19:59:39 +08:00
|
|
|
const struct kvm_userspace_memory_region *mem)
|
2007-10-29 09:40:42 +08:00
|
|
|
{
|
|
|
|
int r;
|
|
|
|
|
2009-12-24 00:35:26 +08:00
|
|
|
mutex_lock(&kvm->slots_lock);
|
2013-02-27 18:43:00 +08:00
|
|
|
r = __kvm_set_memory_region(kvm, mem);
|
2009-12-24 00:35:26 +08:00
|
|
|
mutex_unlock(&kvm->slots_lock);
|
2007-10-29 09:40:42 +08:00
|
|
|
return r;
|
|
|
|
}
|
2007-10-25 05:52:57 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_set_memory_region);
|
|
|
|
|
2013-12-30 04:12:29 +08:00
|
|
|
static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
|
|
|
|
struct kvm_userspace_memory_region *mem)
|
2007-10-25 05:52:57 +08:00
|
|
|
{
|
2015-05-17 23:30:37 +08:00
|
|
|
if ((u16)mem->slot >= KVM_USER_MEM_SLOTS)
|
2007-10-25 05:57:46 +08:00
|
|
|
return -EINVAL;
|
2015-05-18 19:59:39 +08:00
|
|
|
|
2013-02-27 18:43:00 +08:00
|
|
|
return kvm_set_memory_region(kvm, mem);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2007-11-18 20:29:43 +08:00
|
|
|
int kvm_get_dirty_log(struct kvm *kvm,
|
|
|
|
struct kvm_dirty_log *log, int *is_dirty)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2015-05-17 22:20:07 +08:00
|
|
|
struct kvm_memslots *slots;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
struct kvm_memory_slot *memslot;
|
2017-01-23 00:41:07 +08:00
|
|
|
int i, as_id, id;
|
2010-04-12 18:35:35 +08:00
|
|
|
unsigned long n;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
unsigned long any = 0;
|
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
as_id = log->slot >> 16;
|
|
|
|
id = (u16)log->slot;
|
|
|
|
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
|
2017-01-23 00:41:07 +08:00
|
|
|
return -EINVAL;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
slots = __kvm_memslots(kvm, as_id);
|
|
|
|
memslot = id_to_memslot(slots, id);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
if (!memslot->dirty_bitmap)
|
2017-01-23 00:41:07 +08:00
|
|
|
return -ENOENT;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2010-04-12 18:35:35 +08:00
|
|
|
n = kvm_dirty_bitmap_bytes(memslot);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-02-22 22:43:09 +08:00
|
|
|
for (i = 0; !any && i < n/sizeof(long); ++i)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
any = memslot->dirty_bitmap[i];
|
|
|
|
|
|
|
|
if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
|
2017-01-23 00:41:07 +08:00
|
|
|
return -EFAULT;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-11-18 20:29:43 +08:00
|
|
|
if (any)
|
|
|
|
*is_dirty = 1;
|
2017-01-23 00:41:07 +08:00
|
|
|
return 0;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
2013-10-08 00:47:59 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2015-01-16 07:58:53 +08:00
|
|
|
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
|
|
|
|
/**
|
|
|
|
* kvm_get_dirty_log_protect - get a snapshot of dirty pages, and if any pages
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
* and reenable dirty page tracking for the corresponding pages.
|
2015-01-16 07:58:53 +08:00
|
|
|
* @kvm: pointer to kvm instance
|
|
|
|
* @log: slot id and address to which we copy the log
|
|
|
|
* @is_dirty: flag set if any page is dirty
|
|
|
|
*
|
|
|
|
* We need to keep it in mind that VCPU threads can write to the bitmap
|
|
|
|
* concurrently. So, to avoid losing track of dirty pages we keep the
|
|
|
|
* following order:
|
|
|
|
*
|
|
|
|
* 1. Take a snapshot of the bit and clear it if needed.
|
|
|
|
* 2. Write protect the corresponding page.
|
|
|
|
* 3. Copy the snapshot to the userspace.
|
|
|
|
* 4. Upon return caller flushes TLB's if needed.
|
|
|
|
*
|
|
|
|
* Between 2 and 4, the guest may write to the page using the remaining TLB
|
|
|
|
* entry. This is not a problem because the page is reported dirty using
|
|
|
|
* the snapshot taken before and step 4 ensures that writes done after
|
|
|
|
* exiting to userspace will be logged for the next call.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
int kvm_get_dirty_log_protect(struct kvm *kvm,
|
2018-10-23 08:18:42 +08:00
|
|
|
struct kvm_dirty_log *log, bool *flush)
|
2015-01-16 07:58:53 +08:00
|
|
|
{
|
2015-05-17 22:20:07 +08:00
|
|
|
struct kvm_memslots *slots;
|
2015-01-16 07:58:53 +08:00
|
|
|
struct kvm_memory_slot *memslot;
|
2017-01-23 00:30:16 +08:00
|
|
|
int i, as_id, id;
|
2015-01-16 07:58:53 +08:00
|
|
|
unsigned long n;
|
|
|
|
unsigned long *dirty_bitmap;
|
|
|
|
unsigned long *dirty_bitmap_buffer;
|
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
as_id = log->slot >> 16;
|
|
|
|
id = (u16)log->slot;
|
|
|
|
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
|
2017-01-23 00:30:16 +08:00
|
|
|
return -EINVAL;
|
2015-01-16 07:58:53 +08:00
|
|
|
|
2015-05-17 23:30:37 +08:00
|
|
|
slots = __kvm_memslots(kvm, as_id);
|
|
|
|
memslot = id_to_memslot(slots, id);
|
2015-01-16 07:58:53 +08:00
|
|
|
|
|
|
|
dirty_bitmap = memslot->dirty_bitmap;
|
|
|
|
if (!dirty_bitmap)
|
2017-01-23 00:30:16 +08:00
|
|
|
return -ENOENT;
|
2015-01-16 07:58:53 +08:00
|
|
|
|
|
|
|
n = kvm_dirty_bitmap_bytes(memslot);
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
*flush = false;
|
|
|
|
if (kvm->manual_dirty_log_protect) {
|
|
|
|
/*
|
|
|
|
* Unlike kvm_get_dirty_log, we always return false in *flush,
|
|
|
|
* because no flush is needed until KVM_CLEAR_DIRTY_LOG. There
|
|
|
|
* is some code duplication between this function and
|
|
|
|
* kvm_get_dirty_log, but hopefully all architecture
|
|
|
|
* transition to kvm_get_dirty_log_protect and kvm_get_dirty_log
|
|
|
|
* can be eliminated.
|
|
|
|
*/
|
|
|
|
dirty_bitmap_buffer = dirty_bitmap;
|
|
|
|
} else {
|
|
|
|
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
|
|
|
|
memset(dirty_bitmap_buffer, 0, n);
|
2015-01-16 07:58:53 +08:00
|
|
|
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
for (i = 0; i < n / sizeof(long); i++) {
|
|
|
|
unsigned long mask;
|
|
|
|
gfn_t offset;
|
2015-01-16 07:58:53 +08:00
|
|
|
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
if (!dirty_bitmap[i])
|
|
|
|
continue;
|
|
|
|
|
|
|
|
*flush = true;
|
|
|
|
mask = xchg(&dirty_bitmap[i], 0);
|
|
|
|
dirty_bitmap_buffer[i] = mask;
|
|
|
|
|
2019-02-02 17:20:27 +08:00
|
|
|
offset = i * BITS_PER_LONG;
|
|
|
|
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
|
|
|
|
offset, mask);
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
}
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
|
|
|
|
return -EFAULT;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_get_dirty_log_protect);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* kvm_clear_dirty_log_protect - clear dirty bits in the bitmap
|
|
|
|
* and reenable dirty page tracking for the corresponding pages.
|
|
|
|
* @kvm: pointer to kvm instance
|
|
|
|
* @log: slot id and address from which to fetch the bitmap of dirty pages
|
|
|
|
*/
|
|
|
|
int kvm_clear_dirty_log_protect(struct kvm *kvm,
|
|
|
|
struct kvm_clear_dirty_log *log, bool *flush)
|
|
|
|
{
|
|
|
|
struct kvm_memslots *slots;
|
|
|
|
struct kvm_memory_slot *memslot;
|
2019-01-03 01:29:37 +08:00
|
|
|
int as_id, id;
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
gfn_t offset;
|
2019-01-03 01:29:37 +08:00
|
|
|
unsigned long i, n;
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
unsigned long *dirty_bitmap;
|
|
|
|
unsigned long *dirty_bitmap_buffer;
|
|
|
|
|
|
|
|
as_id = log->slot >> 16;
|
|
|
|
id = (u16)log->slot;
|
|
|
|
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if ((log->first_page & 63) || (log->num_pages & 63))
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
slots = __kvm_memslots(kvm, as_id);
|
|
|
|
memslot = id_to_memslot(slots, id);
|
|
|
|
|
|
|
|
dirty_bitmap = memslot->dirty_bitmap;
|
|
|
|
if (!dirty_bitmap)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
n = kvm_dirty_bitmap_bytes(memslot);
|
2019-01-03 01:29:37 +08:00
|
|
|
|
|
|
|
if (log->first_page > memslot->npages ||
|
|
|
|
log->num_pages > memslot->npages - log->first_page)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2018-10-23 08:18:42 +08:00
|
|
|
*flush = false;
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
|
|
|
|
if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
|
|
|
|
return -EFAULT;
|
2015-01-16 07:58:53 +08:00
|
|
|
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
spin_lock(&kvm->mmu_lock);
|
|
|
|
for (offset = log->first_page,
|
|
|
|
i = offset / BITS_PER_LONG, n = log->num_pages / BITS_PER_LONG; n--;
|
|
|
|
i++, offset += BITS_PER_LONG) {
|
|
|
|
unsigned long mask = *dirty_bitmap_buffer++;
|
|
|
|
atomic_long_t *p = (atomic_long_t *) &dirty_bitmap[i];
|
|
|
|
if (!mask)
|
2015-01-16 07:58:53 +08:00
|
|
|
continue;
|
|
|
|
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
mask &= atomic_long_fetch_andnot(mask, p);
|
2015-01-16 07:58:53 +08:00
|
|
|
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
/*
|
|
|
|
* mask contains the bits that really have been cleared. This
|
|
|
|
* never includes any bits beyond the length of the memslot (if
|
|
|
|
* the length is not aligned to 64 pages), therefore it is not
|
|
|
|
* a problem if userspace sets them in log->dirty_bitmap.
|
|
|
|
*/
|
2015-03-17 15:19:58 +08:00
|
|
|
if (mask) {
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
*flush = true;
|
2015-03-17 15:19:58 +08:00
|
|
|
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
|
|
|
|
offset, mask);
|
|
|
|
}
|
2015-01-16 07:58:53 +08:00
|
|
|
}
|
|
|
|
spin_unlock(&kvm->mmu_lock);
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
|
2017-01-23 00:30:16 +08:00
|
|
|
return 0;
|
2015-01-16 07:58:53 +08:00
|
|
|
}
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_clear_dirty_log_protect);
|
2015-01-16 07:58:53 +08:00
|
|
|
#endif
|
|
|
|
|
2012-02-08 12:02:18 +08:00
|
|
|
bool kvm_largepages_enabled(void)
|
|
|
|
{
|
|
|
|
return largepages_enabled;
|
|
|
|
}
|
|
|
|
|
2009-06-11 23:07:44 +08:00
|
|
|
void kvm_disable_largepages(void)
|
|
|
|
{
|
|
|
|
largepages_enabled = false;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_disable_largepages);
|
|
|
|
|
2010-10-18 21:22:23 +08:00
|
|
|
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
|
|
|
return __gfn_to_memslot(kvm_memslots(kvm), gfn);
|
|
|
|
}
|
2010-06-21 16:44:20 +08:00
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_memslot);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
|
|
|
return __gfn_to_memslot(kvm_vcpu_memslots(vcpu), gfn);
|
|
|
|
}
|
|
|
|
|
2015-11-14 11:21:06 +08:00
|
|
|
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
|
2007-10-25 05:57:46 +08:00
|
|
|
{
|
2011-11-24 17:40:57 +08:00
|
|
|
struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn);
|
2007-10-25 05:57:46 +08:00
|
|
|
|
2012-12-11 01:33:09 +08:00
|
|
|
if (!memslot || memslot->id >= KVM_USER_MEM_SLOTS ||
|
2011-11-24 17:40:57 +08:00
|
|
|
memslot->flags & KVM_MEMSLOT_INVALID)
|
2015-11-14 11:21:06 +08:00
|
|
|
return false;
|
2007-10-25 05:57:46 +08:00
|
|
|
|
2015-11-14 11:21:06 +08:00
|
|
|
return true;
|
2007-10-25 05:57:46 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
|
|
|
|
|
2010-01-28 19:37:56 +08:00
|
|
|
unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
|
|
|
struct vm_area_struct *vma;
|
|
|
|
unsigned long addr, size;
|
|
|
|
|
|
|
|
size = PAGE_SIZE;
|
|
|
|
|
|
|
|
addr = gfn_to_hva(kvm, gfn);
|
|
|
|
if (kvm_is_error_hva(addr))
|
|
|
|
return PAGE_SIZE;
|
|
|
|
|
|
|
|
down_read(¤t->mm->mmap_sem);
|
|
|
|
vma = find_vma(current->mm, addr);
|
|
|
|
if (!vma)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
size = vma_kernel_pagesize(vma);
|
|
|
|
|
|
|
|
out:
|
|
|
|
up_read(¤t->mm->mmap_sem);
|
|
|
|
|
|
|
|
return size;
|
|
|
|
}
|
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
static bool memslot_is_readonly(struct kvm_memory_slot *slot)
|
|
|
|
{
|
|
|
|
return slot->flags & KVM_MEM_READONLY;
|
|
|
|
}
|
|
|
|
|
|
|
|
static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
|
|
|
|
gfn_t *nr_pages, bool write)
|
2007-11-12 04:05:04 +08:00
|
|
|
{
|
2009-12-24 00:35:21 +08:00
|
|
|
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
|
2012-08-21 11:01:50 +08:00
|
|
|
return KVM_HVA_ERR_BAD;
|
2010-08-22 19:11:43 +08:00
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
if (memslot_is_readonly(slot) && write)
|
|
|
|
return KVM_HVA_ERR_RO_BAD;
|
2010-08-22 19:11:43 +08:00
|
|
|
|
|
|
|
if (nr_pages)
|
|
|
|
*nr_pages = slot->npages - (gfn - slot->base_gfn);
|
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
return __gfn_to_hva_memslot(slot, gfn);
|
2007-11-12 04:05:04 +08:00
|
|
|
}
|
2010-08-22 19:11:43 +08:00
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
|
|
|
|
gfn_t *nr_pages)
|
|
|
|
{
|
|
|
|
return __gfn_to_hva_many(slot, gfn, nr_pages, true);
|
2007-11-12 04:05:04 +08:00
|
|
|
}
|
2010-08-22 19:11:43 +08:00
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
|
2013-12-30 04:12:29 +08:00
|
|
|
gfn_t gfn)
|
2012-08-21 11:02:51 +08:00
|
|
|
{
|
|
|
|
return gfn_to_hva_many(slot, gfn, NULL);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_hva_memslot);
|
|
|
|
|
2010-08-22 19:11:43 +08:00
|
|
|
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
2010-10-18 21:22:23 +08:00
|
|
|
return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
|
2010-08-22 19:11:43 +08:00
|
|
|
}
|
2008-04-25 21:44:50 +08:00
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_hva);
|
2007-11-12 04:05:04 +08:00
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
|
|
|
return gfn_to_hva_many(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, NULL);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_hva);
|
|
|
|
|
2012-08-21 10:59:53 +08:00
|
|
|
/*
|
2018-10-09 10:41:15 +08:00
|
|
|
* Return the hva of a @gfn and the R/W attribute if possible.
|
|
|
|
*
|
|
|
|
* @slot: the kvm_memory_slot which contains @gfn
|
|
|
|
* @gfn: the gfn to be translated
|
|
|
|
* @writable: used to return the read/write attribute of the @slot if the hva
|
|
|
|
* is valid and @writable is not NULL
|
2012-08-21 10:59:53 +08:00
|
|
|
*/
|
2014-08-19 18:15:00 +08:00
|
|
|
unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot,
|
|
|
|
gfn_t gfn, bool *writable)
|
2012-08-21 10:59:53 +08:00
|
|
|
{
|
2013-10-02 00:58:36 +08:00
|
|
|
unsigned long hva = __gfn_to_hva_many(slot, gfn, NULL, false);
|
|
|
|
|
|
|
|
if (!kvm_is_error_hva(hva) && writable)
|
2013-09-09 19:52:33 +08:00
|
|
|
*writable = !memslot_is_readonly(slot);
|
|
|
|
|
2013-10-02 00:58:36 +08:00
|
|
|
return hva;
|
2012-08-21 10:59:53 +08:00
|
|
|
}
|
|
|
|
|
2014-08-19 18:15:00 +08:00
|
|
|
unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
|
|
|
|
|
|
|
|
return gfn_to_hva_memslot_prot(slot, gfn, writable);
|
|
|
|
}
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
|
|
|
|
|
|
|
|
return gfn_to_hva_memslot_prot(slot, gfn, writable);
|
|
|
|
}
|
|
|
|
|
2011-01-30 11:15:49 +08:00
|
|
|
static inline int check_user_page_hwpoison(unsigned long addr)
|
|
|
|
{
|
mm: unexport __get_user_pages()
This patch unexports the low-level __get_user_pages() function.
Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.
We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:
get_user_page_nowait():
get_user_pages(start, 1, flags, page, NULL)
__get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, start, 1,
flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)
check_user_page_hwpoison():
get_user_pages(addr, 1, flags, NULL, NULL)
__get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
NULL, NULL)
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-24 17:57:25 +08:00
|
|
|
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
|
2011-01-30 11:15:49 +08:00
|
|
|
|
mm: unexport __get_user_pages()
This patch unexports the low-level __get_user_pages() function.
Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.
We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:
get_user_page_nowait():
get_user_pages(start, 1, flags, page, NULL)
__get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, start, 1,
flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)
check_user_page_hwpoison():
get_user_pages(addr, 1, flags, NULL, NULL)
__get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
NULL, NULL)
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-24 17:57:25 +08:00
|
|
|
rc = get_user_pages(addr, 1, flags, NULL, NULL);
|
2011-01-30 11:15:49 +08:00
|
|
|
return rc == -EHWPOISON;
|
|
|
|
}
|
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
/*
|
2018-07-27 23:44:41 +08:00
|
|
|
* The fast path to get the writable pfn which will be stored in @pfn,
|
|
|
|
* true indicates success, otherwise false is returned. It's also the
|
|
|
|
* only part that runs if we can are in atomic context.
|
2012-08-21 11:00:22 +08:00
|
|
|
*/
|
2018-07-27 23:44:41 +08:00
|
|
|
static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
|
|
|
|
bool *writable, kvm_pfn_t *pfn)
|
2007-03-30 19:02:32 +08:00
|
|
|
{
|
2007-10-18 22:59:34 +08:00
|
|
|
struct page *page[1];
|
2012-08-21 11:00:22 +08:00
|
|
|
int npages;
|
2007-03-30 19:02:32 +08:00
|
|
|
|
2012-08-21 11:00:49 +08:00
|
|
|
/*
|
|
|
|
* Fast pin a writable pfn only if it is a write fault request
|
|
|
|
* or the caller allows to map a writable pfn for a read fault
|
|
|
|
* request.
|
|
|
|
*/
|
|
|
|
if (!(write_fault || writable))
|
|
|
|
return false;
|
2010-10-23 00:18:18 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
npages = __get_user_pages_fast(addr, 1, 1, page);
|
|
|
|
if (npages == 1) {
|
|
|
|
*pfn = page_to_pfn(page[0]);
|
2010-10-23 00:18:18 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
if (writable)
|
|
|
|
*writable = true;
|
|
|
|
return true;
|
|
|
|
}
|
2010-10-14 17:22:46 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
return false;
|
|
|
|
}
|
2010-10-23 00:18:18 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
/*
|
|
|
|
* The slow path to get the pfn of the specified host virtual address,
|
|
|
|
* 1 indicates success, -errno is returned if error is detected.
|
|
|
|
*/
|
|
|
|
static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
bool *writable, kvm_pfn_t *pfn)
|
2012-08-21 11:00:22 +08:00
|
|
|
{
|
2017-11-20 06:47:33 +08:00
|
|
|
unsigned int flags = FOLL_HWPOISON;
|
|
|
|
struct page *page;
|
2012-08-21 11:00:22 +08:00
|
|
|
int npages = 0;
|
2010-10-23 00:18:18 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
might_sleep();
|
|
|
|
|
|
|
|
if (writable)
|
|
|
|
*writable = write_fault;
|
|
|
|
|
2017-11-20 06:47:33 +08:00
|
|
|
if (write_fault)
|
|
|
|
flags |= FOLL_WRITE;
|
|
|
|
if (async)
|
|
|
|
flags |= FOLL_NOWAIT;
|
2016-10-13 08:20:12 +08:00
|
|
|
|
2017-11-20 06:47:33 +08:00
|
|
|
npages = get_user_pages_unlocked(addr, 1, &page, flags);
|
2012-08-21 11:00:22 +08:00
|
|
|
if (npages != 1)
|
|
|
|
return npages;
|
|
|
|
|
|
|
|
/* map read fault as writable if possible */
|
2012-08-21 11:00:49 +08:00
|
|
|
if (unlikely(!write_fault) && writable) {
|
2017-11-20 06:47:33 +08:00
|
|
|
struct page *wpage;
|
2012-08-21 11:00:22 +08:00
|
|
|
|
2017-11-20 06:47:33 +08:00
|
|
|
if (__get_user_pages_fast(addr, 1, 1, &wpage) == 1) {
|
2012-08-21 11:00:22 +08:00
|
|
|
*writable = true;
|
2017-11-20 06:47:33 +08:00
|
|
|
put_page(page);
|
|
|
|
page = wpage;
|
2010-10-23 00:18:18 +08:00
|
|
|
}
|
2010-08-22 19:10:28 +08:00
|
|
|
}
|
2017-11-20 06:47:33 +08:00
|
|
|
*pfn = page_to_pfn(page);
|
2012-08-21 11:00:22 +08:00
|
|
|
return npages;
|
|
|
|
}
|
2007-11-12 04:05:04 +08:00
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
|
|
|
|
{
|
|
|
|
if (unlikely(!(vma->vm_flags & VM_READ)))
|
|
|
|
return false;
|
2008-05-01 04:37:07 +08:00
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
if (write_fault && (unlikely(!(vma->vm_flags & VM_WRITE))))
|
|
|
|
return false;
|
2010-08-22 19:10:28 +08:00
|
|
|
|
2012-08-21 11:02:51 +08:00
|
|
|
return true;
|
|
|
|
}
|
2010-05-31 14:28:19 +08:00
|
|
|
|
2016-06-07 22:22:47 +08:00
|
|
|
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
|
|
|
|
unsigned long addr, bool *async,
|
2018-01-18 02:18:56 +08:00
|
|
|
bool write_fault, bool *writable,
|
|
|
|
kvm_pfn_t *p_pfn)
|
2016-06-07 22:22:47 +08:00
|
|
|
{
|
2016-06-07 23:51:18 +08:00
|
|
|
unsigned long pfn;
|
|
|
|
int r;
|
|
|
|
|
|
|
|
r = follow_pfn(vma, addr, &pfn);
|
|
|
|
if (r) {
|
|
|
|
/*
|
|
|
|
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
|
|
|
|
* not call the fault handler, so do it here.
|
|
|
|
*/
|
|
|
|
bool unlocked = false;
|
|
|
|
r = fixup_user_fault(current, current->mm, addr,
|
|
|
|
(write_fault ? FAULT_FLAG_WRITE : 0),
|
|
|
|
&unlocked);
|
|
|
|
if (unlocked)
|
|
|
|
return -EAGAIN;
|
|
|
|
if (r)
|
|
|
|
return r;
|
|
|
|
|
|
|
|
r = follow_pfn(vma, addr, &pfn);
|
|
|
|
if (r)
|
|
|
|
return r;
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2018-01-18 02:18:56 +08:00
|
|
|
if (writable)
|
|
|
|
*writable = true;
|
2016-06-07 23:51:18 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get a reference here because callers of *hva_to_pfn* and
|
|
|
|
* *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
|
|
|
|
* returned pfn. This is only needed if the VMA has VM_MIXEDMAP
|
|
|
|
* set, but the kvm_get_pfn/kvm_release_pfn_clean pair will
|
|
|
|
* simply do nothing for reserved pfns.
|
|
|
|
*
|
|
|
|
* Whoever called remap_pfn_range is also going to call e.g.
|
|
|
|
* unmap_mapping_range before the underlying pages are freed,
|
|
|
|
* causing a call to our MMU notifier.
|
|
|
|
*/
|
|
|
|
kvm_get_pfn(pfn);
|
|
|
|
|
|
|
|
*p_pfn = pfn;
|
2016-06-07 22:22:47 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-08-21 11:00:49 +08:00
|
|
|
/*
|
|
|
|
* Pin guest page in memory and return its pfn.
|
|
|
|
* @addr: host virtual address which maps memory to the guest
|
|
|
|
* @atomic: whether this function can sleep
|
|
|
|
* @async: whether this function need to wait IO complete if the
|
|
|
|
* host page is not in the memory
|
|
|
|
* @write_fault: whether we should get a writable host page
|
|
|
|
* @writable: whether it allows to map a writable host page for !@write_fault
|
|
|
|
*
|
|
|
|
* The function will map a writable host page for these two cases:
|
|
|
|
* 1): @write_fault = true
|
|
|
|
* 2): @write_fault = false && @writable, @writable will tell the caller
|
|
|
|
* whether the mapping is writable.
|
|
|
|
*/
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
|
2012-08-21 11:00:22 +08:00
|
|
|
bool write_fault, bool *writable)
|
|
|
|
{
|
|
|
|
struct vm_area_struct *vma;
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t pfn = 0;
|
2016-06-07 22:22:47 +08:00
|
|
|
int npages, r;
|
2008-05-01 04:37:07 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
/* we can do it either atomically or asynchronously, not both */
|
|
|
|
BUG_ON(atomic && async);
|
2007-10-18 22:59:34 +08:00
|
|
|
|
2018-07-27 23:44:41 +08:00
|
|
|
if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
|
2012-08-21 11:00:22 +08:00
|
|
|
return pfn;
|
|
|
|
|
|
|
|
if (atomic)
|
|
|
|
return KVM_PFN_ERR_FAULT;
|
|
|
|
|
|
|
|
npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
|
|
|
|
if (npages == 1)
|
|
|
|
return pfn;
|
2007-10-18 22:59:34 +08:00
|
|
|
|
2012-08-21 11:00:22 +08:00
|
|
|
down_read(¤t->mm->mmap_sem);
|
|
|
|
if (npages == -EHWPOISON ||
|
|
|
|
(!async && check_user_page_hwpoison(addr))) {
|
|
|
|
pfn = KVM_PFN_ERR_HWPOISON;
|
|
|
|
goto exit;
|
|
|
|
}
|
|
|
|
|
2016-06-07 23:51:18 +08:00
|
|
|
retry:
|
2012-08-21 11:00:22 +08:00
|
|
|
vma = find_vma_intersection(current->mm, addr, addr + 1);
|
|
|
|
|
|
|
|
if (vma == NULL)
|
|
|
|
pfn = KVM_PFN_ERR_FAULT;
|
2016-06-07 22:22:47 +08:00
|
|
|
else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
|
2018-01-18 02:18:56 +08:00
|
|
|
r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
|
2016-06-07 23:51:18 +08:00
|
|
|
if (r == -EAGAIN)
|
|
|
|
goto retry;
|
2016-06-07 22:22:47 +08:00
|
|
|
if (r < 0)
|
|
|
|
pfn = KVM_PFN_ERR_FAULT;
|
2012-08-21 11:00:22 +08:00
|
|
|
} else {
|
2012-08-21 11:02:51 +08:00
|
|
|
if (async && vma_is_valid(vma, write_fault))
|
2012-08-21 11:00:22 +08:00
|
|
|
*async = true;
|
|
|
|
pfn = KVM_PFN_ERR_FAULT;
|
|
|
|
}
|
|
|
|
exit:
|
|
|
|
up_read(¤t->mm->mmap_sem);
|
2008-05-01 04:37:07 +08:00
|
|
|
return pfn;
|
2008-04-03 03:46:56 +08:00
|
|
|
}
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
|
|
|
|
bool atomic, bool *async, bool write_fault,
|
|
|
|
bool *writable)
|
2010-08-22 19:10:28 +08:00
|
|
|
{
|
2012-08-21 11:02:51 +08:00
|
|
|
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
|
|
|
|
|
2016-02-23 22:36:01 +08:00
|
|
|
if (addr == KVM_HVA_ERR_RO_BAD) {
|
|
|
|
if (writable)
|
|
|
|
*writable = false;
|
2012-08-21 11:02:51 +08:00
|
|
|
return KVM_PFN_ERR_RO_FAULT;
|
2016-02-23 22:36:01 +08:00
|
|
|
}
|
2012-08-21 11:02:51 +08:00
|
|
|
|
2016-02-23 22:36:01 +08:00
|
|
|
if (kvm_is_error_hva(addr)) {
|
|
|
|
if (writable)
|
|
|
|
*writable = false;
|
2012-10-16 20:10:59 +08:00
|
|
|
return KVM_PFN_NOSLOT;
|
2016-02-23 22:36:01 +08:00
|
|
|
}
|
2012-08-21 11:02:51 +08:00
|
|
|
|
|
|
|
/* Do not map writable pfn in the readonly memslot. */
|
|
|
|
if (writable && memslot_is_readonly(slot)) {
|
|
|
|
*writable = false;
|
|
|
|
writable = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
return hva_to_pfn(addr, atomic, async, write_fault,
|
|
|
|
writable);
|
2010-08-22 19:10:28 +08:00
|
|
|
}
|
2015-04-02 17:20:48 +08:00
|
|
|
EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
|
2010-08-22 19:10:28 +08:00
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
|
2010-10-23 00:18:18 +08:00
|
|
|
bool *writable)
|
|
|
|
{
|
2015-05-19 22:09:04 +08:00
|
|
|
return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
|
|
|
|
write_fault, writable);
|
2010-10-23 00:18:18 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
|
2009-12-24 00:35:19 +08:00
|
|
|
{
|
2012-08-21 11:02:51 +08:00
|
|
|
return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL);
|
2009-12-24 00:35:19 +08:00
|
|
|
}
|
2015-05-19 22:09:04 +08:00
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
|
2009-12-24 00:35:19 +08:00
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
|
2009-12-24 00:35:19 +08:00
|
|
|
{
|
2012-08-21 11:02:51 +08:00
|
|
|
return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL);
|
2009-12-24 00:35:19 +08:00
|
|
|
}
|
2012-08-21 10:59:12 +08:00
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
|
2009-12-24 00:35:19 +08:00
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
|
2015-05-19 22:09:04 +08:00
|
|
|
{
|
|
|
|
return gfn_to_pfn_memslot_atomic(gfn_to_memslot(kvm, gfn), gfn);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
|
2015-05-17 19:58:53 +08:00
|
|
|
{
|
|
|
|
return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
|
2015-05-19 22:09:04 +08:00
|
|
|
{
|
|
|
|
return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_pfn);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
|
2015-05-17 19:58:53 +08:00
|
|
|
{
|
|
|
|
return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
|
|
|
|
|
2015-05-19 22:01:50 +08:00
|
|
|
int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
|
|
|
|
struct page **pages, int nr_pages)
|
2010-08-22 19:11:43 +08:00
|
|
|
{
|
|
|
|
unsigned long addr;
|
2017-08-10 20:14:39 +08:00
|
|
|
gfn_t entry = 0;
|
2010-08-22 19:11:43 +08:00
|
|
|
|
2015-05-19 22:01:50 +08:00
|
|
|
addr = gfn_to_hva_many(slot, gfn, &entry);
|
2010-08-22 19:11:43 +08:00
|
|
|
if (kvm_is_error_hva(addr))
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
if (entry < nr_pages)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return __get_user_pages_fast(addr, nr_pages, 1, pages);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
|
2012-07-26 11:58:59 +08:00
|
|
|
{
|
2012-10-16 20:10:59 +08:00
|
|
|
if (is_error_noslot_pfn(pfn))
|
2012-08-03 15:42:10 +08:00
|
|
|
return KVM_ERR_PTR_BAD_PAGE;
|
2012-07-26 11:58:59 +08:00
|
|
|
|
2014-11-10 16:33:56 +08:00
|
|
|
if (kvm_is_reserved_pfn(pfn)) {
|
2012-08-03 15:42:10 +08:00
|
|
|
WARN_ON(1);
|
2012-08-03 15:41:22 +08:00
|
|
|
return KVM_ERR_PTR_BAD_PAGE;
|
2012-08-03 15:42:10 +08:00
|
|
|
}
|
2012-07-26 11:58:59 +08:00
|
|
|
|
|
|
|
return pfn_to_page(pfn);
|
|
|
|
}
|
|
|
|
|
2008-04-03 03:46:56 +08:00
|
|
|
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t pfn;
|
2008-05-01 04:37:07 +08:00
|
|
|
|
|
|
|
pfn = gfn_to_pfn(kvm, gfn);
|
|
|
|
|
2012-07-26 11:58:59 +08:00
|
|
|
return kvm_pfn_to_page(pfn);
|
2007-03-30 19:02:32 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(gfn_to_page);
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
kvm_pfn_t pfn;
|
2015-05-17 19:58:53 +08:00
|
|
|
|
|
|
|
pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
|
|
|
|
|
|
|
|
return kvm_pfn_to_page(pfn);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
|
|
|
|
|
2007-11-20 17:49:33 +08:00
|
|
|
void kvm_release_page_clean(struct page *page)
|
|
|
|
{
|
2012-08-03 15:42:52 +08:00
|
|
|
WARN_ON(is_error_page(page));
|
|
|
|
|
2008-04-03 03:46:56 +08:00
|
|
|
kvm_release_pfn_clean(page_to_pfn(page));
|
2007-11-20 17:49:33 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_release_page_clean);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
void kvm_release_pfn_clean(kvm_pfn_t pfn)
|
2008-04-03 03:46:56 +08:00
|
|
|
{
|
2014-11-10 16:33:56 +08:00
|
|
|
if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
|
2008-05-01 04:37:07 +08:00
|
|
|
put_page(pfn_to_page(pfn));
|
2008-04-03 03:46:56 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
|
|
|
|
|
2007-11-20 17:49:33 +08:00
|
|
|
void kvm_release_page_dirty(struct page *page)
|
2007-10-18 17:09:33 +08:00
|
|
|
{
|
2012-07-26 11:58:59 +08:00
|
|
|
WARN_ON(is_error_page(page));
|
|
|
|
|
2008-04-03 03:46:56 +08:00
|
|
|
kvm_release_pfn_dirty(page_to_pfn(page));
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
|
|
|
|
|
2017-09-01 23:11:43 +08:00
|
|
|
void kvm_release_pfn_dirty(kvm_pfn_t pfn)
|
2008-04-03 03:46:56 +08:00
|
|
|
{
|
|
|
|
kvm_set_pfn_dirty(pfn);
|
|
|
|
kvm_release_pfn_clean(pfn);
|
|
|
|
}
|
2017-09-01 23:11:43 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
|
2008-04-03 03:46:56 +08:00
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
void kvm_set_pfn_dirty(kvm_pfn_t pfn)
|
2008-04-03 03:46:56 +08:00
|
|
|
{
|
2014-11-10 16:33:56 +08:00
|
|
|
if (!kvm_is_reserved_pfn(pfn)) {
|
2008-05-01 04:37:07 +08:00
|
|
|
struct page *page = pfn_to_page(pfn);
|
2015-02-26 14:58:23 +08:00
|
|
|
|
2008-05-01 04:37:07 +08:00
|
|
|
if (!PageReserved(page))
|
|
|
|
SetPageDirty(page);
|
|
|
|
}
|
2007-10-18 17:09:33 +08:00
|
|
|
}
|
2008-04-03 03:46:56 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
void kvm_set_pfn_accessed(kvm_pfn_t pfn)
|
2008-04-03 03:46:56 +08:00
|
|
|
{
|
2014-11-10 16:33:56 +08:00
|
|
|
if (!kvm_is_reserved_pfn(pfn))
|
2008-05-01 04:37:07 +08:00
|
|
|
mark_page_accessed(pfn_to_page(pfn));
|
2008-04-03 03:46:56 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
|
|
|
|
|
kvm: rename pfn_t to kvm_pfn_t
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace). This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o. It allows userspace to coordinate
DMA/RDMA from/to persistent memory.
The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver. The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.
The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.
Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array. Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory. The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.
This patch (of 18):
The core has developed a need for a "pfn_t" type [1]. Move the existing
pfn_t in KVM to kvm_pfn_t [2].
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 08:56:11 +08:00
|
|
|
void kvm_get_pfn(kvm_pfn_t pfn)
|
2008-04-03 03:46:56 +08:00
|
|
|
{
|
2014-11-10 16:33:56 +08:00
|
|
|
if (!kvm_is_reserved_pfn(pfn))
|
2008-05-01 04:37:07 +08:00
|
|
|
get_page(pfn_to_page(pfn));
|
2008-04-03 03:46:56 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_get_pfn);
|
2007-10-18 17:09:33 +08:00
|
|
|
|
2007-10-02 04:14:18 +08:00
|
|
|
static int next_segment(unsigned long len, int offset)
|
|
|
|
{
|
|
|
|
if (len > PAGE_SIZE - offset)
|
|
|
|
return PAGE_SIZE - offset;
|
|
|
|
else
|
|
|
|
return len;
|
|
|
|
}
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
|
|
|
|
void *data, int offset, int len)
|
2007-10-02 04:14:18 +08:00
|
|
|
{
|
2007-11-12 04:10:22 +08:00
|
|
|
int r;
|
|
|
|
unsigned long addr;
|
2007-10-02 04:14:18 +08:00
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
|
2007-11-12 04:10:22 +08:00
|
|
|
if (kvm_is_error_hva(addr))
|
|
|
|
return -EFAULT;
|
2015-04-02 20:08:20 +08:00
|
|
|
r = __copy_from_user(data, (void __user *)addr + offset, len);
|
2007-11-12 04:10:22 +08:00
|
|
|
if (r)
|
2007-10-02 04:14:18 +08:00
|
|
|
return -EFAULT;
|
|
|
|
return 0;
|
|
|
|
}
|
2015-05-17 19:58:53 +08:00
|
|
|
|
|
|
|
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
|
|
|
|
int len)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
|
|
|
|
|
|
|
|
return __kvm_read_guest_page(slot, gfn, data, offset, len);
|
|
|
|
}
|
2007-10-02 04:14:18 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_read_guest_page);
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
|
|
|
|
int offset, int len)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
|
|
|
|
|
|
|
|
return __kvm_read_guest_page(slot, gfn, data, offset, len);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
|
|
|
|
|
2007-10-02 04:14:18 +08:00
|
|
|
int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len)
|
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
int seg;
|
|
|
|
int offset = offset_in_page(gpa);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
while ((seg = next_segment(len, offset)) != 0) {
|
|
|
|
ret = kvm_read_guest_page(kvm, gfn, data, offset, seg);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
offset = 0;
|
|
|
|
len -= seg;
|
|
|
|
data += seg;
|
|
|
|
++gfn;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_read_guest);
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
int kvm_vcpu_read_guest(struct kvm_vcpu *vcpu, gpa_t gpa, void *data, unsigned long len)
|
2007-12-21 08:18:23 +08:00
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
2015-05-17 19:58:53 +08:00
|
|
|
int seg;
|
2007-12-21 08:18:23 +08:00
|
|
|
int offset = offset_in_page(gpa);
|
2015-05-17 19:58:53 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
while ((seg = next_segment(len, offset)) != 0) {
|
|
|
|
ret = kvm_vcpu_read_guest_page(vcpu, gfn, data, offset, seg);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
offset = 0;
|
|
|
|
len -= seg;
|
|
|
|
data += seg;
|
|
|
|
++gfn;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest);
|
2007-12-21 08:18:23 +08:00
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
static int __kvm_read_guest_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
|
|
|
|
void *data, int offset, unsigned long len)
|
|
|
|
{
|
|
|
|
int r;
|
|
|
|
unsigned long addr;
|
|
|
|
|
|
|
|
addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
|
2007-12-21 08:18:23 +08:00
|
|
|
if (kvm_is_error_hva(addr))
|
|
|
|
return -EFAULT;
|
2008-01-31 02:57:35 +08:00
|
|
|
pagefault_disable();
|
2015-04-02 20:08:20 +08:00
|
|
|
r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
|
2008-01-31 02:57:35 +08:00
|
|
|
pagefault_enable();
|
2007-12-21 08:18:23 +08:00
|
|
|
if (r)
|
|
|
|
return -EFAULT;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa, void *data,
|
|
|
|
unsigned long len)
|
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
|
|
|
|
int offset = offset_in_page(gpa);
|
|
|
|
|
|
|
|
return __kvm_read_guest_atomic(slot, gfn, data, offset, len);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_read_guest_atomic);
|
|
|
|
|
|
|
|
int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
|
|
|
|
void *data, unsigned long len)
|
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
|
|
|
|
int offset = offset_in_page(gpa);
|
|
|
|
|
|
|
|
return __kvm_read_guest_atomic(slot, gfn, data, offset, len);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
|
|
|
|
|
|
|
|
static int __kvm_write_guest_page(struct kvm_memory_slot *memslot, gfn_t gfn,
|
|
|
|
const void *data, int offset, int len)
|
2007-10-02 04:14:18 +08:00
|
|
|
{
|
2007-11-12 04:10:22 +08:00
|
|
|
int r;
|
|
|
|
unsigned long addr;
|
2007-10-02 04:14:18 +08:00
|
|
|
|
2015-04-11 03:47:27 +08:00
|
|
|
addr = gfn_to_hva_memslot(memslot, gfn);
|
2007-11-12 04:10:22 +08:00
|
|
|
if (kvm_is_error_hva(addr))
|
|
|
|
return -EFAULT;
|
2011-05-15 23:22:04 +08:00
|
|
|
r = __copy_to_user((void __user *)addr + offset, data, len);
|
2007-11-12 04:10:22 +08:00
|
|
|
if (r)
|
2007-10-02 04:14:18 +08:00
|
|
|
return -EFAULT;
|
2015-05-26 18:43:41 +08:00
|
|
|
mark_page_dirty_in_slot(memslot, gfn);
|
2007-10-02 04:14:18 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2015-05-17 19:58:53 +08:00
|
|
|
|
|
|
|
int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn,
|
|
|
|
const void *data, int offset, int len)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
|
|
|
|
|
|
|
|
return __kvm_write_guest_page(slot, gfn, data, offset, len);
|
|
|
|
}
|
2007-10-02 04:14:18 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_write_guest_page);
|
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
|
|
|
|
const void *data, int offset, int len)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
|
|
|
|
|
|
|
|
return __kvm_write_guest_page(slot, gfn, data, offset, len);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);
|
|
|
|
|
2007-10-02 04:14:18 +08:00
|
|
|
int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
|
|
|
|
unsigned long len)
|
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
int seg;
|
|
|
|
int offset = offset_in_page(gpa);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
while ((seg = next_segment(len, offset)) != 0) {
|
|
|
|
ret = kvm_write_guest_page(kvm, gfn, data, offset, seg);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
offset = 0;
|
|
|
|
len -= seg;
|
|
|
|
data += seg;
|
|
|
|
++gfn;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
2014-12-11 13:52:58 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_write_guest);
|
2007-10-02 04:14:18 +08:00
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
|
|
|
|
unsigned long len)
|
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
int seg;
|
|
|
|
int offset = offset_in_page(gpa);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
while ((seg = next_segment(len, offset)) != 0) {
|
|
|
|
ret = kvm_vcpu_write_guest_page(vcpu, gfn, data, offset, seg);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
offset = 0;
|
|
|
|
len -= seg;
|
|
|
|
data += seg;
|
|
|
|
++gfn;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest);
|
|
|
|
|
2017-02-04 12:32:28 +08:00
|
|
|
static int __kvm_gfn_to_hva_cache_init(struct kvm_memslots *slots,
|
|
|
|
struct gfn_to_hva_cache *ghc,
|
|
|
|
gpa_t gpa, unsigned long len)
|
2010-10-18 21:22:23 +08:00
|
|
|
{
|
|
|
|
int offset = offset_in_page(gpa);
|
2013-03-30 00:35:21 +08:00
|
|
|
gfn_t start_gfn = gpa >> PAGE_SHIFT;
|
|
|
|
gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;
|
|
|
|
gfn_t nr_pages_needed = end_gfn - start_gfn + 1;
|
|
|
|
gfn_t nr_pages_avail;
|
2018-12-18 05:53:33 +08:00
|
|
|
int r = start_gfn <= end_gfn ? 0 : -EINVAL;
|
2010-10-18 21:22:23 +08:00
|
|
|
|
|
|
|
ghc->gpa = gpa;
|
|
|
|
ghc->generation = slots->generation;
|
2013-03-30 00:35:21 +08:00
|
|
|
ghc->len = len;
|
2018-12-18 05:53:33 +08:00
|
|
|
ghc->hva = KVM_HVA_ERR_BAD;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the requested region crosses two memslots, we still
|
|
|
|
* verify that the entire region is valid here.
|
|
|
|
*/
|
|
|
|
while (!r && start_gfn <= end_gfn) {
|
|
|
|
ghc->memslot = __gfn_to_memslot(slots, start_gfn);
|
|
|
|
ghc->hva = gfn_to_hva_many(ghc->memslot, start_gfn,
|
|
|
|
&nr_pages_avail);
|
|
|
|
if (kvm_is_error_hva(ghc->hva))
|
|
|
|
r = -EFAULT;
|
|
|
|
start_gfn += nr_pages_avail;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Use the slow path for cross page reads and writes. */
|
|
|
|
if (!r && nr_pages_needed == 1)
|
2010-10-18 21:22:23 +08:00
|
|
|
ghc->hva += offset;
|
2018-12-18 05:53:33 +08:00
|
|
|
else
|
2013-03-30 00:35:21 +08:00
|
|
|
ghc->memslot = NULL;
|
2018-12-18 05:53:33 +08:00
|
|
|
|
|
|
|
return r;
|
2010-10-18 21:22:23 +08:00
|
|
|
}
|
2017-02-04 12:32:28 +08:00
|
|
|
|
2017-05-02 22:20:18 +08:00
|
|
|
int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
|
2017-02-04 12:32:28 +08:00
|
|
|
gpa_t gpa, unsigned long len)
|
|
|
|
{
|
2017-05-02 22:20:18 +08:00
|
|
|
struct kvm_memslots *slots = kvm_memslots(kvm);
|
2017-02-04 12:32:28 +08:00
|
|
|
return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
|
|
|
|
}
|
2017-05-02 22:20:18 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_gfn_to_hva_cache_init);
|
2010-10-18 21:22:23 +08:00
|
|
|
|
2017-05-02 22:20:18 +08:00
|
|
|
int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
|
2018-12-15 06:34:43 +08:00
|
|
|
void *data, unsigned int offset,
|
|
|
|
unsigned long len)
|
2010-10-18 21:22:23 +08:00
|
|
|
{
|
2017-05-02 22:20:18 +08:00
|
|
|
struct kvm_memslots *slots = kvm_memslots(kvm);
|
2010-10-18 21:22:23 +08:00
|
|
|
int r;
|
2016-11-02 17:08:34 +08:00
|
|
|
gpa_t gpa = ghc->gpa + offset;
|
2010-10-18 21:22:23 +08:00
|
|
|
|
2016-11-02 17:08:34 +08:00
|
|
|
BUG_ON(len + offset > ghc->len);
|
2013-03-30 00:35:21 +08:00
|
|
|
|
2010-10-18 21:22:23 +08:00
|
|
|
if (slots->generation != ghc->generation)
|
2017-02-04 12:32:28 +08:00
|
|
|
__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len);
|
2013-03-30 00:35:21 +08:00
|
|
|
|
|
|
|
if (unlikely(!ghc->memslot))
|
2017-05-02 22:20:18 +08:00
|
|
|
return kvm_write_guest(kvm, gpa, data, len);
|
2010-10-18 21:22:23 +08:00
|
|
|
|
|
|
|
if (kvm_is_error_hva(ghc->hva))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2016-11-02 17:08:34 +08:00
|
|
|
r = __copy_to_user((void __user *)ghc->hva + offset, data, len);
|
2010-10-18 21:22:23 +08:00
|
|
|
if (r)
|
|
|
|
return -EFAULT;
|
2016-11-02 17:08:34 +08:00
|
|
|
mark_page_dirty_in_slot(ghc->memslot, gpa >> PAGE_SHIFT);
|
2010-10-18 21:22:23 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2017-05-02 22:20:18 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_write_guest_offset_cached);
|
2016-11-02 17:08:34 +08:00
|
|
|
|
2017-05-02 22:20:18 +08:00
|
|
|
int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
|
|
|
|
void *data, unsigned long len)
|
2016-11-02 17:08:34 +08:00
|
|
|
{
|
2017-05-02 22:20:18 +08:00
|
|
|
return kvm_write_guest_offset_cached(kvm, ghc, data, 0, len);
|
2016-11-02 17:08:34 +08:00
|
|
|
}
|
2017-05-02 22:20:18 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
|
2010-10-18 21:22:23 +08:00
|
|
|
|
2017-05-02 22:20:18 +08:00
|
|
|
int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
|
|
|
|
void *data, unsigned long len)
|
2011-07-12 03:28:11 +08:00
|
|
|
{
|
2017-05-02 22:20:18 +08:00
|
|
|
struct kvm_memslots *slots = kvm_memslots(kvm);
|
2011-07-12 03:28:11 +08:00
|
|
|
int r;
|
|
|
|
|
2013-03-30 00:35:21 +08:00
|
|
|
BUG_ON(len > ghc->len);
|
|
|
|
|
2011-07-12 03:28:11 +08:00
|
|
|
if (slots->generation != ghc->generation)
|
2017-02-04 12:32:28 +08:00
|
|
|
__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len);
|
2013-03-30 00:35:21 +08:00
|
|
|
|
|
|
|
if (unlikely(!ghc->memslot))
|
2017-05-02 22:20:18 +08:00
|
|
|
return kvm_read_guest(kvm, ghc->gpa, data, len);
|
2011-07-12 03:28:11 +08:00
|
|
|
|
|
|
|
if (kvm_is_error_hva(ghc->hva))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
r = __copy_from_user(data, (void __user *)ghc->hva, len);
|
|
|
|
if (r)
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2017-05-02 22:20:18 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
|
2011-07-12 03:28:11 +08:00
|
|
|
|
2007-10-02 04:14:18 +08:00
|
|
|
int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
|
|
|
|
{
|
2013-11-18 17:35:55 +08:00
|
|
|
const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
|
|
|
|
|
|
|
|
return kvm_write_guest_page(kvm, gfn, zero_page, offset, len);
|
2007-10-02 04:14:18 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_clear_guest_page);
|
|
|
|
|
|
|
|
int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
|
|
|
|
{
|
|
|
|
gfn_t gfn = gpa >> PAGE_SHIFT;
|
|
|
|
int seg;
|
|
|
|
int offset = offset_in_page(gpa);
|
|
|
|
int ret;
|
|
|
|
|
2015-02-20 21:21:36 +08:00
|
|
|
while ((seg = next_segment(len, offset)) != 0) {
|
2007-10-02 04:14:18 +08:00
|
|
|
ret = kvm_clear_guest_page(kvm, gfn, offset, seg);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
offset = 0;
|
|
|
|
len -= seg;
|
|
|
|
++gfn;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_clear_guest);
|
|
|
|
|
2015-05-26 18:43:41 +08:00
|
|
|
static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
|
2013-12-30 04:12:29 +08:00
|
|
|
gfn_t gfn)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2007-07-31 18:41:14 +08:00
|
|
|
if (memslot && memslot->dirty_bitmap) {
|
|
|
|
unsigned long rel_gfn = gfn - memslot->base_gfn;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2012-10-05 08:13:12 +08:00
|
|
|
set_bit_le(rel_gfn, memslot->dirty_bitmap);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-10-18 21:22:23 +08:00
|
|
|
void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *memslot;
|
|
|
|
|
|
|
|
memslot = gfn_to_memslot(kvm, gfn);
|
2015-05-26 18:43:41 +08:00
|
|
|
mark_page_dirty_in_slot(memslot, gfn);
|
2010-10-18 21:22:23 +08:00
|
|
|
}
|
2013-10-08 00:47:59 +08:00
|
|
|
EXPORT_SYMBOL_GPL(mark_page_dirty);
|
2010-10-18 21:22:23 +08:00
|
|
|
|
2015-05-17 19:58:53 +08:00
|
|
|
void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
|
|
|
|
{
|
|
|
|
struct kvm_memory_slot *memslot;
|
|
|
|
|
|
|
|
memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
|
|
|
|
mark_page_dirty_in_slot(memslot, gfn);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
|
|
|
|
|
2017-11-25 05:39:01 +08:00
|
|
|
void kvm_sigset_activate(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (!vcpu->sigset_active)
|
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This does a lockless modification of ->real_blocked, which is fine
|
|
|
|
* because, only current can change ->real_blocked and all readers of
|
|
|
|
* ->real_blocked don't care as long ->real_blocked is always a subset
|
|
|
|
* of ->blocked.
|
|
|
|
*/
|
|
|
|
sigprocmask(SIG_SETMASK, &vcpu->sigset, ¤t->real_blocked);
|
|
|
|
}
|
|
|
|
|
|
|
|
void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
if (!vcpu->sigset_active)
|
|
|
|
return;
|
|
|
|
|
|
|
|
sigprocmask(SIG_SETMASK, ¤t->real_blocked, NULL);
|
|
|
|
sigemptyset(¤t->real_blocked);
|
|
|
|
}
|
|
|
|
|
2015-09-03 22:07:38 +08:00
|
|
|
static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2019-01-27 18:17:16 +08:00
|
|
|
unsigned int old, val, grow, grow_start;
|
2015-09-03 22:07:38 +08:00
|
|
|
|
2015-09-03 22:07:39 +08:00
|
|
|
old = val = vcpu->halt_poll_ns;
|
2019-01-27 18:17:16 +08:00
|
|
|
grow_start = READ_ONCE(halt_poll_ns_grow_start);
|
2016-02-09 20:47:55 +08:00
|
|
|
grow = READ_ONCE(halt_poll_ns_grow);
|
2019-01-27 18:17:14 +08:00
|
|
|
if (!grow)
|
|
|
|
goto out;
|
|
|
|
|
2019-01-27 18:17:16 +08:00
|
|
|
val *= grow;
|
|
|
|
if (val < grow_start)
|
|
|
|
val = grow_start;
|
2015-09-03 22:07:38 +08:00
|
|
|
|
2016-03-09 08:19:44 +08:00
|
|
|
if (val > halt_poll_ns)
|
|
|
|
val = halt_poll_ns;
|
|
|
|
|
2015-09-03 22:07:38 +08:00
|
|
|
vcpu->halt_poll_ns = val;
|
2019-01-27 18:17:14 +08:00
|
|
|
out:
|
2015-09-03 22:07:39 +08:00
|
|
|
trace_kvm_halt_poll_ns_grow(vcpu->vcpu_id, val, old);
|
2015-09-03 22:07:38 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2016-02-09 20:47:55 +08:00
|
|
|
unsigned int old, val, shrink;
|
2015-09-03 22:07:38 +08:00
|
|
|
|
2015-09-03 22:07:39 +08:00
|
|
|
old = val = vcpu->halt_poll_ns;
|
2016-02-09 20:47:55 +08:00
|
|
|
shrink = READ_ONCE(halt_poll_ns_shrink);
|
|
|
|
if (shrink == 0)
|
2015-09-03 22:07:38 +08:00
|
|
|
val = 0;
|
|
|
|
else
|
2016-02-09 20:47:55 +08:00
|
|
|
val /= shrink;
|
2015-09-03 22:07:38 +08:00
|
|
|
|
|
|
|
vcpu->halt_poll_ns = val;
|
2015-09-03 22:07:39 +08:00
|
|
|
trace_kvm_halt_poll_ns_shrink(vcpu->vcpu_id, val, old);
|
2015-09-03 22:07:38 +08:00
|
|
|
}
|
|
|
|
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
2018-06-28 05:59:11 +08:00
|
|
|
int ret = -EINTR;
|
|
|
|
int idx = srcu_read_lock(&vcpu->kvm->srcu);
|
|
|
|
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
if (kvm_arch_vcpu_runnable(vcpu)) {
|
|
|
|
kvm_make_request(KVM_REQ_UNHALT, vcpu);
|
2018-06-28 05:59:11 +08:00
|
|
|
goto out;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
}
|
|
|
|
if (kvm_cpu_has_pending_timer(vcpu))
|
2018-06-28 05:59:11 +08:00
|
|
|
goto out;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
if (signal_pending(current))
|
2018-06-28 05:59:11 +08:00
|
|
|
goto out;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
|
2018-06-28 05:59:11 +08:00
|
|
|
ret = 0;
|
|
|
|
out:
|
|
|
|
srcu_read_unlock(&vcpu->kvm->srcu, idx);
|
|
|
|
return ret;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
}
|
|
|
|
|
2007-07-18 17:15:21 +08:00
|
|
|
/*
|
|
|
|
* The vCPU has executed a HLT instruction with in-kernel mode enabled.
|
|
|
|
*/
|
2007-11-01 06:24:24 +08:00
|
|
|
void kvm_vcpu_block(struct kvm_vcpu *vcpu)
|
2007-06-05 20:53:05 +08:00
|
|
|
{
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
ktime_t start, cur;
|
2016-02-19 16:46:39 +08:00
|
|
|
DECLARE_SWAITQUEUE(wait);
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
bool waited = false;
|
2015-09-03 22:07:38 +08:00
|
|
|
u64 block_ns;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
|
|
|
|
start = cur = ktime_get();
|
2015-09-03 22:07:37 +08:00
|
|
|
if (vcpu->halt_poll_ns) {
|
|
|
|
ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
|
2015-02-26 14:58:23 +08:00
|
|
|
|
2015-09-16 00:27:57 +08:00
|
|
|
++vcpu->stat.halt_attempted_poll;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
do {
|
|
|
|
/*
|
|
|
|
* This sets KVM_REQ_UNHALT if an interrupt
|
|
|
|
* arrives.
|
|
|
|
*/
|
|
|
|
if (kvm_vcpu_check_block(vcpu) < 0) {
|
|
|
|
++vcpu->stat.halt_successful_poll;
|
2016-05-13 18:16:35 +08:00
|
|
|
if (!vcpu_valid_wakeup(vcpu))
|
|
|
|
++vcpu->stat.halt_poll_invalid;
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
cur = ktime_get();
|
|
|
|
} while (single_task_running() && ktime_before(cur, stop));
|
|
|
|
}
|
2008-05-09 06:47:01 +08:00
|
|
|
|
2015-08-27 22:41:15 +08:00
|
|
|
kvm_arch_vcpu_blocking(vcpu);
|
|
|
|
|
2008-05-09 06:47:01 +08:00
|
|
|
for (;;) {
|
2018-06-12 16:34:52 +08:00
|
|
|
prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
|
2008-05-09 06:47:01 +08:00
|
|
|
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
if (kvm_vcpu_check_block(vcpu) < 0)
|
2008-05-09 06:47:01 +08:00
|
|
|
break;
|
|
|
|
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
waited = true;
|
2007-07-18 17:15:21 +08:00
|
|
|
schedule();
|
|
|
|
}
|
2007-06-05 20:53:05 +08:00
|
|
|
|
2016-02-19 16:46:39 +08:00
|
|
|
finish_swait(&vcpu->wq, &wait);
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
cur = ktime_get();
|
|
|
|
|
2015-08-27 22:41:15 +08:00
|
|
|
kvm_arch_vcpu_unblocking(vcpu);
|
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-05 01:20:58 +08:00
|
|
|
out:
|
2015-09-03 22:07:38 +08:00
|
|
|
block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
|
|
|
|
|
2016-05-17 16:49:22 +08:00
|
|
|
if (!vcpu_valid_wakeup(vcpu))
|
|
|
|
shrink_halt_poll_ns(vcpu);
|
|
|
|
else if (halt_poll_ns) {
|
2015-09-03 22:07:38 +08:00
|
|
|
if (block_ns <= vcpu->halt_poll_ns)
|
|
|
|
;
|
|
|
|
/* we had a long block, shrink polling */
|
2016-05-17 16:49:22 +08:00
|
|
|
else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
|
2015-09-03 22:07:38 +08:00
|
|
|
shrink_halt_poll_ns(vcpu);
|
|
|
|
/* we had a short halt and our poll time is too small */
|
|
|
|
else if (vcpu->halt_poll_ns < halt_poll_ns &&
|
|
|
|
block_ns < halt_poll_ns)
|
|
|
|
grow_halt_poll_ns(vcpu);
|
2015-09-14 17:38:51 +08:00
|
|
|
} else
|
|
|
|
vcpu->halt_poll_ns = 0;
|
2015-09-03 22:07:38 +08:00
|
|
|
|
2016-05-13 18:16:35 +08:00
|
|
|
trace_kvm_vcpu_wakeup(block_ns, waited, vcpu_valid_wakeup(vcpu));
|
|
|
|
kvm_arch_vcpu_block_finish(vcpu);
|
2007-07-18 17:15:21 +08:00
|
|
|
}
|
2013-10-08 00:47:59 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_block);
|
2007-07-18 17:15:21 +08:00
|
|
|
|
2017-04-27 04:32:26 +08:00
|
|
|
bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
|
2012-03-09 05:44:24 +08:00
|
|
|
{
|
2016-02-19 16:46:39 +08:00
|
|
|
struct swait_queue_head *wqp;
|
2012-03-09 05:44:24 +08:00
|
|
|
|
|
|
|
wqp = kvm_arch_vcpu_wq(vcpu);
|
2017-09-14 04:08:22 +08:00
|
|
|
if (swq_has_sleeper(wqp)) {
|
2018-06-12 16:34:52 +08:00
|
|
|
swake_up_one(wqp);
|
2012-03-09 05:44:24 +08:00
|
|
|
++vcpu->stat.halt_wakeup;
|
2017-04-27 04:32:26 +08:00
|
|
|
return true;
|
2012-03-09 05:44:24 +08:00
|
|
|
}
|
|
|
|
|
2017-04-27 04:32:26 +08:00
|
|
|
return false;
|
2016-05-05 03:09:44 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
|
|
|
|
|
2017-05-04 21:14:13 +08:00
|
|
|
#ifndef CONFIG_S390
|
2016-05-05 03:09:44 +08:00
|
|
|
/*
|
|
|
|
* Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
|
|
|
|
*/
|
|
|
|
void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
int me;
|
|
|
|
int cpu = vcpu->cpu;
|
|
|
|
|
2017-04-27 04:32:26 +08:00
|
|
|
if (kvm_vcpu_wake_up(vcpu))
|
|
|
|
return;
|
|
|
|
|
2012-03-09 05:44:24 +08:00
|
|
|
me = get_cpu();
|
|
|
|
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
|
|
|
|
if (kvm_arch_vcpu_should_kick(vcpu))
|
|
|
|
smp_send_reschedule(cpu);
|
|
|
|
put_cpu();
|
|
|
|
}
|
2013-04-11 19:25:15 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
|
2017-05-04 21:14:13 +08:00
|
|
|
#endif /* !CONFIG_S390 */
|
2012-03-09 05:44:24 +08:00
|
|
|
|
2014-05-23 18:20:42 +08:00
|
|
|
int kvm_vcpu_yield_to(struct kvm_vcpu *target)
|
2012-04-25 21:30:38 +08:00
|
|
|
{
|
|
|
|
struct pid *pid;
|
|
|
|
struct task_struct *task = NULL;
|
2014-05-23 18:20:42 +08:00
|
|
|
int ret = 0;
|
2012-04-25 21:30:38 +08:00
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
pid = rcu_dereference(target->pid);
|
|
|
|
if (pid)
|
2014-09-19 07:40:41 +08:00
|
|
|
task = get_pid_task(pid, PIDTYPE_PID);
|
2012-04-25 21:30:38 +08:00
|
|
|
rcu_read_unlock();
|
|
|
|
if (!task)
|
2013-01-22 15:39:24 +08:00
|
|
|
return ret;
|
|
|
|
ret = yield_to(task, 1);
|
2012-04-25 21:30:38 +08:00
|
|
|
put_task_struct(task);
|
2013-01-22 15:39:24 +08:00
|
|
|
|
|
|
|
return ret;
|
2012-04-25 21:30:38 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
|
|
|
|
|
2012-07-19 17:47:52 +08:00
|
|
|
/*
|
|
|
|
* Helper that checks whether a VCPU is eligible for directed yield.
|
|
|
|
* Most eligible candidate to yield is decided by following heuristics:
|
|
|
|
*
|
|
|
|
* (a) VCPU which has not done pl-exit or cpu relax intercepted recently
|
|
|
|
* (preempted lock holder), indicated by @in_spin_loop.
|
|
|
|
* Set at the beiginning and cleared at the end of interception/PLE handler.
|
|
|
|
*
|
|
|
|
* (b) VCPU which has done pl-exit/ cpu relax intercepted but did not get
|
|
|
|
* chance last time (mostly it has become eligible now since we have probably
|
|
|
|
* yielded to lockholder in last iteration. This is done by toggling
|
|
|
|
* @dy_eligible each time a VCPU checked for eligibility.)
|
|
|
|
*
|
|
|
|
* Yielding to a recently pl-exited/cpu relax intercepted VCPU before yielding
|
|
|
|
* to preempted lock-holder could result in wrong VCPU selection and CPU
|
|
|
|
* burning. Giving priority for a potential lock-holder increases lock
|
|
|
|
* progress.
|
|
|
|
*
|
|
|
|
* Since algorithm is based on heuristics, accessing another VCPU data without
|
|
|
|
* locking does not harm. It may result in trying to yield to same VCPU, fail
|
|
|
|
* and continue with next VCPU and so on.
|
|
|
|
*/
|
2013-12-30 04:12:29 +08:00
|
|
|
static bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
|
2012-07-19 17:47:52 +08:00
|
|
|
{
|
2014-01-10 08:43:16 +08:00
|
|
|
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
|
2012-07-19 17:47:52 +08:00
|
|
|
bool eligible;
|
|
|
|
|
|
|
|
eligible = !vcpu->spin_loop.in_spin_loop ||
|
2014-09-05 03:13:31 +08:00
|
|
|
vcpu->spin_loop.dy_eligible;
|
2012-07-19 17:47:52 +08:00
|
|
|
|
|
|
|
if (vcpu->spin_loop.in_spin_loop)
|
|
|
|
kvm_vcpu_set_dy_eligible(vcpu, !vcpu->spin_loop.dy_eligible);
|
|
|
|
|
|
|
|
return eligible;
|
2014-01-10 08:43:16 +08:00
|
|
|
#else
|
|
|
|
return true;
|
2012-07-19 17:47:52 +08:00
|
|
|
#endif
|
2014-01-10 08:43:16 +08:00
|
|
|
}
|
2013-01-22 15:39:24 +08:00
|
|
|
|
2017-08-08 12:05:32 +08:00
|
|
|
void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
|
2009-10-09 18:03:20 +08:00
|
|
|
{
|
2011-02-01 22:53:28 +08:00
|
|
|
struct kvm *kvm = me->kvm;
|
|
|
|
struct kvm_vcpu *vcpu;
|
|
|
|
int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
|
|
|
|
int yielded = 0;
|
2013-01-22 15:39:24 +08:00
|
|
|
int try = 3;
|
2011-02-01 22:53:28 +08:00
|
|
|
int pass;
|
|
|
|
int i;
|
2009-10-09 18:03:20 +08:00
|
|
|
|
2012-07-18 21:37:46 +08:00
|
|
|
kvm_vcpu_set_in_spin_loop(me, true);
|
2011-02-01 22:53:28 +08:00
|
|
|
/*
|
|
|
|
* We boost the priority of a VCPU that is runnable but not
|
|
|
|
* currently running, because it got preempted by something
|
|
|
|
* else and called schedule in __vcpu_run. Hopefully that
|
|
|
|
* VCPU is holding the lock that we need and will release it.
|
|
|
|
* We approximate round-robin by starting at the last boosted VCPU.
|
|
|
|
*/
|
2013-01-22 15:39:24 +08:00
|
|
|
for (pass = 0; pass < 2 && !yielded && try; pass++) {
|
2011-02-01 22:53:28 +08:00
|
|
|
kvm_for_each_vcpu(i, vcpu, kvm) {
|
2012-06-20 04:51:04 +08:00
|
|
|
if (!pass && i <= last_boosted_vcpu) {
|
2011-02-01 22:53:28 +08:00
|
|
|
i = last_boosted_vcpu;
|
|
|
|
continue;
|
|
|
|
} else if (pass && i > last_boosted_vcpu)
|
|
|
|
break;
|
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24 05:07:29 +08:00
|
|
|
if (!READ_ONCE(vcpu->preempted))
|
2013-03-05 02:02:27 +08:00
|
|
|
continue;
|
2011-02-01 22:53:28 +08:00
|
|
|
if (vcpu == me)
|
|
|
|
continue;
|
2016-02-19 16:46:39 +08:00
|
|
|
if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
|
2011-02-01 22:53:28 +08:00
|
|
|
continue;
|
2017-08-08 12:05:32 +08:00
|
|
|
if (yield_to_kernel_mode && !kvm_arch_vcpu_in_kernel(vcpu))
|
|
|
|
continue;
|
2012-07-19 17:47:52 +08:00
|
|
|
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
|
|
|
|
continue;
|
2013-01-22 15:39:24 +08:00
|
|
|
|
|
|
|
yielded = kvm_vcpu_yield_to(vcpu);
|
|
|
|
if (yielded > 0) {
|
2011-02-01 22:53:28 +08:00
|
|
|
kvm->last_boosted_vcpu = i;
|
|
|
|
break;
|
2013-01-22 15:39:24 +08:00
|
|
|
} else if (yielded < 0) {
|
|
|
|
try--;
|
|
|
|
if (!try)
|
|
|
|
break;
|
2011-02-01 22:53:28 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2012-07-18 21:37:46 +08:00
|
|
|
kvm_vcpu_set_in_spin_loop(me, false);
|
2012-07-19 17:47:52 +08:00
|
|
|
|
|
|
|
/* Ensure vcpu is not eligible during next spinloop */
|
|
|
|
kvm_vcpu_set_dy_eligible(me, false);
|
2009-10-09 18:03:20 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
|
|
|
|
|
2018-04-19 03:19:58 +08:00
|
|
|
static vm_fault_t kvm_vcpu_fault(struct vm_fault *vmf)
|
2007-02-22 18:58:31 +08:00
|
|
|
{
|
2017-02-25 06:56:41 +08:00
|
|
|
struct kvm_vcpu *vcpu = vmf->vma->vm_file->private_data;
|
2007-02-22 18:58:31 +08:00
|
|
|
struct page *page;
|
|
|
|
|
2007-12-05 15:15:52 +08:00
|
|
|
if (vmf->pgoff == 0)
|
2007-03-20 18:46:50 +08:00
|
|
|
page = virt_to_page(vcpu->run);
|
2008-01-24 00:14:23 +08:00
|
|
|
#ifdef CONFIG_X86
|
2007-12-05 15:15:52 +08:00
|
|
|
else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET)
|
2007-12-13 23:50:52 +08:00
|
|
|
page = virt_to_page(vcpu->arch.pio_data);
|
2008-05-30 22:05:54 +08:00
|
|
|
#endif
|
2017-03-31 19:53:23 +08:00
|
|
|
#ifdef CONFIG_KVM_MMIO
|
2008-05-30 22:05:54 +08:00
|
|
|
else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
|
|
|
|
page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
|
2008-01-24 00:14:23 +08:00
|
|
|
#endif
|
2007-03-20 18:46:50 +08:00
|
|
|
else
|
2012-01-04 17:25:23 +08:00
|
|
|
return kvm_arch_vcpu_fault(vcpu, vmf);
|
2007-02-22 18:58:31 +08:00
|
|
|
get_page(page);
|
2007-12-05 15:15:52 +08:00
|
|
|
vmf->page = page;
|
|
|
|
return 0;
|
2007-02-22 18:58:31 +08:00
|
|
|
}
|
|
|
|
|
2009-09-28 02:29:37 +08:00
|
|
|
static const struct vm_operations_struct kvm_vcpu_vm_ops = {
|
2007-12-05 15:15:52 +08:00
|
|
|
.fault = kvm_vcpu_fault,
|
2007-02-22 18:58:31 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
|
|
|
|
{
|
|
|
|
vma->vm_ops = &kvm_vcpu_vm_ops;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-02-22 00:04:26 +08:00
|
|
|
static int kvm_vcpu_release(struct inode *inode, struct file *filp)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = filp->private_data;
|
|
|
|
|
2016-09-16 22:27:35 +08:00
|
|
|
debugfs_remove_recursive(vcpu->debugfs_dentry);
|
2008-04-20 03:33:56 +08:00
|
|
|
kvm_put_kvm(vcpu->kvm);
|
2007-02-22 00:04:26 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-12-02 18:17:32 +08:00
|
|
|
static struct file_operations kvm_vcpu_fops = {
|
2007-02-22 00:04:26 +08:00
|
|
|
.release = kvm_vcpu_release,
|
|
|
|
.unlocked_ioctl = kvm_vcpu_ioctl,
|
2007-02-22 18:58:31 +08:00
|
|
|
.mmap = kvm_vcpu_mmap,
|
llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-08-16 00:52:59 +08:00
|
|
|
.llseek = noop_llseek,
|
2018-06-17 17:16:21 +08:00
|
|
|
KVM_COMPAT(kvm_vcpu_compat_ioctl),
|
2007-02-22 00:04:26 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocates an inode for the vcpu.
|
|
|
|
*/
|
|
|
|
static int create_vcpu_fd(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
kvm: embed vcpu id to dentry of vcpu anon inode
All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means
it is impossible to know the mapping between fds for vcpu and vcpu
from userland.
# LC_ALL=C ls -l /proc/617/fd | grep vcpu
lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu
It is also impossible to know the mapping between vma for kvm_run
structure and vcpu from userland.
# LC_ALL=C grep vcpu /proc/617/maps
7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu
7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu
This change adds vcpu id to d-entries for vcpu. With this change
you can get the following output:
# LC_ALL=C ls -l /proc/617/fd | grep vcpu
lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu:0
lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu:1
# LC_ALL=C grep vcpu /proc/617/maps
7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:0
7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:1
With the mappings known from the output, a tool like strace can report more details
of qemu-kvm process activities. Here is the strace output of my local prototype:
# ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K'
...
[pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO)
K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
[pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO)
K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
...
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-20 03:04:22 +08:00
|
|
|
char name[8 + 1 + ITOA_MAX_LEN + 1];
|
|
|
|
|
|
|
|
snprintf(name, sizeof(name), "kvm-vcpu:%d", vcpu->vcpu_id);
|
|
|
|
return anon_inode_getfd(name, &kvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
|
2007-02-22 00:04:26 +08:00
|
|
|
}
|
|
|
|
|
2016-09-16 22:27:35 +08:00
|
|
|
static int kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
|
|
|
|
{
|
|
|
|
char dir_name[ITOA_MAX_LEN * 2];
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (!kvm_arch_has_vcpu_debugfs())
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!debugfs_initialized())
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
|
|
|
|
vcpu->debugfs_dentry = debugfs_create_dir(dir_name,
|
|
|
|
vcpu->kvm->debugfs_dentry);
|
|
|
|
if (!vcpu->debugfs_dentry)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
ret = kvm_arch_create_vcpu_debugfs(vcpu);
|
|
|
|
if (ret < 0) {
|
|
|
|
debugfs_remove_recursive(vcpu->debugfs_dentry);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-02-21 00:41:05 +08:00
|
|
|
/*
|
|
|
|
* Creates some virtual cpus. Good luck creating more than one.
|
|
|
|
*/
|
2009-06-09 20:56:28 +08:00
|
|
|
static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
|
2007-02-21 00:41:05 +08:00
|
|
|
{
|
|
|
|
int r;
|
2015-11-05 16:03:50 +08:00
|
|
|
struct kvm_vcpu *vcpu;
|
2007-02-21 00:41:05 +08:00
|
|
|
|
2016-05-10 00:13:37 +08:00
|
|
|
if (id >= KVM_MAX_VCPU_ID)
|
2013-11-19 08:09:22 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2016-06-13 20:48:25 +08:00
|
|
|
mutex_lock(&kvm->lock);
|
|
|
|
if (kvm->created_vcpus == KVM_MAX_VCPUS) {
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
kvm->created_vcpus++;
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
|
2009-06-09 20:56:28 +08:00
|
|
|
vcpu = kvm_arch_vcpu_create(kvm, id);
|
2016-06-13 20:48:25 +08:00
|
|
|
if (IS_ERR(vcpu)) {
|
|
|
|
r = PTR_ERR(vcpu);
|
|
|
|
goto vcpu_decrement;
|
|
|
|
}
|
2007-02-21 00:41:05 +08:00
|
|
|
|
2007-07-11 23:17:21 +08:00
|
|
|
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
|
|
|
|
|
2007-11-20 21:30:24 +08:00
|
|
|
r = kvm_arch_vcpu_setup(vcpu);
|
|
|
|
if (r)
|
2011-05-23 16:33:05 +08:00
|
|
|
goto vcpu_destroy;
|
2007-11-20 21:30:24 +08:00
|
|
|
|
2016-09-16 22:27:35 +08:00
|
|
|
r = kvm_create_vcpu_debugfs(vcpu);
|
|
|
|
if (r)
|
|
|
|
goto vcpu_destroy;
|
|
|
|
|
2007-07-23 14:51:37 +08:00
|
|
|
mutex_lock(&kvm->lock);
|
2015-11-05 16:03:50 +08:00
|
|
|
if (kvm_get_vcpu_by_id(kvm, id)) {
|
|
|
|
r = -EEXIST;
|
|
|
|
goto unlock_vcpu_destroy;
|
|
|
|
}
|
2009-06-09 20:56:28 +08:00
|
|
|
|
|
|
|
BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);
|
2007-02-21 00:41:05 +08:00
|
|
|
|
2007-07-27 15:16:56 +08:00
|
|
|
/* Now it's all set up, let userspace reach it */
|
2008-04-20 03:33:56 +08:00
|
|
|
kvm_get_kvm(kvm);
|
2007-02-22 00:04:26 +08:00
|
|
|
r = create_vcpu_fd(vcpu);
|
2009-06-09 20:56:28 +08:00
|
|
|
if (r < 0) {
|
|
|
|
kvm_put_kvm(kvm);
|
2011-05-23 16:33:05 +08:00
|
|
|
goto unlock_vcpu_destroy;
|
2009-06-09 20:56:28 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
|
2015-07-29 17:32:20 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Pairs with smp_rmb() in kvm_get_vcpu. Write kvm->vcpus
|
|
|
|
* before kvm->online_vcpu's incremented value.
|
|
|
|
*/
|
2009-06-09 20:56:28 +08:00
|
|
|
smp_wmb();
|
|
|
|
atomic_inc(&kvm->online_vcpus);
|
|
|
|
|
|
|
|
mutex_unlock(&kvm->lock);
|
2012-11-28 09:29:02 +08:00
|
|
|
kvm_arch_vcpu_postcreate(vcpu);
|
2007-07-27 15:16:56 +08:00
|
|
|
return r;
|
2007-06-08 00:11:53 +08:00
|
|
|
|
2011-05-23 16:33:05 +08:00
|
|
|
unlock_vcpu_destroy:
|
2008-09-18 10:16:59 +08:00
|
|
|
mutex_unlock(&kvm->lock);
|
2016-09-16 22:27:35 +08:00
|
|
|
debugfs_remove_recursive(vcpu->debugfs_dentry);
|
2011-05-23 16:33:05 +08:00
|
|
|
vcpu_destroy:
|
2007-11-20 04:04:43 +08:00
|
|
|
kvm_arch_vcpu_destroy(vcpu);
|
2016-06-13 20:48:25 +08:00
|
|
|
vcpu_decrement:
|
|
|
|
mutex_lock(&kvm->lock);
|
|
|
|
kvm->created_vcpus--;
|
|
|
|
mutex_unlock(&kvm->lock);
|
2007-02-21 00:41:05 +08:00
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2007-03-06 01:46:05 +08:00
|
|
|
static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
|
|
|
|
{
|
|
|
|
if (sigset) {
|
|
|
|
sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP));
|
|
|
|
vcpu->sigset_active = 1;
|
|
|
|
vcpu->sigset = *sigset;
|
|
|
|
} else
|
|
|
|
vcpu->sigset_active = 0;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2007-02-22 00:04:26 +08:00
|
|
|
static long kvm_vcpu_ioctl(struct file *filp,
|
|
|
|
unsigned int ioctl, unsigned long arg)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2007-02-22 00:04:26 +08:00
|
|
|
struct kvm_vcpu *vcpu = filp->private_data;
|
2007-02-10 00:38:35 +08:00
|
|
|
void __user *argp = (void __user *)arg;
|
KVM: Portability: split kvm_vcpu_ioctl
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-12 01:16:52 +08:00
|
|
|
int r;
|
2008-08-12 01:01:46 +08:00
|
|
|
struct kvm_fpu *fpu = NULL;
|
|
|
|
struct kvm_sregs *kvm_sregs = NULL;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-11-21 22:41:05 +08:00
|
|
|
if (vcpu->kvm->mm != current->mm)
|
|
|
|
return -EIO;
|
2010-05-13 16:25:04 +08:00
|
|
|
|
2014-09-20 07:03:25 +08:00
|
|
|
if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2010-05-13 16:25:04 +08:00
|
|
|
/*
|
2017-12-13 00:41:34 +08:00
|
|
|
* Some architectures have vcpu ioctls that are asynchronous to vcpu
|
|
|
|
* execution; mutex_lock() would break them.
|
2010-05-13 16:25:04 +08:00
|
|
|
*/
|
2017-12-13 00:41:34 +08:00
|
|
|
r = kvm_arch_vcpu_async_ioctl(filp, ioctl, arg);
|
|
|
|
if (r != -ENOIOCTLCMD)
|
2012-09-16 16:50:30 +08:00
|
|
|
return r;
|
2010-05-13 16:25:04 +08:00
|
|
|
|
2017-12-05 04:35:23 +08:00
|
|
|
if (mutex_lock_killable(&vcpu->mutex))
|
|
|
|
return -EINTR;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
switch (ioctl) {
|
2017-07-06 20:44:28 +08:00
|
|
|
case KVM_RUN: {
|
|
|
|
struct pid *oldpid;
|
2007-03-07 19:11:17 +08:00
|
|
|
r = -EINVAL;
|
|
|
|
if (arg)
|
|
|
|
goto out;
|
2017-07-06 20:44:28 +08:00
|
|
|
oldpid = rcu_access_pointer(vcpu->pid);
|
2017-07-17 10:39:32 +08:00
|
|
|
if (unlikely(oldpid != task_pid(current))) {
|
2014-08-05 22:44:14 +08:00
|
|
|
/* The thread running this VCPU changed. */
|
2018-02-24 00:23:57 +08:00
|
|
|
struct pid *newpid;
|
2015-02-26 14:58:23 +08:00
|
|
|
|
2018-02-24 00:23:57 +08:00
|
|
|
r = kvm_arch_vcpu_run_pid_change(vcpu);
|
|
|
|
if (r)
|
|
|
|
break;
|
|
|
|
|
|
|
|
newpid = get_task_pid(current, PIDTYPE_PID);
|
2014-08-05 22:44:14 +08:00
|
|
|
rcu_assign_pointer(vcpu->pid, newpid);
|
|
|
|
if (oldpid)
|
|
|
|
synchronize_rcu();
|
|
|
|
put_pid(oldpid);
|
|
|
|
}
|
2007-11-02 03:16:10 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
|
2010-10-24 22:49:08 +08:00
|
|
|
trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
2017-07-06 20:44:28 +08:00
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
case KVM_GET_REGS: {
|
2008-02-25 18:52:20 +08:00
|
|
|
struct kvm_regs *kvm_regs;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2008-02-25 18:52:20 +08:00
|
|
|
r = -ENOMEM;
|
2019-02-12 03:02:49 +08:00
|
|
|
kvm_regs = kzalloc(sizeof(struct kvm_regs), GFP_KERNEL_ACCOUNT);
|
2008-02-25 18:52:20 +08:00
|
|
|
if (!kvm_regs)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
2008-02-25 18:52:20 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_get_regs(vcpu, kvm_regs);
|
|
|
|
if (r)
|
|
|
|
goto out_free1;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
r = -EFAULT;
|
2008-02-25 18:52:20 +08:00
|
|
|
if (copy_to_user(argp, kvm_regs, sizeof(struct kvm_regs)))
|
|
|
|
goto out_free1;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
r = 0;
|
2008-02-25 18:52:20 +08:00
|
|
|
out_free1:
|
|
|
|
kfree(kvm_regs);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_SET_REGS: {
|
2008-02-25 18:52:20 +08:00
|
|
|
struct kvm_regs *kvm_regs;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2008-02-25 18:52:20 +08:00
|
|
|
r = -ENOMEM;
|
2011-12-05 01:36:29 +08:00
|
|
|
kvm_regs = memdup_user(argp, sizeof(*kvm_regs));
|
|
|
|
if (IS_ERR(kvm_regs)) {
|
|
|
|
r = PTR_ERR(kvm_regs);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
2011-12-05 01:36:29 +08:00
|
|
|
}
|
2008-02-25 18:52:20 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_set_regs(vcpu, kvm_regs);
|
|
|
|
kfree(kvm_regs);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_GET_SREGS: {
|
2019-02-12 03:02:49 +08:00
|
|
|
kvm_sregs = kzalloc(sizeof(struct kvm_sregs),
|
|
|
|
GFP_KERNEL_ACCOUNT);
|
2008-08-12 01:01:46 +08:00
|
|
|
r = -ENOMEM;
|
|
|
|
if (!kvm_sregs)
|
|
|
|
goto out;
|
|
|
|
r = kvm_arch_vcpu_ioctl_get_sregs(vcpu, kvm_sregs);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
2008-08-12 01:01:46 +08:00
|
|
|
if (copy_to_user(argp, kvm_sregs, sizeof(struct kvm_sregs)))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_SET_SREGS: {
|
2011-12-05 01:36:29 +08:00
|
|
|
kvm_sregs = memdup_user(argp, sizeof(*kvm_sregs));
|
|
|
|
if (IS_ERR(kvm_sregs)) {
|
|
|
|
r = PTR_ERR(kvm_sregs);
|
2012-11-02 18:33:21 +08:00
|
|
|
kvm_sregs = NULL;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
2011-12-05 01:36:29 +08:00
|
|
|
}
|
2008-08-12 01:01:46 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_set_sregs(vcpu, kvm_sregs);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
|
|
|
}
|
2008-04-12 00:24:45 +08:00
|
|
|
case KVM_GET_MP_STATE: {
|
|
|
|
struct kvm_mp_state mp_state;
|
|
|
|
|
|
|
|
r = kvm_arch_vcpu_ioctl_get_mpstate(vcpu, &mp_state);
|
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_to_user(argp, &mp_state, sizeof(mp_state)))
|
2008-04-12 00:24:45 +08:00
|
|
|
goto out;
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_SET_MP_STATE: {
|
|
|
|
struct kvm_mp_state mp_state;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&mp_state, argp, sizeof(mp_state)))
|
2008-04-12 00:24:45 +08:00
|
|
|
goto out;
|
|
|
|
r = kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);
|
|
|
|
break;
|
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
case KVM_TRANSLATE: {
|
|
|
|
struct kvm_translation tr;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&tr, argp, sizeof(tr)))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
2007-11-16 13:05:55 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_translate(vcpu, &tr);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_to_user(argp, &tr, sizeof(tr)))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
}
|
2008-12-15 20:52:10 +08:00
|
|
|
case KVM_SET_GUEST_DEBUG: {
|
|
|
|
struct kvm_guest_debug dbg;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&dbg, argp, sizeof(dbg)))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
2008-12-15 20:52:10 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_set_guest_debug(vcpu, &dbg);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
|
|
|
}
|
2007-03-06 01:46:05 +08:00
|
|
|
case KVM_SET_SIGNAL_MASK: {
|
|
|
|
struct kvm_signal_mask __user *sigmask_arg = argp;
|
|
|
|
struct kvm_signal_mask kvm_sigmask;
|
|
|
|
sigset_t sigset, *p;
|
|
|
|
|
|
|
|
p = NULL;
|
|
|
|
if (argp) {
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&kvm_sigmask, argp,
|
2015-02-26 14:58:19 +08:00
|
|
|
sizeof(kvm_sigmask)))
|
2007-03-06 01:46:05 +08:00
|
|
|
goto out;
|
|
|
|
r = -EINVAL;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (kvm_sigmask.len != sizeof(sigset))
|
2007-03-06 01:46:05 +08:00
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&sigset, sigmask_arg->sigset,
|
2015-02-26 14:58:19 +08:00
|
|
|
sizeof(sigset)))
|
2007-03-06 01:46:05 +08:00
|
|
|
goto out;
|
|
|
|
p = &sigset;
|
|
|
|
}
|
2010-06-10 19:10:47 +08:00
|
|
|
r = kvm_vcpu_ioctl_set_sigmask(vcpu, p);
|
2007-03-06 01:46:05 +08:00
|
|
|
break;
|
|
|
|
}
|
2007-04-01 21:34:31 +08:00
|
|
|
case KVM_GET_FPU: {
|
2019-02-12 03:02:49 +08:00
|
|
|
fpu = kzalloc(sizeof(struct kvm_fpu), GFP_KERNEL_ACCOUNT);
|
2008-08-12 01:01:46 +08:00
|
|
|
r = -ENOMEM;
|
|
|
|
if (!fpu)
|
|
|
|
goto out;
|
|
|
|
r = kvm_arch_vcpu_ioctl_get_fpu(vcpu, fpu);
|
2007-04-01 21:34:31 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
2008-08-12 01:01:46 +08:00
|
|
|
if (copy_to_user(argp, fpu, sizeof(struct kvm_fpu)))
|
2007-04-01 21:34:31 +08:00
|
|
|
goto out;
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_SET_FPU: {
|
2011-12-05 01:36:29 +08:00
|
|
|
fpu = memdup_user(argp, sizeof(*fpu));
|
|
|
|
if (IS_ERR(fpu)) {
|
|
|
|
r = PTR_ERR(fpu);
|
2012-11-02 18:33:21 +08:00
|
|
|
fpu = NULL;
|
2007-04-01 21:34:31 +08:00
|
|
|
goto out;
|
2011-12-05 01:36:29 +08:00
|
|
|
}
|
2008-08-12 01:01:46 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
|
2007-04-01 21:34:31 +08:00
|
|
|
break;
|
|
|
|
}
|
2007-02-22 00:04:26 +08:00
|
|
|
default:
|
KVM: Portability: split kvm_vcpu_ioctl
This patch splits kvm_vcpu_ioctl into archtecture independent parts, and
x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c.
Common ioctls for all architectures are:
KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT,
KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU
Note that some PPC chips don't have an FPU, so we might need an #ifdef
around KVM_GET/SET_FPU one day.
x86 specific ioctls are:
KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS
An interresting aspect is vcpu_load/vcpu_put. We now have a common
vcpu_load/put which does the preemption stuff, and an architecture
specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the
vmx/svm function defined in kvm_x86_ops.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-12 01:16:52 +08:00
|
|
|
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
|
2007-02-22 00:04:26 +08:00
|
|
|
}
|
|
|
|
out:
|
2017-12-05 04:35:23 +08:00
|
|
|
mutex_unlock(&vcpu->mutex);
|
2008-08-12 01:01:46 +08:00
|
|
|
kfree(fpu);
|
|
|
|
kfree(kvm_sregs);
|
2007-02-22 00:04:26 +08:00
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2015-02-03 16:35:15 +08:00
|
|
|
#ifdef CONFIG_KVM_COMPAT
|
2011-06-08 08:45:37 +08:00
|
|
|
static long kvm_vcpu_compat_ioctl(struct file *filp,
|
|
|
|
unsigned int ioctl, unsigned long arg)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = filp->private_data;
|
|
|
|
void __user *argp = compat_ptr(arg);
|
|
|
|
int r;
|
|
|
|
|
|
|
|
if (vcpu->kvm->mm != current->mm)
|
|
|
|
return -EIO;
|
|
|
|
|
|
|
|
switch (ioctl) {
|
|
|
|
case KVM_SET_SIGNAL_MASK: {
|
|
|
|
struct kvm_signal_mask __user *sigmask_arg = argp;
|
|
|
|
struct kvm_signal_mask kvm_sigmask;
|
|
|
|
sigset_t sigset;
|
|
|
|
|
|
|
|
if (argp) {
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&kvm_sigmask, argp,
|
2015-02-26 14:58:19 +08:00
|
|
|
sizeof(kvm_sigmask)))
|
2011-06-08 08:45:37 +08:00
|
|
|
goto out;
|
|
|
|
r = -EINVAL;
|
2017-09-04 09:45:17 +08:00
|
|
|
if (kvm_sigmask.len != sizeof(compat_sigset_t))
|
2011-06-08 08:45:37 +08:00
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
2017-09-04 09:45:17 +08:00
|
|
|
if (get_compat_sigset(&sigset, (void *)sigmask_arg->sigset))
|
2011-06-08 08:45:37 +08:00
|
|
|
goto out;
|
2012-08-22 21:34:11 +08:00
|
|
|
r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
|
|
|
|
} else
|
|
|
|
r = kvm_vcpu_ioctl_set_sigmask(vcpu, NULL);
|
2011-06-08 08:45:37 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
r = kvm_vcpu_ioctl(filp, ioctl, arg);
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2019-04-18 18:39:36 +08:00
|
|
|
static int kvm_device_mmap(struct file *filp, struct vm_area_struct *vma)
|
|
|
|
{
|
|
|
|
struct kvm_device *dev = filp->private_data;
|
|
|
|
|
|
|
|
if (dev->ops->mmap)
|
|
|
|
return dev->ops->mmap(dev, vma);
|
|
|
|
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
|
|
|
|
2013-04-12 22:08:42 +08:00
|
|
|
static int kvm_device_ioctl_attr(struct kvm_device *dev,
|
|
|
|
int (*accessor)(struct kvm_device *dev,
|
|
|
|
struct kvm_device_attr *attr),
|
|
|
|
unsigned long arg)
|
|
|
|
{
|
|
|
|
struct kvm_device_attr attr;
|
|
|
|
|
|
|
|
if (!accessor)
|
|
|
|
return -EPERM;
|
|
|
|
|
|
|
|
if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
|
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
return accessor(dev, &attr);
|
|
|
|
}
|
|
|
|
|
|
|
|
static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
|
|
|
|
unsigned long arg)
|
|
|
|
{
|
|
|
|
struct kvm_device *dev = filp->private_data;
|
|
|
|
|
2019-02-16 04:48:39 +08:00
|
|
|
if (dev->kvm->mm != current->mm)
|
|
|
|
return -EIO;
|
|
|
|
|
2013-04-12 22:08:42 +08:00
|
|
|
switch (ioctl) {
|
|
|
|
case KVM_SET_DEVICE_ATTR:
|
|
|
|
return kvm_device_ioctl_attr(dev, dev->ops->set_attr, arg);
|
|
|
|
case KVM_GET_DEVICE_ATTR:
|
|
|
|
return kvm_device_ioctl_attr(dev, dev->ops->get_attr, arg);
|
|
|
|
case KVM_HAS_DEVICE_ATTR:
|
|
|
|
return kvm_device_ioctl_attr(dev, dev->ops->has_attr, arg);
|
|
|
|
default:
|
|
|
|
if (dev->ops->ioctl)
|
|
|
|
return dev->ops->ioctl(dev, ioctl, arg);
|
|
|
|
|
|
|
|
return -ENOTTY;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_device_release(struct inode *inode, struct file *filp)
|
|
|
|
{
|
|
|
|
struct kvm_device *dev = filp->private_data;
|
|
|
|
struct kvm *kvm = dev->kvm;
|
|
|
|
|
2019-04-18 18:39:41 +08:00
|
|
|
if (dev->ops->release) {
|
|
|
|
mutex_lock(&kvm->lock);
|
|
|
|
list_del(&dev->vm_node);
|
|
|
|
dev->ops->release(dev);
|
|
|
|
mutex_unlock(&kvm->lock);
|
|
|
|
}
|
|
|
|
|
2013-04-12 22:08:42 +08:00
|
|
|
kvm_put_kvm(kvm);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct file_operations kvm_device_fops = {
|
|
|
|
.unlocked_ioctl = kvm_device_ioctl,
|
|
|
|
.release = kvm_device_release,
|
2018-06-17 17:16:21 +08:00
|
|
|
KVM_COMPAT(kvm_device_ioctl),
|
2019-04-18 18:39:36 +08:00
|
|
|
.mmap = kvm_device_mmap,
|
2013-04-12 22:08:42 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
struct kvm_device *kvm_device_from_filp(struct file *filp)
|
|
|
|
{
|
|
|
|
if (filp->f_op != &kvm_device_fops)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
return filp->private_data;
|
|
|
|
}
|
|
|
|
|
2014-09-02 17:27:33 +08:00
|
|
|
static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
|
2013-04-12 22:08:46 +08:00
|
|
|
#ifdef CONFIG_KVM_MPIC
|
2014-09-02 17:27:33 +08:00
|
|
|
[KVM_DEV_TYPE_FSL_MPIC_20] = &kvm_mpic_ops,
|
|
|
|
[KVM_DEV_TYPE_FSL_MPIC_42] = &kvm_mpic_ops,
|
2013-04-27 08:28:37 +08:00
|
|
|
#endif
|
2014-09-02 17:27:33 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type)
|
|
|
|
{
|
|
|
|
if (type >= ARRAY_SIZE(kvm_device_ops_table))
|
|
|
|
return -ENOSPC;
|
|
|
|
|
|
|
|
if (kvm_device_ops_table[type] != NULL)
|
|
|
|
return -EEXIST;
|
|
|
|
|
|
|
|
kvm_device_ops_table[type] = ops;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-10-09 18:30:08 +08:00
|
|
|
void kvm_unregister_device_ops(u32 type)
|
|
|
|
{
|
|
|
|
if (kvm_device_ops_table[type] != NULL)
|
|
|
|
kvm_device_ops_table[type] = NULL;
|
|
|
|
}
|
|
|
|
|
2013-04-12 22:08:42 +08:00
|
|
|
static int kvm_ioctl_create_device(struct kvm *kvm,
|
|
|
|
struct kvm_create_device *cd)
|
|
|
|
{
|
|
|
|
struct kvm_device_ops *ops = NULL;
|
|
|
|
struct kvm_device *dev;
|
|
|
|
bool test = cd->flags & KVM_CREATE_DEVICE_TEST;
|
|
|
|
int ret;
|
|
|
|
|
2014-09-02 17:27:33 +08:00
|
|
|
if (cd->type >= ARRAY_SIZE(kvm_device_ops_table))
|
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
ops = kvm_device_ops_table[cd->type];
|
|
|
|
if (ops == NULL)
|
2013-04-12 22:08:42 +08:00
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
if (test)
|
|
|
|
return 0;
|
|
|
|
|
2019-02-12 03:02:49 +08:00
|
|
|
dev = kzalloc(sizeof(*dev), GFP_KERNEL_ACCOUNT);
|
2013-04-12 22:08:42 +08:00
|
|
|
if (!dev)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
dev->ops = ops;
|
|
|
|
dev->kvm = kvm;
|
|
|
|
|
2016-08-10 01:13:01 +08:00
|
|
|
mutex_lock(&kvm->lock);
|
2013-04-12 22:08:42 +08:00
|
|
|
ret = ops->create(dev, cd->type);
|
|
|
|
if (ret < 0) {
|
2016-08-10 01:13:01 +08:00
|
|
|
mutex_unlock(&kvm->lock);
|
2013-04-12 22:08:42 +08:00
|
|
|
kfree(dev);
|
|
|
|
return ret;
|
|
|
|
}
|
2016-08-10 01:13:01 +08:00
|
|
|
list_add(&dev->vm_node, &kvm->devices);
|
|
|
|
mutex_unlock(&kvm->lock);
|
2013-04-12 22:08:42 +08:00
|
|
|
|
2016-08-10 01:13:00 +08:00
|
|
|
if (ops->init)
|
|
|
|
ops->init(dev);
|
|
|
|
|
2019-01-26 08:54:33 +08:00
|
|
|
kvm_get_kvm(kvm);
|
2013-08-25 04:14:07 +08:00
|
|
|
ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | O_CLOEXEC);
|
2013-04-12 22:08:42 +08:00
|
|
|
if (ret < 0) {
|
2019-01-26 08:54:33 +08:00
|
|
|
kvm_put_kvm(kvm);
|
2016-08-10 01:13:01 +08:00
|
|
|
mutex_lock(&kvm->lock);
|
|
|
|
list_del(&dev->vm_node);
|
|
|
|
mutex_unlock(&kvm->lock);
|
2016-12-01 03:21:05 +08:00
|
|
|
ops->destroy(dev);
|
2013-04-12 22:08:42 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
cd->fd = ret;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-07-15 00:33:08 +08:00
|
|
|
static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
|
|
|
|
{
|
|
|
|
switch (arg) {
|
|
|
|
case KVM_CAP_USER_MEMORY:
|
|
|
|
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
|
|
|
|
case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS:
|
|
|
|
case KVM_CAP_INTERNAL_ERROR_DATA:
|
|
|
|
#ifdef CONFIG_HAVE_KVM_MSI
|
|
|
|
case KVM_CAP_SIGNAL_MSI:
|
|
|
|
#endif
|
2014-06-30 18:51:13 +08:00
|
|
|
#ifdef CONFIG_HAVE_KVM_IRQFD
|
2015-03-05 18:54:46 +08:00
|
|
|
case KVM_CAP_IRQFD:
|
2014-07-15 00:33:08 +08:00
|
|
|
case KVM_CAP_IRQFD_RESAMPLE:
|
|
|
|
#endif
|
2015-09-15 14:41:59 +08:00
|
|
|
case KVM_CAP_IOEVENTFD_ANY_LENGTH:
|
2014-07-15 00:33:08 +08:00
|
|
|
case KVM_CAP_CHECK_EXTENSION_VM:
|
2017-02-16 17:40:56 +08:00
|
|
|
case KVM_CAP_ENABLE_CAP_VM:
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
|
|
|
|
case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:
|
|
|
|
#endif
|
2014-07-15 00:33:08 +08:00
|
|
|
return 1;
|
2017-03-31 19:53:23 +08:00
|
|
|
#ifdef CONFIG_KVM_MMIO
|
2017-03-31 19:53:22 +08:00
|
|
|
case KVM_CAP_COALESCED_MMIO:
|
|
|
|
return KVM_COALESCED_MMIO_PAGE_OFFSET;
|
2018-10-14 07:09:55 +08:00
|
|
|
case KVM_CAP_COALESCED_PIO:
|
|
|
|
return 1;
|
2017-03-31 19:53:22 +08:00
|
|
|
#endif
|
2014-07-15 00:33:08 +08:00
|
|
|
#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
|
|
|
|
case KVM_CAP_IRQ_ROUTING:
|
|
|
|
return KVM_MAX_IRQ_ROUTES;
|
2015-05-17 23:30:37 +08:00
|
|
|
#endif
|
|
|
|
#if KVM_ADDRESS_SPACE_NUM > 1
|
|
|
|
case KVM_CAP_MULTI_ADDRESS_SPACE:
|
|
|
|
return KVM_ADDRESS_SPACE_NUM;
|
2014-07-15 00:33:08 +08:00
|
|
|
#endif
|
2016-05-10 00:13:37 +08:00
|
|
|
case KVM_CAP_MAX_VCPU_ID:
|
|
|
|
return KVM_MAX_VCPU_ID;
|
2014-07-15 00:33:08 +08:00
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return kvm_vm_ioctl_check_extension(kvm, arg);
|
|
|
|
}
|
|
|
|
|
2017-02-16 17:40:56 +08:00
|
|
|
int __attribute__((weak)) kvm_vm_ioctl_enable_cap(struct kvm *kvm,
|
|
|
|
struct kvm_enable_cap *cap)
|
|
|
|
{
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
|
|
|
|
struct kvm_enable_cap *cap)
|
|
|
|
{
|
|
|
|
switch (cap->cap) {
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
|
|
|
|
case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:
|
|
|
|
if (cap->flags || (cap->args[0] & ~1))
|
|
|
|
return -EINVAL;
|
|
|
|
kvm->manual_dirty_log_protect = cap->args[0];
|
|
|
|
return 0;
|
|
|
|
#endif
|
2017-02-16 17:40:56 +08:00
|
|
|
default:
|
|
|
|
return kvm_vm_ioctl_enable_cap(kvm, cap);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-02-22 00:04:26 +08:00
|
|
|
static long kvm_vm_ioctl(struct file *filp,
|
|
|
|
unsigned int ioctl, unsigned long arg)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = filp->private_data;
|
|
|
|
void __user *argp = (void __user *)arg;
|
2007-10-29 23:08:35 +08:00
|
|
|
int r;
|
2007-02-22 00:04:26 +08:00
|
|
|
|
2007-11-21 22:41:05 +08:00
|
|
|
if (kvm->mm != current->mm)
|
|
|
|
return -EIO;
|
2007-02-22 00:04:26 +08:00
|
|
|
switch (ioctl) {
|
|
|
|
case KVM_CREATE_VCPU:
|
|
|
|
r = kvm_vm_ioctl_create_vcpu(kvm, arg);
|
|
|
|
break;
|
2017-02-16 17:40:56 +08:00
|
|
|
case KVM_ENABLE_CAP: {
|
|
|
|
struct kvm_enable_cap cap;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&cap, argp, sizeof(cap)))
|
|
|
|
goto out;
|
|
|
|
r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap);
|
|
|
|
break;
|
|
|
|
}
|
2007-10-10 01:20:39 +08:00
|
|
|
case KVM_SET_USER_MEMORY_REGION: {
|
|
|
|
struct kvm_userspace_memory_region kvm_userspace_mem;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&kvm_userspace_mem, argp,
|
2015-02-26 14:58:19 +08:00
|
|
|
sizeof(kvm_userspace_mem)))
|
2007-10-10 01:20:39 +08:00
|
|
|
goto out;
|
|
|
|
|
2013-02-27 18:43:00 +08:00
|
|
|
r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_GET_DIRTY_LOG: {
|
|
|
|
struct kvm_dirty_log log;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&log, argp, sizeof(log)))
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
goto out;
|
2007-02-21 00:27:58 +08:00
|
|
|
r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
break;
|
|
|
|
}
|
kvm: introduce manual dirty log reprotect
There are two problems with KVM_GET_DIRTY_LOG. First, and less important,
it can take kvm->mmu_lock for an extended period of time. Second, its user
can actually see many false positives in some cases. The latter is due
to a benign race like this:
1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
them.
2. The guest modifies the pages, causing them to be marked ditry.
3. Userspace actually copies the pages.
4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
they were not written to since (3).
This is especially a problem for large guests, where the time between
(1) and (3) can be substantial. This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns. Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-23 08:36:47 +08:00
|
|
|
#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
|
|
|
|
case KVM_CLEAR_DIRTY_LOG: {
|
|
|
|
struct kvm_clear_dirty_log log;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&log, argp, sizeof(log)))
|
|
|
|
goto out;
|
|
|
|
r = kvm_vm_ioctl_clear_dirty_log(kvm, &log);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
#endif
|
2017-03-31 19:53:23 +08:00
|
|
|
#ifdef CONFIG_KVM_MMIO
|
2008-05-30 22:05:54 +08:00
|
|
|
case KVM_REGISTER_COALESCED_MMIO: {
|
|
|
|
struct kvm_coalesced_mmio_zone zone;
|
2015-02-26 14:58:23 +08:00
|
|
|
|
2008-05-30 22:05:54 +08:00
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&zone, argp, sizeof(zone)))
|
2008-05-30 22:05:54 +08:00
|
|
|
goto out;
|
|
|
|
r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case KVM_UNREGISTER_COALESCED_MMIO: {
|
|
|
|
struct kvm_coalesced_mmio_zone zone;
|
2015-02-26 14:58:23 +08:00
|
|
|
|
2008-05-30 22:05:54 +08:00
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&zone, argp, sizeof(zone)))
|
2008-05-30 22:05:54 +08:00
|
|
|
goto out;
|
|
|
|
r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
#endif
|
2009-05-20 22:30:49 +08:00
|
|
|
case KVM_IRQFD: {
|
|
|
|
struct kvm_irqfd data;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&data, argp, sizeof(data)))
|
2009-05-20 22:30:49 +08:00
|
|
|
goto out;
|
2012-06-29 23:56:08 +08:00
|
|
|
r = kvm_irqfd(kvm, &data);
|
2009-05-20 22:30:49 +08:00
|
|
|
break;
|
|
|
|
}
|
KVM: add ioeventfd support
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.
Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.
However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible. All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.
Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.
To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().
We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl(). The other is direct via
ioeventfd.
You can download this test harness here:
ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
The measured results are as follows:
qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt
I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:
qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt
these are just for fun, for now, until I can gather more data.
Here is a graph for your convenience:
http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
The conclusion to draw is that we save about 4us by skipping the userspace
hop.
--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-07-08 05:08:49 +08:00
|
|
|
case KVM_IOEVENTFD: {
|
|
|
|
struct kvm_ioeventfd data;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&data, argp, sizeof(data)))
|
KVM: add ioeventfd support
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest. Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.
Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.
However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc). For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible. All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling. This adds additional computational load on the
system, as well as latency to the signalling path.
Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd. This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.
To test this theory, we built a test-harness called "doorbell". This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled. It supports signalling
from either an eventfd, or an ioctl().
We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl(). The other is direct via
ioeventfd.
You can download this test harness here:
ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2
The measured results are as follows:
qemu-mmio: 110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio: 367300 iops, 2.72us rtt
I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy. However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:
qemu-pio: 153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt
these are just for fun, for now, until I can gather more data.
Here is a graph for your convenience:
http://developer.novell.com/wiki/images/7/76/Iofd-chart.png
The conclusion to draw is that we save about 4us by skipping the userspace
hop.
--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-07-08 05:08:49 +08:00
|
|
|
goto out;
|
|
|
|
r = kvm_ioeventfd(kvm, &data);
|
|
|
|
break;
|
|
|
|
}
|
2012-03-30 03:14:12 +08:00
|
|
|
#ifdef CONFIG_HAVE_KVM_MSI
|
|
|
|
case KVM_SIGNAL_MSI: {
|
|
|
|
struct kvm_msi msi;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&msi, argp, sizeof(msi)))
|
2012-03-30 03:14:12 +08:00
|
|
|
goto out;
|
|
|
|
r = kvm_send_userspace_msi(kvm, &msi);
|
|
|
|
break;
|
|
|
|
}
|
2012-07-24 20:51:20 +08:00
|
|
|
#endif
|
|
|
|
#ifdef __KVM_HAVE_IRQ_LINE
|
|
|
|
case KVM_IRQ_LINE_STATUS:
|
|
|
|
case KVM_IRQ_LINE: {
|
|
|
|
struct kvm_irq_level irq_event;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_from_user(&irq_event, argp, sizeof(irq_event)))
|
2012-07-24 20:51:20 +08:00
|
|
|
goto out;
|
|
|
|
|
2013-04-11 19:21:40 +08:00
|
|
|
r = kvm_vm_ioctl_irq_line(kvm, &irq_event,
|
|
|
|
ioctl == KVM_IRQ_LINE_STATUS);
|
2012-07-24 20:51:20 +08:00
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (ioctl == KVM_IRQ_LINE_STATUS) {
|
2015-02-26 14:58:19 +08:00
|
|
|
if (copy_to_user(argp, &irq_event, sizeof(irq_event)))
|
2012-07-24 20:51:20 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
}
|
2009-06-09 20:56:28 +08:00
|
|
|
#endif
|
2013-04-16 03:12:53 +08:00
|
|
|
#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
|
|
|
|
case KVM_SET_GSI_ROUTING: {
|
|
|
|
struct kvm_irq_routing routing;
|
|
|
|
struct kvm_irq_routing __user *urouting;
|
2016-06-01 20:09:22 +08:00
|
|
|
struct kvm_irq_routing_entry *entries = NULL;
|
2013-04-16 03:12:53 +08:00
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&routing, argp, sizeof(routing)))
|
|
|
|
goto out;
|
|
|
|
r = -EINVAL;
|
2017-04-28 23:06:20 +08:00
|
|
|
if (!kvm_arch_can_set_irq_routing(kvm))
|
|
|
|
goto out;
|
kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
These days, we experienced one guest crash with 8 cores and 3 disks,
with qemu error logs as bellow:
qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
And then we found one patch(bdf026317d) in qemu tree, which said
could fix this bug.
Execute the following script will reproduce the BUG quickly:
irq_affinity.sh
========================================================================
vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
for irq in {1,2,4,8,10,20,40,80}
do
echo $irq > /proc/irq/$vda_irq_num/smp_affinity
echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
done
done
========================================================================
The following qemu log is added in the qemu code and is displayed when
this bug reproduced:
kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
irq_routes->nr: 1024, gsi_count: 1024.
That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
but in the kernel code when routes->nr >= 1024, will just return -EINVAL;
The nr is the number of the routing entries which is in of
[1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].
This patch fix the BUG above.
Cc: stable@vger.kernel.org
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com>
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-15 18:00:33 +08:00
|
|
|
if (routing.nr > KVM_MAX_IRQ_ROUTES)
|
2013-04-16 03:12:53 +08:00
|
|
|
goto out;
|
|
|
|
if (routing.flags)
|
|
|
|
goto out;
|
2016-06-01 20:09:22 +08:00
|
|
|
if (routing.nr) {
|
|
|
|
r = -ENOMEM;
|
treewide: Use array_size() in vmalloc()
The vmalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:
vmalloc(a * b)
with:
vmalloc(array_size(a, b))
as well as handling cases of:
vmalloc(a * b * c)
with:
vmalloc(array3_size(a, b, c))
This does, however, attempt to ignore constant size factors like:
vmalloc(4 * 1024)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
vmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
vmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
vmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
vmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
vmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
vmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
vmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
vmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
vmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
vmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
vmalloc(
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vmalloc(
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vmalloc(
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
vmalloc(
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
vmalloc(
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
vmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
vmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
vmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
vmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
vmalloc(C1 * C2 * C3, ...)
|
vmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@
(
vmalloc(C1 * C2, ...)
|
vmalloc(
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13 05:27:11 +08:00
|
|
|
entries = vmalloc(array_size(sizeof(*entries),
|
|
|
|
routing.nr));
|
2016-06-01 20:09:22 +08:00
|
|
|
if (!entries)
|
|
|
|
goto out;
|
|
|
|
r = -EFAULT;
|
|
|
|
urouting = argp;
|
|
|
|
if (copy_from_user(entries, urouting->entries,
|
|
|
|
routing.nr * sizeof(*entries)))
|
|
|
|
goto out_free_irq_routing;
|
|
|
|
}
|
2013-04-16 03:12:53 +08:00
|
|
|
r = kvm_set_irq_routing(kvm, entries, routing.nr,
|
|
|
|
routing.flags);
|
2015-02-26 14:58:20 +08:00
|
|
|
out_free_irq_routing:
|
2013-04-16 03:12:53 +08:00
|
|
|
vfree(entries);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
#endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
|
2013-04-12 22:08:42 +08:00
|
|
|
case KVM_CREATE_DEVICE: {
|
|
|
|
struct kvm_create_device cd;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_from_user(&cd, argp, sizeof(cd)))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
r = kvm_ioctl_create_device(kvm, &cd);
|
|
|
|
if (r)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
r = -EFAULT;
|
|
|
|
if (copy_to_user(argp, &cd, sizeof(cd)))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
}
|
2014-07-15 00:33:08 +08:00
|
|
|
case KVM_CHECK_EXTENSION:
|
|
|
|
r = kvm_vm_ioctl_check_extension_generic(kvm, arg);
|
|
|
|
break;
|
2007-02-22 01:28:04 +08:00
|
|
|
default:
|
2007-10-29 23:08:35 +08:00
|
|
|
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
|
2007-02-22 01:28:04 +08:00
|
|
|
}
|
|
|
|
out:
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2015-02-03 16:35:15 +08:00
|
|
|
#ifdef CONFIG_KVM_COMPAT
|
2009-10-22 20:19:27 +08:00
|
|
|
struct compat_kvm_dirty_log {
|
|
|
|
__u32 slot;
|
|
|
|
__u32 padding1;
|
|
|
|
union {
|
|
|
|
compat_uptr_t dirty_bitmap; /* one bit per page */
|
|
|
|
__u64 padding2;
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
|
|
|
static long kvm_vm_compat_ioctl(struct file *filp,
|
|
|
|
unsigned int ioctl, unsigned long arg)
|
|
|
|
{
|
|
|
|
struct kvm *kvm = filp->private_data;
|
|
|
|
int r;
|
|
|
|
|
|
|
|
if (kvm->mm != current->mm)
|
|
|
|
return -EIO;
|
|
|
|
switch (ioctl) {
|
|
|
|
case KVM_GET_DIRTY_LOG: {
|
|
|
|
struct compat_kvm_dirty_log compat_log;
|
|
|
|
struct kvm_dirty_log log;
|
|
|
|
|
|
|
|
if (copy_from_user(&compat_log, (void __user *)arg,
|
|
|
|
sizeof(compat_log)))
|
2017-01-22 18:30:21 +08:00
|
|
|
return -EFAULT;
|
2009-10-22 20:19:27 +08:00
|
|
|
log.slot = compat_log.slot;
|
|
|
|
log.padding1 = compat_log.padding1;
|
|
|
|
log.padding2 = compat_log.padding2;
|
|
|
|
log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
|
|
|
|
|
|
|
|
r = kvm_vm_ioctl_get_dirty_log(kvm, &log);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
r = kvm_vm_ioctl(filp, ioctl, arg);
|
|
|
|
}
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2008-12-02 18:17:32 +08:00
|
|
|
static struct file_operations kvm_vm_fops = {
|
2007-02-22 01:28:04 +08:00
|
|
|
.release = kvm_vm_release,
|
|
|
|
.unlocked_ioctl = kvm_vm_ioctl,
|
llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-08-16 00:52:59 +08:00
|
|
|
.llseek = noop_llseek,
|
2018-06-17 17:16:21 +08:00
|
|
|
KVM_COMPAT(kvm_vm_compat_ioctl),
|
2007-02-22 01:28:04 +08:00
|
|
|
};
|
|
|
|
|
2012-01-04 17:25:20 +08:00
|
|
|
static int kvm_dev_ioctl_create_vm(unsigned long type)
|
2007-02-22 01:28:04 +08:00
|
|
|
{
|
2010-10-27 23:22:10 +08:00
|
|
|
int r;
|
2007-02-22 01:28:04 +08:00
|
|
|
struct kvm *kvm;
|
2016-07-15 00:54:17 +08:00
|
|
|
struct file *file;
|
2007-02-22 01:28:04 +08:00
|
|
|
|
2012-01-04 17:25:20 +08:00
|
|
|
kvm = kvm_create_vm(type);
|
2007-06-28 20:38:16 +08:00
|
|
|
if (IS_ERR(kvm))
|
|
|
|
return PTR_ERR(kvm);
|
2017-03-31 19:53:23 +08:00
|
|
|
#ifdef CONFIG_KVM_MMIO
|
2010-03-15 21:13:30 +08:00
|
|
|
r = kvm_coalesced_mmio_init(kvm);
|
2017-11-21 20:40:17 +08:00
|
|
|
if (r < 0)
|
|
|
|
goto put_kvm;
|
2010-03-15 21:13:30 +08:00
|
|
|
#endif
|
2016-07-15 00:54:17 +08:00
|
|
|
r = get_unused_fd_flags(O_CLOEXEC);
|
2017-11-21 20:40:17 +08:00
|
|
|
if (r < 0)
|
|
|
|
goto put_kvm;
|
|
|
|
|
2016-07-15 00:54:17 +08:00
|
|
|
file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
|
|
|
|
if (IS_ERR(file)) {
|
|
|
|
put_unused_fd(r);
|
2017-11-21 20:40:17 +08:00
|
|
|
r = PTR_ERR(file);
|
|
|
|
goto put_kvm;
|
2016-07-15 00:54:17 +08:00
|
|
|
}
|
2016-05-18 19:26:23 +08:00
|
|
|
|
2017-06-27 21:45:09 +08:00
|
|
|
/*
|
|
|
|
* Don't call kvm_put_kvm anymore at this point; file->f_op is
|
|
|
|
* already set, with ->release() being kvm_vm_release(). In error
|
|
|
|
* cases it will be called by the final fput(file) and will take
|
|
|
|
* care of doing kvm_put_kvm(kvm).
|
|
|
|
*/
|
2016-05-18 19:26:23 +08:00
|
|
|
if (kvm_create_vm_debugfs(kvm, r) < 0) {
|
2016-07-15 00:54:17 +08:00
|
|
|
put_unused_fd(r);
|
|
|
|
fput(file);
|
2016-05-18 19:26:23 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
}
|
2017-07-12 23:56:44 +08:00
|
|
|
kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
|
2007-02-22 01:28:04 +08:00
|
|
|
|
2016-07-15 00:54:17 +08:00
|
|
|
fd_install(r, file);
|
2010-10-27 23:22:10 +08:00
|
|
|
return r;
|
2017-11-21 20:40:17 +08:00
|
|
|
|
|
|
|
put_kvm:
|
|
|
|
kvm_put_kvm(kvm);
|
|
|
|
return r;
|
2007-02-22 01:28:04 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static long kvm_dev_ioctl(struct file *filp,
|
|
|
|
unsigned int ioctl, unsigned long arg)
|
|
|
|
{
|
2007-03-07 19:05:38 +08:00
|
|
|
long r = -EINVAL;
|
2007-02-22 01:28:04 +08:00
|
|
|
|
|
|
|
switch (ioctl) {
|
|
|
|
case KVM_GET_API_VERSION:
|
2007-03-07 19:11:17 +08:00
|
|
|
if (arg)
|
|
|
|
goto out;
|
2007-02-22 01:28:04 +08:00
|
|
|
r = KVM_API_VERSION;
|
|
|
|
break;
|
|
|
|
case KVM_CREATE_VM:
|
2012-01-04 17:25:20 +08:00
|
|
|
r = kvm_dev_ioctl_create_vm(arg);
|
2007-02-22 01:28:04 +08:00
|
|
|
break;
|
2007-11-15 23:07:47 +08:00
|
|
|
case KVM_CHECK_EXTENSION:
|
2014-07-15 00:27:35 +08:00
|
|
|
r = kvm_vm_ioctl_check_extension_generic(NULL, arg);
|
2007-03-01 23:56:20 +08:00
|
|
|
break;
|
2007-03-07 19:05:38 +08:00
|
|
|
case KVM_GET_VCPU_MMAP_SIZE:
|
|
|
|
if (arg)
|
|
|
|
goto out;
|
2008-01-24 21:13:08 +08:00
|
|
|
r = PAGE_SIZE; /* struct kvm_run */
|
|
|
|
#ifdef CONFIG_X86
|
|
|
|
r += PAGE_SIZE; /* pio data page */
|
2008-05-30 22:05:54 +08:00
|
|
|
#endif
|
2017-03-31 19:53:23 +08:00
|
|
|
#ifdef CONFIG_KVM_MMIO
|
2008-05-30 22:05:54 +08:00
|
|
|
r += PAGE_SIZE; /* coalesced mmio ring page */
|
2008-01-24 21:13:08 +08:00
|
|
|
#endif
|
2007-03-07 19:05:38 +08:00
|
|
|
break;
|
2008-04-10 20:47:53 +08:00
|
|
|
case KVM_TRACE_ENABLE:
|
|
|
|
case KVM_TRACE_PAUSE:
|
|
|
|
case KVM_TRACE_DISABLE:
|
2009-06-18 22:47:28 +08:00
|
|
|
r = -EOPNOTSUPP;
|
2008-04-10 20:47:53 +08:00
|
|
|
break;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
default:
|
2007-10-10 23:16:19 +08:00
|
|
|
return kvm_arch_dev_ioctl(filp, ioctl, arg);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
out:
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct file_operations kvm_chardev_ops = {
|
|
|
|
.unlocked_ioctl = kvm_dev_ioctl,
|
llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-08-16 00:52:59 +08:00
|
|
|
.llseek = noop_llseek,
|
2018-06-17 17:16:21 +08:00
|
|
|
KVM_COMPAT(kvm_dev_ioctl),
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static struct miscdevice kvm_dev = {
|
2007-03-04 19:27:36 +08:00
|
|
|
KVM_MINOR,
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
"kvm",
|
|
|
|
&kvm_chardev_ops,
|
|
|
|
};
|
|
|
|
|
2010-11-16 16:37:41 +08:00
|
|
|
static void hardware_enable_nolock(void *junk)
|
2007-05-24 18:03:52 +08:00
|
|
|
{
|
|
|
|
int cpu = raw_smp_processor_id();
|
2009-09-15 17:37:46 +08:00
|
|
|
int r;
|
2007-05-24 18:03:52 +08:00
|
|
|
|
2008-12-07 18:55:45 +08:00
|
|
|
if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
|
2007-05-24 18:03:52 +08:00
|
|
|
return;
|
2009-09-15 17:37:46 +08:00
|
|
|
|
2008-12-07 18:55:45 +08:00
|
|
|
cpumask_set_cpu(cpu, cpus_hardware_enabled);
|
2009-09-15 17:37:46 +08:00
|
|
|
|
2014-08-28 21:13:03 +08:00
|
|
|
r = kvm_arch_hardware_enable();
|
2009-09-15 17:37:46 +08:00
|
|
|
|
|
|
|
if (r) {
|
|
|
|
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
|
|
|
|
atomic_inc(&hardware_enable_failed);
|
2015-02-26 14:58:26 +08:00
|
|
|
pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
|
2009-09-15 17:37:46 +08:00
|
|
|
}
|
2007-05-24 18:03:52 +08:00
|
|
|
}
|
|
|
|
|
2016-07-14 01:16:37 +08:00
|
|
|
static int kvm_starting_cpu(unsigned int cpu)
|
2010-11-16 16:37:41 +08:00
|
|
|
{
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_lock(&kvm_count_lock);
|
2013-09-10 18:57:17 +08:00
|
|
|
if (kvm_usage_count)
|
|
|
|
hardware_enable_nolock(NULL);
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_unlock(&kvm_count_lock);
|
2016-07-14 01:16:37 +08:00
|
|
|
return 0;
|
2010-11-16 16:37:41 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void hardware_disable_nolock(void *junk)
|
2007-05-24 18:03:52 +08:00
|
|
|
{
|
|
|
|
int cpu = raw_smp_processor_id();
|
|
|
|
|
2008-12-07 18:55:45 +08:00
|
|
|
if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
|
2007-05-24 18:03:52 +08:00
|
|
|
return;
|
2008-12-07 18:55:45 +08:00
|
|
|
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
|
2014-08-28 21:13:03 +08:00
|
|
|
kvm_arch_hardware_disable();
|
2007-05-24 18:03:52 +08:00
|
|
|
}
|
|
|
|
|
2016-07-14 01:16:37 +08:00
|
|
|
static int kvm_dying_cpu(unsigned int cpu)
|
2010-11-16 16:37:41 +08:00
|
|
|
{
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_lock(&kvm_count_lock);
|
2013-09-10 18:57:17 +08:00
|
|
|
if (kvm_usage_count)
|
|
|
|
hardware_disable_nolock(NULL);
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_unlock(&kvm_count_lock);
|
2016-07-14 01:16:37 +08:00
|
|
|
return 0;
|
2010-11-16 16:37:41 +08:00
|
|
|
}
|
|
|
|
|
2009-09-15 17:37:46 +08:00
|
|
|
static void hardware_disable_all_nolock(void)
|
|
|
|
{
|
|
|
|
BUG_ON(!kvm_usage_count);
|
|
|
|
|
|
|
|
kvm_usage_count--;
|
|
|
|
if (!kvm_usage_count)
|
2010-11-16 16:37:41 +08:00
|
|
|
on_each_cpu(hardware_disable_nolock, NULL, 1);
|
2009-09-15 17:37:46 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void hardware_disable_all(void)
|
|
|
|
{
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_lock(&kvm_count_lock);
|
2009-09-15 17:37:46 +08:00
|
|
|
hardware_disable_all_nolock();
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_unlock(&kvm_count_lock);
|
2009-09-15 17:37:46 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int hardware_enable_all(void)
|
|
|
|
{
|
|
|
|
int r = 0;
|
|
|
|
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_lock(&kvm_count_lock);
|
2009-09-15 17:37:46 +08:00
|
|
|
|
|
|
|
kvm_usage_count++;
|
|
|
|
if (kvm_usage_count == 1) {
|
|
|
|
atomic_set(&hardware_enable_failed, 0);
|
2010-11-16 16:37:41 +08:00
|
|
|
on_each_cpu(hardware_enable_nolock, NULL, 1);
|
2009-09-15 17:37:46 +08:00
|
|
|
|
|
|
|
if (atomic_read(&hardware_enable_failed)) {
|
|
|
|
hardware_disable_all_nolock();
|
|
|
|
r = -EBUSY;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-09-10 18:58:35 +08:00
|
|
|
raw_spin_unlock(&kvm_count_lock);
|
2009-09-15 17:37:46 +08:00
|
|
|
|
|
|
|
return r;
|
|
|
|
}
|
|
|
|
|
2007-07-17 21:17:55 +08:00
|
|
|
static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
|
2007-10-08 21:02:08 +08:00
|
|
|
void *v)
|
2007-07-17 21:17:55 +08:00
|
|
|
{
|
2009-04-29 11:09:04 +08:00
|
|
|
/*
|
|
|
|
* Some (well, at least mine) BIOSes hang on reboot if
|
|
|
|
* in vmx root mode.
|
|
|
|
*
|
|
|
|
* And Intel TXT required VMX off for all cpu when system shutdown.
|
|
|
|
*/
|
2015-02-26 14:58:26 +08:00
|
|
|
pr_info("kvm: exiting hardware virtualization\n");
|
2009-04-29 11:09:04 +08:00
|
|
|
kvm_rebooting = true;
|
2010-11-16 16:37:41 +08:00
|
|
|
on_each_cpu(hardware_disable_nolock, NULL, 1);
|
2007-07-17 21:17:55 +08:00
|
|
|
return NOTIFY_OK;
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct notifier_block kvm_reboot_notifier = {
|
|
|
|
.notifier_call = kvm_reboot,
|
|
|
|
.priority = 0,
|
|
|
|
};
|
|
|
|
|
2009-12-24 00:35:24 +08:00
|
|
|
static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
|
2007-06-01 02:08:53 +08:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < bus->dev_count; i++) {
|
2011-07-27 21:00:48 +08:00
|
|
|
struct kvm_io_device *pos = bus->range[i].dev;
|
2007-06-01 02:08:53 +08:00
|
|
|
|
|
|
|
kvm_iodevice_destructor(pos);
|
|
|
|
}
|
2009-12-24 00:35:24 +08:00
|
|
|
kfree(bus);
|
2007-06-01 02:08:53 +08:00
|
|
|
}
|
|
|
|
|
2013-08-27 21:41:41 +08:00
|
|
|
static inline int kvm_io_bus_cmp(const struct kvm_io_range *r1,
|
2015-02-26 14:58:25 +08:00
|
|
|
const struct kvm_io_range *r2)
|
2011-07-27 21:00:48 +08:00
|
|
|
{
|
2015-09-15 14:41:57 +08:00
|
|
|
gpa_t addr1 = r1->addr;
|
|
|
|
gpa_t addr2 = r2->addr;
|
|
|
|
|
|
|
|
if (addr1 < addr2)
|
2011-07-27 21:00:48 +08:00
|
|
|
return -1;
|
2015-09-15 14:41:57 +08:00
|
|
|
|
|
|
|
/* If r2->len == 0, match the exact address. If r2->len != 0,
|
|
|
|
* accept any overlapping write. Any order is acceptable for
|
|
|
|
* overlapping ranges, because kvm_io_bus_get_first_dev ensures
|
|
|
|
* we process all of them.
|
|
|
|
*/
|
|
|
|
if (r2->len) {
|
|
|
|
addr1 += r1->len;
|
|
|
|
addr2 += r2->len;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (addr1 > addr2)
|
2011-07-27 21:00:48 +08:00
|
|
|
return 1;
|
2015-09-15 14:41:57 +08:00
|
|
|
|
2011-07-27 21:00:48 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-07-16 19:03:29 +08:00
|
|
|
static int kvm_io_bus_sort_cmp(const void *p1, const void *p2)
|
|
|
|
{
|
2013-08-27 21:41:41 +08:00
|
|
|
return kvm_io_bus_cmp(p1, p2);
|
2013-07-16 19:03:29 +08:00
|
|
|
}
|
|
|
|
|
2013-04-06 03:20:30 +08:00
|
|
|
static int kvm_io_bus_get_first_dev(struct kvm_io_bus *bus,
|
2011-07-27 21:00:48 +08:00
|
|
|
gpa_t addr, int len)
|
|
|
|
{
|
|
|
|
struct kvm_io_range *range, key;
|
|
|
|
int off;
|
|
|
|
|
|
|
|
key = (struct kvm_io_range) {
|
|
|
|
.addr = addr,
|
|
|
|
.len = len,
|
|
|
|
};
|
|
|
|
|
|
|
|
range = bsearch(&key, bus->range, bus->dev_count,
|
|
|
|
sizeof(struct kvm_io_range), kvm_io_bus_sort_cmp);
|
|
|
|
if (range == NULL)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
off = range - bus->range;
|
|
|
|
|
2013-08-27 21:41:41 +08:00
|
|
|
while (off > 0 && kvm_io_bus_cmp(&key, &bus->range[off-1]) == 0)
|
2011-07-27 21:00:48 +08:00
|
|
|
off--;
|
|
|
|
|
|
|
|
return off;
|
|
|
|
}
|
|
|
|
|
2015-03-26 22:39:28 +08:00
|
|
|
static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
|
2013-07-03 22:30:53 +08:00
|
|
|
struct kvm_io_range *range, const void *val)
|
|
|
|
{
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
|
|
|
|
if (idx < 0)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
while (idx < bus->dev_count &&
|
2013-08-27 21:41:41 +08:00
|
|
|
kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
|
2015-03-26 22:39:28 +08:00
|
|
|
if (!kvm_iodevice_write(vcpu, bus->range[idx].dev, range->addr,
|
2013-07-03 22:30:53 +08:00
|
|
|
range->len, val))
|
|
|
|
return idx;
|
|
|
|
idx++;
|
|
|
|
}
|
|
|
|
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
2009-06-30 03:24:32 +08:00
|
|
|
/* kvm_io_bus_write - called under kvm->slots_lock */
|
2015-03-26 22:39:28 +08:00
|
|
|
int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
|
2009-06-30 03:24:32 +08:00
|
|
|
int len, const void *val)
|
2007-06-01 02:08:53 +08:00
|
|
|
{
|
2010-04-19 17:41:23 +08:00
|
|
|
struct kvm_io_bus *bus;
|
2011-07-27 21:00:48 +08:00
|
|
|
struct kvm_io_range range;
|
2013-07-03 22:30:53 +08:00
|
|
|
int r;
|
2011-07-27 21:00:48 +08:00
|
|
|
|
|
|
|
range = (struct kvm_io_range) {
|
|
|
|
.addr = addr,
|
|
|
|
.len = len,
|
|
|
|
};
|
2010-04-19 17:41:23 +08:00
|
|
|
|
2015-03-26 22:39:28 +08:00
|
|
|
bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
|
2017-03-24 01:24:19 +08:00
|
|
|
if (!bus)
|
|
|
|
return -ENOMEM;
|
2015-03-26 22:39:28 +08:00
|
|
|
r = __kvm_io_bus_write(vcpu, bus, &range, val);
|
2013-07-03 22:30:53 +08:00
|
|
|
return r < 0 ? r : 0;
|
|
|
|
}
|
2019-02-22 16:10:09 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_io_bus_write);
|
2013-07-03 22:30:53 +08:00
|
|
|
|
|
|
|
/* kvm_io_bus_write_cookie - called under kvm->slots_lock */
|
2015-03-26 22:39:28 +08:00
|
|
|
int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
|
|
|
|
gpa_t addr, int len, const void *val, long cookie)
|
2013-07-03 22:30:53 +08:00
|
|
|
{
|
|
|
|
struct kvm_io_bus *bus;
|
|
|
|
struct kvm_io_range range;
|
|
|
|
|
|
|
|
range = (struct kvm_io_range) {
|
|
|
|
.addr = addr,
|
|
|
|
.len = len,
|
|
|
|
};
|
|
|
|
|
2015-03-26 22:39:28 +08:00
|
|
|
bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
|
2017-03-24 01:24:19 +08:00
|
|
|
if (!bus)
|
|
|
|
return -ENOMEM;
|
2013-07-03 22:30:53 +08:00
|
|
|
|
|
|
|
/* First try the device referenced by cookie. */
|
|
|
|
if ((cookie >= 0) && (cookie < bus->dev_count) &&
|
2013-08-27 21:41:41 +08:00
|
|
|
(kvm_io_bus_cmp(&range, &bus->range[cookie]) == 0))
|
2015-03-26 22:39:28 +08:00
|
|
|
if (!kvm_iodevice_write(vcpu, bus->range[cookie].dev, addr, len,
|
2013-07-03 22:30:53 +08:00
|
|
|
val))
|
|
|
|
return cookie;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* cookie contained garbage; fall back to search and return the
|
|
|
|
* correct cookie value.
|
|
|
|
*/
|
2015-03-26 22:39:28 +08:00
|
|
|
return __kvm_io_bus_write(vcpu, bus, &range, val);
|
2013-07-03 22:30:53 +08:00
|
|
|
}
|
|
|
|
|
2015-03-26 22:39:28 +08:00
|
|
|
static int __kvm_io_bus_read(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
|
|
|
|
struct kvm_io_range *range, void *val)
|
2013-07-03 22:30:53 +08:00
|
|
|
{
|
|
|
|
int idx;
|
|
|
|
|
|
|
|
idx = kvm_io_bus_get_first_dev(bus, range->addr, range->len);
|
2011-07-27 21:00:48 +08:00
|
|
|
if (idx < 0)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
while (idx < bus->dev_count &&
|
2013-08-27 21:41:41 +08:00
|
|
|
kvm_io_bus_cmp(range, &bus->range[idx]) == 0) {
|
2015-03-26 22:39:28 +08:00
|
|
|
if (!kvm_iodevice_read(vcpu, bus->range[idx].dev, range->addr,
|
2013-07-03 22:30:53 +08:00
|
|
|
range->len, val))
|
|
|
|
return idx;
|
2011-07-27 21:00:48 +08:00
|
|
|
idx++;
|
|
|
|
}
|
|
|
|
|
2009-06-30 03:24:32 +08:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
2007-06-01 02:08:53 +08:00
|
|
|
|
2009-06-30 03:24:32 +08:00
|
|
|
/* kvm_io_bus_read - called under kvm->slots_lock */
|
2015-03-26 22:39:28 +08:00
|
|
|
int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
|
2009-12-24 00:35:24 +08:00
|
|
|
int len, void *val)
|
2009-06-30 03:24:32 +08:00
|
|
|
{
|
2010-04-19 17:41:23 +08:00
|
|
|
struct kvm_io_bus *bus;
|
2011-07-27 21:00:48 +08:00
|
|
|
struct kvm_io_range range;
|
2013-07-03 22:30:53 +08:00
|
|
|
int r;
|
2011-07-27 21:00:48 +08:00
|
|
|
|
|
|
|
range = (struct kvm_io_range) {
|
|
|
|
.addr = addr,
|
|
|
|
.len = len,
|
|
|
|
};
|
2009-12-24 00:35:24 +08:00
|
|
|
|
2015-03-26 22:39:28 +08:00
|
|
|
bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
|
2017-03-24 01:24:19 +08:00
|
|
|
if (!bus)
|
|
|
|
return -ENOMEM;
|
2015-03-26 22:39:28 +08:00
|
|
|
r = __kvm_io_bus_read(vcpu, bus, &range, val);
|
2013-07-03 22:30:53 +08:00
|
|
|
return r < 0 ? r : 0;
|
|
|
|
}
|
2011-07-27 21:00:48 +08:00
|
|
|
|
2009-12-24 00:35:26 +08:00
|
|
|
/* Caller must hold slots_lock. */
|
2011-07-27 21:00:48 +08:00
|
|
|
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
|
|
|
|
int len, struct kvm_io_device *dev)
|
2009-06-30 03:24:26 +08:00
|
|
|
{
|
2018-01-16 21:34:41 +08:00
|
|
|
int i;
|
2009-12-24 00:35:24 +08:00
|
|
|
struct kvm_io_bus *new_bus, *bus;
|
2018-01-16 21:34:41 +08:00
|
|
|
struct kvm_io_range range;
|
2009-07-08 05:08:44 +08:00
|
|
|
|
2017-07-07 16:51:38 +08:00
|
|
|
bus = kvm_get_bus(kvm, bus_idx);
|
2017-03-24 01:24:19 +08:00
|
|
|
if (!bus)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2013-05-25 06:44:15 +08:00
|
|
|
/* exclude ioeventfd which is limited by maximum fd */
|
|
|
|
if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
|
2009-07-08 05:08:44 +08:00
|
|
|
return -ENOSPC;
|
2007-06-01 02:08:53 +08:00
|
|
|
|
2019-01-31 00:07:47 +08:00
|
|
|
new_bus = kmalloc(struct_size(bus, range, bus->dev_count + 1),
|
2019-02-12 03:02:49 +08:00
|
|
|
GFP_KERNEL_ACCOUNT);
|
2009-12-24 00:35:24 +08:00
|
|
|
if (!new_bus)
|
|
|
|
return -ENOMEM;
|
2018-01-16 21:34:41 +08:00
|
|
|
|
|
|
|
range = (struct kvm_io_range) {
|
|
|
|
.addr = addr,
|
|
|
|
.len = len,
|
|
|
|
.dev = dev,
|
|
|
|
};
|
|
|
|
|
|
|
|
for (i = 0; i < bus->dev_count; i++)
|
|
|
|
if (kvm_io_bus_cmp(&bus->range[i], &range) > 0)
|
|
|
|
break;
|
|
|
|
|
|
|
|
memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
|
|
|
|
new_bus->dev_count++;
|
|
|
|
new_bus->range[i] = range;
|
|
|
|
memcpy(new_bus->range + i + 1, bus->range + i,
|
|
|
|
(bus->dev_count - i) * sizeof(struct kvm_io_range));
|
2009-12-24 00:35:24 +08:00
|
|
|
rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
|
|
|
|
synchronize_srcu_expedited(&kvm->srcu);
|
|
|
|
kfree(bus);
|
2009-07-08 05:08:44 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2009-12-24 00:35:26 +08:00
|
|
|
/* Caller must hold slots_lock. */
|
2017-03-24 01:24:19 +08:00
|
|
|
void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
|
|
|
|
struct kvm_io_device *dev)
|
2009-07-08 05:08:44 +08:00
|
|
|
{
|
2017-03-24 01:24:19 +08:00
|
|
|
int i;
|
2009-12-24 00:35:24 +08:00
|
|
|
struct kvm_io_bus *new_bus, *bus;
|
2009-07-08 05:08:44 +08:00
|
|
|
|
2017-07-07 16:51:38 +08:00
|
|
|
bus = kvm_get_bus(kvm, bus_idx);
|
2017-03-15 16:01:17 +08:00
|
|
|
if (!bus)
|
2017-03-24 01:24:19 +08:00
|
|
|
return;
|
2017-03-15 16:01:17 +08:00
|
|
|
|
2012-03-09 12:17:32 +08:00
|
|
|
for (i = 0; i < bus->dev_count; i++)
|
|
|
|
if (bus->range[i].dev == dev) {
|
2009-07-08 05:08:44 +08:00
|
|
|
break;
|
|
|
|
}
|
2009-12-24 00:35:24 +08:00
|
|
|
|
2017-03-24 01:24:19 +08:00
|
|
|
if (i == bus->dev_count)
|
|
|
|
return;
|
2012-03-09 12:17:32 +08:00
|
|
|
|
2019-01-31 00:07:47 +08:00
|
|
|
new_bus = kmalloc(struct_size(bus, range, bus->dev_count - 1),
|
2019-02-12 03:02:49 +08:00
|
|
|
GFP_KERNEL_ACCOUNT);
|
2017-03-24 01:24:19 +08:00
|
|
|
if (!new_bus) {
|
|
|
|
pr_err("kvm: failed to shrink bus, removing it completely\n");
|
|
|
|
goto broken;
|
|
|
|
}
|
2012-03-09 12:17:32 +08:00
|
|
|
|
|
|
|
memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
|
|
|
|
new_bus->dev_count--;
|
|
|
|
memcpy(new_bus->range + i, bus->range + i + 1,
|
|
|
|
(new_bus->dev_count - i) * sizeof(struct kvm_io_range));
|
2009-12-24 00:35:24 +08:00
|
|
|
|
2017-03-24 01:24:19 +08:00
|
|
|
broken:
|
2009-12-24 00:35:24 +08:00
|
|
|
rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
|
|
|
|
synchronize_srcu_expedited(&kvm->srcu);
|
|
|
|
kfree(bus);
|
2017-03-24 01:24:19 +08:00
|
|
|
return;
|
2007-06-01 02:08:53 +08:00
|
|
|
}
|
|
|
|
|
2016-07-15 19:43:26 +08:00
|
|
|
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
|
|
|
|
gpa_t addr)
|
|
|
|
{
|
|
|
|
struct kvm_io_bus *bus;
|
|
|
|
int dev_idx, srcu_idx;
|
|
|
|
struct kvm_io_device *iodev = NULL;
|
|
|
|
|
|
|
|
srcu_idx = srcu_read_lock(&kvm->srcu);
|
|
|
|
|
|
|
|
bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
|
2017-03-24 01:24:19 +08:00
|
|
|
if (!bus)
|
|
|
|
goto out_unlock;
|
2016-07-15 19:43:26 +08:00
|
|
|
|
|
|
|
dev_idx = kvm_io_bus_get_first_dev(bus, addr, 1);
|
|
|
|
if (dev_idx < 0)
|
|
|
|
goto out_unlock;
|
|
|
|
|
|
|
|
iodev = bus->range[dev_idx].dev;
|
|
|
|
|
|
|
|
out_unlock:
|
|
|
|
srcu_read_unlock(&kvm->srcu, srcu_idx);
|
|
|
|
|
|
|
|
return iodev;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
|
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
static int kvm_debugfs_open(struct inode *inode, struct file *file,
|
|
|
|
int (*get)(void *, u64 *), int (*set)(void *, u64),
|
|
|
|
const char *fmt)
|
|
|
|
{
|
|
|
|
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
|
|
|
|
inode->i_private;
|
|
|
|
|
|
|
|
/* The debugfs files are a reference to the kvm struct which
|
|
|
|
* is still valid when kvm_destroy_vm is called.
|
|
|
|
* To avoid the race between open and the removal of the debugfs
|
|
|
|
* directory we test against the users count.
|
|
|
|
*/
|
2017-02-20 19:06:21 +08:00
|
|
|
if (!refcount_inc_not_zero(&stat_data->kvm->users_count))
|
2016-05-18 19:26:23 +08:00
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
if (simple_attr_open(inode, file, get, set, fmt)) {
|
|
|
|
kvm_put_kvm(stat_data->kvm);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int kvm_debugfs_release(struct inode *inode, struct file *file)
|
|
|
|
{
|
|
|
|
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)
|
|
|
|
inode->i_private;
|
|
|
|
|
|
|
|
simple_attr_release(inode, file);
|
|
|
|
kvm_put_kvm(stat_data->kvm);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vm_stat_get_per_vm(void *data, u64 *val)
|
|
|
|
{
|
|
|
|
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
|
|
|
|
|
2016-08-02 12:03:22 +08:00
|
|
|
*val = *(ulong *)((void *)stat_data->kvm + stat_data->offset);
|
2016-05-18 19:26:23 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-19 10:49:47 +08:00
|
|
|
static int vm_stat_clear_per_vm(void *data, u64 val)
|
|
|
|
{
|
|
|
|
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
|
|
|
|
|
|
|
|
if (val)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
*(ulong *)((void *)stat_data->kvm + stat_data->offset) = 0;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
static int vm_stat_get_per_vm_open(struct inode *inode, struct file *file)
|
|
|
|
{
|
|
|
|
__simple_attr_check_format("%llu\n", 0ull);
|
|
|
|
return kvm_debugfs_open(inode, file, vm_stat_get_per_vm,
|
2016-10-19 10:49:47 +08:00
|
|
|
vm_stat_clear_per_vm, "%llu\n");
|
2016-05-18 19:26:23 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct file_operations vm_stat_get_per_vm_fops = {
|
|
|
|
.owner = THIS_MODULE,
|
|
|
|
.open = vm_stat_get_per_vm_open,
|
|
|
|
.release = kvm_debugfs_release,
|
|
|
|
.read = simple_attr_read,
|
|
|
|
.write = simple_attr_write,
|
2017-05-06 23:37:19 +08:00
|
|
|
.llseek = no_llseek,
|
2016-05-18 19:26:23 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static int vcpu_stat_get_per_vm(void *data, u64 *val)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
|
|
|
|
struct kvm_vcpu *vcpu;
|
|
|
|
|
|
|
|
*val = 0;
|
|
|
|
|
|
|
|
kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
|
2016-08-02 12:03:22 +08:00
|
|
|
*val += *(u64 *)((void *)vcpu + stat_data->offset);
|
2016-05-18 19:26:23 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-19 10:49:47 +08:00
|
|
|
static int vcpu_stat_clear_per_vm(void *data, u64 val)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
|
|
|
|
struct kvm_vcpu *vcpu;
|
|
|
|
|
|
|
|
if (val)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
|
|
|
|
*(u64 *)((void *)vcpu + stat_data->offset) = 0;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
static int vcpu_stat_get_per_vm_open(struct inode *inode, struct file *file)
|
|
|
|
{
|
|
|
|
__simple_attr_check_format("%llu\n", 0ull);
|
|
|
|
return kvm_debugfs_open(inode, file, vcpu_stat_get_per_vm,
|
2016-10-19 10:49:47 +08:00
|
|
|
vcpu_stat_clear_per_vm, "%llu\n");
|
2016-05-18 19:26:23 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct file_operations vcpu_stat_get_per_vm_fops = {
|
|
|
|
.owner = THIS_MODULE,
|
|
|
|
.open = vcpu_stat_get_per_vm_open,
|
|
|
|
.release = kvm_debugfs_release,
|
|
|
|
.read = simple_attr_read,
|
|
|
|
.write = simple_attr_write,
|
2017-05-06 23:37:19 +08:00
|
|
|
.llseek = no_llseek,
|
2016-05-18 19:26:23 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
static const struct file_operations *stat_fops_per_vm[] = {
|
|
|
|
[KVM_STAT_VCPU] = &vcpu_stat_get_per_vm_fops,
|
|
|
|
[KVM_STAT_VM] = &vm_stat_get_per_vm_fops,
|
|
|
|
};
|
|
|
|
|
2008-02-08 20:20:26 +08:00
|
|
|
static int vm_stat_get(void *_offset, u64 *val)
|
2007-11-18 22:24:12 +08:00
|
|
|
{
|
|
|
|
unsigned offset = (long)_offset;
|
|
|
|
struct kvm *kvm;
|
2016-05-18 19:26:23 +08:00
|
|
|
struct kvm_stat_data stat_tmp = {.offset = offset};
|
|
|
|
u64 tmp_val;
|
2007-11-18 22:24:12 +08:00
|
|
|
|
2008-02-08 20:20:26 +08:00
|
|
|
*val = 0;
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_lock(&kvm_lock);
|
2016-05-18 19:26:23 +08:00
|
|
|
list_for_each_entry(kvm, &vm_list, vm_list) {
|
|
|
|
stat_tmp.kvm = kvm;
|
|
|
|
vm_stat_get_per_vm((void *)&stat_tmp, &tmp_val);
|
|
|
|
*val += tmp_val;
|
|
|
|
}
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_unlock(&kvm_lock);
|
2008-02-08 20:20:26 +08:00
|
|
|
return 0;
|
2007-11-18 22:24:12 +08:00
|
|
|
}
|
|
|
|
|
2016-10-19 10:49:47 +08:00
|
|
|
static int vm_stat_clear(void *_offset, u64 val)
|
|
|
|
{
|
|
|
|
unsigned offset = (long)_offset;
|
|
|
|
struct kvm *kvm;
|
|
|
|
struct kvm_stat_data stat_tmp = {.offset = offset};
|
|
|
|
|
|
|
|
if (val)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
spin_lock(&kvm_lock);
|
|
|
|
list_for_each_entry(kvm, &vm_list, vm_list) {
|
|
|
|
stat_tmp.kvm = kvm;
|
|
|
|
vm_stat_clear_per_vm((void *)&stat_tmp, 0);
|
|
|
|
}
|
|
|
|
spin_unlock(&kvm_lock);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, vm_stat_clear, "%llu\n");
|
2007-11-18 22:24:12 +08:00
|
|
|
|
2008-02-08 20:20:26 +08:00
|
|
|
static int vcpu_stat_get(void *_offset, u64 *val)
|
2007-04-19 22:27:43 +08:00
|
|
|
{
|
|
|
|
unsigned offset = (long)_offset;
|
|
|
|
struct kvm *kvm;
|
2016-05-18 19:26:23 +08:00
|
|
|
struct kvm_stat_data stat_tmp = {.offset = offset};
|
|
|
|
u64 tmp_val;
|
2007-04-19 22:27:43 +08:00
|
|
|
|
2008-02-08 20:20:26 +08:00
|
|
|
*val = 0;
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_lock(&kvm_lock);
|
2016-05-18 19:26:23 +08:00
|
|
|
list_for_each_entry(kvm, &vm_list, vm_list) {
|
|
|
|
stat_tmp.kvm = kvm;
|
|
|
|
vcpu_stat_get_per_vm((void *)&stat_tmp, &tmp_val);
|
|
|
|
*val += tmp_val;
|
|
|
|
}
|
2013-09-25 19:53:07 +08:00
|
|
|
spin_unlock(&kvm_lock);
|
2008-02-08 20:20:26 +08:00
|
|
|
return 0;
|
2007-04-19 22:27:43 +08:00
|
|
|
}
|
|
|
|
|
2016-10-19 10:49:47 +08:00
|
|
|
static int vcpu_stat_clear(void *_offset, u64 val)
|
|
|
|
{
|
|
|
|
unsigned offset = (long)_offset;
|
|
|
|
struct kvm *kvm;
|
|
|
|
struct kvm_stat_data stat_tmp = {.offset = offset};
|
|
|
|
|
|
|
|
if (val)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
spin_lock(&kvm_lock);
|
|
|
|
list_for_each_entry(kvm, &vm_list, vm_list) {
|
|
|
|
stat_tmp.kvm = kvm;
|
|
|
|
vcpu_stat_clear_per_vm((void *)&stat_tmp, 0);
|
|
|
|
}
|
|
|
|
spin_unlock(&kvm_lock);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, vcpu_stat_clear,
|
|
|
|
"%llu\n");
|
2007-11-18 22:24:12 +08:00
|
|
|
|
2009-10-02 06:43:56 +08:00
|
|
|
static const struct file_operations *stat_fops[] = {
|
2007-11-18 22:24:12 +08:00
|
|
|
[KVM_STAT_VCPU] = &vcpu_stat_fops,
|
|
|
|
[KVM_STAT_VM] = &vm_stat_fops,
|
|
|
|
};
|
2007-04-19 22:27:43 +08:00
|
|
|
|
2017-07-12 23:56:44 +08:00
|
|
|
static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
|
|
|
|
{
|
|
|
|
struct kobj_uevent_env *env;
|
|
|
|
unsigned long long created, active;
|
|
|
|
|
|
|
|
if (!kvm_dev.this_device || !kvm)
|
|
|
|
return;
|
|
|
|
|
|
|
|
spin_lock(&kvm_lock);
|
|
|
|
if (type == KVM_EVENT_CREATE_VM) {
|
|
|
|
kvm_createvm_count++;
|
|
|
|
kvm_active_vms++;
|
|
|
|
} else if (type == KVM_EVENT_DESTROY_VM) {
|
|
|
|
kvm_active_vms--;
|
|
|
|
}
|
|
|
|
created = kvm_createvm_count;
|
|
|
|
active = kvm_active_vms;
|
|
|
|
spin_unlock(&kvm_lock);
|
|
|
|
|
2019-02-12 03:02:49 +08:00
|
|
|
env = kzalloc(sizeof(*env), GFP_KERNEL_ACCOUNT);
|
2017-07-12 23:56:44 +08:00
|
|
|
if (!env)
|
|
|
|
return;
|
|
|
|
|
|
|
|
add_uevent_var(env, "CREATED=%llu", created);
|
|
|
|
add_uevent_var(env, "COUNT=%llu", active);
|
|
|
|
|
2017-07-24 19:40:03 +08:00
|
|
|
if (type == KVM_EVENT_CREATE_VM) {
|
2017-07-12 23:56:44 +08:00
|
|
|
add_uevent_var(env, "EVENT=create");
|
2017-07-24 19:40:03 +08:00
|
|
|
kvm->userspace_pid = task_pid_nr(current);
|
|
|
|
} else if (type == KVM_EVENT_DESTROY_VM) {
|
2017-07-12 23:56:44 +08:00
|
|
|
add_uevent_var(env, "EVENT=destroy");
|
2017-07-24 19:40:03 +08:00
|
|
|
}
|
|
|
|
add_uevent_var(env, "PID=%d", kvm->userspace_pid);
|
2017-07-12 23:56:44 +08:00
|
|
|
|
2019-02-28 23:34:37 +08:00
|
|
|
if (!IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
|
2019-02-12 03:02:49 +08:00
|
|
|
char *tmp, *p = kmalloc(PATH_MAX, GFP_KERNEL_ACCOUNT);
|
2017-07-24 19:40:03 +08:00
|
|
|
|
|
|
|
if (p) {
|
|
|
|
tmp = dentry_path_raw(kvm->debugfs_dentry, p, PATH_MAX);
|
|
|
|
if (!IS_ERR(tmp))
|
|
|
|
add_uevent_var(env, "STATS_PATH=%s", tmp);
|
|
|
|
kfree(p);
|
2017-07-12 23:56:44 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
/* no need for checks, since we are adding at most only 5 keys */
|
|
|
|
env->envp[env->envp_idx++] = NULL;
|
|
|
|
kobject_uevent_env(&kvm_dev.this_device->kobj, KOBJ_CHANGE, env->envp);
|
|
|
|
kfree(env);
|
|
|
|
}
|
|
|
|
|
2018-05-30 00:22:04 +08:00
|
|
|
static void kvm_init_debug(void)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
|
|
|
struct kvm_stats_debugfs_item *p;
|
|
|
|
|
2008-04-16 05:05:42 +08:00
|
|
|
kvm_debugfs_dir = debugfs_create_dir("kvm", NULL);
|
2011-12-15 14:23:16 +08:00
|
|
|
|
2016-05-18 19:26:23 +08:00
|
|
|
kvm_debugfs_num_entries = 0;
|
|
|
|
for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
|
2018-05-30 00:22:04 +08:00
|
|
|
debugfs_create_file(p->name, 0644, kvm_debugfs_dir,
|
|
|
|
(void *)(long)p->offset,
|
|
|
|
stat_fops[p->kind]);
|
2011-12-15 14:23:16 +08:00
|
|
|
}
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2011-03-24 05:16:23 +08:00
|
|
|
static int kvm_suspend(void)
|
2007-02-12 16:54:48 +08:00
|
|
|
{
|
2009-09-15 17:37:46 +08:00
|
|
|
if (kvm_usage_count)
|
2010-11-16 16:37:41 +08:00
|
|
|
hardware_disable_nolock(NULL);
|
2007-02-12 16:54:48 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-03-24 05:16:23 +08:00
|
|
|
static void kvm_resume(void)
|
2007-02-12 16:54:48 +08:00
|
|
|
{
|
2010-08-20 16:07:28 +08:00
|
|
|
if (kvm_usage_count) {
|
2019-01-09 02:39:49 +08:00
|
|
|
lockdep_assert_held(&kvm_count_lock);
|
2010-11-16 16:37:41 +08:00
|
|
|
hardware_enable_nolock(NULL);
|
2010-08-20 16:07:28 +08:00
|
|
|
}
|
2007-02-12 16:54:48 +08:00
|
|
|
}
|
|
|
|
|
2011-03-24 05:16:23 +08:00
|
|
|
static struct syscore_ops kvm_syscore_ops = {
|
2007-02-12 16:54:48 +08:00
|
|
|
.suspend = kvm_suspend,
|
|
|
|
.resume = kvm_resume,
|
|
|
|
};
|
|
|
|
|
2007-07-11 23:17:21 +08:00
|
|
|
static inline
|
|
|
|
struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
|
|
|
|
{
|
|
|
|
return container_of(pn, struct kvm_vcpu, preempt_notifier);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
|
2015-02-26 14:58:23 +08:00
|
|
|
|
2013-03-05 02:02:07 +08:00
|
|
|
if (vcpu->preempted)
|
|
|
|
vcpu->preempted = false;
|
2007-07-11 23:17:21 +08:00
|
|
|
|
2014-08-22 00:08:05 +08:00
|
|
|
kvm_arch_sched_in(vcpu, cpu);
|
|
|
|
|
2007-11-14 20:38:21 +08:00
|
|
|
kvm_arch_vcpu_load(vcpu, cpu);
|
2007-07-11 23:17:21 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static void kvm_sched_out(struct preempt_notifier *pn,
|
|
|
|
struct task_struct *next)
|
|
|
|
{
|
|
|
|
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
|
|
|
|
|
2013-03-05 02:02:07 +08:00
|
|
|
if (current->state == TASK_RUNNING)
|
|
|
|
vcpu->preempted = true;
|
2007-11-14 20:38:21 +08:00
|
|
|
kvm_arch_vcpu_put(vcpu);
|
2007-07-11 23:17:21 +08:00
|
|
|
}
|
|
|
|
|
2010-04-28 20:39:01 +08:00
|
|
|
int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
|
2007-07-30 19:12:19 +08:00
|
|
|
struct module *module)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
|
|
|
int r;
|
2007-07-31 19:23:01 +08:00
|
|
|
int cpu;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-11-14 20:40:21 +08:00
|
|
|
r = kvm_arch_init(opaque);
|
|
|
|
if (r)
|
2007-11-29 15:35:39 +08:00
|
|
|
goto out_fail;
|
2007-11-14 20:39:31 +08:00
|
|
|
|
2013-05-08 10:57:29 +08:00
|
|
|
/*
|
|
|
|
* kvm_arch_init makes sure there's at most one caller
|
|
|
|
* for architectures that support multiple implementations,
|
|
|
|
* like intel and amd on x86.
|
2016-10-26 19:35:56 +08:00
|
|
|
* kvm_arch_init must be called before kvm_irqfd_init to avoid creating
|
|
|
|
* conflicts in case kvm is already setup for another implementation.
|
2013-05-08 10:57:29 +08:00
|
|
|
*/
|
2016-10-26 19:35:56 +08:00
|
|
|
r = kvm_irqfd_init();
|
|
|
|
if (r)
|
|
|
|
goto out_irqfd;
|
2013-05-08 10:57:29 +08:00
|
|
|
|
2009-06-07 05:52:35 +08:00
|
|
|
if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
|
2008-12-07 18:55:45 +08:00
|
|
|
r = -ENOMEM;
|
|
|
|
goto out_free_0;
|
|
|
|
}
|
|
|
|
|
2007-11-14 20:38:21 +08:00
|
|
|
r = kvm_arch_hardware_setup();
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
if (r < 0)
|
2008-12-07 18:55:45 +08:00
|
|
|
goto out_free_0a;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-07-31 19:23:01 +08:00
|
|
|
for_each_online_cpu(cpu) {
|
|
|
|
smp_call_function_single(cpu,
|
2007-11-14 20:38:21 +08:00
|
|
|
kvm_arch_check_processor_compat,
|
2008-06-06 17:18:06 +08:00
|
|
|
&r, 1);
|
2007-07-31 19:23:01 +08:00
|
|
|
if (r < 0)
|
2007-11-29 15:35:39 +08:00
|
|
|
goto out_free_1;
|
2007-07-31 19:23:01 +08:00
|
|
|
}
|
|
|
|
|
2016-12-22 03:19:54 +08:00
|
|
|
r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
|
2016-07-14 01:16:37 +08:00
|
|
|
kvm_starting_cpu, kvm_dying_cpu);
|
2007-02-12 16:54:47 +08:00
|
|
|
if (r)
|
2007-11-29 15:35:39 +08:00
|
|
|
goto out_free_2;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
register_reboot_notifier(&kvm_reboot_notifier);
|
|
|
|
|
2007-07-30 19:12:19 +08:00
|
|
|
/* A kmem cache lets us meet the alignment requirements of fx_save. */
|
2010-04-28 20:39:01 +08:00
|
|
|
if (!vcpu_align)
|
|
|
|
vcpu_align = __alignof__(struct kvm_vcpu);
|
2017-10-26 21:45:46 +08:00
|
|
|
kvm_vcpu_cache =
|
|
|
|
kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align,
|
|
|
|
SLAB_ACCOUNT,
|
|
|
|
offsetof(struct kvm_vcpu, arch),
|
|
|
|
sizeof_field(struct kvm_vcpu, arch),
|
|
|
|
NULL);
|
2007-07-30 19:12:19 +08:00
|
|
|
if (!kvm_vcpu_cache) {
|
|
|
|
r = -ENOMEM;
|
2011-03-24 05:16:23 +08:00
|
|
|
goto out_free_3;
|
2007-07-30 19:12:19 +08:00
|
|
|
}
|
|
|
|
|
2010-10-14 17:22:46 +08:00
|
|
|
r = kvm_async_pf_init();
|
|
|
|
if (r)
|
|
|
|
goto out_free;
|
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
kvm_chardev_ops.owner = module;
|
2008-12-02 18:17:32 +08:00
|
|
|
kvm_vm_fops.owner = module;
|
|
|
|
kvm_vcpu_fops.owner = module;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
|
|
|
r = misc_register(&kvm_dev);
|
|
|
|
if (r) {
|
2015-02-26 14:58:26 +08:00
|
|
|
pr_err("kvm: misc device register failed\n");
|
2010-10-14 17:22:46 +08:00
|
|
|
goto out_unreg;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
|
|
|
|
2011-03-24 05:16:23 +08:00
|
|
|
register_syscore_ops(&kvm_syscore_ops);
|
|
|
|
|
2007-07-11 23:17:21 +08:00
|
|
|
kvm_preempt_ops.sched_in = kvm_sched_in;
|
|
|
|
kvm_preempt_ops.sched_out = kvm_sched_out;
|
|
|
|
|
2018-05-30 00:22:04 +08:00
|
|
|
kvm_init_debug();
|
2009-10-15 07:21:00 +08:00
|
|
|
|
2014-09-24 19:02:46 +08:00
|
|
|
r = kvm_vfio_ops_init();
|
|
|
|
WARN_ON(r);
|
|
|
|
|
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
|
|
|
return 0;
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2010-10-14 17:22:46 +08:00
|
|
|
out_unreg:
|
|
|
|
kvm_async_pf_deinit();
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
out_free:
|
2007-07-30 19:12:19 +08:00
|
|
|
kmem_cache_destroy(kvm_vcpu_cache);
|
2007-11-29 15:35:39 +08:00
|
|
|
out_free_3:
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
unregister_reboot_notifier(&kvm_reboot_notifier);
|
2016-07-14 01:16:37 +08:00
|
|
|
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
|
2007-11-29 15:35:39 +08:00
|
|
|
out_free_2:
|
|
|
|
out_free_1:
|
2007-11-14 20:38:21 +08:00
|
|
|
kvm_arch_hardware_unsetup();
|
2008-12-07 18:55:45 +08:00
|
|
|
out_free_0a:
|
|
|
|
free_cpumask_var(cpus_hardware_enabled);
|
2007-11-29 15:35:39 +08:00
|
|
|
out_free_0:
|
2013-02-28 19:33:18 +08:00
|
|
|
kvm_irqfd_exit();
|
2016-10-26 19:35:56 +08:00
|
|
|
out_irqfd:
|
2013-05-08 10:57:29 +08:00
|
|
|
kvm_arch_exit();
|
|
|
|
out_fail:
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
return r;
|
|
|
|
}
|
2007-11-14 20:39:31 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_init);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
|
2007-11-14 20:39:31 +08:00
|
|
|
void kvm_exit(void)
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
{
|
2015-10-14 18:37:35 +08:00
|
|
|
debugfs_remove_recursive(kvm_debugfs_dir);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
misc_deregister(&kvm_dev);
|
2007-07-30 19:12:19 +08:00
|
|
|
kmem_cache_destroy(kvm_vcpu_cache);
|
2010-10-14 17:22:46 +08:00
|
|
|
kvm_async_pf_deinit();
|
2011-03-24 05:16:23 +08:00
|
|
|
unregister_syscore_ops(&kvm_syscore_ops);
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
unregister_reboot_notifier(&kvm_reboot_notifier);
|
2016-07-14 01:16:37 +08:00
|
|
|
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
|
2010-11-16 16:37:41 +08:00
|
|
|
on_each_cpu(hardware_disable_nolock, NULL, 1);
|
2007-11-14 20:38:21 +08:00
|
|
|
kvm_arch_hardware_unsetup();
|
2007-11-14 20:40:21 +08:00
|
|
|
kvm_arch_exit();
|
2013-02-28 19:33:18 +08:00
|
|
|
kvm_irqfd_exit();
|
2008-12-07 18:55:45 +08:00
|
|
|
free_cpumask_var(cpus_hardware_enabled);
|
2014-10-09 18:30:08 +08:00
|
|
|
kvm_vfio_ops_exit();
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
|
|
|
}
|
2007-11-14 20:39:31 +08:00
|
|
|
EXPORT_SYMBOL_GPL(kvm_exit);
|