[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 18:21:36 +08:00
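A minimal sketch of how a process talks to the character device described above. It assumes the later, stable KVM ioctl API from `<linux/kvm.h>` (`KVM_GET_API_VERSION`, `KVM_CREATE_VM`), not the interface in this 2006 patch, where guest memory was reached by mmap()ing /dev/kvm. The helper names `kvm_api_version` and `kvm_create_vm` are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>       /* open, O_RDWR */
#include <linux/kvm.h>   /* KVM_API_VERSION, KVM_GET_API_VERSION, KVM_CREATE_VM */
#include <sys/ioctl.h>   /* ioctl */
#include <unistd.h>      /* close */

/* Returns the API version reported by /dev/kvm, or -1 if the device cannot
 * be opened (no hardware support, module not loaded, or no permission). */
static int kvm_api_version(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0)
        return -1;
    int version = ioctl(kvm, KVM_GET_API_VERSION, 0);
    close(kvm);
    return version;
}

/* Creates an empty VM and returns its file descriptor, or -1 on failure.
 * The VM fd holds its own reference, so /dev/kvm can be closed afterwards.
 * In the stable API, vcpus are then created from this fd with KVM_CREATE_VCPU. */
static int kvm_create_vm(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0)
        return -1;
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);  /* 0 = default machine type */
    close(kvm);
    return vm;
}
```

On a host without KVM, both helpers simply return -1. This matches the model above: a VM belongs to the process holding its fd, and killing that process tears the VM down.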
/*
* Kernel-based Virtual Machine driver for Linux
*
* This module enables machines with Intel VT-x extensions to run virtual
* machines without emulation or binary translation.
*
* MMU support
*
* Copyright (C) 2006 Qumranet, Inc.
2010-10-06 20:23:22 +08:00
* Copyright 2010 Red Hat, Inc. and/or its affiliates.
*
* Authors:
* Yaniv Kamay <yaniv@qumranet.com>
* Avi Kivity <avi@qumranet.com>
*
* This work is licensed under the terms of the GNU GPL, version 2. See
* the COPYING file in the top-level directory.
*
*/

/*
* We need the mmu code to access both 32-bit and 64-bit guest ptes,
* so the code in this file is compiled twice, once per pte size.
*/

#if PTTYPE == 64
#define pt_element_t u64
#define guest_walker guest_walker64
#define FNAME(name) paging##64_##name
#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
2009-07-27 22:30:45 +08:00
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
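The reserved-bit trick can be modelled in a few lines. This is a sketch, not kvm's code. The choice of bit 51 as the "reserved" marker (it is reserved when MAXPHYADDR is 48) and all helper names are assumptions. The exit rule is the VMX page-fault error-code mask/match check: with the #PF bit (14) set in the exception bitmap, a fault causes a VM exit only when `(error_code & mask) == match`. Setting mask = match = present bit is one way to let plain not-present faults go straight to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (write/user/fetch bits omitted here). */
#define PFERR_PRESENT (1u << 0)  /* 0 = page not present, 1 = protection/reserved-bit fault */
#define PFERR_RSVD    (1u << 3)  /* a reserved bit was set in a paging entry */

#define PTE_PRESENT   (1ULL << 0)
/* Assumption for this sketch: bit 51 is reserved, as it is when MAXPHYADDR is 48. */
#define PTE_RSVD_TRAP (1ULL << 51)

/* Placeholder shadow pte for a guest pte that kvm has not shadowed yet. */
static uint64_t shadow_placeholder(uint64_t guest_pte)
{
    if (!(guest_pte & PTE_PRESENT))
        return 0;                        /* not present: the guest's own fault */
    return PTE_PRESENT | PTE_RSVD_TRAP;  /* "present" plus reserved bit: kvm must see it */
}

/* Error code of the fault taken when the guest touches a placeholder pte. */
static uint32_t placeholder_fault_code(uint64_t shadow_pte)
{
    if (!(shadow_pte & PTE_PRESENT))
        return 0;
    assert(shadow_pte & PTE_RSVD_TRAP);
    return PFERR_PRESENT | PFERR_RSVD;
}

/* VMX rule with #PF set in the exception bitmap: exit only on a mask/match hit. */
static bool pf_exits_to_kvm(uint32_t error_code, uint32_t pfec_mask, uint32_t pfec_match)
{
    return (error_code & pfec_mask) == pfec_match;
}
```

With mask = match = `PFERR_PRESENT`, a fault on a not-present placeholder (error code 0) is delivered to the guest. A fault on a reserved-bit placeholder (P and RSVD set) exits to kvm, which can then install the real shadow pte.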
#define PT_LEVEL_BITS PT64_LEVEL_BITS
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
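A sketch of the cache described above: a chained hashtable whose key combines the listed fields. The struct, field, and function names and the bucket count are hypothetical. The quadrant formula assumes 4 KiB pages, 512-entry shadow tables, and 1024-entry 32-bit guest tables, so a shadow page table covers one of 2 halves of a guest page table and a shadow page directory covers one of 4 quarters of a guest page directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PT64_PT_BITS   9    /* 512 entries per shadow table */
#define PT32_PT_BITS  10    /* 1024 entries per 32-bit guest table */
#define NUM_BUCKETS  256    /* hypothetical hash table size */

struct sp_key {
    uint64_t gfn;          /* guest page table frame number */
    unsigned glevels;      /* paging levels in the guest (e.g. 2 for 32-bit, 4 for long mode) */
    unsigned level;        /* level of this shadow page: 1 = page table, 2 = page directory */
    unsigned quadrant;     /* which slice of a 32-bit guest table this page shadows */
    bool metaphysical;     /* no guest page table behind it (real mode, pse) */
};

struct sp_entry {
    struct sp_key key;
    struct sp_entry *next;  /* next entry in the same hash bucket */
};

static struct sp_entry *buckets[NUM_BUCKETS];

/* For a 32-bit guest: which slice of the guest table at 'level' covers gaddr. */
static unsigned quadrant_for(uint64_t gaddr, unsigned level)
{
    assert(level == 1 || level == 2);
    unsigned q = (unsigned)(gaddr >> (PAGE_SHIFT + PT64_PT_BITS * level));
    return q & ((1u << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1);
}

static size_t bucket_of(uint64_t gfn)
{
    return (size_t)(gfn % NUM_BUCKETS);
}

static bool key_equal(const struct sp_key *a, const struct sp_key *b)
{
    return a->gfn == b->gfn && a->glevels == b->glevels &&
           a->level == b->level && a->quadrant == b->quadrant &&
           a->metaphysical == b->metaphysical;
}

/* Returns the cached shadow page for 'key', or NULL on a miss. */
static struct sp_entry *sp_lookup(const struct sp_key *key)
{
    for (struct sp_entry *e = buckets[bucket_of(key->gfn)]; e; e = e->next)
        if (key_equal(&e->key, key))
            return e;
    return NULL;
}

static void sp_insert(struct sp_entry *e)
{
    size_t b = bucket_of(e->key.gfn);
    e->next = buckets[b];
    buckets[b] = e;
}
```

Because `level` is part of the key, a guest frame used both as a page table and as a page directory gets two distinct cache entries, as the commit message describes.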
#ifdef CONFIG_X86_64
#define PT_MAX_FULL_LEVELS 4
2007-12-07 20:56:58 +08:00
#define CMPXCHG cmpxchg
#else
#define CMPXCHG cmpxchg64
#define PT_MAX_FULL_LEVELS 2
#endif
#elif PTTYPE == 32
#define pt_element_t u32
#define guest_walker guest_walker32
#define FNAME(name) paging##32_##name
#define PT_BASE_ADDR_MASK PT32_BASE_ADDR_MASK
#define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT32_INDEX(addr, level)
#define PT_LEVEL_BITS PT32_LEVEL_BITS
#define PT_MAX_FULL_LEVELS 2
#define CMPXCHG cmpxchg
#else
#error Invalid PTTYPE value
#endif
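The `#if PTTYPE` block is what lets this one file be compiled twice, once with 64-bit and once with 32-bit guest ptes. Every function name goes through FNAME(), so the two copies do not collide. To keep the illustration in a single self-contained block, the sketch below swaps re-inclusion for a generator macro that plays the role of the template file. Its index arithmetic is modelled on PT64_INDEX/PT32_INDEX, but the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Stands in for the re-included template: 'prefix' plays the role of
 * FNAME(), 'level_bits' the role of PT_LEVEL_BITS. */
#define DEFINE_PAGING_TEMPLATE(prefix, level_bits)                             \
    static unsigned prefix##_index(uint64_t addr, int level)                   \
    {                                                                          \
        assert(level >= 1);                                                    \
        return (unsigned)((addr >> (PAGE_SHIFT + (level - 1) * (level_bits))) \
                          & ((1u << (level_bits)) - 1));                       \
    }

/* "Compile" the template once per pte size. */
DEFINE_PAGING_TEMPLATE(paging64, 9)   /* 512 8-byte entries per 4 KiB table */
DEFINE_PAGING_TEMPLATE(paging32, 10)  /* 1024 4-byte entries per 4 KiB table */
```

For the address 0xC0101000, `paging32_index(addr, 2)` selects page-directory entry 0x300. `paging64_index(addr, 2)` selects entry 0 of a 512-entry directory, because the 64-bit layout splits the address into 9-bit fields instead of 10-bit ones.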

#define gpte_to_gfn_lvl FNAME(gpte_to_gfn_lvl)
#define gpte_to_gfn(pte) gpte_to_gfn_lvl((pte), PT_PAGE_TABLE_LEVEL)
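gpte_to_gfn() keeps only the frame-address bits of a guest pte and shifts out the page offset, leaving the guest frame number. The _lvl variant also clears the low bits below a large page's size. The sketch below shows that arithmetic for 64-bit ptes. It assumes 4 KiB base pages, 9 index bits per level, and the 52-bit address field of PT64_BASE_ADDR_MASK. The function names are hypothetical stand-ins for the FNAME()-generated ones.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define PT64_LEVEL_BITS  9

typedef uint64_t gfn_t;

/* Bits 12..51: the frame address field of a 64-bit pte. */
#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << PAGE_SHIFT) - 1))

/* Address mask for an entry mapping a level-'lvl' page (1 = 4 KiB,
 * 2 = 2 MiB, 3 = 1 GiB): bits below that page size are flags (e.g. PAT)
 * or must be zero, so they are masked away as well. */
static uint64_t pt64_lvl_addr_mask(int lvl)
{
    assert(lvl >= 1 && lvl <= 3);
    return PT64_BASE_ADDR_MASK &
           ~((1ULL << (PAGE_SHIFT + (lvl - 1) * PT64_LEVEL_BITS)) - 1);
}

static gfn_t gpte64_to_gfn_lvl(uint64_t gpte, int lvl)
{
    return (gpte & pt64_lvl_addr_mask(lvl)) >> PAGE_SHIFT;
}

static gfn_t gpte64_to_gfn(uint64_t gpte)
{
    return gpte64_to_gfn_lvl(gpte, 1);
}
```

For example, the NX bit and the low flag bits of 0x8000000012345067 are dropped, giving gfn 0x12345. For a 2 MiB entry 0x40201083, bit 12 (PAT in a large-page entry) is cleared at level 2, giving gfn 0x40200.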
2007-11-21 18:35:07 +08:00
|
|
|
|
[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
2006-12-10 18:21:36 +08:00
/*
 * The guest_walker structure emulates the behavior of the hardware page
 * table walker.
 */
struct guest_walker {
	int level;
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
	gfn_t table_gfn[PT_MAX_FULL_LEVELS];
2007-12-12 08:12:27 +08:00
	pt_element_t ptes[PT_MAX_FULL_LEVELS];
2010-08-22 19:13:33 +08:00
	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
2007-12-12 08:12:27 +08:00
	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
2007-12-09 22:15:46 +08:00
	unsigned pt_access;
	unsigned pte_access;
2007-01-06 08:36:44 +08:00
	gfn_t gfn;
2010-11-22 23:53:27 +08:00
	struct x86_exception fault;
2006-12-10 18:21:36 +08:00
};
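The shadow page table caching commit earlier in this listing keys its hash table on several fields. Below is a minimal sketch of such a key, assuming hypothetical names (`struct shadow_key`, `shadow_quadrant`, `shadow_hash`, `shadow_key_equal`) rather than KVM's real role structure, and assuming that only the frame number is hashed while the other fields are compared on lookup. The quadrant arithmetic follows the commit's geometry: 1024-entry 32-bit guest tables versus 512-entry shadow tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Geometry from the commit message: 32-bit guest tables have 1024 entries
 * (10 index bits); shadow tables have 512 (9 index bits). */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PT32_PT_BITS 10
#define SKETCH_PT64_PT_BITS 9

/* Hypothetical cache key: one field per component in the commit message. */
struct shadow_key {
	uint64_t gfn;      /* guest page table frame number */
	unsigned glevels;  /* guest paging levels, e.g. 2 (32-bit), 3 (PAE), 4 (long) */
	unsigned level;    /* level of this table: 1 = page table, 2 = directory */
	unsigned quadrant; /* which slice of a 32-bit guest table is shadowed */
	bool metaphysical; /* true when no guest table backs this shadow page */
};

/* Quadrant of a 32-bit guest table at `level` that covers address `gaddr`.
 * Level 1: 4MB guest table vs 2MB shadow table, so 2 quadrants.
 * Level 2: 4GB guest directory vs 1GB shadow directory, so 4 quadrants.
 * Only meaningful for levels 1 and 2. */
static unsigned shadow_quadrant(uint32_t gaddr, unsigned level)
{
	unsigned q = gaddr >> (SKETCH_PAGE_SHIFT + SKETCH_PT64_PT_BITS * level);

	return q & ((1u << ((SKETCH_PT32_PT_BITS - SKETCH_PT64_PT_BITS) * level)) - 1);
}

/* Bucket choice: hash only the frame number, so every shadow page for the
 * same guest frame lands in the same bucket. */
static unsigned shadow_hash(const struct shadow_key *k, unsigned nbuckets)
{
	return (unsigned)(k->gfn % nbuckets);
}

/* Lookup compares the full key, so different levels, quadrants and modes
 * of one guest frame coexist as separate cache entries. */
static bool shadow_key_equal(const struct shadow_key *a,
			     const struct shadow_key *b)
{
	return a->gfn == b->gfn && a->glevels == b->glevels &&
	       a->level == b->level && a->quadrant == b->quadrant &&
	       a->metaphysical == b->metaphysical;
}
```

For example, virtual addresses 0x00000000 and 0x00200000 fall under the same 32-bit guest page table but need different 2MB shadow tables; the quadrant (0 and 1) keeps the two cache entries apart.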

2009-07-27 22:30:45 +08:00
static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
2007-11-21 18:35:07 +08:00
{
2009-07-27 22:30:45 +08:00
	return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
2007-11-21 18:35:07 +08:00
}
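`gpte_to_gfn_lvl()` masks off the low address bits that belong to the page offset at a given level; the walker later adds back the frame offset taken from the virtual address (`PT_LVL_OFFSET_MASK`). A standalone sketch for the 64-bit entry format, assuming a 52-bit physical address width and using hypothetical `sk_` names in place of the kernel's macros:

```c
#include <stdint.h>

/* Assumed 64-bit (PAE / long mode) layout: bits 12..51 hold the frame
 * address; each level above the leaf covers 9 more index bits. */
#define SK_PAGE_SHIFT 12
#define SK_LEVEL_BITS 9
#define SK_BASE_MASK  (((1ULL << 52) - 1) & ~((1ULL << SK_PAGE_SHIFT) - 1))

/* Address mask for an entry mapping a page at `lvl` (1 = 4KB, 2 = 2MB, 3 = 1GB). */
static uint64_t sk_lvl_addr_mask(int lvl)
{
	return SK_BASE_MASK &
	       ~((1ULL << (SK_PAGE_SHIFT + (lvl - 1) * SK_LEVEL_BITS)) - 1);
}

/* Guest frame number of the (possibly large) page the entry points to.
 * Flag bits, including NX in bit 63, fall outside the mask. */
static uint64_t sk_gpte_to_gfn_lvl(uint64_t gpte, int lvl)
{
	return (gpte & sk_lvl_addr_mask(lvl)) >> SK_PAGE_SHIFT;
}

/* Offset inside a large page, counted in 4KB frames, for address `addr`. */
static uint64_t sk_lvl_frame_offset(uint64_t addr, int lvl)
{
	uint64_t span = 1ULL << (SK_PAGE_SHIFT + (lvl - 1) * SK_LEVEL_BITS);

	return (addr & (span - 1)) >> SK_PAGE_SHIFT;
}
```

So a 2MB mapping whose entry holds frame address 0x12200000, walked for address 0x00345678, yields gfn 0x12200 + 0x145 = 0x12345.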

2007-12-07 20:56:58 +08:00
static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
			 gfn_t table_gfn, unsigned index,
			 pt_element_t orig_pte, pt_element_t new_pte)
{
	pt_element_t ret;
	pt_element_t *table;
	struct page *page;

	page = gfn_to_page(kvm, table_gfn);
2008-02-11 00:04:15 +08:00

2007-12-07 20:56:58 +08:00
	table = kmap_atomic(page, KM_USER0);
	ret = CMPXCHG(&table[index], orig_pte, new_pte);
	kunmap_atomic(table, KM_USER0);

	kvm_release_page_dirty(page);

	return (ret != orig_pte);
}
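`cmpxchg_gpte()` returns true when the guest entry no longer holds `orig_pte`, meaning the guest (or another vcpu) changed it concurrently, and the walker then restarts at `walk:`. A user-space sketch of the same contract with C11 `<stdatomic.h>`, assuming the table entry is directly addressable (no `kmap_atomic()`); the function name is hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Install new_pte only if the slot still holds orig_pte. Returns true when
 * the entry changed underneath us, i.e. the caller should restart its walk. */
static bool sketch_cmpxchg_gpte(_Atomic uint64_t *slot, uint64_t orig_pte,
				uint64_t new_pte)
{
	uint64_t expected = orig_pte;

	/* On failure, `expected` receives the value currently in the slot. */
	return !atomic_compare_exchange_strong(slot, &expected, new_pte);
}
```

Returning the negated success flag keeps call sites reading like the original: `if (cmpxchg(...)) goto walk;`.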

2007-12-09 22:52:56 +08:00
static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
{
	unsigned access;

	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
#if PTTYPE == 64
2010-09-10 23:31:01 +08:00
	if (vcpu->arch.mmu.nx)
2007-12-09 22:52:56 +08:00
		access &= ~(gpte >> PT64_NX_SHIFT);
#endif
	return access;
}
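`gpte_access()` turns one guest entry into the access bits a shadow entry may grant, and the walker intersects them level by level (`pte_access = pt_access & ...`). A sketch assuming the x86 bit positions shown (W = bit 1, U = bit 2, NX = bit 63), an exec flag in bit 0 of the access mask, and an "all access" value of 7; the `SK_`/`sketch_` names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bits used by gpte_access() (assumed to match the kernel's masks). */
#define SK_PT_WRITABLE_MASK (1ULL << 1)
#define SK_PT_USER_MASK     (1ULL << 2)
#define SK_PT64_NX_SHIFT    63

/* Assumed access-mask layout: exec in bit 0; write and user reuse the PTE's
 * bits 1 and 2 so they can be copied straight across. */
#define SK_ACC_EXEC_MASK 1u
#define SK_ACC_ALL       7u

static unsigned sketch_gpte_access(uint64_t gpte, bool nx_enabled)
{
	unsigned access = (unsigned)(gpte & (SK_PT_WRITABLE_MASK | SK_PT_USER_MASK)) |
			  SK_ACC_EXEC_MASK;

	/* gpte >> 63 is 1 exactly when NX is set, so this clears only exec. */
	if (nx_enabled)
		access &= ~(unsigned)(gpte >> SK_PT64_NX_SHIFT);
	return access;
}

/* Effective access for a full walk: the intersection over all levels,
 * starting from "everything allowed", as the walker's pt_access does. */
static unsigned sketch_walk_access(const uint64_t *ptes, int nlevels,
				   bool nx_enabled)
{
	unsigned access = SK_ACC_ALL;

	for (int i = 0; i < nlevels; i++)
		access &= sketch_gpte_access(ptes[i], nx_enabled);
	return access;
}
```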

2007-01-06 08:36:40 +08:00
/*
 * Fetch a guest pte for a guest virtual address
 */
2010-09-10 23:30:47 +08:00
static int FNAME(walk_addr_generic)(struct guest_walker *walker,
				    struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
2010-09-28 17:03:14 +08:00
				    gva_t addr, u32 access)
2006-12-10 18:21:36 +08:00
{
2007-10-17 18:18:47 +08:00
	pt_element_t pte;
2007-01-06 08:36:43 +08:00
	gfn_t table_gfn;
2010-07-06 21:20:43 +08:00
	unsigned index, pt_access, uninitialized_var(pte_access);
2007-10-17 18:18:47 +08:00
	gpa_t pte_gpa;
2010-07-06 21:20:43 +08:00
	bool eperm, present, rsvd_fault;
2010-09-28 17:03:14 +08:00
	int offset, write_fault, user_fault, fetch_fault;

	write_fault = access & PFERR_WRITE_MASK;
	user_fault = access & PFERR_USER_MASK;
	fetch_fault = access & PFERR_FETCH_MASK;
2006-12-10 18:21:36 +08:00

2009-07-06 17:21:32 +08:00
	trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
				     fetch_fault);
2007-12-07 20:56:58 +08:00
walk:
2010-07-06 21:20:43 +08:00
	present = true;
	eperm = rsvd_fault = false;
2010-09-10 23:30:47 +08:00
	walker->level = mmu->root_level;
	pte = mmu->get_cr3(vcpu);

2007-01-06 08:36:41 +08:00
#if PTTYPE == 64
2010-09-10 23:30:47 +08:00
	if (walker->level == PT32E_ROOT_LEVEL) {
2010-09-10 23:30:58 +08:00
		pte = kvm_pdptr_read_mmu(vcpu, mmu, (addr >> 30) & 3);
2009-07-06 17:21:32 +08:00
		trace_kvm_mmu_paging_element(pte, walker->level);
2010-07-06 21:20:43 +08:00
		if (!is_present_gpte(pte)) {
			present = false;
			goto error;
		}
2007-01-06 08:36:41 +08:00
		--walker->level;
	}
#endif
2006-12-30 08:49:37 +08:00
	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
2010-09-10 23:30:47 +08:00
	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
2006-12-10 18:21:36 +08:00

2007-12-09 22:15:46 +08:00
	pt_access = ACC_ALL;
2007-01-06 08:36:40 +08:00

	for (;;) {
2007-10-17 18:18:47 +08:00
		index = PT_INDEX(addr, walker->level);
2007-01-06 08:36:40 +08:00

2007-11-21 18:35:07 +08:00
		table_gfn = gpte_to_gfn(pte);
2010-09-10 23:30:52 +08:00
		offset = index * sizeof(pt_element_t);
		pte_gpa = gfn_to_gpa(table_gfn) + offset;
2007-10-17 18:18:47 +08:00
		walker->table_gfn[walker->level - 1] = table_gfn;
2007-12-12 08:12:27 +08:00
		walker->pte_gpa[walker->level - 1] = pte_gpa;
2007-10-17 18:18:47 +08:00

2010-09-10 23:30:52 +08:00
		if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte,
					    offset, sizeof(pte),
					    PFERR_USER_MASK|PFERR_WRITE_MASK)) {
2010-07-06 21:20:43 +08:00
			present = false;
			break;
		}
2010-01-15 03:41:27 +08:00

2009-07-06 17:21:32 +08:00
		trace_kvm_mmu_paging_element(pte, walker->level);
2007-10-17 18:18:47 +08:00

2010-07-06 21:20:43 +08:00
		if (!is_present_gpte(pte)) {
			present = false;
			break;
		}
2007-01-26 16:56:41 +08:00

2010-09-10 23:30:45 +08:00
		if (is_rsvd_bits_set(&vcpu->arch.mmu, pte, walker->level)) {
2010-07-06 21:20:43 +08:00
			rsvd_fault = true;
			break;
		}
2009-03-30 16:21:08 +08:00

2010-01-18 17:45:10 +08:00
		if (write_fault && !is_writable_pte(pte))
2007-01-26 16:56:41 +08:00
			if (user_fault || is_write_protection(vcpu))
2010-07-06 21:20:43 +08:00
				eperm = true;
2007-01-26 16:56:41 +08:00

2007-10-17 18:18:47 +08:00
		if (user_fault && !(pte & PT_USER_MASK))
2010-07-06 21:20:43 +08:00
			eperm = true;
2007-01-26 16:56:41 +08:00

2007-01-26 16:56:41 +08:00
#if PTTYPE == 64
2010-04-06 18:31:13 +08:00
		if (fetch_fault && (pte & PT64_NX_MASK))
2010-07-06 21:20:43 +08:00
			eperm = true;
2007-01-26 16:56:41 +08:00
#endif

2010-07-06 21:20:43 +08:00
		if (!eperm && !rsvd_fault && !(pte & PT_ACCESSED_MASK)) {
2009-07-06 17:21:32 +08:00
			trace_kvm_mmu_set_accessed_bit(table_gfn, index,
						       sizeof(pte));
2007-12-07 20:56:58 +08:00
			if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
			    index, pte, pte|PT_ACCESSED_MASK))
				goto walk;
2010-05-05 09:09:21 +08:00
			mark_page_dirty(vcpu->kvm, table_gfn);
2007-10-17 18:18:47 +08:00
			pte |= PT_ACCESSED_MASK;
2007-02-19 20:37:46 +08:00
		}
2007-01-06 08:36:44 +08:00

2007-12-09 22:52:56 +08:00
		pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
2007-12-09 22:15:46 +08:00

2007-12-12 08:12:27 +08:00
		walker->ptes[walker->level - 1] = pte;

2009-07-27 22:30:45 +08:00
		if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
		    ((walker->level == PT_DIRECTORY_LEVEL) &&
2010-04-16 17:18:01 +08:00
				is_large_pte(pte) &&
2009-07-27 22:30:45 +08:00
				(PTTYPE == 64 || is_pse(vcpu))) ||
		    ((walker->level == PT_PDPE_LEVEL) &&
2010-04-16 17:18:01 +08:00
				is_large_pte(pte) &&
2010-09-10 23:30:47 +08:00
				mmu->root_level == PT64_ROOT_LEVEL)) {
2009-07-27 22:30:45 +08:00
			int lvl = walker->level;
2010-09-10 23:30:52 +08:00
			gpa_t real_gpa;
			gfn_t gfn;
2010-09-28 17:03:14 +08:00
			u32 ac;
2009-07-27 22:30:45 +08:00

2010-09-10 23:30:52 +08:00
			gfn = gpte_to_gfn_lvl(pte, lvl);
			gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
2009-07-27 22:30:45 +08:00

			if (PTTYPE == 32 &&
			    walker->level == PT_DIRECTORY_LEVEL &&
			    is_cpuid_PSE36())
2010-09-10 23:30:52 +08:00
				gfn += pse36_gfn_delta(pte);

2010-09-28 17:03:14 +08:00
			ac = write_fault | fetch_fault | user_fault;
2010-09-10 23:30:52 +08:00

			real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn),
2010-09-28 17:03:14 +08:00
						      ac);
2010-09-10 23:30:52 +08:00
			if (real_gpa == UNMAPPED_GVA)
				return 0;

			walker->gfn = real_gpa >> PAGE_SHIFT;
2009-07-27 22:30:45 +08:00

2007-01-06 08:36:40 +08:00
			break;
2007-01-06 08:36:44 +08:00
		}
2007-01-06 08:36:40 +08:00

2007-12-09 22:15:46 +08:00
		pt_access = pte_access;
2007-01-06 08:36:40 +08:00
		--walker->level;
	}
2007-10-17 18:18:47 +08:00

2010-07-06 21:20:43 +08:00
	if (!present || eperm || rsvd_fault)
		goto error;

2009-06-10 19:12:05 +08:00
	if (write_fault && !is_dirty_gpte(pte)) {
2007-12-07 20:56:58 +08:00
		bool ret;

2009-07-06 17:21:32 +08:00
		trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
2007-12-07 20:56:58 +08:00
		ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
			    pte|PT_DIRTY_MASK);
		if (ret)
			goto walk;
2010-05-05 09:09:21 +08:00
		mark_page_dirty(vcpu->kvm, table_gfn);
2007-10-17 18:18:47 +08:00
		pte |= PT_DIRTY_MASK;
2007-12-12 08:12:27 +08:00
		walker->ptes[walker->level - 1] = pte;
2007-10-17 18:18:47 +08:00
	}

2007-12-09 22:15:46 +08:00
	walker->pt_access = pt_access;
	walker->pte_access = pte_access;
	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
2010-05-05 09:58:33 +08:00
		 __func__, (u64)pte, pte_access, pt_access);
2007-01-26 16:56:41 +08:00
	return 1;

2010-07-06 21:20:43 +08:00
error:
2010-11-22 23:53:27 +08:00
	walker->fault.vector = PF_VECTOR;
	walker->fault.error_code_valid = true;
	walker->fault.error_code = 0;
2010-07-06 21:20:43 +08:00
	if (present)
2010-11-22 23:53:27 +08:00
		walker->fault.error_code |= PFERR_PRESENT_MASK;
2010-09-27 18:03:27 +08:00

2010-11-22 23:53:27 +08:00
	walker->fault.error_code |= write_fault | user_fault;
2010-09-27 18:03:27 +08:00

2010-09-10 23:31:01 +08:00
	if (fetch_fault && mmu->nx)
2010-11-22 23:53:27 +08:00
		walker->fault.error_code |= PFERR_FETCH_MASK;
2009-03-30 16:21:08 +08:00
	if (rsvd_fault)
2010-11-22 23:53:27 +08:00
		walker->fault.error_code |= PFERR_RSVD_MASK;
2010-09-10 23:30:46 +08:00

2010-11-29 22:12:30 +08:00
	walker->fault.address = addr;
	walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
2010-09-10 23:30:46 +08:00

2010-11-22 23:53:27 +08:00
	trace_kvm_mmu_walker_error(walker->fault.error_code);
2007-07-23 14:51:39 +08:00
	return 0;
2006-12-10 18:21:36 +08:00
}
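The walker above follows the hardware algorithm: read one entry per level, stop on a not-present entry, record permission failures, set accessed bits on the way down and the dirty bit on the leaf for writes, and build an x86 page-fault error code on failure. A self-contained toy version, assuming non-PAE 32-bit paging with 4KB pages only, CR0.WP = 1, no NX or reserved-bit checks, a single thread (so plain stores replace `cmpxchg_gpte()`), and a small hypothetical array standing in for guest memory; all `toy_` names are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 PTE flag bits. */
#define TOY_PRESENT  (1u << 0)
#define TOY_WRITABLE (1u << 1)
#define TOY_USER     (1u << 2)
#define TOY_ACCESSED (1u << 5)
#define TOY_DIRTY    (1u << 6)

/* x86 page-fault error code bits. */
#define TOY_PFERR_PRESENT (1u << 0)
#define TOY_PFERR_WRITE   (1u << 1)
#define TOY_PFERR_USER    (1u << 2)

/* Hypothetical guest memory: 16 frames of 1024 32-bit entries each. */
#define TOY_FRAMES 16
static uint32_t toy_mem[TOY_FRAMES][1024];

/* Translate `addr` through the tables rooted at `cr3`. Returns true and sets
 * *gfn on success; returns false and sets *error_code on a fault. */
static bool toy_walk(uint32_t cr3, uint32_t addr, uint32_t access,
		     uint32_t *gfn, uint32_t *error_code)
{
	bool write_fault = access & TOY_PFERR_WRITE;
	bool user_fault = access & TOY_PFERR_USER;
	bool present = true, eperm = false;
	uint32_t table = cr3 >> 12;	/* frame of the page directory */
	uint32_t *pte = NULL;

	for (int level = 2; level >= 1; level--) {
		uint32_t index = (addr >> (12 + 10 * (level - 1))) & 0x3ff;

		if (table >= TOY_FRAMES) {	/* outside the toy memory */
			present = false;
			break;
		}
		pte = &toy_mem[table][index];
		if (!(*pte & TOY_PRESENT)) {
			present = false;
			break;
		}
		/* Record permission failures but keep walking, so a deeper
		 * not-present entry still produces a not-present fault. */
		if (write_fault && !(*pte & TOY_WRITABLE))
			eperm = true;	/* assumes CR0.WP = 1 */
		if (user_fault && !(*pte & TOY_USER))
			eperm = true;
		if (!eperm)
			*pte |= TOY_ACCESSED;	/* plain store: single thread */
		table = *pte >> 12;	/* next table, or the final frame */
	}

	if (!present || eperm) {
		*error_code = (present ? TOY_PFERR_PRESENT : 0) |
			      (write_fault ? TOY_PFERR_WRITE : 0) |
			      (user_fault ? TOY_PFERR_USER : 0);
		return false;
	}
	if (write_fault)
		*pte |= TOY_DIRTY;	/* dirty is set only on the leaf entry */
	*gfn = table;
	return true;
}
```

Unlike the real walker, this toy never restarts: with one thread nothing can change an entry between the read and the accessed/dirty update, which is exactly the race `cmpxchg_gpte()` and `goto walk` exist to handle.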

2010-09-10 23:30:47 +08:00
static int FNAME(walk_addr)(struct guest_walker *walker,
2010-09-28 17:03:14 +08:00
			    struct kvm_vcpu *vcpu, gva_t addr, u32 access)
2010-09-10 23:30:47 +08:00
{
	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.mmu, addr,
2010-09-28 17:03:14 +08:00
					access);
2010-09-10 23:30:47 +08:00
}

2010-09-10 23:30:50 +08:00
static int FNAME(walk_addr_nested)(struct guest_walker *walker,
				   struct kvm_vcpu *vcpu, gva_t addr,
2010-09-28 17:03:14 +08:00
				   u32 access)
2010-09-10 23:30:50 +08:00
{
	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
2010-09-28 17:03:14 +08:00
					addr, access);
2010-09-10 23:30:50 +08:00
}
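`walk_addr()` and `walk_addr_nested()` differ only in which `struct kvm_mmu` they hand to the shared walker, and that context supplies the `translate_gpa` hook applied to the final address. A sketch of that dispatch with a function pointer, assuming an identity hook for the ordinary case and, purely for illustration, a fixed 1GB offset for the nested case (a real nested hook would instead walk the outer hypervisor's tables); the `ns_` names are hypothetical:

```c
#include <stdint.h>

#define NS_PAGE_SHIFT 12

/* One hook per MMU context, standing in for the translate_gpa callback. */
typedef uint64_t (*ns_translate_fn)(uint64_t gpa);

struct ns_mmu {
	ns_translate_fn translate_gpa;
};

/* Ordinary guest: guest-physical addresses need no further translation. */
static uint64_t ns_identity(uint64_t gpa)
{
	return gpa;
}

/* Hypothetical nested guest: its "physical" memory sits 1GB into L1's. */
static uint64_t ns_nested_offset(uint64_t gpa)
{
	return gpa + (1ULL << 30);
}

/* Shared tail of the walk: translate the leaf frame through the context's
 * hook, as walk_addr_generic() does when computing real_gpa. */
static uint64_t ns_finish_walk(const struct ns_mmu *mmu, uint64_t gfn)
{
	return mmu->translate_gpa(gfn << NS_PAGE_SHIFT) >> NS_PAGE_SHIFT;
}
```

Keeping the walker body generic and varying only the context avoids duplicating the whole page-table walk for the nested case.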
|
|
|
|
|
2010-11-23 11:08:42 +08:00
|
|
|
static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
					 struct kvm_mmu_page *sp, u64 *spte,
					 pt_element_t gpte)
{
	u64 nonpresent = shadow_trap_nonpresent_pte;

	if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL))
		goto no_present;

	if (!is_present_gpte(gpte)) {
		if (!sp->unsync)
			nonpresent = shadow_notrap_nonpresent_pte;
		goto no_present;
	}

	if (!(gpte & PT_ACCESSED_MASK))
		goto no_present;

	return false;

no_present:
	drop_spte(vcpu->kvm, spte, nonpresent);
	return true;
}
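The checks in `prefetch_invalid_gpte()` amount to a small decision table. The following is a user-space model written only for illustration, not kernel code. `classify_gpte()`, the `SPTE_*` names, and the `rsvd_mask` parameter are hypothetical stand-ins for `is_rsvd_bits_set()`, `shadow_trap_nonpresent_pte`, and `shadow_notrap_nonpresent_pte`. Only the positions of the present bit (bit 0) and the accessed bit (bit 5) are architectural x86 facts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 pte flags (PT_PRESENT_MASK / PT_ACCESSED_MASK). */
#define GPTE_PRESENT  (1ULL << 0)
#define GPTE_ACCESSED (1ULL << 5)

/* Hypothetical names for the three possible outcomes. */
enum spte_action {
	SPTE_PREFETCH,          /* gpte usable: build a real shadow pte */
	SPTE_TRAP_NONPRESENT,   /* drop spte; the next fault exits to kvm */
	SPTE_NOTRAP_NONPRESENT  /* drop spte; the fault may go to the guest */
};

/* Model of FNAME(prefetch_invalid_gpte); sp_unsync plays sp->unsync. */
static enum spte_action classify_gpte(uint64_t gpte, uint64_t rsvd_mask,
				      bool sp_unsync)
{
	/* A reserved-bit mask must never cover the flags tested below. */
	assert(!(rsvd_mask & (GPTE_PRESENT | GPTE_ACCESSED)));

	if (gpte & rsvd_mask)
		return SPTE_TRAP_NONPRESENT;
	/*
	 * x86 never caches not-present translations, so a guest may set the
	 * present bit in an unsync (not write-protected) page and use it at
	 * once. kvm must keep trapping in that case to resync.
	 */
	if (!(gpte & GPTE_PRESENT))
		return sp_unsync ? SPTE_TRAP_NONPRESENT : SPTE_NOTRAP_NONPRESENT;
	/* Prefetching would skip kvm setting the gpte's accessed bit. */
	if (!(gpte & GPTE_ACCESSED))
		return SPTE_TRAP_NONPRESENT;
	return SPTE_PREFETCH;
}
```

In this model, only a not-present gpte in a synced page gets the non-trapping encoding. That is the case where the hardware can deliver the guest's own page fault without involving kvm.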

2010-06-11 21:28:14 +08:00
static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
2011-03-09 15:43:51 +08:00
			      u64 *spte, const void *pte, unsigned long mmu_seq)
2007-05-01 21:53:31 +08:00
{
	pt_element_t gpte;
2007-12-09 23:00:02 +08:00
	unsigned pte_access;
2008-04-03 03:46:56 +08:00
	pfn_t pfn;
2007-05-01 21:53:31 +08:00

	gpte = *(const pt_element_t *)pte;
2010-11-23 11:08:42 +08:00
	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
  the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
  the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
  is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
  set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
		return;
2010-11-23 11:08:42 +08:00

2008-03-04 04:59:56 +08:00
	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
2010-06-11 21:28:14 +08:00
	pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
2011-03-09 15:43:51 +08:00
	pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
	if (is_error_pfn(pfn)) {
		kvm_release_pfn_clean(pfn);
2007-12-30 18:29:05 +08:00
		return;
2011-03-09 15:43:51 +08:00
	}
	if (mmu_notifier_retry(vcpu, mmu_seq))
2008-07-25 22:24:52 +08:00
		return;
2011-03-09 15:43:51 +08:00

2009-09-24 02:47:17 +08:00
	/*
2011-03-18 03:24:16 +08:00
	 * we call mmu_set_spte() with host_writable = true because that
2009-09-24 02:47:17 +08:00
	 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
	 */
2010-06-11 21:28:14 +08:00
	mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
2010-06-11 21:29:42 +08:00
		     is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL,
2009-09-24 02:47:17 +08:00
		     gpte_to_gfn(gpte), pfn, true, true);
2007-05-01 21:53:31 +08:00
}
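The "bypass" trick in the "Allow not-present guest page faults to bypass kvm" commit message above depends on VMX's page-fault error-code filter. This sketch models that filter using the rule from the Intel SDM. When bit 14 (#PF) of the exception bitmap is set, a guest page fault causes a VM exit only if `(error_code & PFEC_MASK) == PFEC_MATCH`. When bit 14 is clear, the sense is inverted. The sketch also assumes kvm sets both the mask and the match to the present bit when the bypass is on. The function names, the example reserved bit (bit 51, assuming a physical-address width below 52 bits), and the two shadow-pte encodings are hypothetical and only illustrate the idea. They are not KVM's exact values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (same layout as KVM's PFERR_*_MASK). */
#define PF_PRESENT (1u << 0)
#define PF_WRITE   (1u << 1)
#define PF_USER    (1u << 2)
#define PF_RSVD    (1u << 3)

/* Example reserved bit: assumes MAXPHYADDR < 52 on this CPU. */
#define SPTE_RSVD_EXAMPLE (1ULL << 51)

/* Illustrative encodings: plain not-present vs present-plus-reserved. */
#define NOTRAP_SPTE 0ULL
#define TRAP_SPTE   (1ULL | SPTE_RSVD_EXAMPLE)

/* Simplified error code the CPU reports for an access through spte. */
static uint32_t model_error_code(uint64_t spte, bool write, bool user)
{
	uint32_t ec = (write ? PF_WRITE : 0) | (user ? PF_USER : 0);

	if (spte & 1) {
		/* In this model a present spte faults only via its rsvd bit. */
		assert(spte & SPTE_RSVD_EXAMPLE);
		ec |= PF_PRESENT | PF_RSVD;
	}
	return ec;
}

/* VMX exit decision for a guest #PF (exception bitmap bit 14 rule). */
static bool pf_causes_vmexit(uint32_t ec, bool pf_bit_in_bitmap,
			     uint32_t pfec_mask, uint32_t pfec_match)
{
	bool match = (ec & pfec_mask) == pfec_match;

	return pf_bit_in_bitmap ? match : !match;
}
```

With mask = match = `PF_PRESENT`, a fault through the not-present encoding has P=0, fails the match, and is delivered straight to the guest. A fault through the reserved-bit encoding reports P=1 and still exits to kvm. Setting mask and match to 0 makes every page fault exit, which corresponds to disabling the trick.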

2010-07-13 19:27:08 +08:00
static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
				struct guest_walker *gw, int level)
{
	pt_element_t curr_pte;
2010-08-22 19:13:33 +08:00
	gpa_t base_gpa, pte_gpa = gw->pte_gpa[level - 1];
	u64 mask;
	int r, index;

	if (level == PT_PAGE_TABLE_LEVEL) {
		mask = PTE_PREFETCH_NUM * sizeof(pt_element_t) - 1;
		base_gpa = pte_gpa & ~mask;
		index = (pte_gpa - base_gpa) / sizeof(pt_element_t);

		r = kvm_read_guest_atomic(vcpu->kvm, base_gpa,
				gw->prefetch_ptes, sizeof(gw->prefetch_ptes));
		curr_pte = gw->prefetch_ptes[index];
	} else
		r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa,
2010-07-13 19:27:08 +08:00
				  &curr_pte, sizeof(curr_pte));
2010-08-22 19:13:33 +08:00

2010-07-13 19:27:08 +08:00
	return r || curr_pte != gw->ptes[level - 1];
}
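For last-level ptes, `gpte_changed()` re-reads the whole naturally aligned block of `PTE_PREFETCH_NUM` guest ptes that contains the faulting one, and `pte_prefetch()` then reuses that block. The address arithmetic can be shown in isolation. This sketch assumes `PTE_PREFETCH_NUM` is 8 and that guest ptes are 8 bytes wide (64-bit or PAE paging). `prefetch_window()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PREFETCH_NUM 8     /* assumed; must be a power of two */
typedef uint64_t pt_element_t; /* 64-bit/PAE guest pte (assumed) */

/* Split pte_gpa into the aligned block base and the pte's index in it. */
static void prefetch_window(uint64_t pte_gpa, uint64_t *base_gpa, int *index)
{
	uint64_t mask = PTE_PREFETCH_NUM * sizeof(pt_element_t) - 1;

	assert(pte_gpa % sizeof(pt_element_t) == 0); /* ptes are aligned */
	*base_gpa = pte_gpa & ~mask;
	*index = (int)((pte_gpa - *base_gpa) / sizeof(pt_element_t));
}
```

For example, with these assumptions a pte at guest physical address 0x1058 lies in the 64-byte block starting at 0x1040, at index 3.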

2010-08-22 19:13:33 +08:00
static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
				u64 *sptep)
2010-08-22 19:12:48 +08:00
{
	struct kvm_mmu_page *sp;
2010-08-22 19:13:33 +08:00
	pt_element_t *gptep = gw->prefetch_ptes;
2010-08-22 19:12:48 +08:00
	u64 *spte;
2010-08-22 19:13:33 +08:00
	int i;
2010-08-22 19:12:48 +08:00

	sp = page_header(__pa(sptep));

	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
		return;

	if (sp->role.direct)
		return __direct_pte_prefetch(vcpu, sp, sptep);

	i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
	spte = sp->spt + i;

	for (i = 0; i < PTE_PREFETCH_NUM; i++, spte++) {
		pt_element_t gpte;
		unsigned pte_access;
		gfn_t gfn;
		pfn_t pfn;
		bool dirty;

		if (spte == sptep)
			continue;

		if (*spte != shadow_trap_nonpresent_pte)
			continue;

		gpte = gptep[i];

2010-11-23 11:08:42 +08:00
		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
2010-08-22 19:12:48 +08:00
			continue;

		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
		gfn = gpte_to_gfn(gpte);
		dirty = is_dirty_gpte(gpte);
		pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
				      (pte_access & ACC_WRITE_MASK) && dirty);
		if (is_error_pfn(pfn)) {
			kvm_release_pfn_clean(pfn);
			break;
		}

		mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
			     dirty, NULL, PT_PAGE_TABLE_LEVEL, gfn,
			     pfn, true, true);
	}
}
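`pte_prefetch()` uses the same alignment for shadow-pte indices. It rounds the faulting slot down to a multiple of `PTE_PREFETCH_NUM` and walks that block. It skips the slot being faulted in and any slot that already holds something other than the trap value. It asks for a writable host page only when the gpte is both writable and dirty. The sketch below models only the slot selection and the write decision. The plain `uint64_t` table, the `trap_value` parameter, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PREFETCH_NUM 8 /* assumed; must be a power of two */

/*
 * Store in out[] the slots that would be prefetched around fault_idx,
 * and return how many there are.
 */
static int prefetch_candidates(const uint64_t *spt, int fault_idx,
			       uint64_t trap_value, int out[PTE_PREFETCH_NUM])
{
	int start = fault_idx & ~(PTE_PREFETCH_NUM - 1);
	int n = 0;

	assert(fault_idx >= 0);
	for (int i = start; i < start + PTE_PREFETCH_NUM; i++) {
		if (i == fault_idx)        /* handled by the fault itself */
			continue;
		if (spt[i] != trap_value)  /* already present, or notrap */
			continue;
		out[n++] = i;
	}
	return n;
}

/* Map the host page writable only if the guest allows and dirtied it. */
static bool want_writable_pfn(bool gpte_writable, bool gpte_dirty)
{
	return gpte_writable && gpte_dirty;
}
```

Requiring the dirty bit means a later guest write still faults, so kvm gets the chance to set the dirty bit in the gpte.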

2006-12-10 18:21:36 +08:00
/*
 * Fetch a shadow pte for a specific level in the paging hierarchy.
 */
2008-12-25 21:10:50 +08:00
static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
			 struct guest_walker *gw,
2009-07-27 22:30:46 +08:00
			 int user_fault, int write_fault, int hlevel,
2010-12-07 10:35:25 +08:00
			 int *ptwrite, pfn_t pfn, bool map_writable,
			 bool prefault)
2006-12-10 18:21:36 +08:00
{
2008-08-23 00:11:39 +08:00
	unsigned access = gw->pt_access;
2010-07-13 19:27:10 +08:00
	struct kvm_mmu_page *sp = NULL;
2010-06-30 16:05:00 +08:00
	bool dirty = is_dirty_gpte(gw->ptes[gw->level - 1]);
2010-07-13 19:27:10 +08:00
	int top_level;
2010-06-30 16:05:00 +08:00
	unsigned direct_access;
2010-07-13 19:27:11 +08:00
	struct kvm_shadow_walk_iterator it;
2008-08-23 00:11:39 +08:00

2009-06-10 19:12:05 +08:00
	if (!is_present_gpte(gw->ptes[gw->level - 1]))
2008-12-25 21:10:50 +08:00
		return NULL;
2006-12-10 18:21:36 +08:00

2010-06-30 16:05:00 +08:00
	direct_access = gw->pt_access & gw->pte_access;
	if (!dirty)
		direct_access &= ~ACC_WRITE_MASK;

2010-07-13 19:27:10 +08:00
	top_level = vcpu->arch.mmu.root_level;
	if (top_level == PT32E_ROOT_LEVEL)
		top_level = PT32_ROOT_LEVEL;
	/*
	 * Verify that the top-level gpte is still there. Since the page
	 * is a root page, it is either write protected (and cannot be
	 * changed from now on) or it is invalid (in which case, we don't
	 * really care if it changes underneath us after this point).
	 */
	if (FNAME(gpte_changed)(vcpu, gw, top_level))
		goto out_gpte_changed;

2010-07-13 19:27:11 +08:00
	for (shadow_walk_init(&it, vcpu, addr);
	     shadow_walk_okay(&it) && it.level > gw->level;
	     shadow_walk_next(&it)) {
2010-07-13 19:27:09 +08:00
		gfn_t table_gfn;

2010-07-13 19:27:11 +08:00
		drop_large_spte(vcpu, it.sptep);
2007-05-30 19:21:51 +08:00

2010-07-13 19:27:10 +08:00
		sp = NULL;
2010-07-13 19:27:11 +08:00
		if (!is_shadow_present_pte(*it.sptep)) {
			table_gfn = gw->table_gfn[it.level - 2];
			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
					      false, access, it.sptep);
2010-07-13 19:27:10 +08:00
		}
2010-07-13 19:27:09 +08:00

		/*
		 * Verify that the gpte in the page we've just write
		 * protected is still there.
		 */
2010-07-13 19:27:11 +08:00
		if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
2010-07-13 19:27:09 +08:00
			goto out_gpte_changed;
2008-08-23 00:11:39 +08:00

2010-07-13 19:27:10 +08:00
		if (sp)
2010-07-13 19:27:11 +08:00
			link_shadow_page(it.sptep, sp);
2008-12-25 21:10:50 +08:00
	}
2007-11-21 20:11:49 +08:00

2010-07-13 19:27:09 +08:00
	for (;
2010-07-13 19:27:11 +08:00
	     shadow_walk_okay(&it) && it.level > hlevel;
	     shadow_walk_next(&it)) {
2010-07-13 19:27:09 +08:00
		gfn_t direct_gfn;

2010-07-13 19:27:11 +08:00
		validate_direct_spte(vcpu, it.sptep, direct_access);
2010-07-13 19:27:09 +08:00

2010-07-13 19:27:11 +08:00
		drop_large_spte(vcpu, it.sptep);
2010-07-13 19:27:09 +08:00

2010-07-13 19:27:11 +08:00
		if (is_shadow_present_pte(*it.sptep))
2010-07-13 19:27:09 +08:00
			continue;

2010-07-13 19:27:11 +08:00
		direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
2010-07-13 19:27:09 +08:00

2010-07-13 19:27:11 +08:00
		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
				      true, direct_access, it.sptep);
		link_shadow_page(it.sptep, sp);
2010-07-13 19:27:09 +08:00
	}

2010-07-13 19:27:11 +08:00
	mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
		     user_fault, write_fault, dirty, ptwrite, it.level,
2010-12-07 10:35:25 +08:00
		     gw->gfn, pfn, prefault, map_writable);
2010-08-22 19:13:33 +08:00
	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
2010-07-13 19:27:09 +08:00

2010-07-13 19:27:11 +08:00
	return it.sptep;
2010-07-13 19:27:09 +08:00

out_gpte_changed:
2010-07-13 19:27:10 +08:00
	if (sp)
2010-07-13 19:27:11 +08:00
		kvm_mmu_put_page(sp, it.sptep);
2010-07-13 19:27:09 +08:00
	kvm_release_pfn_clean(pfn);
	return NULL;
2006-12-10 18:21:36 +08:00
}
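`fetch()` walks the shadow table from the root in two phases. Above the guest walker's level (`gw->level`), it links indirect shadow pages that shadow the guest's own tables (`gw->table_gfn[]`). Between `gw->level` and the host mapping level `hlevel`, it links direct pages that cover a guest large page. The base gfn of each such page is `gw->gfn` rounded down to the span mapped at that level. The final spte is installed at `hlevel`. The sketch below reproduces only that bookkeeping, under three assumptions. First, x86 paging with 512 entries per table, so a level-`l` entry maps 512^(l-1) 4K pages, which is what `KVM_PAGES_PER_HPAGE()` expresses on x86. Second, at most 4 levels. Third, the same number of levels in the shadow and guest tables. `plan_fetch()` and `enum link_kind` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 4K pages mapped by one entry at `level` (x86: 9 index bits per level). */
#define PAGES_PER_LEVEL(level) (1ULL << (((level) - 1) * 9))

enum link_kind { LINK_NONE, LINK_INDIRECT, LINK_DIRECT };

/*
 * Record in kind[level] which kind of child table fetch() would link
 * under each level above hlevel. Returns the base gfn used for the
 * lowest direct table, or gfn itself if no direct table is needed.
 */
static uint64_t plan_fetch(int root_level, int gw_level, int hlevel,
			   uint64_t gfn, enum link_kind kind[5])
{
	uint64_t direct_gfn = gfn;
	int level;

	assert(root_level <= 4 && gw_level >= hlevel && hlevel >= 1);
	for (level = 0; level < 5; level++)
		kind[level] = LINK_NONE;

	/* Phase 1: levels that shadow real guest page tables. */
	for (level = root_level; level > gw_level; level--)
		kind[level] = LINK_INDIRECT;

	/* Phase 2: a guest large page split into smaller host mappings. */
	for (; level > hlevel; level--) {
		kind[level] = LINK_DIRECT;
		direct_gfn = gfn & ~(PAGES_PER_LEVEL(level) - 1);
	}
	return direct_gfn;
}
```

For example, a guest 2MB page (`gw_level` 2) backed by 4K host pages (`hlevel` 1) gets indirect tables below levels 4 and 3. It gets one direct table below level 2, whose base gfn is the guest gfn rounded down to a 512-page boundary.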

/*
 * Page fault handler. There are several causes for a page fault:
 *   - there is no shadow pte for the guest pte
 *   - write access through a shadow pte marked read only so that we can set
 *     the dirty bit
 *   - write access to a shadow pte marked read only so we can update the page
 *     dirty bitmap, when userspace requests it
 *   - mmio access; in this case we will never install a present shadow pte
 *   - normal guest page fault due to the guest pte marked not present, not
 *     writable, or not executable
 *
2007-01-06 08:36:54 +08:00
 *  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
 *           a negative value on error.
2006-12-10 18:21:36 +08:00
 */
2010-10-18 00:13:42 +08:00
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
2010-12-07 10:48:06 +08:00
			     bool prefault)
2006-12-10 18:21:36 +08:00
{
	int write_fault = error_code & PFERR_WRITE_MASK;
	int user_fault = error_code & PFERR_USER_MASK;
	struct guest_walker walker;
2009-06-10 19:24:23 +08:00
	u64 *sptep;
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
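The cache key described in this commit message can be illustrated with a small C sketch: each cached shadow page records the guest frame number plus a role word holding the guest paging levels, the shadow level, the quadrant and the metaphysical bit, and a lookup hashes the gfn into a bucket and compares both gfn and role. All names here (`sp_role`, `shadow_page`, `sp_lookup`, `sp_insert`, `quadrant_32bit`, `SP_HASH_BUCKETS`) are hypothetical, not the kernel's. The quadrant helper assumes a 32-bit guest (1024 four-byte entries per table) shadowed by 512-entry tables, matching the 4MB/2MB and 4GB/1GB spans given above.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role word: one field per component of the cache key. */
struct sp_role {
    unsigned glevels      : 4; /* paging levels in the guest (real, 32-bit, pae, long) */
    unsigned level        : 4; /* level of this shadow page (1 = page table) */
    unsigned quadrant     : 2; /* which slice of the guest page this shadow covers */
    unsigned metaphysical : 1; /* no guest page table behind it: do not write protect */
};

struct shadow_page {
    uint64_t gfn;              /* guest page table frame number */
    struct sp_role role;
    struct shadow_page *next;  /* hash chain */
};

#define SP_HASH_BUCKETS 1024   /* arbitrary size chosen for the sketch */

static struct shadow_page *sp_hash[SP_HASH_BUCKETS];

static bool role_equal(struct sp_role a, struct sp_role b)
{
    return a.glevels == b.glevels && a.level == b.level &&
           a.quadrant == b.quadrant && a.metaphysical == b.metaphysical;
}

/* Return the cached shadow page for (gfn, role), or NULL on a miss. */
static struct shadow_page *sp_lookup(uint64_t gfn, struct sp_role role)
{
    struct shadow_page *sp;

    for (sp = sp_hash[gfn % SP_HASH_BUCKETS]; sp; sp = sp->next)
        if (sp->gfn == gfn && role_equal(sp->role, role))
            return sp;
    return NULL;
}

/* Push a shadow page onto the head of its bucket's chain. */
static void sp_insert(struct shadow_page *sp)
{
    struct shadow_page **bucket = &sp_hash[sp->gfn % SP_HASH_BUCKETS];

    sp->next = *bucket;
    *bucket = sp;
}

/* 32-bit guest: each level indexes 10 guest bits but only 9 shadow bits.
 * Level 1: 4MB guest table vs 2MB shadow table -> 2 quadrants (bit 21).
 * Level 2: 4GB guest directory vs 1GB shadow directory -> 4 quadrants (bits 30-31). */
static unsigned quadrant_32bit(uint32_t gaddr, unsigned level)
{
    unsigned shift = 12 + 9 * level;               /* page offset + shadow index bits */
    unsigned mask = (1u << ((10 - 9) * level)) - 1; /* extra guest index bits */

    return (gaddr >> shift) & mask;
}
```

Because the role is part of the key, the same guest frame can be cached once as a page table and once as a page directory, which is the case the "one per level" bullet describes.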
2007-01-06 08:36:43 +08:00
int write_pt = 0;
2007-01-06 08:36:54 +08:00
int r;
2008-04-03 03:46:56 +08:00
pfn_t pfn;
2009-07-27 22:30:46 +08:00
int level = PT_PAGE_TABLE_LEVEL;
2011-01-14 07:46:48 +08:00
int force_pt_level;
2008-07-25 22:24:52 +08:00
unsigned long mmu_seq;
2010-10-23 00:18:18 +08:00
bool map_writable;
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
2008-03-04 04:59:56 +08:00
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
2007-01-06 08:36:53 +08:00
2007-01-06 08:36:54 +08:00
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
2007-01-06 08:36:53 +08:00
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
/*
2009-03-26 15:28:40 +08:00
* Look up the guest pte for the faulting address.
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
*/
2010-09-28 17:03:14 +08:00
r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
/*
* The page is not mapped by the guest. Let the guest handle it.
*/
2007-01-26 16:56:41 +08:00
if (!r) {
2008-03-04 04:59:56 +08:00
pgprintk("%s: guest page fault\n", __func__);
2010-12-07 10:35:25 +08:00
if (!prefault) {
inject_page_fault(vcpu, &walker.fault);
/* reset fork detector */
vcpu->arch.last_pt_write_count = 0;
}
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
return 0;
}
2011-01-14 07:46:48 +08:00
if (walker.level >= PT_DIRECTORY_LEVEL)
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
else
force_pt_level = 1;
if (!force_pt_level) {
2009-07-27 22:30:46 +08:00
level = min(walker.level, mapping_level(vcpu, walker.gfn));
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
2008-02-23 22:44:30 +08:00
}
2009-07-27 22:30:46 +08:00
2008-07-25 22:24:52 +08:00
mmu_seq = vcpu->kvm->mmu_notifier_seq;
2008-09-17 07:54:47 +08:00
smp_rmb();
2010-10-14 17:22:46 +08:00
2010-12-07 10:48:06 +08:00
if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, write_fault,
2010-10-23 00:18:18 +08:00
&map_writable))
2010-10-14 17:22:46 +08:00
return 0;
2007-12-30 18:29:05 +08:00
2008-01-24 17:44:11 +08:00
/* mmio */
2010-05-31 14:28:19 +08:00
if (is_error_pfn(pfn))
return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
2008-01-24 17:44:11 +08:00
2007-12-21 08:18:26 +08:00
spin_lock(&vcpu->kvm->mmu_lock);
2008-07-25 22:24:52 +08:00
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
2010-08-28 19:22:46 +08:00
2010-08-30 18:22:53 +08:00
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
2007-12-31 21:27:49 +08:00
kvm_mmu_free_some_pages(vcpu);
2011-01-14 07:46:48 +08:00
if (!force_pt_level)
transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
2009-06-10 19:24:23 +08:00
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
2010-12-07 10:35:25 +08:00
level, &write_pt, pfn, map_writable, prefault);
2010-06-10 19:10:55 +08:00
(void)sptep;
2008-03-04 04:59:56 +08:00
pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
2009-06-10 19:24:23 +08:00
sptep, *sptep, write_pt);
[PATCH] KVM: MMU: Shadow page table caching
2007-01-06 08:36:43 +08:00
2007-04-30 22:05:38 +08:00
if (!write_pt)
2007-12-13 23:50:52 +08:00
vcpu->arch.last_pt_write_count = 0; /* reset fork detector */
2007-04-30 22:05:38 +08:00
2007-04-19 22:27:43 +08:00
++vcpu->stat.pf_fixed;
2010-08-30 18:22:53 +08:00
trace_kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
2007-12-21 08:18:26 +08:00
spin_unlock(&vcpu->kvm->mmu_lock);
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
[PATCH] KVM: MMU: Shadow page table caching
2007-01-06 08:36:43 +08:00
return write_pt;
2008-07-25 22:24:52 +08:00
out_unlock:
spin_unlock(&vcpu->kvm->mmu_lock);
kvm_release_pfn_clean(pfn);
return 0;
[PATCH] kvm: userspace interface
2006-12-10 18:21:36 +08:00
}
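The page-fault handler above snapshots `mmu_notifier_seq`, resolves the gfn to a pfn without holding `mmu_lock` (that step may sleep), and then rechecks with `mmu_notifier_retry()` under the lock, leaving through `out_unlock` if a host-side invalidation ran in between. A minimal userspace sketch of that sequence-count pattern, assuming pthreads and C11 atomics; `vm_state`, `vm_init`, `invalidate`, `fault_begin` and `fault_commit` are hypothetical names. The kernel's helper also refuses while an invalidation is still in progress; that part is left out here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM state standing in for struct kvm. */
struct vm_state {
    pthread_mutex_t lock;      /* plays the role of mmu_lock */
    atomic_ulong notifier_seq; /* bumped by every invalidation */
    unsigned long installed;   /* stands in for the shadow pte */
};

static void vm_init(struct vm_state *vm)
{
    pthread_mutex_init(&vm->lock, NULL);
    atomic_init(&vm->notifier_seq, 0);
    vm->installed = 0;
}

/* Invalidation side: tear down the mapping and bump the sequence under the lock. */
static void invalidate(struct vm_state *vm)
{
    pthread_mutex_lock(&vm->lock);
    vm->installed = 0;
    atomic_fetch_add_explicit(&vm->notifier_seq, 1, memory_order_release);
    pthread_mutex_unlock(&vm->lock);
}

/* Fault side, step 1: snapshot the sequence before the unlocked (slow) lookup. */
static unsigned long fault_begin(struct vm_state *vm)
{
    return atomic_load_explicit(&vm->notifier_seq, memory_order_acquire);
}

/* Fault side, step 2: install the looked-up pfn only if no invalidation
 * happened since fault_begin(); on false the caller retries the fault. */
static bool fault_commit(struct vm_state *vm, unsigned long seq, unsigned long pfn)
{
    bool ok;

    pthread_mutex_lock(&vm->lock);
    ok = atomic_load_explicit(&vm->notifier_seq, memory_order_relaxed) == seq;
    if (ok)
        vm->installed = pfn;
    pthread_mutex_unlock(&vm->lock);
    return ok;
}
```

If `invalidate()` runs between `fault_begin()` and `fault_commit()`, the commit fails and nothing stale is installed, which mirrors the handler returning 0 through `out_unlock` so that the guest simply faults again.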
2008-12-25 21:19:00 +08:00
static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
2008-09-24 00:18:35 +08:00
{
2008-12-25 21:19:00 +08:00
struct kvm_shadow_walk_iterator iterator;
2010-05-15 18:53:35 +08:00
struct kvm_mmu_page *sp;
2010-03-15 19:59:57 +08:00
gpa_t pte_gpa = -1;
2008-12-25 21:19:00 +08:00
int level;
u64 *sptep;
2009-03-13 01:18:43 +08:00
int need_flush = 0;
2008-12-25 21:19:00 +08:00
spin_lock(&vcpu->kvm->mmu_lock);
2008-09-24 00:18:35 +08:00
2008-12-25 21:19:00 +08:00
for_each_shadow_entry(vcpu, gva, iterator) {
level = iterator.level;
sptep = iterator.sptep;
2008-12-02 08:32:05 +08:00
2010-05-15 18:53:35 +08:00
sp = page_header(__pa(sptep));
2010-04-28 11:55:15 +08:00
if (is_last_spte(*sptep, level)) {
2010-04-28 11:54:44 +08:00
int offset, shift;
2010-03-15 19:59:57 +08:00
2010-05-15 18:53:35 +08:00
if (!sp->unsync)
break;
2010-04-28 11:54:44 +08:00
shift = PAGE_SHIFT -
(PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
offset = sp->role.quadrant << shift;
pte_gpa = (sp->gfn << PAGE_SHIFT) + offset;
|
2010-03-15 19:59:57 +08:00
|
|
|
pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
|
2008-12-25 21:19:00 +08:00
|
|
|
|
|
|
|
if (is_shadow_present_pte(*sptep)) {
|
|
|
|
if (is_large_pte(*sptep))
|
|
|
|
--vcpu->kvm->stat.lpages;
|
2010-06-06 19:31:27 +08:00
|
|
|
drop_spte(vcpu->kvm, sptep,
|
|
|
|
shadow_trap_nonpresent_pte);
|
2009-03-13 01:18:43 +08:00
|
|
|
need_flush = 1;
|
2010-06-06 19:31:27 +08:00
|
|
|
} else
|
|
|
|
__set_spte(sptep, shadow_trap_nonpresent_pte);
|
2008-12-25 21:19:00 +08:00
|
|
|
break;
|
2008-12-23 04:49:30 +08:00
|
|
|
}
|
2008-09-24 00:18:35 +08:00
|
|
|
|
2010-05-15 18:53:35 +08:00
|
|
|
if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
|
2008-12-25 21:19:00 +08:00
|
|
|
break;
|
|
|
|
}
|
2008-09-24 00:18:35 +08:00
|
|
|
|
2009-03-13 01:18:43 +08:00
|
|
|
if (need_flush)
|
|
|
|
kvm_flush_remote_tlbs(vcpu->kvm);
|
2010-03-15 19:59:57 +08:00
|
|
|
|
|
|
|
atomic_inc(&vcpu->kvm->arch.invlpg_counter);
|
|
|
|
|
2008-12-02 08:32:05 +08:00
|
|
|
spin_unlock(&vcpu->kvm->mmu_lock);
|
2010-03-15 19:59:57 +08:00
|
|
|
|
|
|
|
if (pte_gpa == -1)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (mmu_topup_memory_caches(vcpu))
|
|
|
|
return;
|
|
|
|
kvm_mmu_pte_write(vcpu, pte_gpa, NULL, sizeof(pt_element_t), 0);
|
2008-09-24 00:18:35 +08:00
|
|
|
}
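Concrete numbers make the pte_gpa arithmetic in invlpg easier to follow. The standalone C sketch below reproduces that arithmetic outside the kernel. All sketch_* names are hypothetical. The sketch assumes PAGE_SHIFT is 12 and PT64_LEVEL_BITS is 9. It also assumes PT_LEVEL_BITS is 10 for a 32-bit guest and 9 otherwise, with guest PTEs of 4 bytes and 8 bytes respectively.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_SHIFT, PT64_LEVEL_BITS and PT32_LEVEL_BITS. */
#define SKETCH_PAGE_SHIFT      12
#define SKETCH_PT64_LEVEL_BITS 9	/* 512 entries per shadow (64-bit/PAE) table */
#define SKETCH_PT32_LEVEL_BITS 10	/* 1024 entries per 32-bit guest table */

/*
 * Guest-physical address of the guest PTE behind shadow entry `idx` of a
 * shadow page that shadows guest frame `gfn` at `level` (1 = page table,
 * 2 = page directory), following the shift/offset arithmetic in invlpg.
 */
static uint64_t sketch_pte_gpa(uint64_t gfn, unsigned quadrant, unsigned level,
			       unsigned idx, int guest_is_32bit)
{
	unsigned level_bits = guest_is_32bit ? SKETCH_PT32_LEVEL_BITS
					     : SKETCH_PT64_LEVEL_BITS;
	unsigned pte_size = guest_is_32bit ? 4 : 8;	/* sizeof(pt_element_t) */
	unsigned shift = SKETCH_PAGE_SHIFT -
			 (level_bits - SKETCH_PT64_LEVEL_BITS) * level;
	uint64_t offset = (uint64_t)quadrant << shift;	/* byte offset of the slice */

	assert(idx < (1u << SKETCH_PT64_LEVEL_BITS));	/* index inside the shadow page */
	return (gfn << SKETCH_PAGE_SHIFT) + offset + (uint64_t)idx * pte_size;
}
```

For a 32-bit guest, one guest page table (1024 entries) is covered by two shadow page tables, and one guest page directory is covered by four. The quadrant therefore selects which 2 KB or 1 KB slice of the guest page a given shadow page stands for. For 64-bit and PAE guests, the shift works out to 12 and the quadrant is expected to be 0.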

2010-02-10 20:21:32 +08:00
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
2010-11-22 23:53:26 +08:00
			       struct x86_exception *exception)
2006-12-10 18:21:36 +08:00
{
	struct guest_walker walker;
2007-02-12 16:54:36 +08:00
	gpa_t gpa = UNMAPPED_GVA;
	int r;
2006-12-10 18:21:36 +08:00

2010-09-28 17:03:14 +08:00
	r = FNAME(walk_addr)(&walker, vcpu, vaddr, access);
2006-12-10 18:21:36 +08:00

2007-02-12 16:54:36 +08:00
	if (r) {
2007-11-21 20:44:45 +08:00
		gpa = gfn_to_gpa(walker.gfn);
2007-02-12 16:54:36 +08:00
		gpa |= vaddr & ~PAGE_MASK;
2010-11-22 23:53:27 +08:00
	} else if (exception)
		*exception = walker.fault;
2006-12-10 18:21:36 +08:00

	return gpa;
}
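The gva_to_gpa routines here (and the nested variant) follow one pattern. On a successful walk, the guest frame number becomes an address and the low PAGE_SHIFT bits of the virtual address carry over unchanged. On failure, UNMAPPED_GVA is returned, and the walker's fault is copied out only if the caller supplied somewhere to put it. The sketch below mirrors that control flow using a toy single-level table. The table layout, the sentinel value, and all sketch_* names are hypothetical stand-ins, not KVM structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))
#define SKETCH_UNMAPPED   (~UINT64_C(0))	/* stand-in for UNMAPPED_GVA */

struct sketch_fault {
	uint64_t address;	/* faulting virtual address */
	bool present;		/* false: not-present fault */
};

struct sketch_walker {
	uint64_t gfn;			/* result of a successful walk */
	struct sketch_fault fault;	/* filled in when the walk fails */
};

/* Toy one-level walk: table[vpn] holds a gfn, or 0 for "not present". */
static bool sketch_walk(struct sketch_walker *w, const uint64_t *table,
			size_t n, uint64_t vaddr)
{
	uint64_t vpn = vaddr >> SKETCH_PAGE_SHIFT;

	if (vpn >= n || table[vpn] == 0) {
		w->fault.address = vaddr;
		w->fault.present = false;
		return false;
	}
	w->gfn = table[vpn];
	return true;
}

static uint64_t sketch_gva_to_gpa(const uint64_t *table, size_t n,
				  uint64_t vaddr, struct sketch_fault *exception)
{
	struct sketch_walker walker;
	uint64_t gpa = SKETCH_UNMAPPED;

	assert(table != NULL || n == 0);
	if (sketch_walk(&walker, table, n, vaddr)) {
		gpa = walker.gfn << SKETCH_PAGE_SHIFT;	/* like gfn_to_gpa() */
		gpa |= vaddr & ~SKETCH_PAGE_MASK;	/* keep the in-page offset */
	} else if (exception) {
		*exception = walker.fault;		/* report only if asked */
	}
	return gpa;
}
```

For example, with vpn 1 mapped to gfn 0x80, address 0x1234 translates to 0x80234. An unmapped page returns the sentinel and leaves the fault in the caller's struct.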

2010-09-10 23:30:50 +08:00
static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
2010-11-22 23:53:26 +08:00
				      u32 access,
				      struct x86_exception *exception)
2010-09-10 23:30:50 +08:00
{
	struct guest_walker walker;
	gpa_t gpa = UNMAPPED_GVA;
	int r;

2010-09-28 17:03:14 +08:00
	r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr, access);
2010-09-10 23:30:50 +08:00

	if (r) {
		gpa = gfn_to_gpa(walker.gfn);
		gpa |= vaddr & ~PAGE_MASK;
2010-11-22 23:53:27 +08:00
	} else if (exception)
		*exception = walker.fault;
2010-09-10 23:30:50 +08:00

	return gpa;
}

KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-09-17 00:58:32 +08:00
static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
				 struct kvm_mmu_page *sp)
{
2008-05-29 19:20:16 +08:00
	int i, j, offset, r;
	pt_element_t pt[256 / sizeof(pt_element_t)];
	gpa_t pte_gpa;
2007-09-17 00:58:32 +08:00

2009-01-11 19:02:10 +08:00
	if (sp->role.direct
2007-11-21 03:39:54 +08:00
	    || (PTTYPE == 32 && sp->role.level > PT_PAGE_TABLE_LEVEL)) {
2007-09-17 00:58:32 +08:00
		nonpaging_prefetch_page(vcpu, sp);
		return;
	}

2008-05-29 19:20:16 +08:00
	pte_gpa = gfn_to_gpa(sp->gfn);
	if (PTTYPE == 32) {
2007-11-21 03:39:54 +08:00
		offset = sp->role.quadrant << PT64_LEVEL_BITS;
2008-05-29 19:20:16 +08:00
		pte_gpa += offset * sizeof(pt_element_t);
	}
2007-12-21 08:18:23 +08:00

2008-05-29 19:20:16 +08:00
	for (i = 0; i < PT64_ENT_PER_PAGE; i += ARRAY_SIZE(pt)) {
		r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, pt, sizeof pt);
		pte_gpa += ARRAY_SIZE(pt) * sizeof(pt_element_t);
		for (j = 0; j < ARRAY_SIZE(pt); ++j)
2009-06-10 19:12:05 +08:00
			if (r || is_present_gpte(pt[j]))
2008-05-29 19:20:16 +08:00
				sp->spt[i+j] = shadow_trap_nonpresent_pte;
			else
				sp->spt[i+j] = shadow_notrap_nonpresent_pte;
2007-12-21 08:18:23 +08:00
	}
2007-09-17 00:58:32 +08:00
}
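The "bypass kvm" commit above describes the trick, and prefetch_page is where each shadow entry receives one of the two sentinels. The sketch below models only that per-entry decision. The sentinel values are assumed placeholders: the "trap" value looks present but has reserved bits set, and the "notrap" value is plain not-present. sketch_fault_reaches_hypervisor is a simplification. Real VMX hardware filters page faults through its error-code mask/match settings, not by inspecting the entry. As in prefetch_page, a failed guest read conservatively falls back to the trapping value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PTE_PRESENT UINT64_C(1)	/* x86 PTE present bit (bit 0) */

/*
 * Placeholder sentinels (assumed values): "trap" has the present bit and
 * reserved bits set, so an access faults with P=1 in the error code;
 * "notrap" is simply not present, so the access faults with P=0.
 */
static const uint64_t sketch_trap_nonpresent = ~UINT64_C(0xffe);
static const uint64_t sketch_notrap_nonpresent = 0;

/* Mirrors the inner loop of prefetch_page; read_failed plays the role of r. */
static void sketch_prefetch(uint64_t *spt, const uint64_t *gpt, size_t n,
			    int read_failed)
{
	assert(spt != NULL && gpt != NULL);
	for (size_t i = 0; i < n; i++) {
		if (read_failed || (gpt[i] & SKETCH_PTE_PRESENT))
			spt[i] = sketch_trap_nonpresent;	/* kvm must see the fault */
		else
			spt[i] = sketch_notrap_nonpresent;	/* guest handles it itself */
	}
}

/* Simplified "don't trap faults which have the present bit clear" filter. */
static int sketch_fault_reaches_hypervisor(uint64_t spte)
{
	return (spte & SKETCH_PTE_PRESENT) != 0;
}
```

Only the guest-present case must go through kvm, because kvm still has to build the real shadow entry. A not-present guest PTE would make kvm inject the fault straight back into the guest anyway.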

2008-09-24 00:18:33 +08:00
/*
 * Using the cached information from sp->gfns is safe because:
 * - The spte has a reference to the struct page, so the pfn for a given gfn
 *   can't change unless all sptes pointing to it are nuked first.
2010-11-23 11:13:00 +08:00
 *
 * Note:
 * We should flush all TLBs if an spte is dropped, even though the guest is
 * responsible for it. Otherwise kvm_mmu_notifier_invalidate_page and
 * kvm_mmu_notifier_invalidate_range_start would find that the page is no
 * longer mapped by the guest and skip the TLB flush, leaving the guest able
 * to access the freed pages. Instead, we increase kvm->tlbs_dirty so that
 * the TLB flush is deferred in this case.
2008-09-24 00:18:33 +08:00
 */
2010-11-19 17:04:03 +08:00
static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
2008-09-24 00:18:33 +08:00
{
	int i, offset, nr_present;
2010-11-19 17:03:22 +08:00
	bool host_writable;
2010-04-16 17:16:40 +08:00
	gpa_t first_pte_gpa;
2008-09-24 00:18:33 +08:00

	offset = nr_present = 0;

2010-05-26 16:49:59 +08:00
	/* A direct kvm_mmu_page cannot be unsync. */
	BUG_ON(sp->role.direct);

2008-09-24 00:18:33 +08:00
	if (PTTYPE == 32)
		offset = sp->role.quadrant << PT64_LEVEL_BITS;

2010-04-16 17:16:40 +08:00
	first_pte_gpa = gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);

2008-09-24 00:18:33 +08:00
	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
		unsigned pte_access;
		pt_element_t gpte;
		gpa_t pte_gpa;
2010-05-13 10:08:08 +08:00
		gfn_t gfn;
2008-09-24 00:18:33 +08:00

		if (!is_shadow_present_pte(sp->spt[i]))
			continue;

2010-04-16 17:16:40 +08:00
		pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
2008-09-24 00:18:33 +08:00

		if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte,
					  sizeof(pt_element_t)))
			return -EINVAL;

2010-05-13 10:08:08 +08:00
		gfn = gpte_to_gfn(gpte);
2010-11-23 11:08:42 +08:00

		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
2010-11-23 11:13:00 +08:00
			vcpu->kvm->tlbs_dirty++;
2010-11-23 11:08:42 +08:00
			continue;
		}

		if (gfn != sp->gfns[i]) {
			drop_spte(vcpu->kvm, &sp->spt[i],
				      shadow_trap_nonpresent_pte);
2010-11-23 11:13:00 +08:00
			vcpu->kvm->tlbs_dirty++;
2008-09-24 00:18:33 +08:00
			continue;
		}

		nr_present++;
		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
2010-12-23 16:09:29 +08:00
		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;

2008-09-24 00:18:33 +08:00
		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
2009-07-27 22:30:46 +08:00
			 is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
2009-09-24 02:47:17 +08:00
			 spte_to_pfn(sp->spt[i]), true, false,
2010-11-19 17:03:22 +08:00
			 host_writable);
2008-09-24 00:18:33 +08:00
	}

	return !nr_present;
}
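sync_page revalidates an unsync shadow page against the guest's current PTEs. An entry is dropped and counted in tlbs_dirty, instead of triggering an immediate flush, if its guest PTE is now invalid or its gfn no longer matches the cached sp->gfns[i]. The return value tells the caller whether any entry survived. The sketch below keeps only that bookkeeping and uses several hypothetical simplifications: the struct, the 8-entry page, and the convention that gfn 0 marks an invalid guest PTE. The permission and dirty-bit refresh that set_spte() performs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRIES 8	/* toy size; a real shadow page has 512 entries */

/* Hypothetical, simplified unsync shadow page. */
struct sketch_sp {
	bool present[SKETCH_ENTRIES];	/* stands in for is_shadow_present_pte() */
	uint64_t gfns[SKETCH_ENTRIES];	/* gfn each entry was built from (sp->gfns) */
};

/*
 * gpte_gfn[i] is the gfn currently in guest PTE i; 0 means the guest PTE
 * is invalid in this toy model. Returns nonzero when no entry survived,
 * mirroring "return !nr_present".
 */
static int sketch_sync_page(struct sketch_sp *sp, const uint64_t *gpte_gfn,
			    unsigned *tlbs_dirty)
{
	int nr_present = 0;

	assert(sp != NULL && gpte_gfn != NULL && tlbs_dirty != NULL);
	for (size_t i = 0; i < SKETCH_ENTRIES; i++) {
		if (!sp->present[i])
			continue;
		if (gpte_gfn[i] == 0 || gpte_gfn[i] != sp->gfns[i]) {
			sp->present[i] = false;	/* drop the stale entry */
			(*tlbs_dirty)++;	/* defer the flush, do not flush now */
			continue;
		}
		nr_present++;	/* still valid; kvm would refresh it via set_spte() */
	}
	return !nr_present;
}
```

Counting dropped entries in tlbs_dirty, instead of flushing inside the loop, lets one later remote flush cover every entry the sync discarded.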

2006-12-10 18:21:36 +08:00
#undef pt_element_t
#undef guest_walker
#undef FNAME
#undef PT_BASE_ADDR_MASK
#undef PT_INDEX
2009-07-27 22:30:45 +08:00
#undef PT_LVL_ADDR_MASK
#undef PT_LVL_OFFSET_MASK
2007-09-17 00:58:32 +08:00
#undef PT_LEVEL_BITS
[PATCH] KVM: MMU: Shadow page table caching
Define a hashtable for caching shadow page tables. Look up the cache on
context switch (cr3 change) or during page faults.
The key to the cache is a combination of
- the guest page table frame number
- the number of paging levels in the guest
* we can cache real mode, 32-bit mode, pae, and long mode page
tables simultaneously. this is useful for smp bootup.
- the guest page table level
* some kernels use a page as both a page table and a page directory. this
allows multiple shadow pages to exist for that page, one per level
- the "quadrant"
* 32-bit mode page tables span 4MB, whereas a shadow page table spans
2MB. similarly, a 32-bit page directory spans 4GB, while a shadow
page directory spans 1GB. the quadrant allows caching up to 4 shadow page
tables for one guest page in one level.
- a "metaphysical" bit
* for real mode, and for pse pages, there is no guest page table, so set
the bit to avoid write protecting the page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-06 08:36:43 +08:00
#undef PT_MAX_FULL_LEVELS
2007-11-21 18:35:07 +08:00
#undef gpte_to_gfn
2009-07-27 22:30:45 +08:00
#undef gpte_to_gfn_lvl
2007-12-07 20:56:58 +08:00
#undef CMPXCHG
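The shadow page table caching commit quoted above keys its hashtable on the guest page table frame number plus a few role fields. The following is a minimal sketch of such a lookup under several assumptions. It uses a fixed-size table hashed on the gfn alone, with chaining, and a role struct containing only the fields the commit message lists. All names are hypothetical, and a real implementation would pack the role more compactly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical key fields taken from the list in the commit message. */
struct sketch_role {
	unsigned glevels;	/* number of paging levels in the guest */
	unsigned level;		/* level of this table (page table, directory, ...) */
	unsigned quadrant;	/* which slice of a larger 32-bit guest table */
	unsigned metaphysical;	/* 1: no guest page table backs this page */
};

struct sketch_page {
	uint64_t gfn;
	struct sketch_role role;
	struct sketch_page *next;	/* hash chain */
};

#define SKETCH_HASH_SIZE 64
static struct sketch_page *sketch_hash[SKETCH_HASH_SIZE];

static int sketch_role_eq(struct sketch_role a, struct sketch_role b)
{
	return a.glevels == b.glevels && a.level == b.level &&
	       a.quadrant == b.quadrant && a.metaphysical == b.metaphysical;
}

/* Return the cached shadow page for (gfn, role), creating it on a miss. */
static struct sketch_page *sketch_get_page(uint64_t gfn, struct sketch_role role,
					   int *created)
{
	struct sketch_page **bucket = &sketch_hash[gfn % SKETCH_HASH_SIZE];
	struct sketch_page *p;

	assert(created != NULL);
	for (p = *bucket; p; p = p->next) {
		if (p->gfn == gfn && sketch_role_eq(p->role, role)) {
			*created = 0;
			return p;
		}
	}
	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;
	p->gfn = gfn;
	p->role = role;
	p->next = *bucket;
	*bucket = p;
	*created = 1;
	return p;
}
```

Because the role is part of the key, one guest page used as both a page table and a page directory yields two cached shadow pages. The same holds for two quadrants of a 32-bit guest table.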