* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits)
KVM: ia64: Add intel iommu support for guests.
KVM: ia64: add directed mmio range support for kvm guests
KVM: ia64: Make pmt table be able to hold physical mmio entries.
KVM: Move irqchip_in_kernel() from ioapic.h to irq.h
KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs
KVM: Move device assignment logic to common code
KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
KVM: VMX: enable invlpg exiting if EPT is disabled
KVM: x86: Silence various LAPIC-related host kernel messages
KVM: Device Assignment: Map mmio pages into VT-d page table
KVM: PIC: enhance IPI avoidance
KVM: MMU: add "oos_shadow" parameter to disable oos
KVM: MMU: speed up mmu_unsync_walk
KVM: MMU: out of sync shadow core
KVM: MMU: mmu_convert_notrap helper
KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
KVM: MMU: mmu_parent_walk
KVM: x86: trap invlpg
KVM: MMU: sync roots on mmu reload
...
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix compat-vdso
x86/mm: unify init task OOM handling
x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
This series of patches re-introduces the iommu_num_pages function so that
it can be used by each architecture specific IOMMU implementations. The
series also changes IOMMU implementations for X86, Alpha, PowerPC and
UltraSparc. The other implementations are not yet changed because the
modifications required are not obvious and I can't test them on real
hardware.
This patch:
This is a preparation patch for introducing a generic iommu_num_pages function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SET_PERSONALITY macro is always called with a second argument of 0.
Remove the ibcs argument and the various tests to set the PER_SVR4
personality.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jeff Chua reported that this patch:
> -#define PTE_IDENT_ATTR 0x007 /* PRESENT+RW+USER */
> -#define PDE_IDENT_ATTR 0x067 /* PRESENT+RW+USER+DIRTY+ACCESSED */
> +#define PTE_IDENT_ATTR 0x003 /* PRESENT+RW */
> +#define PDE_IDENT_ATTR 0x063 /* PRESENT+RW+DIRTY+ACCESSED */
broke kernels with CONFIG_COMPAT_VDSO set with this init segfault:
init[1]: segfault at ffffe01c up b7f0dc28 sp bfc26628 error 5 in ld-2.7.90.so[b7f0b000+1c000]
Include USER bit in the PDE_IDENT_ATTR only, as the protection bits
are combined from the PDE and PTE entries. This will allow the high
mapped VDSO page in the case of CONFIG_COMPAT_VDSO to be user
readable.
Reported-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Moving irq ack notification logic as common, and make
it shared with ia64 side.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Assigned device could DMA to mmio pages, so also need to map mmio pages
into VT-d page table.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cache the unsynced children information in a per-page bitmap.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Allow guest pagetables to go out of sync. Instead of emulating write
accesses to guest pagetables, or unshadowing them, we un-write-protect
the page table and allow the guest to modify it at will. We rely on
invlpg executions to synchronize individual ptes, and will synchronize
the entire pagetable on tlb flushes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
With pages out of sync invlpg needs to be trapped. For now simply nuke
the entry.
Untested on AMD.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Examine guest pagetable and bring the shadow back in sync. Caller is responsible
for local TLB flush before re-entering guest mode.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Based on a patch by: Kay, Allen M <allen.m.kay@intel.com>
This patch enables PCI device assignment based on VT-d support.
When a device is assigned to the guest, the guest memory is pinned and
the mapping is updated in the VT-d IOMMU.
[Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present
and also control enable/disable from userspace]
Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
We're in a hot path. We can't use kmalloc() because
it might impact performance. So, we just stick the buffer that
we need into the kvm_vcpu_arch structure. This is used very
often, so it is not really a waste.
We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Based on a patch from: Amit Shah <amit.shah@qumranet.com>
This patch adds support for handling PCI devices that are assigned to
the guest.
The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled. If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment is failed by conveying a -EBUSY reply to the userspace.
Devices that share their interrupt line are not supported at the moment.
By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM intends to use paravirt code to calibrate khz. Xen
current code will do just fine. So as a first step, factor out
code to pvclock.c.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This can be used by kvm subsystems that are interested in when
interrupts are acked, for example time drift compensation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
As we execute real mode guests in VM86 mode, exception have to be
reinjected appropriately when the guest triggered them. For this purpose
the patch adopts the real-mode injection pattern used in vmx_inject_irq
to vmx_queue_exception, additionally taking care that the IP is set
correctly for #BP exceptions. Furthermore it extends
handle_rmode_exception to reinject all those exceptions that can be
raised in real mode.
This fixes the execution of himem.exe from FreeDOS and also makes its
debug.com work properly.
Note that guest debugging in real mode is broken now. This has to be
fixed by the scheduled debugging infrastructure rework (will be done
once base patches for QEMU have been accepted).
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Similar to the exception queue, this hold interrupts that have been
accepted by the virtual processor core but not yet injected.
Not yet used.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of processing nmi injection failure in the vm entry path, move
it to the vm exit path (vm_complete_interrupts()). This separates nmi
injection from nmi post-processing, and moves the nmi state from the VT
state into vcpu state (new variable nmi_injected specifying an injection
in progress).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move KVM trace definitions from x86 specific kvm headers to common kvm
headers to create a cross-architecture numbering scheme for trace
events. This means the kvmtrace_format userspace tool won't need to know
which architecture produced the log file being processed.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
As suggested by Avi, introduce accessors to read/write guest registers.
This simplifies the ->cache_regs/->decache_regs interface, and improves
register caching which is important for VMX, where the cost of
vmcs_read/vmcs_write is significant.
[avi: fix warnings]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* git://git.infradead.org/users/dwmw2/random-2.6:
Fix autoloading of MacBook Pro backlight driver.
Automatic MODULE_ALIAS() for DMI match tables.
Remove asm/a.out.h files for all architectures without a.out support.
Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT
Remove redundant CONFIG_ARCH_SUPPORTS_AOUT
S390: Update comments about why we don't use <asm-generic/statfs.h>
SPARC: Use <asm-generic/statfs.h>
PowerPC: Use <asm-generic/statfs.h>
PARISC: Use <asm-generic/statfs.h>
x86_64: Use <asm-generic/statfs.h>
IA64: Use <asm-generic/statfs.h>
ARM: Use <asm-generic/statfs.h>
Make <asm-generic/statfs.h> suitable for 64-bit platforms.
Define and use PCI_DEVICE_ID_MARVELL_88ALP01_CCIC for CAFÉ camera driver
[MTD] [NAND] Define and use PCI_DEVICE_ID_MARVELL_88ALP01_NAND for CAFÉ
Use PCI_DEVICE_ID_88ALP01 for CAFÉ chip, rather than PCI_DEVICE_ID_CAFE.
EFS: Don't set f_fsid in statfs().
Merges oprofile, timers/hpet, x86/traps, x86/time, and x86/core misc items.
* 'x86-core-v4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (132 commits)
x86: change early_ioremap to use slots instead of nesting
x86: adjust dependencies for CONFIG_X86_CMOV
dumpstack: x86: various small unification steps, fix
x86: remove additional_cpus
x86: remove additional_cpus configurability
x86: improve UP kernel when CPU-hotplug and SMP is enabled
dumpstack: x86: various small unification steps
dumpstack: i386: make kstack= an early boot-param and add oops=panic
dumpstack: x86: use log_lvl and unify trace formatting
dumptrace: x86: consistently include loglevel, print stack switch
dumpstack: x86: add "end" parameter to valid_stack_ptr and print_context_stack
dumpstack: x86: make printk_address equal
dumpstack: x86: move die_nmi to dumpstack_32.c
traps: x86: finalize unification of traps.c
traps: x86: make traps_32.c and traps_64.c equal
traps: x86: various noop-changes preparing for unification of traps_xx.c
traps: x86_64: use task_pid_nr(tsk) instead of tsk->pid in do_general_protection
traps: i386: expand clear_mem_error and remove from mach_traps.h
traps: x86_64: make io_check_error equal to the one on i386
traps: i386: use preempt_conditional_sti/cli in do_int3
...
We need a way to describe the various additional modes and flow control
features that random weird hardware shows up and software such as wine
wants to emulate as Windows supports them.
TCGETX/TCSETX and the termiox ioctl are a SYS5 extension that we might as
well adopt. This patches adds the structures and the basic ioctl interfaces
when the TCGETX etc defines are added for an architecture. Drivers wishing
to use this stuff need to add new methods.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
JP Tosoni observed:
"About a RS485 ioctl: could you consider the attached files which are
already in the Linux kernel (in include/asm-cris). They define a
TIOCSERSETRS485 (ioctl.h), and the data structure (rs485.h)
with allows to specify timings. Sounds just like what we want ?"
and he's right: sort of. Rework the structure to use flag bits and make the
time delay a fixed sized field so we don't get 32/64bit problems. Add the ioctls
to x86 so that people know what to add to their platform of choice.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
so we could remove the requirement that one needs to call
early_iounmap() in exactly reverse order of early_ioremap().
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is the last user of clear_mem_error, which is defined
only on i386. Expand the inline function and remove it from
include/asm-x86/mach-default/mach_traps.h
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- set_system_gate on i386 is really set_system_trap_gate
- set_system_gate on x86_64 is really set_system_intr_gate
- ist=0 means no special stack switch is done:
- introduce STACKFAULT_STACK, DOUBLEFAULT_STACK, NMI_STACK,
DEBUG_STACK and MCE_STACK as on x86_64.
- use the _ist variants with XXX_STACK set to zero
- remove set_system_gate
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
traps: x86: correct copy/paste bug: a trap is a GATE_TRAP
Fix copy/paste/forgot-to-edit bug in desc.h.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the x86_64-version and the i386-version of do_debug
more similar.
- introduce preempt_conditional_sti/cli to i386. The preempt-count
is now elevated during the trap handler, like on x86_64. It
does not run on a separate stack, however.
- replace an open-coded "send_sigtrap"
- copy some comments
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mark the exception handlers with "dotraplinkage" to hide the
calling convention differences between i386 and x86_64.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Split out math_error from do_coprocessor_error and simd_math_error
from do_simd_coprocessor_error, like on i386. While at it, add the
"error_code" parameter to do_coprocessor_error, do_simd_coprocessor_error
and do_spurious_interrupt_bug.
This does not change the generated code, but brings the declarations in
line with all the other trap handlers.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
virt_addr_valid() calls __pa(), which calls __phys_addr(). With
CONFIG_DEBUG_VIRTUAL=y, __phys_addr() will kill the kernel if the
address *isn't* valid. That's clearly wrong for virt_addr_valid().
We also incorporate the debugging checks into virt_addr_valid().
Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the low-level function to dump user-passed registers on i386 is
called __show_registers() whereas on x86-64 it's called __show_regs(). Unify
the API to simplify porting of kmemcheck to x86-64.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 4a701737 ("x86: move prefill_possible_map calling early, fix")
is the wrong fix: prefill_possible_map() needs to be available
even when CONFIG_HOTPLUG_CPU is not set. A followon patch will do that.
Fix this correctly by making prefill_possible_map() available even when
CONFIG_HOTPLUG_CPU is not set. The function is needed so that
the number of possible CPUs can be determined.
Tested on uniprocessor machine with CPU hotplug disabled.
From boot log:
Before: NR_CPUS: 512, nr_cpu_ids: 512, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Winchip-2 and Winchip-2A cpu choices select the
same options for kernel and compiler.
Merge them to save few bytes and reduce confusion.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The last use of trace_hardirqs_fixup is unnecessary, because the
trap is taken with interrupt off on i386 as well as x86_64, and
the irq-tracer is notified of this from the assembly code.
trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed
from include/asm-x86/irqflags.h as they are no longer used.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of using SEGMENT_IS_KERNEL_CODE, use the
"user_mode" macro, which can play the same role. Delete
the former, since it now lacks any user.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>