2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 20:23:57 +08:00
Commit Graph

21946 Commits

Author SHA1 Message Date
Xiantao Zhang
60a07bb9ba KVM: ia64: Add processor virtulization support
vcpu.c provides processor virtualization logic for kvm.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:09 +03:00
Xiantao Zhang
a793537a97 KVM: ia64: Add trampoline for guest/host mode switch
trampoline code targets for guest/host world switch.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:08 +03:00
Xiantao Zhang
e30af4ce7f KVM: ia64: Add mmio decoder for kvm/ia64
mmio.c includes mmio decoder, and related mmio logics.

Signed-off-by: Anthony Xu <Anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:07 +03:00
Xiantao Zhang
fbd4b5621c KVM: ia64: Add interruption vector table for vmm
vmm_ivt.S includes an ivt for vmm use.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:06 +03:00
Xiantao Zhang
964cd94a2a KVM: ia64: Add TLB virtulization support
vtlb.c includes tlb/VHPT virtulization.

Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:05 +03:00
Xiantao Zhang
bb46fb4af1 KVM: ia64: VMM module interfaces
vmm.c adds the interfaces with kvm/module, and initialize global data area.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:04 +03:00
Xiantao Zhang
a4f500381a KVM: ia64: Add header files for kvm/ia64
kvm_minstate.h : Marcos about Min save routines.
lapic.h: apic structure definition.
vcpu.h : routions related to vcpu virtualization.
vti.h  : Some macros or routines for VT support on Itanium.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:03 +03:00
Xiantao Zhang
b024b79322 KVM: ia64: Add kvm arch-specific core code for kvm/ia64
kvm_ia64.c is created to handle kvm ia64-specific core logic.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:01:03 +03:00
Heiko Carstens
f603f0731f KVM: s390: rename stfl to kvm_stfl
Temporarily rename this function to avoid merge conflicts and/or
dependencies. This function will be removed as soon as git-s390
and kvm.git are finally upstream.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:59 +03:00
Heiko Carstens
7e8e6ab48d KVM: s390: Fix incorrect return value
kvm_arch_vcpu_ioctl_run currently incorrectly always returns 0.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:58 +03:00
Marcelo Tosatti
bed1d1dfc4 KVM: MMU: prepopulate guest pages after write-protecting
Zdenek reported a bug where a looping "dmsetup status" eventually hangs
on SMP guests.

The problem is that kvm_mmu_get_page() prepopulates the shadow MMU
before write protecting the guest page tables. By doing so, it leaves a
window open where the guest can mark a pte as present while the host has
shadow cached such pte as "notrap". Accesses to such address will fault
in the guest without the host having a chance to fix the situation.

Fix by moving the write protection before the pte prefetch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:58 +03:00
Avi Kivity
fcd6dbac92 KVM: MMU: Only mark_page_accessed() if the page was accessed by the guest
If the accessed bit is not set, the guest has never accessed this page
(at least through this spte), so there's no need to mark the page
accessed.  This provides more accurate data for the eviction algortithm.

Noted by Andrea Arcangeli.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:57 +03:00
Avi Kivity
3d45830c2b KVM: Free apic access page on vm destruction
Noticed by Marcelo Tosatti.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:54 +03:00
Izik Eidus
3ee16c8145 KVM: MMU: allow the vm to shrink the kvm mmu shadow caches
Allow the Linux memory manager to reclaim memory in the kvm shadow cache.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:53 +03:00
Marcelo Tosatti
3200f405a1 KVM: MMU: unify slots_lock usage
Unify slots_lock acquision around vcpu_run(). This is simpler and less
error-prone.

Also fix some callsites that were not grabbing the lock properly.

[avi: drop slots_lock while in guest mode to avoid holding the lock
      for indefinite periods]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Sheng Yang
25c5f225be KVM: VMX: Enable MSR Bitmap feature
MSR Bitmap controls whether the accessing of an MSR causes VM Exit.
Eliminating exits on automatically saved and restored MSRs yields a
small performance gain.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:52 +03:00
Carsten Otte
fa5877439d s390: KVM guest: detect when running on kvm
This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the systems runs non-virtualized.
We also define a preferred console to avoid having the ttyS0, which is a line
mode only console.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:50 +03:00
Christian Borntraeger
77b455f1bc KVM: s390: add kvm to kconfig on s390
This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:47 +03:00
Christian Borntraeger
e28acfea5d KVM: s390: intercepts for diagnose instructions
This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
  for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself

In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the principles of operation book, except KVM_S390_RESET_IPL
which causes a reboot of the machine.

Acked-by: Martin Schwidefsky <martin.schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:46 +03:00
Christian Borntraeger
5288fbf0ef KVM: s390: interprocessor communication via sigp
This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi).
kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running state
  of the remote cpu
- sigp emergency sends an external interrupt to the remove cpu
- sigp stop stops a remove cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
  state to the cpus lowcore
- sigp set arch sets the architecture mode of the remote cpu. setting to
  ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
  denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu

For implementation of this, the stop intercept indication starts to get reused
on purpose: a set of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP  really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
                     recognized

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:46 +03:00
Christian Borntraeger
453423dce2 KVM: s390: intercepts for privileged instructions
This patch introduces in-kernel handling of some intercepts for privileged
instructions:

handle_set_prefix()        sets the prefix register of the local cpu
handle_store_prefix()      stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey()              just decrements the instruction address and retries
handle_stsch()             delivers condition code 3 "operation not supported"
handle_chsc()              same here
handle_stfl()              stores the facility list which contains the
                           capabilities of the cpu
handle_stidp()             stores cpu type/model/revision and such
handle_stsi()              stores information about the system topology

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:45 +03:00
Carsten Otte
ba5c1e9b6c KVM: s390: interrupt subsystem, cpu timer, waitpsw
This patch contains the s390 interrupt subsystem (similar to in kernel apic)
including timer interrupts (similar to in-kernel-pit) and enabled wait
(similar to in kernel hlt).

In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.

This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. In case this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.

The following interrupts are supported:
SIGP STOP       - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
                  (stopped) remote cpu
INT EMERGENCY   - interprocessor interrupt, usually used to signal need_reshed
                  and for smp_call_function() in the guest.
PROGRAM INT     - exception during program execution such as page fault, illegal
                  instruction and friends
RESTART         - interprocessor signal that starts a stopped cpu
INT VIRTIO      - floating interrupt for virtio signalisation
INT SERVICE     - floating interrupt for signalisations from the system
                  service processor

struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
an interrupt, also carrys parameter data for interrupts along with the interrupt
type. Interrupts on s390 usually have a state that represents the current
operation, or identifies which device has caused the interruption on s390.

kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that equals the cpu clock comparator value
and sleep on a wait queue.

[christian: change virtio interrupt to 0x2603]

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:44 +03:00
Christian Borntraeger
8f2abe6a1e KVM: s390: sie intercept handling
This path introduces handling of sie intercepts in three flavors: Intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(),
or passed to userspace with corresponding data in struct kvm_run in case
kvm_handle_sie_intercept() returns -ENOTSUPP.
In case of partial execution in kernel with the need of userspace support,
kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
-EREMOTE.

The trivial intercept reasons are handled in this patch:
handle_noop() just does nothing for intercepts that don't require our support
  at all
handle_stop() is called when a cpu enters stopped state, and it drops out to
  userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
  to userland

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:43 +03:00
Heiko Carstens
b0c632db63 KVM: s390: arch backend for the kvm kernel module
This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
 (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.

The following source files are introduced by this patch
arch/s390/kvm/kvm-s390.c    similar to arch/x86/kvm/x86.c, this implements all
                            arch callbacks for kvm. __vcpu_run calls back into
                            sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S      assembler function sie64a, which enters guest
                            context via SIE, and switches world before and after                            that
include/asm-s390/kvm_host.h contains all vital data structures needed to run
                            virtual machines on the mainframe
include/asm-s390/kvm.h      defines kvm_regs and friends for user access to
                            guest register content
arch/s390/kvm/gaccess.h     functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h    header file for kvm-s390 internals, extended by
                            later patches

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:42 +03:00
Carsten Otte
402b08622d s390: KVM preparation: provide hook to enable pgstes in user pagetable
The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed
and the process has now a page status extended field after every page table.

Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
review feedback. Now we do have the prototype for dup_mm in
include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
call task_lock() to prevent race against ptrace modification of mm_users.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:40 +03:00
Izik Eidus
37817f2982 KVM: x86: hardware task switching support
This emulates the x86 hardware task switch mechanism in software, as it is
unsupported by either vmx or svm.  It allows operating systems which use it,
like freedos, to run as kvm guests.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:39 +03:00
Izik Eidus
2e4d265349 KVM: x86: add functions to get the cpl of vcpu
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:38 +03:00
Avi Kivity
4c9fc8ef50 KVM: VMX: Add module option to disable flexpriority
Useful for debugging.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:37 +03:00
Avi Kivity
268fe02ae0 KVM: no longer EXPERIMENTAL
Long overdue.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:36 +03:00
Avi Kivity
0b49ea8659 KVM: MMU: Introduce and use spte_to_page()
Encapsulate the pte mask'n'shift in a function.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:35 +03:00
Izik Eidus
855149aaa9 KVM: MMU: fix dirty bit setting when removing write permissions
When mmu_set_spte() checks if a page related to spte should be release as
dirty or clean, it check if the shadow pte was writeble, but in case
rmap_write_protect() is called called it is possible for shadow ptes that were
writeble to become readonly and therefor mmu_set_spte will release the pages
as clean.

This patch fix this issue by marking the page as dirty inside
rmap_write_protect().

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:34 +03:00
Avi Kivity
947da53830 KVM: MMU: Set the accessed bit on non-speculative shadow ptes
If we populate a shadow pte due to a fault (and not speculatively due to a
pte write) then we can set the accessed bit on it, as we know it will be
set immediately on the next guest instruction.  This saves a read-modify-write
operation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:33 +03:00
Glauber Costa
1e977aa12d x86: KVM guest: disable clock before rebooting.
This patch writes 0 (actually, what really matters is that the
LSB is cleared) to the system time msr before shutting down
the machine for kexec.

Without it, we can have a random memory location being written
when the guest comes back

It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c)
and crash_shutdown, used in the path of crash_kexec() (kexec.c)

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:31 +03:00
Glauber Costa
3c62c62502 x86: make native_machine_shutdown non-static
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:30 +03:00
Glauber Costa
ed23dc6f5b x86: allow machine_crash_shutdown to be replaced
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:29 +03:00
Marcelo Tosatti
096d14a3b5 x86: KVM guest: hypercall batching
Batch pte updates and tlb flushes in lazy MMU mode.

[avi:
 - adjust to mmu_op
 - helper for getting para_state without debug warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:28 +03:00
Marcelo Tosatti
1da8a77bdc x86: KVM guest: hypercall based pte updates and TLB flushes
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - guest/host split
 - fix 32-bit truncation issues
 - adjust to mmu_op
 - adjust to ->release_*() renamed
 - add ->release_pud()]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:28 +03:00
Marcelo Tosatti
2f333bcb4e KVM: MMU: hypercall based pte updates and TLB flushes
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.

Don't report the feature if two dimensional paging is enabled.

[avi:
 - one mmu_op hypercall instead of one per op
 - allow 64-bit gpa on hypercall
 - don't pass host errors (-ENOMEM) to guest]

[akpm: warning fix on i386]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:27 +03:00
Avi Kivity
9f81128591 KVM: Provide unlocked version of emulator_write_phys()
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:26 +03:00
Marcelo Tosatti
0cf1bfd273 x86: KVM guest: add basic paravirt support
Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:25 +03:00
Marcelo Tosatti
a28e4f5a62 KVM: add basic paravirt support
Add basic KVM paravirt support. Avoid vm-exits on IO delays.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:24 +03:00
Sheng Yang
308b0f239e KVM: Add reset support for in kernel PIT
Separate the reset part and prepare for reset support.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:23 +03:00
Sheng Yang
e0f63cb927 KVM: Add save/restore supporting of in kernel PIT
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:22 +03:00
Sheng Yang
7837699fa6 KVM: In kernel PIT model
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.

[marcelo: make last_injected_time per-guest]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 12:00:21 +03:00
Avi Kivity
4fcaa98267 KVM: Remove pointless desc_ptr #ifdef
The desc_struct changes left an unnecessary #ifdef; remove it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Avi Kivity
019960ae99 KVM: VMX: Don't adjust tsc offset forward
Most Intel hosts have a stable tsc, and playing with the offset only
reduces accuracy.  By limiting tsc offset adjustment only to forward updates,
we effectively disable tsc offset adjustment on these hosts.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Harvey Harrison
b8688d51bb KVM: replace remaining __FUNCTION__ occurances
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Joerg Roedel
71c4dfafc0 KVM: detect if VCPU triple faults
In the current inject_page_fault path KVM only checks if there is another PF
pending and injects a DF then. But it has to check for a pending DF too to
detect a shutdown condition in the VCPU.  If this is not detected the VCPU goes
to a PF -> DF -> PF loop when it should triple fault. This patch detects this
condition and handles it with an KVM_SHUTDOWN exit to userspace.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:27 +03:00
Avi Kivity
2d3ad1f40c KVM: Prefix control register accessors with kvm_ to avoid namespace pollution
Names like 'set_cr3()' look dangerously close to affecting the host.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:26 +03:00
Marcelo Tosatti
05da45583d KVM: MMU: large page support
Create large pages mappings if the guest PTE's are marked as such and
the underlying memory is hugetlbfs backed.  If the largepage contains
write-protected pages, a large pte is not used.

Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.

Anthony measures a 4% improvement on 4-way kernbench, with NPT.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Marcelo Tosatti
2e53d63acb KVM: MMU: ignore zapped root pagetables
Mark zapped root pagetables as invalid and ignore such pages during lookup.

This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Alexander Graf
847f0ad8cb KVM: Implement dummy values for MSR_PERF_STATUS
Darwin relies on this and ceases to work without.

Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:25 +03:00
Harvey Harrison
14af3f3c56 KVM: sparse fixes for kvm/x86.c
In two case statements, use the ever popular 'i' instead of index:
arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here
arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one
arch/x86/kvm/x86.c:1000:9: originally declared here

Make it static.
arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static?

Drop the return statements.
arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression
arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Harvey Harrison
4866d5e3d5 KVM: SVM: make iopm_base static
Fixes sparse warning as well.
arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static?

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Harvey Harrison
77cd337f22 KVM: x86 emulator: fix sparse warnings in x86_emulate.c
Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed
variable warnings on the internal variable _tmp used by both macros.

Change the outer macro to use __tmp.

Avoids a sparse warning like the following at every call site of __emulate_2op
arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one
arch/x86/kvm/x86_emulate.c:1091:3: originally declared here
[18 more warnings suppressed]

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Amit Shah
f11c3a8d84 KVM: Add stat counter for hypercalls
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Avi Kivity
a5f61300c4 KVM: Use x86's segment descriptor struct instead of private definition
The x86 desc_struct unification allows us to remove segment_descriptor.h.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:24 +03:00
Avi Kivity
a988b910ef KVM: Add API for determining the number of supported memory slots
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:23 +03:00
Avi Kivity
f725230af9 KVM: Add API to retrieve the number of supported vcpus per vm
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:23 +03:00
Harvey Harrison
7a95727567 KVM: x86 emulator: make register_address_increment and JMP_REL static inlines
Change jmp_rel() to a function as well.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:23 +03:00
Harvey Harrison
e4706772ea KVM: x86 emulator: make register_address, address_mask static inlines
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Harvey Harrison
ddcb2885e2 KVM: x86 emulator: add ad_mask static inline
Replaces open-coded mask calculation in macros.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Glauber de Oliveira Costa
790c73f628 x86: KVM guest: paravirtualized clocksource
This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

[randy dunlap: add missing include]
[randy dunlap: disallow on Voyager or Visual WS]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Glauber de Oliveira Costa
18068523d3 KVM: paravirtualized clocksource: host part
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.

The area is binary compatible with xen, as we use the same shadow_info
structure.

[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:22 +03:00
Joerg Roedel
24e09cbf48 KVM: SVM: enable LBR virtualization
This patch implements the Last Branch Record Virtualization (LBRV) feature of
the AMD Barcelona and Phenom processors into the kvm-amd module. It will only
be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So
there is no increased world switch overhead when the guest doesn't use these
MSRs.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel
f65c229c3e KVM: SVM: allocate the MSR permission map per VCPU
This patch changes the kvm-amd module to allocate the SVM MSR permission map
per VCPU instead of a global map for all VCPUs. With this we have more
flexibility allowing specific guests to access virtualized MSRs. This is
required for LBR virtualization.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel
e6101a96c9 KVM: SVM: let init_vmcb() take struct vcpu_svm as parameter
Change the parameter of the init_vmcb() function in the kvm-amd module from
struct vmcb to struct vcpu_svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Ryan Harper
2e11384c2c KVM: VMX: fix typo in VMX header define
Looking at Intel Volume 3b, page 148, table 20-11 and noticed
that the field name is 'Deliver' not 'Deliever'.  Attached patch changes
the define name and its user in vmx.c

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel
709ddebf81 KVM: SVM: add support for Nested Paging
This patch contains the SVM architecture dependent changes for KVM to enable
support for the Nested Paging feature of AMD Barcelona and Phenom processors.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:21 +03:00
Joerg Roedel
fb72d1674d KVM: MMU: add TDP support to the KVM MMU
This patch contains the changes to the KVM MMU necessary for support of the
Nested Paging feature in AMD Barcelona and Phenom Processors.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel
cc4b6871e7 KVM: export the load_pdptrs() function to modules
The load_pdptrs() function is required in the SVM module for NPT support.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel
4d9976bbdc KVM: MMU: make the __nonpaging_map function generic
The mapping function for the nonpaging case in the softmmu does basically the
same as required for Nested Paging. Make this function generic so it can be
used for both.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:20 +03:00
Joerg Roedel
1855267210 KVM: export information about NPT to generic x86 code
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel
6c7dac72d5 KVM: SVM: add module parameter to disable Nested Paging
To disable the use of the Nested Paging feature even if it is available in
hardware this patch adds a module parameter. Nested Paging can be disabled by
passing npt=0 to the kvm_amd module.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel
e3da3acdb3 KVM: SVM: add detection of Nested Paging feature
Let SVM detect if the Nested Paging feature is available on the hardware.
Disable it to keep this patch series bisectable.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel
33bd6a0b3e KVM: SVM: move feature detection to hardware setup code
By moving the SVM feature detection from the each_cpu code to the hardware
setup code it runs only once. As an additional advance the feature check is now
available earlier in the module setup process.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel
9457a712a2 KVM: allow access to EFER in 32bit KVM
This patch makes the EFER register accessible on a 32bit KVM host. This is
necessary to boot 32 bit PAE guests under SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:19 +03:00
Joerg Roedel
9f62e19a11 KVM: VMX: unifdef the EFER specific code
To allow access to the EFER register in 32bit KVM the EFER specific code has to
be exported to the x86 generic code. This patch does this in a backwards
compatible manner.

[avi: add check for EFER-less hosts]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Joerg Roedel
50a37eb4e0 KVM: align valid EFER bits with the features of the host system
This patch aligns the bits the guest can set in the EFER register with the
features in the host processor. Currently it lets EFER.NX disabled if the
processor does not support it and enables EFER.LME and EFER.LMA only for KVM on
64 bit hosts.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Joerg Roedel
f2b4b7ddf6 KVM: make EFER_RESERVED_BITS configurable for architecture code
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:18 +03:00
Sheng Yang
2384d2b326 KVM: VMX: Enable Virtual Processor Identification (VPID)
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.

[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity
d196e34336 KVM: MMU: Decouple mmio from shadow page tables
Currently an mmio guest pte is encoded in the shadow pagetable as a
not-present trapping pte, with the SHADOW_IO_MARK bit set.  However
nothing is ever done with this information, so maintaining it is a
useless complication.

This patch moves the check for mmio to before shadow ptes are instantiated,
so the shadow code is never invoked for ptes that reference mmio.  The code
is simpler, and with future work, can be made to handle mmio concurrently.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:17 +03:00
Avi Kivity
1d6ad2073e KVM: x86 emulator: group decoding for group 1 instructions
Opcodes 0x80-0x83

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:16 +03:00
Avi Kivity
d95058a1a7 KVM: x86 emulator: add group 7 decoding
This adds group decoding for opcode 0x0f 0x01 (group 7).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity
fd60754e4f KVM: x86 emulator: Group decoding for groups 4 and 5
Add group decoding support for opcode 0xfe (group 4) and 0xff (group 5).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:15 +03:00
Avi Kivity
7d858a19ef KVM: x86 emulator: Group decoding for group 3
This adds group decoding support for opcodes 0xf6, 0xf7 (group 3).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Avi Kivity
43bb19cd33 KVM: x86 emulator: group decoding for group 1A
This adds group decode support for opcode 0x8f.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Avi Kivity
e09d082c03 KVM: x86 emulator: add support for group decoding
Certain x86 instructions use bits 3:5 of the byte following the opcode as an
opcode extension, with the decode sometimes depending on bits 6:7 as well.
Add support for this in the main decoding table rather than an ad-hock
adaptation per opcode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Dong, Eddie
1ae0a13def KVM: MMU: Simplify hash table indexing
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:14 +03:00
Dong, Eddie
489f1d6526 KVM: MMU: Update shadow ptes on partial guest pte writes
A guest partial guest pte write will leave shadow_trap_nonpresent_pte
in spte, which generates a vmexit at the next guest access through that pte.

This patch improves this by reading the full guest pte in advance and thus
being able to update the spte and eliminate the vmexit.

This helps pae guests which use two 32-bit writes to set a single 64-bit pte.

[truncation fix by Eric]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 11:53:13 +03:00
David S. Miller
7cf069955f sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c
The structure has to be 8-byte aligned in size, so
this macro is just noise.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27 00:25:30 -07:00
David S. Miller
5da496e4b9 sparc64: Kill unused local ISA bus layer.
No more drivers use this, and therefore it can die.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:23 -07:00
David S. Miller
dc8ca2a111 sparc64: Do not ignore 'pmu' device ranges.
I must have disabled this due to other bugs which were fixed over
time.  And this is needed in order for child devices of "pmu"
to get proper resource values.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:20 -07:00
David S. Miller
09337f501e sparc64: Kill CONFIG_SPARC32_COMPAT
It's completely superfluous, CONFIG_COMPAT is sufficient.

What this used to be is an umbrella for enabling code shared
by all 32-bit compat binary support types.  But with the
removal of SunOS and Solaris support, the only one left is
Linux 32-bit ELF.

Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:19 -07:00
David S. Miller
05d515ef3d sparc64: Cleanups and corrections for arch/sparc64/Kconfig
Refer to chip as "SPARC" throughout.

Say 32-bit SPARC and 64-bit SPARC rather than mentioning specific
chips such like UltraSPARC, as appropriate.

Remove non-sense help text referring to things that will never appear
on a SPARC system, such as EISA busses etc.

Use "help" instead of "--help--"

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:17 -07:00
David S. Miller
227c331178 sparc64: Fix wedged irq regression.
Kernel bugzilla 10273

As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.

What this changset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:

	if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
		return;

which is correct, however it means that special care is needed
in the ->enable() method.

Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.

Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.

The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-26 21:41:15 -07:00
Peter Zijlstra
7f424a8b08 fix idle (arch, acpi and apm) and lockdep
OK, so 25-mm1 gave a lockdep error which made me look into this.

The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561

The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.

So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:

  idle routines are entered with IRQs disabled
  idle routines will exit with IRQs enabled

Nearly all already did this in one form or another.

Merge the 32 and 64 bit bits so they no longer have different bugs.

As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-27 00:01:45 +02:00
Linus Torvalds
c3bf9bc243 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
  x86_64/mm: check and print vmemmap allocation continuous
  x86_64: fix setup_node_bootmem to support big mem excluding with memmap
  x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
  mm: allow reserve_bootmem() cross nodes
  mm: offset align in alloc_bootmem()
  mm: fix alloc_bootmem_core to use fast searching for all nodes
  mm: make mem_map allocation continuous
2008-04-26 14:04:32 -07:00
Yinghai Lu
c2b91e2eec x86_64/mm: check and print vmemmap allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.

on 256G 8 sockets system will get
 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
 [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
 [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
 [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
 [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
 [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
 [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
 [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
 [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
 [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
 [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
 [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
 [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
 [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
 [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7

instead of a very long print out...

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 22:51:09 +02:00
Yinghai Lu
1a27fc0a42 x86_64: fix setup_node_bootmem to support big mem excluding with memmap
typical case: four sockets system, every node has 4g ram, and we are using:

	memmap=10g$4g

to mask out memory on node1 and node2

when numa is enabled, early_node_mem is used to get node_data and node_bootmap.

if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.

so check it and print out some info about it.

need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.

depends on "mm: make reserve_bootmem can crossed the nodes".

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Yinghai Lu
8b3cd09ed2 x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
"mm: make reserve_bootmem can crossed the nodes" provides new
reserve_bootmem(), let reserve_bootmem_generic() use that.

reserve_bootmem_generic() is used to reserve initramdisk, so this way
we can make sure even when bootloader or kexec load ranges cross the
node memory boundaries, reserve_bootmem still works.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 22:51:08 +02:00
Linus Torvalds
9b79ed952b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
  x86, bitops: select the generic bitmap search functions
  x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
  x86: finalize bitops unification
  x86, UML: remove x86-specific implementations of find_first_bit
  x86: optimize find_first_bit for small bitmaps
  x86: switch 64-bit to generic find_first_bit
  x86: generic versions of find_first_(zero_)bit, convert i386
  bitops: use __fls for fls64 on 64-bit archs
  generic: implement __fls on all 64-bit archs
  generic: introduce a generic __fls implementation
  x86: merge the simple bitops and move them to bitops.h
  x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
  x86, uml: fix uml with generic find_next_bit for x86
  x86: change x86 to use generic find_next_bit
  uml: Kconfig cleanup
  uml: fix build error
2008-04-26 13:46:11 -07:00
Linus Torvalds
539a5fe226 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootparam:
  x86, boot: Document for linked list of struct setup_data
  x86, boot: export linked list of struct setup_data via debugfs
  x86, boot: add linked list of struct setup_data
  x86, boot: add free_early to early reservation machanism
2008-04-26 13:29:41 -07:00
Huang, Ying
c14b2adf19 x86, boot: export linked list of struct setup_data via debugfs
Export linked list of struct setup_data via debugfs.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 21:34:42 +02:00
Huang, Ying
8b664aa66e x86, boot: add linked list of struct setup_data
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 21:34:42 +02:00
Huang, Ying
50eae2a7c9 x86, boot: add free_early to early reservation machanism
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 21:34:42 +02:00
Venki Pallipadi
0124cecfc8 x86, PAT: disable /dev/mem mmap RAM with PAT
disable /dev/mem mmap of RAM with PAT. It makes things safer and
eliminates aliasing. A future improvement would be to avoid the
range_is_allowed duplication.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 21:28:29 +02:00
Alexander van Heukelum
19870def58 x86, bitops: select the generic bitmap search functions
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.

I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:17 +02:00
Alexander van Heukelum
5245698f66 x86, UML: remove x86-specific implementations of find_first_bit
x86 has been switched to the generic versions of find_first_bit
and find_first_zero_bit, but the original versions were retained.
This patch just removes the now unused x86-specific versions.

also update UML.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:17 +02:00
Alexander van Heukelum
2aba6925fd x86: switch 64-bit to generic find_first_bit
Switch x86_64 to generic find_first_bit. The x86_64-specific
implementation is not removed.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum
77b9bd9c49 x86: generic versions of find_first_(zero_)bit, convert i386
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.

The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.

This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum
12d9c8420b x86: merge the simple bitops and move them to bitops.h
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.

For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.

Improved comments for ffs, ffz, fls and variations.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum
60b6783a04 x86, uml: fix uml with generic find_next_bit for x86
Update UML to use the generic bits too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Alexander van Heukelum
6fd92b63d0 x86: change x86 to use generic find_next_bit
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.

Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.

		Athlon		Xeon		Opteron 32/64bit
x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s

If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.

Using the generic version does give a slightly bigger kernel, though.

defconfig:	   text    data     bss     dec     hex filename
x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Ingo Molnar
18e413f719 uml: Kconfig cleanup
pointed out by Linus: arch/um/Kconfig.x86_64 should
include arch/x86/Kconfig.cpu instead of defining those
symbols itself.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Ingo Molnar
297e1b256b uml: fix build error
fix:

 arch/um/os-Linux/helper.c: In function 'run_helper':
 arch/um/os-Linux/helper.c:73: error: 'PATH_MAX' undeclared (first use in this function)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 19:21:16 +02:00
Linus Torvalds
4a27214d7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86 PAT: decouple from nonpromisc devmem
  x86 PAT: tone down debugging messages
2008-04-26 09:50:58 -07:00
Linus Torvalds
d485cb9aa2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-optimized-inlining
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-optimized-inlining:
  generic: make optimized inlining arch-opt-in
  x86: add optimized inlining
2008-04-26 09:48:52 -07:00
Linus Torvalds
b82287587e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-misc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-misc: (28 commits)
  x86: section mismatch fixes, #3
  x86: section mismatch fixes, #2
  x86: pgtable_32.h - prototype and section mismatch fixes
  x86: unlock_ExtINT_logic() - fix section mismatch warnings
  x86: uniq_ioapic_id - fix section mismatch warning
  x86: trampoline_32.S - switch to .cpuinit.data
  x86: use get_bios_ebda()
  x86: remove duplicate get_bios_ebda() from rio.h
  x86: get_bios_ebda() requires asm/io.h
  x86: use cpumask function for present, possible, and online cpus
  x86: cleanup div_sc() usage
  x86: cleanup clocksource_hz2mult usage
  x86: remove unnecessary memset and NULL check after alloc_bootmem()
  x86: use bitmap library for pin_programmed
  x86: use MP_intsrc_info()
  x86: use BUILD_BUG_ON() for the size of struct intel_mp_floating
  x86_64 ia32 ptrace: convert to compat_arch_ptrace
  x86_64 ia32 ptrace: use compat_ptrace_request for siginfo
  x86 signals: lift set_fs
  x86 signals: lift flags diddling code
  ...
2008-04-26 09:44:32 -07:00
Ingo Molnar
2a8a2719be x86 PAT: decouple from nonpromisc devmem
Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-26 09:31:17 -07:00
Ingo Molnar
765c68bd54 generic: make optimized inlining arch-opt-in
Stephen Rothwell reported that linux-next did not build on powerpc64.

make optimized inlining dependent on architecture opt-in.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:44:55 +02:00
Ingo Molnar
60a3cdd063 x86: add optimized inlining
add CONFIG_OPTIMIZE_INLINING=y.

allow gcc to optimize the kernel image's size by uninlining
functions that have been marked 'inline'. Previously gcc was
forced by Linux to always-inline these functions via a gcc
attribute:

 #define inline	inline __attribute__((always_inline))

Especially when the user has already selected
CONFIG_OPTIMIZE_FOR_SIZE=y this can make a huge difference in
kernel image size (using a standard Fedora .config):

   text    data     bss     dec           hex filename
   5613924  562708 3854336 10030968    990f78 vmlinux.before
   5486689  562708 3854336  9903733    971e75 vmlinux.after

that's a 2.3% text size reduction (!).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:44:55 +02:00
Jacek Luczak
5afca33a43 x86: section mismatch fixes, #3
This patch fixes section mismatch warnings in unlock_ExtINT_logic().

WARNING: arch/x86/kernel/built-in.o(.text+0x14a92): Section mismatch in reference from the function unlock_ExtINT_logic()
to the function .init.text:find_isa_irq_pin()
The function unlock_ExtINT_logic() references
the function __init find_isa_irq_pin().
This is often because unlock_ExtINT_logic lacks a __init
annotation or the annotation of find_isa_irq_pin is wrong.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:48 +02:00
Jacek Luczak
28acf285de x86: unlock_ExtINT_logic() - fix section mismatch warnings
Fix following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x12cc9): Section mismatch in reference from the function unlock_ExtINT_logic()

unlock_ExtINT_logic() is only used by __init check_timer(). Annotate unlock_ExtINT_logic() witch __init.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:47 +02:00
Jacek Luczak
991074fd35 x86: uniq_ioapic_id - fix section mismatch warning
Fix folowing warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x10799): Section mismatch in reference from the function uniq_ioapic_id()

uniq_ioapic_id() is only used by __init mp_register_ioapic(). Annotate uniq_ioapic_id() with __init.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:47 +02:00
Jacek Luczak
4c01f23bdb x86: trampoline_32.S - switch to .cpuinit.data
This patch fixes section mismatch warnings of __cpuinit
setup_trampoline() on 32-bit host.

Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-26 17:35:47 +02:00
Akinobu Mita
356fa0c6e1 x86: use get_bios_ebda()
Use get_bios_ebda().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
ae5830a6f8 x86: remove duplicate get_bios_ebda() from rio.h
get_bios_ebda() exists in asm/rio.h and asm/bios_ebda.h.
This patch removes the one in asm/rio.h.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
7c04e64a1b x86: use cpumask function for present, possible, and online cpus
cpu_online(), cpu_present(), for_each_possible_cpu(), num_possible_cpus()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
877084fb1c x86: cleanup div_sc() usage
Remove the magic number in the third argment of div_sc().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
d454157b11 x86: cleanup clocksource_hz2mult usage
Remove the magic number in the second argument of clocksource_hz2mult()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
b1fceac2b9 x86: remove unnecessary memset and NULL check after alloc_bootmem()
memset and NULL check after alloc_bootmem() are unnecessary.
Because it returns zeroed memory and it never return NULL.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
a1a33fa315 x86: use bitmap library for pin_programmed
Use bitmap library for pin_programmed rather than reinvent
bitmaps.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
4abc1a0068 x86: use MP_intsrc_info()
Remove duplicate code by using MP_intsrc_info() in mpparse.c

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Akinobu Mita
5d47a271f3 x86: use BUILD_BUG_ON() for the size of struct intel_mp_floating
Use BUILD_BUG_ON() instead of compile-time error technique with
extern non-exsistent function.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath
562b80baff x86_64 ia32 ptrace: convert to compat_arch_ptrace
Now that there are no more special cases in sys32_ptrace, we
can convert to using the generic compat_sys_ptrace entry point.
The sys32_ptrace function gets simpler and becomes compat_arch_ptrace.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath
cdb6990479 x86_64 ia32 ptrace: use compat_ptrace_request for siginfo
This removes the special-case handling for PTRACE_GETSIGINFO
and PTRACE_SETSIGINFO from x86_64's sys32_ptrace.  The generic
compat_ptrace_request code handles these.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath
55928e37b2 x86 signals: lift set_fs
This lifts the set_fs(USER_DS) call for signal handler setup out of the
three places copying the same code into the one place that calls them
all.  There is no change in what it does.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Roland McGrath
8b9c5ff380 x86 signals: lift flags diddling code
This lifts the code diddling the TF and DF bits for signal handler setup
out of the several places copying the same code into the one place that
calls them all.  There is no change in what it does.

I also separated the recently-added DF bit clearing from the TF diddling.
The compiler turns them back into one instruction anyway.  The tossing
in of DF to the same line of code with no new comments was a bit more
arcane than seems wise.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Dmitri Vorobiev
f7f17a67c5 x86: remove NexGen support
It is claimed that NexGen CPUs were never shipped:

   http://lkml.org/lkml/2008/4/20/179

Also, the kernel support for these chips has been broken for
a long time, the code intended to support NexGen thereby being
essentially dead.

As an outcome of the discussion that can be found using the URL
above, this patch removes the NexGen support altogether.

The changes in this patch survived a defconfig build for i386, a
couple of successful randconfig builds, as well as a runtime test,
which consisted in booting a 32-bit x86 box up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:47 +02:00
Dmitri Vorobiev
a2b4bd9c95 x86: array can become static
In arch/x86/kernel/setup_64.c, the standard_io_resources array
is needlessly defined as global. This patch makes this variable
static.

This patch was successfully build-tested using the defconfig
for x86_64. Runtime test was performed by booting a 64-bit x86
box up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Dmitri Vorobiev
f3b14a32db x86: remove unused function amd_init_cpu()
There are no users for the function amd_init_cpu() defined in
arch/x86/kernel/cpu/amd.c. This patch removes this routine.

This patch was build-tested using defconfigs for i386 and x86_64,
and a few randconfig instances. Runtime tests were performed by
booting 32- and 64-bit x86 boxen up to the shell prompt.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich
911f6a7ba2 x86-64: extend MCE CPU quirk handling
At least on my Barcelona, I see MCE log entries after cold boot caused
by BIOS not properly clearing the respective registers. Therefore, this
patch extends the workaround to families 0x10 and 0x11 (the latter just
for completeness, I have nothing to verify this against).
At the same time, provide a way to make these entries visible via the
'mce=bootlog' command line option even on these machines.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich
79bf0e0353 i386: fix signal type for iret exception
.. since it uses ILL_BADSTK (which is meaningless in the context of
SIGSEGV).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich
86d78f6402 x86: fix watchdog ops for CoreDuo
There apparently was an unnoticed conflict between an earlier patch to
this file and mine (d1e084746b), which
I noticed only now. I suppose a change like the one below (untested) is
needed; I didn't get any response on a confirmation request for this from
the submitter of the first patch.

The issue is the writing of the 'checkbit' member at the end of
setup_intel_arch_watchdog(), which my patch made go to intel_arch_wd_ops
rather than wd_ops.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Jan Beulich
5065dbafc2 i386: fix asm constraint in do_IRQ()
Two prior changes resulted in the "ecx" clobber being lost.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 17:35:46 +02:00
Ingo Molnar
8db979bcfe x86 PAT: decouple from nonpromisc devmem
Linus pointed it out that PAT should not depend on NONPROMISC_DEVMEM.

Also make PAT non-default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 16:01:26 +02:00
Ingo Molnar
1ebcc654f0 x86 PAT: tone down debugging messages
Linus reported these excessive debug printouts:

>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0380000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000
>       Overlap at 0xe0300000-0xe0400000

turn that into a pr_debug().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-26 16:01:26 +02:00
Linus Torvalds
b9fa38f75e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (49 commits)
  [POWERPC] Add zImage.iseries to arch/powerpc/boot/.gitignore
  [POWERPC] bootwrapper: fix build error on virtex405-head.S
  [POWERPC] 4xx: Fix 460GT support to not enable FPU
  [POWERPC] 4xx: Add NOR FLASH entries to Canyonlands and Glacier dts
  [POWERPC] Xilinx: of_serial support for Xilinx uart 16550.
  [POWERPC] Xilinx: boot support for Xilinx uart 16550.
  [POWERPC] celleb: Add support for PCI Express
  [POWERPC] celleb: Move miscellaneous files for Beat
  [POWERPC] celleb: Move a file for SPU on Beat
  [POWERPC] celleb: Move files for Beat mmu and iommu
  [POWERPC] celleb: Move files for Beat hvcall interfaces
  [POWERPC] celleb: Move the SCC related code for celleb
  [POWERPC] celleb: Move the files for celleb base support
  [POWERPC] celleb: Consolidate io-workarounds code
  [POWERPC] cell: Generalize io-workarounds code
  [POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries
  [POWERPC] Convert from DBG() to pr_debug() in platforms/pseries/
  [POWERPC] Register udbg console early on pseries LPAR
  [POWERPC] Mark udbg console as CON_ANYTIME, ie. callable early in boot
  [POWERPC] Set udbg_console index to 0
  ...
2008-04-25 12:52:16 -07:00
Linus Torvalds
bf16ae2509 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-pat:
  generic: add ioremap_wc() interface wrapper
  /dev/mem: make promisc the default
  pat: cleanups
  x86: PAT use reserve free memtype in mmap of /dev/mem
  x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
  x86: PAT avoid aliasing in /dev/mem read/write
  devmem: add range_is_allowed() check to mmap of /dev/mem
  x86: introduce /dev/mem restrictions with a config option
2008-04-25 12:48:08 -07:00
Linus Torvalds
4b7227ca32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-xen-next: (52 commits)
  xen: add balloon driver
  xen: allow compilation with non-flat memory
  xen: fold xen_sysexit into xen_iret
  xen: allow set_pte_at on init_mm to be lockless
  xen: disable preemption during tlb flush
  xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
  xen: Add compatibility aliases for frontend drivers
  xen: Module autoprobing support for frontend drivers
  xen blkfront: Delay wait for block devices until after the disk is added
  xen/blkfront: use bdget_disk
  xen: Make xen-blkfront write its protocol ABI to xenstore
  xen: import arch generic part of xencomm
  xen: make grant table arch portable
  xen: replace callers of alloc_vm_area()/free_vm_area() with xen_ prefixed one
  xen: make include/xen/page.h portable moving those definitions under asm dir
  xen: add resend_irq_on_evtchn() definition into events.c
  Xen: make events.c portable for ia64/xen support
  xen: move events.c to drivers/xen for IA64/Xen support
  xen: move features.c from arch/x86/xen/features.c to drivers/xen
  xen: add missing definitions in include/xen/interface/vcpu.h which ia64/xen needs
  ...
2008-04-25 12:32:10 -07:00
Linus Torvalds
5dae61b805 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Wrap SMP IPIs with irq_enter()/irq_exit().
  [SPARC64]: Fix args to 64-bit sys_semctl() via sys_ipc().
2008-04-25 12:29:55 -07:00
Matthew Wilcox
2cfed60cc2 Update .gitignore files
Add some autogenerated files to various .gitignore files

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-25 12:27:32 -07:00
Ingo Molnar
00c6b2d5d7 x86: harden kernel code patching
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Mathieu Desnoyers
b7b66baa8b x86: clean up text_poke()
Clean up the codepath, remove alignment restrictions and do sanity
checking of the end result, to make sure we patched the right site.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Jiri Slaby
8b132ecbcf x86: fix text_poke()
kernel_text_address returns true even for modules which is not wanted
in text_poke. Use core_kernel_text instead.

This is a regression introduced in e587cadd8f
which caused occasionaly crashes after suspend/resume.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Ingo Molnar
70c9f590ff x86: remove set_fixmap() warning
set_fixmap()+clear_fixmap() is safe.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
Ingo Molnar
82a355f5a2 x86: make __set_fixmap() non-init
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 19:54:07 +02:00
David S. Miller
2664ef44cf [SPARC64]: Wrap SMP IPIs with irq_enter()/irq_exit().
Otherwise all sorts of bad things can happen, including
spurious softlockup reports.

Other platforms have this same bug, in one form or
another, just don't see the issue because they
don't sleep as long as sparc64 can in NOHZ.

Thanks to some brilliant debugging by Peter Zijlstra.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-25 03:11:37 -07:00
David S. Miller
020cfb05f2 [SPARC64]: Fix args to 64-bit sys_semctl() via sys_ipc().
Second and third arguments were swapped for whatever reason.

Reported by Tom Callaway.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-25 02:12:05 -07:00
Kumar Gala
f360bf0015 [POWERPC] Add zImage.iseries to arch/powerpc/boot/.gitignore
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-25 09:48:13 +10:00
Jeremy Fitzhardinge
af7ae3b9c4 xen: allow compilation with non-flat memory
There's no real reason we can't support sparsemem/discontigmem, so do so.
This is mostly useful to support hotplug memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge
b77797fb2b xen: fold xen_sysexit into xen_iret
xen_sysexit and xen_iret were doing essentially the same thing.  Rather
than having a separate implementation for xen_sysexit, we can just strip
the stack back to an iret frame and jump into xen_iret.  This removes
a lot of code and complexity - specifically, another critical region.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge
2bd50036b5 xen: allow set_pte_at on init_mm to be lockless
The usual pagetable locking protocol doesn't seem to apply to updates
to init_mm, so don't rely on preemption being disabled in xen_set_pte_at
on init_mm.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Jeremy Fitzhardinge
41e332b2a2 xen: disable preemption during tlb flush
Various places in the kernel flush the tlb even though preemption doens't
guarantee the tlb flush is happening on any particular CPU.  In many cases
this doesn't seem to matter, so don't make a fuss about it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:33 +02:00
Isaku Yamahata
8d3d2106c1 xen: make grant table arch portable
split out x86 specific part from grant-table.c and
allow ia64/xen specific initialization.
ia64/xen grant table is based on pseudo physical address
(guest physical address) unlike x86/xen. On ia64 init_mm
doesn't map identity straight mapped area.
ia64/xen specific grant table initialization is necessary.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Isaku Yamahata
e04d0d0767 xen: move events.c to drivers/xen for IA64/Xen support
move arch/x86/xen/events.c undedr drivers/xen to share codes
with x86 and ia64. And minor adjustment to compile.
ia64/xen also uses events.c

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Isaku Yamahata
af711cda4f xen: move features.c from arch/x86/xen/features.c to drivers/xen
ia64/xen also uses it too. Move it into common place so that
ia64/xen can share the code.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
0f2c876952 xen: jump to iret fixup
Use jmp rather than call for the iret fixup, so its consistent with
the sysexit fixup, and it simplifies the stack (which is already
complex).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
dbe9e994c9 xen: no need for domU to worry about MCE/MCA
Mask MCE/MCA out of cpu caps.  Its harmless to leave them there, but
it does prevent the kernel from starting an unnecessary thread.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
229664bee6 xen: short-cut for recursive event handling
If an event comes in while events are currently being processed, then
just increment the counter and have the outer event loop reprocess the
pending events.  This prevents unbounded recursion on heavy event
loads (of course massive event storms will cause infinite loops).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
ee8fa1c67f xen: make sure retriggered events are set pending
retrigger_dynirq() was incomplete, and didn't properly set the event
to be pending again.  It doesn't seem to actually get used.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
ee523ca1e4 xen: implement a debug-interrupt handler
Xen supports the notion of a debug interrupt which can be triggered
from the console.  For now this is implemented to show pending events,
masks and each CPU's pending event set.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:32 +02:00
Jeremy Fitzhardinge
e2a81baf66 xen: support sysenter/sysexit if hypervisor does
64-bit Xen supports sysenter for 32-bit guests, so support its
use.  (sysenter is faster than int $0x80 in 32-on-64.)

sysexit is still not supported, so we fake it up using iret.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
85958b465c x86: unify pgd ctor/dtor
All pagetables need fundamentally the same setup and destruction, so
just use the same code for everything.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
68db065c84 x86: unify KERNEL_PGD_PTRS
Make KERNEL_PGD_PTRS common, as previously it was only being defined
for 32-bit.

There are a couple of follow-on changes from this:
 - KERNEL_PGD_PTRS was being defined in terms of USER_PGD_PTRS.  The
   definition of USER_PGD_PTRS doesn't really make much sense on x86-64,
   since it can have two different user address-space configurations.
   I renamed USER_PGD_PTRS to KERNEL_PGD_BOUNDARY, which is meaningful
   for all of 32/32, 32/64 and 64/64 process configurations.

 - USER_PTRS_PER_PGD was also defined and was being used for similar
   purposes.  Converting its users to KERNEL_PGD_BOUNDARY left it
   completely unused, and so I removed it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
90e9f53662 xen: make sure iret faults are trapped
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
947a69c90c xen: unify pte operations
We can fold the essentially common pte functions together now.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
430442e38e xen: make use of pte_t union
pte_t always contains a "pte" field for the whole pte value, so make
use of it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
abf33038ff xen: use appropriate pte types
Convert Xen pagetable handling to use appropriate *val_t types.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
c20311e165 x86/pgtable.h: demacro ptep_clear_flush_young
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
f9fbf1a36a x86/pgtable.h: demacro ptep_test_and_clear_young
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
ee5aa8d3ba x86/pgtable.h: demacro ptep_set_access_flags
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
2761fa0920 x86: add pud_alloc for 4-level pagetables
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
6944a9c894 x86: rename paravirt_alloc_pt etc after the pagetable structure
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.

[ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
394158559d x86: move all the pgd_list handling to one place
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:31 +02:00
Jeremy Fitzhardinge
5a5f8f4224 x86: move pgalloc pud and pgd operations into common place
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
170fdff705 x86: move pmd functions into common asm/pgalloc.h
Common definitions for 3-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
397f687ab7 x86: move pte functions into common asm/pgalloc.h
Common definitions for 2-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
1d262d3a49 x86: put paravirt stubs into common asm/pgalloc.h
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Ingo Molnar
1ec1fe73df x86: xen unify x86 add common mm pgtable c fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
4f76cd3822 x86: add common mm/pgtable.c
Add a common arch/x86/mm/pgtable.c file for common pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Jeremy Fitzhardinge
79bf6d66ab x86: convert pgalloc_64.h from macros to inlines
Convert asm-x86/pgalloc_64.h from macros into functions (#include hell
prevents __*_free_tlb from being inline, but they're probably a bit
big to inline anyway).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:57:30 +02:00
Linus Torvalds
b69d3987f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  "make namespacecheck" fixes
  x86: fix compilation error in VisWS
  x86: voyager fix
  x86: Drop duplicate from setup.c
  intel-iommu.c: dma ops fix
2008-04-24 14:41:20 -07:00
Ingo Molnar
1f56cf1c58 /dev/mem: make promisc the default
default to the old semantics.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Ingo Molnar
28eb559b5b pat: cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
e7f260a276 x86: PAT use reserve free memtype in mmap of /dev/mem
Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
f0970c13b6 x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.

x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
e045fb2a98 x86: PAT avoid aliasing in /dev/mem read/write
Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Arjan van de Ven
ae531c26c5 x86: introduce /dev/mem restrictions with a config option
This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.

The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)

People who want to use /dev/mem for kernel debugging can enable the config
option.

The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:40:47 +02:00