Commit Graph

1549 Commits

Author SHA1 Message Date
Len Brown
003d6a38ce Merge branch 'sfi-base' into release
Conflicts:
	drivers/acpi/power.c

Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-19 00:37:13 -04:00
Linus Torvalds
78f28b7c55 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits)
  x86: Move get/set_wallclock to x86_platform_ops
  x86: platform: Fix section annotations
  x86: apic namespace cleanup
  x86: Distangle ioapic and i8259
  x86: Add Moorestown early detection
  x86: Add hardware_subarch ID for Moorestown
  x86: Add early platform detection
  x86: Move tsc_init to late_time_init
  x86: Move tsc_calibration to x86_init_ops
  x86: Replace the now identical time_32/64.c by time.c
  x86: time_32/64.c unify profile_pc
  x86: Move calibrate_cpu to tsc.c
  x86: Make timer setup and global variables the same in time_32/64.c
  x86: Remove mca bus ifdef from timer interrupt
  x86: Simplify timer_ack magic in time_32.c
  x86: Prepare unification of time_32/64.c
  x86: Remove do_timer hook
  x86: Add timer_init to x86_init_ops
  x86: Move percpu clockevents setup to x86_init_ops
  x86: Move xen_post_allocator_init into xen_pagetable_setup_done
  ...

Fix up conflicts in arch/x86/include/asm/io_apic.h
2009-09-18 14:05:47 -07:00
Linus Torvalds
a03fdb7612 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
  time: Prevent 32 bit overflow with set_normalized_timespec()
  clocksource: Delay clocksource down rating to late boot
  clocksource: clocksource_select must be called with mutex locked
  clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
  timers: Drop a function prototype
  clocksource: Resolve cpu hotplug dead lock with TSC unstable
  timer.c: Fix S/390 comments
  timekeeping: Fix invalid getboottime() value
  timekeeping: Fix up read_persistent_clock() breakage on sh
  timekeeping: Increase granularity of read_persistent_clock(), build fix
  time: Introduce CLOCK_REALTIME_COARSE
  x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
  clocksource: Avoid clocksource watchdog circular locking dependency
  clocksource: Protect the watchdog rating changes with clocksource_mutex
  clocksource: Call clocksource_change_rating() outside of watchdog_lock
  timekeeping: Introduce read_boot_clock
  timekeeping: Increase granularity of read_persistent_clock()
  timekeeping: Update clocksource with stop_machine
  timekeeping: Add timekeeper read_clock helper functions
  timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
  ...

Fix trivial conflict due to MIPS lemote -> loongson renaming.
2009-09-18 09:15:24 -07:00
David Rientjes
7715a1e887 x86/PCI: default pcibus cpumask to all cpus if it lacks affinity
The early initialization of the pci bus to node mapping leaves all busses
with a node id of -1 if it lacks memory affinity.  Thus, cpumask_of_pcibus
must return all online cpus for such busses.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-18 08:51:10 -07:00
Jack Steiner
8dc579e868 x86: SGI UV: Add volatile semantics to macros that access chipset registers
Add volatile-semantics to the SGI UV read/write macros that are
used to access chipset memory mapped registers. No direct
references to volatile are made. Instead the readq/writeq macros
are used.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Cc: dwalker@fifo99.com
Cc: cfriesen@nortel.com
LKML-Reference: <20090910143149.GA14273@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18 14:05:32 +02:00
Jack Steiner
d2374aecda x86: SGI UV: Fix IPI macros
The UV BIOS has changed the way interrupt remapping is being done.
This affects the id used for sending IPIs. The upper id bits no
longer need to be masked off.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090909154104.GA25083@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18 14:04:25 +02:00
Linus Torvalds
df58bee21e Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
  x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.c
  x86, mce: CE in last bank prevents panic by unknown MCE
  x86, mce: Fake panic support for MCE testing
  x86, mce: Move debugfs mce dir creating to mce.c
  x86, mce: Support specifying raise mode for software MCE injection
  x86, mce: Support specifying context for software mce injection
  x86, mce: fix reporting of Thermal Monitoring mechanism enabled
  x86, mce: remove never executed code
  x86, mce: add missing __cpuinit tags
  x86, mce: fix "mce" boot option handling for CONFIG_X86_NEW_MCE
  x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
  x86: mce: Lower maximum number of banks to architecture limit
  x86: mce: macros to compute banks MSRs
  x86: mce: Move per bank data in a single datastructure
  x86: mce: Move code in mce.c
  x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE
  x86: mce: Remove old i386 machine check code
  x86: mce: Update X86_MCE description in x86/Kconfig
  x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE
  x86, mce: use atomic_inc_return() instead of add by 1
  ...

Manually fixed up trivial conflicts:
	Documentation/feature-removal-schedule.txt
	arch/x86/kernel/cpu/mcheck/mce.c
2009-09-17 21:07:08 -07:00
Linus Torvalds
dcbf77b9e8 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
  sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE
  sched: Stop buddies from hogging the system
  sched: Add new wakeup preemption mode: WAKEUP_RUNNING
  sched: Fix TASK_WAKING & loadaverage breakage
  sched: Disable wakeup balancing
  sched: Rename flags to wake_flags
  sched: Clean up the load_idx selection in select_task_rq_fair
  sched: Optimize cgroup vs wakeup a bit
  sched: x86: Name old_perf in a unique way
  sched: Implement a gentler fair-sleepers feature
  sched: Add SD_PREFER_LOCAL
  sched: Add a few SYNC hint knobs to play with
  sched: Fix sync wakeups again
  sched: Add WF_FORK
  sched: Rename sync arguments
  sched: Rename select_task_rq() argument
  sched: Feature to disable APERF/MPERF cpu_power
  x86: sched: Provide arch implementations using aperf/mperf
  x86: Add generic aperf/mperf code
  x86: Move APERF/MPERF into a X86_FEATURE
  ...

Fix up trivial conflict in arch/x86/include/asm/processor.h due to
nearby addition of amd_get_nb_id() declaration from the EDAC merge.
2009-09-17 21:00:02 -07:00
Linus Torvalds
ca043a66ae Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pat: don't use rb-tree based lookup in reserve_memtype()
  x86: Increase MIN_GAP to include randomized stack
2009-09-17 20:58:11 -07:00
Linus Torvalds
1218259b2d Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
  vsnprintf: remove duplicate comment of vsnprintf
  softirq: add BLOCK_IOPOLL to softirq_to_name
  oprofile: fix oprofile regression: select RING_BUFFER_ALLOW_SWAP
  tracing: switch function prints from %pf to %ps
  vsprintf: add %ps that is the same as %pS but is like %pf
  tracing: Fix minor bugs for __unregister_ftrace_function_probe
  tracing: remove notrace from __kprobes annotation
  tracing: optimize global_trace_clock cachelines
  MAINTAINERS: Update tracing tree details
  ftrace: document function and function graph implementation
  tracing: make testing syscall events a separate configuration
  tracing: remove some unused macros
  ftrace: add compile-time check on F_printk()
  tracing: fix F_printk() typos
  tracing: have TRACE_EVENT macro use __flags to not shadow parameter
  tracing: add static to generated TRACE_EVENT functions
  ring-buffer: typecast cmpxchg to fix PowerPC warning
  tracing: add filter event logic to special, mmiotrace and boot tracers
  tracing: remove trace_event_types.h
  tracing: use the new trace_entries.h to create format files
  ...
2009-09-17 20:56:37 -07:00
H. Peter Anvin
3bb045f1e2 Merge branch 'x86/pat' into x86/urgent
Merge reason:

Suresh Siddha (1):
      x86, pat: don't use rb-tree based lookup in reserve_memtype()

... requires previous x86/pat commits already pushed to Linus.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-17 14:40:49 -07:00
Ingo Molnar
45bd00d31d Merge branch 'linus' into tracing/core
Merge reason: Pick up kernel/softirq.c update for dependent fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-17 20:53:10 +02:00
Linus Torvalds
de55a8958f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: check NB MCE bank enable on the current node properly
  amd64_edac: Rewrite unganged mode code of f10_early_channel_count
  amd64_edac: cleanup amd64_check_ecc_enabled
  x86, EDAC: Provide function to return NodeId of a CPU
  amd64_edac: build driver only on AMD hardware
2009-09-17 09:55:52 -07:00
Linus Torvalds
4406c56d0a Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
  PCI hotplug: clean up acpi_run_hpp()
  PCI hotplug: acpiphp: use generic pci_configure_slot()
  PCI hotplug: shpchp: use generic pci_configure_slot()
  PCI hotplug: pciehp: use generic pci_configure_slot()
  PCI hotplug: add pci_configure_slot()
  PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
  PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
  PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
  PCI: Clear saved_state after the state has been restored
  PCI PM: Return error codes from pci_pm_resume()
  PCI: use dev_printk in quirk messages
  PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
  PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
  PCI Hotplug: acpiphp: find bridges the easy way
  PCI: pcie portdrv: remove unused variable
  PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
  ACPI PM: Replace wakeup.prepared with reference counter
  PCI PM: Introduce device flag wakeup_prepared
  PCI / ACPI PM: Rework some debug messages
  PCI PM: Simplify PCI wake-up code
  ...

Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases.  The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
2009-09-16 07:49:54 -07:00
Peter Zijlstra
182a85f8a1 sched: Disable wakeup balancing
Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't
really mind too much, SD_BALANCE_NEWIDLE picks up most of the
slack.

On a dual socket, quad core, dual thread nehalem system:

sysbench (--num_threads=16):

 SD_BALANCE_WAKE-: 13982 tx/s
 SD_BALANCE_WAKE+: 15688 tx/s

kbuild (-j16):

 SD_BALANCE_WAKE-: 47.648295846  seconds time elapsed   ( +-   0.312% )
 SD_BALANCE_WAKE+: 47.608607360  seconds time elapsed   ( +-   0.026% )

(same within noise)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-16 16:44:33 +02:00
Feng Tang
7bd867dfb4 x86: Move get/set_wallclock to x86_platform_ops
get/set_wallclock() have already a set of platform dependent
implementations (default, EFI, paravirt). MRST will add another
variant.

Moving them to platform ops simplifies the existing code and minimizes
the effort to integrate new variants.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-09-16 14:34:50 +02:00
Andreas Herrmann
6a8126911a x86, EDAC: Provide function to return NodeId of a CPU
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
2009-09-16 11:33:40 +02:00
Linus Torvalds
ada3fa1505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15 09:39:44 -07:00
Linus Torvalds
227423904c Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pat: Fix cacheflush address in change_page_attr_set_clr()
  mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
  x86: Fix earlyprintk=dbgp for machines without NX
  x86, pat: Sanity check remap_pfn_range for RAM region
  x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
  x86, pat: Add lookup_memtype to get the current memtype of a paddr
  x86, pat: Use page flags to track memtypes of RAM pages
  x86, pat: Generalize the use of page flag PG_uncached
  x86, pat: Add rbtree to do quick lookup in memtype tracking
  x86, pat: Add PAT reserve free to io_mapping* APIs
  x86, pat: New i/f for driver to request memtype for IO regions
  x86, pat: ioremap to follow same PAT restrictions as other PAT users
  x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
  x86, mtrr: make mtrr_aps_delayed_init static bool
  x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
  generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
  x86: Fix an incorrect argument of reserve_bootmem()
  x86: Fix system crash when loading with "reservetop" parameter
2009-09-15 09:19:38 -07:00
Linus Torvalds
1aaf2e5913 Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, intel_txt: clean up the impact on generic code, unbreak non-x86
  x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
  x86, intel_txt: Fix typos in Kconfig help
  x86, intel_txt: Factor out the code for S3 setup
  x86, intel_txt: tboot.c needs <asm/fixmap.h>
  intel_txt: Force IOMMU on for Intel TXT launch
  x86, intel_txt: Intel TXT Sx shutdown support
  x86, intel_txt: Intel TXT reboot/halt shutdown support
  x86, intel_txt: Intel TXT boot support
2009-09-15 09:19:20 -07:00
Linus Torvalds
66a4fe0cb8 Merge branch 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp/intel: remove restore in resume
  agp: fix uninorth build
  intel-agp: Set dma mask for i915
  agp: kill phys_to_gart() and gart_to_phys()
  intel-agp: fix sglist allocation to avoid vmalloc()
  intel-agp: Move repeated sglist free into separate function
  agp: Switch agp_{un,}map_page() to take struct page * argument
  agp: tidy up handling of scratch pages w.r.t. DMA API
  intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
  agp: Add generic support for graphics dma remapping
  agp: Switch mask_memory() method to take address argument again, not page
2009-09-15 09:18:07 -07:00
Peter Zijlstra
5cbc19a983 x86: Add generic aperf/mperf code
Move some of the aperf/mperf code out from the cpufreq driver
thingy so that other people can enjoy it too.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 16:51:26 +02:00
Peter Zijlstra
a8303aaf2b x86: Move APERF/MPERF into a X86_FEATURE
Move the APERFMPERF capacility into a X86_FEATURE flag so that it
can be used outside of the acpi cpufreq driver.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 16:51:25 +02:00
Peter Zijlstra
b8a543ea5a sched: Reduce forkexec_idx
If we're looking to place a new task, we might as well find the
idlest position _now_, not 1 tick ago.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 16:51:23 +02:00
Mike Galbraith
0ec9fab3d1 sched: Improve latencies and throughput
Make the idle balancer more agressive, to improve a
x264 encoding workload provided by Jason Garrett-Glaser:

 NEXT_BUDDY NO_LB_BIAS
 encoded 600 frames, 252.82 fps, 22096.60 kb/s
 encoded 600 frames, 250.69 fps, 22096.60 kb/s
 encoded 600 frames, 245.76 fps, 22096.60 kb/s

 NO_NEXT_BUDDY LB_BIAS
 encoded 600 frames, 344.44 fps, 22096.60 kb/s
 encoded 600 frames, 346.66 fps, 22096.60 kb/s
 encoded 600 frames, 352.59 fps, 22096.60 kb/s

 NO_NEXT_BUDDY NO_LB_BIAS
 encoded 600 frames, 425.75 fps, 22096.60 kb/s
 encoded 600 frames, 425.45 fps, 22096.60 kb/s
 encoded 600 frames, 422.49 fps, 22096.60 kb/s

Peter pointed out that this is better done via newidle_idx,
not via LB_BIAS, newidle balancing should look for where
there is load _now_, not where there was load 2 ticks ago.

Worst-case latencies are improved as well as no buddies
means less vruntime spread. (as per prior lkml discussions)

This change improves kbuild-peak parallelism as well.

Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 16:51:16 +02:00
Peter Zijlstra
78e7ed53c9 sched: Tweak wake_idx
When merging select_task_rq_fair() and sched_balance_self() we lost
the use of wake_idx, restore that and set them to 0 to make wake
balancing more aggressive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 16:01:07 +02:00
Peter Zijlstra
c88d591089 sched: Merge select_task_rq_fair() and sched_balance_self()
The problem with wake_idle() is that is doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent
RT interaction.

To cure this, it needs to do what sched_balance_self() does, which
leads to the possibility of merging select_task_rq_fair() and
sched_balance_self().

Modify sched_balance_self() to:

  - update_shares() when walking up the domain tree,
    (it only called it for the top domain, but it should
     have done this anyway), which allows us to remove
    this ugly bit from try_to_wake_up().

  - do wake_affine() on the smallest domain that contains
    both this (the waking) and the prev (the wakee) cpu for
    WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the dissapearance of SD_WAKE_BALANCE and
SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.

SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with new SD flags --
platforms might need re-tuning, enabling SD_BALANCE_WAKE
conditionally on a NUMA distance seems like a good additional
feature, magny-core and small nehalem systems would want this
enabled, systems with slow interconnects would not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 16:01:05 +02:00
Ingo Molnar
dca2d6ac09 Merge branch 'linus' into tracing/hw-breakpoints
Conflicts:
	arch/x86/kernel/process_64.c

Semantic conflict fixed in:
	arch/x86/kvm/x86.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 12:18:15 +02:00
Borislav Petkov
b8a4754147 x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
Since rdmsr_on_cpus and wrmsr_on_cpus are almost identical, unify them
into a common __rwmsr_on_cpus helper thus avoiding code duplication.

While at it, convert cpumask_t's to const struct cpumask *.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 09:15:54 +02:00
Linus Torvalds
f86054c245 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits)
  at_hdmac: Rework suspend_late()/resume_early()
  PM: Reset transition_started at dpm_resume_noirq
  PM: Update kerneldoc comments in drivers/base/power/main.c
  PM: Add convenience macro to make switching to dev_pm_ops less error-prone
  hp-wmi: Switch driver to dev_pm_ops
  floppy: Switch driver to dev_pm_ops
  PM: Trivial fixes
  PM / Hibernate / Memory hotplug: Always use for_each_populated_zone()
  PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
  PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
  PM/Hibernate: Rework shrinking of memory
  PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
  PM: Run-time PM platform device bus support
  PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
  Driver Core: Make PM operations a const pointer
  PM: Remove platform device suspend_late()/resume_early() V2
  USB: Rework musb suspend()/resume_early()
  I2C: Rework i2c-s3c2410 suspend_late()/resume() V2
  I2C: Rework i2c-pxa suspend_late()/resume_early()
  DMA: Rework txx9dmac suspend_late()/resume_early()
  ...

Fix trivial conflict in drivers/base/platform.c (due to same
constification patch being merged in both sides, along with some other
PM work in the PM branch)
2009-09-14 20:03:54 -07:00
Linus Torvalds
69def9f05d Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
  MAINTAINERS: update KVM entry
  KVM: correct error-handling code
  KVM: fix compile warnings on s390
  KVM: VMX: Check cpl before emulating debug register access
  KVM: fix misreporting of coalesced interrupts by kvm tracer
  KVM: x86: drop duplicate kvm_flush_remote_tlb calls
  KVM: VMX: call vmx_load_host_state() only if msr is cached
  KVM: VMX: Conditionally reload debug register 6
  KVM: Use thread debug register storage instead of kvm specific data
  KVM guest: do not batch pte updates from interrupt context
  KVM: Fix coalesced interrupt reporting in IOAPIC
  KVM guest: fix bogus wallclock physical address calculation
  KVM: VMX: Fix cr8 exiting control clobbering by EPT
  KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
  KVM: Document KVM_CAP_IRQCHIP
  KVM: Protect update_cr8_intercept() when running without an apic
  KVM: VMX: Fix EPT with WP bit change during paging
  KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
  KVM: x86 emulator: Add adc and sbb missing decoder flags
  KVM: Add missing #include
  ...
2009-09-14 17:43:43 -07:00
Rafael J. Wysocki
ac8d513a68 Merge branch 'master' into for-linus 2009-09-14 20:26:05 +02:00
Linus Torvalds
55e0715f61 Merge branch 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, percpu: Collect hot percpu variables into one cacheline
  x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED()
  x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
2009-09-14 08:01:28 -07:00
Linus Torvalds
7dfd54a905 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, highmem_32.c: Clean up comment
  x86, pgtable.h: Clean up types
  x86: Clean up dump_pagetable()
2009-09-14 07:59:32 -07:00
Linus Torvalds
625037cc40 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64: move clts into batch cpu state updates when preloading fpu
  x86-64: move unlazy_fpu() into lazy cpu state part of context switch
  x86-32: make sure clts is batched during context switch
  x86: split out core __math_state_restore
2009-09-14 07:58:08 -07:00
Linus Torvalds
c7208de304 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  x86: Fix code patching for paravirt-alternatives on 486
  x86, msr: change msr-reg.o to obj-y, and export its symbols
  x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
  x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
  x86, mcheck: Use correct cpumask for shared bank4
  x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
  x86: Fix CPU llc_shared_map information for AMD Magny-Cours
  x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too
  x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
  x86, msr: fix msr-reg.S compilation with gas 2.16.1
  x86, msr: Export the register-setting MSR functions via /dev/*/msr
  x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
  x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
  x86, msr: CFI annotations, cleanups for msr-reg.S
  x86, asm: Make _ASM_EXTABLE() usable from assembly code
  x86, asm: Add 32-bit versions of the combined CFI macros
  x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
  x86, msr: Rewrite AMD rd/wrmsr variants
  x86, msr: Add rd/wrmsr interfaces with preset registers
  x86: add specific support for Intel Atom architecture
  ...
2009-09-14 07:57:32 -07:00
Linus Torvalds
9b74aec028 Merge branch 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: remove all now-duplicate header files
  x86: convert termios.h to the asm-generic version
  x86: convert almost generic headers to asm-generic version
  x86: convert trivial headers to asm-generic version
  x86: add copies of some headers to convert to asm-generic
2009-09-14 07:55:37 -07:00
Linus Torvalds
b581af5110 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/i386: Put aligned stack-canary in percpu shared_aligned section
  x86/i386: Make sure stack-protector segment base is cache aligned
  x86: Detect stack protector for i386 builds on x86_64
  x86: allow "=rm" in native_save_fl()
  x86: properly annotate alternatives.c
  x86: Introduce GDT_ENTRY_INIT(), initialize bad_bios_desc statically
  x86, 32-bit: Use generic sys_pipe()
  x86: Introduce GDT_ENTRY_INIT(), fix APM
  x86: Introduce GDT_ENTRY_INIT()
  x86: Introduce set_desc_base() and set_desc_limit()
  x86: Remove unused patch_espfix_desc()
  x86: Use get_desc_base()
2009-09-14 07:53:49 -07:00
Linus Torvalds
ffaf854b01 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n
  x86, apic: Slim down stack usage in early_init_lapic_mapping()
  x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources()
  x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant
  x86: Fix x86_model test in es7000_apic_is_cluster()
  x86, apic: Move dmar_table_init() out of enable_IR()
  x86, ioapic: Panic on irq-pin binding only if needed
  x86/apic: Enable x2APIC without interrupt remapping under KVM
  x86, apic: Drop redundant bit assignment
  x86, ioapic: Throw BUG instead of NULL dereference
  x86, ioapic: Introduce for_each_irq_pin() helper
  x86: Remove superfluous NULL pointer check in destroy_irq()
  x86/ioapic.c: unify ioapic_retrigger_irq()
  x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop
  x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments
  x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop
  x86/ioapic.c: simplify add_pin_to_irq_node()
  x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop
  x86/ioapic.c: move lost comment to what seems like appropriate place
  x86/ioapic.c: remove redundant declaration of irq_pin_list
  ...
2009-09-14 07:51:20 -07:00
Linus Torvalds
483e3cd6a3 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
  ring-buffer: only enable ring_buffer_swap_cpu when needed
  ring-buffer: check for swapped buffers in start of committing
  tracing: report error in trace if we fail to swap latency buffer
  tracing: add trace_array_printk for internal tracers to use
  tracing: pass around ring buffer instead of tracer
  tracing: make tracing_reset safe for external use
  tracing: use timestamp to determine start of latency traces
  tracing: Remove mentioning of legacy latency_trace file from documentation
  tracing/filters: Defer pred allocation, fix memory leak
  tracing: remove users of tracing_reset
  tracing: disable buffers and synchronize_sched before resetting
  tracing: disable update max tracer while reading trace
  tracing: print out start and stop in latency traces
  ring-buffer: disable all cpu buffers when one finds a problem
  ring-buffer: do not count discarded events
  ring-buffer: remove ring_buffer_event_discard
  ring-buffer: fix ring_buffer_read crossing pages
  ring-buffer: remove unnecessary cpu_relax
  ring-buffer: do not swap buffers during a commit
  ring-buffer: do not reset while in a commit
  ...
2009-09-11 13:24:03 -07:00
Linus Torvalds
774a694f8c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
  sched: Fix sched::sched_stat_wait tracepoint field
  sched: Disable NEW_FAIR_SLEEPERS for now
  sched: Keep kthreads at default priority
  sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  sched: Turn off child_runs_first
  sched: Ensure that a child can't gain time over it's parent after fork()
  sched: enable SD_WAKE_IDLE
  sched: Deal with low-load in wake_affine()
  sched: Remove short cut from select_task_rq_fair()
  sched: Turn on SD_BALANCE_NEWIDLE
  sched: Clean up topology.h
  sched: Fix dynamic power-balancing crash
  sched: Remove reciprocal for cpu_power
  sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
  sched: Try to deal with low capacity
  sched: Scale down cpu_power due to RT tasks
  sched: Implement dynamic cpu_power
  sched: Add smt_gain
  sched: Update the cpu_power sum during load-balance
  sched: Add SD_PREFER_SIBLING
  ...
2009-09-11 13:23:18 -07:00
Linus Torvalds
4f0ac85416 Merge branch 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
  perf tools: Avoid unnecessary work in directory lookups
  perf stat: Clean up statistics calculations a bit more
  perf stat: More advanced variance computation
  perf stat: Use stddev_mean in stead of stddev
  perf stat: Remove the limit on repeat
  perf stat: Change noise calculation to use stddev
  x86, perf_counter, bts: Do not allow kernel BTS tracing for now
  x86, perf_counter, bts: Correct pointer-to-u64 casts
  x86, perf_counter, bts: Fail if BTS is not available
  perf_counter: Fix output-sharing error path
  perf trace: Fix read_string()
  perf trace: Print out in nanoseconds
  perf tools: Seek to the end of the header area
  perf trace: Fix parsing of perf.data
  perf trace: Sample timestamps as well
  perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
  perf trace: Sample the CPU too
  perf tools: Work around strict aliasing related warnings
  perf tools: Clean up warnings list in the Makefile
  perf tools: Complete support for dynamic strings
  ...
2009-09-11 13:22:43 -07:00
Linus Torvalds
a66a50054e Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (59 commits)
  x86/gart: Do not select AGP for GART_IOMMU
  x86/amd-iommu: Initialize passthrough mode when requested
  x86/amd-iommu: Don't detach device from pt domain on driver unbind
  x86/amd-iommu: Make sure a device is assigned in passthrough mode
  x86/amd-iommu: Align locking between attach_device and detach_device
  x86/amd-iommu: Fix device table write order
  x86/amd-iommu: Add passthrough mode initialization functions
  x86/amd-iommu: Add core functions for pd allocation/freeing
  x86/dma: Mark iommu_pass_through as __read_mostly
  x86/amd-iommu: Change iommu_map_page to support multiple page sizes
  x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
  x86/amd-iommu: Remove old page table handling macros
  x86/amd-iommu: Use 2-level page tables for dma_ops domains
  x86/amd-iommu: Remove bus_addr check in iommu_map_page
  x86/amd-iommu: Remove last usages of IOMMU_PTE_L0_INDEX
  x86/amd-iommu: Change alloc_pte to support 64 bit address space
  x86/amd-iommu: Introduce increase_address_space function
  x86/amd-iommu: Flush domains if address space size was increased
  x86/amd-iommu: Introduce set_dte_entry function
  x86/amd-iommu: Add a gneric version of amd_iommu_flush_all_devices
  ...
2009-09-11 13:16:37 -07:00
Linus Torvalds
989aa44a5f Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debug lockups: Improve lockup detection, fix generic arch fallback
  debug lockups: Improve lockup detection
2009-09-11 13:15:55 -07:00
Michal Hocko
80938332d8 x86: Increase MIN_GAP to include randomized stack
Currently we are not including randomized stack size when calculating
mmap_base address in arch_pick_mmap_layout for topdown case. This might
cause that mmap_base starts in the stack reserved area because stack is
randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB.

If the stack really grows down to mmap_base then we can get silent mmap
region overwrite by the stack values.

Let's include maximum stack randomization size into MIN_GAP which is
used as the low bound for the gap in mmap.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Stable Team <stable@kernel.org>
2009-09-10 17:00:12 -07:00
Ben Hutchings
5367b6887e x86: Fix code patching for paravirt-alternatives on 486
As reported in <http://bugs.debian.org/511703> and
<http://bugs.debian.org/515982>, kernels with paravirt-alternatives
enabled crash in text_poke_early() on at least some 486-class
processors.

The problem is that text_poke_early() itself uses inline functions
affected by paravirt-alternatives and so will modify instructions that
have already been prefetched.  Pentium and later processors will
invalidate the prefetched instructions in this case, but 486-class
processors do not.

Change sync_core() to limit prefetching on 486-class (and 386-class)
processors, and move the call to sync_core() above the call to the
modifiable local_irq_restore().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
LKML-Reference: <1252547631.3423.134.camel@localhost>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-10 16:50:19 -07:00
Frederic Weisbecker
8f8ffe2485 Merge commit 'tracing/core' into tracing/kprobes
Conflicts:
	kernel/trace/trace_export.c
	kernel/trace/trace_kprobe.c

Merge reason: This topic branch lacks an important
build fix in tracing/core:

	0dd7b74787:
	tracing: Fix double CPP substitution in TRACE_EVENT_FN

that prevents from multiple tracepoint headers inclusion crashes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-11 01:09:23 +02:00
Steven Rostedt
fc06b8520b x86/tracing: comment need for atomic nop
The dynamic function tracer relys on the macro P6_NOP5 always being
an atomic NOP. If for some reason it is changed to be two operations
(like a nop2 nop3) it can faults within the kernel when the function
tracer modifies the code.

This patch adds a comment to note that the P6_NOPs are expected to
be atomic. This will hopefully prevent anyone from changing that.

Reported-by: Mathieu Desnoyer <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-10 17:22:44 -04:00
Avi Kivity
0a79b00952 KVM: VMX: Check cpl before emulating debug register access
Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10 18:11:10 +03:00
Avi Kivity
3d53c27d05 KVM: Use thread debug register storage instead of kvm specific data
Instead of saving the debug registers from the processor to a kvm data
structure, rely in the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 18:11:04 +03:00
Avi Kivity
fa6870c6b6 KVM: Add missing #include
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:49 +03:00
Avi Kivity
56e8231841 KVM: Rename x86_emulate.c to emulate.c
We're in arch/x86, what could we possibly be emulating?

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:45 +03:00
Joerg Roedel
344f414fa0 KVM: report 1GB page support to userspace
If userspace knows that the kernel part supports 1GB pages it can enable
the corresponding cpuid bit so that guests actually use GB pages.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:19 +03:00
Joerg Roedel
04326caacf KVM: MMU: enable gbpages by increasing nr of pagesizes
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:19 +03:00
Joerg Roedel
7e4e4056f7 KVM: MMU: shadow support for 1gb pages
This patch adds support for shadow paging to the 1gb page table code in KVM.
With this code the guest can use 1gb pages even if the host does not support
them.

[ Marcelo: fix shadow page collision on pmd level if a guest 1gb page is mapped
           with 4kb ptes on host level ]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:19 +03:00
Joerg Roedel
852e3c19ac KVM: MMU: make direct mapping paths aware of mapping levels
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:18 +03:00
Sheng Yang
b927a3cec0 KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl
Now KVM allow guest to modify guest's physical address of EPT's identity mapping page.

(change from v1, discard unnecessary check, change ioctl to accept parameter
address rather than value)

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10 08:33:16 +03:00
Gleb Natapov
a1b37100d9 KVM: Reduce runnability interface with arch support code
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:13 +03:00
Gleb Natapov
0b71785dc0 KVM: Move kvm_cpu_get_interrupt() declaration to x86 code
It is implemented only by x86.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:13 +03:00
Beth Kon
e9f4275732 KVM: PIT support for HPET legacy mode
When kvm is in hpet_legacy_mode, the hpet is providing the timer
interrupt and the pit should not be. So in legacy mode, the pit timer
is destroyed, but the *state* of the pit is maintained. So if kvm or
the guest tries to modify the state of the pit, this modification is
accepted, *except* that the timer isn't actually started. When we exit
hpet_legacy_mode, the current state of the pit (which is up to date
since we've been accepting modifications) is used to restart the pit
timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to
0xff in order to destroy the timer, but then restores the actual
value, again maintaining "current" state of the pit for possible later
reenablement.

[avi: add some reserved storage in the ioctl; make SET_PIT2 IOW]
[marcelo: fix memory corruption due to reserved storage]

Signed-off-by: Beth Kon <eak@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:12 +03:00
Gleb Natapov
fc61b800f9 KVM: Add Directed EOI support to APIC emulation
Directed EOI is specified by x2APIC, but is available even when lapic is
in xAPIC mode.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:07 +03:00
Joerg Roedel
ec04b2604c KVM: Prepare memslot data structures for multiple hugepage sizes
[avi: fix build on non-x86]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:02 +03:00
Marcelo Tosatti
229456fc34 KVM: convert custom marker based tracing to event traces
This allows use of the powerful ftrace infrastructure.

See Documentation/trace/ for usage information.

[avi, stephen: various build fixes]
[sheng: fix control register breakage]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:59 +03:00
Alexander Graf
0367b4330e x86: Add definition for IGNNE MSR
Hyper-V accesses MSR_IGNNE while running under KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:58 +03:00
Marcelo Tosatti
e799794e02 KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits
Required for EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:55 +03:00
Avi Kivity
7ffd92c53c KVM: VMX: Move rmode structure to vmx-specific code
rmode is only used in vmx, so move it to vmx.c

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:50 +03:00
Nitin A Kamble
3a624e29c7 KVM: VMX: Support Unrestricted Guest feature
"Unrestricted Guest" feature is added in the VMX specification.
Intel Westmere and onwards processors will support this feature.

    It allows kvm guests to run real mode and unpaged mode
code natively in the VMX mode when EPT is turned on. With the
unrestricted guest there is no need to emulate the guest real mode code
in the vm86 container or in the emulator. Also the guest big real mode
code works like native.

  The attached patch enhances KVM to use the unrestricted guest feature
if available on the processor. It also adds a new kernel/module
parameter to disable the unrestricted guest feature at the boot time.

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:49 +03:00
Avi Kivity
6de4f3ada4 KVM: Cache pdptrs
Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx,
guest memory access on svm) extract them on demand.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:46 +03:00
Huang Ying
890ca9aefa KVM: Add MCE support
The related MSRs are emulated. MCE capability is exported via
extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED.  A new
vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
such as the mcg_cap. MCE is injected via vcpu ioctl command
KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
not implemented.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput
af24a4e4ae KVM: Replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h
Use standard msr-index.h's MSR declaration.

MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
80 column issue.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:38 +03:00
Rafael J. Wysocki
bf992fa2bc Merge branch 'master' into for-linus 2009-09-10 00:02:02 +02:00
Alex Chiang
a7db504052 PCI: remove pcibios_scan_all_fns()
This was #define'd as 0 on all platforms, so let's get rid of it.

This change makes pci_scan_slot() slightly easier to read.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:18 -07:00
Peter Zijlstra
a8fae3ec5f sched: enable SD_WAKE_IDLE
Now that SD_WAKE_IDLE doesn't make pipe-test suck anymore,
enable it by default for MC, CPU and NUMA domains.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07 22:00:17 +02:00
Ingo Molnar
a1922ed661 Merge branch 'tracing/core' into tracing/hw-breakpoints
Conflicts:
	arch/Kconfig
	kernel/trace/trace.h

Merge reason: resolve the conflicts, plus adopt to the new
              ring-buffer APIs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07 08:19:51 +02:00
Ingo Molnar
ed011b22ce Merge commit 'v2.6.31-rc9' into tracing/core
Merge reason: move from -rc5 to -rc9.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-06 06:11:42 +02:00
Ingo Molnar
695a461296 Merge branch 'amd-iommu/2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2009-09-04 14:44:16 +02:00
Ingo Molnar
840a065310 sched: Turn on SD_BALANCE_NEWIDLE
Start the re-tuning of the balancer by turning on newidle.

It improves hackbench performance and parallelism on a 4x4 box.
The "perf stat --repeat 10" measurements give us:

  domain0             domain1
  .......................................
 -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
   2041.273208  task-clock-msecs         #      9.354 CPUs    ( +-   0.363% )

 +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
   2086.326925  task-clock-msecs         #     11.934 CPUs    ( +-   0.301% )

 +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE:
   2115.289791  task-clock-msecs         #     12.158 CPUs    ( +-   0.263% )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 11:52:54 +02:00
Ingo Molnar
47734f89be sched: Clean up topology.h
Re-organize the flag settings so that it's visible at a glance
which sched-domains flags are set and which not.

With the new balancer code we'll need to re-tune these details
anyway, so make it cleaner to make fewer mistakes down the
road ;-)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 11:52:53 +02:00
Jeremy Fitzhardinge
53f824520b x86/i386: Put aligned stack-canary in percpu shared_aligned section
Pack aligned things together into a special section to minimize
padding holes.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <4AA035C0.9070202@goop.org>
[ queued up in tip:x86/asm because it depends on this commit:
  x86/i386: Make sure stack-protector segment base is cache aligned ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 07:10:31 +02:00
Andreas Herrmann
4a376ec3a2 x86: Fix CPU llc_shared_map information for AMD Magny-Cours
Construct entire NodeID and use it as cpu_llc_id. Thus internal node
siblings are stored in llc_shared_map.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-03 15:09:59 -07:00
Jeremy Fitzhardinge
1ea0d14e48 x86/i386: Make sure stack-protector segment base is cache aligned
The Intel Optimization Reference Guide says:

	In Intel Atom microarchitecture, the address generation unit
	assumes that the segment base will be 0 by default. Non-zero
	segment base will cause load and store operations to experience
	a delay.
		- If the segment base isn't aligned to a cache line
		  boundary, the max throughput of memory operations is
		  reduced to one [e]very 9 cycles.
	[...]
	Assembly/Compiler Coding Rule 15. (H impact, ML generality)
	For Intel Atom processors, use segments with base set to 0
	whenever possible; avoid non-zero segment base address that is
	not aligned to cache line boundary at all cost.

We can't avoid having a non-zero base for the stack-protector
segment, but we can make it cache-aligned.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
LKML-Reference: <4AA01893.6000507@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-03 21:30:51 +02:00
Joerg Roedel
2b681fafcc Merge branch 'amd-iommu/pagetable' into amd-iommu/2.6.32
Conflicts:
	arch/x86/kernel/amd_iommu.c
2009-09-03 17:14:57 +02:00
Joerg Roedel
03362a05c5 Merge branch 'amd-iommu/passthrough' into amd-iommu/2.6.32
Conflicts:
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2009-09-03 16:34:23 +02:00
Joerg Roedel
85da07c409 Merge branches 'gart/fixes', 'amd-iommu/fixes+cleanups' and 'amd-iommu/fault-handling' into amd-iommu/2.6.32 2009-09-03 16:32:00 +02:00
Joerg Roedel
0feae533dd x86/amd-iommu: Add passthrough mode initialization functions
When iommu=pt is passed on kernel command line the devices
should run untranslated. This requires the allocation of a
special domain for that purpose. This patch implements the
allocation and initialization path for iommu=pt.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:15:42 +02:00
Joerg Roedel
abdc5eb3d6 x86/amd-iommu: Change iommu_map_page to support multiple page sizes
This patch adds a map_size parameter to the iommu_map_page
function which makes it generic enough to handle multiple
page sizes. This also requires a change to alloc_pte which
is also done in this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:11:17 +02:00
Joerg Roedel
a6b256b413 x86/amd-iommu: Support higher level PTEs in iommu_page_unmap
This patch changes fetch_pte and iommu_page_unmap to support
different page sizes too.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:11:08 +02:00
Joerg Roedel
674d798a80 x86/amd-iommu: Remove old page table handling macros
These macros are not longer required. So remove them.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:49 +02:00
Joerg Roedel
50020fb632 x86/amd-iommu: Introduce increase_address_space function
This function will be used to increase the address space
size of a protection domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:46 +02:00
Joerg Roedel
04bfdd8406 x86/amd-iommu: Flush domains if address space size was increased
Thist patch introduces the update_domain function which
propagates the larger address space of a protection domain
to the device table and flushes all relevant DTEs and the
domain TLB.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:45 +02:00
Joerg Roedel
9355a08186 x86/amd-iommu: Make fetch_pte aware of dynamic mapping levels
This patch changes the fetch_pte function in the AMD IOMMU
driver to support dynamic mapping levels.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 16:03:34 +02:00
Joerg Roedel
b26e81b871 x86/amd-iommu: Panic if IOMMU command buffer reset fails
To prevent the driver from doing recursive command buffer
resets, just panic when that recursion happens.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:35 +02:00
Joerg Roedel
93f1cc67cf x86/amd-iommu: Add reset function for command buffers
This patch factors parts of the command buffer
initialization code into a seperate function which can be
used to reset the command buffer later.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:55:34 +02:00
Joerg Roedel
4c6f40d4e0 x86/amd-iommu: replace "AMD IOMMU" by "AMD-Vi"
This patch replaces the "AMD IOMMU" printk strings with the
official name for the hardware: "AMD-Vi".

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-09-03 15:49:56 +02:00
Ingo Molnar
f76bd108e5 Merge branch 'perfcounters/urgent' into perfcounters/core
Merge reason: We are going to modify a place modified by
              perfcounters/urgent.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 21:42:59 +02:00
Ingo Molnar
936e894a97 Merge commit 'v2.6.31-rc8' into x86/txt
Conflicts:
	arch/x86/kernel/reboot.c
	security/Kconfig

Merge reason: resolve the conflicts, bump up from rc3 to rc8.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 08:17:56 +02:00
Huang Ying
ae4b688db2 x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
This function measures whether the FPU/SSE state can be touched in
interrupt context. If the interrupted code is in user space or has no
valid FPU/SSE context (CR0.TS == 1), FPU/SSE state can be used in IRQ
or soft_irq context too.

This is used by AES-NI accelerated AES implementation and PCLMULQDQ
accelerated GHASH implementation.

v3:
 - Renamed to irq_fpu_usable to reflect the purpose of the function.

v2:
 - Renamed to irq_is_fpu_using to reflect the real situation.

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-01 21:39:15 -07:00
Shane Wang
69575d3886 x86, intel_txt: clean up the impact on generic code, unbreak non-x86
Move tboot.h from asm to linux to fix the build errors of intel_txt
patch on non-X86 platforms. Remove the tboot code from generic code
init/main.c and kernel/cpu.c.

Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-01 18:25:07 -07:00
Ingo Molnar
c931aaf0e1 Merge branch 'x86/paravirt' into x86/cpu
Conflicts:
	arch/x86/include/asm/paravirt.h

Manual merge:
	arch/x86/include/asm/paravirt_types.h

Merge reason: x86/paravirt conflicts non-trivially with x86/cpu,
              resolve it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-01 12:13:30 +02:00
H. Peter Anvin
ff55df53df x86, msr: Export the register-setting MSR functions via /dev/*/msr
Make it possible to access the all-register-setting/getting MSR
functions via the MSR driver.  This is implemented as an ioctl() on
the standard MSR device node.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@gmail.com>
2009-08-31 16:16:04 -07:00
H. Peter Anvin
8b956bf1f0 x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
Create _on_cpu helpers for {rw,wr}msr_safe_regs() analogously with the
other MSR functions.  This will be necessary to add support for these
to the MSR driver.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <petkovbb@gmail.com>
2009-08-31 16:15:57 -07:00
H. Peter Anvin
0cc0213e73 x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
For some reason, the _safe MSR functions returned -EFAULT, not -EIO.
However, the only user which cares about the return code as anything
other than a boolean is the MSR driver, which wants -EIO.  Change it
to -EIO across the board.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-08-31 15:15:23 -07:00
H. Peter Anvin
709972b1f6 x86, asm: Make _ASM_EXTABLE() usable from assembly code
We have had this convenient macro _ASM_EXTABLE() to generate exception
table entry in inline assembly.  Make it also usable for pure
assembly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:30 -07:00
H. Peter Anvin
fe9b4e4e40 x86, asm: Add 32-bit versions of the combined CFI macros
Add 32-bit versions of the combined CFI macros, equivalent to the
64-bit ones except, obviously, operating on 32-bit stack words.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:29 -07:00
Borislav Petkov
177fed1ee8 x86, msr: Rewrite AMD rd/wrmsr variants
Switch them to native_{rd,wr}msr_safe_regs and remove
pv_cpu_ops.read_msr_amd.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-2-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:28 -07:00
Borislav Petkov
132ec92f3f x86, msr: Add rd/wrmsr interfaces with preset registers
native_{rdmsr,wrmsr}_safe_regs are two new interfaces which allow
presetting of a subset of eight x86 GPRs before executing the rd/wrmsr
instructions. This is needed at least on AMD K8 for accessing an erratum
workaround MSR.

Originally based on an idea by H. Peter Anvin.

Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <1251705011-18636-1-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-31 15:14:26 -07:00
Thomas Gleixner
e11dadabf4 x86: apic namespace cleanup
boot_cpu_physical_apicid is a global variable and used as function
argument as well. Rename the function arguments to avoid confusion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 21:30:47 +02:00
Thomas Gleixner
bc07844a33 x86: Distangle ioapic and i8259
The proposed Moorestown support patches use an extra feature flag
mechanism to make the ioapic work w/o an i8259. There is a much
simpler solution.

Most i8259 specific functions are already called dependend on the irq
number less than NR_IRQS_LEGACY. Replacing that constant by a
read_mostly variable which can be set to 0 by the platform setup code
allows us to achieve the same without any special feature flags.

That trivial change allows us to proceed with MRST w/o doing a full
blown overhaul of the ioapic code which would delay MRST unduly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 19:23:09 +02:00
Thomas Gleixner
3f4110a48a x86: Add Moorestown early detection
Moorestown MID devices need to be detected early in the boot process
to setup and do not call x86_default_early_setup as there is no EBDA
region to reserve.

[ Copied the minimal code from Jacobs latest MRST series ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
2009-08-31 11:09:40 +02:00
Pan, Jacob jun
162bc7ab01 x86: Add hardware_subarch ID for Moorestown
x86 bootprotocol 2.07 has introduced hardware_subarch ID in the boot
parameters provided by FW. We use it to identify Moorestown platforms.

[ tglx: Cleanup and paravirt fix ]

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 11:09:40 +02:00
Thomas Gleixner
47a3d5da70 x86: Add early platform detection
Platforms like Moorestown require early setup and want to avoid the
call to reserve_ebda_region. The x86_init override is too late when
the MRST detection happens in setup_arch. Move the default i386
x86_init overrides and the call to reserve_ebda_region into a separate
function which is called as the default of a switch case depending on
the hardware_subarch id in boot params. This allows us to add a case
for MRST and let MRST have its own early setup function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 11:09:40 +02:00
Thomas Gleixner
2d826404f0 x86: Move tsc_calibration to x86_init_ops
TSC calibration is modified by the vmware hypervisor and paravirt by
separate means. Moorestown wants to add its own calibration routine as
well. So make calibrate_tsc a proper x86_init_ops function and
override it by paravirt or by the early setup of the vmware
hypervisor.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:47 +02:00
Thomas Gleixner
08047c4f17 x86: Move calibrate_cpu to tsc.c
Move the code where it's only user is. Also we need to look whether
this hardwired hackery might interfere with perfcounters.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
64fcbac1f3 x86: Simplify timer_ack magic in time_32.c
Let the compiler optimize the timer_ack magic away in the 32bit timer
interrupt and put the same code into time_64.c. It's optimized out for
CONFIG_X86_IO_APIC on 32bit and for 64bit because timer_ack is const 0
in both cases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
ecce85089e x86: Remove do_timer hook
This is a left over of the old x86 sub arch support. Remove it and
open code it like we do in time_64.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
845b3944bb x86: Add timer_init to x86_init_ops
The timer init code is convoluted with several quirks and the paravirt
timer chooser. Figuring out which code path is actually taken is not
for the faint hearted.

Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and
replace the paravirt time chooser and the remaining x86 quirk with a
simple x86_init_ops function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
736decac64 x86: Move percpu clockevents setup to x86_init_ops
paravirt overrides the setup of the default apic timers as per cpu
timers. Moorestown needs to override that as well.

Move it to x86_init_ops setup and create a separate x86_cpuinit struct
which holds the function for the secondary evtl. hotplugabble CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
f1d7062a23 x86: Move xen_post_allocator_init into xen_pagetable_setup_done
We really do not need two paravirt/x86_init_ops functions which are
called in two consecutive source lines. Move the only user of
post_allocator_init into the already existing pagetable_setup_done
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
030cb6c00d x86: Move paravirt pagetable_setup to x86_init_ops
Replace more paravirt hackery by proper x86_init_ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
6f30c1ac3f x86: Move paravirt banner printout to x86_init_ops
Replace another obscure paravirt magic and move it to
x86_init_ops. Such a hook is also useful for embedded and special
hardware.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
42bbdb43b1 x86: Replace ARCH_SETUP by a proper x86_init_ops
ARCH_SETUP is a horrible leftover from the old arch/i386 mach support
code. It still has a lonely user in xen. Move it to x86_init_ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
428cf9025b x86: Move traps_init to x86_init_ops
Replace the quirks by a simple x86_init_ops function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
66bcaf0bde x86: Move irq_init to x86_init_ops
irq_init is overridden by x86_quirks and by paravirts. Unify the whole
mess and make it an unconditional x86_init_ops function which defaults
to the standard function and can be overridden by the early platform
code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
d9112f4302 x86: Move pre_intr_init to x86_init_ops
Replace the quirk machinery by a x86_init_ops function which
defaults to the standard implementation. This is also a preparatory
patch for Moorestown support which needs to replace the default
init_ISA_irqs as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
b3f1b617f4 x86: Move get/find_smp_config to x86_init_ops
Replace the quirk machinery by a x86_init_ops function which defaults
to the standard implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Ingo Molnar
eebc57f73d Merge branch 'for-ingo' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 into x86/apic
Merge reason: the SFI (Simple Firmware Interface) feature in the ACPI
              tree needs this cleanup, pull it into the APIC branch as
	      well so that there's no interactions.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29 09:31:47 +02:00
Feng Tang
e55a5999ff ACPI: Handle CONFIG_ACPI=n better from linux/acpi.h
linux/acpi.h is the top level header for interfacing
with the ACPI sub-system, so acpi_disabled should be
up there instead of down in asm/acpi.h -- particularly
since asm/acpi.h doesn't exist for all architectures.

Same story for acpi_table_parse(), which is a top-level
API to Linux/ACPI.

This is necessary for building some code that
used to always depend on CONFIG_ACPI=y, but will soon
also need to build with CONFIG_ACPI=n.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-08-28 19:57:29 -04:00
Feng Tang
2a4ab640d3 ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n
Some IO-APIC routines are ACPI specific now, but need to
be exposed when CONFIG_ACPI=n for the benefit of SFI.

Remove #ifdef ACPI around these routines:

io_apic_get_unique_id(int ioapic, int apic_id);
io_apic_get_version(int ioapic);
io_apic_get_redir_entries(int ioapic);

Move these routines from ACPI-specific boot.c to io_apic.c:

uniq_ioapic_id(u8 id)
mp_find_ioapic()
mp_find_ioapic_pin()
mp_register_ioapic()

Also, since uniq_ioapic_id() is now no longer static,
re-name it to io_apic_unique_id() for consistency
with the other public io_apic routines.

For simplicity, do not #ifdef the resulting code ACPI || SFI,
thought that could be done in the future if it is important
to optimize the !ACPI !SFI IO-APIC x86 kernel for size.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2009-08-28 19:57:27 -04:00
Suresh Siddha
c8bc6f3c80 x86: arch specific support for remapping HPET MSIs
x86 arch support for remapping HPET MSI's by associating the HPET timer block
with the interrupt-remapping HW unit and setting up appropriate irq_chip

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jay Fenlason <fenlason@redhat.com>
LKML-Reference: <20090804190729.630510000@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 23:33:20 +02:00
Thomas Gleixner
90e1c6969d x86: Move oem_bus_info to x86_init_ops
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:53 +02:00
Thomas Gleixner
52fdb56846 x86: Move mpc_oem_pci_bus to x86_init_ops
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
72302142e1 x86: Move smp_read_mpc_oem to x86_init_ops.
Move smp_read_mpc_oem from quirks to x86_init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
fd6c666149 x86: Move mpc_apic_id to x86_init_ops
The mpc_apic_id setup is handled by a x86_quirk. Make it a
x86_init_ops function with a default implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
de93410310 x86: Move ioapic_ids_setup to x86_init_ops
32bit and also the numaq code have special requirements on the
ioapic_id setup. Convert it to a x86_init_ops function and get rid
of the quirks and #ifdefs

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
f4848472cd x86: Sanitize smp_record and move it to x86_init_ops
The x86 quirkification introduced an extra ugly hackery with a
variable pointer in the mpparse code. If the pointer is initialized
then it is dereferenced and the variable set to 0 or incremented.

Create a x86_init_ops function and let the affected numaq code
hold the function. Default init is a setup noop.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
6b18ae3e2f x86: Move memory_setup to x86_init_ops
memory_setup is overridden by x86_quirks and by paravirts with weak
functions and quirks. Unify the whole mess and make it an
unconditional x86_init_ops function which defaults to the standard
function and can be overridden by the early platform code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
816c25e7d4 x86: Add reserve_ebda_region to x86_init_ops
reserve_ebda_region needs to be called befor start_kernel. Moorestown
needs to override it. Make it a x86_init_ops function and initialize
it with the default reserve_ebda_region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
8fee697d99 x86: Add request_standard_resources to x86_init
The 32bit and the 64bit code are slighty different in the reservation
of standard resources. Also the upcoming Moorestown support needs its
own version of that.

Add it to x86_init_ops and initialize it with the 64bit default. 32bit
overrides it in early boot. Now moorestown can add it's own override
w/o sprinkling the code with more #ifdefs

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
f7cf5a5b8c x86: Add probe_roms to x86_init
probe_roms is only used on 32bit. Add it to the x86_init ops and
remove the #ifdefs.

Default initializer is x86_init_noop() which is overridden in
the 32bit boot code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
57844a8f8e x86: Add x86_init infrastructure
The upcoming Moorestown support brings the embedded world to x86. The
setup code of x86 has already a couple of hooks which are either
x86_quirks or paravirt ops. Some of those setup hooks are pretty
convoluted like the timer setup and the tsc calibration code. But
there are other places which could do with a cleanup.

Instead of having inline functions/macros which are modified at
compile time I decided to introduce x86_init ops which are
unconditional in the code and make it clear that they can be changed
either during compile time or in the early boot process. The function
pointers are initialized by default functions which can be noops so
that the pointer can be called unconditionally in the most cases. This
also allows us to remove 32bit/64bit, paravirt and other #ifdeffery.

paravirt guests are just a hardware platform in the setup code, so we
should treat them as such and not hide all behind multiple layers of
indirection and compile time dependencies.

It's more obvious that x86_init.timers.timer_init() is a function
pointer than the late_time_init = choose_time_init() obscurity. It's
also way simpler to grep for x86_init.timers.timer_init and find all
the places which modify that function pointer instead of analyzing
weak functions, macros and paravirt indirections.

Note. This is not a general paravirt_ops replacement. It just will
move setup related hooks which are potentially useful for other
platform setup purposes as well out of the paravirt domain.

Add the base infrastructure without any functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
Thomas Gleixner
4152f93508 Merge branch 'sched/clock' into x86/cleanups
Reason: The tsc init cleanup depends on sched_clock_init moving past
late_time_init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:06:24 +02:00
Thomas Gleixner
6effcd9245 Merge branch 'x86/paravirt' into x86/cleanups
Reason: The setup cleanups conflict with the paravirt cleanups. Avoid
a rather large merge conflict

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 16:42:32 +02:00
H. Peter Anvin
b855192c08 Merge branch 'x86/urgent' into x86/pat
Reason: Change to is_new_memtype_allowed() in x86/urgent

Resolved semantic conflicts in:

	 arch/x86/mm/pat.c
	 arch/x86/mm/ioremap.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 17:24:28 -07:00
Venkatesh Pallipadi
f584174096 x86, pat: Use page flags to track memtypes of RAM pages
Change reserve_ram_pages_type and free_ram_pages_type to use 2 page
flags to track UC_MINUS, WC, WB and default types. Previous RAM tracking
just tracked WB or NonWB, which was not complete and did not allow
tracking of RAM fully and there was no way to get the actual type
reserved by looking at the page flags.

We use the memtype_lock spinlock for atomicity in dealing with
memtype tracking in struct page.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:24 -07:00
Venkatesh Pallipadi
9e36fda0b3 x86, pat: Add PAT reserve free to io_mapping* APIs
io_mapping_* interfaces were added, mainly for graphics drivers.
Make this interface go through the PAT reserve/free, instead of
hardcoding WC mapping. This makes sure that there are no
aliases due to unconditional WC setting.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:16 -07:00
Venkatesh Pallipadi
9fd126bc74 x86, pat: New i/f for driver to request memtype for IO regions
Add new routines to request memtype for IO regions. This will currently
be a backend for io_mapping_* routines. But, it can also be made available
to drivers directly in future, in case it is needed.

reserve interface reserves the memory, makes sure we have a compatible
memory type available and keeps the identity map in sync when needed.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:10 -07:00
Masami Hiramatsu
b1cf540f0e x86: Add pt_regs register and stack access APIs
Add following APIs for accessing registers and stack entries from
pt_regs.
These APIs are required by kprobes-based event tracer on ftrace.
Some other debugging tools might be able to use it too.

- regs_query_register_offset(const char *name)
   Query the offset of "name" register.

- regs_query_register_name(unsigned int offset)
   Query the name of register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned int offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check the address is in the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth)
   Get Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned int nth)
   Get Nth argument at function call. (N >= 0)

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Przemysław Pawełczyk <przemyslaw@pawelczyk.it>
Cc: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
LKML-Reference: <20090813203444.31965.26374.stgit@localhost.localdomain>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-27 00:35:57 +02:00
Masami Hiramatsu
eb13296cfa x86: Instruction decoder API
Add x86 instruction decoder to arch-specific libraries. This decoder
can decode x86 instructions used in kernel into prefix, opcode, modrm,
sib, displacement and immediates. This can also show the length of
instructions.

This version introduces instruction attributes for decoding
instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk).

Currently, the opcode maps are based on opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developers Manual Vol.2: Appendix.A,
and consist of below two types of opcode tables.

1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are
written as below;

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcodes, which has 8 elements, are written as below;

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Przemysław Pawełczyk <przemyslaw@pawelczyk.it>
Cc: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
LKML-Reference: <20090813203413.31965.49709.stgit@localhost.localdomain>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-27 00:35:56 +02:00
Jason Baron
117226d158 tracing: Remove FTRACE_SYSCALL_MAX definitions
Remove the FTRACE_SYSCALL_MAX definitions now that we have converted the
syscall event tracing code to use NR_syscalls.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <f2240cdc8f0b1ca7617390c8f5ec90ba2bd348cf.1251146513.git.jbaron@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:30:39 +02:00
Jason Baron
a5a2f8e2ac tracing: Define NR_syscalls for x86_64
Express the available number of syscalls in a standard way by defining
NR_syscalls.

The common way to define it is to place its definition in asm/unistd.h
However, the number of syscalls is defined using __NR_syscall_max in
x86-64 after building a dynamic header file "asm-offsets.h"

The source file that generates this header, asm-offsets-64.c includes
unistd.h, then if we want to express NR_syscalls from __NR_syscall_max
in unistd.h only after generating the dynamic header file, we need a
watchguard.

If unistd.h is included from asm-offsets-64.c, then we are generating
asm-offset.h which defines __NR_syscall_max. At this time, we don't
want to (we can't) define NR_syscalls, then we do nothing.
Otherwise we define NR_syscalls because we know asm-offsets.h has
been generated.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20090826160910.GB2658@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:29:58 +02:00
Jason Baron
dd86dda24c tracing: Define NR_syscalls for x86 (32)
Add a NR_syscalls #define for x86. This is used in the syscall events
tracing code. Todo: make it dynamic like x86_64.

NR_syscalls is the usual name used to determine the number of syscalls
supported by the current arch. We want to unify the use of this number
across archs that support the syscall tracing. This also prepare to move
some of the arch code to core code in the syscall tracing area.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anwin <hpa@zytor.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <0f33c0f96d198fccc3ddd9ff7f5334ff5cb42706.1251146513.git.jbaron@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 21:29:55 +02:00
Cyrill Gorcunov
8f3e1df48b x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant
We already have APIC_DEFAULT_PHYS_BASE so just to be
consistent.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090824175550.927946757@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26 08:16:38 +02:00
H. Peter Anvin
ab94fcf528 x86: allow "=rm" in native_save_fl()
This is a partial revert of f1f029c7bf.

"=rm" is allowed in this context, because "pop" is explicitly defined
to adjust the stack pointer *before* it evaluates its effective
address, if it has one.  Thus, we do end up writing to the correct
address even if we use an on-stack memory argument.

The original reporter for f1f029c7bf was
apparently using a broken x86 simulator.

[ Impact: performance ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Gabe Black <spamforgabe@umich.edu>
2009-08-25 16:47:16 -07:00
H. Peter Anvin
e8a2eb47e6 Merge commit 'origin/x86/urgent' into x86/asm 2009-08-25 15:40:29 -07:00
Josh Stone
6670000119 tracing: Rename FTRACE_SYSCALLS for tracepoints
s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g
s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g

The syscall enter/exit tracing is no longer specific to just ftrace, so
they now have names that reflect their tie to tracepoints instead.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26 00:17:35 +02:00
Tobias Doerffel
366d19e181 x86: add specific support for Intel Atom architecture
Add another option when selecting CPU family so the kernel can be
optimized for Intel Atom CPUs. If GCC supports tuning options for
Intel Atom they will be used.

Signed-off-by: Tobias Doerffel <tobias.doerffel@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1251018457-19157-1-git-send-email-tobias.doerffel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23 11:20:02 +02:00
H. Peter Anvin
5400743db5 x86, mtrr: make mtrr_aps_delayed_init static bool
mtr_aps_delayed_init was declared u32 and made global, but it only
ever takes boolean values and is only ever used in
arch/x86/kernel/cpu/mtrr/main.c.  Declare it "static bool" and remove
external references.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2009-08-21 17:00:02 -07:00
Suresh Siddha
d0af9eed5a x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
the need for synchronizing the logical cpu's while initializing/updating
MTRR.

Currently Linux kernel does the synchronization of all cpu's only when
a single MTRR register is programmed/updated. During an AP online
(during boot/cpu-online/resume)  where we initialize all the MTRR/PAT registers,
we don't follow this synchronization algorithm.

This can lead to scenarios where during a dynamic cpu online, that logical cpu
is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical
HT sibling continue to run (also with cache disabled because of cr0.cd=1
on its sibling).

Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
(because of some VMX performance optimizations) and the above scenario
(with one logical cpu doing VMX activity and another logical cpu coming online)
can result in system crash.

Fix the MTRR initialization by doing rendezvous of all the cpus. During
boot and resume, we delay the MTRR/PAT init for APs till all the
logical cpu's come online and the rendezvous process at the end of AP's bringup,
will initialize the MTRR/PAT for all AP's.

For dynamic single cpu online, we synchronize all the logical cpus and
do the MTRR/PAT init on the AP that is coming online.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-21 16:25:55 -07:00
Jan Beulich
8b5a10fc6f x86: properly annotate alternatives.c
Some of the NOPs tables aren't used on 64-bits, quite some code and
data is needed post-init for module loading only, and a couple of
functions aren't used outside that file (i.e. can be static, and don't
need to be exported).

The change to __INITDATA/__INITRODATA is needed to avoid an assembler
warning.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4A8BC8A00200007800010823@vpn.id2.novell.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-21 15:30:12 -07:00
Jan Beulich
5946fa3d5c x86, hpet: Simplify the HPET code
On 64-bits, using unsigned long when unsigned int suffices
needlessly creates larger code (due to the need for REX
prefixes), and most of the logic in hpet.c really doesn't need
64-bit operations.

At once this avoids the need for a couple of type casts.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <4A8BC9780200007800010832@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 21:55:25 +02:00
john stultz
da15cfdae0 time: Introduce CLOCK_REALTIME_COARSE
After talking with some application writers who want very fast, but not
fine-grained timestamps, I decided to try to implement new clock_ids
to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
which returns the time at the last tick. This is very fast as we don't
have to access any hardware (which can be very painful if you're using
something like the acpi_pm clocksource), and we can even use the vdso
clock_gettime() method to avoid the syscall. The only trade off is you
only get low-res tick grained time resolution.

This isn't a new idea, I know Ingo has a patch in the -rt tree that made
the vsyscall gettimeofday() return coarse grained time when the
vsyscall64 sysctrl was set to 2. However this affects all applications
on a system.

With this method, applications can choose the proper speed/granularity
trade-off for themselves.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: nikolag@ca.ibm.com
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: arjan@infradead.org
Cc: jonathan@jonmasters.org
LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-21 21:43:46 +02:00
Rafael J. Wysocki
39cf0518d8 Merge branch 'master' into for-linus 2009-08-20 20:24:33 +02:00
Suresh Siddha
1adcaafe74 x86, pat: Allow ISA memory range uncacheable mapping requests
Max Vozeler reported:
>  Bug 13877 -  bogl-term broken with CONFIG_X86_PAT=y, works with =n
>
>  strace of bogl-term:
>  814   mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0)
>				 = -1 EAGAIN (Resource temporarily unavailable)
>  814   write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n",
>	       57) = 57

PAT code maps the ISA memory range as WB in the PAT attribute, so that
fixed range MTRR registers define the actual memory type (UC/WC/WT etc).

But the upper level is_new_memtype_allowed() API checks are failing,
as the request here is for UC and the return tracked type is WB (Tracked type is
WB as MTRR type for this legacy range potentially will be different for each
4k page).

Fix is_new_memtype_allowed() by always succeeding the ISA address range
checks, as the null PAT (WB) and def MTRR fixed range register settings
satisfy the memory type needs of the applications that map the ISA address
range.

Reported-and-Tested-by: Max Vozeler <xam@debian.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-17 14:12:44 -07:00
Cliff Wickman
3ef12c3c97 x86: Fix UV BAU destination subnode id
The SGI UV Broadcast Assist Unit is used to send TLB shootdown
messages to remote nodes of the system.  The header of the
message must contain the subnode id of the block in the
receiving hub that handles such messages.  It should always be
0x10, the id of the "LB" block.

It had previously been documented as a "must be zero" field.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <E1Mc1x7-0005Ce-6t@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15 11:58:02 +02:00
Tejun Heo
384be2b18a Merge branch 'percpu-for-linus' into percpu-for-next
Conflicts:
	arch/sparc/kernel/smp_64.c
	arch/x86/kernel/cpu/perf_counter.c
	arch/x86/kernel/setup_percpu.c
	drivers/cpufreq/cpufreq_ondemand.c
	mm/percpu.c

Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids.  As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-14 14:45:31 +09:00
Jason Baron
9daa77e2e9 tracing: Update FTRACE_SYSCALL_MAX
update FTRACE_SYSCALL_MAX to the current number of syscalls

FTRACE_SYSCALL_MAX is a temporary solution to get the number of
syscalls supported by the arch until we find a more dynamic way
to get this number.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-11 20:35:27 +02:00
Huang Ying
0dcc66851f x86, mce: Support specifying raise mode for software MCE injection
Raise mode include raising as exception or raising as poll, it is
specified via the mce.inject_flags field.

This can be used to specify raise mode of UCNA, which is UC error but
raised not as exception. And this can be used to test the filter code
of poll handler or exception handler too. For example, enforce a poll
raise mode for a fatal MCE.

ChangeLog:

v2:

- Re-base on latest x86-tip.git/mce3

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-10 13:58:41 -07:00
Huang Ying
5b7e88edc6 x86, mce: Support specifying context for software mce injection
The cpu context is specified via the new mce.inject_flags fields.
This allows more realistic machine check testing in different
situations. "RANDOM" context is implemented via NMI broadcasting to
add randomization to testing.

AK: Fix NMI broadcasting check. Fix 32-bit building. Some race
fixes. Move to module. Various changes

ChangeLog:

v3:

- Re-based on latest x86-tip.git/mce4

- Fix 32-bit building

v2:

- Re-base on latest x86-tip.git/mce3

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-10 13:58:27 -07:00
Markus Metzger
30dd568c91 x86, perf_counter, bts: Add BTS support to perfcounters
Implement a performance counter with:

    attr.type           = PERF_TYPE_HARDWARE
    attr.config         = PERF_COUNT_HW_BRANCH_INSTRUCTIONS
    attr.sample_period  = 1

Using branch trace store (BTS) on x86 hardware, if available.

The from and to address for each branch can be sampled using:

    PERF_SAMPLE_IP      for the from address
    PERF_SAMPLE_ADDR    for the to address

[ v2: address review feedback, fix bugs ]

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 13:04:14 +02:00
Akinobu Mita
1e5de18278 x86: Introduce GDT_ENTRY_INIT()
GDT_ENTRY_INIT is static initializer of desc_struct.

We already have similar macro GDT_ENTRY() but it's static
initializer for u64 and it cannot be used for desc_struct.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090718151219.GD11294@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-08 17:44:11 +02:00
Rafael J. Wysocki
c00aafcd49 Merge branch 'master' into for-linus 2009-08-05 23:56:54 +02:00
Gleb Natapov
ce69a78450 x86/apic: Enable x2APIC without interrupt remapping under KVM
KVM would like to provide x2APIC interface to a guest without emulating
interrupt remapping device. The reason KVM prefers guest to use x2APIC
is that x2APIC interface is better virtualizable and provides better
performance than mmio xAPIC interface:

 - msr exits are faster than mmio (no page table walk, emulation)
 - no need to read back ICR to look at the busy bit
 - one 64 bit ICR write instead of two 32 bit writes
 - shared code with the Hyper-V paravirt interface

Included patch changes x2APIC enabling logic to enable it even if IR
initialization failed, but kernel runs under KVM and no apic id is
greater than 255 (if there is one spec requires BIOS to move to x2apic
mode before starting an OS).

-v2: fix build
-v3: fix bug causing compiler warning

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Sheng Yang <sheng@linux.intel.com>
Cc: "avi@redhat.com" <avi@redhat.com>
LKML-Reference: <20090720122417.GR5638@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-05 14:28:50 +02:00
Dave Airlie
b7f3158428 Merge git://git.infradead.org/~dwmw2/iommu-agp into agp-next 2009-08-05 10:16:57 +10:00
Linus Torvalds
067e18133f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Work around compilation warning in arch/x86/kernel/apm_32.c
  x86, UV: Complete IRQ interrupt migration in arch_enable_uv_irq()
  x86, 32-bit: Fix double accounting in reserve_top_address()
  x86: Don't use current_cpu_data in x2apic phys_pkg_id
  x86, UV: Fix UV apic mode
  x86, UV: Fix macros for accessing large node numbers
  x86, UV: Delete mapping of MMR rangs mapped by BIOS
  x86, UV: Handle missing blade-local memory correctly
  x86: fix assembly constraints in native_save_fl()
  x86, msr: execute on the correct CPU subset
  x86: Fix assert syntax in vmlinux.lds.S
  x86: Make 64-bit efi_ioremap use ioremap on MMIO regions
  x86: Add quirk to make Apple MacBook5,2 use reboot=pci
  x86: Fix CPA memtype reserving in the set_pages_array*() cases
  x86, pat: Fix set_memory_wc related corruption
  x86: fix section mismatch for i386 init code
2009-08-04 15:28:59 -07:00
Jack Steiner
67e83f309e x86, UV: Fix macros for accessing large node numbers
The UV chipset automatically supplies the upper bits on nodes
being referenced by MMR accesses. These bit can be deleted from
the hub addressing macros.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090727143808.GA8076@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-04 16:19:14 +02:00
Jack Steiner
6c7184b774 x86, UV: Handle missing blade-local memory correctly
UV blades may not have any blade-local memory. Add a field
(nid) to the UV blade structure to indicates whether the node
has local memory. This is needed by the GRU driver (pushed
separately).

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
LKML-Reference: <20090727143507.GA7006@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-04 16:18:01 +02:00
H. Peter Anvin
f1f029c7bf x86: fix assembly constraints in native_save_fl()
From Gabe Black in bugzilla 13888:

native_save_fl is implemented as follows:

  11static inline unsigned long native_save_fl(void)
  12{
  13        unsigned long flags;
  14
  15        asm volatile("# __raw_save_flags\n\t"
  16                     "pushf ; pop %0"
  17                     : "=g" (flags)
  18                     : /* no input */
  19                     : "memory");
  20
  21        return flags;
  22}

If gcc chooses to put flags on the stack, for instance because this is
inlined into a larger function with more register pressure, the offset
of the flags variable from the stack pointer will change when the
pushf is performed. gcc doesn't attempt to understand that fact, and
address used for pop will still be the same. It will write to
somewhere near flags on the stack but not actually into it and
overwrite some other value.

I saw this happen in the ide_device_add_all function when running in a
simulator I work on. I'm assuming that some quirk of how the simulated
hardware is set up caused the code path this is on to be executed when
it normally wouldn't.

A simple fix might be to change "=g" to "=r".

Reported-by: Gabe Black <spamforgabe@umich.edu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Stable Team <stable@kernel.org>
2009-08-03 16:36:17 -07:00
Paul Mackerras
6a7bbd57ed x86: Make 64-bit efi_ioremap use ioremap on MMIO regions
Booting current 64-bit x86 kernels on the latest Apple MacBook
(MacBook5,2) via EFI gives the following warning:

[    0.182209] ------------[ cut here ]------------
[    0.182222] WARNING: at arch/x86/mm/pageattr.c:581 __cpa_process_fault+0x44/0xa0()
[    0.182227] Hardware name: MacBook5,2
[    0.182231] CPA: called for zero pte. vaddr = ffff8800ffe00000 cpa->vaddr = ffff8800ffe00000
[    0.182236] Modules linked in:
[    0.182242] Pid: 0, comm: swapper Not tainted 2.6.31-rc4 #6
[    0.182246] Call Trace:
[    0.182254]  [<ffffffff8102c754>] ? __cpa_process_fault+0x44/0xa0
[    0.182261]  [<ffffffff81048668>] warn_slowpath_common+0x78/0xd0
[    0.182266]  [<ffffffff81048744>] warn_slowpath_fmt+0x64/0x70
[    0.182272]  [<ffffffff8102c7ec>] ? update_page_count+0x3c/0x50
[    0.182280]  [<ffffffff818d25c5>] ? phys_pmd_init+0x140/0x22e
[    0.182286]  [<ffffffff8102c754>] __cpa_process_fault+0x44/0xa0
[    0.182292]  [<ffffffff8102ce60>] __change_page_attr_set_clr+0x5f0/0xb40
[    0.182301]  [<ffffffff810d1035>] ? vm_unmap_aliases+0x175/0x190
[    0.182307]  [<ffffffff8102d4ae>] change_page_attr_set_clr+0xfe/0x3d0
[    0.182314]  [<ffffffff8102dcca>] _set_memory_uc+0x2a/0x30
[    0.182319]  [<ffffffff8102dd4b>] set_memory_uc+0x7b/0xb0
[    0.182327]  [<ffffffff818afe31>] efi_enter_virtual_mode+0x2ad/0x2c9
[    0.182334]  [<ffffffff818a1c66>] start_kernel+0x2db/0x3f4
[    0.182340]  [<ffffffff818a1289>] x86_64_start_reservations+0x99/0xb9
[    0.182345]  [<ffffffff818a1389>] x86_64_start_kernel+0xe0/0xf2
[    0.182357] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.182982] init_memory_mapping: 00000000ffffc000-0000000100000000
[    0.182993]  00ffffc000 - 0100000000 page 4k

This happens because the 64-bit version of efi_ioremap calls
init_memory_mapping for all addresses, regardless of whether they are
RAM or MMIO.  The EFI tables on this machine ask for runtime access to
some MMIO regions:

[    0.000000] EFI: mem195: type=11, attr=0x8000000000000000, range=[0x0000000093400000-0x0000000093401000) (0MB)
[    0.000000] EFI: mem196: type=11, attr=0x8000000000000000, range=[0x00000000ffc00000-0x00000000ffc40000) (0MB)
[    0.000000] EFI: mem197: type=11, attr=0x8000000000000000, range=[0x00000000ffc40000-0x00000000ffc80000) (0MB)
[    0.000000] EFI: mem198: type=11, attr=0x8000000000000000, range=[0x00000000ffc80000-0x00000000ffca4000) (0MB)
[    0.000000] EFI: mem199: type=11, attr=0x8000000000000000, range=[0x00000000ffca4000-0x00000000ffcb4000) (0MB)
[    0.000000] EFI: mem200: type=11, attr=0x8000000000000000, range=[0x00000000ffcb4000-0x00000000ffffc000) (3MB)
[    0.000000] EFI: mem201: type=11, attr=0x8000000000000000, range=[0x00000000ffffc000-0x0000000100000000) (0MB)

This arranges to pass the EFI memory type through to efi_ioremap, and
makes efi_ioremap use ioremap rather than init_memory_mapping if the
type is EFI_MEMORY_MAPPED_IO.  With this, the above warning goes away.

Signed-off-by: Paul Mackerras <paulus@samba.org>
LKML-Reference: <19062.55858.533494.471153@cargo.ozlabs.ibm.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-03 13:34:25 -07:00
Linus Torvalds
ed8d9adf35 x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
This is very useful for some common things like 'get_current()' and
'get_thread_info()', which can be used multiple times in a function, and
where the result is cacheable.

tj: Added the magical undocumented "P" modifier to UP __percpu_arg()
    to force gcc to dereference the pointer value passed in via the
    "p" input constraint.  Without this, percpu_read_stable() returns
    the address of the percpu variable.  Also added comment explaining
    the difference between percpu_read() and percpu_read_stable().

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-04 01:28:52 +09:00
David Woodhouse
6a12235c7d agp: kill phys_to_gart() and gart_to_phys()
There seems to be no reason for these -- they're a 1:1 mapping on all
platforms.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-08-03 09:05:00 +01:00
Ingo Molnar
47cab6a722 debug lockups: Improve lockup detection, fix generic arch fallback
As Andrew noted, my previous patch ("debug lockups: Improve lockup
detection") broke/removed SysRq-L support from architecture that do
not provide a __trigger_all_cpu_backtrace implementation.

Restore a fallback path and clean up the SysRq-L machinery a bit:

 - Rename the arch method to arch_trigger_all_cpu_backtrace()

 - Simplify the define

 - Document the method a bit - in the hope of more architectures
   adding support for it.

[ The patch touches Sparc code for the rename. ]

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
LKML-Reference: <20090802140809.7ec4bb6b.akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-03 09:56:52 +02:00
Rusty Russell
a91d74a3c4 lguest: update commentry
Every so often, after code shuffles, I need to go through and unbitrot
the Lguest Journey (see drivers/lguest/README).  Since we now use RCU in
a simple form in one place I took the opportunity to expand that explanation.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
2009-07-30 16:03:46 +09:30
Rusty Russell
2e04ef7691 lguest: fix comment style
I don't really notice it (except to begrudge the extra vertical
space), but Ingo does.  And he pointed out that one excuse of lguest
is as a teaching tool, it should set a good example.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
2009-07-30 16:03:45 +09:30
Bartlomiej Zolnierkiewicz
f3a0867b12 x86, mce: fix reporting of Thermal Monitoring mechanism enabled
Early Pentium M models use different method for enabling TM2
(per paragraph 13.5.2.3 of the "Intel 64 and IA-32 Architectures
Software Developer's Manual Volume 3A: System Programming Guide,
Part 1").

Tested on the affected Pentium M variant (model == 13).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-29 15:45:13 -07:00
Rafael J. Wysocki
b4093d6235 Merge branch 'master' into for-linus 2009-07-29 20:28:08 +02:00
FUJITA Tomonori
8d4f5339d1 x86, IA64, powerpc: add phys_to_dma() and dma_to_phys()
This adds two functions, phys_to_dma() and dma_to_phys() to x86, IA64
and powerpc. swiotlb uses them. phys_to_dma() converts a physical
address to a dma address. dma_to_phys() does the opposite.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
2009-07-28 14:19:20 +09:00
FUJITA Tomonori
99becaca86 x86: add dma_capable() to replace is_buffer_dma_capable()
dma_capable() eventually replaces is_buffer_dma_capable(), which tells
if a memory area is dma-capable or not. The problem of
is_buffer_dma_capable() is that it doesn't take a pointer to struct
device so it doesn't work for POWERPC.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
2009-07-28 14:19:19 +09:00
Linus Torvalds
ca597a02cd Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure
  x86, amd: Don't probe for extended APIC ID if APICs are disabled
  x86, mce: Rename incorrect macro name "CONFIG_X86_THRESHOLD"
  x86-64: Fix bad_srat() to clear all state
  x86, mce: Fix set_trigger() accessor
  x86: Fix movq immediate operand constraints in uaccess.h
  x86: Fix movq immediate operand constraints in uaccess_64.h
  x86: Add reboot fixup for SBC-fitPC2
  x86: Include all of .data.* sections in _edata on 64-bit
  x86: Add quirk for Intel DG45ID board to avoid low memory corruption
2009-07-27 12:18:09 -07:00
Benjamin Herrenschmidt
9e1b32caa5 mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()

Upcoming paches to support the new 64-bit "BookE" powerpc architecture
will need to have the virtual address corresponding to PTE page when
freeing it, due to the way the HW table walker works.

Basically, the TLB can be loaded with "large" pages that cover the whole
virtual space (well, sort-of, half of it actually) represented by a PTE
page, and which contain an "indirect" bit indicating that this TLB entry
RPN points to an array of PTEs from which the TLB can then create direct
entries. Thus, in order to invalidate those when PTE pages are deleted,
we need the virtual address to pass to tlbilx or tlbivax instructions.

The old trick of sticking it somewhere in the PTE page struct page sucks
too much, the address is almost readily available in all call sites and
almost everybody implemets these as macros, so we may as well add the
argument everywhere. I added it to the pmd and pud variants for consistency.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-27 12:10:38 -07:00
Magnus Damm
d7aacaddca Driver Core: Add platform device arch data V3
Allow architecture specific data in struct platform_device V3.

With this patch struct pdev_archdata is added to struct
platform_device, similar to struct dev_archdata in found in
struct device. Useful for architecture code that needs to
keep extra data associated with each platform device.

Struct pdev_archdata is different from dev.platform_data, the
convention is that dev.platform_data points to driver-specific
data. It may or may not be required by the driver. The format
of this depends on driver but is the same across architectures.

The structure pdev_archdata is a place for architecture specific
data. This data is handled by architecture specific code (for
example runtime PM), and since it is architecture specific it
should _never_ be touched by device driver code. Exactly like
struct dev_archdata but for platform devices.

[rjw: This change is for power management mostly and that's why it
 goes through the suspend tree.]

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Kevin Hilman <khilman@deeprootsystems.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-07-22 00:28:38 +02:00
Joseph Cihula
3162534069 x86, intel_txt: Intel TXT boot support
This patch adds kernel configuration and boot support for Intel Trusted
Execution Technology (Intel TXT).

Intel's technology for safer computing, Intel Trusted Execution
Technology (Intel TXT), defines platform-level enhancements that
provide the building blocks for creating trusted platforms.

Intel TXT was formerly known by the code name LaGrande Technology (LT).

Intel TXT in Brief:
o  Provides dynamic root of trust for measurement (DRTM)
o  Data protection in case of improper shutdown
o  Measurement and verification of launched environment

Intel TXT is part of the vPro(TM) brand and is also available some
non-vPro systems.  It is currently available on desktop systems based on
the Q35, X38, Q45, and Q43 Express chipsets (e.g. Dell Optiplex 755, HP
dc7800, etc.) and mobile systems based on the GM45, PM45, and GS45
Express chipsets.

For more information, see http://www.intel.com/technology/security/.
This site also has a link to the Intel TXT MLE Developers Manual, which
has been updated for the new released platforms.

A much more complete description of how these patches support TXT, how to
configure a system for it, etc. is in the Documentation/intel_txt.txt file
in this patch.

This patch provides the TXT support routines for complete functionality,
documentation for TXT support and for the changes to the boot_params structure,
and boot detection of a TXT launch.  Attempts to shutdown (reboot, Sx) the system
will result in platform resets; subsequent patches will support these shutdown modes
properly.

 Documentation/intel_txt.txt      |  210 +++++++++++++++++++++
 Documentation/x86/zero-page.txt  |    1
 arch/x86/include/asm/bootparam.h |    3
 arch/x86/include/asm/fixmap.h    |    3
 arch/x86/include/asm/tboot.h     |  197 ++++++++++++++++++++
 arch/x86/kernel/Makefile         |    1
 arch/x86/kernel/setup.c          |    4
 arch/x86/kernel/tboot.c          |  379 +++++++++++++++++++++++++++++++++++++++
 security/Kconfig                 |   30 +++
 9 files changed, 827 insertions(+), 1 deletion(-)

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Gang Wei <gang.wei@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-21 11:49:06 -07:00
H. Peter Anvin
ebe119cd09 x86: Fix movq immediate operand constraints in uaccess.h
The movq instruction, generated by __put_user_asm() when used for
64-bit data, takes a sign-extended immediate ("e") not a zero-extended
immediate ("Z").

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: stable@kernel.org
2009-07-20 23:27:39 -07:00
Uros Bizjak
155b735295 x86: Fix movq immediate operand constraints in uaccess_64.h
arch/x86/include/asm/uaccess_64.h uses wrong asm operand constraint
("ir") for movq insn. Since movq sign-extends its immediate operand,
"er" constraint should be used instead.

Attached patch changes all uses of __put_user_asm in uaccess_64.h to use
"er" when "q" insn suffix is involved.

Patch was compile tested on x86_64 with defconfig.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
2009-07-20 20:46:17 -07:00
Akinobu Mita
57594742a2 x86: Introduce set_desc_base() and set_desc_limit()
Rename set_base()/set_limit to set_desc_base()/set_desc_limit()
and rewrite them in C. These are naturally introduced by the
idea of get_desc_base()/get_desc_limit().

The conversion actually found the bug in apm_32.c:
bad_bios_desc is written at run-time, but it is defined const
variable.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090718151105.GC11294@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-19 18:27:52 +02:00
Akinobu Mita
fde0312d01 x86: Remove unused patch_espfix_desc()
patch_espfix_desc() is not used after commit
dc4c2a0aed

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090718150955.GB11294@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-19 18:27:52 +02:00
Linus Torvalds
499ee0710f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/pci: insert ioapic resource before assigning unassigned resources
2009-07-17 10:51:55 -07:00
Matias Zabaljauregui
5780888bca lguest: fix journey
fix: "make Guest" was complaining about duplicated G:032

Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-17 21:47:44 +09:30
Linus Torvalds
85be928c41 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
  perf report: Add "Fractal" mode output - support callchains with relative overhead rate
  perf_counter tools: callchains: Manage the cumul hits on the fly
  perf report: Change default callchain parameters
  perf report: Use a modifiable string for default callchain options
  perf report: Warn on callchain output request from non-callchain file
  x86: atomic64: Inline atomic64_read() again
  x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
  x86: atomic64: Improve atomic64_xchg()
  x86: atomic64: Export APIs to modules
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
  x86: atomic64: Fix unclean type use in atomic64_xchg()
  x86: atomic64: Make atomic_read() type-safe
  x86: atomic64: Reduce size of functions
  x86: atomic64: Improve atomic64_add_return()
  x86: atomic64: Improve cmpxchg8b()
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
  x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
  perf report: Annotate variable initialization
  ...
2009-07-10 14:25:03 -07:00
Peter Zijlstra
c99e6efe1b sched: INIT_PREEMPT_COUNT
Pull the initial preempt_count value into a single
definition site.

Maintainers for: alpha, ia64 and m68k, please have a look,
your arch code is funny.

The header magic is a bit odd, but similar to the KERNEL_DS
one, CPP waits with expanding these macros until the
INIT_THREAD_INFO macro itself is expanded, which is in
arch/*/kernel/init_task.c where we've already included
sched.h so we're good.

Cc: tony.luck@intel.com
Cc: rth@twiddle.net
Cc: geert@linux-m68k.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 14:24:05 -07:00
Yinghai Lu
857fdc53a0 x86/pci: insert ioapic resource before assigning unassigned resources
Stephen reported that his DL585 G2 needed noapic after 2.6.22 (?)

Dann bisected it down to:
  commit 30a18d6c3f
  Date:   Tue Feb 19 03:21:20 2008 -0800

      x86: multi pci root bus with different io resource range, on
      64-bit

It turns out that:
  1. that AMD-based systems have two HT chains.
  2. BIOS doesn't allocate resources for BAR 6 of devices under 8132 etc
  3. that multi-peer-root patch will try to split root resources to peer
     root resources according to PCI conf of NB
  4. PCI core assigns unassigned resources, but they overlap with BARs
     that are used by ioapic addr of io4 and 8132.

The reason: at that point ioapic address are not inserted yet.  Solution
is to insert ioapic resources into the tree a bit earlier.

Reported-by: Stephen Frost <sfrost@snowman.net>
Reported-and-Tested-by: dann frazier <dannf@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@jbarnes-g45.(none)>
2009-07-10 13:03:14 -07:00
Linus Torvalds
e864561c12 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
  cxgb3: Fix crash caused by stashing wrong netdev_queue
  ixgbe: Fix coexistence of FCoE and Flow Director in 82599
  memory barrier: adding smp_mb__after_lock
  net: adding memory barrier to the poll and receive callbacks
  netpoll: Fix carrier detection for drivers that are using phylib
  includecheck fix: include/linux, rfkill.h
  p54: tx refused but queue active
  Atheros Kconfig needs to be dependent on WLAN_80211
  mac80211: fix docbook
  mac80211_hwsim: avoid NULL access
  ssb: Add support for 4318E
  b43: Add support for 4318E
  zd1211rw: adding SONY IFU-WLM2 (054c:0257) as a zd1211b device
  zd1211rw: 07b8:6001 is a ZD1211B
  r6040: bump driver version to 0.24 and date to 08 July 2009
  r6040: restore MIER register correctly when IRQ line is shared
  ipv4: Fix fib_trie rebalancing, part 4 (root thresholds)
  davinci_emac: fix kernel oops when changing MAC address while interface is down
  igb: set lan id prior to configuring phy
  mac80211: minstrel: avoid accessing negative indices in rix_to_ndx()
  ...
2009-07-09 20:33:18 -07:00
Andi Kleen
3ccdccfadb x86: mce: Lower maximum number of banks to architecture limit
The Intel x86 architecture right now only supports 32 machine check
banks, more would bump into other MSRs.

So lower the max define to 32.

This only affects a few bitmaps, most data structures are dynamically
sized anyways.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
a2d32bcbc0 x86: mce: macros to compute banks MSRs
Instead of open coded calculations for bank MSRs hide the indexing of higher
banks MCE register MSRs in new macros.

No semantic changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
c1ebf83561 x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE
Drop the CONFIG_X86_NEW_MCE symbol and change all
references to it to check for CONFIG_X86_MCE directly.

No code changes

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
5bb38adcb5 x86: mce: Remove old i386 machine check code
As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE

This patch only removes code.

The ancient machine check code for very old systems that are not supported
by CONFIG_X86_NEW_MCE is still kept.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:46 -07:00
Jiri Olsa
ad46276952 memory barrier: adding smp_mb__after_lock
Adding smp_mb__after_lock define to be used as a smp_mb call after
a lock.

Making it nop for x86, since {read|write|spin}_lock() on x86 are
full memory barriers.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09 17:06:58 -07:00
Linus Torvalds
faf80d62e4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix usage of bios intcall()
  x86: Remove unused function lapic_watchdog_ok()
  x86: Remove unused variable disable_x2apic
  x86, kvm: Fix section mismatches in kvm.c
  x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user
  x86: Fix fixmap page order for FIX_TEXT_POKE0,1
  amd-iommu: set evt_buf_size correctly
  amd-iommu: handle alias entries correctly in init code
  x86: Fix printk call in print_local_apic()
  x86: Declare check_efer() before it gets used
  x86: Mark device_nb as static and fix NULL noise
  x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1
  xen: Use kcalloc() in xen_init_IRQ()
  x86: Fix fixmap ordering
  x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c
2009-07-06 17:45:44 -07:00
Eric Dumazet
a79f0da80a x86: atomic64: Inline atomic64_read() again
Now atomic64_read() is light weight (no register pressure and
small icache), we can inline it again.

Also use "=&A" constraint instead of "+A" to avoid warning
about unitialized 'res' variable. (gcc had to force 0 in eax/edx)

  $ size vmlinux.prev vmlinux.after
     text    data     bss     dec     hex filename
  4908667  451676 1684868 7045211  6b805b vmlinux.prev
  4908651  451676 1684868 7045195  6b804b vmlinux.after

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <4A4E1AA2.30002@gmail.com>
[ Also fix typo in atomic64_set() export ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:45:00 +02:00
Tejun Heo
8c4bfc6e88 x86,percpu: generalize lpage first chunk allocator
Generalize and move x86 setup_pcpu_lpage() into
pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
around the generalized version.  Other than taking size parameters and
using arch supplied callbacks to allocate/free/map memory,
pcpu_lpage_first_chunk() is identical to the original implementation.

This simplifies arch code and will help converting more archs to
dynamic percpu allocator.

While at it, factor out pcpu_calc_fc_sizes() which is common to
pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().

[ Impact: code reorganization and generalization ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04 08:10:59 +09:00
Ingo Molnar
3a8d1788b3 x86: atomic64: Improve atomic64_xchg()
Remove the read-first logic from atomic64_xchg() and simplify
the loop.

This function was the last user of __atomic64_read() - remove it.

Also, change the 'real_val' assumption from the somewhat quirky
1ULL << 32 value to the (just as arbitrary, but simpler) value
of 0.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <tip-05118ab8859492ac9ddda0154cf90e37b0a4a0b0@git.kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 20:23:55 +02:00
Paul Mackerras
8e049ef054 x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
Occasionally we get bugs where atomic_read or atomic_set are
used on atomic64_t variables or vice versa.  These bugs don't
generate warnings on x86 because atomic_read and atomic_set are
coded as macros rather than C functions, so we don't get any
type-checking on their arguments; similarly for atomic64_read
and atomic64_set in 64-bit kernels.

This converts them to C functions so that the arguments are
type-checked and bugs like this will get caught more easily. It
also converts atomic_cmpxchg and atomic_xchg, and
atomic64_cmpxchg and atomic64_xchg on 64-bit, so we get
type-checking on their arguments too.

Compiling a typical 64-bit x86 config, this generates no new
warnings, and the vmlinux text is 86 bytes smaller.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 14:42:39 +02:00
Jaswinder Singh Rajput
c7210e1ff8 x86: Remove unused function lapic_watchdog_ok()
lapic_watchdog_ok() is a global function but no one is using it.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1246554335.2242.29.camel@jaswinder.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 14:34:31 +02:00
Mathieu Desnoyers
12b9d7ccb8 x86: Fix fixmap page order for FIX_TEXT_POKE0,1
Masami reported:

> Since the fixmap pages are assigned higher address to lower,
> text_poke() has to use it with inverted order (FIX_TEXT_POKE1
> to FIX_TEXT_POKE0).

I prefer to just invert the order of the fixmap declaration.
It's simpler and more straightforward.

Backward fixmaps seems to be used by both x86 32 and 64.

It's really rare but a nasty bug, because it only hurts when
instructions to patch are crossing a page boundary. If this
happens, the fixmap write accesses will spill on the following
fixmap, which may very well crash the system. And this does not
crash the system, it could leave illegal instructions in place.
Thanks Masami for finding this.

It seems to have crept into the 2.6.30-rc series, so this calls
for a -stable inclusion.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090701213722.GH19926@Krystal>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 14:34:09 +02:00
Ingo Molnar
3217120873 x86: atomic64: Make atomic_read() type-safe
Linus noticed that atomic64_xchg() uses atomic_read(), which
happens to work because atomic_read() is a macro so the
.counter value gets u64-read on 32-bit too - but this is really
bogus and serious bugs are waiting to happen.

Change atomic_read() to be a type-safe inline, and this exposes
the atomic64 bogosity as well:

  arch/x86/lib/atomic64_32.c: In function ‘atomic64_xchg’:
  arch/x86/lib/atomic64_32.c:39: warning: passing argument 1 of ‘atomic_read’ from incompatible pointer type

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 13:26:45 +02:00
Ingo Molnar
b7882b7c65 x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
Linus noted that the atomic64_t primitives are all inlines
currently which is crazy because these functions have a large
register footprint anyway.

Move them to a separate file: arch/x86/lib/atomic64_32.c

Also, while at it, rename all uses of 'unsigned long long' to
the much shorter u64.

This makes the appearance of the prototypes a lot nicer - and
it also uncovered a few bugs where (yet unused) API variants
had 'long' as their return type instead of u64.

[ More intrusive changes are not yet done in this patch. ]

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 13:26:39 +02:00
Eric Dumazet
bbf2a330d9 x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
Locked instructions on two cache lines at once are painful. If
atomic64_t uses two cache lines, my test program is 10x slower.

The chance for that is significant: 4/32 or 12.5%.

Make sure an atomic64_t is 8 bytes aligned.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
[ changed it to __aligned(8) as per Andrew's suggestion ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 13:26:38 +02:00
Linus Torvalds
43644679a1 x86: fix power-of-2 round_up/round_down macros
These macros had two bugs:
 - the type of the mask was not correctly expanded to the full size of
   the argument being expanded, resulting in possible loss of high bits
   when mixing types.
 - the alignment argument was evaluated twice, despite the macro looking
   like a fancy function (but it really does need to be a macro, since
   it works on arbitrary integer types)

Noticed by Peter Anvin, and with a fix that is a modification of his
suggestion (bug noticed by Yinghai Lu).

Cc: Peter Anvin <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-02 12:05:10 -07:00
Frederic Weisbecker
0406ca6d8e perf_counter: Ignore the nmi call frames in the x86-64 backtraces
About every callchains recorded with perf record are filled up
including the internal perfcounter nmi frame:

 perf_callchain
 perf_counter_overflow
 intel_pmu_handle_irq
 perf_counter_nmi_handler
 notifier_call_chain
 atomic_notifier_call_chain
 notify_die
 do_nmi
 nmi

We want ignore this frame as it's not interesting for
instrumentation. To solve this, we simply ignore every frames
from nmi context.

New example of "perf report -s sym -c" after this patch:

9.59%  [k] search_by_key
             4.88%
                search_by_key
                reiserfs_read_locked_inode
                reiserfs_iget
                reiserfs_lookup
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                vfs_fstatat
                vfs_lstat
                sys_newlstat
                system_call_fastpath
                __lxstat
                0x406fb1

             3.19%
                search_by_key
                search_by_entry_key
                reiserfs_find_entry
                reiserfs_lookup
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                vfs_fstatat
                vfs_lstat
                sys_newlstat
                system_call_fastpath
                __lxstat
                0x406fb1
[...]

For now this patch only solves the problem in x86-64.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 22:37:23 +02:00
David Woodhouse
788d84bba4 Fix pci_unmap_addr() et al on i386.
We can run a 32-bit kernel on boxes with an IOMMU, so we need
pci_unmap_addr() etc. to work -- without it, drivers will leak mappings.

To be honest, this whole thing looks like it's more pain than it's
worth; I'm half inclined to remove the no-op #else case altogether.

But this is the minimal fix, which just does the right thing if
CONFIG_DMAR is set.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org  [ for 2.6.30 ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-01 11:19:29 -07:00
Jaswinder Singh Rajput
44973998a1 x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1
MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1 is already declared in msr-index.h.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1246450778.6940.8.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 15:23:43 +02:00
Linus Torvalds
55bcab4695 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
  perf report: Add --symbols parameter
  perf report: Add --comms parameter
  perf report: Add --dsos parameter
  perf_counter tools: Adjust only prelinked symbol's addresses
  perf_counter: Provide a way to enable counters on exec
  perf_counter tools: Reduce perf stat measurement overhead/skew
  perf stat: Use percentages for scaling output
  perf_counter, x86: Update x86_pmu after WARN()
  perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
  perf stat: Improve output
  perf stat: Fix multi-run stats
  perf stat: Add -n/--null option to run without counters
  perf_counter tools: Remove dead code
  perf_counter: Complete counter swap
  perf report: Print sorted callchains per histogram entries
  perf_counter tools: Prepare a small callchain framework
  perf record: Fix unhandled io return value
  perf_counter tools: Add alias for 'l1d' and 'l1i'
  perf-report: Add bare minimum PERF_EVENT_READ parsing
  perf-report: Add modes for inherited stats and no-samples
  ...
2009-06-30 19:02:59 -07:00
Jan Beulich
789d03f584 x86: Fix fixmap ordering
The merge of the 32- and 64-bit fixmap headers made a latent
bug on x86-64 a real one: with the right config settings
it is possible for FIX_OHCI1394_BASE to overlap the FIX_BTMAP_*
range.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org> # for 2.6.30.x
LKML-Reference: <4A4A0A8702000078000082E8@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 00:12:22 +02:00
Figo.zhang
ce0c0f9eec x86, pgtable.h: Clean up types
Use "unsigned long" consistently, not "unsigned".

Signed-off-by: Figo.zhang <figo1802@gmail.com>
LKML-Reference: <1246183659.2530.4.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 06:14:42 +02:00
Akinobu Mita
087975b06b x86: Clean up dump_pagetable()
Use pgtable access helpers for 32-bit version dump_pagetable()
and get rid of __typeof__() operators. This needs to make
pmd_pfn() available for 2-level pgtable.

Also, remove some casts for 64-bit version dump_pagetable().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090627063514.GA2834@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 06:14:42 +02:00
Linus Torvalds
8326e284f8 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, delay: tsc based udelay should have rdtsc_barrier
  x86, setup: correct include file in <asm/boot.h>
  x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
  x86, mce: percpu mcheck_timer should be pinned
  x86: Add sysctl to allow panic on IOCK NMI error
  x86: Fix uv bau sending buffer initialization
  x86, mce: Fix mce resume on 32bit
  x86: Move init_gbpages() to setup_arch()
  x86: ensure percpu lpage doesn't consume too much vmalloc space
  x86: implement percpu_alloc kernel parameter
  x86: fix pageattr handling for lpage percpu allocator and re-enable it
  x86: reorganize cpa_process_alias()
  x86: prepare setup_pcpu_lpage() for pageattr fix
  x86: rename remap percpu first chunk allocator to lpage
  x86: fix duplicate free in setup_pcpu_remap() failure path
  percpu: fix too lazy vunmap cache flushing
  x86: Set cpu_llc_id on AMD CPUs
2009-06-28 11:05:28 -07:00
H. Peter Anvin
658dbfeb5e x86, setup: correct include file in <asm/boot.h>
<asm/boot.h> needs <asm/pgtable_types.h>, not <asm/page_types.h> in
order to resolve PMD_SHIFT.  Also, correct a +1 which really should be
+ THREAD_ORDER.

This is a build error which was masked by a typoed #ifdef.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-25 15:16:06 -07:00
Robert P. J. Day
22f4319d6b x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
CONFIG_X86_64 was misspelled (wrong case), which caused the x86-64
kernel to advertise itself as more relocatable than it really is.
This could in theory cause boot failures once bootloaders start
support the new relocation fields.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-25 13:33:11 -07:00
Peter Zijlstra
194002b274 perf_counter, x86: Add mmap counter read support
Update the mmap control page with the needed information to
use the userspace RDPMC instruction for self monitoring.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 21:39:06 +02:00
Linus Torvalds
236e946b53 Revert "PCI: use ACPI _CRS data by default"
This reverts commit 9e9f46c44e.

Quoting from the commit message:

 "At this point, it seems to solve more problems than it causes, so let's
  try using it by default.  It's an easy revert if it ends up causing
  trouble."

And guess what? The _CRS code causes trouble.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-24 16:23:03 -07:00
Len Brown
fbe8cddd2d Merge branches 'acerhdf', 'acpi-pci-bind', 'bjorn-pci-root', 'bugzilla-12904', 'bugzilla-13121', 'bugzilla-13396', 'bugzilla-13533', 'bugzilla-13612', 'c3_lock', 'hid-cleanups', 'misc-2.6.31', 'pdc-leak-fix', 'pnpacpi', 'power_nocheck', 'thinkpad_acpi', 'video' and 'wmi' into release 2009-06-24 01:19:50 -04:00
Linus Torvalds
687d680985 Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
* git://git.infradead.org/~dwmw2/iommu-2.6.31:
  intel-iommu: Fix one last ia64 build problem in Pass Through Support
  VT-d: support the device IOTLB
  VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  VT-d: add device IOTLB invalidation support
  VT-d: parse ATSR in DMA Remapping Reporting Structure
  PCI: handle Virtual Function ATS enabling
  PCI: support the ATS capability
  intel-iommu: dmar_set_interrupt return error value
  intel-iommu: Tidy up iommu->gcmd handling
  intel-iommu: Fix tiny theoretical race in write-buffer flush.
  intel-iommu: Clean up handling of "caching mode" vs. IOTLB flushing.
  intel-iommu: Clean up handling of "caching mode" vs. context flushing.
  VT-d: fix invalid domain id for KVM context flush
  Fix !CONFIG_DMAR build failure introduced by Intel IOMMU Pass Through Support
  Intel IOMMU Pass Through Support

Fix up trivial conflicts in drivers/pci/{intel-iommu.c,intr_remapping.c}
2009-06-22 21:38:22 -07:00
Linus Torvalds
59ef7a83f1 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (74 commits)
  PCI: make msi_free_irqs() to use msix_mask_irq() instead of open coded write
  PCI: Fix the NIU MSI-X problem in a better way
  PCI ASPM: remove get_root_port_link
  PCI ASPM: cleanup pcie_aspm_sanity_check
  PCI ASPM: remove has_switch field
  PCI ASPM: cleanup calc_Lx_latency
  PCI ASPM: cleanup pcie_aspm_get_cap_device
  PCI ASPM: cleanup clkpm checks
  PCI ASPM: cleanup __pcie_aspm_check_state_one
  PCI ASPM: cleanup initialization
  PCI ASPM: cleanup change input argument of aspm functions
  PCI ASPM: cleanup misc in struct pcie_link_state
  PCI ASPM: cleanup clkpm state in struct pcie_link_state
  PCI ASPM: cleanup latency field in struct pcie_link_state
  PCI ASPM: cleanup aspm state field in struct pcie_link_state
  PCI ASPM: fix typo in struct pcie_link_state
  PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS
  PCI: remove redundant __msi_set_enable()
  PCI PM: consistently use type bool for wake enable variable
  x86/ACPI: Correct maximum allowed _CRS returned resources and warn if exceeded
  ...
2009-06-22 11:59:51 -07:00
Tejun Heo
e59a1bb2fd x86: fix pageattr handling for lpage percpu allocator and re-enable it
lpage allocator aliases a PMD page for each cpu and returns whatever
is unused to the page allocator.  When the pageattr of the recycled
pages are changed, this makes the two aliases point to the overlapping
regions with different attributes which isn't allowed and known to
cause subtle data corruption in certain cases.

This can be handled in simliar manner to the x86_64 highmap alias.
pageattr code should detect if the target pages have PMD alias and
split the PMD alias and synchronize the attributes.

pcpur allocator is updated to keep the allocated PMD pages map sorted
in ascending address order and provide pcpu_lpage_remapped() function
which binary searches the array to determine whether the given address
is aliased and if so to which address.  pageattr is updated to use
pcpu_lpage_remapped() to detect the PMD alias and split it up as
necessary from cpa_process_alias().

Jan Beulich spotted the original problem and incorrect usage of vaddr
instead of laddr for lookup.

With this, lpage percpu allocator should work correctly.  Re-enable
it.

[ Impact: fix subtle lpage pageattr bug and re-enable lpage ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Borislav Petkov
e487683990 x86, mce: fix typo in comment in asm/mce.h
Fix comment to match the actual declaration.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-20 23:27:16 -07:00
Linus Torvalds
9063c61fd5 x86, 64-bit: Clean up user address masking
The discussion about using "access_ok()" in get_user_pages_fast() (see
commit 7f81890687: "x86: don't use
'access_ok()' as a range check in get_user_pages_fast()" for details and
end result), made us notice that x86-64 was really being very sloppy
about virtual address checking.

So be way more careful and straightforward about masking x86-64 virtual
addresses:

 - All the VIRTUAL_MASK* variants now cover half of the address
   space, it's not like we can use the full mask on a signed
   integer, and the larger mask just invites mistakes when
   applying it to either half of the 48-bit address space.

 - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
   obvious when it transforms a file offset into a
   (kernel-half) virtual address.

 - Unify/simplify the 32-bit and 64-bit USER_DS definition to
   be based on TASK_SIZE_MAX.

This cleanup and more careful/obvious user virtual address checking also
uncovered a buglet in the x86-64 implementation of strnlen_user(): it
would do an "access_ok()" check on the whole potential area, even if the
string itself was much shorter, and thus return an error even for valid
strings. Our sloppy checking had hidden this.

So this fixes 'strnlen_user()' to do this properly, the same way we
already handled user strings in 'strncpy_from_user()'.  Namely by just
checking the first byte, and then relying on fault handling for the
rest.  That always works, since we impose a guard page that cannot be
mapped at the end of the user space address space (and even if we
didn't, we'd have the address space hole).

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-20 15:40:00 -07:00
Linus Torvalds
12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds
1eb51c33b2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix out of scope variable access in sched_slice()
  sched: Hide runqueues from direct refer at source code level
  sched: Remove unneeded __ref tag
  sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-20 10:57:40 -07:00
Linus Torvalds
c4c5ab3089 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
  x86, mce: fix error path in mce_create_device()
  x86: use zalloc_cpumask_var for mce_dev_initialized
  x86: fix duplicated sysfs attribute
  x86: de-assembler-ize asm/desc.h
  i386: fix/simplify espfix stack switching, move it into assembly
  i386: fix return to 16-bit stack from NMI handler
  x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
  x86: Remove duplicated #include's
  x86: msr.h linux/types.h is only required for __KERNEL__
  x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
  x86, mce: mce_intel.c needs <asm/apic.h>
  x86: apic/io_apic.c: dmar_msi_type should be static
  x86, io_apic.c: Work around compiler warning
  x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
  x86: mce: Handle banks == 0 case in K7 quirk
  x86, boot: use .code16gcc instead of .code16
  x86: correct the conversion of EFI memory types
  x86: cap iomem_resource to addressable physical memory
  x86, mce: rename _64.c files which are no longer 64-bit-specific
  x86, mce: mce.h cleanup
  ...

Manually fix up trivial conflict in arch/x86/mm/fault.c
2009-06-20 10:49:48 -07:00
Ingo Molnar
1d99100120 Merge branch 'x86/mce3' into x86/urgent 2009-06-20 10:54:22 +02:00
Ingo Molnar
0c87197142 perf_counter, x86: Improve interactions with fast-gup
Improve a few details in perfcounter call-chain recording that
makes use of fast-GUP:

- Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
  racy and can be changed on another CPU, so we have to be careful
  about how we access them. The PAE branch is already careful with
  read-barriers - but the non-PAE and 64-bit side needs an
  ACCESS_ONCE() to make sure the pte value is observed only once.

- make the checks a bit stricter so that we can feed it any kind of
  cra^H^H^H user-space input ;-)

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 16:55:16 +02:00
Arnd Bergmann
73a2d096fd x86: remove all now-duplicate header files
All files that have been made identical to the asm-generic
version in the previous patches can now be removed,
guaranteeing that this does not introduce semantic changes.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <cover.1245354003.git.arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-18 14:40:03 -07:00
Arnd Bergmann
69d5ffdaad x86: convert termios.h to the asm-generic version
This patch turned out more controversial than expected
and may get dropped in the future. I'm including it
for reference anyway.

The user_termio_to_kernel_termios and kernel_termios_to_user_termio
functions on x86 are lacking error checking from get_user and
are not portable to big-endian systems, so the asm-generic
header has to differ in this regard.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <cover.1245354003.git.arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-18 14:39:58 -07:00
Arnd Bergmann
06f5013aa8 x86: convert almost generic headers to asm-generic version
In x86, mman.h, module.h, scatterlist.h, types.h and ucontext.h
can use the asm-generic version by just defining the x86
specific parts locally and falling back on the generic code
for the common bits.

This patch illustrates the differences between the x86 and
asm-generic versions by changing a file that is initially
identical to the x86 version to one that is identical
to the asm-generic version.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <cover.1245354003.git.arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-18 14:39:53 -07:00
Arnd Bergmann
7bfd124d6d x86: convert trivial headers to asm-generic version
For these nine header files, the asm-generic version should
be semantically identical to what is in x86. Change the
contents to be binary identical, for better review.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <cover.1245354003.git.arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-18 14:39:47 -07:00
Arnd Bergmann
4adc667593 x86: add copies of some headers to convert to asm-generic
Just an intermediate step to make reviewing easier.
These files are identical copies of the existing headers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <cover.1245354003.git.arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-18 14:39:40 -07:00
FUJITA Tomonori
7c095e4603 dma-mapping: x86: use asm-generic/dma-mapping-common.h
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:58 -07:00
Alexander van Heukelum
bc3f5d3dbd x86: de-assembler-ize asm/desc.h
asm/desc.h is included in three assembly files, but the only macro
it defines, GET_DESC_BASE, is never used. This patch removes the
includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
from asm/desc.h.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:35:10 -07:00
Jeremy Fitzhardinge
e6e9cac8c3 x86: split out core __math_state_restore
Split the core fpu state restoration out into __math_state_restore, which
assumes that cr0.TS is clear and that the fpu context has been initialized.

This will be used during context switch.  There are two reasons this is
desireable:

- There's a small clarification.  When __switch_to() calls math_state_restore,
  it relies on the fact that tsk_used_math() returns true, and so will
  never do a blocking init_fpu().  __math_state_restore() does not have
  (or need) that logic, so the question never arises.

- It allows the clts() to be moved earler in __switch_to() so it can be performed
  while cpu context updates are batched (will be done in a later patch).

[ Impact: refactor code to make reuse cleaner; no functional change ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-06-17 13:21:25 -07:00
Jeremy Fitzhardinge
ac5672f82c x86/paravirt: split paravirt definitions into paravirt_types.h
Split the monolithic asm/paravirt.h into separate paravirt.h (inlines and other
"active" definitions), and paravirt_types.h (types, constants and other "passive"
definitions).  This makes it easier to use the type/constant definitions without
pulling in everything else and causing circular dependency problems.

[ Impact: cleanup ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-06-17 13:07:12 -07:00
Jaswinder Singh Rajput
8fa62ad9d2 x86: msr.h linux/types.h is only required for __KERNEL__
<linux/types.h> is only required for __KERNEL__ as whole file is covered with it

Also fixed some spacing issues for usr/include/asm-x86/msr.h

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: "H. Peter Anvin" <hpa@kernel.org>
LKML-Reference: <1245228070.2662.1.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:56:01 +02:00