Move some structure definitions from the C code to the BAU
header file, and change the organization of that header file a
little.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aI-54@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use a pointer from the per-cpu BAU control structure to the
per-cpu BAU statistics structure.
We nearly always know the first before needing the second.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004aB-2k@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The numalink network can become so congested that TLB shootdown
using the Broadcast Assist Unit becomes slower than using IPI's.
In that case, disable the use of the BAU for a period of time.
The period is tunable. When the period expires the use of the
BAU is re-enabled. A count of these actions is added to the
statistics file.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNy-0004a4-0a@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the Broadcast Assist Unit driver's nine tuning values variable by
making them accessible through a read/write debugfs file.
The file will normally be mounted as
/sys/kernel/debug/sgi_uv/bau_tunables. The tunables are kept in each
cpu's per-cpu BAU structure.
The patch also does a little name improvement, and corrects the reset of
two destination timeout counters.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNx-0004Zx-Uo@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Calculate the Broadcast Assist Unit's destination timeout period from the
values in the relevant MMR's.
Store it in each cpu's per-cpu BAU structure so that a destination
timeout can be differentiated from a 'plugged' situation in which all
software ack resources are already allocated and a timeout is pending.
That case returns an immediate destination error.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: gregkh@suse.de
LKML-Reference: <E1OJvNx-0004Zq-RK@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits)
i7core_edac: Better describe the supported devices
Add support for Westmere to i7core_edac driver
i7core_edac: don't free on success
i7core_edac: Add support for X5670
Always call i7core_[ur]dimm_check_mc_ecc_err
i7core_edac: fix memory leak of i7core_dev
EDAC: add __init to i7core_xeon_pci_fixup
i7core_edac: Fix wrong device id for channel 1 devices
i7core: add support for Lynnfield alternate address
i7core_edac: Add initial support for Lynnfield
i7core_edac: do not export static functions
edac: fix i7core build
edac: i7core_edac produces undefined behaviour on 32bit
i7core_edac: Use a more generic approach for probing PCI devices
i7core_edac: PCI device is called NONCORE, instead of NOCORE
i7core_edac: Fix ringbuffer maxsize
i7core_edac: First store, then increment
i7core_edac: Better parse "any" addrmask
i7core_edac: Use a lockless ringbuffer
edac: Create an unique instance for each kobj
...
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, smpboot: Fix cores per node printing on boot
x86/amd-iommu: Fall back to GART if initialization fails
x86/amd-iommu: Fix crash when request_mem_region fails
x86/mm: Remove unused DBG() macro
arch/x86/kernel: Add missing spin_unlock
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix crash in swevents
perf buildid-list: Fix --with-hits event processing
perf scripts python: Give field dict to unhandled callback
perf hist: fix objdump output parsing
perf-record: Check correct pid when forking
perf: Do the comm inheritance per thread in event__process_task
perf: Use event__process_task from perf sched
perf: Process comm events by tid
blktrace: Fix new kernel-doc warnings
perf_events: Fix unincremented buffer base on partial copy
perf_events: Fix event scheduling issues introduced by transactional API
perf_events, trace: Fix perf_trace_destroy(), mutex went missing
perf_events, trace: Fix probe unregister race
perf_events: Fix races in group composition
perf_events: Fix races and clean up perf_event and perf_mmap_data interaction
Percpu initialization happens now after booting the cores on the
machine and this causes them all to be displayed as belonging to
node 0:
Jun 8 05:57:21 kepek kernel: [ 0.106999] Booting Node 0,
Processors #1#2#3#4#5#6#7#8#9#10#11#12#13#14#15#16#17#18#19#20#21#22#23 Ok.
Use early_cpu_to_node() to get the correct node of each core
instead.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <20100601190455.GA14237@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
kbuild: Revert part of e8d400a to resolve a conflict
kbuild: Fix checking of scm-identifier variable
gconfig: add support to show hidden options that have prompts
menuconfig: add support to show hidden options which have prompts
gconfig: remove show_debug option
gconfig: remove dbg_print_ptype() and dbg_print_stype()
kconfig: fix zconfdump()
kconfig: some small fixes
add random binaries to .gitignore
kbuild: Include gen_initramfs_list.sh and the file list in the .d file
kconfig: recalc symbol value before showing search results
.gitignore: ignore *.lzo files
headerdep: perlcritic warning
scripts/Makefile.lib: Align the output of LZO
kbuild: Generate modules.builtin in make modules_install
Revert "kbuild: specify absolute paths for cscope"
kbuild: Do not unnecessarily regenerate modules.builtin
headers_install: use local file handles
headers_check: fix perl warnings
export_report: fix perl warnings
...
This patch implements a fallback to the GART IOMMU if this
is possible and the AMD IOMMU initialization failed.
Otherwise the fallback would be nommu which is very
problematic on machines with more than 4GB of memory or
swiotlb which hurts io-performance.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When request_mem_region fails the error path tries to
disable the IOMMUs. This accesses the mmio-region which was
not allocated leading to a kernel crash. This patch fixes
the issue.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The transactional API patch between the generic and model-specific
code introduced several important bugs with event scheduling, at
least on X86. If you had pinned events, e.g., watchdog, and were
over-committing the PMU, you would get bogus counts. The bug was
showing up on Intel CPU because events would move around more
often that on AMD. But the problem also existed on AMD, though
harder to expose.
The issues were:
- group_sched_in() was missing a cancel_txn() in the error path
- cpuc->n_added was not properly maintained, leading to missing
actions in hw_perf_enable(), i.e., n_running being 0. You cannot
update n_added until you know the transaction has succeeded. In
case of failed transaction n_added was not adjusted back.
- in case of failed transactions, event_sched_out() was called
and eventually invoked x86_disable_event() to touch the HW reg.
But with transactions, on X86, event_sched_in() does not touch
HW registers, it simply collects events into a list. Thus, you
could end up calling x86_disable_event() on a counter which
did not correspond to the current event when idx != -1.
The patch modifies the generic and X86 code to avoid all those problems.
First, we keep track of the number of events added last. In case the
transaction fails, we substract them from n_added. This approach is
necessary (as opposed to delaying updates to n_added) because not all
event updates use the transaction API, e.g., single events.
Second, we encapsulate the event_sched_in() and event_sched_out() in
group_sched_in() inside the transaction. That makes the operations
symmetrical and you can also detect that you are inside a transaction
and skip the HW reg access by checking cpuc->group_flag.
With this patch, you can now overcommit the PMU even with pinned
system-wide events present and still get valid counts.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274796225.5882.1389.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cpufeature: Unbreak compile with gcc 3.x
x86, pat: Fix memory leak in free_memtype
x86, k8: Fix section mismatch for powernowk8_exit()
lib/atomic64_test: fix missing include of linux/kernel.h
x86: remove last traces of quicklist usage
x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012
x86: "nosmp" command line option should force the system into UP mode
arch/x86/pci: use kasprintf
x86, apic: ack all pending irqs when crashed/on kexec
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
tracing: Add __used annotation to event variable
perf, trace: Fix !x86 build bug
perf report: Support multiple events on the TUI
perf annotate: Fix up usage of the build id cache
x86/mmiotrace: Remove redundant instruction prefix checks
perf annotate: Add TUI interface
perf tui: Remove annotate from popup menu after failure
perf report: Don't start the TUI if -D is used
perf: Fix getline undeclared
perf: Optimize perf_tp_event_match()
perf: Remove more code from the fastpath
perf: Optimize the !vmalloc backed buffer
perf: Optimize perf_output_copy()
perf: Fix wakeup storm for RO mmap()s
perf-record: Share per-cpu buffers
perf-record: Remove -M
perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
...
x86 arch specific changes to use generic numa_node_id() based on generic
percpu variable infrastructure. Back out x86's custom version of
numa_node_id()
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sync_single_range_for_cpu and sync_single_range_for_device hooks in
swiotlb_dma_ops are unnecessary because sync_single_for_cpu and
sync_single_for_device are used there.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By the previous modification, the cpu notifier can return encapsulate
errno value. This converts the cpu notifiers for msr, cpuid, and
therm_throt.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a spin_unlock missing on the error path. The locks and unlocks are
balanced in other functions, so it seems that the same should be the case
here.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression E1;
@@
* spin_lock(E1,...);
<+... when != E1
if (...) {
... when != E1
* return ...;
}
...+>
* spin_unlock(E1,...);
// </smpl>
Cc: stable@kernel.org
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Fix the following warning:
"WARNING: arch/x86/kernel/built-in.o(.exit.text+0x72):
Section mismatch in reference from the function powernowk8_exit() to the variable .cpuinit.data:cpb_nb
The function __exit powernowk8_exit() references a variable
__cpuinitdata cpb_nb. This is often seen when error handling in the exit
function uses functionality in the init path. The fix is often to remove
the __cpuinitdata annotation of cpb_nb so it may be used outside an init
section."
Cc: <stable@kernel.org>
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100525152858.GA24836@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This adds:
alias: devname:<name>
to some common kernel modules, which will allow the on-demand loading
of the kernel module when the device node is accessed.
Ideally all these modules would be compiled-in, but distros seems too
much in love with their modularization that we need to cover the common
cases with this new facility. It will allow us to remove a bunch of pretty
useless init scripts and modprobes from init scripts.
The static device node aliases will be carried in the module itself. The
program depmod will extract this information to a file in the module directory:
$ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
# Device nodes to trigger on-demand module loading.
microcode cpu/microcode c10:184
fuse fuse c10:229
ppp_generic ppp c108:0
tun net/tun c10:200
dm_mod mapper/control c10:235
Udev will pick up the depmod created file on startup and create all the
static device nodes which the kernel modules specify, so that these modules
get automatically loaded when the device node is accessed:
$ /sbin/udevd --debug
...
static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
static_dev_create_from_modules: mknod '/dev/fuse' c10:229
static_dev_create_from_modules: mknod '/dev/ppp' c108:0
static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
A few device nodes are switched to statically allocated numbers, to allow
the static nodes to work. This might also useful for systems which still run
a plain static /dev, which is completely unsafe to use with any dynamic minor
numbers.
Note:
The devname aliases must be limited to the *common* and *single*instance*
device nodes, like the misc devices, and never be used for conceptually limited
systems like the loop devices, which should rather get fixed properly and get a
control node for losetup to talk to, instead of creating a random number of
device nodes in advance, regardless if they are ever used.
This facility is to hide the mess distros are creating with too modualized
kernels, and just to hide that these modules are not compiled-in, and not to
paper-over broken concepts. Thanks! :)
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Ian Kent <raven@themaw.net>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patch b7e2ecef92 (perf, trace: Optimize tracepoints by removing
IRQ-disable from perf/tracepoint interaction) made the
unfortunate mistake of assuming the world is x86 only, correct
this.
The problem was that perf_fetch_caller_regs() did
local_save_flags() into regs->flags, and I re-used that to
remove another local_save_flags(), forgetting !x86 doesn't have
regs->flags.
Do the reverse, remove the local_save_flags() from
perf_fetch_caller_regs() and let the ftrace site do the
local_save_flags() instead.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: acme@redhat.com
Cc: efault@gmx.de
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
LKML-Reference: <1274778175.5882.623.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The low-memory corruption checker triggers during suspend/resume, so we
need to reserve the low 64k. Don't be fooled that the BIOS identifies
itself as "Dell Inc.", it's still Phoenix BIOS.
[ hpa: I think we blacklist almost every BIOS in existence. We should
either change this to a whitelist or just make it unconditional. ]
Signed-off-by: Gabor Gombas <gombasg@digikabel.hu>
LKML-Reference: <201005241913.o4OJDIMM010877@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
Bits set in cpu_possible_mask prior to the execution of
prefill_possible_map() (i.e. when parsing ACPI or MPS tables) would
prevent the SMP alternatives logic from switching to UP mode, plus
unnecessary setup of per-CPU data for CPUs that can never come online.
Additionally, without CONFIG_HOTPLUG_CPU disabled CPUs can never come
online, and hence setting cpu_possible_mask bits for them is again a
simple waste of resources.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <201005241913.o4OJDH3Z010874@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.
The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register) safely,
it doesn't consider lower prioritized queued interrupts stored in the IRR
register.
If you have more than one pending interrupt within the same 32 bit word in
the LAPIC vector table registers you may find yourself entering the IO
APIC setup with pending interrupts left in the LAPIC. This is a situation
for wich the IO APIC setup is not prepared. Depending of what/which
interrupt vector/vectors are stuck in the APIC tables your system may show
various degrees of malfunctioning. That was the reason why the
check_timer() failed in our system, the timer interrupts was blocked by
pending interrupts from the old kernel when routed trough the IO APIC.
Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:
+if (loops++ > MAX_LOOPS) {
+ printk("LAPIC pending clean-up")
+ break;
+}
while (queued);
with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.
[trenn@suse.de:
V2: Use tsc if avail to bail out after 1 sec due to possible virtual
apic_read calls which may take rather long (suggested by: Avi Kivity
<avi@redhat.com>) If no tsc is available bail out quickly after
cpu_khz, if we broke out too early and still have irqs pending (which
should never happen?) we still get a WARN_ON...
V3: - Fixed indentation -> checkpatch clean
- max_loops must be signed
V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case]
Cc: <jbohac@novell.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
LKML-Reference: <201005241913.o4OJDGWM010865@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
KVM: x86: Add missing locking to arch specific vcpu ioctls
KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
KVM: MMU: Segregate shadow pages with different cr0.wp
KVM: x86: Check LMA bit before set_efer
KVM: Don't allow lmsw to clear cr0.pe
KVM: Add cpuid.txt file
KVM: x86: Tell the guest we'll warn it about tsc stability
x86, paravirt: don't compute pvclock adjustments if we trust the tsc
x86: KVM guest: Try using new kvm clock msrs
KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
KVM: x86: add new KVMCLOCK cpuid feature
KVM: x86: change msr numbers for kvmclock
x86, paravirt: Add a global synchronization point for pvclock
x86, paravirt: Enable pvclock flags in vcpu_time_info structure
KVM: x86: Inject #GP with the right rip on efer writes
KVM: SVM: Don't allow nested guest to VMMCALL into host
KVM: x86: Fix exception reinjection forced to true
KVM: Fix wallclock version writing race
KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
...
Allow kdb to work properly with with earlyprintk=vga by interpreting
the backspace and carriage return output characters. These
interpretation of these characters is used for simple line editing
provided in the kdb shell.
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
If the kernel debugger was configured, attached and started with
kgdbwait, the hardware breakpoint registers should get restored by the
kgdb code which is managing the dr registers.
CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
It is not possible to use the hw_breakpoint.c API prior to mm_init(),
but it is possible to use hardware breakpoints with the kernel
debugger.
Prior to smp_init() it is possible to simply write to the dr registers
of the boot cpu directly. This can be used up until the
kgdb_arch_late() is invoked, at which point the standard hw_breakpoint.c
API will get used.
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The kernel debugger can operate well before mm_init(), but the x86
hardware breakpoint code which uses the perf api requires that the
kernel allocators are initialized.
This means the kernel debug core needs to provide an optional arch
specific call back to allow the initialization functions to run after
the kernel has been further initialized.
The kdb shell already had a similar restriction with an early
initialization and late initialization. The kdb_init() was moved into
the debug core's version of the late init which is called
dbg_late_init();
CC: kgdb-bugreport@lists.sourceforge.net
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Allow the x86 arch to have early exception processing for the purpose
of debugging via the kgdb.
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The only way the debugger can handle a trap in inside rcu_lock,
notify_die, or atomic_notifier_call_chain without a triple fault is
to have a low level "first opportunity handler" in the int3 exception
handler.
Generally this will be something the vast majority of folks will not
need, but for those who need it, it is added as a kernel .config
option called KGDB_LOW_LEVEL_TRAP.
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: H. Peter Anvin <hpa@zytor.com>
CC: x86@kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Remove all the references to the kgdb_post_primary_code. This
function serves no useful purpose because you can obtain the same
information from the "struct kgdb_state *ks" from with in the
debugger, if for some reason you want the data.
Also remove the unintentional duplicate assignment for ks->ex_vector.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
These are the minimum changes to the kgdb core in order to enable an
API to connect a new front end (kdb) to the debug core.
This patch introduces the dbg_kdb_mode variable controls where the
user level I/O is routed. It will be routed to the gdbstub (kgdb) or
to the kdb front end which is a simple shell available over the kgdboc
connection.
You can switch back and forth between kdb or the gdb stub mode of
operation dynamically. From gdb stub mode you can blindly type
"$3#33", or from the kdb mode you can enter "kgdb" to switch to the
gdb stub.
The logic in the debug core depends on kdb to look for the typical gdb
connection sequences and return immediately with KGDB_PASS_EVENT if a
gdb serial command sequence is detected. That should allow a
reasonably seamless transition between kdb -> gdb without leaving the
kernel exception state. The two gdb serial queries that kdb is
responsible for detecting are the "?" and "qSupported" packets.
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Martin Hicks <mort@sgi.com>
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (22 commits)
ACPI: fix early DSDT dmi check warnings on ia64
ACPICA: Update version to 20100428.
ACPICA: Update/clarify some parameter names associated with acpi_handle
ACPICA: Rename acpi_ex_system_do_suspend->acpi_ex_system_do_sleep
ACPICA: Prevent possible allocation overrun during object copy
ACPICA: Split large file, evgpeblk
ACPICA: Add GPE support for dynamically loaded ACPI tables
ACPICA: Clarify/rename some root table descriptor fields
ACPICA: Update version to 20100331.
ACPICA: Minimize the differences between linux GPE code and ACPICA code base
ACPI: add boot option acpi=copy_dsdt to fix corrupt DSDT
ACPICA: Update DSDT copy/detection.
ACPICA: Add subsystem option to force copy of DSDT to local memory
ACPICA: Add detection of corrupted/replaced DSDT
ACPICA: Add write support for DataTable operation regions
ACPICA: Fix for acpi_reallocate_root_table for incorrect root table copy
ACPICA: Update comments/headers, no functional change
ACPICA: Update version to 20100304
ACPICA: Fix for possible fault in acpi_ex_release_mutex
ACPICA: Standardize integer output for ACPICA warnings/errors
...
Traditionally, fatal MCE will cause Linux print error log to console
then reboot. Because MCE registers will preserve their content after
warm reboot, the hardware error can be logged to disk or network after
reboot. But system may fail to warm reboot, then you may lose the
hardware error log. ERST can help here. Through saving the hardware
error log into flash via ERST before go panic, the hardware error log
can be gotten from the flash after system boot successful again.
The fatal MCE processing procedure with ERST involved is as follow:
- Hardware detect error, MCE raised
- MCE read MCE registers, check error severity (fatal), prepare error record
- Write MCE error record into flash via ERST
- Go panic, then trigger system reboot
- System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
for error record of previous boot via ERST, and output and clear
them if available
- /sbin/mcelog logs error records into disk or network
ERST only accepts CPER record format, but there is no pre-defined CPER
section can accommodate all information in struct mce, so a customized
section type is defined to hold struct mce inside a CPER record as an
error section.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.
Now, only SCI notification type and memory errors are supported. More
notification type and hardware error type will be added later. These
memory errors are reported to user space through /dev/mcelog via
faking a corrected Machine Check, so that the error memory page can be
offlined by /sbin/mcelog if the error count for one page is beyond the
threshold.
On some machines, Machine Check can not report physical address for
some corrected memory errors, but GHES can do that. So this simplified
GHES is implemented firstly.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hpet: Add reference to chipset erratum documentation for disable-hpet-msi-quirk
x86, hpet: Restrict read back to affected ATI chipsets
We have an enum mrst_timer_options, use it so that the kernel knows if
we're missing something from a switch statement or equivalent.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1274295685-6774-4-git-send-email-jacob.jun.pan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
We have an enum, might as well use it. While we're at it, make it an
inline... there is really no point in calling a function for this
stuff.
LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Always-on local APIC timer (ARAT) has been introduced to Medfield, along
with the platform APB timers we have more timer configuration options
between Moorestown and Medfield.
This patch adds run-time detection of avaiable timer features so that
we can treat Medfield as a variant of Moorestown and set up the optimal
timer options for each platform. i.e.
Medfield: per cpu always-on local APIC timer
Moorestown: per cpu APB timer
Manual override is possible via cmdline option x86_mrst_timer.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-4-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Medfield is the follow-up of Moorestown, it is treated under the same
HW sub-architecture. However, we do need to know the CPU type in order
for some of the driver to act accordingly.
We also have different optimal clock configuration for each CPU type.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Some extra CPU features such as ARAT is needed in early boot so
that x86_init function pointers can be set up properly.
http://lkml.org/lkml/2010/5/18/519
At start_kernel() level, this patch moves init_scattered_cpuid_features()
from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than
platform specific x86_init layer setup. Suggested by HPA.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
If the HV told us we can fully trust the TSC, skip any
correction
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.
However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.
So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.
This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.
Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.
Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.
A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.
We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.
OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.
After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Zachary Amsden <zamsden@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Avi Kivity <avi@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.
Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Per document, for feature control MSR:
Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
of VMXON in SMX operation causes a general-protection exception.
Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
of VMXON outside SMX operation causes a general-protection exception.
This patch is to enable this kind of check with SMX for VMXON in KVM.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This snippet somehow escaped the commit:
| commit 137351e0fe
| Author: Cyrill Gorcunov <gorcunov@openvz.org>
| Date: Sat May 8 15:25:52 2010 +0400
|
| x86, perf: P4 PMU -- protect sensible procedures from preemption
so bring it eventually back. It helps to catch
preemption issue (if there will be, rule of thumb --
don't use raw_ if you can).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100518212439.167259349@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To prevent from clashes in future code modifications
do a real check for ESCR address being in hash. At
moment the callers are known to pass sane values but
better to be on a safe side.
And comment fix.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100518212439.004503600@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefined
x86, irq: Kill io_apic_renumber_irq
x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's.
x86, ioapic: Simplify probe_nr_irqs_gsi.
x86, ioapic: Optimize pin_2_irq
x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.
x86, ioapic: In mpparse use mp_register_ioapic
x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end
x86, ioapic: Fix the types of gsi values
x86, ioapic: Fix io_apic_redir_entries to return the number of entries.
x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.h
x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs
x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi
x86, acpi/irq: pci device dev->irq is an isa irq not a gsi
x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq
x86, acpi/irq: Introduce apci_isa_irq_to_gsi
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, fpu: Use static_cpu_has() to implement use_xsave()
x86: Add new static_cpu_has() function using alternatives
x86, fpu: Use the proper asm constraint in use_xsave()
x86, fpu: Unbreak FPU emulation
x86: Introduce 'struct fpu' and related API
x86: Eliminate TS_XSAVE
x86-32: Don't set ignore_fpu_irq in simd exception
x86: Merge kernel_math_error() into math_error()
x86: Merge simd_math_error() into math_error()
x86-32: Rework cache flush denied handler
Fix trivial conflict in arch/x86/kernel/process.c
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hypervisor: add missing <linux/module.h>
Modify the VMware balloon driver for the new x86_hyper API
x86, hypervisor: Export the x86_hyper* symbols
x86: Clean up the hypervisor layer
x86, HyperV: fix up the license to mshyperv.c
x86: Detect running on a Microsoft HyperV system
x86, cpu: Make APERF/MPERF a normal table-driven flag
x86, k8: Fix build error when K8_NB is disabled
x86, cacheinfo: Disable index in all four subcaches
x86, cacheinfo: Make L3 cache info per node
x86, cacheinfo: Reorganize AMD L3 cache structure
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, cacheinfo: Unify AMD L3 cache index disable checking
cpufreq: Unify sysfs attribute definition macros
powernow-k8: Fix frequency reporting
x86, cpufreq: Add APERF/MPERF support for AMD processors
x86: Unify APERF/MPERF support
powernow-k8: Add core performance boost support
x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo
Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and
drivers/cpufreq/cpufreq_ondemand.c
* 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix LOCK_PREFIX_HERE for uniprocessor build
x86, atomic64: In selftest, distinguish x86-64 from 586+
x86-32: Fix atomic64_inc_not_zero return value convention
lib: Fix atomic64_inc_not_zero test
lib: Fix atomic64_add_unless return value convention
x86-32: Fix atomic64_add_unless return value convention
lib: Fix atomic64_add_unless test
x86: Implement atomic[64]_dec_if_positive()
lib: Only test atomic64_dec_if_positive on archs having it
x86-32: Rewrite 32-bit atomic64 functions in assembly
lib: Add self-test for atomic64_t
x86-32: Allow UP/SMP lock replacement in cmpxchg64
x86: Add support for lock prefix in alternatives
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Use .cfi_sections for assembly code
x86-64: Reduce SMP locks table size
x86, asm: Introduce and use percpu_inc()
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits)
perf tools: Add mode to build without newt support
perf symbols: symbol inconsistency message should be done only at verbose=1
perf tui: Add explicit -lslang option
perf options: Type check all the remaining OPT_ variants
perf options: Type check OPT_BOOLEAN and fix the offenders
perf options: Check v type in OPT_U?INTEGER
perf options: Introduce OPT_UINTEGER
perf tui: Add workaround for slang < 2.1.4
perf record: Fix bug mismatch with -c option definition
perf options: Introduce OPT_U64
perf tui: Add help window to show key associations
perf tui: Make <- exit menus too
perf newt: Add single key shortcuts for zoom into DSO and threads
perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed
perf newt: Fix the 'A'/'a' shortcut for annotate
perf newt: Make <- exit the ui_browser
x86, perf: P4 PMU - fix counters management logic
perf newt: Make <- zoom out filters
perf report: Report number of events, not samples
perf hist: Clarify events_stats fields usage
...
Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Reduce stack_trace usage
lockdep: No need to disable preemption in debug atomic ops
lockdep: Actually _dec_ in debug_atomic_dec
lockdep: Provide off case for redundant_hardirqs_on increment
lockdep: Simplify debug atomic ops
lockdep: Fix redundant_hardirqs_on incremented with irqs enabled
lockstat: Make lockstat counting per cpu
i8253: Convert i8253_lock to raw_spinlock
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Add amd_iommu=off command line option
iommu-api: Remove iommu_{un}map_range functions
x86/amd-iommu: Implement ->{un}map callbacks for iommu-api
x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes
x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes
x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes
kvm: Change kvm_iommu_map_pages to map large pages
VT-d: Change {un}map_range functions to implement {un}map interface
iommu-api: Add ->{un}map callbacks to iommu_ops
iommu-api: Add iommu_map and iommu_unmap functions
iommu-api: Rename ->{un}map function pointers to ->{un}map_range
It might happen that an event can overflow without
the proper overflow flag set. Check the sign bit in
the raw counter value to solve this problem.
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: fweisbec@gmail.com
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1274083984.6540.15.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(At the moment the "SB700 Family Product Errata" document is available
at http://support.amd.com/us/Embedded_TechDocs/46837.pdf)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100517164324.GB10254@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Moorestown does not have BIOS provided MP tables, we can save some time
by avoiding scaning of these tables. e.g.
[ 0.000000] Scan SMP from c0000000 for 1024 bytes.
[ 0.000000] Scan SMP from c009fc00 for 1024 bytes.
[ 0.000000] Scan SMP from c00f0000 for 65536 bytes.
[ 0.000000] Scan SMP from c00bfff0 for 1024 bytes.
Searching EBDA with the base at 0x40E will also result in random pointer
deferencing within 1MB. This can be a problem in Lincroft if the pointer
hits VGA area and VGA mode is not enabled.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1273873281-17489-8-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Combining the softlockup and hardlockup code causes watchdog.c
to build even without the hardlockup detection support.
So if an arch, that has the previous and the new nmi watchdog
implementations cohabiting, wants to know if the generic one
is in use, CONFIG_LOCKUP_DETECTOR is not a reliable check.
We need to use CONFIG_HARDLOCKUP_DETECTOR instead.
Fixes:
kernel/built-in.o: In function `touch_nmi_watchdog':
(.text+0x449bc): multiple definition of `touch_nmi_watchdog'
arch/sparc/kernel/built-in.o:(.text+0x11b28): first defined here
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <20100514151121.GR15159@redhat.com>
[ use CONFIG_HARDLOCKUP_DETECTOR instead of CONFIG_PERF_EVENTS_NMI]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Jaswinder reported this #GP:
|
| Message from syslogd@ht at May 14 09:39:32 ...
| kernel:[ 314.908612] EIP: [<c100ccca>]
| x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70
|
Ming has narrowed it down to a comparision issue
between arguments with different sizes and
signs. As result event index reached a wrong
value which in turn led to a GP fault.
At the same time it was found that p4_next_cntr
has broken logic and should return the counter
index only if it was not yet borrowed for
another event.
Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Reported-by: Lin Ming <ming.m.lin@intel.com>
Bisected-by: Lin Ming <ming.m.lin@intel.com>
Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100514190815.GG13509@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, k8: Fix build error when K8_NB is disabled
x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
x86: Fix fake apicid to node mapping for numa emulation
If host CPU is exposed to a guest the OSVW MSRs are not guaranteed
to be present and a GP fault occurs. Thus checking the feature flag is
essential.
Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100427101348.GC4489@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
On some configs the following build error triggers:
arch/x86/kernel/apic/hw_nmi.c:35: error: 'apic' undeclared (first use in this function)
arch/x86/kernel/apic/hw_nmi.c:35: error: (Each undeclared identifier is reported only once
arch/x86/kernel/apic/hw_nmi.c:35: error: for each function it appears in.)
Because asm/apic.h was only included implicitly. Include it explicitly.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <1273713674-8434-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linear search over all p4 MSRs should be fine if only
we would not use it in events scheduling routine which
is pretty time critical. Lets use hashes. It should speed
scheduling up significantly.
v2: Steven proposed to use more gentle approach than issue
BUG on error, so we use WARN_ONCE now
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100512174242.GA5190@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The design of the hardlockup watchdog has changed and cruft was left
behind in the hw_nmi.c file. Just remove the code that isn't used
anymore.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-7-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
As part of the transition of the nmi watchdog to something more
generic, the trigger_all_cpu_backtrace code is getting left behind.
Put it in its own die_notifier so it can still be used.
V2:
- use arch_spin_locks
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-6-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector. Using Ingo's
suggestion, I combined the two functionalities into one file:
kernel/watchdog.c.
Now both the nmi_watchdog (or hardlockup detector) and softlockup
detector sit on top of the perf event subsystem, which is run every
60 seconds or so to see if there are any lockups.
To detect hardlockups, cpus not responding to interrupts, I
implemented an hrtimer that runs 5 times for every perf event
overflow event. If that stops counting on a cpu, then the cpu is
most likely in trouble.
To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled neither is anyone else and the
warning is printed to the console.
I tested this on x86_64 and both the softlockup and hardlockup paths
work.
V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- seperated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events
V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions
V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes
V5:
- split apart warn flags for hard and soft lockups
TODO:
- figure out how to make an arch-agnostic clock2cycles call
(if possible) to feed into perf events as a sample period
[fweisbec: merged conflict patch]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The ACPI spec tells us that the firmware will reenable SCI_EN on resume.
Reality disagrees in some cases. The ACPI spec tells us that the only way
to set SCI_EN is via an SMM call.
https://bugzilla.kernel.org/show_bug.cgi?id=13745 shows us that doing so
may break machines. Tracing the ACPI calls made by Windows shows that it
unconditionally sets SCI_EN on resume with a direct register write, and
therefore the overwhelming probability is that everything is fine with
this behaviour.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch adds a command line option to tell the AMD IOMMU
driver to not initialize any IOMMU it finds.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Fix kprobe/x86 to check removed int3 when failing to get kprobe
from hlist. Since we have a time window between checking int3
exists on probed address and getting kprobe on that address,
we can have following scenario:
-------
CPU1 CPU2
hit int3
check int3 exists
remove int3
remove kprobe from hlist
get kprobe from hlist
no kprobe->OOPS!
-------
This patch moves int3 checking if there is no kprobe on that
address for fixing this problem as follows:
------
CPU1 CPU2
hit int3
remove int3
remove kprobe from hlist
get kprobe from hlist
no kprobe->check int3 exists
->rollback&retry
------
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100427223348.2322.9112.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently all fpu state access is through tsk->thread.xstate. Since we wish
to generalize fpu access to non-task contexts, wrap the state in a new
'struct fpu' and convert existing access to use an fpu API.
Signal frame handlers are not converted to the API since they will remain
task context only things.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The fpu code currently uses current->thread_info->status & TS_XSAVE as
a way to distinguish between XSAVE capable processors and older processors.
The decision is not really task specific; instead we use the task status to
avoid a global memory reference - the value should be the same across all
threads.
Eliminate this tie-in into the task structure by using an alternative
instruction keyed off the XSAVE cpu feature; this results in shorter and
faster code, without introducing a global memory reference.
[ hpa: in the future, this probably should use an asm jmp ]
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
edac_mce module is an interface module that gets mcelog data and
forwards to any registered edac module that expects to receive data via
mce.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
EXPORT_SYMBOL() needs <linux/module.h> to be included; fixes modular
builds of the VMware balloon driver, and any future modular drivers
which depends on the hypervisor.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ky Srinivasan <ksrinivasan@novell.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
LKML-Reference: <4BE49778.6060800@zytor.com>
Export x86_hyper and the related specific structures, allowing for
hypervisor identification by modules.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ky Srinivasan <ksrinivasan@novell.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
LKML-Reference: <4BE49778.6060800@zytor.com>
RAW events are special and we should be ready for user passing
in insane event index values.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112717.315897547@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The caller already has done such a check.
And it was wrong anyway, it had to be '>=' rather than '>'
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112717.130386882@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven reported:
|
| I'm getting:
|
| Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727
| Call Trace:
| [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0
| [<ffffffff81019874>] p4_hw_config+0x2b/0x15c
| [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f
| [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be
| [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c
| [<ffffffff810c68b2>] T.850+0x273/0x42e
| [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1
| [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69
| [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b
|
| When running perf record in latest tip/perf/core
|
Due to the fact that p4 counters are shared between HT threads
we synthetically divide the whole set of counters into two
non-intersected subsets. And while we're "borrowing" counters
from these subsets we should not be preempted (well, strictly
speaking in p4_hw_config we just pre-set reference to the
subset which allow to save some cycles in schedule routine
if it happens on the same cpu). So use get_cpu/put_cpu pair.
Also p4_pmu_schedule_events should use smp_processor_id rather
than raw_ version. This allow us to catch up preemption issue
(if there will ever be).
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112716.963478928@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If an event is not RAW we should not exit p4_hw_config
early but call x86_setup_perfctr as well.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clean up the hypervisor layer and the hypervisor drivers, using an ops
structure instead of an enumeration with if statements.
The identity of the hypervisor, if needed, can be tested by testing
the pointer value in x86_hyper.
The MS-HyperV private state is moved into a normal global variable
(it's per-system state, not per-CPU state). Being a normal bss
variable, it will be left at all zero on non-HyperV platforms, and so
can generally be tested for HyperV-specific features without
additional qualification.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Greg KH <greg@kroah.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ky Srinivasan <ksrinivasan@novell.com>
LKML-Reference: <4BE49778.6060800@zytor.com>
This should have been GPLv2 only, we cut and pasted from the wrong file
originally, sorry.
Also removed some unneeded boilerplate license code, we all know where
to find the GPLv2, and that there's no warranty as that is implicit from
the license.
Cc: Ky Srinivasan <ksrinivasan@novell.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
LKML-Reference: <20100507235541.GA15448@kroah.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Check hlt instruction was targeted for some older CPUs. It is an expensive
operation in that it takes 4 ticks to break out the check. We can avoid
such check completely for newer x86 cpus (family >= 5).
[ hpa: corrected family > 5 to family >= 5 ]
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1273269585-14346-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Convert to the transactional PMU API and remove the duplication of
group_sched_in().
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1272002172.5707.61.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename perf_event_attr::precise to perf_event_attr::precise_ip and
widen it to 2 bits. This new field describes the required precision of
the PERF_SAMPLE_IP field:
0 - SAMPLE_IP can have arbitrary skid
1 - SAMPLE_IP must have constant skid
2 - SAMPLE_IP requested to have 0 skid
3 - SAMPLE_IP must have 0 skid
And modify the Intel PEBS code accordingly. The PEBS implementation
now supports up to precise_ip == 2, where we perform the IP fixup.
Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit
should be set for each PERF_SAMPLE_IP field known to match the actual
instruction triggering the event.
This new scheme allows for a PEBS mode that uses the buffer for more
than a single event.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Its broken, we really should get PERF_SAMPLE_REGS sorted.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There may exist constraints with a cmask set to zero. In this case
for_each_event_constraint() will not work properly. Now weight is used
instead of the cmask for loop exit detection. Weight is always a value
other than zero since the default contains the HWEIGHT from the
counter mask and in other cases a value of zero does not fit too.
This is in preparation of ibs event constraints that wont have a
cmask.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To reuse this function for events with different enable bit masks,
this mask is part of the function's argument list now.
The function will be used later to control ibs events too.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-6-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perfctr setup calls are in the corresponding .hw_config()
functions now. This makes it possible to introduce config functions
for other pmu events that are not perfctr specific.
Also, all of a sudden the code looks much nicer.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move x86_setup_perfctr(), no other changes made.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Split __hw_perf_event_init() to configure pmu events other than
perfctrs. Perfctr code is moved to a separate function
x86_setup_perfctr(). This and the following patches refactor the code.
Split in multiple patches for better review.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch integrates HyperV detection within the framework currently
used by VmWare. With this patch, we can avoid having to replicate the
HyperV detection code in each of the Microsoft HyperV drivers.
Reworked and tweaked by Greg K-H to build properly.
Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
LKML-Reference: <20100506190841.GA1605@kroah.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that the generic irq layer is performing the exact same remapping as
io_apic_renumber_irq we can kill this weird es7000 specific function.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-15-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
ACPI irq source overrides are allowed for the 16 isa irqs and are
allowed to map any gsi to any isa irq. A few motherboards have been
seen to take advantage of this and put the isa irqs on the 2nd or
3rd ioapic. This causes some problems, most notably the fact
that we can not use any gsi < 16.
To correct this move the gsis that are not isa irqs and have
a gsi number < 16 into the linux irq space just past gsi_end.
This is what the es7000 platform is doing today. Moving only the
low 16 gsis above the rest of the gsi's only penalizes weird
platforms, leaving sane acpi implementations with a 1-1 mapping
of gsis and irqs.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-14-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use the global gsi_end value now that all ioapics have
valid gsi numbers instead of a combination of acpi_probe_gsi
and walking all of the ioapics and couting their number of
entries by hand if acpi_probe_gsi gave us an answer we did
not like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-13-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that all ioapics have valid gsi_base values use this to
accellerate pin_2_irq. In the case of acpi this also ensures
that pin_2_irq will compute the same irq value for an ioapic
pin as acpi will.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-12-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that all ioapic registration happens in mp_register_ioapic we can
move the calculation of nr_ioapic_registers there from enable_IO_APIC.
The number of ioapic registers is already calucated in mp_register_ioapic
so all that really needs to be done is to save the caluclated value
in nr_ioapic_registers.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-11-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Long ago MP_ioapic_info was the primary way of setting up our
ioapic data structures and mp_register_ioapic was a compatibility
shim for acpi code. Now the situation is reversed and
and mp_register_ioapic is the primary way of setting up our
ioapic data structures.
Keep the setting up of ioapic data structures uniform by
having mp_register_ioapic call mp_register_ioapic.
This changes a few fields:
- type: is now hardset to MP_IOAPIC but type had to
bey MP_IOAPIC or MP_ioapic_info would not have been called.
- flags: is now hard coded to MPC_APIC_USABLE.
We require flags to contain at least MPC_APIC_USEBLE in
MP_ioapic_info and we don't ever examine flags so dropping
a few flags that might possibly exist that we have never
used is harmless.
- apicaddr: Unchanged
- apicver: Read from the ioapic instead of using the cached
hardware value in the MP table. The real hardware value
will be more accurate.
- apicid: Now verified to be unique and changed if it is not.
If the BIOS got this right this is a noop. If the BIOS did
not fixing things appears to be the better solution.
This adds gsi_base and gsi_end values to our ioapics defined with
the mpatable, which will make our lives simpler later since
we can always assume gsi_base and gsi_end are valid.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-10-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add the global variable gsi_end and teach mp_register_ioapic
to keep it uptodate as we add more ioapics into the system.
ioapics can only be added early in boot so the code that
runs later can treat gsi_end as a constant.
Remove the have hacks in sfi.c to second guess mp_register_ioapic
by keeping t's own running total of how many gsi's have been seen,
and instead use the gsi_end.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-9-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patches fixes the types of gsi_base and gsi_end values in
struct mp_ioapic_gsi, and the gsi parameter of mp_find_ioapic
and mp_find_ioapic_pin
A gsi is cannonically a u32, not an int.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-8-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
io_apic_redir_entries has a huge conceptual bug. It returns the maximum
redirection entry not the number of redirection entries. Which simply
does not match what the name of the function. This just caught me
and it caught Feng Tang, and Len Brown when they wrote sfi_parse_ioapic.
Modify io_apic_redir_entries to actually return the number of redirection
entries, and fix the callers so that they properly handle receiving the
number of the number of redirection table entries, instead of the
number of redirection table entries less one.
While the usage in sfi.c does not show up in this patch it is fixed
by virtue of the fact that io_apic_redir_entries now has the semantics
sfi_parse_ioapic most reasonably expects.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-7-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove the assumption that there is not an override for isa irq 0.
Instead lookup the gsi and from that lookup the ioapic and pin of each
isa irq indivdually.
In general this should not have any behavioural affect but in
perverse cases this gets all of the details correct, instead of
doing something weird.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-5-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Currently acpi_sci_ioapic_setup calls mp_override_legacy_irq with
bus_irq == gsi, which is wrong if we are comming from an override
Instead pass the bus_irq into acpi_sci_ioapic_setup.
This fix was inspired by a similar fix from:
Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-4-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In perverse acpi implementations the isa irqs are not identity mapped
to the first 16 gsi. Furthermore at least the extended interrupt
resource capability may return gsi's and not isa irqs. So since
what we get from acpi is a gsi teach acpi_get_overrride_irq to
operate on a gsi instead of an isa_irq.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-2-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There are a number of cases where the current code makes the assumption
that isa irqs identity map to the first 16 acpi global system intereupts.
In most instances that assumption is correct as that is the required
behaviour in dual i8259 mode and the default behavior in ioapic mode.
However there are some systems out there that take advantage of acpis
interrupt remapping for the isa irqs to have a completely different
mapping of isa_irq to gsi.
Introduce acpi_isa_irq_to_gsi to perform this mapping explicitly in the
code that needs it. Initially this will be just the current assumed
identity mapping to ensure it's introduction does not cause regressions.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-1-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
APERF/MPERF can be handled via the table like all the other scattered
CPU flags.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
Any processor that supports simd will have an internal fpu, and the
irq13 handler will not be enabled.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Clean up the kernel exception handling and make it more similar to
the other traps.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The only difference between FPU and SIMD exceptions is where the
status bits are read from (cwd/swd vs. mxcsr). This also fixes
the discrepency introduced by commit adf77bac, which fixed FPU
but not SIMD.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The cache flush denied error is an erratum on some AMD 486 clones. If an invd
instruction is executed in userspace, the processor calls exception 19 (13 hex)
instead of #GP (13 decimal). On cpus where XMM is not supported, redirect
exception 19 to do_general_protection(). Also, remove die_if_kernel(), since
this was the last user.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
With F10, model 10, all valid frequencies are in the ACPI _PST table.
Cc: <stable@kernel.org> # 33.x 32.x
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current policies of breakpoints in x86 and SH are the following:
- task bound breakpoints can only break on userspace addresses
- cpu wide breakpoints can only break on kernel addresses
The former rule prevents ptrace breakpoints to be set to trigger on
kernel addresses, which is good. But as a side effect, we can't
breakpoint on kernel addresses for task bound breakpoints.
The latter rule simply makes no sense, there is no reason why we
can't set breakpoints on userspace while performing cpu bound
profiles.
We want the following new policies:
- task bound breakpoint can set userspace address breakpoints, with
no particular privilege required.
- task bound breakpoints can set kernelspace address breakpoints but
must be privileged to do that.
- cpu bound breakpoints can do what they want as they are privileged
already.
To implement these new policies, this patch checks if we are dealing
with a kernel address breakpoint, if so and if the exclude_kernel
parameter is set, we tell the user that the breakpoint is invalid,
which makes a good generic ptrace protection.
If we don't have exclude_kernel, ensure the user has the right
privileges as kernel breakpoints are quite sensitive (risk of
trap recursion attacks and global performance impacts).
[ Paul Mundt: keep addr space check for sh signal delivery and fix
double function declaration]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Tag ptrace breakpoints with the exclude_kernel attribute set. This
will make it easier to set generic policies on breakpoints, when it
comes to ensure nobody unpriviliged try to breakpoint on the kernel.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Upstream PV guests fail to boot because of a NULL pointer in
irq_force_complete_move(). It is possible that xen guests have
irq_desc->chip_data = NULL.
Test for NULL chip_data pointer before attempting to complete an irq move.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
LKML-Reference: <20100427152434.16193.49104.sendpatchset@prarit.bos.redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org> [2.6.33]
Merge reason:
Conflict between LOCK_PREFIX_HERE and relative alternatives
pointers
Resolved Conflicts:
arch/x86/include/asm/alternative.h
arch/x86/kernel/alternative.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Disable large pages on CPUs with Atom erratum AAE44
x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
x86, mrst: Conditionally register cpu hotplug notifier for apbt
After programming the HPET, we do a readback as a workaround for
ATI/SBx00 chipsets as a synchronization. Unfortunately this triggers
an erratum in newer ICH chipsets (ICH9+) where reading the comparator
immediately after the write returns the old value. Furthermore, as
always, I/O reads are bad for performance.
Therefore, restrict the readback to the chipsets that need it, or, for
debugging purposes, when we are running with hpet=verbose.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
It's not used by any module, and i386 (as well as some other arches)
also doesn't export its equivalent (swapper_pg_dir).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF33BD020000780003B3E4@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reduce the SMP locks table size by using relative pointers instead of
absolute ones, thus cutting the table size by half.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF30FE020000780003B3B6@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This is a standalone version of VMware Balloon driver. Ballooning is a
technique that allows hypervisor dynamically limit the amount of memory
available to the guest (with guest cooperation). In the overcommit
scenario, when hypervisor set detects that it needs to shuffle some
memory, it instructs the driver to allocate certain number of pages, and
the underlying memory gets returned to the hypervisor. Later hypervisor
may return memory to the guest by reattaching memory to the pageframes and
instructing the driver to "deflate" balloon.
We are submitting a standalone driver because KVM maintainer (Avi Kivity)
expressed opinion (rightly) that our transport does not fit well into
virtqueue paradigm and thus it does not make much sense to integrate with
virtio.
There were also some concerns whether current ballooning technique is the
right thing. If there appears a better framework to achieve this we are
prepared to evaluate and switch to using it, but in the meantime we'd like
to get this driver upstream.
We want to get the driver accepted in distributions so that users do not
have to deal with an out-of-tree module and many distributions have
"upstream first" requirement.
The driver has been shipping for a number of years and users running on
VMware platform will have it installed as part of VMware Tools even if it
will not come from a distribution, thus there should not be additional
risk in pulling the driver into mainline. The driver will only activate
if host is VMware so everyone else should not be affected at all.
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atom erratum AAE44/AAF40/AAG38/AAH41:
"If software clears the PS (page size) bit in a present PDE (page
directory entry), that will cause linear addresses mapped through this
PDE to use 4-KByte pages instead of using a large page after old TLB
entries are invalidated. Due to this erratum, if a code fetch uses
this PDE before the TLB entry for the large page is invalidated then
it may fetch from a different physical address than specified by
either the old large page translation or the new 4-KByte page
translation. This erratum may also cause speculative code fetches from
incorrect addresses."
[http://download.intel.com/design/processor/specupdt/319536.pdf]
Where as commit 211b3d03c7 seems to
workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
opportunity for the bug to occur and does not totally remove it. This
patch disables mixed 4K/4MB page tables totally avoiding the page
splitting and not tripping this processor issue.
This is based on an original patch by Colin King.
Originally-by: Colin Ian King <colin.king@canonical.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
Cc: <stable@kernel.org>
When we do a thread switch, we clear the outgoing FS/GS base if the
corresponding selector is nonzero. This is taken by __switch_to() as
an entry invariant; it does not verify that it is true on entry.
However, copy_thread() doesn't enforce this constraint, which can
result in inconsistent results after fork().
Make copy_thread() match the behavior of __switch_to().
Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4BD1E061.8030605@zytor.com>
Cc: <stable@kernel.org>
When disabling an L3 cache index, make sure we disable that index in
all four subcaches of the L3. Clarify nomenclature while at it, wrt to
disable slots versus disable index and rename accordingly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-6-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Currently, we're allocating L3 cache info and calculating indices for
each online cpu which is clearly superfluous. Instead, we need to do
this per-node as is each L3 cache.
No functional change, only per-cpu memory savings.
-v2: Allocate L3 cache descriptors array dynamically.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-5-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add a struct representing L3 cache attributes (subcache sizes and
indices count) and move the respective members out of _cpuid4_info.
Also, stash the struct pci_dev ptr into the struct simplifying the code
even more.
There should be no functionality change resulting from this patch except
slightly slimming the _cpuid4_info per-cpu vars.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-4-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
All F10h CPUs starting with model 8 resp. 9, stepping 1, support L3
cache index disable. Concentrate the family, model, stepping checking at
one place and enable the feature implicitly on upcoming Fam10h models.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-2-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
APB timer is used on Moorestown platforms but not on a standard PC.
If APB timer code is compiled in but not initialized at run-time due
to lack of FW reported SFI table, kernel would panic when the non-boot
CPUs are offlined and notifier is called.
https://bugzilla.kernel.org/show_bug.cgi?id=15786
This patch ensures CPU hotplug notifier for APB timer is only registered
when the APBT timer block is initialized.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1271701423-1162-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Some BIOS on Toshiba machines corrupt the DSDT, so add a new
boot option acpi=copy_dsdt to workaround it.
Add warning message to ask users to use this option if corrupt DSDT detected.
Also build a DMI blacklist to check it and automatically copy DSDT.
https://bugzilla.kernel.org/show_bug.cgi?id=14679
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Below patch introduces perf_guest_info_callbacks and related
register/unregister functions. Add more PERF_RECORD_MISC_XXX bits
meaning guest kernel and guest user space.
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Fix all sparse warnings in building uv_irq.c.
arch/x86/kernel/uv_irq.c:46:17: warning: symbol 'uv_irq_chip' was not declared. Should it be static?
arch/x86/kernel/uv_irq.c:143:50: error: no identifier for function argument
arch/x86/kernel/uv_irq.c:162:13: error: typename in expression
arch/x86/kernel/uv_irq.c:162:13: error: undefined identifier 'restrict'
arch/x86/kernel/uv_irq.c:250:44: error: no identifier for function argument
arch/x86/kernel/uv_irq.c:260:17: error: typename in expression
arch/x86/kernel/uv_irq.c:260:17: error: undefined identifier 'restrict'
arch/x86/kernel/uv_irq.c:233:50: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/uv_irq.c:233:50: expected int *pnode
arch/x86/kernel/uv_irq.c:233:50: got unsigned int *<noident>
arch/x86/include/asm/uv/uv_hub.h:318:44: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uv/uv_hub.h:318:44: expected void volatile [noderef] <asn:2>*addr
arch/x86/include/asm/uv/uv_hub.h:318:44: got unsigned long *
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100416175142.f4b59683.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- increase performance of the interrupt handler
- release timed-out software acknowledge resources
- recover from continuous-busy status due to a hardware issue
- add a 'throttle' to keep a uvhub from sending more than a
specified number of broadcasts concurrently (work around the hardware issue)
- provide a 'nobau' boot command line option
- rename 'pnode' and 'node' to 'uvhub' (the 'node' terminology
is ambiguous)
- add some new statistics about the scope of broadcasts, retries, the
hardware issue and the 'throttle'
- split off new function uv_bau_retry_msg() from
uv_bau_process_message() per community coding style feedback.
- simplify the argument list to uv_bau_process_message(), per
community coding style feedback.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: linux-mm@kvack.org
Cc: Jack Steiner <steiner@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <E1O25Z4-0004Ur-PB@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No need to seek on this file, so prevent it outright so we can
avoid using default_llseek - removes one more BKL usage.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[drop useless llseek = no_llseek and smp_lock.h inclusion]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnd Bergmann <arnd@relay.de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
LKML-Reference: <1270910781-8786-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With F10, model 10, all valid frequencies are in the ACPI _PST table.
Cc: <stable@kernel.org> # 33.x 32.x
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Starting with model 10 of Family 0x10, AMD processors may have
support for APERF/MPERF. Add support for identifying it and using
it within cpufreq. Move the APERF/MPERF functions out of the
acpi-cpufreq code and into their own file so they can easily be
shared.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <20100401141956.GA1930@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Initialize this CPUID flag feature in common code. It could be made a
standalone function later, maybe, if more functionality is duplicated.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Starting with F10h, revE, AMD processors add support for a dynamic
core boosting feature called Core Performance Boost. When a specific
condition is present, a subset of the cores on a system are boosted
beyond their P0 operating frequency to speed up the performance of
single-threaded applications.
In the normal case, the system comes out of reset with core boosting
enabled. This patch adds a sysfs knob with which core boosting can be
switched on or off for benchmarking purposes.
While at it, make the CPB code hotplug-aware so that taking cores
offline wouldn't interfere with boosting the remaining online cores.
Furthermore, add cpu_online_mask hotplug protection as suggested by
Andrew.
Finally, cleanup the driver init codepath and update copyrights.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-3-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
By semi-popular demand, this adds the Core Performance Boost feature
flag to /proc/cpuinfo. Possible use case for this is userspace tools
like cpufreq-aperf, for example, so that they don't have to jump through
hoops of accessing "/dev/cpu/%d/cpuid" in order to check for CPB hw
support, or call cpuid from userspace.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-2-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When we fetch the hot regs and rewind to the nth caller, it
might happen that we dereference a frame pointer outside the
kernel stack boundaries, like in this example:
perf_trace_sched_switch+0xd5/0x120
schedule+0x6b5/0x860
retint_careful+0xd/0x21
Since we directly dereference a userspace frame pointer here while
rewinding behind retint_careful, this may end up in a crash.
Fix this by simply using probe_kernel_address() when we rewind the
frame pointer.
This issue will have a much more proper fix in the next version of the
perf_arch_fetch_caller_regs() API that will only need to rewind to the
first caller.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Miller <davem@davemloft.net>
Cc: Archs <linux-arch@vger.kernel.org>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
x86: Increase CONFIG_NODES_SHIFT max to 10
ibft, x86: Change reserve_ibft_region() to find_ibft_region()
x86, hpet: Fix bug in RTC emulation
x86, hpet: Erratum workaround for read after write of HPET comparator
bootmem, x86: Fix 32bit numa system without RAM on node 0
nobootmem, x86: Fix 32bit numa system without RAM on node 0
x86: Handle overlapping mptables
x86: Make e820_remove_range to handle all covered case
x86-32, resume: do a global tlb flush in S4 resume
If we boot into a crash-kernel the gart might still be
enabled and its caches might be dirty. This can result in
undefined behavior later. Fix it by explicitly disabling the
gart hardware before initialization and flushing the caches
after enablement.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This effectively reverts commit 61d047be99.
Disabling the IOMMU can potetially allow DMA transactions to
complete without being translated. Leave it enabled, and allow
crash kernel to do the IOMMU reinitialization properly.
Cc: stable@kernel.org
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
To catch future potential issues we can add a warning whenever we issue
a command before the command buffer is fully initialized.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Hit another kdump problem as reported by Neil Horman. When initializaing
the IOMMU, we attach devices to their domains before the IOMMU is
fully (re)initialized. Attaching a device will issue some important
invalidations. In the context of the newly kexec'd kdump kernel, the
IOMMU may have stale cached data from the original kernel. Because we
do the attach too early, the invalidation commands are placed in the new
command buffer before the IOMMU is updated w/ that buffer. This leaves
the stale entries in the kdump context and can renders device unusable.
Simply enable the IOMMU before we do the attach.
Cc: stable@kernel.org
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
According to Intel Software Devel Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).
Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is useless now that we have a pure stack frame
walker, as given addr are always reliable.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Jan Grossmann reported kernel boot panic while booting SMP
kernel on his system with a single core cpu. SMP kernels call
enable_IR_x2apic() from native_smp_prepare_cpus() and on
platforms where the kernel doesn't find SMP configuration we
ended up again calling enable_IR_x2apic() from the
APIC_init_uniprocessor() call in the smp_sanity_check(). Thus
leading to kernel panic.
Don't call enable_IR_x2apic() and default_setup_apic_routing()
from APIC_init_uniprocessor() in CONFIG_SMP case.
NOTE: this kind of non-idempotent and assymetric initialization
sequence is rather fragile and unclean, we'll clean that up
in v2.6.35. This is the minimal fix for v2.6.34.
Reported-by: Jan.Grossmann@kielnet.net
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <jbarnes@virtuousgeek.org>
Cc: <david.woodhouse@intel.com>
Cc: <weidong.han@intel.com>
Cc: <youquan.song@intel.com>
Cc: <Jan.Grossmann@kielnet.net>
Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x]
LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
According to the Xeon-5600 errata the Westmere suffers the same PMU
programming bug as the original Nehalem did.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All variables that have __initconst should also be const.
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane noticed that the ANY flag was in generic arch code, and Cyrill
reported that it broke the P4 code.
Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and
provide intel_pmu and amd_pmu specific versions of this callback.
The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra
event bits AMD64 has.
Reported-by: Stephane Eranian <eranian@google.com>
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Robert Richter <robert.richter@amd.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269968113.5258.442.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ARCH_PERFMON_EVENTSEL bit masks are often used in the kernel. This
patch adds macros for the bit masks and removes local defines. The
function intel_pmu_raw_event() becomes x86_pmu_raw_event() which is
generic for x86 models and same also for p6. Duplicate code is
removed.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330092821.GH11907@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The big rename:
cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events
accidentally renamed some members of stucts that were named after
registers in the spec. To avoid confusion this patch reverts some
changes. The related specs are MSR descriptions in AMD's BKDGs and the
ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32
Architectures Software Developer's Manuals.
This patch does:
$ sed -i -e 's:num_events:num_counters:g' \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event_amd.c \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_intel.c \
arch/x86/kernel/cpu/perf_event_p6.c \
arch/x86/kernel/cpu/perf_event_p4.c \
arch/x86/oprofile/op_model_ppro.c
$ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \
arch/x86/kernel/cpu/perf_event_amd.c \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_intel.c \
arch/x86/kernel/cpu/perf_event_p6.c \
arch/x86/kernel/cpu/perf_event_p4.c
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
stopped after the first function, because it has seen a garbage memory
address (tried to interpret the frame pointer, and return address as a
64-bit pointer).
Fix this by using a struct stack_frame with 32-bit pointers when the
TIF_IA32 flag is set.
Note that TIF_IA32 flag must be used, and not is_compat_task(), because
the latter is only set when the 32-bit process is executing a syscall,
which may not always be the case (when tracing page fault events for
example).
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
LKML-Reference: <1268820436-13145-1-git-send-email-edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 3f6da39 ("perf: Rework and fix the arch CPU-hotplug hooks") moved
the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP
however amd_nb_id() doesn't work yet on prepare so it would simply bail
basically reverting to a state where we do not properly track node wide
constraints - causing weird perf results.
Fix up the AMD NorthBridge initialization code by allocating from
CPU_UP_PREPARE and installing it from CPU_STARTING once we have the
proper nb_id. It also properly deals with the allocation failing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ robustify using amd_has_nb() ]
Signed-off-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <1269353485.5109.48.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because we need to have cpu identification things done by the time we run
CPU_STARTING notifiers.
( This init ordering will be relied on by the next fix. )
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269353485.5109.48.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This allows arch code could decide the way to reserve the ibft.
And we should reserve ibft as early as possible, instead of BOOTMEM
stage, in case the table is in RAM range and is not reserved by BIOS
(this will often be the case.)
Move to just after find_smp_config().
Also when CONFIG_NO_BOOTMEM=y, We will not have reserve_bootmem() anymore.
-v2: fix typo about ibft pointed by Konrad Rzeszutek Wilk <konrad@darnok.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4BB510FB.80601@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Jones <pjones@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
CC: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
We think there exists a bug in the HPET code that emulates the RTC.
In the normal case, when the RTC frequency is set, the rtc driver tells
the hpet code about it here:
int hpet_set_periodic_freq(unsigned long freq)
{
uint64_t clc;
if (!is_hpet_enabled())
return 0;
if (freq <= DEFAULT_RTC_INT_FREQ)
hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq;
else {
clc = (uint64_t) hpet_clockevent.mult * NSEC_PER_SEC;
do_div(clc, freq);
clc >>= hpet_clockevent.shift;
hpet_pie_delta = (unsigned long) clc;
}
return 1;
}
If freq is set to 64Hz (DEFAULT_RTC_INT_FREQ) or lower, then
hpet_pie_limit (a static) is set to non-zero. Then, on every one-shot
HPET interrupt, hpet_rtc_timer_reinit is called to compute the next
timeout. Well, that function has this logic:
if (!(hpet_rtc_flags & RTC_PIE) || hpet_pie_limit)
delta = hpet_default_delta;
else
delta = hpet_pie_delta;
Since hpet_pie_limit is not 0, hpet_default_delta is used. That
corresponds to 64Hz.
Now, if you set a different rtc frequency, you'll take the else path
through hpet_set_periodic_freq, but unfortunately no one resets
hpet_pie_limit back to 0.
Boom....now you are stuck with 64Hz RTC interrupts forever.
The patch below just resets the hpet_pie_limit value when requested freq
is greater than DEFAULT_RTC_INT_FREQ, which we think fixes this problem.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <201003112200.o2BM0Hre012875@imap1.linux-foundation.org>
Signed-off-by: Daniel Hecht <dhecht@vmware.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote:
> Hello,
>
> Again, on the Intel DP55KG board:
>
> # uname -a
> Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux
>
> [ 1.237600] ------------[ cut here ]------------
> [ 1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80()
> [ 1.238221] Hardware name:
> [ 1.238504] hpet: compare register read back failed.
> [ 1.238793] Modules linked in:
> [ 1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1
> [ 1.239605] Call Trace:
> [ 1.239886] <IRQ> [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0
> [ 1.240409] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
> [ 1.240699] [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50
> [ 1.240992] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
> [ 1.241281] [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80
> [ 1.241573] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
> [ 1.241859] [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100
> [ 1.246533] [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30
> [ 1.246826] [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0
> [ 1.247118] [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160
> [ 1.247407] [<ffffffff81029f55>] ? handle_irq+0x15/0x20
> [ 1.247689] [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0
> [ 1.247976] [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa
> [ 1.248262] <EOI> [<ffffffff8102f277>] ? mwait_idle+0x57/0x80
> [ 1.248796] [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0
> [ 1.249080] ---[ end trace db7f668fb6fef4e1 ]---
>
> Is this something Intel has to fix or is it a bug in the kernel?
This is a chipset erratum.
Thomas: You mentioned we can retain this check only for known-buggy and
hpet debug kind of options. But here is the simple workaround patch for
this particular erratum.
Some chipsets have a erratum due to which read immediately following a
write of HPET comparator returns old comparator value instead of most
recently written value.
Erratum 15 in
"Intel I/O Controller Hub 9 (ICH9) Family Specification Update"
(http://www.intel.com/assets/pdf/specupdate/316973.pdf)
Workaround for the errata is to read the comparator twice if the first
one fails.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: <stable@kernel.org>
We found a system where the MP table MPC and MPF structures overlap.
That doesn't really matter because the mptable is not used anyways with ACPI,
but it leads to a panic in the early allocator due to the overlapping
reservations in 2.6.33.
Earlier kernels handled this without problems.
Simply change these reservations to reserve_early_overlap_ok to avoid
the panic.
Reported-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100329074111.GA22821@basil.fritz.box>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
It is required to call hw_breakpoint_init() on an attr before using it
in any other calls. This fixes the problem where kgdb will sometimes
fail to initialize on x86_64.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: 2.6.33 <stable@kernel.org>
LKML-Reference: <1269975907-27602-1-git-send-email-jason.wessel@windriver.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().
Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.
Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.
This makes the task migration event working and fix the context
switch callchains and origin ip.
Example: perf record -a -e cs
Before:
10.91% ksoftirqd/0 0 [k] 0000000000000000
|
--- (nil)
perf_callchain
perf_prepare_sample
__perf_event_overflow
perf_swevent_overflow
perf_swevent_add
perf_swevent_ctx_event
do_perf_sw_event
__perf_sw_event
perf_event_task_sched_out
schedule
run_ksoftirqd
kthread
kernel_thread_helper
After:
23.77% hald-addon-stor [kernel.kallsyms] [k] schedule
|
--- schedule
|
|--60.00%-- schedule_timeout
| wait_for_common
| wait_for_completion
| blk_execute_rq
| scsi_execute
| scsi_execute_req
| sr_test_unit_ready
| |
| |--66.67%-- sr_media_change
| | media_changed
| | cdrom_media_changed
| | sr_block_media_changed
| | check_disk_change
| | cdrom_open
v2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Rusty found on lguest with trim_bios_range, max_pfn is not right anymore, and
looks e820_remove_range does not work right.
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] LGUEST: 0000000000000000 - 0000000004000000 (usable)
[ 0.000000] Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS!
[ 0.000000] DMI not present or invalid.
[ 0.000000] last_pfn = 0x3fa0 max_arch_pfn = 0x100000
[ 0.000000] init_memory_mapping: 0000000000000000-0000000003fa0000
root cause is: the e820_remove_range doesn't handle the all covered
case. e820_remove_range(BIOS_START, BIOS_END - BIOS_START, ...)
produces a bogus range as a result.
Make it match e820_update_range() by handling that case too.
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <4BB18E55.6090903@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
When CONFIG_NO_BOOTMEM=y, it could use memory more effiently, or
in a more compact fashion.
Example:
Allocated new RAMDISK: 00ec2000 - 0248ce57
Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56
The new RAMDISK's end is not page aligned.
Last page could be shared with other users.
When free_init_pages are called for initrd or .init, the page
could be freed and we could corrupt other data.
code segment in free_init_pages():
| for (; addr < end; addr += PAGE_SIZE) {
| ClearPageReserved(virt_to_page(addr));
| init_page_count(virt_to_page(addr));
| memset((void *)(addr & ~(PAGE_SIZE-1)),
| POISON_FREE_INITMEM, PAGE_SIZE);
| free_page(addr);
| totalram_pages++;
| }
last half page could be used as one whole free page.
So page align the boundaries.
-v2: make the original initramdisk to be aligned, according to
Johannes, otherwise we have the chance to lose one page.
we still need to keep initrd_end not aligned, otherwise it could
confuse decompressor.
-v3: change to WARN_ON instead, suggested by Johannes.
-v4: use PAGE_ALIGN, suggested by Johannes.
We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN
Add comments about assuming ramdisk start is aligned
in relocate_initrd(), change to re get ramdisk_image instead of save it
to make diff smaller. Add warning for wrong range, suggested by Johannes.
-v6: remove one WARN()
We need to align beginning in free_init_pages()
do not copy more than ramdisk_size, noticed by Johannes
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, amd: Restrict usage of c1e_idle()
x86: Fix placement of FIX_OHCI1394_BASE
x86: Handle legacy PIC interrupts on all the cpu's
Implement the workaround for Intel Errata AAK100 and AAP53.
Also, remove the Core-i7 name for Nehalem events since there are
also Westmere based i7 chips.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1269608924.12097.147.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement ptrace-block-step using TIF_BLOCKSTEP which will set
DEBUGCTLMSR_BTF when set for a task while preserving any other
DEBUGCTLMSR bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100325135414.017536066@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.
It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.
Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.
So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The adding of raw event support lead to complete code
refactoring. I hope is became more readable then it was.
The list of changes:
1) The 64bit config field is enough to hold all information we need
to track event details. To achieve it we used *own* enum for
events selection in ESCR register and map this key into proper
value at moment of event enabling.
For the same reason we use 12LSB bits in CCCR register -- to track
which exactly cache trace event was requested. And we cear this bits
at real 'write' moment.
2) There is no per-cpu area reserved for P4 PMU anymore. We
don't need it. All is held by config.
3) Now we may use any available counter, ie we try to grab any
possible counter.
v2:
- Lin Ming reported the lack of ESCR selector in CCCR for cache events
v3:
- Don't loose cache event codes at config unpacking procedure, we may
need it one day so no obscure hack behind our back, better to clear
reserved bits explicitly when needed (thanks Ming for pointing out)
- Lin Ming fixed misplaced opcodes in cache events
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1269403766.3409.6.camel@minggr.sh.intel.com>
[ v4: did a few whitespace fixlets ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 3f6da39053
(perf: Rework and fix the arch CPU-hotplug hooks) broke suspend to
RAM on my HP nx6325 (and most likely on other AMD-based boxes too)
by allowing amd_pmu_cpu_offline() to be executed for CPUs that are
going offline as part of the suspend process. The problem is that
cpuhw->amd_nb may be NULL already, so the function should make sure
it's not NULL before accessing the object pointed to by it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently c1e_idle returns true for all CPUs greater than or equal to
family 0xf model 0x40. This covers too many CPUs.
Meanwhile a respective erratum for the underlying problem was filed
(#400). This patch adds the logic to check whether erratum #400
applies to a given CPU.
Especially for CPUs where SMI/HW triggered C1e is not supported,
c1e_idle() doesn't need to be used. We can check this by looking at
the respective OSVW bit for erratum #400.
Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100319110922.GA19614@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch adds support for S3 memory integrity protection within an Intel(R)
TXT launched kernel, for all kernel and userspace memory. All RAM used by the
kernel and userspace, as indicated by memory ranges of type E820_RAM and
E820_RESERVED_KERN in the e820 table, will be integrity protected.
The MAINTAINERS file is also updated to reflect the maintainers of the
TXT-related code.
All MACing is done in tboot, based on a complexity analysis and tradeoff.
v3: Compared with v2, this patch adds a check of array size in
tboot.c, and a note to specify which c/s of tboot supports this kind
of MACing in intel_txt.txt.
Signed-off-by: Shane Wang <shane.wang@intel.com>
LKML-Reference: <4B973DDA.6050902@intel.com>
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
perf: Fix unexported generic perf_arch_fetch_caller_regs
perf record: Don't try to find buildids in a zero sized file
perf: export perf_trace_regs and perf_arch_fetch_caller_regs
perf, x86: Fix hw_perf_enable() event assignment
perf, ppc: Fix compile error due to new cpu notifiers
perf: Make the install relative to DESTDIR if specified
kprobes: Calculate the index correctly when freeing the out-of-line execution slot
perf tools: Fix sparse CPU numbering related bugs
perf_event: Fix oops triggered by cpu offline/online
perf: Drop the obsolete profile naming for trace events
perf: Take a hot regs snapshot for trace events
perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
perf/x86-64: Use frame pointer to walk on irq and process stacks
lockdep: Move lock events under lockdep recursion protection
perf report: Print the map table just after samples for which no map was found
perf report: Add multiple event support
perf session: Change perf_session post processing functions to take histogram tree
perf session: Add storage for seperating event types in report
perf session: Change add_hist_entry to take the tree root instead of session
perf record: Add ID and to recorded event data when recording multiple events
...
- A few ESCR have escaped fixing at previous attempt.
- p4_escr_map is read only, make it const.
Nothing serious.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100318211256.GH5062@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If reserve_pmc_hardware() succeeds but reserve_ds_buffers()
fails, then we need to release_pmc_hardware. It won't be done
by the destroy() callback because we return before setting it
in case of error.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: robert.richter@amd.com
Cc: perfmon2-devel@lists.sf.net
LKML-Reference: <4ba1568b.15185e0a.182a.7802@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
arch/x86/kernel/cpu/perf_event.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Move the HT bit setting code from p4_pmu_event_map to
p4_hw_config. So the cache events can get HT bit set correctly.
Tested on my P4 desktop, below 6 cache events work:
L1-dcache-load-misses
LLC-load-misses
dTLB-load-misses
dTLB-store-misses
iTLB-loads
iTLB-load-misses
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908392.13901.128.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, we use opcode(Event and Event-Selector) + emask to
look up template in p4_templates.
But cache events (L1-dcache-load-misses, LLC-load-misses, etc)
use the same event(P4_REPLAY_EVENT) to do the counting, ie, they
have the same opcode and emask. So we can not use current lookup
mechanism to find the template for cache events.
This patch introduces a "key", which is the index into
p4_templates. The low 12 bits of CCCR are reserved, so we can
hide the "key" in the low 12 bits of hwc->config.
We extract the key from hwc->config and then quickly find the
template.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since apic_write() maps to a plain noop in the !CONFIG_X86_LOCAL_APIC
case we're safe to remove this conditional compilation and clean up
the code a bit.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: fweisbec@gmail.com
Cc: acme@redhat.com
Cc: eranian@google.com
Cc: peterz@infradead.org
LKML-Reference: <20100317104356.232371479@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
SGI:UV: Delete extra boot messages that describe the system
topology. These messages are no longer useful.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100317154038.GA29346@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The same information is stored also in x86_pmu.intel_ctrl. This
patch removes perf_event_mask and instead uses
x86_pmu.intel_ctrl directly.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268826553-19518-5-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This member in the struct is not used anymore and can be
removed.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268826553-19518-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function reserve_pmc_hardware() and release_pmc_hardware()
were hard to read. This patch improves readability of the code by
removing most of the CONFIG_X86_LOCAL_APIC macros.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268826553-19518-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_arch_fetch_caller_regs() is exported for the overriden x86
version, but not for the generic weak version.
As a general rule, weak functions should not have their symbol
exported in the same file they are defined.
So let's export it on trace_event_perf.c as it is used by trace
events only.
This fixes:
ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If x86_pmu.hw_config() fails a fixed error code (-EOPNOTSUPP) is
returned even if a different error was reported. This patch fixes
this.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Lin Ming <ming.m.lin@intel.com>
Cc: acme@redhat.com
Cc: eranian@google.com
Cc: gorcunov@openvz.org
Cc: peterz@infradead.org
Cc: fweisbec@gmail.com
LKML-Reference: <20100316160733.GR1585@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_arch_fetch_caller_regs() is exported for the overriden x86
version, but not for the generic weak version.
As a general rule, weak functions should not have their symbol
exported in the same file they are defined.
So let's export it on trace_event_perf.c as it is used by trace
events only.
This fixes:
ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar reported that with the recent changes of not
statically blocking IRQ0_VECTOR..IRQ15_VECTOR's on all the
cpu's, broke an AMD platform (with Nvidia chipset) boot when
"noapic" boot option is used.
On this platform, legacy PIC interrupts are getting delivered to
all the cpu's instead of just the boot cpu. Thus not
initializing the vector to irq mapping for the legacy irq's
resulted in not handling certain interrupts causing boot hang.
Fix this by initializing the vector to irq mapping on all the
logical cpu's, if the legacy IRQ is handled by the legacy PIC.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
[ -v2: io-apic-enabled improvement ]
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1268692386.3296.43.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This should turn on instruction counting on P4s, which was missing in
the first version of the new PMU driver.
It's inaccurate for now, we still need dependant event to tag mops
before we can count them precisely. The result is that the number of
instruction may be lifted up.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the early _PDC evaluation path knows how to correctly
evaluate _PDC on only physically present processors, there's no
need for the processor driver to evaluate it later when it loads.
To cover the hotplug case, push _PDC evaluation down into the
hotplug paths.
Cc: x86@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
acpi=ht was important in 2003 -- before ACPI was
universally deployed and enabled by default in
the major Linux distributions.
At that time, there were a fair number of people who
or chose to, or needed to, run with acpi=off,
yet also wanted access to Hyper-threading.
Today we find that many invocations of "acpi=ht"
are accidental, and thus is it possible that it
is doing more harm than good.
In 2.6.34, we warn on invocation of acpi=ht.
In 2.6.35, we delete the boot option.
Signed-off-by: Len Brown <len.brown@intel.com>
SuSE added these entries when deploying ACPI in Linux-2.4.
I pulled them into Linux-2.6 on 2003-08-09.
Over the last 6+ years, several entries have proven to be
unnecessary and deleted, while no new entries have been added.
Matthew suggests that they now have negative value, and I agree.
Based-on-patch-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Commit f56e8a076 "x86/mce: Fix RCU lockdep splats" introduced the
following build bug:
arch/x86/kernel/cpu/mcheck/mce.c: In function 'mce_log':
arch/x86/kernel/cpu/mcheck/mce.c:166: error: 'mce_read_mutex' undeclared (first use in this function)
arch/x86/kernel/cpu/mcheck/mce.c:166: error: (Each undeclared identifier is reported only once
arch/x86/kernel/cpu/mcheck/mce.c:166: error: for each function it appears in.)
Move the in-the-middle-of-file lock variable up to the variable
definition section, the top of the .c file.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1267830207-9474-3-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix pick_next_highest_task_rt() for cgroups
sched: Cleanup: remove unused variable in try_to_wake_up()
x86: Fix sched_clock_cpu for systems with unsynchronized TSC
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems
x86, UV: Fix target_cpus() in x2apic_uv_x.c
x86: Reduce per cpu warning boot up messages
x86: Reduce per cpu MCA boot up messages
x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking: Make sparse work with inline spinlocks and rwlocks
x86/mce: Fix RCU lockdep splats
rcu: Increase RCU CPU stall timeouts if PROVE_RCU
ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
rcu: Suppress RCU lockdep warnings during early boot
rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
rcu: Suppress __mpol_dup() false positive from RCU lockdep
rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
rcu: Add control variables to lockdep_rcu_dereference() diagnostics
rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
rcu: Use wrapper function instead of exporting tasklist_lock
sched, rcu: Fix rcu_dereference() for RCU-lockdep
rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
x86/gart: Unexport gart_iommu_aperture
Fix trivial conflicts in kernel/trace/ftrace.c
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Provide generic perf_sample_data initialization
MAINTAINERS: Add Arnaldo as tools/perf/ co-maintainer
perf trace: Don't use pager if scripting
perf trace/scripting: Remove extraneous header read
perf, ARM: Modify kuser rmb() call to compile for Thumb-2
x86/stacktrace: Don't dereference bad frame pointers
perf archive: Don't try to collect files without a build-id
perf_events, x86: Fixup fixed counter constraints
perf, x86: Restrict the ANY flag
perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
perf, x86: add some IBS macros to perf_event.h
perf, x86: make IBS macros available in perf_event.h
hw-breakpoints: Remove stub unthrottle callback
x86/hw-breakpoints: Remove the name field
perf: Remove pointless breakpoint union
perf lock: Drop the buffers multiplexing dependency
perf lock: Fix and add misc documentally things
percpu: Add __percpu sparse annotations to hw_breakpoint
Ingo reported:
|
| There's a build failure on -tip with the P4 driver, on UP 32-bit, if
| PERF_EVENTS is enabled but UP_APIC is disabled:
|
| arch/x86/built-in.o: In function `p4_pmu_handle_irq':
| perf_event.c:(.text+0xa756): undefined reference to `apic'
| perf_event.c:(.text+0xa76e): undefined reference to `apic'
|
So we have to unmask LVTPC only if we're configured to have one.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100313081116.GA5179@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
de957628ce changed setting of the
x86_init.iommu.iommu_init function ptr only when GART IOMMU is
found.
One side effect of it is that num_k8_northbridges
is not initialized anymore if not explicitly
called. This resulted in uninitialized pointers in
<arch/x86/kernel/cpu/intel_cacheinfo.c:amd_calc_l3_indices()>,
for example, which uses the num_k8_northbridges thing through
node_to_k8_nb_misc().
Fix that through an initcall that runs right after the PCI
subsystem and does all the scanning. Then, remove initialization
in gart_iommu_init() which is a rootfs_initcall and we're
running before that.
What is more, since num_k8_northbridges is being used in other
places beside GART IOMMU, include it whenever we add AMD CPU
support. The previous dependency chain in kconfig contained
K8_NB depends on AGP_AMD64|GART_IOMMU
which was clearly incorrect. The more natural way in terms of
hardware dependency should be
AGP_AMD64|GART_IOMMU depends on K8_NB depends on CPU_SUP_AMD &&
PCI. Make it so Number One!
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100312144303.GA29262@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
doc: fix typo in comment explaining rb_tree usage
Remove fs/ntfs/ChangeLog
doc: fix console doc typo
doc: cpuset: Update the cpuset flag file
Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
Remove drivers/parport/ChangeLog
Remove drivers/char/ChangeLog
doc: typo - Table 1-2 should refer to "status", not "statm"
tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
devres/irq: Fix devm_irq_match comment
Remove reference to kthread_create_on_cpu
tree-wide: Assorted spelling fixes
tree-wide: fix 'lenght' typo in comments and code
drm/kms: fix spelling in error message
doc: capitalization and other minor fixes in pnp doc
devres: typo fix s/dev/devm/
Remove redundant trailing semicolons from macros
fix typo "definetly" -> "definitely" in comment
tree-wide: s/widht/width/g typo in comments
...
Fix trivial conflict in Documentation/laptops/00-INDEX
Add generic implementations of the old and really old uname system calls.
Note that sh only implements sys_olduname but not sys_oldolduname, but I'm
not going to bother with another ifdef for that special case.
m32r implemented an old uname but never wired it up, so kill it, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On an architecture that supports 32-bit compat we need to override the
reported machine in uname with the 32-bit value. Instead of doing this
separately in every architecture introduce a COMPAT_UTS_MACHINE define in
<asm/compat.h> and apply it directly in sys_newuname().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a generic implementation of the ipc demultiplexer syscall. Except for
s390 and sparc64 all implementations of the sys_ipc are nearly identical.
There are slight differences in the types of the parameters, where mips
and powerpc as the only 64-bit architectures with sys_ipc use unsigned
long for the "third" argument as it gets casted to a pointer later, while
it traditionally is an "int" like most other paramters. frv goes even
further and uses unsigned long for all parameters execept for "ptr" which
is a pointer type everywhere. The change from int to unsigned long for
"third" and back to "int" for the others on frv should be fine due to the
in-register calling conventions for syscalls (we already had a similar
issue with the generic sys_ptrace), but I'd prefer to have the arch
maintainers looks over this in details.
Except for that h8300, m68k and m68knommu lack an impplementation of the
semtimedop sub call which this patch adds, and various architectures have
gets used - at least on i386 it seems superflous as the compat code on
x86-64 and ia64 doesn't even bother to implement it.
[akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a generic implementation of the old mmap() syscall, which expects its
argument in a memory block and switch all architectures over to use it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a generic implementation of the old select() syscall, which expects
its argument in a memory block and switch all architectures over to use
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
target_cpu() should initially target all cpus, not just cpu 0.
Otherwise systems with lots of disks can exhaust the interrupt
vectors on cpu 0 if a large number of disks are discovered
before the irq balancer is running.
Note: UV code only...
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100311184328.GA21433@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In case of not assigned x86_pmu and software events NULL dereference may
being hit via x86_pmu::schedule_events method.
Fix it by checking if x86_pmu is initialized at all.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100311215016.GG25162@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The netburst PMU is way different from the "architectural
perfomance monitoring" specification that current CPUs use.
P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle
perfomance monitoring events.
A few implementational details:
1) We need a separate x86_pmu::hw_config helper in struct
x86_pmu since register bit-fields are quite different from P6,
Core and later cpu series.
2) For the same reason is a x86_pmu::schedule_events helper
introduced.
3) hw_perf_event::config consists of packed ESCR+CCCR values.
It's allowed since in reality both registers only use a half
of their size. Of course before making a real write into a
particular MSR we need to unpack the value and extend it to
a proper size.
4) The tuple of packed ESCR+CCCR in hw_perf_event::config
doesn't describe the memory address of ESCR MSR register
so that we need to keep a mapping between these tuples
used and available ESCR (various P4 events may use same
ESCRs but not simultaneously), for this sake every active
event has a per-cpu map of hw_perf_event::idx <--> ESCR
addresses.
5) Since hw_perf_event::idx is an offset to counter/control register
we need to lift X86_PMC_MAX_GENERIC up, otherwise kernel
strips it down to 8 registers and event armed may never be turned
off (ie the bit in active_mask is set but the loop never reaches
this index to check), thanks to Peter Zijlstra
Restrictions:
- No cascaded counters support (do we ever need them?)
- No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS
doesn't work for now)
- There are events with same counters which can't work simultaneously
(need to use intersected ones due to broken counter 1)
- No PERF_COUNT_HW_CACHE_ events yet
Todo:
- Implement dependent events
- Need proper hashing for event opcodes (no linear search, good for
debugging stage but not in real loads)
- Some events counted during a clock cycle -- need to set threshold
for them and count every clock cycle just to get summary statistics
(ie to behave the same way as other PMUs do)
- Need to swicth to use event_constraints
- To support RAW events we need to encode a global list of P4 events
into p4_templates
- Cache events need to be added
Event support status matrix:
Event status
-----------------------------
cycles works
cache-references works
cache-misses works
branch-misses works
bus-cycles partially (does not work on 64bit cpu with HT enabled)
instruction doesnt work (needs dependent event [mop tagging])
branches doesnt work
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100311165439.GB5129@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>