* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
x86, perf events: Check if we have APIC enabled
perf_event: Fix variable initialization in other codepaths
perf kmem: Fix unused argument build warning
perf symbols: perf_header__read_build_ids() offset'n'size should be u64
perf symbols: dsos__read_build_ids() should read both user and kernel buildids
perf tools: Align long options which have no short forms
perf kmem: Show usage if no option is specified
sched: Mark sched_clock() as notrace
perf sched: Add max delay time snapshot
perf tools: Correct size given to memset
perf_event: Fix perf_swevent_hrtimer() variable initialization
perf sched: Fix for getting task's execution time
tracing/kprobes: Fix field creation's bad error handling
perf_event: Cleanup for cpu_clock_perf_event_update()
perf_event: Allocate children's perf_event_ctxp at the right time
perf_event: Clean up __perf_event_init_context()
hw-breakpoints: Modify breakpoints without unregistering them
perf probe: Update perf-probe document
perf probe: Support --del option
trace-kprobe: Support delete probe syntax
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
[CPUFREQ] make internal cpufreq_add_dev_* static
[CPUFREQ] use an enum for speedstep processor identification
[CPUFREQ] Document units for transition latency
[CPUFREQ] Use global sysfs cpufreq structure for conservative governor tunings
[CPUFREQ] Documentation: ABI: /sys/devices/system/cpu/cpu#/cpufreq/
[CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used
[CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
ext3: Fix data / filesystem corruption when write fails to copy data
ext4: Support for 64-bit quota format
ext3: Support for vfsv1 quota format
quota: Implement quota format with 64-bit space and inode limits
quota: Move definition of QFMT_OCFS2 to linux/quota.h
ext2: fix comment in ext2_find_entry about return values
ext3: Unify log messages in ext3
ext2: clear uptodate flag on super block I/O error
ext2: Unify log messages in ext2
ext3: make "norecovery" an alias for "noload"
ext3: Don't update the superblock in ext3_statfs()
ext3: journal all modifications in ext3_xattr_set_handle
ext2: Explicitly assign values to on-disk enum of filetypes
quota: Fix WARN_ON in lookup_one_len
const: struct quota_format_ops
ubifs: remove manual O_SYNC handling
afs: remove manual O_SYNC handling
kill wait_on_page_writeback_range
vfs: Implement proper O_SYNC semantics
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Always process the whole breakpoint list on activate or deactivate
kgdb: continue and warn on signal passing from gdb
kgdb,x86: do not set kgdb_single_step on x86
kgdb: allow for cpu switch when single stepping
kgdb,i386: Fix corner case access to ss with NMI watch dog exception
kgdb: Replace strstr() by strchr() for single-character needles
kgdbts: Read buffer overflow
kgdb: Read buffer overflow
kgdb,x86: remove redundant test
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap:
Add missing alignment check in arch/score sys_mmap()
fix broken aliasing checks for MAP_FIXED on sparc32, mips, arm and sh
Get rid of open-coding in ia64_brk()
sparc_brk() is not needed anymore
switch do_brk() to get_unmapped_area()
Take arch_mmap_check() into get_unmapped_area()
fix a struct file leak in do_mmap_pgoff()
Unify sys_mmap*
Cut hugetlb case early for 32bit on ia64
arch_mmap_check() on mn10300
Kill ancient crap in s390 compat mmap
arm: add arch_mmap_check(), get rid of sys_arm_mremap()
file ->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()
kill useless checks in sparc mremap variants
fix pgoff in "have to relocate" case of mremap()
fix the arch checks in MREMAP_FIXED case
fix checks for expand-in-place mremap
do_mremap() untangling, part 3
do_mremap() untangling, part 2
untangling do_mremap(), part 1
On an SMP system the kgdb_single_step flag has the possibility to
indefinitely hang the system in the case. Consider the case where,
CPU 1 has the schedule lock and CPU 0 is set to single step, there is
no way for CPU 0 to run another task.
The easy way to observe the problem is to make 2 cpus busy, and run
the kgdb test suite. You will see that it hangs the system very
quickly.
while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
echo V1 > /sys/module/kgdbts/parameters/kgdbts
The side effect of this patch is that there is the possibility
to miss a breakpoint in the case that a single step operation
was executed to step over a breakpoint in common code.
The trade off of the missed breakpoint is preferred to
hanging the kernel. This can be fixed in the future by
using kprobes or another strategy to step over planted
breakpoints with out of line execution.
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
It is possible for the user_mode_vm(regs) check to return true on the
i368 arch for a non master kgdb cpu or when the master kgdb cpu
handles the NMI watch dog exception.
The solution is simply to select the correct gdb_ss location
based on the check to user_mode_vm(regs).
CC: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The for loop starts with a breakno of 0, and ends when it's 4. so
this test is always true.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
New helper - sys_mmap_pgoff(); switch syscalls to using it.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen: try harder to balloon up under memory pressure.
Xen balloon: fix totalram_pages counting.
xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
xen: improve error handling in do_suspend.
xen: don't leak IRQs over suspend/resume.
xen: call clock resume notifier on all CPUs
xen: use iret for return from 64b kernel to 32b usermode
xen: don't call dpm_resume_noirq() with interrupts disabled.
xen: register runstate info for boot CPU early
xen: register runstate on secondary CPUs
xen: register timer interrupt with IRQF_TIMER
xen: correctly restore pfn_to_mfn_list_list after resume
xen: restore runstate_info even if !have_vcpu_info_placement
xen: re-register runstate area earlier on resume.
xen: wait up to 5 minutes for device connetion
xen: improvement to wait_for_devices()
xen: fix is_disconnected_device/exists_disconnected_device
xen/xenbus: make DEVICE_ATTR()s static
While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment. This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.
This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag. To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.
This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition. Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set. The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.
We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.
Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op. We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPICA: Update version to 20091112.
ACPICA: Add additional module-level code support
ACPICA: Deploy new create integer interface where appropriate
ACPICA: New internal utility function to create Integer objects
ACPICA: Add repair for predefined methods that must return sorted lists
ACPICA: Fix possible fault if return Package objects contain NULL elements
ACPICA: Add post-order callback to acpi_walk_namespace
ACPICA: Change package length error message to an info message
ACPICA: Reduce severity of predefined repair messages, Warning to Info
ACPICA: Update version to 20091013
ACPICA: Fix possible memory leak for Scope ASL operator
ACPICA: Remove possibility of executing _REG methods twice
ACPICA: Add repair for bad _MAT buffers
ACPICA: Add repair for bad _BIF/_BIX packages
Currently, when ptrace needs to modify a breakpoint, like disabling
it, changing its address, type or len, it calls
modify_user_hw_breakpoint(). This latter will perform the heavy and
racy task of unregistering the old breakpoint and registering a new
one.
This is racy as someone else might steal the reserved breakpoint
slot under us, which is undesired as the breakpoint is only
supposed to be modified, sometimes in the middle of a debugging
workflow. We don't want our slot to be stolen in the middle.
So instead of unregistering/registering the breakpoint, just
disable it while we modify its breakpoint fields and re-enable it
after if necessary.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers, init: Limit the number of per cpu calibration bootup messages
posix-cpu-timers: optimize and document timer_create callback
clockevents: Add missing include to pacify sparse
x86: vmiclock: Fix printk format
x86: Fix printk format due to variable type change
sparc: fix printk for change of variable type
clocksource/events: Fix fallout of generic code changes
nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
nohz: Track last do_timer() cpu
nohz: Prevent clocksource wrapping during idle
nohz: Type cast printk argument
mips: Use generic mult/shift factor calculation for clocks
clocksource: Provide a generic mult/shift factor calculation
clockevents: Use u32 for mult and shift factors
nohz: Introduce arch_needs_cpu
nohz: Reuse ktime in sub-functions of tick_check_idle.
time: Remove xtime_cache
time: Implement logarithmic time accumulation
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: hpet: Make WARN_ON understandable
x86: arch specific support for remapping HPET MSIs
intr-remap: generic support for remapping HPET MSIs
x86, hpet: Simplify the HPET code
x86, hpet: Disable per-cpu hpet timer if ARAT is supported
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: core - Prevent too-small buffer sizes
hwrng: virtio-rng - Convert to new API
hwrng: core - Replace u32 in driver API with byte array
crypto: ansi_cprng - Move FIPS functions under CONFIG_CRYPTO_FIPS
crypto: testmgr - Add ghash algorithm test before provide to users
crypto: ghash-clmulni-intel - Put proper .data section in place
crypto: ghash-clmulni-intel - Use gas macro for PCLMULQDQ-NI and PSHUFB
crypto: aesni-intel - Use gas macro for AES-NI instructions
x86: Generate .byte code for some new instructions via gas macro
crypto: ghash-intel - Fix irq_fpu_usable usage
crypto: ghash-intel - Add PSHUFB macros
crypto: ghash-intel - Hard-code pshufb
crypto: ghash-intel - Fix building failure on x86_32
crypto: testmgr - Fix warning
crypto: ansi_cprng - Fix test in get_prng_bytes
crypto: hash - Remove cra_u.{digest,hash}
crypto: api - Remove digest case from procfs show handler
crypto: hash - Remove legacy hash/digest code
crypto: ansi_cprng - Add FIPS wrapper
crypto: ghash - Add PCLMULQDQ accelerated implementation
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: don't restart timer if disabled
x86: Use -maccumulate-outgoing-args for sane mcount prologues
x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage
x86: AMD Northbridge: Verify NB's node is online
x86 VSDO: Fix Kconfig help
x86: Fix typo in Intel CPU cache size descriptor
x86: Add new Intel CPU cache size descriptors
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
vgacon: Add support for setting the default cursor state
vc: Add support for hiding the cursor when creating VTs
x86, setup: Store the boot cursor state
* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistency
* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: merge the standard and compat start_thread() functions
x86-64: make compat_start_thread() match start_thread()
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: pat: Remove ioremap_default()
x86: pat: Clean up req_type special case for reserve_memtype()
x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86, mm: Correct the implementation of is_untracked_pat_range()
x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
x86, mtrr: Fix sorting of mtrr after subtracting
x86: Move find_smp_config() earlier and avoid bootmem usage
x86, platform: Change is_untracked_pat_range() to bool; cleanup init
x86: Change is_ISA_range() into an inline function
x86, mm: is_untracked_pat_range() takes a normal semiclosed range
x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
x86: UV SGI: Don't track GRU space in PAT
x86: SGI UV: Fix BAU initialization
x86, numa: Use near(er) online node instead of roundrobin for NUMA
x86, numa, bootmem: Only free bootmem on NUMA failure path
x86: Change crash kernel to reserve via reserve_early()
x86: Eliminate redundant/contradicting cache line size config options
x86: When cleaning MTRRs, do not fold WP into UC
x86: remove "extern" from function prototypes in <asm/proto.h>
x86, mm: Report state of NX protections during boot
x86, mm: Clean up and simplify NX enablement
x86, pageattr: Make set_memory_(x|nx) aware of NX support
x86, sleep: Always save the value of EFER
...
Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range)
to 'struct x86_platform_ops') in
arch/x86/include/asm/x86_init.h
arch/x86/kernel/x86_init.c
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: ucode-amd: Move family check to microcde_amd.c's init function
x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle
x86: ucode-amd: Convert printk(KERN_*...) to pr_*(...)
x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
x86: ucode-amd: Load ucode-patches once and not separately of each CPU
x86, amd-ucode: Remove needless log messages
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
cfq-iosched: Do not access cfqq after freeing it
block: include linux/err.h to use ERR_PTR
cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
blkio: Allow CFQ group IO scheduling even when CFQ is a module
blkio: Implement dynamic io controlling policy registration
blkio: Export some symbols from blkio as its user CFQ can be a module
block: Fix io_context leak after failure of clone with CLONE_IO
block: Fix io_context leak after clone with CLONE_IO
cfq-iosched: make nonrot check logic consistent
io controller: quick fix for blk-cgroup and modular CFQ
cfq-iosched: move IO controller declerations to a header file
cfq-iosched: fix compile problem with !CONFIG_CGROUP
blkio: Documentation
blkio: Wait on sync-noidle queue even if rq_noidle = 1
blkio: Implement group_isolation tunable
blkio: Determine async workload length based on total number of queues
blkio: Wait for cfq queue to get backlogged if group is empty
blkio: Propagate cgroup weight updation to cfq groups
blkio: Drop the reference to queue once the task changes cgroup
blkio: Provide some isolation between groups
...
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
KVM: VMX: Fix comparison of guest efer with stale host value
KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
KVM: Drop user return notifier when disabling virtualization on a cpu
KVM: VMX: Disable unrestricted guest when EPT disabled
KVM: x86 emulator: limit instructions to 15 bytes
KVM: s390: Make psw available on all exits, not just a subset
KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
KVM: VMX: Report unexpected simultaneous exceptions as internal errors
KVM: Allow internal errors reported to userspace to carry extra data
KVM: Reorder IOCTLs in main kvm.h
KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
KVM: only clear irq_source_id if irqchip is present
KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
KVM: VMX: Remove vmx->msr_offset_efer
KVM: MMU: update invlpg handler comment
KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
KVM: remove duplicated task_switch check
KVM: powerpc: Fix BUILD_BUG_ON condition
KVM: VMX: Use shared msr infrastructure
...
Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
Delete empty or incomplete inat-tables.c if gen-insn-attr-x86.awk
failed, because it causes a build error if user tries to build
kernel next time.
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091207170033.19230.37688.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At least, insn.c and inat.c is needed for kprobe for now. So,
this compile those only if KPROBES is enabled.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <878wdg8icq.fsf@devron.myhome.or.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the following warning:
arch/x86/tools/test_get_len.c: In function "main":
arch/x86/tools/test_get_len.c:116: warning: unused variable "c"
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we enter in irq, two things can happen to preserve the link
to the previous frame pointer:
- If we were in an irq already, we don't switch to the irq stack
as we are inside. We just need to save the previous frame
pointer and to link the new one to the previous.
- Otherwise we need another level of indirection. We enter the irq with
the previous stack. We save the previous bp inside and make bp
pointing to its saved address. Then we switch to the irq stack and
push bp another time but to the new stack. This makes two levels to
dereference instead of one.
In the second case, the current stacktrace code omits the second level
and loses the frame pointer accuracy. The stack that follows will then
be considered as unreliable.
Handling that makes the perf callchain happier.
Before:
43.94% [k] _raw_read_lock
|
--- _read_lock
|
|--60.53%-- send_sigio
| __kill_fasync
| kill_fasync
| evdev_pass_event
| evdev_event
| input_pass_event
| input_handle_event
| input_event
| synaptics_process_byte
| psmouse_handle_byte
| psmouse_interrupt
| serio_interrupt
| i8042_interrupt
| handle_IRQ_event
| handle_edge_irq
| handle_irq
| __irqentry_text_start
| ret_from_intr
| |
| |--30.43%-- __select
| |
| |--17.39%-- 0x454f15
| |
| |--13.04%-- __read
| |
| |--13.04%-- vread_hpet
| |
| |--13.04%-- _xcb_lock_io
| |
| --13.04%-- 0x7f630878ce8
After:
50.00% [k] _raw_read_lock
|
--- _read_lock
|
|--98.97%-- send_sigio
| __kill_fasync
| kill_fasync
| evdev_pass_event
| evdev_event
| input_pass_event
| input_handle_event
| input_event
| |
| |--96.88%-- synaptics_process_byte
| | psmouse_handle_byte
| | psmouse_interrupt
| | serio_interrupt
| | i8042_interrupt
| | handle_IRQ_event
| | handle_edge_irq
| | handle_irq
| | __irqentry_text_start
| | ret_from_intr
| | |
| | |--39.78%-- __const_udelay
| | | |
| | | |--91.89%-- ath5k_hw_register_timeout
| | | | ath5k_hw_noise_floor_calibration
| | | | ath5k_hw_reset
| | | | ath5k_reset
| | | | ath5k_config
| | | | ieee80211_hw_config
| | | | |
| | | | |--88.24%-- ieee80211_scan_work
| | | | | worker_thread
| | | | | kthread
| | | | | child_rip
| | | | |
| | | | --11.76%-- ieee80211_scan_completed
| | | | ieee80211_scan_work
| | | | worker_thread
| | | | kthread
| | | | child_rip
| | | |
| | | --8.11%-- ath5k_hw_noise_floor_calibration
| | | ath5k_hw_reset
| | | ath5k_reset
| | | ath5k_config
Note: This does not only affect perf events but also x86-64
stacktraces. They were considered as unreliable once we quit
the irq stack frame.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
While dumping a stacktrace, the end of the exception stack won't link
the frame pointer to the previous stack.
The interrupted stack will then be considered as unreliable and ignored
by perf, as the frame pointer is unreliable itself.
This happens because we overwrite the frame pointer that links to the
interrupted frame with the address of the exception stack. This is
done in order to reserve space inside.
But rbp has been chosen here only because it is not a scratch register,
so that the address of the exception stack remains in rbp after calling
do_debug(), we can then release the exception stack space without the
need to retrieve its address again.
But we can pick another non-scratch register to do that, so that we
preserve the link to the interrupted stack frame in the stacktraces.
Just randomly choose r12. Every registers are saved just before and
restored just after calling do_debug(). And r12 is not used in the
middle, which makes it a perfect candidate.
Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw
Before:
44.18% [k] _raw_read_lock
|
|
--- |--6.31%-- waitid
|
|--4.26%-- writev
|
|--3.63%-- __select
|
|--3.15%-- __waitpid
| |
| |--28.57%-- 0x8b52e00000139f
| |
| |--28.57%-- 0x8b52e0000013c6
| |
| |--14.29%-- 0x7fde786dc000
| |
| |--14.29%-- 0x62696c2f7273752f
| |
| --14.29%-- 0x1ea9df800000000
|
|--3.00%-- __poll
After:
43.94% [k] _raw_read_lock
|
--- _read_lock
|
|--60.53%-- send_sigio
| __kill_fasync
| kill_fasync
| evdev_pass_event
| evdev_event
| input_pass_event
| input_handle_event
| input_event
| synaptics_process_byte
| psmouse_handle_byte
| psmouse_interrupt
| serio_interrupt
| i8042_interrupt
| handle_IRQ_event
| handle_edge_irq
| handle_irq
| __irqentry_text_start
| ret_from_intr
| |
| |--30.43%-- __select
| |
| |--17.39%-- 0x454f15
| |
| |--13.04%-- __read
| |
| |--13.04%-- vread_hpet
| |
| |--13.04%-- _xcb_lock_io
| |
| --13.04%-- 0x7f630878ce87
Note: it does not only affect perf events but also other stacktraces in
x86-64. They were considered as unreliable once we quit the debug
stack frame.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Dumping the callchains from breakpoint events with perf gives strange
results:
3.75% perf [kernel] [k] _raw_read_unlock
|
--- _raw_read_unlock
perf_callchain
perf_prepare_sample
__perf_event_overflow
perf_swevent_overflow
perf_swevent_add
perf_bp_event
hw_breakpoint_exceptions_notify
notifier_call_chain
__atomic_notifier_call_chain
atomic_notifier_call_chain
notify_die
do_debug
debug
munmap
We are infected with all the debug stack. Like the nmi stack, the debug
stack is undesired as it is part of the profiling path, not helpful for
the user.
Ignore it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
struct perf_event::event callback was called when a breakpoint
triggers. But this is a rather opaque callback, pretty
tied-only to the breakpoint API and not really integrated into perf
as it triggers even when we don't overflow.
We prefer to use overflow_handler() as it fits into the perf events
rules, being called only when we overflow.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Drop the callback and task parameters from modify_user_hw_breakpoint().
For now we have no user that need to modify a breakpoint to the point
of changing its handler or its task context.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Limit number of per cpu TSC sync messages
x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks
x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details
x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes
x86: Suppress stack overrun message for init_task
x86: Fix cpu_devs[] initialization in early_cpu_init()
x86: Remove CPU cache size output for non-Intel too
x86: Minimise printk spew from per-vendor init code
x86: Remove the CPU cache size printk's
cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c
x86: Make sure we also print a Code: line for show_regs()
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
x86, cpuid: Simplify the code in cpuid_open
x86, cpuid: Remove the bkl from cpuid_open()
x86, msr: Remove the bkl from msr_open()
x86: AMD Geode LX optimizations
x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix a section mismatch in arch/x86/kernel/setup.c
x86: Fixup last users of irq_chip->typename
x86: Remove BKL from apm_32
x86: Remove BKL from microcode
x86: use kernel_stack_pointer() in kprobes.c
x86: use kernel_stack_pointer() in kgdb.c
x86: use kernel_stack_pointer() in dumpstack.c
x86: use kernel_stack_pointer() in process_32.c
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
x86/alternatives: Check replacementlen <= instrlen at build time
x86, 64-bit: Set data segments to null after switching to 64-bit mode
x86: Clean up the loadsegment() macro
x86: Optimize loadsegment()
x86: Add missing might_fault() checks to copy_{to,from}_user()
x86-64: __copy_from_user_inatomic() adjustments
x86: Remove unused thread_return label from switch_to()
x86, 64-bit: Fix bstep_iret jump
x86: Don't use the strict copy checks when branch profiling is in use
x86, 64-bit: Move K8 B step iret fixup to fault entry asm
x86: Generate cmpxchg build failures
x86: Add a Kconfig option to turn the copy_from_user warnings into errors
x86: Turn the copy_from_user check into an (optional) compile time warning
x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
x86, apic: Enable lapic nmi watchdog on AMD Family 11h
x86: Remove unnecessary mdelay() from cpu_disable_common()
x86, ioapic: Document another case when level irq is seen as an edge
x86, ioapic: Fix the EOI register detection mechanism
x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq
x86: SGI UV: Map low MMR ranges
x86: apic: Print out SRAT table APIC id in hex
x86: Re-get cfg_new in case reuse/move irq_desc
x86: apic: Remove not needed #ifdef
x86: io-apic: IO-APIC MMIO should not fail on resource insertion
x86: Remove asm/apicnum.h
x86: apic: Do not use stacked physid_mask_t
x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
x86, ioapic: Use snrpintf while set names for IO-APIC resourses
x86, apic: Use PAGE_SIZE instead of numbers
x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs()
x86: Use EOI register in io-apic on intel platforms
x86: Force irq complete move during cpu offline
x86: Remove move_cleanup_count from irq_cfg
x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping
...