This patch adds an internal sofware filter to complement
the (optional) LBR hardware filter.
The software filter is necessary:
- as a substitute when there is no HW LBR filter (e.g., Atom, Core)
- to complement HW LBR filter in case of errata (e.g., Nehalem/Westmere)
- to provide finer grain filtering (e.g., all processors)
Sometimes the LBR HW filter cannot distinguish between two types
of branches. For instance, to capture syscall as CALLS, it is necessary
to enable the LBR_FAR filter which will also capture JMP instructions.
Thus, a second pass is necessary to filter those out, this is what the
SW filter can do.
The SW filter is built on top of the internal x86 disassembler. It
is a best effort filter especially for user level code. It is subject
to the availability of the text page of the program.
The SW filter is enabled on all Intel processors. It is bypassed
when the user is capturing all branches at all priv levels.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-9-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements PERF_SAMPLE_BRANCH support for Intel
x86processors. It connects PERF_SAMPLE_BRANCH to the actual LBR.
The patch adds the hooks in the PMU irq handler to save the LBR
on counter overflow for both regular and PEBS modes.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-8-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch adds a restriction for Intel Atom LBR support. Only
steppings 10 (PineView) and more recent are supported. Older models
do not have a functional LBR. Their LBR does not freeze on PMU
interrupt which makes LBR unusable in the context of perf_events.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-7-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the mappings from the generic PERF_SAMPLE_BRANCH_*
filters to the actual Intel x86LBR filters, whenever they exist.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-6-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If precise sampling is enabled on Intel x86 then perf_event uses PEBS.
To correct for the off-by-one error of PEBS, perf_event uses LBR when
precise_sample > 1.
On Intel x86 PERF_SAMPLE_BRANCH_STACK is implemented using LBR,
therefore both features must be coordinated as they may not
configure LBR the same way.
For PEBS, LBR needs to capture all branches at the priv level of
the associated event.
This patch checks that the branch type and priv level of BRANCH_STACK
is compatible with that of the PEBS LBR requirement, thereby allowing:
$ perf record -b any,u -e instructions:upp ....
But:
$ perf record -b any_call,u -e instructions:upp
Is not possible.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Intel LBR on some recent processor is capable
of filtering branches by type. The filter is configurable
via the LBR_SELECT MSR register.
There are limitation on how this register can be used.
On Nehalem/Westmere, the LBR_SELECT is shared by the two HT threads
when HT is on. It is private to each core when HT is off.
On SandyBridge, the LBR_SELECT register is private to each thread
when HT is on. It is private to each core when HT is off.
The kernel must manage the sharing of LBR_SELECT. It allows
multiple users on the same logical CPU to use LBR_SELECT as
long as they program it with the same value. Across sibling
CPUs (HT threads), the same restriction applies on NHM/WSM.
This patch implements this sharing logic by leveraging the
mechanism put in place for managing the offcore_response
shared MSR.
We modify __intel_shared_reg_get_constraints() to cause
x86_get_event_constraint() to be called because LBR may
be associated with events that may be counter constrained.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the LBR definitions for NHM/WSM/SNB and Core.
It also adds the definitions for the architected LBR MSR:
LBR_SELECT, LBRT_TOS.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the ability to sample taken branches to the
perf_event interface.
The ability to capture taken branches is very useful for all
sorts of analysis. For instance, basic block profiling, call
counts, statistical call graph.
This new capability requires hardware assist and as such may
not be available on all HW platforms. On Intel x86 it is
implemented on top of the Last Branch Record (LBR) facility.
To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
bit must be set in attr->sample_type.
Sampled taken branches may be filtered by type and/or priv
levels.
The patch adds a new field, called branch_sample_type, to the
perf_event_attr structure. It contains a bitmask of filters
to apply to the sampled taken branches.
Filters may be implemented in HW. If the HW filter does not exist
or is not good enough, some arch may also implement a SW filter.
The following generic filters are currently defined:
- PERF_SAMPLE_USER
only branches whose targets are at the user level
- PERF_SAMPLE_KERNEL
only branches whose targets are at the kernel level
- PERF_SAMPLE_HV
only branches whose targets are at the hypervisor level
- PERF_SAMPLE_ANY
any type of branches (subject to priv levels filters)
- PERF_SAMPLE_ANY_CALL
any call branches (may incl. syscall on some arch)
- PERF_SAMPLE_ANY_RET
any return branches (may incl. syscall returns on some arch)
- PERF_SAMPLE_IND_CALL
indirect call branches
Obviously filter may be combined. The priv level bits are optional.
If not provided, the priv level of the associated event are used. It
is possible to collect branches at a priv level different from the
associated event. Use of kernel, hv priv levels is subject to permissions
and availability (hv).
The number of taken branch records present in each sample may vary based
on HW, the type of sampled branches, the executed code. Therefore
each sample contains the number of taken branches it contains.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the sampling tools (top, record) are back working on AMD systems.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJPUmaEAAoJENZQFvNTUqpA12IP/1JyMC7jtS6PUTKYOnTfwe51
hTdFBUCAquj2cFJNWOe9AidBe1eOUFMgdNBW3hEhvZA3arOJVO+eAyUb72y24yQm
mbn1GMIYEj3MR+nESo7HqQ4J9byE1ngNp8IYZk3wmTit+6k8XaF2dN4o+3ZPdY6S
zkOT4bjzG8C4Y614S/qXdulS42EJUG/oyhB0Fhom2LEFt+WCCB/b0Z2XIZuh2dEt
E1oEzy5wcQZMhIFpl4LWrS/zPJ8+NBAJdBmuNvHoF5IFZxIqcK4/vUWxMVVzMc3f
xVO6rzzcpENHF/qVXztDzeIfVcnMIKBY2cEzsZI17e+gH+Mdda0ByQXwbbG4ccxF
DIxy6gwQkUpTX8tvxv1E2LTFl7qAEQ+2Ryv3yY54jjJ/UwayesEFZk0yPnnOauV6
3wHKqVmyz0We+0zsh5C7UWUiA2DI+dR1xxONd7c1FoXNRpFxCwd1DGBhG5sWTjOi
9Loh0Zq2SyD1r5f6nDbdKw1e9LaK25DLZrlkQUcGyB+sZcyQL4oYDAZK9dJHdUNj
fzaQb7e+3fFZP0hcVhUy17tTWR67pg6BGljrerVD75AKZjPpsGCj4t36yv80y+fK
yIgLyTcbGHFubqc2EvxJ9BMhSKjla/s5sqvsrQv/5OYPNrtBVaHBU5Xg+xbF+ni0
hG0MiE02CAxCau6Yxh56
=sbHZ
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Cherry picked fixes from perf/core, together with the kernel fix (1018faa),
the sampling tools (top, record) are back working on AMD systems.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just fall back to resetting those fields, if set, warning the user that
that feature is not available.
If guest samples appear they will just be discarded because no struct
machine will be found and thus the event will be accounted as not
handled and dropped, see 0c09571.
Reported-by: Namhyung Kim <namhyung@gmail.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-vuwxig36mzprl5n7nzvnxxsh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Setting perf_guest to true by default makes no sense because the perf
subcommands can not setup guest symbol information and thus not process
and guest samples. The only exception is perf-kvm which changes the
perf_guest value on its own. So change the default for perf_guest back
to false.
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328893505-4115-3-git-send-email-joerg.roedel@amd.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A recent refactoring of perf-record introduced the following:
perf record -a -B
Couldn't generating buildids. Use --no-buildid to profile anyway.
sleep: Terminated
I believe the triple negative was meant to be only a double negative.
:-) While I'm there, fixed the grammar on the error message.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1328567272-13190-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It turned out that a performance counter on AMD does not
count at all when the GO or HO bit is set in the control
register and SVM is disabled in EFER.
This patch works around this issue by masking out the HO bit
in the performance counter control register when SVM is not
enabled.
The GO bit is not touched because it is only set when the
user wants to count in guest-mode only. So when SVM is
disabled the counter should not run at all and the
not-counting is the intended behaviour.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Avi Kivity <avi@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: stable@vger.kernel.org # v3.2
Link: http://lkml.kernel.org/r/1330523852-19566-1-git-send-email-joerg.roedel@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The 'perf probe' command allows kprobe to be inserted at any offset from
a function start, which results in adding kprobes to unintended
location. (example: perf probe do_fork+10000 is allowed even though
size of do_fork is ~904).
My previous patch https://lkml.org/lkml/2012/2/24/42 addressed the case
where DWARF info was available for the kernel. This patch fixes the
case where perf probe is used on a kernel without debuginfo available.
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F4C544D.1010909@linux.vnet.ibm.com
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If threads in a multi-threaded process have names shorter than the main
thread the comm for the named threads is not properly terminated.
E.g., for the process 'namedthreads' where each thread is named noploop%d
where %d is the thread number:
Before:
perf script -f comm,tid,ip,sym,dso
noploop:4ads 21616 400a49 noploop (/tmp/namedthreads)
The 'ads' in the thread comm bleeds over from the process name.
After:
perf script -f comm,tid,ip,sym,dso
noploop:4 21616 400a49 noploop (/tmp/namedthreads)
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1330111898-68071-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf probe command allows kprobe to be inserted at any offset from a
function start, which results in adding kprobes to unintended location.
Example: perf probe do_fork+10000 is allowed even though size of do_fork
is ~904.
This patch will ensure probe addition fails when the offset specified is
greater than size of the function.
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/4F473F33.4060409@linux.vnet.ibm.com
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On old kernels that don't support sample_id_all feature,
perf_evlist__id2evsel() returns NULL for non-sampling events.
This breaks perf top when multiple events are given on command line. Fix
it by using first evsel in the evlist. This will also prevent getting
the same (potential) problem in such new tool/ old kernel combo.
Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1329702447-25045-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the jump label enabled case, calling static_key_enabled()
results in a function call. The function returns the results of
a compare, so it really doesn't need the overhead of a full
function call. Let's make it 'static inline' for both the jump
label enabled and disabled cases.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: a.p.zijlstra@chello.nl
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/201202281849.q1SIn1p2023270@int-mx10.intmail.prod.int.phx2.redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If kzalloc() for TYPE_DATA failed on a given cpu, previous chunk
of TYPE_INST will be leaked. Fix it.
Thanks to Peter Zijlstra for suggesting this better solution. It
should work as long as the initial value of the region is all
0's and that's the case of static (per-cpu) memory allocation.
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/1330391978-28070-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Here are some trivial NTFS changes (a spelling fix and two use before
NULL check cases found by Coverity as well as an update in MAINTAINERS
for the path to the ntfs git repo) together with a simple LDM fix for
parsing fragmented VBLKs.
* git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs:
NTFS: Update git repo path in MAINTAINERS file.
LDM: Fix reassembly of extended VBLKs.
NTFS: Correct two spelling errors "dealocate" to "deallocate" in mft.c.
NTFS: Do not dereference pointer before checking for NULL.
NTFS: Remove unused variable.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/AMD: Fix UP build error
x86: Specify a size for the cmp in the NMI handler
x86/nmi: Test saved %cs in NMI to determine nested NMI case
x86/amd: Fix L1i and L2 cache sharing information for AMD family 15h processors
x86/microcode: Remove noisy AMD microcode warning
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Handle pending irqs in irq_startup()
genirq: Unmask oneshot irqs when thread was not woken
The new is_compat_task() define for the !COMPAT case in
include/linux/compat.h conflicts with a similar define in
arch/s390/include/asm/compat.h.
This is the minimal patch which fixes the build issues.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
converted back to WB but end up being recycled in the general memory
pool as WC.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPStrRAAoJEFjIrFwIi8fJovAH/RBUJdeDw8x5ki2yDhAz/80S
+yZKiGaaUYYCB0Fo/BIwVhBQeDabGz8rJCdOv40tRpRCiRD7JIfMo5tCS6QIFF7P
UvhVuJcqltxIoRjz7nGX8iSUl48JKy9vqmqWXIucG3rYQ7YOkadwVTbhsg4a9U6P
fcqexzUuXb4fr6CNBBpL3LqHfDaKNovgESHlAmzrcaRGbOADp9LVlWkR6kwiTnIA
e5yU/DEW9Ej6wJM90Mx9Rg3y22hBZEL1p5NJjaiMrOY2LzX7bE4+mTgtk+a4FNGD
8WJZm/WWhdsWrKlj8vCKOuJkIgQYJURVMySEGdzM91P1FpJ3edJxIM3qlA958vc=
=jggO
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-fixes-3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Two fixes to fix a memory corruption bug when WC pages never get
converted back to WB but end up being recycled in the general memory
pool as WC.
There is a better way of fixing this, but there is not enough time to do
the full benchmarking to pick one of the right options - so picking the
one that favors stability for right now.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* tag 'stable/for-linus-fixes-3.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pat: Disable PAT support for now.
xen/setup: Remove redundant filtering of PTE masks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPSsdkAAoJENkgDmzRrbjxHQUP/RxiMxjwoX4Q/Fzrapg1fkag
0ZhZ2N9w/6AwIGF/IAqiX4BRYJueI3+LJp4W7kwKQTBTn9L1DEvw1i6htBhci25P
uW6ki1fJZh9ewT4FbMAkA7uapqloijfQoFEt0dsWlSMgzh+RqfVNDIhILsKaYCey
uFnRWrPPyB1vW0ofiB7j4LrEuRnEsitsg42Zqdpq4mX4t/cPfP4uZvvB0za6IxLV
MhPSyJ2cuuEtdIs/6DhrftDImu0k7Tbd3Jvsf7DmkF0wZOV5mCWRuroRq8kJWvzP
oJhv4DhHL8uLkqj3ljqkrKaSqQ+e1OVE0dq8bbA/3WRK8L1V94C53bTYcGuYnByN
reIXx5l4GeqDTi+SBMug3bXtrakv3pq92cBVmc4ERwtBPVObGU6Tljgdke05MF8j
i7Kg1bUYX3v3zyj5mVoCflpc5OXUcwA4CFn5mV6zHNNFvEbWQaQLnDTXDGihgjgd
Xl8W58V4yOqbXWpjnT09g934Vt1GuFrhraulbeo128bPUyg+CpMCh5cMdJIfSFxg
Rti7xLeXHm6D+7B5VpvA574k6Ah9HtCkbLSy/W0Wxa/3ckPHW2QYHWM3Sq1kBz2B
TH6sXLgJor1P8M64tiZsoF/VMvdWmBth3ZwOU1ggVZvtMGkqDkxqkkQxw7Cql579
o1NSSN/cyWlrcymWY02h
=Rmyt
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/rustyrussell/linux
* tag 'for-linus' of git://github.com/rustyrussell/linux:
mod/file2alias: make modpost compile on darwin again
commit e49ce14150 breaks cross compiling
the linux kernel on darwin hosts.
This fix introduce some minimal glue to adopt linker section handling
for darwin hosts.
Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Jochen Friedrich <jochen@scram.de>
CC: Samuel Ortiz <sameo@linux.intel.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Bernhard Walle <bernhard@bwalle.de>
1) ICMP sockets leave err uninitialized but we try to return it for the
unsupported MSG_OOB case, reported by Dave Jones.
2) Add new Zaurus device ID entries, from Dave Jones.
3) Pointer calculation in hso driver memset is wrong, from Dan
Carpenter.
4) ks8851_probe() checks unsigned value as negative, fix also from Dan
Carpenter.
5) Fix crashes in atl1c driver due to TX queue handling, from Eric
Dumazet. I anticipate some TX side locking fixes coming in the near
future for this driver as well.
6) The inline directive fix in Bluetooth which was breaking the build
only with very new versions of GCC, from Johan Hedberg.
7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
window, reported by Meelis Roos and fixed by Eric Dumazet.
8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.
9) Some ip6_route_output() callers test the return value for NULL, but
this never happens as the convention is to return a dst entry with
dst->error set. Fixes from RonQing Li.
10) Logitech Harmony 900 should be handled by zaurus driver not
cdc_ether, update white lists and black lists accordingly. From
Scott Talbert.
11) Receiving from certain kinds of devices there won't be a MAC header,
so there is no MAC header to fixup in the IPSEC code, and if we try
to do it we'll crash. Fix from Eric Dumazet.
12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
Petrilin.
13) Fix regression in link-down handling in davinci_emac which causes
all RX descriptors to be freed up and therefore RX to wedge
completely, from Christian Riesch.
14) It took two attempts, but ctnetlink soft lockups seem to be
cured now, from Pablo Neira Ayuso.
15) Endianness bug fix in ENIC driver, from Santosh Nayak.
16) The long ago conversion of the PPP fragmentation code over to
abstracted SKB list handling wasn't perfect, once we get an
out of sequence SKB we don't flush the rest of them like we
should. From Ben McKeegan.
17) Fix regression of ->ip_summed initialization in sfc driver.
From Ben Hutchings.
18) Bluetooth timeout mistakenly using msecs instead of jiffies,
from Andrzej Kaczmarek.
19) Using _sync variant of work cancellation results in deadlocks,
use the non _sync variants instead. From Andre Guedes.
20) Bluetooth rfcomm code had reference counting problems leading
to crashes, fix from Octavian Purdila.
21) The conversion of netem over to classful qdisc handling added
two bugs to netem_dequeue(), fixes from Eric Dumazet.
22) Missing pci_iounmap() in ATM Solos driver. Fix from Julia Lawall.
23) b44_pci_exit() should not have __exit tag since it's invoked from
non-__exit code. From Nikola Pajkovsky.
24) The conversion of the neighbour hash tables over to RCU added a
race, fixed here by adding the necessary reread of tbl->nht, fix
from Michel Machado.
25) When we added VF (virtual function) attributes for network device
dumps, this potentially bloats up the size of the dump of one
network device such that the dump size is too large for the buffer
allocated by properly written netlink applications.
In particular, if you add 255 VFs to a network device, parts of
GLIBC stop working.
To fix this, we add an attribute that is used to turn on these
extended portions of the network device dump. Sophisticaed
applications like 'ip' that want to see this stuff will be changed
to set the attribute, whereas things like GLIBC that don't care
about VFs simply will not, and therefore won't be busted by the
mere presence of VFs on a network device.
Thanks to the tireless work of Greg Rose on this fix.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
sfc: Fix assignment of ip_summed for pre-allocated skbs
ppp: fix 'ppp_mp_reconstruct bad seq' errors
enic: Fix endianness bug.
gre: fix spelling in comments
netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
davinci_emac: Do not free all rx dma descriptors during init
mlx4_core: Fixing array indexes when setting port types
phy: IC+101G and PHY_HAS_INTERRUPT flag
netdev/phy/icplus: Correct broken phy_init code
ipsec: be careful of non existing mac headers
Move Logitech Harmony 900 from cdc_ether to zaurus
hso: memsetting wrong data in hso_get_count()
netfilter: ip6_route_output() never returns NULL.
ethernet/broadcom: ip6_route_output() never returns NULL.
ipv6: ip6_route_output() never returns NULL.
jme: Fix FIFO flush issue
atm: clip: remove clip_tbl
ipv4: ping: Fix recvmsg MSG_OOB error handling.
rtnetlink: Fix problem with buffer allocation
...
The autofs compat handling fix caused a compile failure when
CONFIG_COMPAT isn't defined.
Instead of adding random #ifdef'fery in autofs, let's just make the
compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task()
just hardcodes to zero.
We could probably do something similar for a number of other cases where
we have #ifdef's in code, but this is the low-hanging fruit.
Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJPR9QMAAoJEMsfJm/On5mBWbgP+wcJFIpqh84EyFGQuhSt61su
MQYcF8fWexSBzYjK5TB0nOougY/DQfTIBrqGf00Yr0RM81W3DlfbFNfl9tlBSwee
SEMBIm8qXALjWGb0eL3gy+Y8U10I2Pe0AvAvbiw3rjuJadqCLKljIQjQgwsuY0zY
yHwmeYgLTHVQfH8pk1n2QCFviqC4Lfd4Ltf/9EdQjm3WMxbJdLfHFnb99JvUt9Cu
b/NlXjeECZFLmX4P7JltdcfixbDUls9bPioFoldR8g5k1t/nCpnuUqPyj75zjQFk
D3WwzdA+YzH2nvoj0c+RjFBXYoHG7M/DFkKAy3ZF2cuRvxWP8QDK+WBbbOfA3HBF
YeefJp7hThRZVTkLCW+rG93PcUAIbOyX2tEk7DuaVlM9MyMjE/2SMqVw+AO6PBDS
4qXgfBrFmpVB6rapChD2M8RJsjbV5rw6eI3jM3+e2MJWDrCKgi13KHaCsUWTRFXI
l429vy809sdAx+8dTJK4KOIytFXDrrsw6tMPs9JQvcbLr2Bpn/4/jyUSyrZ0a7J4
ovfQtFk7y9NPHgjQmOswZm2yJPLwEJbuMUcwK3iIpelX8Y51ltMuVaQymKTeJAwi
+elRGK24q8C8wmRKT3mul4qSTmdvC+8nsAMD4/ozKsgDyO4oc9SYhf46mjCXzyn2
km0owCCnyoFWzS+w9lUJ
=YJz+
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Couple of minor driver fixes.
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (max34440) Fix resetting temperature history
hwmon: (f75375s) Fix register write order when setting fans to full speed
hwmon: (ads1015) Fix file leak in probe function
hwmon: (max6639) Fix PPR register initialization to set both channels
hwmon: (max6639) Fix FAN_FROM_REG calculation
three kbuild fixes for 3.3:
- make deb-pkg symlink race fix.
- make coccicheck fix.
- Dropping the check for modutils. This is not a regression, but
allows the module-init-tools replacement kmod work with the 3.3
kernel.
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
coccicheck: change handling of C={1,2} when M= is set
builddeb: Don't create files in /tmp with predictable names
kbuild: do not check for ancient modutils tools
When the autofs protocol version 5 packet type was added in commit
5c0a32fc2c ("autofs4: add new packet type for v5 communications"), it
obvously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.
However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.
And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out a 4-byte aligned.
End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.
As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.
Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.
So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task. That way we
will write the properly sized packet that user mode expects.
Not pretty. Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When pre-allocating skbs for received packets, we set ip_summed =
CHECKSUM_UNNCESSARY. We used to change it back to CHECKSUM_NONE when
the received packet had an incorrect checksum or unhandled protocol.
Commit bc8acf2c8c ('drivers/net: avoid
some skb->ip_summed initializations') mistakenly replaced the latter
assignment with a DEBUG-only assertion that ip_summed ==
CHECKSUM_NONE. This assertion is always false, but it seems no-one
has exercised this code path in a DEBUG build.
Fix this by moving our assignment of CHECKSUM_UNNECESSARY into
efx_rx_packet_gro().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPSByoAAoJEDeqqVYsXL0M3CAIAJT740lwOyc9hyIBarZiZcZj
Ib9rPPppThKYvbo8w6Q6xITNTocohSnmPnjbfXZzN4nLrp1Xbzi6A4YLSeY3oxwE
t11LOMnXYPgCOCNZA3iJ4WadVbfs4Id6PWWZPnifWl6rZ2mhvtWmkCNzayY0Kv2t
WuX0j8ds0KgDG6xpfXKoXvHeNuEDJ5aZF/gtI1kmo1eilwPjlovCjsEWetHr/FQA
0jIKFdgf/nZ1ENZU0ztqGd/Q3er6t7G9qS7cFxUa4fWsqG+8Kl+KIk2PHDLL1QHu
tlYtaGm5kbh5d2tfzAD4HJZqJRw2LQ6U1gqofoAKS7JbqYPUJZDoERQtWZUpoj4=
=GZJg
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
SCSI fixes on 20120224:
"This is a set of assorted bug fixes for power management, mpt2sas,
ipr, the rdac device handler and quite a big chunk for qla2xxx (plus a
use after free of scsi_host in scsi_scan.c). "
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] scsi_dh_rdac: Fix for unbalanced reference count
[SCSI] scsi_pm: Fix bug in the SCSI power management handler
[SCSI] scsi_scan: Fix 'Poison overwritten' warning caused by using freed 'shost'
[SCSI] qla2xxx: Update version number to 8.03.07.13-k.
[SCSI] qla2xxx: Proper detection of firmware abort error code for ISP82xx.
[SCSI] qla2xxx: Remove resetting memory during device initialization for ISP82xx.
[SCSI] qla2xxx: Complete mailbox command timedout to avoid initialization failures during next reset cycle.
[SCSI] qla2xxx: Remove check for null fcport from host reset handler.
[SCSI] qla2xxx: Correct out of bounds read of ISP2200 mailbox registers.
[SCSI] qla2xxx: Remove errant clearing of MBX_INTERRUPT flag during CT-IOCB processing.
[SCSI] qla2xxx: Clear options-flags while issuing stop-firmware mbx command.
[SCSI] qla2xxx: Add an "is reset active" helper.
[SCSI] qla2xxx: Add check for null fcport references in qla2xxx_queuecommand.
[SCSI] qla2xxx: Propagate up abort failures.
[SCSI] isci: Fix NULL ptr dereference when no firmware is being loaded
[SCSI] ipr: fix eeh recovery for 64-bit adapters
[SCSI] mpt2sas: Fix mismatch in mpt2sas_base_hard_reset_handler() mutex lock-unlock
This patch fixes a (mostly cosmetic) bug introduced by the patch
'ppp: Use SKB queue abstraction interfaces in fragment processing'
found here: http://www.spinics.net/lists/netdev/msg153312.html
The above patch rewrote and moved the code responsible for cleaning
up discarded fragments but the new code does not catch every case
where this is necessary. This results in some discarded fragments
remaining in the queue, and triggering a 'bad seq' error on the
subsequent call to ppp_mp_reconstruct. Fragments are discarded
whenever other fragments of the same frame have been lost.
This can generate a lot of unwanted and misleading log messages.
This patch also adds additional detail to the debug logging to
make it clearer which fragments were lost and which other fragments
were discarded as a result of losses. (Run pppd with 'kdebug 1'
option to enable debug logging.)
Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts a portion of d0bc1fb4 so that coccicheck will
work properly when C=1 or C=2.
Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michal Marek <mmarek@suse.cz>
The original spelling and bad word choice makes these comments hard to read.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] hdpvr: update picture controls to support firmware versions > 0.15
[media] wl128x: fix build errors when GPIOLIB is not enabled
[media] hdpvr: fix race conditon during start of streaming
[media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes
[media] imon: don't wedge hardware after early callbacks