Commit Graph

134 Commits

Author SHA1 Message Date
Linus Torvalds
6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Linus Torvalds
eeee78cf77 Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
 
 Tracepoints have helper functions for the TP_printk() called
 __print_symbolic() and __print_flags() that lets a numeric number be
 displayed as a a human comprehensible text. What is placed in the
 TP_printk() is also shown in the tracepoint format file such that
 user space tools like perf and trace-cmd can parse the binary data
 and express the values too. Unfortunately, the way the TRACE_EVENT()
 macro works, anything placed in the TP_printk() will be shown pretty
 much exactly as is. The problem arises when enums are used. That's
 because unlike macros, enums will not be changed into their values
 by the C pre-processor. Thus, the enum string is exported to the
 format file, and this makes it useless for user space tools.
 
 The TRACE_DEFINE_ENUM() solves this by converting the enum strings
 in the TP_printk() format into their number, and that is what is
 shown to user space. For example, the tracepoint tlb_flush currently
 has this in its format file:
 
      __print_symbolic(REC->reason,
         { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
         { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
         { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
         { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
 
 After adding:
 
      TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
      TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
      TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
      TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
 
 Its format file will contain this:
 
      __print_symbolic(REC->reason,
         { 0, "flush on task switch" },
         { 1, "remote shootdown" },
         { 2, "local shootdown" },
         { 3, "local mm shootdown" })
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR
 wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr
 y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM
 dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+
 aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu
 JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk=
 =Ixwt
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Some clean ups and small fixes, but the biggest change is the addition
  of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

  Tracepoints have helper functions for the TP_printk() called
  __print_symbolic() and __print_flags() that lets a numeric number be
  displayed as a a human comprehensible text.  What is placed in the
  TP_printk() is also shown in the tracepoint format file such that user
  space tools like perf and trace-cmd can parse the binary data and
  express the values too.  Unfortunately, the way the TRACE_EVENT()
  macro works, anything placed in the TP_printk() will be shown pretty
  much exactly as is.  The problem arises when enums are used.  That's
  because unlike macros, enums will not be changed into their values by
  the C pre-processor.  Thus, the enum string is exported to the format
  file, and this makes it useless for user space tools.

  The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
  the TP_printk() format into their number, and that is what is shown to
  user space.  For example, the tracepoint tlb_flush currently has this
  in its format file:

     __print_symbolic(REC->reason,
        { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
        { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
        { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
        { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

  After adding:

     TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
     TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
     TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
     TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

  Its format file will contain this:

     __print_symbolic(REC->reason,
        { 0, "flush on task switch" },
        { 1, "remote shootdown" },
        { 2, "local shootdown" },
        { 3, "local mm shootdown" })"

* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
  tracing: Add enum_map file to show enums that have been mapped
  writeback: Export enums used by tracepoint to user space
  v4l: Export enums used by tracepoints to user space
  SUNRPC: Export enums in tracepoints to user space
  mm: tracing: Export enums in tracepoints to user space
  irq/tracing: Export enums in tracepoints to user space
  f2fs: Export the enums in the tracepoints to userspace
  net/9p/tracing: Export enums in tracepoints to userspace
  x86/tlb/trace: Export enums in used by tlb_flush tracepoint
  tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
  tracing: Allow for modules to convert their enums to values
  tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
  tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
  tracing: Give system name a pointer
  brcmsmac: Move each system tracepoints to their own header
  iwlwifi: Move each system tracepoints to their own header
  mac80211: Move message tracepoints to their own header
  tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
  tracing: Add TRACE_SYSTEM_VAR to kvm-s390
  tracing: Add TRACE_SYSTEM_VAR to intel-sst
  ...
2015-04-14 10:49:03 -07:00
Linus Torvalds
8de29a35dc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:

 - quite a few firmware fixes for RMI driver by Andrew Duggan

 - huion and uclogic drivers have been substantially overlaping in
   functionality laterly.  This redundancy is fixed by hid-huion driver
   being merged into hid-uclogic; work done by Benjamin Tissoires and
   Nikolai Kondrashov

 - i2c-hid now supports ACPI GPIO interrupts; patch from Mika Westerberg

 - Some of the quirks, that got separated into individual drivers, have
   historically had EXPERT dependency.  As HID subsystem matured (as
   well as the individual drivers), this made less and less sense.  This
   dependency is now being removed by patch from Jean Delvare

 - Logitech lg4ff driver received a couple of improvements for mode
   switching, by Michal Malý

 - multitouch driver now supports clickpads, patches by Benjamin
   Tissoires and Seth Forshee

 - hid-sensor framework received a substantial update; namely support
   for Custom and Generic pages is being added; work done by Srinivas
   Pandruvada

 - wacom driver received substantial update; it now supports
   i2c-conntected devices (Mika Westerberg), Bamboo PADs are now
   properly supported (Benjamin Tissoires), much improved battery
   reporting (Jason Gerecke) and pen proximity cleanups (Ping Cheng)

 - small assorted fixes and device ID additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (68 commits)
  HID: sensor: Update document for custom sensor
  HID: sensor: Custom and Generic sensor support
  HID: debug: fix error handling in hid_debug_events_read()
  Input - mt: Fix input_mt_get_slot_by_key
  HID: logitech-hidpp: fix error return code
  HID: wacom: Add support for Cintiq 13HD Touch
  HID: logitech-hidpp: add a module parameter to keep firmware gestures
  HID: usbhid: yet another mouse with ALWAYS_POLL
  HID: usbhid: more mice with ALWAYS_POLL
  HID: wacom: set stylus_in_proximity before checking touch_down
  HID: wacom: use wacom_wac_finger_count_touches to set touch_down
  HID: wacom: remove hardcoded WACOM_QUIRK_MULTI_INPUT
  HID: pidff: effect can't be NULL
  HID: add quirk for PIXART OEM mouse used by HP
  HID: add HP OEM mouse to quirk ALWAYS_POLL
  HID: wacom: ask for a in-prox report when it was missed
  HID: hid-sensor-hub: Fix sparse warning
  HID: hid-sensor-hub: fix attribute read for logical usage id
  HID: plantronics: fix Kconfig default
  HID: pidff: support more than one concurrent effect
  ...
2015-04-14 09:25:26 -07:00
Jiri Kosina
05f6d02521 Merge branches 'for-4.0/upstream-fixes', 'for-4.1/genius', 'for-4.1/huion-uclogic-merge', 'for-4.1/i2c-hid', 'for-4.1/kconfig-drop-expert-dependency', 'for-4.1/logitech', 'for-4.1/multitouch', 'for-4.1/rmi', 'for-4.1/sony', 'for-4.1/upstream' and 'for-4.1/wacom' into for-linus 2015-04-13 23:41:15 +02:00
Steven Rostedt (Red Hat)
32eb3d0d09 tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
Document the use of TRACE_DEFINE_ENUM() by adding enums to the
trace-event-sample.h and using this macro to convert them in the format
files.

Also update the comments and sho the use of __print_symbolic() and
__print_flags() as well as adding comments abount __print_array().

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08 09:39:58 -04:00
Steven Rostedt (Red Hat)
889204278c tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
Add documentation about TRACE_SYSTEM needing to be alpha-numeric or with
underscores, and that if it is not, then the use of TRACE_SYSTEM_VAR is
required to make something that is.

An example of this is shown in samples/trace_events/trace-events-sample.h

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08 09:39:55 -04:00
Alexei Starovoitov
9811e35359 samples/bpf: Add kmem_alloc()/free() tracker tool
One BPF program attaches to kmem_cache_alloc_node() and
remembers all allocated objects in the map.
Another program attaches to kmem_cache_free() and deletes
corresponding object from the map.

User space walks the map every second and prints any objects
which are older than 1 second.

Usage:

	$ sudo tracex4

Then start few long living processes. The 'tracex4' will print
something like this:

	obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32
	obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32
	obj 0xffff880465848000 is  8sec old was allocated at ip ffffffff8105dc32
	obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32

	$ addr2line -fispe vmlinux ffffffff8105dc32
	do_fork at fork.c:1665

As soon as processes exit the memory is reclaimed and 'tracex4'
prints nothing.

Similar experiment can be done with the __kmalloc()/kfree() pair.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 13:25:51 +02:00
Alexei Starovoitov
5c7fc2d27d samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool
BPF C program attaches to
blk_mq_start_request()/blk_update_request() kprobe events to
calculate IO latency.

For every completed block IO event it computes the time delta
in nsec and records in a histogram map:

	map[log10(delta)*10]++

User space reads this histogram map every 2 seconds and prints
it as a 'heatmap' using gray shades of text terminal. Black
spaces have many events and white spaces have very few events.
Left most space is the smallest latency, right most space is
the largest latency in the range.

Usage:

	$ sudo ./tracex3
	and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal.

Observe IO latencies and how different activity (like 'make
kernel') affects it.

Similar experiments can be done for network transmit latencies,
syscalls, etc.

'-t' flag prints the heatmap using normal ascii characters:

$ sudo ./tracex3 -t
  heatmap of IO latency
  # - many events with this latency
    - few events
	|1us      |10us     |100us    |1ms      |10ms     |100ms    |1s |10s
				 *ooo. *O.#.                                    # 221
			      .  *#     .                                       # 125
				 ..   .o#*..                                    # 55
			    .  . .  .  .#O                                      # 37
				 .#                                             # 175
				       .#*.                                     # 37
				  #                                             # 199
		      .              . *#*.                                     # 55
				       *#..*                                    # 42
				  #                                             # 266
			      ...***Oo#*OO**o#* .                               # 629
				  #                                             # 271
				      . .#o* o.*o*                              # 221
				. . o* *#O..                                    # 50

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 13:25:51 +02:00
Alexei Starovoitov
d822a19268 samples/bpf: Add counting example for kfree_skb() function calls and the write() syscall
this example has two probes in one C file that attach to
different kprove events and use two different maps.

1st probe is x64 specific equivalent of dropmon. It attaches to
kfree_skb, retrevies 'ip' address of kfree_skb() caller and
counts number of packet drops at that 'ip' address. User space
prints 'location - count' map every second.

2nd probe attaches to kprobe:sys_write and computes a histogram
of different write sizes

Usage:
	$ sudo tracex2
	location 0xffffffff81695995 count 1
	location 0xffffffff816d0da9 count 2

	location 0xffffffff81695995 count 2
	location 0xffffffff816d0da9 count 2

	location 0xffffffff81695995 count 3
	location 0xffffffff816d0da9 count 2

	557145+0 records in
	557145+0 records out
	285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s
		   syscall write() stats
	     byte_size       : count     distribution
	       1 -> 1        : 3        |                                      |
	       2 -> 3        : 0        |                                      |
	       4 -> 7        : 0        |                                      |
	       8 -> 15       : 0        |                                      |
	      16 -> 31       : 2        |                                      |
	      32 -> 63       : 3        |                                      |
	      64 -> 127      : 1        |                                      |
	     128 -> 255      : 1        |                                      |
	     256 -> 511      : 0        |                                      |
	     512 -> 1023     : 1118968  |************************************* |

Ctrl-C at any time. Kernel will auto cleanup maps and programs

	$ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995
	0xffffffff816d0da9 0xffffffff81695995:
	./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9:
	./bld_x64/../net/unix/af_unix.c:1231

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 13:25:50 +02:00
Alexei Starovoitov
b896c4f95a samples/bpf: Add simple non-portable kprobe filter example
tracex1_kern.c - C program compiled into BPF.

It attaches to kprobe:netif_receive_skb()

When skb->dev->name == "lo", it prints sample debug message into
trace_pipe via bpf_trace_printk() helper function.

tracex1_user.c - corresponding user space component that:
  - loads BPF program via bpf() syscall
  - opens kprobes:netif_receive_skb event via perf_event_open()
    syscall
  - attaches the program to event via ioctl(event_fd,
    PERF_EVENT_IOC_SET_BPF, prog_fd);
  - prints from trace_pipe

Note, this BPF program is non-portable. It must be recompiled
with current kernel headers. kprobe is not a stable ABI and
BPF+kprobe scripts may no longer be meaningful when kernel
internals change.

No matter in what way the kernel changes, neither the kprobe,
nor the BPF program can ever crash or corrupt the kernel,
assuming the kprobes, perf and BPF subsystem has no bugs.

The verifier will detect that the program is using
bpf_trace_printk() and the kernel will print 'this is a DEBUG
kernel' warning banner, which means that bpf_trace_printk()
should be used for debugging of the BPF program only.

Usage:
$ sudo tracex1
            ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84
            ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84

            ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84
            ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 13:25:50 +02:00
Greg Kroah-Hartman
07afb6ace3 samples/kobject: be explicit in the module license
Rusty pointed out that the module license should be "GPL v2" to properly
match the notice at the top of the files, so make that change.

Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 13:41:42 +01:00
Rastislav Barlik
5fd637e7a7 samples/kobject: Use kstrtoint instead of sscanf
Use kstrtoint function instead of sscanf and check for return values.

Signed-off-by: Rastislav Barlik <barlik@zoho.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 13:40:31 +01:00
Pavel Machek
04303f8ec1 HID: samples/hidraw: make it possible to select device
Makefile that can actually build the example, and allow selecting device to
work on.

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-15 10:11:21 -04:00
Kees Cook
3a9af0bd34 samples/seccomp: improve label helper
Fixes a potential corruption with uninitialized stack memory in the
seccomp BPF sample program.

[akpm@linux-foundation.org: coding-style fixlet]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Robert Swiecki <swiecki@google.com>
Tested-by: Robert Swiecki <swiecki@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:55 -08:00
Linus Torvalds
41cbc01f6e The updates included in this pull request for ftrace are:
o Several clean ups to the code
 
    One such clean up was to convert to 64 bit time keeping, in the
    ring buffer benchmark code.
 
  o Adding of __print_array() helper macro for TRACE_EVENT()
 
  o Updating the sample/trace_events/ to add samples of different ways to
    make trace events. Lots of features have been added since the sample
    code was made, and these features are mostly unknown. Developers
    have been making their own hacks to do things that are already available.
 
  o Performance improvements. Most notably, I found a performance bug where
    a waiter that is waiting for a full page from the ring buffer will
    see that a full page is not available, and go to sleep. The sched
    event caused by it going to sleep would cause it to wake up again.
    It would see that there was still not a full page, and go back to sleep
    again, and that would wake it up again, until finally it would see a
    full page. This change has been marked for stable.
 
    Other improvements include removing global locks from fast paths.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU3M+GAAoJEEjnJuOKh9ldpWQIAJTUzeVXlU0cf3bVn768VW7e
 XS41WHF34l1tNevmKTh6fCPiw8+U0UMGLQt5WKtyaaARsZn2MlefLVuvHPKFlK2w
 +qcI4OEVHH97Qgf9HWJSsYgnZaOnOE+TENqnokEgXMimRMuVcd/S4QaGxwJVDcjm
 iBF5j2TaG4aGbx4a3J7KueoZ3K+39r3ut15hIGi/IZBZldQ1pt26ytafD/KA3CU3
 BLRM2HLttAMsV1ds0EDLgZjSGICVetFcdOmI5Gwj7Qr3KrOTRPYJMNc8NdDL7Js9
 v8VhujhFGvcCrhO/IKpVvd9yluz3RCF+Z7ihc+D/+1B3Nsm0PTwN3Fl5J+f89AA=
 =u2Mm
 -----END PGP SIGNATURE-----

Merge tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The updates included in this pull request for ftrace are:

   o Several clean ups to the code

     One such clean up was to convert to 64 bit time keeping, in the
     ring buffer benchmark code.

   o Adding of __print_array() helper macro for TRACE_EVENT()

   o Updating the sample/trace_events/ to add samples of different ways
     to make trace events.  Lots of features have been added since the
     sample code was made, and these features are mostly unknown.
     Developers have been making their own hacks to do things that are
     already available.

   o Performance improvements.  Most notably, I found a performance bug
     where a waiter that is waiting for a full page from the ring buffer
     will see that a full page is not available, and go to sleep.  The
     sched event caused by it going to sleep would cause it to wake up
     again.  It would see that there was still not a full page, and go
     back to sleep again, and that would wake it up again, until finally
     it would see a full page.  This change has been marked for stable.

  Other improvements include removing global locks from fast paths"

* tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Do not wake up a splice waiter when page is not full
  tracing: Fix unmapping loop in tracing_mark_write
  tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()
  tracing: Add TRACE_EVENT_FN example
  tracing: Add TRACE_EVENT_CONDITION sample
  tracing: Update the TRACE_EVENT fields available in the sample code
  tracing: Separate out initializing top level dir from instances
  tracing: Make tracing_init_dentry_tr() static
  trace: Use 64-bit timekeeping
  tracing: Add array printing helper
  tracing: Remove newline from trace_printk warning banner
  tracing: Use IS_ERR() check for return value of tracing_init_dentry()
  tracing: Remove unneeded includes of debugfs.h and fs.h
  tracing: Remove taking of trace_types_lock in pipe files
  tracing: Add ref count to tracer for when they are being read by pipe
2015-02-12 08:37:41 -08:00
Linus Torvalds
1d9c5d79e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull live patching infrastructure from Jiri Kosina:
 "Let me provide a bit of history first, before describing what is in
  this pile.

  Originally, there was kSplice as a standalone project that implemented
  stop_machine()-based patching for the linux kernel.  This project got
  later acquired, and the current owner is providing live patching as a
  proprietary service, without any intentions to have their
  implementation merged.

  Then, due to rising user/customer demand, both Red Hat and SUSE
  started working on their own implementation (not knowing about each
  other), and announced first versions roughly at the same time [1] [2].

  The principle difference between the two solutions is how they are
  making sure that the patching is performed in a consistent way when it
  comes to different execution threads with respect to the semantic
  nature of the change that is being introduced.

  In a nutshell, kPatch is issuing stop_machine(), then looking at
  stacks of all existing processess, and if it decides that the system
  is in a state that can be patched safely, it proceeds insterting code
  redirection machinery to the patched functions.

  On the other hand, kGraft provides a per-thread consistency during one
  single pass of a process through the kernel and performs a lazy
  contignuous migration of threads from "unpatched" universe to the
  "patched" one at safe checkpoints.

  If interested in a more detailed discussion about the consistency
  models and its possible combinations, please see the thread that
  evolved around [3].

  It pretty quickly became obvious to the interested parties that it's
  absolutely impractical in this case to have several isolated solutions
  for one task to co-exist in the kernel.  During a dedicated Live
  Kernel Patching track at LPC in Dusseldorf, all the interested parties
  sat together and came up with a joint aproach that would work for both
  distro vendors.  Steven Rostedt took notes [4] from this meeting.

  And the foundation for that aproach is what's present in this pull
  request.

  It provides a basic infrastructure for function "live patching" (i.e.
  code redirection), including API for kernel modules containing the
  actual patches, and API/ABI for userspace to be able to operate on the
  patches (look up what patches are applied, enable/disable them, etc).

  It's relatively simple and minimalistic, as it's making use of
  existing kernel infrastructure (namely ftrace) as much as possible.
  It's also self-contained, in a sense that it doesn't hook itself in
  any other kernel subsystem (it doesn't even touch any other code).
  It's now implemented for x86 only as a reference architecture, but
  support for powerpc, s390 and arm is already in the works (adding
  arch-specific support basically boils down to teaching ftrace about
  regs-saving).

  Once this common infrastructure gets merged, both Red Hat and SUSE
  have agreed to immediately start porting their current solutions on
  top of this, abandoning their out-of-tree code.  The plan basically is
  that each patch will be marked by flag(s) that would indicate which
  consistency model it is willing to use (again, the details have been
  sketched out already in the thread at [3]).

  Before this happens, the current codebase can be used to patch a large
  group of secruity/stability problems the patches for which are not too
  complex (in a sense that they don't introduce non-trivial change of
  function's return value semantics, they don't change layout of data
  structures, etc) -- this corresponds to LEAVE_FUNCTION &&
  SWITCH_FUNCTION semantics described at [3].

  This tree has been in linux-next since December.

    [1] https://lkml.org/lkml/2014/4/30/477
    [2] https://lkml.org/lkml/2014/7/14/857
    [3] https://lkml.org/lkml/2014/11/7/354
    [4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt

  [ The core code is introduced by the three commits authored by Seth
    Jennings, which got a lot of changes incorporated during numerous
    respins and reviews of the initial implementation.  All the followup
    commits have materialized only after public tree has been created,
    so they were not folded into initial three commits so that the
    public tree doesn't get rebased ]"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add missing newline to error message
  livepatch: rename config to CONFIG_LIVEPATCH
  livepatch: fix uninitialized return value
  livepatch: support for repatching a function
  livepatch: enforce patch stacking semantics
  livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING
  livepatch: fix deferred module patching order
  livepatch: handle ancient compilers with more grace
  livepatch: kconfig: use bool instead of boolean
  livepatch: samples: fix usage example comments
  livepatch: MAINTAINERS: add git tree location
  livepatch: use FTRACE_OPS_FL_IPMODIFY
  livepatch: move x86 specific ftrace handler code to arch/x86
  livepatch: samples: add sample live patching module
  livepatch: kernel: add support for live patching
  livepatch: kernel: add TAINT_LIVEPATCH
2015-02-10 18:35:40 -08:00
Steven Rostedt (Red Hat)
7496946a88 tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()
Add to samples/trace_events/ the macros DECLARE_EVENT_CLASS() and
DEFINE_EVENT() and recommend using them over multiple TRACE_EVENT()
macros if the multiple events have the same format.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-09 18:05:51 -05:00
Steven Rostedt (Red Hat)
6adc13f8c0 tracing: Add TRACE_EVENT_FN example
If a function should be called before a tracepoint is enabled
and/or after it is disabled, the TRACE_EVENT_FN() serves this
purpose. But it is not well documented. Having it as a sample would
help developers to know how to use it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-09 18:05:39 -05:00
Steven Rostedt (Red Hat)
c4c7eb2938 tracing: Add TRACE_EVENT_CONDITION sample
The sample code lacks an example of TRACE_EVENT_CONDITION, and it
has been expressed to me that this feature for TRACE_EVENT is not
well known and not used when it could be.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-09 16:05:55 -05:00
Steven Rostedt (Red Hat)
4e20e3a60b tracing: Update the TRACE_EVENT fields available in the sample code
The sample code in samples/trace_events/ is extremely out of date and does
not show all the new fields that have been added since the sample code
was written. As most people are unaware of these new fields, adding sample
code and explanations of those fields should help out.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-09 15:27:04 -05:00
Josh Poimboeuf
12cf89b550 livepatch: rename config to CONFIG_LIVEPATCH
Rename CONFIG_LIVE_PATCHING to CONFIG_LIVEPATCH to make the naming of
the config and the code more consistent.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-02-04 11:25:51 +01:00
Alexei Starovoitov
ba1a68bf13 samples: bpf: relax test_maps check
hash map is unordered, so get_next_key() iterator shouldn't
rely on particular order of elements. So relax this test.

Fixes: ffb65f27a1 ("bpf: add a testsuite for eBPF maps")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:20:40 -08:00
Josh Poimboeuf
700a3048aa livepatch: samples: fix usage example comments
Fix a few typos in the livepatch-sample.c usage example comments and add
some whitespace to make the comments a little more legible.

Reported-by: Udo Seidel <udoseidel@gmx.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-24 00:10:00 +01:00
Seth Jennings
13d1cf7e70 livepatch: samples: add sample live patching module
Add a sample live patching module.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-22 15:40:49 +01:00
Alexei Starovoitov
fbe3310840 samples: bpf: large eBPF program in C
sockex2_kern.c is purposefully large eBPF program in C.
llvm compiles ~200 lines of C code into ~300 eBPF instructions.

It's similar to __skb_flow_dissect() to demonstrate that complex packet parsing
can be done by eBPF.
Then it uses (struct flow_keys)->dst IP address (or hash of ipv6 dst) to keep
stats of number of packets per IP.
User space loads eBPF program, attaches it to loopback interface and prints
dest_ip->#packets stats every second.

Usage:
$sudo samples/bpf/sockex2
ip 127.0.0.1 count 19
ip 127.0.0.1 count 178115
ip 127.0.0.1 count 369437
ip 127.0.0.1 count 559841
ip 127.0.0.1 count 750539

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:34 -08:00
Alexei Starovoitov
a80857822b samples: bpf: trivial eBPF program in C
this example does the same task as previous socket example
in assembler, but this one does it in C.

eBPF program in kernel does:
    /* assume that packet is IPv4, load one byte of IP->proto */
    int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
    long *value;

    value = bpf_map_lookup_elem(&my_map, &index);
    if (value)
        __sync_fetch_and_add(value, 1);

Corresponding user space reads map[tcp], map[udp], map[icmp]
and prints protocol stats every second

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:33 -08:00
Alexei Starovoitov
249b812d80 samples: bpf: elf_bpf file loader
simple .o parser and loader using BPF syscall.
.o is a standard ELF generated by LLVM backend

It parses elf file compiled by llvm .c->.o
- parses 'maps' section and creates maps via BPF syscall
- parses 'license' section and passes it to syscall
- parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns
  by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD
- loads eBPF programs via BPF syscall

One ELF file can contain multiple BPF programs.

int load_bpf_file(char *path);
populates prog_fd[] and map_fd[] with FDs received from bpf syscall

bpf_helpers.h - helper functions available to eBPF programs written in C

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:33 -08:00
Alexei Starovoitov
03f4723ed7 samples: bpf: example of stateful socket filtering
this socket filter example does:
- creates arraymap in kernel with key 4 bytes and value 8 bytes

- loads eBPF program which assumes that packet is IPv4 and loads one byte of
  IP->proto from the packet and uses it as a key in a map

  r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)];
  *(u32*)(fp - 4) = r0;
  value = bpf_map_lookup_elem(map_fd, fp - 4);
  if (value)
       (*(u64*)value) += 1;

- attaches this program to raw socket

- every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP]
  to see how many packets of given protocol were seen on loopback interface

Usage:
$sudo samples/bpf/sock_example
TCP 0 UDP 0 ICMP 0 packets
TCP 187600 UDP 0 ICMP 4 packets
TCP 376504 UDP 0 ICMP 8 packets
TCP 563116 UDP 0 ICMP 12 packets
TCP 753144 UDP 0 ICMP 16 packets

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Alexei Starovoitov
7943c0f329 bpf: remove test map scaffolding and user proper types
proper types and function helpers are ready. Use them in verifier testsuite.
Remove temporary stubs

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:44:00 -05:00
Alexei Starovoitov
ffb65f27a1 bpf: add a testsuite for eBPF maps
. check error conditions and sanity of hash and array map APIs
. check large maps (that kernel gracefully switches to vmalloc from kmalloc)
. check multi-process parallel access and stress test

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:43:59 -05:00
Alexei Starovoitov
342ded4096 samples: bpf: add a verifier test and summary line
- add a test specifically targeting verifier state pruning.
It checks state propagation between registers, storing that
state into stack and state pruning algorithm recognizing
equivalent stack and register states.

- add summary line to spot failures easier

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 15:44:37 -04:00
Alexei Starovoitov
32bf08a625 bpf: fix bug in eBPF verifier
while comparing for verifier state equivalency the comparison
was missing a check for uninitialized register.
Make sure it does so and add a testcase.

Fixes: f1bca824da ("bpf: add search pruning optimization to verifier")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-21 21:43:46 -04:00
Linus Torvalds
35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
Alexei Starovoitov
fd10c2ef3e bpf: add tests to verifier testsuite
add 4 extra tests to cover jump verification better

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:30:33 -04:00
Alexei Starovoitov
3c731eba48 bpf: mini eBPF library, test stubs and verifier testsuite
1.
the library includes a trivial set of BPF syscall wrappers:
int bpf_create_map(int key_size, int value_size, int max_entries);
int bpf_update_elem(int fd, void *key, void *value);
int bpf_lookup_elem(int fd, void *key, void *value);
int bpf_delete_elem(int fd, void *key);
int bpf_get_next_key(int fd, void *key, void *next_key);
int bpf_prog_load(enum bpf_prog_type prog_type,
		  const struct sock_filter_int *insns, int insn_len,
		  const char *license);
bpf_prog_load() stores verifier log into global bpf_log_buf[] array

and BPF_*() macros to build instructions

2.
test stubs configure eBPF infra with 'unspec' map and program types.
These are fake types used by user space testsuite only.

3.
verifier tests valid and invalid programs and expects predefined
error log messages from kernel.
40 tests so far.

$ sudo ./test_verifier
 #0 add+sub+mul OK
 #1 unreachable OK
 #2 unreachable2 OK
 #3 out of range jump OK
 #4 out of range jump2 OK
 #5 test1 ld_imm64 OK
 ...

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:05:15 -04:00
Michael Ellerman
e8ac6ea8a4 kprobes: update jprobe_example.c for do_fork() change
In commit e80d666 "flagday: kill pt_regs argument of do_fork()", the
arguments to do_fork() changed.

The example code in jprobe_example.c was not updated to match, so the
arguments inside the jprobe handler do not match reality.

Fix it by updating the arguments to match do_fork(). While we're at it
use pr_info() for brevity, and print stack_start as well for interest.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-09-26 11:11:12 +02:00
Zhao Hongjiang
d8fae2f644 tracing: Change trace event sample to use strlcpy instead of strncpy
Strings should be copied with strlcpy instead of strncpy when they will
later be printed via %s. This guarantees that they terminate with a
NUL '\0' character and do not run pass the end of the allocated string.

This is only for sample code, but it should stil represent a good
role model.

Link: http://lkml.kernel.org/p/51C2E204.1080501@huawei.com

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:33 -04:00
Steven Rostedt
4d4c9cc839 tracing: Add __field_struct macro for TRACE_EVENT()
Currently the __field() macro in TRACE_EVENT is only good for primitive
values, such as integers and pointers, but it fails on complex data types
such as structures or unions. This is because the __field() macro
determines if the variable is signed or not with the test of:

  (((type)(-1)) < (type)1)

Unfortunately, that fails when type is a structure.

Since trace events should support structures as fields a new macro
is created for such a case called __field_struct() which acts exactly
the same as __field() does but it does not do the signed type check
and just uses a constant false for that answer.

Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-21 00:18:42 -04:00
Rusty Russell
de5109898a samples/kobject/: avoid world-writable sysfs files.
In line with practice for module parameters, we're adding a build-time
check that sysfs files aren't world-writable.

Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-05-14 10:53:57 +09:30
Markos Chandras
e9107f88c9 samples/seccomp/Makefile: do not build tests if cross-compiling for MIPS
The Makefile is designed to use the host toolchain so it may be unsafe
to build the tests if the kernel has been configured and built for
another architecture.  This fixes a build problem when the kernel has
been configured and built for the MIPS architecture but the host is not
MIPS (cross-compiled).  The MIPS syscalls are only defined if one of the
following is true:

 1) _MIPS_SIM == _MIPS_SIM_ABI64
 2) _MIPS_SIM == _MIPS_SIM_ABI32
 3) _MIPS_SIM == _MIPS_SIM_NABI32

Of course, none of these make sense on a non-MIPS toolchain and the
following build problem occurs when building on a non-MIPS host.

  linux/usr/include/linux/kexec.h:50: userspace cannot reference function or variable defined in the kernel
  samples/seccomp/bpf-direct.c: In function `emulator':
  samples/seccomp/bpf-direct.c:76:17: error: `__NR_write' undeclared (first use in this function)

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:06 -07:00
Bjorn Helgaas
e756bc5670 kobject: fix kset sample error path
Previously, example_init() leaked a kset if any of the object creations
failed.  This fixes the leak by calling kset_unregister() in the error
path.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-03 10:13:30 -08:00
Stefani Seibold
498d319bb5 kfifo API type safety
This patch enhances the type safety for the kfifo API.  It is now safe
to put const data into a non const FIFO and the API will now generate a
compiler warning when reading from the fifo where the destination
address is pointing to a const variable.

As a side effect the kfifo_put() does now expect the value of an element
instead a pointer to the element.  This was suggested Russell King.  It
make the handling of the kfifo_put easier since there is no need to
create a helper variable for getting the address of a pointer or to pass
integers of different sizes.

IMHO the API break is okay, since there are currently only six users of
kfifo_put().

The code is also cleaner by kicking out the "if (0)" expressions.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:23 +09:00
Linus Torvalds
4de9ad9bc0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull Tile arch updates from Chris Metcalf:
 "These changes bring in a bunch of new functionality that has been
  maintained internally at Tilera over the last year, plus other stray
  bits of work that I've taken into the tile tree from other folks.

  The changes include some PCI root complex work, interrupt-driven
  console support, support for performing fast-path unaligned data
  fixups by kernel-based JIT code generation, CONFIG_PREEMPT support,
  vDSO support for gettimeofday(), a serial driver for the tilegx
  on-chip UART, KGDB support, more optimized string routines, support
  for ftrace and kprobes, improved ASLR, and many bug fixes.

  We also remove support for the old TILE64 chip, which is no longer
  buildable"

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits)
  tile: refresh tile defconfig files
  tile: rework <asm/cmpxchg.h>
  tile PCI RC: make default consistent DMA mask 32-bit
  tile: add null check for kzalloc in tile/kernel/setup.c
  tile: make __write_once a synonym for __read_mostly
  tile: remove support for TILE64
  tile: use asm-generic/bitops/builtin-*.h
  tile: eliminate no-op "noatomichash" boot argument
  tile: use standard tile_bundle_bits type in traps.c
  tile: simplify code referencing hypervisor API addresses
  tile: change <asm/system.h> to <asm/switch_to.h> in comments
  tile: mark pcibios_init() as __init
  tile: check for correct compiler earlier in asm-offsets.c
  tile: use standard 'generic-y' model for <asm/hw_irq.h>
  tile: use asm-generic version of <asm/local64.h>
  tile PCI RC: add comment about "PCI hole" problem
  tile: remove DEBUG_EXTRA_FLAGS kernel config option
  tile: add virt_to_kpte() API and clean up and document behavior
  tile: support FRAME_POINTER
  tile: support reporting Tilera hypervisor statistics
  ...
2013-09-06 11:14:33 -07:00
Jiri Kosina
63faf15dba Merge branches 'for-3.12/devm', 'for-3.12/i2c-hid', 'for-3.12/i2c-hid-dt', 'for-3.12/logitech', 'for-3.12/multitouch-win8', 'for-3.12/trasnport-driver-cleanup', 'for-3.12/uhid', 'for-3.12/upstream' and 'for-3.12/wiimote' into for-linus 2013-09-06 11:58:37 +02:00
David Herrmann
f5e4e7fdd5 HID: uhid: improve uhid example client
This extends the uhid example client. It properly documents the built-in
report-descriptor an adds explicit report-numbers.

Furthermore, LED output reports are added to utilize the new UHID output
reports of the kernel. Support for 3 basic LEDs is added and a small
report-parser to print debug messages if output reports were received.

To test this, simply write the EV_LED+LED_CAPSL+1 event to the evdev
device-node of the uhid-device and the kernel will forward it to your uhid
client.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-09-04 11:35:14 +02:00
Tony Lu
3fa17c395b tile: support kprobes on tilegx
This change includes support for Kprobes, Jprobes and Return Probes.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tony Lu <zlu@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-08-30 11:55:53 -04:00
Paul Gortmaker
8cd3c556b5 HID: samples/hidraw: add .gitignore file
To fix:

 # Untracked files:
 #   (use "git add <file>..." to include in what will be committed)
 #
 #	samples/hidraw/hid-example

as seen in git status output after an allyesconfig build.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-08-20 12:48:58 +02:00
Jiri Kosina
f37130533f HID: hidraw: warn if userspace headers are outdated
Put a warning into sample hidraw code in samples/hidraw/hid-example.c
in case the userspace headers are missing the necessary defines and
need to be updated.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-27 17:29:18 +01:00
Linus Torvalds
8f55cea410 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "There are lots of improvements, the biggest changes are:

  Main kernel side changes:

   - Improve uprobes performance by adding 'pre-filtering' support, by
     Oleg Nesterov.

   - Make some POWER7 events available in sysfs, equivalent to what was
     done on x86, from Sukadev Bhattiprolu.

   - tracing updates by Steve Rostedt - mostly misc fixes and smaller
     improvements.

   - Use perf/event tracing to report PCI Express advanced errors, by
     Tony Luck.

   - Enable northbridge performance counters on AMD family 15h, by Jacob
     Shin.

   - This tracing commit:

        tracing: Remove the extra 4 bytes of padding in events

     changes the ABI.  All involved parties (PowerTop in particular)
     seem to agree that it's safe to do now with the introduction of
     libtraceevent, but the devil is in the details ...

  Main tooling side changes:

   - Add 'event group view', from Namyung Kim:

     To use it, 'perf record' should group events when recording.  And
     then perf report parses the saved group relation from file header
     and prints them together if --group option is provided.  You can
     use the 'perf evlist' command to see event group information:

        $ perf record -e '{ref-cycles,cycles}' noploop 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

        $ perf evlist --group
        {ref-cycles,cycles}

     With this example, default perf report will show you each event
     separately.

     You can use --group option to enable event group view:

        $ perf report --group
        ...
        # group: {ref-cycles,cycles}
        # ========
        # Samples: 7K of event 'anon group { ref-cycles, cycles }'
        # Event count (approx.): 6876107743
        #
        #         Overhead  Command      Shared Object                      Symbol
        # ................  .......  .................  ..........................
            99.84%  99.76%  noploop  noploop            [.] main
             0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
             0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
             0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
             0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
             0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
             0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
             0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
             0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

     As you can see the Overhead column now contains both of ref-cycles
     and cycles and header line shows group information also - 'anon
     group { ref-cycles, cycles }'.  The output is sorted by period of
     group leader first.

   - Initial GTK+ annotate browser, from Namhyung Kim.

   - Add option for runtime switching perf data file in perf report,
     just press 's' and a menu with the valid files found in the current
     directory will be presented, from Feng Tang.

   - Add support to display whole group data for raw columns, from Jiri
     Olsa.

   - Add per processor socket count aggregation in perf stat, from
     Stephane Eranian.

   - Add interval printing in 'perf stat', from Stephane Eranian.

   - 'perf test' improvements

   - Add support for wildcards in tracepoint system name, from Jiri
     Olsa.

   - Add anonymous huge page recognition, from Joshua Zhu.

   - perf build-id cache now can show DSOs present in a perf.data file
     that are not in the cache, to integrate with build-id servers being
     put in place by organizations such as Fedora.

   - perf top now shares more of the evsel config/creation routines with
     'record', paving the way for further integration like 'top'
     snapshots, etc.

   - perf top now supports DWARF callchains.

   - Fix mmap limitations on 32-bit, fix from David Miller.

   - 'perf bench numa mem' NUMA performance measurement suite

   - ... and lots of fixes, performance improvements, cleanups and other
     improvements I failed to list - see the shortlog and git log for
     details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
  perf/x86/amd: Enable northbridge performance counters on AMD family 15h
  perf/hwbp: Fix cleanup in case of kzalloc failure
  perf tools: Fix build with bison 2.3 and older.
  perf tools: Limit unwind support to x86 archs
  perf annotate: Make it to be able to skip unannotatable symbols
  perf gtk/annotate: Fail early if it can't annotate
  perf gtk/annotate: Show source lines with gray color
  perf gtk/annotate: Support multiple event annotation
  perf ui/gtk: Implement basic GTK2 annotation browser
  perf annotate: Fix warning message on a missing vmlinux
  perf buildid-cache: Add --update option
  uprobes/perf: Avoid uprobe_apply() whenever possible
  uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
  uprobes/perf: Teach trace_uprobe/perf code to pre-filter
  uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
  uprobes: Introduce uprobe_apply()
  perf: Introduce hw_perf_event->tp_target and ->tp_list
  uprobes/perf: Always increment trace_uprobe->nhit
  uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
  uprobes/tracing: Introduce is_trace_uprobe_enabled()
  ...
2013-02-19 17:49:41 -08:00
Arnd Bergmann
275aaa6833 samples/seccomp: be less stupid about cross compiling
The seccomp filters are currently built for the build host, not for the
machine that they are going to run on, but they are also built for with
the -m32 flag if the kernel is built for a 32 bit machine, both of which
seems rather odd.

It broke allyesconfig on my machine, which is x86-64, but building for
32 bit ARM, with this error message:

  In file included from /usr/include/stdio.h:28:0,
                   from samples/seccomp/bpf-fancy.c:15:
  /usr/include/features.h:324:26: fatal error: bits/predefs.h: No such file or directory

because there are no 32 bit libc headers installed on this machine.  We
should really be building all the samples for the target machine rather
than the build host, but since the infrastructure for that appears to be
missing right now, let's be a little bit smarter and not pass the '-m32'
flag to the HOSTCC when cross- compiling.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Morris <james.l.morris@oracle.com>
Acked-by: Will Drewry <wad@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-05 20:38:49 +11:00