Commit Graph

499 Commits

Author SHA1 Message Date
Linus Torvalds
e919a3f705 Minor tracing updates:
- Make buffer_percent read/write. The buffer_percent file is how users can
   state how long to block on the tracing buffer depending on how much
   is in the buffer. When it hits the "buffer_percent" it will wake the
   task waiting on the buffer. For some reason it was set to read-only.
   This was not noticed because testing was done as root without SELinux,
   but with SELinux it will prevent even root to write to it without having
   CAP_DAC_OVERRIDE.
 
 - The "touched_functions" was added this merge window, but one of the
   reasons for adding it was not implemented. That was to show what functions
   were not only touched, but had either a direct trampoline attached to
   it, or a kprobe or live kernel patching that can "hijack" the function
   to run a different function. The point is to know if there's functions
   in the kernel that may not be behaving as the kernel code shows. This can
   be used for debugging. TODO: Add this information to kernel oops too.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZFUcrxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgOoAP0U2R6+jvA2ehQFb0UTCH9wEu2uEELA
 g2CkdPNdn6wJjAD+O1+v5nVkqSpsArjHOhv5OGYrgh+VSXK3Z8EpQ9vUVgg=
 =nfoh
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull more tracing updates from Steven Rostedt:

 - Make buffer_percent read/write.

   The buffer_percent file is how users can state how long to block on
   the tracing buffer depending on how much is in the buffer. When it
   hits the "buffer_percent" it will wake the task waiting on the
   buffer. For some reason it was set to read-only.

   This was not noticed because testing was done as root without
   SELinux, but with SELinux it will prevent even root to write to it
   without having CAP_DAC_OVERRIDE.

 - The "touched_functions" was added this merge window, but one of the
   reasons for adding it was not implemented.

   That was to show what functions were not only touched, but had either
   a direct trampoline attached to it, or a kprobe or live kernel
   patching that can "hijack" the function to run a different function.
   The point is to know if there's functions in the kernel that may not
   be behaving as the kernel code shows. This can be used for debugging.

   TODO: Add this information to kernel oops too.

* tag 'trace-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached
  tracing: Fix permissions for the buffer_percent file
2023-05-05 13:11:02 -07:00
Steven Rostedt (Google)
6ce2c04fcb ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached
If a function had ever had IPMODIFY or DIRECT attached to it, where this
is how live kernel patching and BPF overrides work, mark them and display
an "M" in the enabled_functions and touched_functions files. This can be
used for debugging. If a function had been modified and later there's a bug
in the code related to that function, this can be used to know if the cause
is possibly from a live kernel patch or a BPF program that changed the
behavior of the code.

Also update the documentation on the enabled_functions and
touched_functions output, as it was missing direct callers and CALL_OPS.
And include this new modify attribute.

Link: https://lore.kernel.org/linux-trace-kernel/20230502213233.004e3ae4@gandalf.local.home

Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-05 11:09:25 -04:00
Linus Torvalds
d579c468d7 tracing updates for 6.4:
- User events are finally ready!
   After lots of collaboration between various parties, we finally locked
   down on a stable interface for user events that can also work with user
   space only tracing. This is implemented by telling the kernel (or user
   space library, but that part is user space only and not part of this
   patch set), where the variable is that the application uses to know if
   something is listening to the trace. There's also an interface to tell
   the kernel about these events, which will show up in the
   /sys/kernel/tracing/events/user_events/ directory, where it can be
    enabled. When it's enabled, the kernel will update the variable, to tell
   the application to start writing to the kernel.
   See https://lwn.net/Articles/927595/
 
 - Cleaned up the direct trampolines code to simplify arm64 addition of
   direct trampolines. Direct trampolines use the ftrace interface but
   instead of jumping to the ftrace trampoline, applications (mostly BPF)
   can register their own trampoline for performance reasons.
 
 - Some updates to the fprobe infrastructure. fprobes are more efficient than
   kprobes, as it does not need to save all the registers that kprobes on
   ftrace do. More work needs to be done before the fprobes will be exposed
   as dynamic events.
 
 - More updates to references to the obsolete path of
   /sys/kernel/debug/tracing for the new /sys/kernel/tracing path.
 
 - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line
   by line instead of all at once. There's users in production kernels that
   have a large data dump that originally used printk() directly, but the
   data dump was larger than what printk() allowed as a single print.
   Using seq_buf() to do the printing fixes that.
 
 - Add /sys/kernel/tracing/touched_functions that shows all functions that
   was every traced by ftrace or a direct trampoline. This is used for
   debugging issues where a traced function could have caused a crash by
   a bpf program or live patching.
 
 - Add a "fields" option that is similar to "raw" but outputs the fields of
   the events. It's easier to read by humans.
 
 - Some minor fixes and clean ups.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZEr36xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quZHAQCzuqnn2S8DsPd3Sy1vKIYaj0uajW5D
 Kz1oUJH4F0H7kgEA8XwXkdtfKpOXWc/ZH4LWfL7Orx2wJZJQMV9dVqEPDAE=
 =w0Z1
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - User events are finally ready!

   After lots of collaboration between various parties, we finally
   locked down on a stable interface for user events that can also work
   with user space only tracing.

   This is implemented by telling the kernel (or user space library, but
   that part is user space only and not part of this patch set), where
   the variable is that the application uses to know if something is
   listening to the trace.

   There's also an interface to tell the kernel about these events,
   which will show up in the /sys/kernel/tracing/events/user_events/
   directory, where it can be enabled.

   When it's enabled, the kernel will update the variable, to tell the
   application to start writing to the kernel.

   See https://lwn.net/Articles/927595/

 - Cleaned up the direct trampolines code to simplify arm64 addition of
   direct trampolines.

   Direct trampolines use the ftrace interface but instead of jumping to
   the ftrace trampoline, applications (mostly BPF) can register their
   own trampoline for performance reasons.

 - Some updates to the fprobe infrastructure. fprobes are more efficient
   than kprobes, as it does not need to save all the registers that
   kprobes on ftrace do. More work needs to be done before the fprobes
   will be exposed as dynamic events.

 - More updates to references to the obsolete path of
   /sys/kernel/debug/tracing for the new /sys/kernel/tracing path.

 - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer
   line by line instead of all at once.

   There are users in production kernels that have a large data dump
   that originally used printk() directly, but the data dump was larger
   than what printk() allowed as a single print.

   Using seq_buf() to do the printing fixes that.

 - Add /sys/kernel/tracing/touched_functions that shows all functions
   that was every traced by ftrace or a direct trampoline. This is used
   for debugging issues where a traced function could have caused a
   crash by a bpf program or live patching.

 - Add a "fields" option that is similar to "raw" but outputs the fields
   of the events. It's easier to read by humans.

 - Some minor fixes and clean ups.

* tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits)
  ring-buffer: Sync IRQ works before buffer destruction
  tracing: Add missing spaces in trace_print_hex_seq()
  ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus
  recordmcount: Fix memory leaks in the uwrite function
  tracing/user_events: Limit max fault-in attempts
  tracing/user_events: Prevent same address and bit per process
  tracing/user_events: Ensure bit is cleared on unregister
  tracing/user_events: Ensure write index cannot be negative
  seq_buf: Add seq_buf_do_printk() helper
  tracing: Fix print_fields() for __dyn_loc/__rel_loc
  tracing/user_events: Set event filter_type from type
  ring-buffer: Clearly check null ptr returned by rb_set_head_page()
  tracing: Unbreak user events
  tracing/user_events: Use print_format_fields() for trace output
  tracing/user_events: Align structs with tabs for readability
  tracing/user_events: Limit global user_event count
  tracing/user_events: Charge event allocs to cgroups
  tracing/user_events: Update documentation for ABI
  tracing/user_events: Use write ABI in example
  tracing/user_events: Add ABI self-test
  ...
2023-04-28 15:57:53 -07:00
Lin Yu Chen
c9b951c313 docs: trace: Fix typo in ftrace.rst
There is a typo in the sentence "A kernel developer must be
conscience ...". The word conscience should be conscious.
This patch fixes it.

Signed-off-by: Lin Yu Chen <starpt.official@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230412183739.89894-1-starpt.official@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-04-20 17:53:38 -06:00
Beau Belgrave
27dc2ae7c8 tracing/user_events: Update documentation for ABI
The ABI for user_events has changed from mmap() based to remote writes.
Update the documentation to reflect these changes, add new section for
unregistering events since lifetime is now tied to tasks instead of
files.

Link: https://lkml.kernel.org/r/20230328235219.203-10-beaub@linux.microsoft.com

Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29 06:52:09 -04:00
Steven Rostedt (Google)
80a76994b2 tracing: Add "fields" option to show raw trace event fields
The hex, raw and bin formats come from the old PREEMPT_RT patch set
latency tracer. That actually gave real alternatives to reading the ascii
buffer. But they have started to bit rot and they do not give a good
representation of the tracing data.

Add "fields" option that will read the trace event fields and parse the
data from how the fields are defined:

With "fields" = 0 (default)

 echo 1 > events/sched/sched_switch/enable
 cat trace
         <idle>-0       [003] d..2.   540.078653: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1 next_pid=83 next_prio=120
     kworker/3:1-83      [003] d..2.   540.078860: sched_switch: prev_comm=kworker/3:1 prev_pid=83 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0       [003] d..2.   540.206423: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=807 next_prio=120
            sshd-807     [003] d..2.   540.206531: sched_switch: prev_comm=sshd prev_pid=807 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
          <idle>-0       [001] d..2.   540.206597: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120
   kworker/u16:4-58      [001] d..2.   540.206617: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120
            bash-830     [001] d..2.   540.206678: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120
   kworker/u16:4-58      [001] d..2.   540.206696: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120
            bash-830     [001] d..2.   540.206713: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120

 echo 1 > options/fields
           <...>-998     [002] d..2.   538.643732: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/2 prev_state=0x20 (32) prev_prio=0x78 (120) prev_pid=0x3e6 (998) prev_comm=trace-cmd
          <idle>-0       [001] d..2.   538.643806: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/1
            bash-830     [001] d..2.   538.644106: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash
   kworker/u16:4-58      [001] d..2.   538.644130: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4
            bash-830     [001] d..2.   538.644180: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash
   kworker/u16:4-58      [001] d..2.   538.644185: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4
            bash-830     [001] d..2.   538.644204: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/1 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash
          <idle>-0       [003] d..2.   538.644211: sched_switch: next_prio=0x78 (120) next_pid=0x327 (807) next_comm=sshd prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/3
            sshd-807     [003] d..2.   538.644340: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/3 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x327 (807) prev_comm=sshd

It traces the data safely without using the trace print formatting.

Link: https://lore.kernel.org/linux-trace-kernel/20230328145156.497651be@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29 06:52:08 -04:00
Masami Hiramatsu (Google)
8be098a9eb docs: tracing: Update fprobe documentation
Update fprobe.rst for
 - the private entry_data argument
 - the return value of the entry handler
 - the nr_rethook_node field.

Link: https://lkml.kernel.org/r/167526701579.433354.3057889264263546659.stgit@mhiramat.roam.corp.google.com

Cc: Florent Revest <revest@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28 18:52:23 -04:00
Linus Torvalds
693fed981e Char/Misc and other driver subsystem changes for 6.3-rc1
Here is the large set of driver changes for char/misc drivers and other
 smaller driver subsystems that flow through this git tree.
 
 Included in here are:
   - New IIO drivers and features and improvments in that subsystem
   - New hwtracing drivers and additions to that subsystem
   - lots of interconnect changes and new drivers as that subsystem seems
     under very active development recently.  This required also merging
     in the icc subsystem changes through this tree.
   - FPGA driver updates
   - counter subsystem and driver updates
   - MHI driver updates
   - nvmem driver updates
   - documentation updates
   - Other smaller driver updates and fixes, full details in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/inQw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yksvwCeOvU//SPwrbIpaeHAmHUv0PSVOrwAoKmt4ICh
 hQUudlztfkvUJxKIH0gh
 =Sjk4
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver subsystem updates from Greg KH:
 "Here is the large set of driver changes for char/misc drivers and
  other smaller driver subsystems that flow through this git tree.

  Included in here are:

   - New IIO drivers and features and improvments in that subsystem

   - New hwtracing drivers and additions to that subsystem

   - lots of interconnect changes and new drivers as that subsystem
     seems under very active development recently. This required also
     merging in the icc subsystem changes through this tree.

   - FPGA driver updates

   - counter subsystem and driver updates

   - MHI driver updates

   - nvmem driver updates

   - documentation updates

   - Other smaller driver updates and fixes, full details in the
     shortlog

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (223 commits)
  scripts/tags.sh: fix incompatibility with PCRE2
  firmware: coreboot: Remove GOOGLE_COREBOOT_TABLE_ACPI/OF Kconfig entries
  mei: lower the log level for non-fatal failed messages
  mei: bus: disallow driver match while dismantling device
  misc: vmw_balloon: fix memory leak with using debugfs_lookup()
  nvmem: stm32: fix OPTEE dependency
  dt-bindings: nvmem: qfprom: add IPQ8074 compatible
  nvmem: qcom-spmi-sdam: register at device init time
  nvmem: rave-sp-eeprm: fix kernel-doc bad line warning
  nvmem: stm32: detect bsec pta presence for STM32MP15x
  nvmem: stm32: add OP-TEE support for STM32MP13x
  nvmem: core: use nvmem_add_one_cell() in nvmem_add_cells_from_of()
  nvmem: core: add nvmem_add_one_cell()
  nvmem: core: drop the removal of the cells in nvmem_add_cells()
  nvmem: core: move struct nvmem_cell_info to nvmem-provider.h
  nvmem: core: add an index parameter to the cell
  of: property: add #nvmem-cell-cells property
  of: property: make #.*-cells optional for simple props
  of: base: add of_parse_phandle_with_optional_args()
  net: add helper eth_addr_add()
  ...
2023-02-24 12:47:33 -08:00
Linus Torvalds
2b79eb73e2 probes updates for 6.3:
- Skip negative return code check for snprintf in eprobe.
 
 - Add recursive call test cases for kprobe unit test
 
 - Add 'char' type to probe events to show it as the character instead of value.
 
 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols.
 
 - Fix kselftest to check filter on eprobe event correctly.
 
 - Add filter on eprobe to the README file in tracefs.
 
 - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly.
 
 - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly.
 
 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmP0JdYACgkQ2/sHvwUr
 Pxt6sQf/TD9Kwqx3XG1tnLPev6yt2nuggUippHwWUFHlJtMyUaLV8aKFqByyEe+j
 tCQvrFIIJq242xg0Jac/MAf2exlWG9jsmVZPmvC1YzepOAbjXu2eBkIS7LsbeHjF
 JJypNnEceffWCpNoD6nlvR0xWXenqRbZJwdsGqo3u+fXnzTurEMY2GU2xOyv39tv
 S1uNLPANJxdMb/2iUsUE3hMbe82dqr8zPcApqWFtTBB6QPHI3B2SjuQHpQxwbTPl
 bzAl0yQkLSQXprVzT7xJ4xLnzbl1ljgJBci5aX8BFF+VD9oYkypdfYVczBH5VsP9
 E3eT9T9lRf4Q99EqxNy5uw7NqQXGQg==
 =CMPb
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull kprobes updates from Masami Hiramatsu:

 - Skip negative return code check for snprintf in eprobe

 - Add recursive call test cases for kprobe unit test

 - Add 'char' type to probe events to show it as the character instead
   of value

 - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols

 - Fix kselftest to check filter on eprobe event correctly

 - Add filter on eprobe to the README file in tracefs

 - Fix optprobes to check whether there is 'under unoptimizing' optprobe
   when optimizing another kprobe correctly

 - Fix optprobe to check whether there is 'under unoptimizing' optprobe
   when fetching the original instruction correctly

 - Fix optprobe to free 'forcibly unoptimized' optprobe correctly

* tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobe: no need to check for negative ret value for snprintf
  test_kprobes: Add recursed kprobe test case
  tracing/probe: add a char type to show the character value of traced arguments
  selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols
  selftests/ftrace: Fix eprobe syntax test case to check filter support
  tracing/eprobe: Fix to add filter on eprobe description in README file
  x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
  x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
  kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-23 13:03:08 -08:00
Linus Torvalds
b72b5fecc1 tracing updates for 6.3:
- Add function names as a way to filter function addresses
 
 - Add sample module to test ftrace ops and dynamic trampolines
 
 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a task is
   scheduled out and recorded when it gets scheduled back in.
 
 - Add trace event helper __get_buf() to use as a temporary buffer when printing
   out trace event output.
 
 - Add kernel command line to create trace instances on boot up.
 
 - Add enabling of events to instances created at boot up.
 
 - Add trace_array_puts() to write into instances.
 
 - Allow boot instances to take a snapshot at the end of boot up.
 
 - Allow live patch modules to include trace events
 
 - Minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY/PaaBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qh5iAPoD0LKZzD33rhO5Ec4hoexE0DkqycP3
 dvmOMbCBL8GkxwEA+d2gLz/EquSFm166hc4D79Sn3geCqvkwmy8vQWVjIQc=
 =M82D
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add function names as a way to filter function addresses

 - Add sample module to test ftrace ops and dynamic trampolines

 - Allow stack traces to be passed from beginning event to end event for
   synthetic events. This will allow seeing the stack trace of when a
   task is scheduled out and recorded when it gets scheduled back in.

 - Add trace event helper __get_buf() to use as a temporary buffer when
   printing out trace event output.

 - Add kernel command line to create trace instances on boot up.

 - Add enabling of events to instances created at boot up.

 - Add trace_array_puts() to write into instances.

 - Allow boot instances to take a snapshot at the end of boot up.

 - Allow live patch modules to include trace events

 - Minor fixes and clean ups

* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  tracing: Remove unnecessary NULL assignment
  tracepoint: Allow livepatch module add trace event
  tracing: Always use canonical ftrace path
  tracing/histogram: Fix stacktrace histogram Documententation
  tracing/histogram: Fix stacktrace key
  tracing/histogram: Fix a few problems with stacktrace variable printing
  tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
  tracing/histogram: Don't use strlen to find length of stacktrace variables
  tracing: Allow boot instances to have snapshot buffers
  tracing: Add trace_array_puts() to write into instance
  tracing: Add enabling of events to boot instances
  tracing: Add creation of instances at boot command line
  tracing: Fix trace_event_raw_event_synth() if else statement
  samples: ftrace: Make some global variables static
  ftrace: sample: avoid open-coded 64-bit division
  samples: ftrace: Include the nospec-branch.h only for x86
  tracing: Acquire buffer from temparary trace sequence
  tracing/histogram: Wrap remaining shell snippets in code blocks
  tracing/osnoise: No need for schedule_hrtimeout range
  bpf/tracing: Use stage6 of tracing to not duplicate macros
  ...
2023-02-23 10:20:49 -08:00
Donglin Peng
8478cca1e3 tracing/probe: add a char type to show the character value of traced arguments
There are scenes that we want to show the character value of traced
arguments other than a decimal or hexadecimal or string value for debug
convinience. I add a new type named 'char' to do it and a new test case
file named 'kprobe_args_char.tc' to do selftest for char type.

For example:

The to be traced function is 'void demo_func(char type, char *name);', we
can add a kprobe event as follows to show argument values as we want:

echo  'p:myprobe demo_func $arg1:char +0($arg2):char[5]' > kprobe_events

we will get the following trace log:

... myprobe: (demo_func+0x0/0x29) arg1='A' arg2={'b','p','f','1',''}

Link: https://lore.kernel.org/all/20221219110613.367098-1-dolinux.peng@gmail.com/

Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:52:42 +09:00
Tom Zanussi
d8f0ae3ebe tracing/histogram: Fix stacktrace histogram Documententation
Fix a small problem with the histogram specification in the
Documentation, and change the example to show output using a
stacktrace field rather than the global stacktrace.

Link: https://lkml.kernel.org/r/f75f807dd4998249e513515f703a2ff7407605f4.1676063532.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-16 13:49:39 -05:00
Bagas Sanjaya
a2ff84a5d1 tracing/histogram: Wrap remaining shell snippets in code blocks
Most shell command snippets (echo/cat) and their output are already in
literal code blocks. However a few still isn't wrapped, in which the
htmldocs output is ugly.

Wrap the remaining unwrapped snippets, while also fix recent kernel test
robot warnings.

Link: https://lore.kernel.org/linux-trace-kernel/20230129031402.47420-1-bagasdotme@gmail.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/linux-doc/202301290253.LU5yIxcJ-lkp@intel.com/
Fixes: 88238513bb ("tracing/histogram: Document variable stacktrace")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-07 12:41:11 -05:00
Ross Zwisler
2abfcd293b docs: ftrace: always use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many parts of Documentation still reference this older debugfs path, so
let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230125213251.2013791-1-zwisler@google.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-31 14:02:30 -07:00
Steven Rostedt (Google)
88238513bb tracing/histogram: Document variable stacktrace
Add a little documentation (and a useful example) of how a stacktrace can
be used within a histogram variable and synthetic event.

Link: https://lkml.kernel.org/r/20230117152236.320181354@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:24 -05:00
Steven Rostedt (Google)
e6745a4da9 tracing: Add a way to filter function addresses to function names
There's been several times where an event records a function address in
its field and I needed to filter on that address for a specific function
name. It required looking up the function in kallsyms, finding its size,
and doing a compare of "field >= function_start && field < function_end".

But this would change from boot to boot and is unreliable in scripts.
Also, it is useful to have this at boot up, where the addresses will not
be known. For example, on the boot command line:

  trace_trigger="initcall_finish.traceoff if func.function == acpi_init"

To implement this, add a ".function" prefix, that will check that the
field is of size long, and the only operations allowed (so far) are "=="
and "!=".

Link: https://lkml.kernel.org/r/20221219183213.916833763@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-25 10:31:11 -05:00
Yoann Congal
5d18c23c76 Documentation: kprobetrace: Split paragraphs
Add an empty line to force the output to split paragraphs like it is
splitin the REST source.

Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230121225304.1711635-4-yoann.congal@smile.fr
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-24 15:00:34 -07:00
Yoann Congal
015b5162be Documentation: kprobetrace: Fix code block markup
This display the following code extract as a code block instead of a
normal paragraph.

Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230121225304.1711635-3-yoann.congal@smile.fr
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-24 15:00:28 -07:00
Yoann Congal
776b32b756 Documentation: kprobetrace: Fix some typos
* Uncapitalise tracepoint
* Hyphen in *-based
* Plurals
* fetch-args -> fetchargs
* 2bytes hex -> 2-byte hex
* .. -> .
* arch -> architecture

Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230121225304.1711635-2-yoann.congal@smile.fr
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-24 15:00:17 -07:00
Bagas Sanjaya
705159622c Documentation: coresight: tpdm: Add dummy comment after sysfs list
kernel test robot reported htmldocs warning:

Documentation/trace/coresight/coresight-tpdm.rst:43: WARNING: Document may not end with a transition.

Since there is no more documentation left for TPDM, fix the warning by adding
dummy comment, thus creating the required text transition.

Link: https://lore.kernel.org/linux-doc/202301210955.zYxDrLgv-lkp@intel.com/
Fixes: 758d638667 ("Documentation: trace: Add documentation for TPDM and TPDA")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230121040015.28139-3-bagasdotme@gmail.com
2023-01-23 11:59:57 +00:00
Bagas Sanjaya
ba0f3ae66c Documentation: coresight: Extend title heading syntax in TPDM and TPDA documentation
kernel test robot reported htmldocs warnings:

Documentation/trace/coresight/coresight-tpda.rst:3: WARNING: Title overline too short.
Documentation/trace/coresight/coresight-tpdm.rst:3: WARNING: Title overline too short.

Extend title heading syntax (overline and underline) to match title text to
fix these warnings.

While at it, trim unneeded period in the title text.

Link: https://lore.kernel.org/linux-doc/202301210955.zYxDrLgv-lkp@intel.com/
Fixes: 758d638667 ("Documentation: trace: Add documentation for TPDM and TPDA")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230121040015.28139-2-bagasdotme@gmail.com
2023-01-23 11:59:57 +00:00
Mao Jinlong
758d638667 Documentation: trace: Add documentation for TPDM and TPDA
Add documentation for the TPDM and TPDA under trace/coresight.

Signed-off-by: Mao Jinlong <quic_jinlmao@quicinc.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230117145708.16739-9-quic_jinlmao@quicinc.com
2023-01-20 11:40:00 +00:00
Qi Liu
2d4103ae31 Documentation: Add document for UltraSoc SMB driver
Bring in documentation for UltraSoc SMB driver.
It simply describes the device, sysfs interface and the
firmware bindings.

Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230114101302.62320-3-hejunhao3@huawei.com
2023-01-16 10:16:15 +00:00
Donglin Peng
71240f94f1 docs: ftrace: fix a issue with duplicated subtitle number
The subtitle "5.3 Clearing filters" and "5.3 Subsystem filters" has
the same index number, let's fix it.

Fixes: 95b696088c ("tracing/filters: add filter Documentation")
Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221209025119.1371570-1-dolinux.peng@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-02 16:29:48 -07:00
Linus Torvalds
af9b3fa15d Trace probes updates for 6.2:
- New "symstr" type for dynamic events that writes the name of the
   function+offset into the ring buffer and not just the address
 
 - Prevent kernel symbol processing on addresses in user space probes
   (uprobes).
 
 - And minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5yAHxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoWoAP9ZLmqgIqlH3Zcms31SR250kLXxsxT3
 JHe82hiuI1I3fAD/Z93QLHw9wngLqIMx/wXsdFjTNOGGWdxfclSWI2qI6Q0=
 =KaJg
 -----END PGP SIGNATURE-----

Merge tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull trace probes updates from Steven Rostedt:

 - New "symstr" type for dynamic events that writes the name of the
   function+offset into the ring buffer and not just the address

 - Prevent kernel symbol processing on addresses in user space probes
   (uprobes).

 - And minor fixes and clean ups

* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Reject symbol/symstr type for uprobe
  tracing/probes: Add symstr type for dynamic events
  kprobes: kretprobe events missing on 2-core KVM guest
  kprobes: Fix check for probe enabled in kill_kprobe()
  test_kprobes: Fix implicit declaration error of test_kprobes
  tracing: Fix race where eprobes can be called before the event
2022-12-21 18:57:24 -08:00
Linus Torvalds
fe36bb8736 Tracing updates for 6.2:
- Add options to the osnoise tracer
   o panic_on_stop option that panics the kernel if osnoise is greater than some
     user defined threshold.
   o preempt option, to test noise while preemption is disabled
   o irq option, to test noise when interrupts are disabled
 
 - Add .percent and .graph suffix to histograms to give different outputs
 
 - Add nohitcount to disable showing hitcount in histogram output
 
 - Add new __cpumask() to trace event fields to annotate that a unsigned long
   array is a cpumask to user space and should be treated as one.
 
 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing and take
   snapshots.
 
 - Fix x86/kmmio mmio tracer to work with the updates to lockdep
 
 - Unify the panic and die notifiers
 
 - Add back ftrace_expect reference that is used to extract more information in
   the ftrace_bug() code.
 
 - Have trigger filter parsing errors show up in the tracing error log.
 
 - Updated MAINTAINERS file to add kernel tracing  mailing list and patchwork
   info
 
 - Use IDA to keep track of event type numbers.
 
 - And minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5vIcxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlO7AQCmtZbriadAR6N7Llj092YXmYfzrxyi
 1WS35vhpZsBJ8gEA8j68l+LrgNt51N2gXlTXEHgXzdBgL/TKAPSX4D99GQY=
 =z1pe
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add options to the osnoise tracer:
      - 'panic_on_stop' option that panics the kernel if osnoise is
        greater than some user defined threshold.
      - 'preempt' option, to test noise while preemption is disabled
      - 'irq' option, to test noise when interrupts are disabled

 - Add .percent and .graph suffix to histograms to give different
   outputs

 - Add nohitcount to disable showing hitcount in histogram output

 - Add new __cpumask() to trace event fields to annotate that a unsigned
   long array is a cpumask to user space and should be treated as one.

 - Add trace_trigger kernel command line parameter to enable trace event
   triggers at boot up. Useful to trace stack traces, disable tracing
   and take snapshots.

 - Fix x86/kmmio mmio tracer to work with the updates to lockdep

 - Unify the panic and die notifiers

 - Add back ftrace_expect reference that is used to extract more
   information in the ftrace_bug() code.

 - Have trigger filter parsing errors show up in the tracing error log.

 - Updated MAINTAINERS file to add kernel tracing mailing list and
   patchwork info

 - Use IDA to keep track of event type numbers.

 - And minor fixes and clean ups

* tag 'trace-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  tracing: Fix cpumask() example typo
  tracing: Improve panic/die notifiers
  ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernels
  tracing: Do not synchronize freeing of trigger filter on boot up
  tracing: Remove pointer (asterisk) and brackets from cpumask_t field
  tracing: Have trigger filter parsing errors show up in error_log
  x86/mm/kmmio: Remove redundant preempt_disable()
  tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
  Documentation/osnoise: Add osnoise/options documentation
  tracing/osnoise: Add preempt and/or irq disabled options
  tracing/osnoise: Add PANIC_ON_STOP option
  Documentation/osnoise: Escape underscore of NO_ prefix
  tracing: Fix some checker warnings
  tracing/osnoise: Make osnoise_options static
  tracing: remove unnecessary trace_trigger ifdef
  ring-buffer: Handle resize in early boot up
  tracing/hist: Fix issue of losting command info in error_log
  tracing: Fix issue of missing one synthetic field
  tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
  tracing/hist: Fix wrong return value in parse_action_params()
  ...
2022-12-15 18:01:16 -08:00
Masami Hiramatsu (Google)
b26a124cbf tracing/probes: Add symstr type for dynamic events
Add 'symstr' type for storing the kernel symbol as a string data
instead of the symbol address. This allows us to filter the
events by wildcard symbol name.

e.g.
  # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' >> dynamic_events
  # cat events/eprobes/wqfunc/format
  name: wqfunc
  ID: 2110
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:__data_loc char[] symname;	offset:8;	size:4;	signed:1;

  print fmt: " symname=\"%s\"", __get_str(symname)

Note that there is already 'symbol' type which just change the
print format (so it still stores the symbol address in the tracing
ring buffer.) On the other hand, 'symstr' type stores the actual
"symbol+offset/size" data as a string.

Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-12-15 08:48:57 +09:00
wuqiang
3b7ddab8a1 kprobes: kretprobe events missing on 2-core KVM guest
Default value of maxactive is set as num_possible_cpus() for nonpreemptable
systems. For a 2-core system, only 2 kretprobe instances would be allocated
in default, then these 2 instances for execve kretprobe are very likely to
be used up with a pipelined command.

Here's the testcase: a shell script was added to crontab, and the content
of the script is:

  #!/bin/sh
  do_something_magic `tr -dc a-z < /dev/urandom | head -c 10`

cron will trigger a series of program executions (4 times every hour). Then
events loss would be noticed normally after 3-4 hours of testings.

The issue is caused by a burst of series of execve requests. The best number
of kretprobe instances could be different case by case, and should be user's
duty to determine, but num_possible_cpus() as the default value is inadequate
especially for systems with small number of cpus.

This patch enables the logic for preemption as default, thus increases the
minimum of maxactive to 10 for nonpreemptable systems.

Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang <wuqiang.matt@bytedance.com>
Reviewed-by: Solar Designer <solar@openwall.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-12-15 08:48:40 +09:00
Linus Torvalds
cf619f8919 fs.ovl.setgid.v6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bt7AAKCRCRxhvAZXjc
 ovAOAP9qcrUqs2MoyBDe6qUXThYY9w2rgX/ZI4ZZmbtsXEDGtQEA/LPddq8lD8o9
 m17zpvMGbXXRwz4/zVGuyWsHgg0HsQ0=
 =ioRq
 -----END PGP SIGNATURE-----

Merge tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull setgid inheritance updates from Christian Brauner:
 "This contains the work to make setgid inheritance consistent between
  modifying a file and when changing ownership or mode as this has been
  a repeated source of very subtle bugs. The gist is that we perform the
  same permission checks in the write path as we do in the ownership and
  mode changing paths after this series where we're currently doing
  different things.

  We've already made setgid inheritance a lot more consistent and
  reliable in the last releases by moving setgid stripping from the
  individual filesystems up into the vfs. This aims to make the logic
  even more consistent and easier to understand and also to fix
  long-standing overlayfs setgid inheritance bugs. Miklos was nice
  enough to just let me carry the trivial overlayfs patches from Amir
  too.

  Below is a more detailed explanation how the current difference in
  setgid handling lead to very subtle bugs exemplified via overlayfs
  which is a victim of the current rules. I hope this explains why I
  think taking the regression risk here is worth it.

  A long while ago I found a few setgid inheritance bugs in overlayfs in
  the write path in certain conditions. Amir recently picked this back
  up in [1] and I jumped on board to fix this more generally.

  On the surface all that overlayfs would need to fix setgid inheritance
  would be to call file_remove_privs() or file_modified() but actually
  that isn't enough because the setgid inheritance api is wildly
  inconsistent in that area.

  Before this pr setgid stripping in file_remove_privs()'s old
  should_remove_suid() helper was inconsistent with other parts of the
  vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
  S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
  and the caller isn't privileged over the inode although we require
  this already in setattr_prepare() and setattr_copy() and so all
  filesystem implement this requirement implicitly because they have to
  use setattr_{prepare,copy}() anyway.

  But the inconsistency shows up in setgid stripping bugs for overlayfs
  in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
  generic/687). For example, we test whether suid and setgid stripping
  works correctly when performing various write-like operations as an
  unprivileged user (fallocate, reflink, write, etc.):

      echo "Test 1 - qa_user, non-exec file $verb"
      setup_testfile
      chmod a+rws $junk_file
      commit_and_check "$qa_user" "$verb" 64k 64k

  The test basically creates a file with 6666 permissions. While the
  file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
  set.

  On a regular filesystem like xfs what will happen is:

      sys_fallocate()
      -> vfs_fallocate()
         -> xfs_file_fallocate()
            -> file_modified()
               -> __file_remove_privs()
                  -> dentry_needs_remove_privs()
                     -> should_remove_suid()
                  -> __remove_privs()
                     newattrs.ia_valid = ATTR_FORCE | kill;
                     -> notify_change()
                        -> setattr_copy()

  In should_remove_suid() we can see that ATTR_KILL_SUID is raised
  unconditionally because the file in the test has S_ISUID set.

  But we also see that ATTR_KILL_SGID won't be set because while the
  file is S_ISGID it is not S_IXGRP (see above) which is a condition for
  ATTR_KILL_SGID being raised.

  So by the time we call notify_change() we have attr->ia_valid set to
  ATTR_KILL_SUID | ATTR_FORCE.

  Now notify_change() sees that ATTR_KILL_SUID is set and does:

      ia_valid      = attr->ia_valid |= ATTR_MODE
      attr->ia_mode = (inode->i_mode & ~S_ISUID);

  which means that when we call setattr_copy() later we will definitely
  update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

  Now we call into the filesystem's ->setattr() inode operation which
  will end up calling setattr_copy(). Since ATTR_MODE is set we will
  hit:

      if (ia_valid & ATTR_MODE) {
              umode_t mode = attr->ia_mode;
              vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
              if (!vfsgid_in_group_p(vfsgid) &&
                  !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                      mode &= ~S_ISGID;
              inode->i_mode = mode;
      }

  and since the caller in the test is neither capable nor in the group
  of the inode the S_ISGID bit is stripped.

  But assume the file isn't suid then ATTR_KILL_SUID won't be raised
  which has the consequence that neither the setgid nor the suid bits
  are stripped even though it should be stripped because the inode isn't
  in the caller's groups and the caller isn't privileged over the inode.

  If overlayfs is in the mix things become a bit more complicated and
  the bug shows up more clearly.

  When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
  file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
  raised but because the check in notify_change() is questioning the
  ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
  the S_ISGID bit isn't removed even though it should be stripped:

      sys_fallocate()
      -> vfs_fallocate()
         -> ovl_fallocate()
            -> file_remove_privs()
               -> dentry_needs_remove_privs()
                  -> should_remove_suid()
               -> __remove_privs()
                  newattrs.ia_valid = ATTR_FORCE | kill;
                  -> notify_change()
                     -> ovl_setattr()
                        /* TAKE ON MOUNTER'S CREDS */
                        -> ovl_do_notify_change()
                           -> notify_change()
                        /* GIVE UP MOUNTER'S CREDS */
           /* TAKE ON MOUNTER'S CREDS */
           -> vfs_fallocate()
              -> xfs_file_fallocate()
                 -> file_modified()
                    -> __file_remove_privs()
                       -> dentry_needs_remove_privs()
                          -> should_remove_suid()
                       -> __remove_privs()
                          newattrs.ia_valid = attr_force | kill;
                          -> notify_change()

  The fix for all of this is to make file_remove_privs()'s
  should_remove_suid() helper perform the same checks as we already
  require in setattr_prepare() and setattr_copy() and have
  notify_change() not pointlessly requiring S_IXGRP again. It doesn't
  make any sense in the first place because the caller must calculate
  the flags via should_remove_suid() anyway which would raise
  ATTR_KILL_SGID

  Note that some xfstests will now fail as these patches will cause the
  setgid bit to be lost in certain conditions for unprivileged users
  modifying a setgid file when they would've been kept otherwise. I
  think this risk is worth taking and I explained and mentioned this
  multiple times on the list [2].

  Enforcing the rules consistently across write operations and
  chmod/chown will lead to losing the setgid bit in cases were it
  might've been retained before.

  While I've mentioned this a few times but it's worth repeating just to
  make sure that this is understood. For the sake of maintainability,
  consistency, and security this is a risk worth taking.

  If we really see regressions for workloads the fix is to have special
  setgid handling in the write path again with different semantics from
  chmod/chown and possibly additional duct tape for overlayfs. I'll
  update the relevant xfstests with if you should decide to merge this
  second setgid cleanup.

  Before that people should be aware that there might be failures for
  fstests where unprivileged users modify a setgid file"

Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]

* tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: use consistent setgid checks in is_sxid()
  ovl: remove privs in ovl_fallocate()
  ovl: remove privs in ovl_copyfile()
  attr: use consistent sgid stripping checks
  attr: add setattr_should_drop_sgid()
  fs: move should_remove_suid()
  attr: add in_group_or_capable()
2022-12-12 19:03:10 -08:00
Daniel Bristot de Oliveira
d358dfe60b Documentation/osnoise: Add osnoise/options documentation
Add the documentation about the osnoise/options file, the options,
and some additional explanation about the OSNOISE_WORKLOAD option.

Link: https://lkml.kernel.org/r/fde5567a4bae364f67fd1e9a644d1d62862618a6.1670623111.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:36:05 -05:00
Bagas Sanjaya
0e162c6f1c Documentation/osnoise: Escape underscore of NO_ prefix
kernel test robot reported unknown target name warning:

Documentation/trace/osnoise-tracer.rst:112: WARNING: Unknown target name: "no".

The warning causes NO_ prefix to be rendered as link text instead, which
points to non-existent link target.

Escape the prefix underscore to fix the warning.

Link: https://lkml.kernel.org/r/20221125034300.24168-1-bagasdotme@gmail.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org>
Link: https://lore.kernel.org/linux-doc/202211240447.HxRNftE5-lkp@intel.com/
Fixes: 67543cd6b8 ("Documentation/osnoise: Add osnoise/options documentation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10 13:36:05 -05:00
Masami Hiramatsu (Google)
8c2b997901 tracing: docs: Update histogram doc for .percent/.graph and 'nohitcount'
Update histogram document for .percent/.graph suffixes and 'nohitcount'
option.

Link: https://lore.kernel.org/linux-trace-kernel/166610815604.56030.4124933216911828519.stgit@devnote2

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
2022-12-09 23:48:05 -05:00
Daniel Bristot de Oliveira
67543cd6b8 Documentation/osnoise: Add osnoise/options documentation
Add the documentation about the osnoise/options file, along
with an explanation about the OSNOISE_WORKLOAD option.

Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23 19:08:31 -05:00
Zheng Yejian
a635beeacc tracing/histogram: Update document for KEYS_MAX size
After commit 4f36c2d85c ("tracing: Increase tracing map KEYS_MAX size"),
'keys' supports up to three fields.

Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20221017103806.2479139-1-zhengyejian1@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-18 15:41:56 -06:00
Christian Brauner
ed5a7047d2
attr: use consistent sgid stripping checks
Currently setgid stripping in file_remove_privs()'s should_remove_suid()
helper is inconsistent with other parts of the vfs. Specifically, it only
raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
inode isn't in the caller's groups and the caller isn't privileged over the
inode although we require this already in setattr_prepare() and
setattr_copy() and so all filesystem implement this requirement implicitly
because they have to use setattr_{prepare,copy}() anyway.

But the inconsistency shows up in setgid stripping bugs for overlayfs in
xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
generic/687). For example, we test whether suid and setgid stripping works
correctly when performing various write-like operations as an unprivileged
user (fallocate, reflink, write, etc.):

echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k

The test basically creates a file with 6666 permissions. While the file has
the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
regular filesystem like xfs what will happen is:

sys_fallocate()
-> vfs_fallocate()
   -> xfs_file_fallocate()
      -> file_modified()
         -> __file_remove_privs()
            -> dentry_needs_remove_privs()
               -> should_remove_suid()
            -> __remove_privs()
               newattrs.ia_valid = ATTR_FORCE | kill;
               -> notify_change()
                  -> setattr_copy()

In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.

But we also see that ATTR_KILL_SGID won't be set because while the file
is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.

So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
ATTR_KILL_SUID is set and does:

ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);

which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.

Now we call into the filesystem's ->setattr() inode operation which will
end up calling setattr_copy(). Since ATTR_MODE is set we will hit:

if (ia_valid & ATTR_MODE) {
        umode_t mode = attr->ia_mode;
        vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
        if (!vfsgid_in_group_p(vfsgid) &&
            !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                mode &= ~S_ISGID;
        inode->i_mode = mode;
}

and since the caller in the test is neither capable nor in the group of the
inode the S_ISGID bit is stripped.

But assume the file isn't suid then ATTR_KILL_SUID won't be raised which
has the consequence that neither the setgid nor the suid bits are stripped
even though it should be stripped because the inode isn't in the caller's
groups and the caller isn't privileged over the inode.

If overlayfs is in the mix things become a bit more complicated and the bug
shows up more clearly. When e.g., ovl_setattr() is hit from
ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
ATTR_KILL_SGID might be raised but because the check in notify_change() is
questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
stripped the S_ISGID bit isn't removed even though it should be stripped:

sys_fallocate()
-> vfs_fallocate()
   -> ovl_fallocate()
      -> file_remove_privs()
         -> dentry_needs_remove_privs()
            -> should_remove_suid()
         -> __remove_privs()
            newattrs.ia_valid = ATTR_FORCE | kill;
            -> notify_change()
               -> ovl_setattr()
                  // TAKE ON MOUNTER'S CREDS
                  -> ovl_do_notify_change()
                     -> notify_change()
                  // GIVE UP MOUNTER'S CREDS
     // TAKE ON MOUNTER'S CREDS
     -> vfs_fallocate()
        -> xfs_file_fallocate()
           -> file_modified()
              -> __file_remove_privs()
                 -> dentry_needs_remove_privs()
                    -> should_remove_suid()
                 -> __remove_privs()
                    newattrs.ia_valid = attr_force | kill;
                    -> notify_change()

The fix for all of this is to make file_remove_privs()'s
should_remove_suid() helper to perform the same checks as we already
require in setattr_prepare() and setattr_copy() and have notify_change()
not pointlessly requiring S_IXGRP again. It doesn't make any sense in the
first place because the caller must calculate the flags via
should_remove_suid() anyway which would raise ATTR_KILL_SGID.

While we're at it we move should_remove_suid() from inode.c to attr.c
where it belongs with the rest of the iattr helpers. Especially since it
returns ATTR_KILL_S{G,U}ID flags. We also rename it to
setattr_should_drop_suidgid() to better reflect that it indicates both
setuid and setgid bit removal and also that it returns attr flags.

Running xfstests with this doesn't report any regressions. We should really
try and use consistent checks.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-18 10:09:47 +02:00
Linus Torvalds
f2b220ef93 A handful of relatively simple documentation fixes, plus a set of patches
catching the Chinese translation up with the front-page rework.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmNIPyAPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y6UMH/2ZLziHH0jQkoBAIhxyUzU3ZfXLlq5Xqo6vS
 oBYfJlYLClY/dmfTc3HAI6UhhJLGTcTNDaC1H4YQdgeP6RVJruFThOYF9WgW0FFl
 SgsBKhTbi3dpfdzjID8bJM7ytkbIvV4voNh52J9L1TA3z/CPxKSiXCScFAH/o12t
 E+CMjtmgi2P8w3kqgX59FMavp3W8M8HsT6u/wVoKb+zXjjqXGFYEXTjjKUxufRf6
 QWkaQGb0PHq9+2hAhgF4vdy4tWB9lr7r2ENZ8YKUkYUYfv5KGAqt39J7A4rC+g7w
 4Rvzznd0BJv3nuZ4rdxom7cOJ77i3lmWSJ65FoDHNeQ/8VBNuZc=
 =UpLC
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.1-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "A handful of relatively simple documentation fixes, plus a set of
  patches catching the Chinese translation up with the front-page
  rework"

* tag 'docs-6.1-2' of git://git.lwn.net/linux:
  Documentation: rtla: Correct command line example
  docs/zh_CN: add a man-pages link to zh_CN/index.rst
  docs/zh_CN: Rewrite the Chinese translation front page
  docs/zh_CN: add zh_CN/arch.rst
  docs/zh_CN: promote the title of zh_CN/process/index.rst
  docs/zh_CN: Update the translation of page_owner to 6.0-rc7
  docs/zh_CN: Update the translation of ksm to 6.0-rc7
  docs/howto: Replace abundoned URL of gmane.org
  Documentation: ubifs: Fix compression idiom
  Documentation/mm/page_owner.rst: delete frequently changing experimental data
  docs/zh_CN: Fix build warning
  docs: ftrace: Correct access mode
2022-10-13 10:58:32 -07:00
Linus Torvalds
d465bff130 perf tools changes for v6.1: 1st batch
- Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement
   patches went via tip.
 
   Example:
 
   $ sudo perf mem record -- -c 10000
   ^C[ perf record: Woken up 227 times to write data ]
   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
 
   $ sudo perf mem report -F mem,sample,snoop
   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
   Memory access                  Samples  Snoop
   N/A                             700620  N/A
   L1 hit                          126675  N/A
   L2 hit                             424  N/A
   L3 hit                             664  HitM
   L3 hit                              10  N/A
   Local RAM hit                        2  N/A
   Remote RAM (1 hop) hit            8558  N/A
   Remote Cache (1 hop) hit             3  N/A
   Remote Cache (1 hop) hit             2  HitM
   Remote Cache (2 hops) hit           10  HitM
   Remote Cache (2 hops) hit            6  N/A
   Uncached hit                         4  N/A
   $
 
 - "perf lock" improvements:
 
   - Add -E/--entries option to limit the number of entries to display, say to ask for
     just the top 5 contended locks.
 
   - Add -q/--quiet option to suppress header and debug messages.
 
   - Add a 'perf test' kernel lock contention entry to test 'perf lock'.
 
 - "perf lock contention" improvements:
 
   - Ask BPF's bpf_get_stackid() to skip some callchain entries.
 
     The ones closer to the tooling are bpf related and not that interesting, the
     ones calling the locking function are the ones we're interested in, example
     of a full, unskipped callstack:
 
   - Allow changing the callstack depth and number of entries to skip.
 
            1     10.74 us     10.74 us     10.74 us     spinlock   __bpf_trace_contention_begin+0xb
                           0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                           0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                           0xffffffffbb8b8e75  bpf_trace_run2+0x35
                           0xffffffffbb7eab9b  __bpf_trace_contention_begin+0xb
                           0xffffffffbb7ebe75  queued_spin_lock_slowpath+0x1f5
                           0xffffffffbc1c26ff  _raw_spin_lock+0x1f
                           0xffffffffbb841015  tick_do_update_jiffies64+0x25
                           0xffffffffbb8409ee  tick_irq_enter+0x9e
 
   - Show full callstack in verbose mode (-v option), sometimes this is desirable
     instead of showing just one callstack entry.
 
 - Allow multiple time ranges in 'perf record --delay' to help in reducing the
   amount of data collected from hardware tracing (Intel PT, etc) when there is
   a rough idea of periods of time where events of interest take time.
 
 - Add Intel PT to record only decoder debug messages when error happens.
 
 - Improve layout of Intel PT man page.
 
 - Add new branch types: alignment, data and inst faults and arch specific ones,
   such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64.
 
   Kernel enablement went thru the tip tree.
 
 - Fix 'perf probe' error log check in 'perf test' when no debuginfo is
   available.
 
 - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU
   not at the core number.
 
 - Fix flags parsing in 'perf trace' filters.
 
 - Introduce compact encoding of CPU range encoding on perf.data, to avoid
   having a bitmap with all the CPUs.
 
 - Improvements to the 'perf stat' metrics, including adding "core_wide", and
   computing "smt" from the CPU topology.
 
 - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows
   tooling to ask for the precise number of lost samples for a given event.
 
 - Add 'addr' sort key to see just the address of sampled instructions:
 
   $ perf record -o- true | perf report -i- -s addr
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.000 MB - ]
   # Samples: 12  of event 'cycles:u'
   # Event count (approx.): 252512
   #
   # Overhead  Address
   # ........  ..................
       42.96%  0x7f96f08443d7
       29.55%  0x7f96f0859b50
       14.76%  0x7f96f0852e02
        8.30%  0x7f96f0855028
        4.43%  0xffffffff8de01087
 
   perf annotate: Toggle full address <-> offset display
 
 - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output'
   mode ('o' hotkey) to toggle showing full virtual address or just the offset.
 
 - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads,
   at the start of a 'perf record' session, speeding up that record startup phase.
 
 - Add a command line option to specify build ids in 'perf inject'.
 
 - Update JSON event files for the Intel alderlake, broadwell, broadwellde,
   broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge,
   ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and
   tigerlake processors.
 
 - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms.
 
 - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and
   this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64
   systems.
 
 - Add a 'perf test' entry to test the resolution of java symbols, where an
   output like this is expected:
 
      8.18%  jshell    jitted-50116-29.so    [.] Interpreter
      0.75%  Thread-1  jitted-83602-1670.so  [.] jdk.internal.jimage.BasicImageReader.getString(int)
 
 - Add tests for the ARM64 CoreSight hardware tracing feature, with specially
   crafted pureloop, memcpy, thread loop and unroll tread that then gets
   traced and the output compared with expected output.
 
   Documentation explaining it is also included.
 
 - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events
   are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps,
   verify that this gets all recorded correctly.
 
 - Introduce pthread mutex wrappers to allow for building with clang's
   -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable",
   "exclusive_lock_function", "exclusive_trylock_function",
   "exclusive_locks_required", and "no_thread_safety_analysis" compiler function
   attributes.
 
 - Fix empty version number when building outside of a git repo.
 
 - Improve feature detection display when multiple versions of a feature are present, such
   as for binutils libbfd, that has a mix of possible ways to detect according to the
   Linux distribution.
 
   Previously in some cases we had:
 
   Auto-detecting system features
   <SNIP>
   ...                                  libbfd: [ on  ]
   ...                          libbfd-liberty: [ on  ]
   ...                        libbfd-liberty-z: [ on  ]
   <SNIP>
 
   Now for this case we show just the main feature:
 
   Auto-detecting system features
   <SNIP>
   ...                                  libbfd: [ on  ]
   <SNIP>
 
 - Remove some unused structs, variables, macros, function prototypes and
   includes from various places.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY0CKuAAKCRCyPKLppCJ+
 JywwAQDWLForEnEZNk92Fd3y342Lh9W/8z1V51dKK7XdY1cV6AD/Rn5L57v7k/yG
 mG5w2Fd1J/xBjlsL/BvNlimUD2tbkQA=
 =XPMg
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:

 - Add support for AMD on 'perf mem' and 'perf c2c', the kernel
   enablement patches went via tip.

   Example:

      $ sudo perf mem record -- -c 10000
      ^C[ perf record: Woken up 227 times to write data ]
      [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

      $ sudo perf mem report -F mem,sample,snoop
      Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
      Memory access                  Samples  Snoop
      N/A                             700620  N/A
      L1 hit                          126675  N/A
      L2 hit                             424  N/A
      L3 hit                             664  HitM
      L3 hit                              10  N/A
      Local RAM hit                        2  N/A
      Remote RAM (1 hop) hit            8558  N/A
      Remote Cache (1 hop) hit             3  N/A
      Remote Cache (1 hop) hit             2  HitM
      Remote Cache (2 hops) hit           10  HitM
      Remote Cache (2 hops) hit            6  N/A
      Uncached hit                         4  N/A
      $

 - "perf lock" improvements:

     - Add -E/--entries option to limit the number of entries to
       display, say to ask for just the top 5 contended locks.

     - Add -q/--quiet option to suppress header and debug messages.

     - Add a 'perf test' kernel lock contention entry to test 'perf
       lock'.

 - "perf lock contention" improvements:

     - Ask BPF's bpf_get_stackid() to skip some callchain entries.

       The ones closer to the tooling are bpf related and not that
       interesting, the ones calling the locking function are the ones
       we're interested in, example of a full, unskipped callstack:

     - Allow changing the callstack depth and number of entries to skip.

           1     10.74 us     10.74 us     10.74 us     spinlock   __bpf_trace_contention_begin+0xb
                          0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                          0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                          0xffffffffbb8b8e75  bpf_trace_run2+0x35
                          0xffffffffbb7eab9b  __bpf_trace_contention_begin+0xb
                          0xffffffffbb7ebe75  queued_spin_lock_slowpath+0x1f5
                          0xffffffffbc1c26ff  _raw_spin_lock+0x1f
                          0xffffffffbb841015  tick_do_update_jiffies64+0x25
                          0xffffffffbb8409ee  tick_irq_enter+0x9e

     - Show full callstack in verbose mode (-v option), sometimes this
       is desirable instead of showing just one callstack entry.

 - Allow multiple time ranges in 'perf record --delay' to help in
   reducing the amount of data collected from hardware tracing (Intel
   PT, etc) when there is a rough idea of periods of time where events
   of interest take time.

 - Add Intel PT to record only decoder debug messages when error
   happens.

 - Improve layout of Intel PT man page.

 - Add new branch types: alignment, data and inst faults and arch
   specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
   debug_data on arm64.

   Kernel enablement went thru the tip tree.

 - Fix 'perf probe' error log check in 'perf test' when no debuginfo is
   available.

 - Fix 'perf stat' aggregation mode logic, it should be looking at the
   CPU not at the core number.

 - Fix flags parsing in 'perf trace' filters.

 - Introduce compact encoding of CPU range encoding on perf.data, to
   avoid having a bitmap with all the CPUs.

 - Improvements to the 'perf stat' metrics, including adding
   "core_wide", and computing "smt" from the CPU topology.

 - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
   that allows tooling to ask for the precise number of lost samples for
   a given event.

 - Add 'addr' sort key to see just the address of sampled instructions:

      $ perf record -o- true | perf report -i- -s addr
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.000 MB - ]
      # Samples: 12  of event 'cycles:u'
      # Event count (approx.): 252512
      #
      # Overhead  Address
      # ........  ..................
          42.96%  0x7f96f08443d7
          29.55%  0x7f96f0859b50
          14.76%  0x7f96f0852e02
           8.30%  0x7f96f0855028
           4.43%  0xffffffff8de01087

      perf annotate: Toggle full address <-> offset display

 - Add 'f' hotkey to the 'perf annotate' TUI interface when in
   'disassembler output' mode ('o' hotkey) to toggle showing full
   virtual address or just the offset.

 - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
   pre-existing threads, at the start of a 'perf record' session,
   speeding up that record startup phase.

 - Add a command line option to specify build ids in 'perf inject'.

 - Update JSON event files for the Intel alderlake, broadwell,
   broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
   icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
   skylake, skylakex, and tigerlake processors.

 - Update vendor JSON event files for the ARM Neoverse V1 and E1
   platforms.

 - Add a 'perf test' entry for 'perf mem' where a struct has false
   sharing and this gets detected in the 'perf mem' output, tested with
   Intel, AMD and ARM64 systems.

 - Add a 'perf test' entry to test the resolution of java symbols, where
   an output like this is expected:

       8.18%  jshell    jitted-50116-29.so    [.] Interpreter
       0.75%  Thread-1  jitted-83602-1670.so  [.] jdk.internal.jimage.BasicImageReader.getString(int)

 - Add tests for the ARM64 CoreSight hardware tracing feature, with
   specially crafted pureloop, memcpy, thread loop and unroll tread that
   then gets traced and the output compared with expected output.

   Documentation explaining it is also included.

 - Add per thread Intel PT 'perf test' entry to check that
   PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
   mixture of per thread and per CPU events and mmaps, verify that this
   gets all recorded correctly.

 - Introduce pthread mutex wrappers to allow for building with clang's
   -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
   "lockable", "exclusive_lock_function", "exclusive_trylock_function",
   "exclusive_locks_required", and "no_thread_safety_analysis" compiler
   function attributes.

 - Fix empty version number when building outside of a git repo.

 - Improve feature detection display when multiple versions of a feature
   are present, such as for binutils libbfd, that has a mix of possible
   ways to detect according to the Linux distribution.

   Previously in some cases we had:

      Auto-detecting system features
      <SNIP>
      ...                                  libbfd: [ on  ]
      ...                          libbfd-liberty: [ on  ]
      ...                        libbfd-liberty-z: [ on  ]
      <SNIP>

   Now for this case we show just the main feature:

      Auto-detecting system features
      <SNIP>
      ...                                  libbfd: [ on  ]
      <SNIP>

 - Remove some unused structs, variables, macros, function prototypes
   and includes from various places.

* tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
  perf script: Add missing fields in usage hint
  perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
  perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
  tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
  perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
  perf test coresight: Add relevant documentation about ARM64 CoreSight testing
  perf test: Add git ignore for tmp and output files of ARM CoreSight tests
  perf test coresight: Add unroll thread test shell script
  perf test coresight: Add unroll thread test tool
  perf test coresight: Add thread loop test shell scripts
  perf test coresight: Add thread loop test tool
  perf test coresight: Add memcpy thread test shell script
  perf test coresight: Add memcpy thread test tool
  perf test: Add git ignore for perf data generated by the ARM CoreSight tests
  perf test: Add arm64 asm pureloop test shell script
  perf test: Add asm pureloop test tool
  ...
2022-10-11 15:02:25 -07:00
Linus Torvalds
cdf072acb5 Tracing updates for 6.1:
Major changes:
 
  - Changed location of tracing repo from personal git repo to:
    git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 
  - Added Masami Hiramatsu as co-maintainer
 
  - Updated MAINTAINERS file to separate out FTRACE as it is
    more than just TRACING.
 
 Minor changes:
 
  - Added Mark Rutland as FTRACE reviewer
 
  - Updated user_events to make it on its way to remove the BROKEN tag.
    The changes should now be acceptable but will run it through
    a cycle and hopefully we can remove the BROKEN tag next release.
 
  - Added filtering to eprobes
 
  - Added a delta time to the benchmark trace event
 
  - Have the histogram and filter callbacks called via a switch
    statement instead of indirect functions. This speeds it up to
    avoid retpolines.
 
  - Add a way to wake up ring buffer waiters waiting for the
    ring buffer to fill up to its watermark.
 
  - New ioctl() on the trace_pipe_raw file to wake up ring buffer
    waiters.
 
  - Wake up waiters when the ring buffer is disabled.
    A reader may block when the ring buffer is disabled,
    but if it was blocked when the ring buffer is disabled
    it should then wake up.
 
 Fixes:
 
  - Allow splice to read partially read ring buffer pages
    Fixes splice never moving forward.
 
  - Fix inverted compare that made the "shortest" ring buffer
    wait queue actually the longest.
 
  - Fix a race in the ring buffer between resetting a page when
    a writer goes to another page, and the reader.
 
  - Fix ftrace accounting bug when function hooks are added at
    boot up before the weak functions are set to "disabled".
 
  - Fix bug that freed a user allocated snapshot buffer when
    enabling a tracer.
 
  - Fix possible recursive locks in osnoise tracer
 
  - Fix recursive locking direct functions
 
  - And other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYz70cxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpLKAP4+yOje7ZY/b3R4tTx0EIWiKdhqPx6t
 Nvam2+WR2PN3QQEAqiK2A+oIbh3Zjp1MyhQWuulssWKtSTXhIQkbs7ioYAc=
 =MsQw
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:
 "Major changes:

   - Changed location of tracing repo from personal git repo to:
     git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git

   - Added Masami Hiramatsu as co-maintainer

   - Updated MAINTAINERS file to separate out FTRACE as it is more than
     just TRACING.

  Minor changes:

   - Added Mark Rutland as FTRACE reviewer

   - Updated user_events to make it on its way to remove the BROKEN tag.
     The changes should now be acceptable but will run it through a
     cycle and hopefully we can remove the BROKEN tag next release.

   - Added filtering to eprobes

   - Added a delta time to the benchmark trace event

   - Have the histogram and filter callbacks called via a switch
     statement instead of indirect functions. This speeds it up to avoid
     retpolines.

   - Add a way to wake up ring buffer waiters waiting for the ring
     buffer to fill up to its watermark.

   - New ioctl() on the trace_pipe_raw file to wake up ring buffer
     waiters.

   - Wake up waiters when the ring buffer is disabled. A reader may
     block when the ring buffer is disabled, but if it was blocked when
     the ring buffer is disabled it should then wake up.

  Fixes:

   - Allow splice to read partially read ring buffer pages. This fixes
     splice never moving forward.

   - Fix inverted compare that made the "shortest" ring buffer wait
     queue actually the longest.

   - Fix a race in the ring buffer between resetting a page when a
     writer goes to another page, and the reader.

   - Fix ftrace accounting bug when function hooks are added at boot up
     before the weak functions are set to "disabled".

   - Fix bug that freed a user allocated snapshot buffer when enabling a
     tracer.

   - Fix possible recursive locks in osnoise tracer

   - Fix recursive locking direct functions

   - Other minor clean ups and fixes"

* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  ftrace: Create separate entry in MAINTAINERS for function hooks
  tracing: Update MAINTAINERS to reflect new tracing git repo
  tracing: Do not free snapshot if tracer is on cmdline
  ftrace: Still disable enabled records marked as disabled
  tracing/user_events: Move pages/locks into groups to prepare for namespaces
  tracing: Add Masami Hiramatsu as co-maintainer
  tracing: Remove unused variable 'dups'
  MAINTAINERS: add myself as a tracing reviewer
  ring-buffer: Fix race between reset page and reading page
  tracing/user_events: Update ABI documentation to align to bits vs bytes
  tracing/user_events: Use bits vs bytes for enabled status page data
  tracing/user_events: Use refcount instead of atomic for ref tracking
  tracing/user_events: Ensure user provided strings are safely formatted
  tracing/user_events: Use WRITE instead of READ for io vector import
  tracing/user_events: Use NULL for strstr checks
  tracing: Fix spelling mistake "preapre" -> "prepare"
  tracing: Wake up waiters when tracing is disabled
  tracing: Add ioctl() to force ring buffer waiters to wake up
  tracing: Wake up ring buffer waiters on closing of the file
  ring-buffer: Add ring_buffer_wake_waiters()
  ...
2022-10-10 12:20:55 -07:00
Leo Yan
9c1ab6d54a docs: ftrace: Correct access mode
The documentation gives an example for opening trace marker with
write-only mode, but the flag WR_ONLY is not defined by glibc.

Use O_WRONLY to replace it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221008083250.3160-1-leo.yan@linaro.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-10-10 12:50:53 -06:00
Linus Torvalds
a09476668e Char/Misc and other driver changes for 6.1-rc1
Here is the large set of char/misc and other small driver subsystem
 changes for 6.1-rc1.  Loads of different things in here:
   - IIO driver updates, additions, and changes.  Probably the largest
     part of the diffstat
   - habanalabs driver update with support for new hardware and features,
     the second largest part of the diff.
   - fpga subsystem driver updates and additions
   - mhi subsystem updates
   - Coresight driver updates
   - gnss subsystem updates
   - extcon driver updates
   - icc subsystem updates
   - fsi subsystem updates
   - nvmem subsystem and driver updates
   - misc driver updates
   - speakup driver additions for new features
   - lots of tiny driver updates and cleanups
 
 All of these have been in the linux-next tree for a while with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY0GQmA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylyVQCeNJjZ3hy+Wz8WkPSY+NkehuIhyCIAnjXMOJP8
 5G/JQ+rpcclr7VOXlS66
 =zVkU
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc and other driver updates from Greg KH:
 "Here is the large set of char/misc and other small driver subsystem
  changes for 6.1-rc1. Loads of different things in here:

   - IIO driver updates, additions, and changes. Probably the largest
     part of the diffstat

   - habanalabs driver update with support for new hardware and
     features, the second largest part of the diff.

   - fpga subsystem driver updates and additions

   - mhi subsystem updates

   - Coresight driver updates

   - gnss subsystem updates

   - extcon driver updates

   - icc subsystem updates

   - fsi subsystem updates

   - nvmem subsystem and driver updates

   - misc driver updates

   - speakup driver additions for new features

   - lots of tiny driver updates and cleanups

  All of these have been in the linux-next tree for a while with no
  reported issues"

* tag 'char-misc-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (411 commits)
  w1: Split memcpy() of struct cn_msg flexible array
  spmi: pmic-arb: increase SPMI transaction timeout delay
  spmi: pmic-arb: block access for invalid PMIC arbiter v5 SPMI writes
  spmi: pmic-arb: correct duplicate APID to PPID mapping logic
  spmi: pmic-arb: add support to dispatch interrupt based on IRQ status
  spmi: pmic-arb: check apid against limits before calling irq handler
  spmi: pmic-arb: do not ack and clear peripheral interrupts in cleanup_irq
  spmi: pmic-arb: handle spurious interrupt
  spmi: pmic-arb: add a print in cleanup_irq
  drivers: spmi: Directly use ida_alloc()/free()
  MAINTAINERS: add TI ECAP driver info
  counter: ti-ecap-capture: capture driver support for ECAP
  Documentation: ABI: sysfs-bus-counter: add frequency & num_overflows items
  dt-bindings: counter: add ti,am62-ecap-capture.yaml
  counter: Introduce the COUNTER_COMP_ARRAY component type
  counter: Consolidate Counter extension sysfs attribute creation
  counter: Introduce the Count capture component
  counter: 104-quad-8: Add Signal polarity component
  counter: Introduce the Signal polarity component
  counter: interrupt-cnt: Implement watch_validate callback
  ...
2022-10-08 08:56:37 -07:00
Carsten Haitzler
dc2e0fb00b perf test coresight: Add relevant documentation about ARM64 CoreSight testing
Add/improve documentation helping people get started with CoreSight and
perf as well as describe the testing and how it works.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/20220909152803.2317006-14-carsten.haitzler@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06 14:50:55 -03:00
Tiezhu Yang
e94102e506 docs, kprobes: Fix the wrong location of Kprobes
After commit 22471e1313 ("kconfig: use a menu in arch/Kconfig to reduce
clutter"), the location of Kprobes is under "General architecture-dependent
options" rather than "General setup".

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Fixes: 22471e1313 ("kconfig: use a menu in arch/Kconfig to reduce clutter")
Link: https://lore.kernel.org/r/1663322106-12178-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-29 13:07:52 -06:00
Beau Belgrave
933678b618 tracing/user_events: Update ABI documentation to align to bits vs bytes
Update the documentation to reflect the new ABI requirements and how to
use the byte index with the mask properly to check event status.

Link: https://lkml.kernel.org/r/20220728233309.1896-7-beaub@linux.microsoft.com

Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29 10:17:37 -04:00
Akhil Raj
7f77ebbf75 Delete duplicate words from kernel docs
I have deleted duplicate words like

to, guest, trace, when, we

Signed-off-by: Akhil Raj <lf32.dev@gmail.com>
Link: https://lore.kernel.org/r/20220829065239.4531-1-lf32.dev@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-09-27 13:21:43 -06:00
Yicong Yang
a7112b747c docs: trace: Add HiSilicon PTT device driver documentation
Document the introduction and usage of HiSilicon PTT device driver as well
as the sysfs attributes description provided by the driver.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
[Fixed month and kernel version]
Link: https://lore.kernel.org/r/20220816114414.4092-5-yangyicong@huawei.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-09-08 18:41:22 -06:00
German Gomez
04d1edb0ec coresight: etm4x: docs: Add documentation for 'ts_source' sysfs interface
Sync sysfs documentation pages to include the new ts_source (timestamp
source) interface.

Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20220823160650.455823-3-james.clark@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-08-26 13:38:35 -06:00
Christophe JAILLET
b99ee26a1a coresight: docs: Fix a broken reference
Since the commit in Fixes: tag, "coresight-cpu-debug.txt" has been turned
into "arm,coresight-cpu-debug.yaml".

Update the doc accordingly to avoid a 'make htmldocs' warning

Fixes: 66d052047c ("dt-bindings: arm: Convert CoreSight CPU debug to DT schema")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/c7f864854e9e03916017712017ff59132c51c338.1659251193.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2022-08-22 10:01:25 -06:00
Linus Torvalds
965a9d75e3 Tracing updates for 5.20 / 6.0
- Runtime verification infrastructure
   This is the biggest change for this pull request. It introduces the
   runtime verification that is necessary for running Linux on safety
   critical systems. It allows for deterministic automata models to be
   inserted into the kernel that will attach to tracepoints, where the
   information on these tracepoints will move the model from state to state.
   If a state is encountered that does not belong to the model, it will then
   activate a given reactor, that could just inform the user or even panic
   the kernel (for which safety critical systems will detect and can recover
   from).
 
 - Two monitor models are also added: Wakeup In Preemptive (WIP - not to be
   confused with "work in progress"), and Wakeup While Not Running (WWNR).
 
 - Added __vstring() helper to the TRACE_EVENT() macro to replace several
   vsnprintf() usages that were all doing it wrong.
 
 - eprobes now can have their event autogenerated when the event name is left
   off.
 
 - The rest is various cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYu0yzRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qj4HAP4tQtV55rjj4DQ5XIXmtI3/64PmyRSJ
 +y4DEXi1UvEUCQD/QAuQfWoT/7gh35ltkfeS4t3ockzy14rrkP5drZigiQA=
 =kEtM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Runtime verification infrastructure

   This is the biggest change here. It introduces the runtime
   verification that is necessary for running Linux on safety critical
   systems.

   It allows for deterministic automata models to be inserted into the
   kernel that will attach to tracepoints, where the information on
   these tracepoints will move the model from state to state.

   If a state is encountered that does not belong to the model, it will
   then activate a given reactor, that could just inform the user or
   even panic the kernel (for which safety critical systems will detect
   and can recover from).

 - Two monitor models are also added: Wakeup In Preemptive (WIP - not to
   be confused with "work in progress"), and Wakeup While Not Running
   (WWNR).

 - Added __vstring() helper to the TRACE_EVENT() macro to replace
   several vsnprintf() usages that were all doing it wrong.

 - eprobes now can have their event autogenerated when the event name is
   left off.

 - The rest is various cleanups and fixes.

* tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
  rv: Unlock on error path in rv_unregister_reactor()
  tracing: Use alignof__(struct {type b;}) instead of offsetof()
  tracing/eprobe: Show syntax error logs in error_log file
  scripts/tracing: Fix typo 'the the' in comment
  tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT
  tracing: Use free_trace_buffer() in allocate_trace_buffers()
  tracing: Use a struct alignof to determine trace event field alignment
  rv/reactor: Add the panic reactor
  rv/reactor: Add the printk reactor
  rv/monitor: Add the wwnr monitor
  rv/monitor: Add the wip monitor
  rv/monitor: Add the wip monitor skeleton created by dot2k
  Documentation/rv: Add deterministic automata instrumentation documentation
  Documentation/rv: Add deterministic automata monitor synthesis documentation
  tools/rv: Add dot2k
  Documentation/rv: Add deterministic automaton documentation
  tools/rv: Add dot2c
  Documentation/rv: Add a basic documentation
  rv/include: Add instrumentation helper functions
  rv/include: Add deterministic automata monitor definition via C macros
  ...
2022-08-05 09:41:12 -07:00
Daniel Bristot de Oliveira
ccc319dcb4 rv/monitor: Add the wwnr monitor
Per task wakeup while not running (wwnr) monitor.

This model is broken, the reason is that a task can be running in the
processor without being set as RUNNABLE. Think about a task about to
sleep:

1:      set_current_state(TASK_UNINTERRUPTIBLE);
2:      schedule();

And then imagine an IRQ happening in between the lines one and two,
waking the task up. BOOM, the wakeup will happen while the task is
running.

Q: Why do we need this model, so?
A: To test the reactors.

Link: https://lkml.kernel.org/r/473c0fc39967250fdebcff8b620311c11dccad30.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:30 -04:00
Daniel Bristot de Oliveira
10bde81c74 rv/monitor: Add the wip monitor
The wakeup in preemptive (wip) monitor verifies if the
wakeup events always take place with preemption disabled:

                     |
                     |
                     v
                   #==================#
                   H    preemptive    H <+
                   #==================#  |
                     |                   |
                     | preempt_disable   | preempt_enable
                     v                   |
    sched_waking   +------------------+  |
  +--------------- |                  |  |
  |                |  non_preemptive  |  |
  +--------------> |                  | -+
                   +------------------+

The wakeup event always takes place with preemption disabled because
of the scheduler synchronization. However, because the preempt_count
and its trace event are not atomic with regard to interrupts, some
inconsistencies might happen.

The documentation illustrates one of these cases.

Link: https://lkml.kernel.org/r/c98ca678df81115fddc04921b3c79720c836b18f.1659052063.git.bristot@kernel.org

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Tao Zhou <tao.zhou@linux.dev>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 14:01:30 -04:00