linux/kernel/trace
wuqiang.matt 4bbd934556 kprobes: kretprobe scalability improvement
kretprobe is using freelist to manage return-instances, but freelist,
as LIFO queue based on singly linked list, scales badly and reduces
the overall throughput of kretprobed routines, especially for high
contention scenarios.

Here's a typical throughput test of sys_prctl (counts in 10 seconds,
measured with perf stat -a -I 10000 -e syscalls:sys_enter_prctl):

OS: Debian 10 X86_64, Linux 6.5rc7 with freelist
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

         1T       2T       4T       8T      16T      24T
   24150045 29317964 15446741 12494489 18287272 17708768
        32T      48T      64T      72T      96T     128T
   16200682 13737658 11645677 11269858 10470118  9931051

This patch introduces objpool to replace freelist. objpool is a
high performance queue, which can bring near-linear scalability
to kretprobed routines. Tests of kretprobe throughput show the
biggest ratio as 159x of original freelist. Here's the result:

                  1T         2T         4T         8T        16T
native:     41186213   82336866  164250978  328662645  658810299
freelist:   24150045   29317964   15446741   12494489   18287272
objpool:    23926730   48010314   96125218  191782984  385091769
                 32T        48T        64T        96T       128T
native:   1330338351 1969957941 2512291791 2615754135 2671040914
freelist:   16200682   13737658   11645677   10470118    9931051
objpool:   764481096 1147149781 1456220214 1502109662 1579015050

Testings on 96-core ARM64 output similarly, but with the biggest
ratio up to 448x:

OS: Debian 10 AARCH64, Linux 6.5rc7
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

                  1T         2T         4T         8T        16T
native: .   30066096   63569843  126194076  257447289  505800181
freelist:   16152090   11064397   11124068    7215768    5663013
objpool:    13997541   28032100   55726624  110099926  221498787
                 24T        32T        48T        64T        96T
native:    763305277 1015925192 1521075123 2033009392 3021013752
freelist:    5015810    4602893    3766792    3382478    2945292
objpool:   328192025  439439564  668534502  887401381 1319972072

Link: https://lore.kernel.org/all/20231017135654.82270-4-wuqiang.matt@bytedance.com/

Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-10-18 23:59:54 +09:00
..
rv tracing/tools: Updates for 6.4 2023-04-28 16:11:26 -07:00
blktrace.c block: remove more NULL checks after bdev_get_queue() 2023-02-21 09:23:22 -07:00
bpf_trace.c bpf: Fix uprobe_multi get_pid_task error path 2023-09-15 10:32:40 -07:00
bpf_trace.h
error_report-traces.c
fgraph.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
fprobe.c kprobes: kretprobe scalability improvement 2023-10-18 23:59:54 +09:00
ftrace_internal.h tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
ftrace.c ftrace: Use LIST_HEAD to initialize clear_hash 2023-09-01 21:18:38 -04:00
Kconfig Probes updates for v6.5: 2023-06-30 10:44:53 -07:00
kprobe_event_gen_test.c tracing: Fix wrong return in kprobe_event_gen_test.c 2023-03-19 12:20:48 -04:00
Makefile tracing/probes: Move finding func-proto API and getting func-param API to trace_btf 2023-08-23 09:39:45 +09:00
pid_list.c tracing: Cleanup double word in comment 2022-04-26 17:58:50 -04:00
pid_list.h
power-traces.c
preemptirq_delay_test.c
rethook.c kprobes: kretprobe scalability improvement 2023-10-18 23:59:54 +09:00
ring_buffer_benchmark.c ring_buffer: Remove unused "event" parameter 2022-11-23 19:08:30 -05:00
ring_buffer.c ring-buffer: Update "shortest_full" in polling 2023-09-30 16:17:34 -04:00
rpm-traces.c
synth_event_gen_test.c tracing: Always use canonical ftrace path 2023-02-18 14:34:09 -05:00
trace_benchmark.c tracing: Add numeric delta time to the trace event benchmark 2022-09-26 13:01:09 -04:00
trace_benchmark.h tracing: Add numeric delta time to the trace event benchmark 2022-09-26 13:01:09 -04:00
trace_boot.c tracing/boot: Test strscpy() against less than zero for error 2023-07-05 10:30:49 -04:00
trace_branch.c
trace_btf.c tracing/probes: Add a function to search a member of a struct/union 2023-08-23 09:40:16 +09:00
trace_btf.h tracing/probes: Add a function to search a member of a struct/union 2023-08-23 09:40:16 +09:00
trace_clock.c
trace_dynevent.c tracing: Free buffers when a used dynamic event is removed 2022-11-23 19:07:12 -05:00
trace_dynevent.h
trace_entries.h tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_eprobe.c tracing/eprobe: drop unneeded breaks 2023-10-10 01:03:48 +09:00
trace_event_perf.c tracing/perf: Use strndup_user instead of kzalloc/strncpy_from_user 2022-11-23 19:08:31 -05:00
trace_events_filter_test.h
trace_events_filter.c tracing/filters: Fix coding style issues 2023-09-01 21:27:23 -04:00
trace_events_hist.c tracing/histograms: Return an error if we fail to add histogram to hist_vars list 2023-07-23 11:18:52 -04:00
trace_events_inject.c tracing: Have event inject files inc the trace array ref count 2023-09-07 16:38:54 -04:00
trace_events_synth.c tracing/synthetic: Print out u64 values properly 2023-09-11 18:23:10 -04:00
trace_events_trigger.c tracing: Fix kernel-doc warnings in trace_events_trigger.c 2023-07-28 19:59:03 -04:00
trace_events_user.c tracing/user_events: Align set_bit() address for all archs 2023-09-30 16:25:41 -04:00
trace_events.c tracing: relax trace_event_eval_update() execution with cond_resched() 2023-09-30 16:24:55 -04:00
trace_export.c tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure 2023-07-30 18:11:44 -04:00
trace_fprobe.c tracing/fprobe-event: Assume fprobe is a return event by $retval 2023-08-23 09:41:32 +09:00
trace_functions_graph.c function_graph: Support recording and printing the return value of function 2023-06-20 18:38:37 -04:00
trace_functions.c
trace_hwlat.c tracing: Remove extra space at the end of hwlat_detector/mode 2023-09-01 21:00:00 -04:00
trace_irqsoff.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_kdb.c
trace_kprobe_selftest.c tracing: arm64: Avoid missing-prototype warnings 2023-07-12 12:06:04 -04:00
trace_kprobe_selftest.h
trace_kprobe.c Probes updates for v6.6: 2023-09-02 11:10:50 -07:00
trace_mmiotrace.c
trace_nop.c
trace_osnoise.c tracing/timerlat: Add user-space interface 2023-06-22 10:39:56 -04:00
trace_output.c Tracing updates for 6.5: 2023-06-30 10:33:17 -07:00
trace_output.h tracing: Add "fields" option to show raw trace event fields 2023-03-29 06:52:08 -04:00
trace_preemptirq.c cpuidle: tracing, preempt: Squash _rcuidle tracing 2023-01-31 15:01:46 +01:00
trace_printk.c
trace_probe_kernel.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_probe_tmpl.h tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails 2023-07-14 17:04:58 +09:00
trace_probe.c tracing/probes: Add string type check with BTF 2023-08-23 09:41:13 +09:00
trace_probe.h Probes updates for v6.6: 2023-09-02 11:10:50 -07:00
trace_recursion_record.c tracing: Use trace_create_file() to simplify creation of tracefs entries 2022-05-26 21:12:52 -04:00
trace_sched_switch.c sched/tracing: Append prev_state to tp args instead 2022-05-12 00:37:11 +02:00
trace_sched_wakeup.c tracing: Fix memleak due to race between current_tracer and trace 2023-08-17 13:49:37 -04:00
trace_selftest_dynamic.c
trace_selftest.c tracing: Have function_graph selftest call cond_resched() 2023-05-28 21:15:46 -04:00
trace_seq.c tracing: Fix kernel-doc warnings in trace_seq.c 2023-07-28 19:59:04 -04:00
trace_stack.c
trace_stat.c
trace_stat.h
trace_synth.h tracing: Allow synthetic events to pass around stacktraces 2023-01-25 10:31:24 -05:00
trace_syscalls.c tracing: bpf: use struct trace_entry in struct syscall_tp_t 2023-08-01 10:53:28 -07:00
trace_uprobe.c Probes updates for v6.6: 2023-09-02 11:10:50 -07:00
trace.c Tracing fixes for 6.6: 2023-09-13 11:30:11 -07:00
trace.h tracing: Increase trace array ref count on enable and filter files 2023-09-07 16:05:46 -04:00
tracing_map.c tracing: Remove unused variable 'dups' 2022-10-03 12:20:31 -04:00
tracing_map.h tracing: Remove unused extern declaration tracing_map_set_field_descr() 2023-07-23 11:08:14 -04:00