linux/include
Jiri Olsa ff474a78ce uprobe: Add uretprobe syscall to speed up return probe
Adding uretprobe syscall instead of trap to speed up return probe.

At the moment the uretprobe setup/path is:

  - install entry uprobe

  - when the uprobe is hit, it overwrites probed function's return address
    on stack with address of the trampoline that contains breakpoint
    instruction

  - the breakpoint trap code handles the uretprobe consumers execution and
    jumps back to original return address

This patch replaces the above trampoline's breakpoint instruction with new
ureprobe syscall call. This syscall does exactly the same job as the trap
with some more extra work:

  - syscall trampoline must save original value for rax/r11/rcx registers
    on stack - rax is set to syscall number and r11/rcx are changed and
    used by syscall instruction

  - the syscall code reads the original values of those registers and
    restore those values in task's pt_regs area

  - only caller from trampoline exposed in '[uprobes]' is allowed,
    the process will receive SIGILL signal otherwise

Even with some extra work, using the uretprobes syscall shows speed
improvement (compared to using standard breakpoint):

  On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)

  current:
    uretprobe-nop  :    1.498 ± 0.000M/s
    uretprobe-push :    1.448 ± 0.001M/s
    uretprobe-ret  :    0.816 ± 0.001M/s

  with the fix:
    uretprobe-nop  :    1.969 ± 0.002M/s  < 31% speed up
    uretprobe-push :    1.910 ± 0.000M/s  < 31% speed up
    uretprobe-ret  :    0.934 ± 0.000M/s  < 14% speed up

  On Amd (AMD Ryzen 7 5700U)

  current:
    uretprobe-nop  :    0.778 ± 0.001M/s
    uretprobe-push :    0.744 ± 0.001M/s
    uretprobe-ret  :    0.540 ± 0.001M/s

  with the fix:
    uretprobe-nop  :    0.860 ± 0.001M/s  < 10% speed up
    uretprobe-push :    0.818 ± 0.001M/s  < 10% speed up
    uretprobe-ret  :    0.578 ± 0.000M/s  <  7% speed up

The performance test spawns a thread that runs loop which triggers
uprobe with attached bpf program that increments the counter that
gets printed in results above.

The uprobe (and uretprobe) kind is determined by which instruction
is being patched with breakpoint instruction. That's also important
for uretprobes, because uprobe is installed for each uretprobe.

The performance test is part of bpf selftests:
  tools/testing/selftests/bpf/run_bench_uprobes.sh

Note at the moment uretprobe syscall is supported only for native
64-bit process, compat process still uses standard breakpoint.

Note that when shadow stack is enabled the uretprobe syscall returns
via iret, which is slower than return via sysret, but won't cause the
shadow stack violation.

Link: https://lore.kernel.org/all/20240611112158.40795-4-jolsa@kernel.org/

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-06-12 08:44:28 +09:00
..
acpi The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
asm-generic asm-generic cleanups for 6.10 2024-05-20 15:18:34 -07:00
clocksource
crypto This push fixes a bug in the new ecc P521 code as well as a buggy 2024-05-20 08:47:54 -07:00
drm Short summary of fixes pull: 2024-05-27 13:47:14 +10:00
dt-bindings - Core Frameworks 2024-05-22 10:49:54 -07:00
keys Hi, 2024-05-13 10:40:15 -07:00
kunit kunit: Print last test location on fault 2024-05-06 14:22:02 -06:00
kvm Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next 2024-05-08 16:41:50 +01:00
linux uprobe: Add uretprobe syscall to speed up return probe 2024-06-12 08:44:28 +09:00
math-emu
media
memory
misc
net rtnetlink: make the "split" NLM_DONE handling generic 2024-06-05 12:34:54 +01:00
pcmcia
ras tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
rdma The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
rv
scsi SCSI misc on 20240514 2024-05-14 18:25:53 -07:00
soc I'm actually surprised this time. There aren't any new Qualcomm SoC clk 2024-05-18 12:48:37 -07:00
sound ALSA: pcm: fix typo in comment 2024-05-29 10:40:36 +02:00
target
trace block-6.10-20240523 2024-05-23 13:44:47 -07:00
uapi uprobe: Wire up uretprobe system call 2024-06-12 08:44:27 +09:00
ufs
vdso
video
xen