linux/tools
Ammar Faizi 27af4f2260 tools/nolibc: x86: Remove r8, r9 and r10 from the clobber list
[ Upstream commit bf91666959 ]

Linux x86-64 syscall only clobbers rax, rcx and r11 (and "memory").

  - rax for the return value.
  - rcx to save the return address.
  - r11 to save the rflags.

Other registers are preserved.

Having r8, r9 and r10 in the syscall clobber list is harmless, but this
results in a missed-optimization.

As the syscall doesn't clobber r8-r10, GCC should be allowed to reuse
their value after the syscall returns to userspace. But since they are
in the clobber list, GCC will always miss this opportunity.

Remove them from the x86-64 syscall clobber list to help GCC generate
better code and fix the comment.

See also the x86-64 ABI, section A.2 AMD64 Linux Kernel Conventions,
A.2.1 Calling Conventions [1].

Extra note:
Some people may think it does not really give a benefit to remove r8,
r9 and r10 from the syscall clobber list because the impression of
syscall is a C function call, and function call always clobbers those 3.

However, that is not the case for nolibc.h, because we have a potential
to inline the "syscall" instruction (which its opcode is "0f 05") to the
user functions.

All syscalls in the nolibc.h are written as a static function with inline
ASM and are likely always inline if we use optimization flag, so this is
a profit not to have r8, r9 and r10 in the clobber list.

Here is the example where this matters.

Consider the following C code:
```
  #include "tools/include/nolibc/nolibc.h"
  #define read_abc(a, b, c) __asm__ volatile("nop"::"r"(a),"r"(b),"r"(c))

  int main(void)
  {
  	int a = 0xaa;
  	int b = 0xbb;
  	int c = 0xcc;

  	read_abc(a, b, c);
  	write(1, "test\n", 5);
  	read_abc(a, b, c);

  	return 0;
  }
```

Compile with:
    gcc -Os test.c -o test -nostdlib

With r8, r9, r10 in the clobber list, GCC generates this:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64
    1004:	41 54                	push   %r12
    1006:	41 bc cc 00 00 00    	mov    $0xcc,%r12d
    100c:	55                   	push   %rbp
    100d:	bd bb 00 00 00       	mov    $0xbb,%ebp
    1012:	53                   	push   %rbx
    1013:	bb aa 00 00 00       	mov    $0xaa,%ebx
    1018:	90                   	nop
    1019:	b8 01 00 00 00       	mov    $0x1,%eax
    101e:	bf 01 00 00 00       	mov    $0x1,%edi
    1023:	ba 05 00 00 00       	mov    $0x5,%edx
    1028:	48 8d 35 d1 0f 00 00 	lea    0xfd1(%rip),%rsi
    102f:	0f 05                	syscall
    1031:	90                   	nop
    1032:	31 c0                	xor    %eax,%eax
    1034:	5b                   	pop    %rbx
    1035:	5d                   	pop    %rbp
    1036:	41 5c                	pop    %r12
    1038:	c3                   	ret

GCC thinks that syscall will clobber r8, r9, r10. So it spills 0xaa,
0xbb and 0xcc to callee saved registers (r12, rbp and rbx). This is
clearly extra memory access and extra stack size for preserving them.

But syscall does not actually clobber them, so this is a missed
optimization.

Now without r8, r9, r10 in the clobber list, GCC generates better code:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64
    1004:	41 b8 aa 00 00 00    	mov    $0xaa,%r8d
    100a:	41 b9 bb 00 00 00    	mov    $0xbb,%r9d
    1010:	41 ba cc 00 00 00    	mov    $0xcc,%r10d
    1016:	90                   	nop
    1017:	b8 01 00 00 00       	mov    $0x1,%eax
    101c:	bf 01 00 00 00       	mov    $0x1,%edi
    1021:	ba 05 00 00 00       	mov    $0x5,%edx
    1026:	48 8d 35 d3 0f 00 00 	lea    0xfd3(%rip),%rsi
    102d:	0f 05                	syscall
    102f:	90                   	nop
    1030:	31 c0                	xor    %eax,%eax
    1032:	c3                   	ret

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/x86-64-psABI [1]
Link: https://lore.kernel.org/lkml/20211011040344.437264-1-ammar.faizi@students.amikom.ac.id/
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Stable-dep-of: 184177c3d6 ("tools/nolibc: restore mips branch ordering in the _start block")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18 11:48:55 +01:00
..
accounting
arch parisc: Align parisc MADV_XXX constants with all other architectures 2023-01-14 10:23:27 +01:00
bootconfig tools/bootconfig: Define memblock_free_ptr() to fix build error 2021-09-15 09:49:48 -07:00
bpf bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE 2022-11-16 09:58:15 +01:00
build tools build: Switch to new openssl API for test-libcrypto 2022-08-25 11:40:14 +02:00
cgroup
debugging
edid
firewire
firmware
gpio
hv
iio tools: iio: iio_generic_buffer: Fix read size 2022-12-02 17:41:10 +01:00
include tools/nolibc: x86: Remove r8, r9 and r10 from the clobber list 2023-01-18 11:48:55 +01:00
io_uring tools/io_uring/io_uring-cp: sync with liburing example 2021-08-13 08:58:11 -06:00
kvm/kvm_stat tools/kvm_stat: fix display of error when multiple processes are found 2022-08-11 13:07:51 +02:00
laptop
leds
lib libbpf: Avoid enum forward-declarations in public API in C++ mode 2022-12-31 13:14:43 +01:00
memory-model
objtool objtool: Fix SEGFAULT 2023-01-12 11:58:45 +01:00
pci
pcmcia
perf perf auxtrace: Fix address filter duplicate symbol selection 2023-01-18 11:48:48 +01:00
power tools/power turbostat: fix ICX DRAM power numbers 2022-06-09 10:22:32 +02:00
rcu
scripts
spi
testing af_unix: selftest: Fix the size of the parameter to connect() 2023-01-18 11:48:54 +01:00
thermal/tmon tools/thermal: Fix possible path truncations 2022-08-17 14:24:15 +02:00
time
tracing tools/latency-collector: Use correct size when writing queue_full_warning 2021-11-18 19:16:19 +01:00
usb usb: testusb: Fix for showing the connection speed 2021-09-14 10:31:41 +02:00
virtio tools/virtio: compile with -pthread 2022-05-25 09:57:25 +02:00
vm tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" 2022-12-08 11:28:42 +01:00
wmi
Makefile