linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 00:34:20 +08:00

History

Mark Rutland 94b6cf84db arm64: cmpxchg_double: hazard against entire exchange variable [ Upstream commit `031af50045` ] The inline assembly for arm64's cmpxchg_double() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: `fee960bed5` ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double() similarly at the same time. The same problem applies, as demonstrated with the following test: \| struct big { \| u64 lo, hi; \| } __aligned(128); \| \| unsigned long foo(struct big b) \| { \| u64 hi_old, hi_new; \| \| hi_old = b->hi; \| cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); \| hi_new = b->hi; \| \| return hi_old ^ hi_new; \| } ... which GCC 12.1.0 compiles as: \| 0000000000000000 <foo>: \| 0: d503233f paciasp \| 4: aa0003e4 mov x4, x0 \| 8: 1400000e b 40 <foo+0x40> \| c: d2800240 mov x0, #0x12 // #18 \| 10: d2800681 mov x1, #0x34 // #52 \| 14: aa0003e5 mov x5, x0 \| 18: aa0103e6 mov x6, x1 \| 1c: d2800ac2 mov x2, #0x56 // #86 \| 20: d2800f03 mov x3, #0x78 // #120 \| 24: 48207c82 casp x0, x1, x2, x3, [x4] \| 28: ca050000 eor x0, x0, x5 \| 2c: ca060021 eor x1, x1, x6 \| 30: aa010000 orr x0, x0, x1 \| 34: d2800000 mov x0, #0x0 // #0 <--- BANG \| 38: d50323bf autiasp \| 3c: d65f03c0 ret \| 40: d2800240 mov x0, #0x12 // #18 \| 44: d2800681 mov x1, #0x34 // #52 \| 48: d2800ac2 mov x2, #0x56 // #86 \| 4c: d2800f03 mov x3, #0x78 // #120 \| 50: f9800091 prfm pstl1strm, [x4] \| 54: c87f1885 ldxp x5, x6, [x4] \| 58: ca0000a5 eor x5, x5, x0 \| 5c: ca0100c6 eor x6, x6, x1 \| 60: aa0600a6 orr x6, x5, x6 \| 64: b5000066 cbnz x6, 70 <foo+0x70> \| 68: c8250c82 stxp w5, x2, x3, [x4] \| 6c: 35ffff45 cbnz w5, 54 <foo+0x54> \| 70: d2800000 mov x0, #0x0 // #0 <--- BANG \| 74: d50323bf autiasp \| 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: \| 0000000000000000 <foo>: \| 0: f9400407 ldr x7, [x0, #8] \| 4: d503233f paciasp \| 8: aa0003e4 mov x4, x0 \| c: 1400000f b 48 <foo+0x48> \| 10: d2800240 mov x0, #0x12 // #18 \| 14: d2800681 mov x1, #0x34 // #52 \| 18: aa0003e5 mov x5, x0 \| 1c: aa0103e6 mov x6, x1 \| 20: d2800ac2 mov x2, #0x56 // #86 \| 24: d2800f03 mov x3, #0x78 // #120 \| 28: 48207c82 casp x0, x1, x2, x3, [x4] \| 2c: ca050000 eor x0, x0, x5 \| 30: ca060021 eor x1, x1, x6 \| 34: aa010000 orr x0, x0, x1 \| 38: f9400480 ldr x0, [x4, #8] \| 3c: d50323bf autiasp \| 40: ca0000e0 eor x0, x7, x0 \| 44: d65f03c0 ret \| 48: d2800240 mov x0, #0x12 // #18 \| 4c: d2800681 mov x1, #0x34 // #52 \| 50: d2800ac2 mov x2, #0x56 // #86 \| 54: d2800f03 mov x3, #0x78 // #120 \| 58: f9800091 prfm pstl1strm, [x4] \| 5c: c87f1885 ldxp x5, x6, [x4] \| 60: ca0000a5 eor x5, x5, x0 \| 64: ca0100c6 eor x6, x6, x1 \| 68: aa0600a6 orr x6, x5, x6 \| 6c: b5000066 cbnz x6, 78 <foo+0x78> \| 70: c8250c82 stxp w5, x2, x3, [x4] \| 74: 35ffff45 cbnz w5, 5c <foo+0x5c> \| 78: f9400480 ldr x0, [x4, #8] \| 7c: d50323bf autiasp \| 80: ca0000e0 eor x0, x7, x0 \| 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: `5284e1b4bc` ("arm64: xchg: Implement cmpxchg_double") Fixes: `e9a4b79565` ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/ Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>		2023-01-18 11:48:58 +01:00
..
alpha	alpha: fix syscall entry in !AUDUT_SYSCALL case	2022-12-31 13:14:01 +01:00
arc	arc: iounmap() arg is volatile	2022-11-03 23:59:15 +09:00
arm	ARM: renumber bits related to _TIF_WORK_MASK	2023-01-12 11:59:05 +01:00
arm64	arm64: cmpxchg_double*: hazard against entire exchange variable	2023-01-18 11:48:58 +01:00
csky	csky/kprobe: reclaim insn_slot on kprobe unregistration	2022-08-25 11:40:38 +02:00
h8300
hexagon	uaccess: fix integer overflow on access_ok()	2022-03-28 09:58:45 +02:00
ia64	ia64: export memory_add_physaddr_to_nid to fix cxl build error	2022-10-26 12:34:59 +02:00
m68k	m68k: coldfire/device.c: protect FLEXCAN blocks	2022-08-25 11:40:05 +02:00
microblaze	uaccess: fix nios2 and microblaze get_user_8()	2022-04-08 14:23:18 +02:00
mips	MIPS: OCTEON: warn only once if deprecated link status is being used	2022-12-31 13:14:05 +01:00
nds32	nds32: fix access_ok() checks in get/put_user	2022-03-28 09:58:46 +02:00
nios2	nios2: add FORCE for vmlinuz.gz	2022-12-02 17:41:08 +01:00
openrisc	openrisc: io: Define iounmap argument as volatile	2022-08-25 11:40:38 +02:00
parisc	parisc: Align parisc MADV_XXX constants with all other architectures	2023-01-14 10:23:27 +01:00
powerpc	powerpc/imc-pmu: Fix use of mutex in IRQs disabled section	2023-01-18 11:48:52 +01:00
riscv	riscv, kprobes: Stricter c.jr/c.jalr decoding	2023-01-12 11:59:18 +01:00
s390	s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()	2023-01-18 11:48:49 +01:00
sh	sh: machvec: Use char[] for section boundaries	2022-10-26 12:34:36 +02:00
sparc	signal: Deliver SIGTRAP on perf event asynchronously if blocked	2022-06-09 10:22:48 +02:00
um	UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK	2022-10-26 12:34:23 +02:00
x86	KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID	2023-01-18 11:48:57 +01:00
xtensa	xtensa: iss: fix handling error cases in iss_net_configure()	2022-08-17 14:23:52 +02:00
.gitignore
Kconfig	arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic	2022-08-17 14:23:00 +02:00