2019-05-19 20:07:45 +08:00
# SPDX-License-Identifier: GPL-2.0-only
2012-04-20 21:45:54 +08:00
config ARM64
def_bool y
2022-09-29 08:28:34 +08:00
select ACPI_APMT if ACPI
2015-06-11 00:08:53 +08:00
select ACPI_CCA_REQUIRED if ACPI
2015-03-25 01:58:51 +08:00
select ACPI_GENERIC_GSI if ACPI
2017-04-01 01:51:01 +08:00
select ACPI_GTDT if ACPI
2017-06-15 00:37:12 +08:00
select ACPI_IORT if ACPI
2015-03-24 22:02:51 +08:00
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
2018-12-20 06:46:57 +08:00
select ACPI_MCFG if (ACPI && PCI)
2016-09-28 04:54:14 +08:00
select ACPI_SPCR_TABLE if ACPI
2018-05-12 07:58:01 +08:00
select ACPI_PPTT if ACPI
2020-06-04 07:04:02 +08:00
select ARCH_HAS_DEBUG_WX
2022-02-01 00:54:55 +08:00
select ARCH_BINFMT_ELF_EXTRA_PHDRS
2020-03-17 00:50:47 +08:00
select ARCH_BINFMT_ELF_STATE
2021-10-21 08:55:09 +08:00
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
2021-05-05 09:38:21 +08:00
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
2021-05-05 09:38:17 +08:00
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
2021-05-05 09:38:25 +08:00
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
2021-05-05 09:38:21 +08:00
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
2021-05-05 09:38:09 +08:00
select ARCH_HAS_CACHE_LINE_SIZE
2022-02-17 04:05:28 +08:00
select ARCH_HAS_CURRENT_STACK_POINTER
2017-01-11 05:35:50 +08:00
select ARCH_HAS_DEBUG_VIRTUAL
mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.
This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.
Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().
This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture, which is willing to subscribe this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully Going forward, other architectures too can subscribe the test
after fixing any build or runtime problems with their page table helpers.
Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non conformity here
will be reported as an warning which would need to be fixed. This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.
[anshuman.khandual@arm.com: v17]
Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-05 07:47:15 +08:00
select ARCH_HAS_DEBUG_VM_PGTABLE
2019-03-25 22:44:06 +08:00
select ARCH_HAS_DMA_PREP_COHERENT
2016-06-20 18:56:13 +08:00
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
2018-04-24 23:25:47 +08:00
select ARCH_HAS_FAST_MULTIPLIER
include/linux/string.h: add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time. Unlike glibc,
it covers buffer reads in addition to writes.
GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation. They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks. Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.
This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code. There will likely be issues caught in
regular use at runtime too.
Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:
* Some of the fortified string functions (strncpy, strcat), don't yet
place a limit on reads from the source based on __builtin_object_size of
the source buffer.
* Extending coverage to more string functions like strlcat.
* It should be possible to optionally use __builtin_object_size(x, 1) for
some functions (C strings) to detect intra-object overflows (like
glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
approach to avoid likely compatibility issues.
* The compile-time checks should be made available via a separate config
option which can be enabled by default (or always enabled) once enough
time has passed to get the issues it catches fixed.
Kees said:
"This is great to have. While it was out-of-tree code, it would have
blocked at least CVE-2016-3858 from being exploitable (improper size
argument to strlcpy()). I've sent a number of fixes for
out-of-bounds-reads that this detected upstream already"
[arnd@arndb.de: x86: fix fortified memcpy]
Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-13 05:36:10 +08:00
select ARCH_HAS_FORTIFY_SOURCE
2014-12-13 08:57:44 +08:00
select ARCH_HAS_GCOV_PROFILE_ALL
2019-05-14 08:19:04 +08:00
select ARCH_HAS_GIGANTIC_PAGE
2016-06-17 00:39:52 +08:00
select ARCH_HAS_KCOV
2019-05-14 08:18:30 +08:00
select ARCH_HAS_KEEPINITRD
2018-01-30 04:20:19 +08:00
select ARCH_HAS_MEMBARRIER_SYNC_CORE
2022-09-29 02:17:05 +08:00
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.
To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").
Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.
However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).
For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15 18:11:16 +08:00
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
2019-07-17 07:30:51 +08:00
select ARCH_HAS_PTE_DEVMAP
2018-06-08 08:06:08 +08:00
select ARCH_HAS_PTE_SPECIAL
2019-01-08 02:36:20 +08:00
select ARCH_HAS_SETUP_DMA_OPS
2019-05-23 18:22:54 +08:00
select ARCH_HAS_SET_DIRECT_MAP
2017-02-21 23:09:33 +08:00
select ARCH_HAS_SET_MEMORY
2020-09-14 23:34:09 +08:00
select ARCH_STACKWALK
2017-02-07 08:31:57 +08:00
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
2018-10-08 15:12:01 +08:00
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SYNC_DMA_FOR_CPU
arm64: implement syscall wrappers
To minimize the risk of userspace-controlled values being used under
speculation, this patch adds pt_regs based syscall wrappers for arm64,
which pass the minimum set of required userspace values to syscall
implementations. For each syscall, a wrapper which takes a pt_regs
argument is automatically generated, and this extracts the arguments
before calling the "real" syscall implementation.
Each syscall has three functions generated:
* __do_<compat_>sys_<name> is the "real" syscall implementation, with
the expected prototype.
* __se_<compat_>sys_<name> is the sign-extension/narrowing wrapper,
inherited from common code. This takes a series of long parameters,
casting each to the requisite types required by the "real" syscall
implementation in __do_<compat_>sys_<name>.
This wrapper *may* not be necessary on arm64 given the AAPCS rules on
unused register bits, but it seemed safer to keep the wrapper for now.
* __arm64_<compat_>_sys_<name> takes a struct pt_regs pointer, and
extracts *only* the relevant register values, passing these on to the
__se_<compat_>sys_<name> wrapper.
The syscall invocation code is updated to handle the calling convention
required by __arm64_<compat_>_sys_<name>, and passes a single struct
pt_regs pointer.
The compiler can fold the syscall implementation and its wrappers, such
that the overhead of this approach is minimized.
Note that we play games with sys_ni_syscall(). It can't be defined with
SYSCALL_DEFINE0() because we must avoid the possibility of error
injection. Additionally, there are a couple of locations where we need
to call it from C code, and we don't (currently) have a
ksys_ni_syscall(). While it has no wrapper, passing in a redundant
pt_regs pointer is benign per the AAPCS.
When ARCH_HAS_SYSCALL_WRAPPER is selected, no prototype is defines for
sys_ni_syscall(). Since we need to treat it differently for in-kernel
calls and the syscall tables, the prototype is defined as-required.
The wrappers are largely the same as their x86 counterparts, but
simplified as we don't have a variety of compat calling conventions that
require separate stubs. Unlike x86, we have some zero-argument compat
syscalls, and must define COMPAT_SYSCALL_DEFINE0() to ensure that these
are also given an __arm64_compat_sys_ prefix.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-11 21:56:56 +08:00
select ARCH_HAS_SYSCALL_WRAPPER
2018-12-22 05:14:44 +08:00
select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
2013-09-04 17:55:17 +08:00
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
2021-07-01 09:52:20 +08:00
select ARCH_HAS_ZONE_DMA_SET if EXPERT
2020-03-17 00:50:47 +08:00
select ARCH_HAVE_ELF_PROT
2017-09-27 23:51:30 +08:00
select ARCH_HAVE_NMI_SAFE_CMPXCHG
lib: Add register read/write tracing support
Generic MMIO read/write i.e., __raw_{read,write}{b,l,w,q} accessors
are typically used to read/write from/to memory mapped registers
and can cause hangs or some undefined behaviour in following few
cases,
* If the access to the register space is unclocked, for example: if
there is an access to multimedia(MM) block registers without MM
clocks.
* If the register space is protected and not set to be accessible from
non-secure world, for example: only EL3 (EL: Exception level) access
is allowed and any EL2/EL1 access is forbidden.
* If xPU(memory/register protection units) is controlling access to
certain memory/register space for specific clients.
and more...
Such cases usually results in instant reboot/SErrors/NOC or interconnect
hangs and tracing these register accesses can be very helpful to debug
such issues during initial development stages and also in later stages.
So use ftrace trace events to log such MMIO register accesses which
provides rich feature set such as early enablement of trace events,
filtering capability, dumping ftrace logs on console and many more.
Sample output:
rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610
rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610
Co-developed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-19 00:44:14 +08:00
select ARCH_HAVE_TRACE_MMIO_ACCESS
2019-10-16 03:17:49 +08:00
select ARCH_INLINE_READ_LOCK if !PREEMPTION
select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
2019-05-14 08:22:59 +08:00
select ARCH_KEEP_MEMBLOCK
2014-05-09 17:33:01 +08:00
select ARCH_USE_CMPXCHG_LOCKREF
2020-03-18 16:28:31 +08:00
select ARCH_USE_GNU_PROPERTY
2021-04-30 13:55:15 +08:00
select ARCH_USE_MEMTEST
2017-10-12 20:20:50 +08:00
select ARCH_USE_QUEUED_RWLOCKS
2018-03-14 04:45:45 +08:00
select ARCH_USE_QUEUED_SPINLOCKS
2020-05-01 19:54:30 +08:00
select ARCH_USE_SYM_ANNOTATIONS
2020-12-15 11:10:30 +08:00
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
2021-05-05 09:38:13 +08:00
select ARCH_SUPPORTS_HUGETLBFS
2017-06-09 01:25:29 +08:00
select ARCH_SUPPORTS_MEMORY_FAILURE
2020-04-28 00:00:16 +08:00
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
2020-12-12 02:46:33 +08:00
select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
select ARCH_SUPPORTS_LTO_CLANG_THIN
2021-04-09 02:28:43 +08:00
select ARCH_SUPPORTS_CFI_CLANG
2014-06-07 01:53:16 +08:00
select ARCH_SUPPORTS_ATOMIC_RMW
2021-09-11 07:40:44 +08:00
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
2016-04-09 06:50:28 +08:00
select ARCH_SUPPORTS_NUMA_BALANCING
2022-05-13 11:23:06 +08:00
select ARCH_SUPPORTS_PAGE_TABLE_CHECK
2023-02-28 01:36:29 +08:00
select ARCH_SUPPORTS_PER_VMA_LOCK
2019-05-08 04:52:28 +08:00
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
2019-12-09 23:08:03 +08:00
select ARCH_WANT_DEFAULT_BPF_JIT
2019-09-24 06:38:47 +08:00
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
2013-01-30 02:25:41 +08:00
select ARCH_WANT_FRAME_POINTERS
2019-06-28 06:00:11 +08:00
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
2020-11-20 04:46:56 +08:00
select ARCH_WANT_LD_ORPHAN_WARN
2021-06-22 07:18:22 +08:00
select ARCH_WANTS_NO_INSTR
arm64: enable THP_SWAP for arm64
THP_SWAP has been proven to improve the swap throughput significantly
on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
splitting THP after swapped out").
As long as arm64 uses 4K page size, it is quite similar with x86_64
by having 2MB PMD THP. THP_SWAP is architecture-independent, thus,
enabling it on arm64 will benefit arm64 as well.
A corner case is that MTE has an assumption that only base pages
can be swapped. We won't enable THP_SWAP for ARM64 hardware with
MTE support until MTE is reworked to coexist with THP_SWAP.
A micro-benchmark is written to measure thp swapout throughput as
below,
unsigned long long tv_to_ms(struct timeval tv)
{
return tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
main()
{
struct timeval tv_b, tv_e;;
#define SIZE 400*1024*1024
volatile void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (!p) {
perror("fail to get memory");
exit(-1);
}
madvise(p, SIZE, MADV_HUGEPAGE);
memset(p, 0x11, SIZE); /* write to get mem */
gettimeofday(&tv_b, NULL);
madvise(p, SIZE, MADV_PAGEOUT);
gettimeofday(&tv_e, NULL);
printf("swp out bandwidth: %ld bytes/ms\n",
SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b)));
}
Testing is done on rk3568 64bit Quad Core Cortex-A55 platform -
ROCK 3A.
thp swp throughput w/o patch: 2734bytes/ms (mean of 10 tests)
thp swp throughput w/ patch: 3331bytes/ms (mean of 10 tests)
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Link: https://lore.kernel.org/r/20220720093737.133375-1-21cnbao@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-07-20 17:37:37 +08:00
select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
2016-02-06 07:50:18 +08:00
select ARCH_HAS_UBSAN_SANITIZE_ALL
2012-12-18 23:26:13 +08:00
select ARM_AMBA
2012-11-20 18:06:00 +08:00
select ARM_ARCH_TIMER
2013-01-14 20:39:31 +08:00
select ARM_GIC
2014-07-04 15:28:30 +08:00
select AUDIT_ARCH_COMPAT_GENERIC
2016-06-16 04:47:33 +08:00
select ARM_GIC_V2M if PCI
2014-06-30 23:01:31 +08:00
select ARM_GIC_V3
2016-06-16 04:47:33 +08:00
select ARM_GIC_V3_ITS if PCI
2015-07-31 22:46:16 +08:00
select ARM_PSCI_FW
2019-12-04 08:46:31 +08:00
select BUILDTIME_TABLE_SORT
2012-12-18 23:27:25 +08:00
select CLONE_BACKWARDS
2012-09-23 01:33:36 +08:00
select COMMON_CLK
2013-11-08 02:37:14 +08:00
select CPU_PM if (SUSPEND || CPU_IDLE)
2018-08-27 19:02:44 +08:00
select CRC32
2013-11-07 03:32:13 +08:00
select DCACHE_WORD_ACCESS
2022-11-23 00:36:24 +08:00
select DYNAMIC_FTRACE if FUNCTION_TRACER
2018-11-05 03:29:28 +08:00
select DMA_DIRECT_REMAP
2015-07-08 00:15:39 +08:00
select EDAC_SUPPORT
2015-11-10 02:09:55 +08:00
select FRAME_POINTER
arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
On arm64 we don't align assembly function in the same way as C
functions. This somewhat limits the utility of
CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B for testing, and adds noise when
testing that we're correctly aligning functions as will be necessary for
ftrace in subsequent patches.
Follow the example of x86, and align assembly functions in the same way
as C functions. Selecting FUNCTION_ALIGNMENT_4B ensures
CONFIG_FUCTION_ALIGNMENT will be a minimum of 4 bytes, matching the
minimum alignment that __ALIGN and __ALIGN_STR provide prior to this
patch.
I've tested this by selecting CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y,
building and booting a kernel, and looking for misaligned text symbols:
Before, v6.2-rc3:
# uname -rm
6.2.0-rc3 aarch64
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
5009
Before, v6.2-rc3 + fixed __cold:
# uname -rm
6.2.0-rc3-00001-g2a2bedf8bfa9 aarch64
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
919
Before, v6.2-rc3 + fixed __cold + fixed ACPICA:
# uname -rm
6.2.0-rc3-00002-g267bddc38572 aarch64
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
323
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep acpi | wc -l
0
After:
# uname -rm
6.2.0-rc3-00003-g71db61ee3ea1 aarch64
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | wc -l
112
Considering the remaining 112 unaligned text symbols:
* 20 are non-function KVM NVHE assembly symbols, which are never
instrumented by ftrace:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep __kvm_nvhe | wc -l
20
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep __kvm_nvhe
ffffbe6483f73784 t __kvm_nvhe___invalid
ffffbe6483f73788 t __kvm_nvhe___do_hyp_init
ffffbe6483f73ab0 t __kvm_nvhe_reset
ffffbe6483f73b8c T __kvm_nvhe___hyp_idmap_text_end
ffffbe6483f73b8c T __kvm_nvhe___hyp_text_start
ffffbe6483f77864 t __kvm_nvhe___host_enter_restore_full
ffffbe6483f77874 t __kvm_nvhe___host_enter_for_panic
ffffbe6483f778a4 t __kvm_nvhe___host_enter_without_restoring
ffffbe6483f81178 T __kvm_nvhe___guest_exit_panic
ffffbe6483f811c8 T __kvm_nvhe___guest_exit
ffffbe6483f81354 t __kvm_nvhe_abort_guest_exit_start
ffffbe6483f81358 t __kvm_nvhe_abort_guest_exit_end
ffffbe6483f81830 t __kvm_nvhe_wa_epilogue
ffffbe6483f81844 t __kvm_nvhe_el1_trap
ffffbe6483f81864 t __kvm_nvhe_el1_fiq
ffffbe6483f81864 t __kvm_nvhe_el1_irq
ffffbe6483f81884 t __kvm_nvhe_el1_error
ffffbe6483f818a4 t __kvm_nvhe_el2_sync
ffffbe6483f81920 t __kvm_nvhe_el2_error
ffffbe6483f865c8 T __kvm_nvhe___start___kvm_ex_table
* 53 are position-independent functions only used during early boot, which are
built with '-Os', but are never instrumented by ftrace:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep __pi | wc -l
53
We *could* drop '-Os' when building these for consistency, but that is
not necessary to ensure that ftrace works correctly.
* The remaining 39 are non-function symbols, and 3 runtime BPF
functions, which are never instrumented by ftrace:
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep -v __kvm_nvhe | grep -v __pi | wc -l
39
# grep ' [Tt] ' /proc/kallsyms | grep -iv '[048c]0 [Tt] ' | grep -v __kvm_nvhe | grep -v __pi
ffffbe6482e1009c T __irqentry_text_end
ffffbe6482e10358 T __softirqentry_text_end
ffffbe6482e1435c T __entry_text_end
ffffbe6482e825f8 T __guest_exit_panic
ffffbe6482e82648 T __guest_exit
ffffbe6482e827d4 t abort_guest_exit_start
ffffbe6482e827d8 t abort_guest_exit_end
ffffbe6482e83030 t wa_epilogue
ffffbe6482e83044 t el1_trap
ffffbe6482e83064 t el1_fiq
ffffbe6482e83064 t el1_irq
ffffbe6482e83084 t el1_error
ffffbe6482e830a4 t el2_sync
ffffbe6482e83120 t el2_error
ffffbe6482e93550 T sha256_block_neon
ffffbe64830f3ae0 t e843419@01cc_00002a0c_3104
ffffbe648378bd90 t e843419@09b3_0000d7cb_bc4
ffffbe6483bdab20 t e843419@0c66_000116e2_34c8
ffffbe6483f62c94 T __noinstr_text_end
ffffbe6483f70a18 T __sched_text_end
ffffbe6483f70b2c T __cpuidle_text_end
ffffbe6483f722d4 T __lock_text_end
ffffbe6483f73b8c T __hyp_idmap_text_end
ffffbe6483f73b8c T __hyp_text_start
ffffbe6483f865c8 T __start___kvm_ex_table
ffffbe6483f870d0 t init_el1
ffffbe6483f870f8 t init_el2
ffffbe6483f87324 t pen
ffffbe6483f87b48 T __idmap_text_end
ffffbe64848eb010 T __hibernate_exit_text_start
ffffbe64848eb124 T __hibernate_exit_text_end
ffffbe64848eb124 T __relocate_new_kernel_start
ffffbe64848eb260 T __relocate_new_kernel_end
ffffbe648498a8e8 T _einittext
ffffbe648498a8e8 T __exittext_begin
ffffbe6484999d84 T __exittext_end
ffff8000080756b4 t bpf_prog_6deef7357e7b4530 [bpf]
ffff80000808dd78 t bpf_prog_6deef7357e7b4530 [bpf]
ffff80000809d684 t bpf_prog_6deef7357e7b4530 [bpf]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 21:45:59 +08:00
select FUNCTION_ALIGNMENT_4B
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.
As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.
Currently, this approach is not compatible with CLANG_CFI, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CLANG_CFI is enabled, the existing scheme with a global ops->func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will liekly require updated compiler support).
I've benchamrked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:
https://lore.kernel.org/lkml/20230103124912.2948963-1-mark.rutland@arm.com
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under a HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
Number of tracers || Total time | Per-call average time (ns)
Relevant | Irrelevant || (ns) | Total | Overhead
=========+============++=============+==============+============
0 | 0 || 94,583 | 0.95 | -
0 | 1 || 93,709 | 0.94 | -
0 | 2 || 93,666 | 0.94 | -
0 | 10 || 93,709 | 0.94 | -
0 | 100 || 93,792 | 0.94 | -
---------+------------++-------------+--------------+------------
1 | 1 || 6,467,833 | 64.68 | 63.73
1 | 2 || 7,509,708 | 75.10 | 74.15
1 | 10 || 23,786,792 | 237.87 | 236.92
1 | 100 || 106,432,500 | 1,064.43 | 1063.38
---------+------------++-------------+--------------+------------
1 | 0 || 1,431,875 | 14.32 | 13.37
2 | 0 || 6,456,334 | 64.56 | 63.62
10 | 0 || 22,717,000 | 227.17 | 226.22
100 | 0 || 103,293,667 | 1032.94 | 1031.99
---------+------------++-------------+--------------+--------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
After this patch
Number of tracers || Total time | Per-call average time (ns)
Relevant | Irrelevant || (ns) | Total | Overhead
=========+============++=============+==============+============
0 | 0 || 94,541 | 0.95 | -
0 | 1 || 93,666 | 0.94 | -
0 | 2 || 93,709 | 0.94 | -
0 | 10 || 93,667 | 0.94 | -
0 | 100 || 93,792 | 0.94 | -
---------+------------++-------------+--------------+------------
1 | 1 || 281,000 | 2.81 | 1.86
1 | 2 || 281,042 | 2.81 | 1.87
1 | 10 || 280,958 | 2.81 | 1.86
1 | 100 || 281,250 | 2.81 | 1.87
---------+------------++-------------+--------------+------------
1 | 0 || 280,959 | 2.81 | 1.86
2 | 0 || 6,502,708 | 65.03 | 64.08
10 | 0 || 18,681,209 | 186.81 | 185.87
100 | 0 || 103,550,458 | 1,035.50 | 1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the
overhead prior to this series (from 13.37ns to 1.86ns). This is
largely due to permitting calls to dynamically-allocated ftrace_ops
without going through ftrace_ops_list_func.
I've run the ftrace selftests from v6.2-rc3, which reports:
| # of passed: 110
| # of failed: 0
| # of unresolved: 3
| # of untested: 0
| # of unsupported: 0
| # of xfailed: 1
| # of undefined(test bug): 0
... where the unresolved entries were the tests for DIRECT functions
(which are not supported), and the checkbashisms selftest (which is
irrelevant here):
| [8] Test ftrace direct functions against tracers [UNRESOLVED]
| [9] Test ftrace direct functions against kprobes [UNRESOLVED]
| [62] Meta-selftest: Checkbashisms [UNRESOLVED]
... with all other tests passing (or failing as expected).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 21:46:03 +08:00
select FUNCTION_ALIGNMENT_8B if DYNAMIC_FTRACE_WITH_CALL_OPS
2014-10-10 06:26:44 +08:00
select GENERIC_ALLOCATOR
2017-06-01 00:59:28 +08:00
select GENERIC_ARCH_TOPOLOGY
2015-05-30 01:28:44 +08:00
select GENERIC_CLOCKEVENTS_BROADCAST
2014-03-04 09:10:04 +08:00
select GENERIC_CPU_AUTOPROBE
2019-04-16 05:21:29 +08:00
select GENERIC_CPU_VULNERABILITIES
2014-04-08 06:39:52 +08:00
select GENERIC_EARLY_IOREMAP
2015-08-21 11:40:22 +08:00
select GENERIC_IDLE_POLL_SETUP
2022-06-07 20:50:26 +08:00
select GENERIC_IOREMAP
arm64: Allow IPIs to be handled as normal interrupts
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-25 22:03:47 +08:00
select GENERIC_IRQ_IPI
2012-04-20 21:45:54 +08:00
select GENERIC_IRQ_PROBE
select GENERIC_IRQ_SHOW
2015-04-23 01:16:33 +08:00
select GENERIC_IRQ_SHOW_LEVEL
2020-07-10 03:05:36 +08:00
select GENERIC_LIB_DEVMEM_IS_ALLOWED
2014-11-19 21:09:07 +08:00
select GENERIC_PCI_IOMAP
2020-02-04 09:36:29 +08:00
select GENERIC_PTDUMP
2013-07-19 07:21:18 +08:00
select GENERIC_SCHED_CLOCK
2012-04-20 21:45:54 +08:00
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
2019-06-21 17:52:31 +08:00
select GENERIC_GETTIMEOFDAY
2020-06-24 16:33:21 +08:00
select GENERIC_VDSO_TIME_NS
2012-04-20 21:45:54 +08:00
select HARDIRQS_SW_RESEND
2020-10-14 08:53:07 +08:00
select HAVE_MOVE_PMD
2020-12-15 11:07:35 +08:00
select HAVE_MOVE_PUD
2018-11-16 03:05:32 +08:00
select HAVE_PCI
2016-12-01 21:51:12 +08:00
select HAVE_ACPI_APEI if (ACPI && EFI)
2014-10-24 20:22:20 +08:00
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
2014-07-04 15:28:30 +08:00
select HAVE_ARCH_AUDITSYSCALL
2014-11-03 10:02:23 +08:00
select HAVE_ARCH_BITREVERSE
2020-03-13 17:04:58 +08:00
select HAVE_ARCH_COMPILER_H
2022-09-11 12:44:23 +08:00
select HAVE_ARCH_HUGE_VMALLOC
2016-02-16 20:52:35 +08:00
select HAVE_ARCH_HUGE_VMAP
2014-01-07 22:17:13 +08:00
select HAVE_ARCH_JUMP_LABEL
arm64/kernel: jump_label: Switch to relative references
On a randomly chosen distro kernel build for arm64, vmlinux.o shows the
following sections, containing jump label entries, and the associated
RELA relocation records, respectively:
...
[38088] __jump_table PROGBITS 0000000000000000 00e19f30
000000000002ea10 0000000000000000 WA 0 0 8
[38089] .rela__jump_table RELA 0000000000000000 01fd8bb0
000000000008be30 0000000000000018 I 38178 38088 8
...
In other words, we have 190 KB worth of 'struct jump_entry' instances,
and 573 KB worth of RELA entries to relocate each entry's code, target
and key members. This means the RELA section occupies 10% of the .init
segment, and the two sections combined represent 5% of vmlinux's entire
memory footprint.
So let's switch from 64-bit absolute references to 32-bit relative
references for the code and target field, and a 64-bit relative
reference for the 'key' field (which may reside in another module or the
core kernel, which may be more than 4 GB way on arm64 when running with
KASLR enable): this reduces the size of the __jump_table by 33%, and
gets rid of the RELA section entirely.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-4-ard.biesheuvel@linaro.org
2018-09-19 14:51:38 +08:00
select HAVE_ARCH_JUMP_LABEL_RELATIVE
2017-11-16 09:36:40 +08:00
select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
2021-03-24 12:05:20 +08:00
select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
2018-12-28 16:31:07 +08:00
select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
2020-12-23 04:02:20 +08:00
select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
2021-12-11 21:17:34 +08:00
# Some instrumentation may be unsound, hence EXPERT
select HAVE_ARCH_KCSAN if EXPERT
2021-02-26 09:19:03 +08:00
select HAVE_ARCH_KFENCE
2014-01-28 19:20:22 +08:00
select HAVE_ARCH_KGDB
2016-01-15 07:20:01 +08:00
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
arch: enable relative relocations for arm64, power and x86
Patch series "add support for relative references in special sections", v10.
This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time. On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference)
Patch #3 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.
Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.
Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.
Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.
[From the v7 series blurb, which included the jump_label patches as well]:
For the arm64 kernel, all patches combined reduce the memory footprint
of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
KASLR enabled), of which ~1 MB is the size reduction of the RELA section
in .init, and the remaining 300 KB is reduction of .text/.data.
This patch (of 6):
Before updating certain subsystems to use place relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.
Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 12:56:00 +08:00
select HAVE_ARCH_PREL32_RELOCATIONS
2021-04-02 07:23:46 +08:00
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
2014-11-28 13:26:39 +08:00
select HAVE_ARCH_SECCOMP_FILTER
2018-07-21 05:41:54 +08:00
select HAVE_ARCH_STACKLEAK
2017-08-17 05:05:09 +08:00
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
2012-04-20 21:45:54 +08:00
select HAVE_ARCH_TRACEHOOK
2016-04-19 02:16:14 +08:00
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
2017-07-21 21:25:33 +08:00
select HAVE_ARCH_VMAP_STACK
2016-04-19 02:16:14 +08:00
select HAVE_ARM_SMCCC
2019-08-19 13:54:20 +08:00
select HAVE_ASM_MODVERSIONS
2016-05-14 01:08:28 +08:00
select HAVE_EBPF_JIT
2014-04-30 17:54:32 +08:00
select HAVE_C_RECORDMCOUNT
2014-10-24 20:22:20 +08:00
select HAVE_CMPXCHG_DOUBLE
2015-05-29 21:57:47 +08:00
select HAVE_CMPXCHG_LOCAL
2022-06-08 22:40:24 +08:00
select HAVE_CONTEXT_TRACKING_USER
2012-10-09 07:28:11 +08:00
select HAVE_DEBUG_KMEMLEAK
2013-12-13 03:28:33 +08:00
select HAVE_DMA_CONTIGUOUS
2014-04-30 17:54:34 +08:00
select HAVE_DYNAMIC_FTRACE
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.
As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.
Currently, this approach is not compatible with CLANG_CFI, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CLANG_CFI is enabled, the existing scheme with a global ops->func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will liekly require updated compiler support).
I've benchamrked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:
https://lore.kernel.org/lkml/20230103124912.2948963-1-mark.rutland@arm.com
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under a HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
Number of tracers || Total time | Per-call average time (ns)
Relevant | Irrelevant || (ns) | Total | Overhead
=========+============++=============+==============+============
0 | 0 || 94,583 | 0.95 | -
0 | 1 || 93,709 | 0.94 | -
0 | 2 || 93,666 | 0.94 | -
0 | 10 || 93,709 | 0.94 | -
0 | 100 || 93,792 | 0.94 | -
---------+------------++-------------+--------------+------------
1 | 1 || 6,467,833 | 64.68 | 63.73
1 | 2 || 7,509,708 | 75.10 | 74.15
1 | 10 || 23,786,792 | 237.87 | 236.92
1 | 100 || 106,432,500 | 1,064.43 | 1063.38
---------+------------++-------------+--------------+------------
1 | 0 || 1,431,875 | 14.32 | 13.37
2 | 0 || 6,456,334 | 64.56 | 63.62
10 | 0 || 22,717,000 | 227.17 | 226.22
100 | 0 || 103,293,667 | 1032.94 | 1031.99
---------+------------++-------------+--------------+--------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
After this patch
Number of tracers || Total time | Per-call average time (ns)
Relevant | Irrelevant || (ns) | Total | Overhead
=========+============++=============+==============+============
0 | 0 || 94,541 | 0.95 | -
0 | 1 || 93,666 | 0.94 | -
0 | 2 || 93,709 | 0.94 | -
0 | 10 || 93,667 | 0.94 | -
0 | 100 || 93,792 | 0.94 | -
---------+------------++-------------+--------------+------------
1 | 1 || 281,000 | 2.81 | 1.86
1 | 2 || 281,042 | 2.81 | 1.87
1 | 10 || 280,958 | 2.81 | 1.86
1 | 100 || 281,250 | 2.81 | 1.87
---------+------------++-------------+--------------+------------
1 | 0 || 280,959 | 2.81 | 1.86
2 | 0 || 6,502,708 | 65.03 | 64.08
10 | 0 || 18,681,209 | 186.81 | 185.87
100 | 0 || 103,550,458 | 1,035.50 | 1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the
overhead prior to this series (from 13.37ns to 1.86ns). This is
largely due to permitting calls to dynamically-allocated ftrace_ops
without going through ftrace_ops_list_func.
I've run the ftrace selftests from v6.2-rc3, which reports:
| # of passed: 110
| # of failed: 0
| # of unresolved: 3
| # of untested: 0
| # of unsupported: 0
| # of xfailed: 1
| # of undefined(test bug): 0
... where the unresolved entries were the tests for DIRECT functions
(which are not supported), and the checkbashisms selftest (which is
irrelevant here):
| [8] Test ftrace direct functions against tracers [UNRESOLVED]
| [9] Test ftrace direct functions against kprobes [UNRESOLVED]
| [62] Meta-selftest: Checkbashisms [UNRESOLVED]
... with all other tests passing (or failing as expected).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-23 21:46:03 +08:00
select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS \
2023-02-27 19:58:19 +08:00
if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG && \
!CC_OPTIMIZE_FOR_SIZE)
2020-12-12 02:46:32 +08:00
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
if DYNAMIC_FTRACE_WITH_ARGS
2013-12-17 01:50:08 +08:00
select HAVE_EFFICIENT_UNALIGNED_ACCESS
2019-07-12 11:57:14 +08:00
select HAVE_FAST_GUP
2014-04-30 17:54:32 +08:00
select HAVE_FTRACE_MCOUNT_RECORD
2014-04-30 17:54:33 +08:00
select HAVE_FUNCTION_TRACER
2019-08-06 18:00:14 +08:00
select HAVE_FUNCTION_ERROR_INJECTION
2014-04-30 17:54:33 +08:00
select HAVE_FUNCTION_GRAPH_TRACER
2016-05-24 06:09:38 +08:00
select HAVE_GCC_PLUGINS
2012-04-20 21:45:54 +08:00
select HAVE_HW_BREAKPOINT if PERF_EVENTS
2022-06-07 20:50:27 +08:00
select HAVE_IOREMAP_PROT
2015-11-23 23:12:59 +08:00
select HAVE_IRQ_TIME_ACCOUNTING
2021-09-22 06:22:31 +08:00
select HAVE_KVM
2017-09-27 23:51:30 +08:00
select HAVE_NMI
2012-04-20 21:45:54 +08:00
select HAVE_PERF_EVENTS
2014-02-04 02:18:27 +08:00
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
arm64: Support PREEMPT_DYNAMIC
This patch enables support for PREEMPT_DYNAMIC on arm64, allowing the
preemption model to be chosen at boot time.
Specifically, this patch selects HAVE_PREEMPT_DYNAMIC_KEY, so that each
preemption function is an out-of-line call with an early return
depending upon a static key. This leaves almost all the codegen up to
the compiler, and side-steps a number of pain points with static calls
(e.g. interaction with CFI schemes). This should have no worse overhead
than using non-inline static calls, as those use out-of-line trampolines
with early returns.
For example, the dynamic_cond_resched() wrapper looks as follows when
enabled. When disabled, the first `B` is replaced with a `NOP`,
resulting in an early return.
| <dynamic_cond_resched>:
| bti c
| b <dynamic_cond_resched+0x10> // or `nop`
| mov w0, #0x0
| ret
| mrs x0, sp_el0
| ldr x0, [x0, #8]
| cbnz x0, <dynamic_cond_resched+0x8>
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret
... compared to the regular form of the function:
| <__cond_resched>:
| bti c
| mrs x0, sp_el0
| ldr x1, [x0, #8]
| cbz x1, <__cond_resched+0x18>
| mov w0, #0x0
| ret
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret
Since arm64 does not yet use the generic entry code, we must define our
own `sk_dynamic_irqentry_exit_cond_resched`, which will be
enabled/disabled by the common code in kernel/sched/core.c. All other
preemption functions and associated static keys are defined there.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-8-mark.rutland@arm.com
2022-02-15 00:52:16 +08:00
select HAVE_PREEMPT_DYNAMIC_KEY
2016-07-09 00:35:45 +08:00
select HAVE_REGS_AND_STACK_ACCESS_API
2021-10-18 22:47:13 +08:00
select HAVE_POSIX_CPU_TIMERS_TASK_WORK
2019-04-12 22:22:01 +08:00
select HAVE_FUNCTION_ARG_ACCESS_API
2020-02-04 09:37:02 +08:00
select MMU_GATHER_RCU_TABLE_FREE
2018-06-20 21:46:50 +08:00
select HAVE_RSEQ
2018-06-14 18:36:45 +08:00
select HAVE_STACKPROTECTOR
2014-04-30 17:54:36 +08:00
select HAVE_SYSCALL_TRACEPOINTS
arm64: Kprobes with single stepping support
Add support for basic kernel probes(kprobes) and jump probes
(jprobes) for ARM64.
Kprobes utilizes software breakpoint and single step debug
exceptions supported on ARM v8.
A software breakpoint is placed at the probe address to trap the
kernel execution into the kprobe handler.
ARM v8 supports enabling single stepping before the break exception
return (ERET), with next PC in exception return address (ELR_EL1). The
kprobe handler prepares an executable memory slot for out-of-line
execution with a copy of the original instruction being probed, and
enables single stepping. The PC is set to the out-of-line slot address
before the ERET. With this scheme, the instruction is executed with the
exact same register context except for the PC (and DAIF) registers.
Debug mask (PSTATE.D) is enabled only when single stepping a recursive
kprobe, e.g.: during kprobes reenter so that probed instruction can be
single stepped within the kprobe handler -exception- context.
The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
any further re-entry is prevented by not calling handlers and the case
counted as a missed kprobe).
Single stepping from the x-o-l slot has a drawback for PC-relative accesses
like branching and symbolic literals access as the offset from the new PC
(slot address) may not be ensured to fit in the immediate value of
the opcode. Such instructions need simulation, so reject
probing them.
Instructions generating exceptions or cpu mode change are rejected
for probing.
Exclusive load/store instructions are rejected too. Additionally, the
code is checked to see if it is inside an exclusive load/store sequence
(code from Pratyush).
System instructions are mostly enabled for stepping, except MSR/MRS
accesses to "DAIF" flags in PSTATE, which are not safe for
probing.
This also changes arch/arm64/include/asm/ptrace.h to use
include/asm-generic/ptrace.h.
Thanks to Steve Capper and Pratyush Anand for several suggested
Changes.
Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-09 00:35:48 +08:00
select HAVE_KPROBES
2017-02-06 17:54:33 +08:00
select HAVE_KRETPROBES
2019-06-21 17:52:31 +08:00
select HAVE_GENERIC_VDSO
2012-04-20 21:45:54 +08:00
select IRQ_DOMAIN
2015-04-28 04:53:09 +08:00
select IRQ_FORCED_THREADING
2022-03-25 09:11:53 +08:00
select KASAN_VMALLOC if KASAN
2012-10-16 18:26:57 +08:00
select MODULES_USE_ELF_RELA
2018-05-09 12:53:49 +08:00
select NEED_DMA_MAP_STATE
2018-04-05 15:44:52 +08:00
select NEED_SG_DMA_LENGTH
2012-04-20 21:45:54 +08:00
select OF
select OF_EARLY_FLATTREE
2018-11-16 03:05:33 +08:00
select PCI_DOMAINS_GENERIC if PCI
2018-12-20 06:46:57 +08:00
select PCI_ECAM if (ACPI && PCI)
2018-11-16 03:05:34 +08:00
select PCI_SYSCALL if PCI
2013-03-01 02:14:37 +08:00
select POWER_RESET
select POWER_SUPPLY
2012-04-20 21:45:54 +08:00
select SPARSE_IRQ
2018-04-24 15:00:54 +08:00
select SWIOTLB
2012-10-09 07:28:16 +08:00
select SYSCTL_EXCEPTION_TRACE
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
select THREAD_INFO_IN_TASK
userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9.
Overview
========
This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:
Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.
We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".
Use Case
========
Consider the use case of VM live migration (e.g. under QEMU/KVM):
1. While a VM is still running, we copy the contents of its memory to a
target machine. The pages are populated on the target by writing to the
non-UFFD mapping, using the setup described above. The VM is still running
(and therefore its memory is likely changing), so this may be repeated
several times, until we decide the target is "up to date enough".
2. We pause the VM on the source, and start executing on the target machine.
During this gap, the VM's user(s) will *see* a pause, so it is desirable to
minimize this window.
3. Between the last time any page was copied from the source to the target, and
when the VM was paused, the contents of that page may have changed - and
therefore the copy we have on the target machine is out of date. Although we
can keep track of which pages are out of date, for VMs with large amounts of
memory, it is "slow" to transfer this information to the target machine. We
want to resume execution before such a transfer would complete.
4. So, the guest begins executing on the target machine. The first time it
touches its memory (via the UFFD-registered mapping), userspace wants to
intercept this fault. Userspace checks whether or not the page is up to date,
and if not, copies the updated page from the source machine, via the non-UFFD
mapping. Finally, whether a copy was performed or not, userspace issues a
UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
are correct, carry on setting up the mapping".
We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.
Interaction with Existing APIs
==============================
Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:
UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:
- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.
UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).
- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
-ENOENT in that case (regardless of the kind of fault).
Future Work
===========
This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.
This patch (of 6):
This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:
Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.
This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.
This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].
However, doing it this was requires we allocate a VM_* flag for the new
registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.
[1] https://lore.kernel.org/patchwork/patch/1380226/
[peterx@redhat.com: fix minor fault page leak]
Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 09:35:36 +08:00
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
2021-07-31 13:22:32 +08:00
select TRACE_IRQFLAGS_SUPPORT
arm64: select TRACE_IRQFLAGS_NMI_SUPPORT
Due to an oversight, on arm64 lockdep IRQ state tracking doesn't work as
intended in NMI context. This demonstrably results in bogus warnings
from lockdep, and in theory could mask a variety of issues.
On arm64, we've consistently tracked IRQ flag state for NMIs (and
saved/restored the state of the interrupted context) since commit:
f0cd5ac1e4c53cb6 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
That commit fixed most lockdep issues with NMI by virtue of the
save/restore of the lockdep state of the interrupted context. However,
for lockdep IRQ state tracking to consistently take effect in NMI
context it has been necessary to select TRACE_IRQFLAGS_NMI_SUPPORT since
commit:
ed00495333ccc80f ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs")
As arm64 does not select TRACE_IRQFLAGS_NMI_SUPPORT, this means that the
lockdep state can be stale in NMI context, and some uses of that state
can consume stale data.
When an NMI is taken arm64 entry code will call arm64_enter_nmi(). This
will enter NMI context via __nmi_enter() before calling
lockdep_hardirqs_off() to inform lockdep that IRQs have been masked.
Where TRACE_IRQFLAGS_NMI_SUPPORT is not selected, lockdep_hardirqs_off()
will not update lockdep state if called in NMI context. Thus if IRQs
were enabled in the original context, lockdep will continue to believe
that IRQs are enabled despite the call to lockdep_hardirqs_off().
However, the lockdep_assert_*() checks do take effect in NMI context,
and will consume the stale lockdep state. If an NMI is taken from a
context which had IRQs enabled, and during the handling of the NMI
something calls lockdep_assert_irqs_disabled(), this will result in a
spurious warning based upon the stale lockdep state.
This can be seen when using perf with GICv3 pseudo-NMIs. Within the perf
NMI handler we may attempt a uaccess to record the userspace callchain,
and is this faults the el1_abort() call in the nested context will call
exit_to_kernel_mode() when returning, which has a
lockdep_assert_irqs_disabled() assertion:
| # ./perf record -a -g sh
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 164 at arch/arm64/kernel/entry-common.c:73 exit_to_kernel_mode+0x118/0x1ac
| Modules linked in:
| CPU: 0 PID: 164 Comm: perf Not tainted 5.18.0-rc5 #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 004003c5 (nzcv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : exit_to_kernel_mode+0x118/0x1ac
| lr : el1_abort+0x80/0xbc
| sp : ffff8000080039f0
| pmr_save: 000000f0
| x29: ffff8000080039f0 x28: ffff6831054e4980 x27: ffff683103adb400
| x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
| x23: 00000000804000c5 x22: 00000000000000c0 x21: 0000000000000001
| x20: ffffbd51e635ec44 x19: ffff800008003a60 x18: 0000000000000000
| x17: ffffaadf98d23000 x16: ffff800008004000 x15: 0000ffffd14f25c0
| x14: 0000000000000000 x13: 00000000000018eb x12: 0000000000000040
| x11: 000000000000001e x10: 000000002b820020 x9 : 0000000100110000
| x8 : 000000000045cac0 x7 : 0000ffffd14f25c0 x6 : ffffbd51e639b000
| x5 : 00000000000003e5 x4 : ffffbd51e58543b0 x3 : 0000000000000001
| x2 : ffffaadf98d23000 x1 : ffff6831054e4980 x0 : 0000000100110000
| Call trace:
| exit_to_kernel_mode+0x118/0x1ac
| el1_abort+0x80/0xbc
| el1h_64_sync_handler+0xa4/0xd0
| el1h_64_sync+0x74/0x78
| __arch_copy_from_user+0xa4/0x230
| get_perf_callchain+0x134/0x1e4
| perf_callchain+0x7c/0xa0
| perf_prepare_sample+0x414/0x660
| perf_event_output_forward+0x80/0x180
| __perf_event_overflow+0x70/0x13c
| perf_event_overflow+0x1c/0x30
| armv8pmu_handle_irq+0xe8/0x160
| armpmu_dispatch_irq+0x2c/0x70
| handle_percpu_devid_fasteoi_nmi+0x7c/0xbc
| generic_handle_domain_nmi+0x3c/0x60
| gic_handle_irq+0x1dc/0x310
| call_on_irq_stack+0x2c/0x54
| do_interrupt_handler+0x80/0x94
| el1_interrupt+0xb0/0xe4
| el1h_64_irq_handler+0x18/0x24
| el1h_64_irq+0x74/0x78
| lockdep_hardirqs_off+0x50/0x120
| trace_hardirqs_off+0x38/0x214
| _raw_spin_lock_irq+0x98/0xa0
| pipe_read+0x1f8/0x404
| new_sync_read+0x140/0x150
| vfs_read+0x190/0x1dc
| ksys_read+0xdc/0xfc
| __arm64_sys_read+0x20/0x30
| invoke_syscall+0x48/0x114
| el0_svc_common.constprop.0+0x158/0x17c
| do_el0_svc+0x28/0x90
| el0_svc+0x60/0x150
| el0t_64_sync_handler+0xa4/0x130
| el0t_64_sync+0x19c/0x1a0
| irq event stamp: 483
| hardirqs last enabled at (483): [<ffffbd51e636aa24>] _raw_spin_unlock_irqrestore+0xa4/0xb0
| hardirqs last disabled at (482): [<ffffbd51e636acd0>] _raw_spin_lock_irqsave+0xb0/0xb4
| softirqs last enabled at (468): [<ffffbd51e5216f58>] put_cpu_fpsimd_context+0x28/0x70
| softirqs last disabled at (466): [<ffffbd51e5216ed4>] get_cpu_fpsimd_context+0x0/0x5c
| ---[ end trace 0000000000000000 ]---
Note that as lockdep_assert_irqs_disabled() uses WARN_ON_ONCE(), and
this uses a BRK, the warning is logged with the real PSTATE at the time
of the warning, which clearly has DAIF.I set, meaning IRQs (and
pseudo-NMIs) were definitely masked and the warning is spurious.
Fix this by selecting TRACE_IRQFLAGS_NMI_SUPPORT such that the existing
entry tracking takes effect, as we had originally intended when the
arm64 entry code was fixed for transitions to/from NMI.
Arguably the lockdep_assert_*() functions should have the same NMI
checks as the rest of the code to prevent spurious warnings when
TRACE_IRQFLAGS_NMI_SUPPORT is not selected, but the real fix for any
architecture is to explicitly handle the transitions to/from NMI in the
entry code.
Fixes: f0cd5ac1e4c5 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220511131733.4074499-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-05-11 21:17:33 +08:00
select TRACE_IRQFLAGS_NMI_SUPPORT
2022-08-15 20:47:39 +08:00
select HAVE_SOFTIRQ_ON_OWN_STACK
2012-04-20 21:45:54 +08:00
help
ARM 64-bit (AArch64) Linux support.
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
config CLANG_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
Will and Anders reported that using just 'CC=clang' with CONFIG_FTRACE=y
and CONFIG_STACK_TRACER=y would result in an error while linking:
aarch64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.meminit.data' in mm/sparse.o] sections
aarch64-linux-gnu-ld: final link failed: bad value
This error was exposed by commit f12b034afeb3 ("scripts/Makefile.clang:
default to LLVM_IAS=1") in combination with binutils older than 2.36.
When '-fpatchable-function-entry' was implemented in LLVM, two code
paths were added for adding the section attributes, one for the
integrated assembler and another for GNU as, due to binutils
deficiencies at the time. If the integrated assembler was used,
attributes that GNU ld < 2.36 could not handle were added, presumably
with the assumption that use of the integrated assembler meant the whole
LLVM stack was being used, namely ld.lld.
Prior to the kernel change previously mentioned, that assumption was
valid, as there were three commonly used combinations of tools for
compiling, assembling, and linking respectively:
$ make CC=clang (clang, GNU as, GNU ld)
$ make LLVM=1 (clang, GNU as, ld.lld)
$ make LLVM=1 LLVM_IAS=1 (clang, integrated assembler, ld.lld)
After the default switch of the integrated assembler, the second and
third commands become equivalent and the first command means "clang,
integrated assembler, and GNU ld", which was not a combination that was
considered when the aforementioned LLVM change was implemented.
It is not possible to go back and fix LLVM, as this change was
implemented in the 10.x series, which is no longer supported. To
workaround this on the kernel side, split out the selection of
HAVE_DYNAMIC_FTRACE_WITH_REGS to two separate configurations, one for
GCC and one for clang.
The GCC config inherits the '-fpatchable-function-entry' check. The
Clang config does not it, as '-fpatchable-function-entry' is always
available for LLVM 11.0.0 and newer, which is the supported range of
versions for the kernel.
The Clang config makes sure that the user is using GNU as or the
integrated assembler with ld.lld or GNU ld 2.36 or newer, which will
avoid the error above.
Link: https://github.com/ClangBuiltLinux/linux/issues/1507
Link: https://github.com/ClangBuiltLinux/linux/issues/788
Link: https://lore.kernel.org/YlCA5PoIjF6nhwYj@dev-arch.thelio-3990X/
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=26256
Link: https://github.com/llvm/llvm-project/commit/7fa5290d5bd5632d7a36a4ea9f46e81e04fb819e
Link: https://github.com/llvm/llvm-project/commit/853a2649160c1c80b9bbd38a20b53ca8fab704e8
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Will Deacon <will@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220413181420.3522187-1-nathan@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-14 02:14:21 +08:00
def_bool CC_IS_CLANG
# https://github.com/ClangBuiltLinux/linux/issues/1507
depends on AS_IS_GNU || (AS_IS_LLVM && (LD_IS_LLD || LD_VERSION >= 23600))
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
Will and Anders reported that using just 'CC=clang' with CONFIG_FTRACE=y
and CONFIG_STACK_TRACER=y would result in an error while linking:
aarch64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.meminit.data' in mm/sparse.o] sections
aarch64-linux-gnu-ld: final link failed: bad value
This error was exposed by commit f12b034afeb3 ("scripts/Makefile.clang:
default to LLVM_IAS=1") in combination with binutils older than 2.36.
When '-fpatchable-function-entry' was implemented in LLVM, two code
paths were added for adding the section attributes, one for the
integrated assembler and another for GNU as, due to binutils
deficiencies at the time. If the integrated assembler was used,
attributes that GNU ld < 2.36 could not handle were added, presumably
with the assumption that use of the integrated assembler meant the whole
LLVM stack was being used, namely ld.lld.
Prior to the kernel change previously mentioned, that assumption was
valid, as there were three commonly used combinations of tools for
compiling, assembling, and linking respectively:
$ make CC=clang (clang, GNU as, GNU ld)
$ make LLVM=1 (clang, GNU as, ld.lld)
$ make LLVM=1 LLVM_IAS=1 (clang, integrated assembler, ld.lld)
After the default switch of the integrated assembler, the second and
third commands become equivalent and the first command means "clang,
integrated assembler, and GNU ld", which was not a combination that was
considered when the aforementioned LLVM change was implemented.
It is not possible to go back and fix LLVM, as this change was
implemented in the 10.x series, which is no longer supported. To
workaround this on the kernel side, split out the selection of
HAVE_DYNAMIC_FTRACE_WITH_REGS to two separate configurations, one for
GCC and one for clang.
The GCC config inherits the '-fpatchable-function-entry' check. The
Clang config does not it, as '-fpatchable-function-entry' is always
available for LLVM 11.0.0 and newer, which is the supported range of
versions for the kernel.
The Clang config makes sure that the user is using GNU as or the
integrated assembler with ld.lld or GNU ld 2.36 or newer, which will
avoid the error above.
Link: https://github.com/ClangBuiltLinux/linux/issues/1507
Link: https://github.com/ClangBuiltLinux/linux/issues/788
Link: https://lore.kernel.org/YlCA5PoIjF6nhwYj@dev-arch.thelio-3990X/
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=26256
Link: https://github.com/llvm/llvm-project/commit/7fa5290d5bd5632d7a36a4ea9f46e81e04fb819e
Link: https://github.com/llvm/llvm-project/commit/853a2649160c1c80b9bbd38a20b53ca8fab704e8
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Will Deacon <will@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220413181420.3522187-1-nathan@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-14 02:14:21 +08:00
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
config GCC_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
Will and Anders reported that using just 'CC=clang' with CONFIG_FTRACE=y
and CONFIG_STACK_TRACER=y would result in an error while linking:
aarch64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.meminit.data' in mm/sparse.o] sections
aarch64-linux-gnu-ld: final link failed: bad value
This error was exposed by commit f12b034afeb3 ("scripts/Makefile.clang:
default to LLVM_IAS=1") in combination with binutils older than 2.36.
When '-fpatchable-function-entry' was implemented in LLVM, two code
paths were added for adding the section attributes, one for the
integrated assembler and another for GNU as, due to binutils
deficiencies at the time. If the integrated assembler was used,
attributes that GNU ld < 2.36 could not handle were added, presumably
with the assumption that use of the integrated assembler meant the whole
LLVM stack was being used, namely ld.lld.
Prior to the kernel change previously mentioned, that assumption was
valid, as there were three commonly used combinations of tools for
compiling, assembling, and linking respectively:
$ make CC=clang (clang, GNU as, GNU ld)
$ make LLVM=1 (clang, GNU as, ld.lld)
$ make LLVM=1 LLVM_IAS=1 (clang, integrated assembler, ld.lld)
After the default switch of the integrated assembler, the second and
third commands become equivalent and the first command means "clang,
integrated assembler, and GNU ld", which was not a combination that was
considered when the aforementioned LLVM change was implemented.
It is not possible to go back and fix LLVM, as this change was
implemented in the 10.x series, which is no longer supported. To
workaround this on the kernel side, split out the selection of
HAVE_DYNAMIC_FTRACE_WITH_REGS to two separate configurations, one for
GCC and one for clang.
The GCC config inherits the '-fpatchable-function-entry' check. The
Clang config does not it, as '-fpatchable-function-entry' is always
available for LLVM 11.0.0 and newer, which is the supported range of
versions for the kernel.
The Clang config makes sure that the user is using GNU as or the
integrated assembler with ld.lld or GNU ld 2.36 or newer, which will
avoid the error above.
Link: https://github.com/ClangBuiltLinux/linux/issues/1507
Link: https://github.com/ClangBuiltLinux/linux/issues/788
Link: https://lore.kernel.org/YlCA5PoIjF6nhwYj@dev-arch.thelio-3990X/
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=26256
Link: https://github.com/llvm/llvm-project/commit/7fa5290d5bd5632d7a36a4ea9f46e81e04fb819e
Link: https://github.com/llvm/llvm-project/commit/853a2649160c1c80b9bbd38a20b53ca8fab704e8
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Will Deacon <will@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220413181420.3522187-1-nathan@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-14 02:14:21 +08:00
def_bool CC_IS_GCC
depends on $(cc-option,-fpatchable-function-entry=2)
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
Will and Anders reported that using just 'CC=clang' with CONFIG_FTRACE=y
and CONFIG_STACK_TRACER=y would result in an error while linking:
aarch64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.meminit.data' in mm/sparse.o] sections
aarch64-linux-gnu-ld: final link failed: bad value
This error was exposed by commit f12b034afeb3 ("scripts/Makefile.clang:
default to LLVM_IAS=1") in combination with binutils older than 2.36.
When '-fpatchable-function-entry' was implemented in LLVM, two code
paths were added for adding the section attributes, one for the
integrated assembler and another for GNU as, due to binutils
deficiencies at the time. If the integrated assembler was used,
attributes that GNU ld < 2.36 could not handle were added, presumably
with the assumption that use of the integrated assembler meant the whole
LLVM stack was being used, namely ld.lld.
Prior to the kernel change previously mentioned, that assumption was
valid, as there were three commonly used combinations of tools for
compiling, assembling, and linking respectively:
$ make CC=clang (clang, GNU as, GNU ld)
$ make LLVM=1 (clang, GNU as, ld.lld)
$ make LLVM=1 LLVM_IAS=1 (clang, integrated assembler, ld.lld)
After the default switch of the integrated assembler, the second and
third commands become equivalent and the first command means "clang,
integrated assembler, and GNU ld", which was not a combination that was
considered when the aforementioned LLVM change was implemented.
It is not possible to go back and fix LLVM, as this change was
implemented in the 10.x series, which is no longer supported. To
workaround this on the kernel side, split out the selection of
HAVE_DYNAMIC_FTRACE_WITH_REGS to two separate configurations, one for
GCC and one for clang.
The GCC config inherits the '-fpatchable-function-entry' check. The
Clang config does not it, as '-fpatchable-function-entry' is always
available for LLVM 11.0.0 and newer, which is the supported range of
versions for the kernel.
The Clang config makes sure that the user is using GNU as or the
integrated assembler with ld.lld or GNU ld 2.36 or newer, which will
avoid the error above.
Link: https://github.com/ClangBuiltLinux/linux/issues/1507
Link: https://github.com/ClangBuiltLinux/linux/issues/788
Link: https://lore.kernel.org/YlCA5PoIjF6nhwYj@dev-arch.thelio-3990X/
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=26256
Link: https://github.com/llvm/llvm-project/commit/7fa5290d5bd5632d7a36a4ea9f46e81e04fb819e
Link: https://github.com/llvm/llvm-project/commit/853a2649160c1c80b9bbd38a20b53ca8fab704e8
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Will Deacon <will@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220413181420.3522187-1-nathan@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-14 02:14:21 +08:00
2012-04-20 21:45:54 +08:00
config 64BIT
def_bool y
config MMU
def_bool y
2016-05-31 22:57:59 +08:00
config ARM64_PAGE_SHIFT
int
default 16 if ARM64_64K_PAGES
default 14 if ARM64_16K_PAGES
default 12
2020-09-10 17:59:35 +08:00
config ARM64_CONT_PTE_SHIFT
2016-05-31 22:57:59 +08:00
int
default 5 if ARM64_64K_PAGES
default 7 if ARM64_16K_PAGES
default 4
2020-09-10 17:59:36 +08:00
config ARM64_CONT_PMD_SHIFT
int
default 5 if ARM64_64K_PAGES
default 5 if ARM64_16K_PAGES
default 4
2016-01-15 07:20:01 +08:00
config ARCH_MMAP_RND_BITS_MIN
2022-05-17 22:16:47 +08:00
default 14 if ARM64_64K_PAGES
default 16 if ARM64_16K_PAGES
default 18
2016-01-15 07:20:01 +08:00
# max bits determined by the following formula:
# VA_BITS - PAGE_SHIFT - 3
config ARCH_MMAP_RND_BITS_MAX
2022-05-17 22:16:47 +08:00
default 19 if ARM64_VA_BITS=36
default 24 if ARM64_VA_BITS=39
default 27 if ARM64_VA_BITS=42
default 30 if ARM64_VA_BITS=47
default 29 if ARM64_VA_BITS=48 && ARM64_64K_PAGES
default 31 if ARM64_VA_BITS=48 && ARM64_16K_PAGES
default 33 if ARM64_VA_BITS=48
default 14 if ARM64_64K_PAGES
default 16 if ARM64_16K_PAGES
default 18
2016-01-15 07:20:01 +08:00
config ARCH_MMAP_RND_COMPAT_BITS_MIN
2022-05-17 22:16:47 +08:00
default 7 if ARM64_64K_PAGES
default 9 if ARM64_16K_PAGES
default 11
2016-01-15 07:20:01 +08:00
config ARCH_MMAP_RND_COMPAT_BITS_MAX
2022-05-17 22:16:47 +08:00
default 16
2016-01-15 07:20:01 +08:00
2014-04-08 06:39:19 +08:00
config NO_IOPORT_MAP
2014-09-29 22:29:31 +08:00
def_bool y if !PCI
2012-04-20 21:45:54 +08:00
config STACKTRACE_SUPPORT
def_bool y
2015-08-19 03:50:10 +08:00
config ILLEGAL_POINTER_VALUE
hex
default 0xdead000000000000
2012-04-20 21:45:54 +08:00
config LOCKDEP_SUPPORT
def_bool y
2015-07-24 23:37:48 +08:00
config GENERIC_BUG
def_bool y
depends on BUG
config GENERIC_BUG_RELATIVE_POINTERS
def_bool y
depends on GENERIC_BUG
2012-04-20 21:45:54 +08:00
config GENERIC_HWEIGHT
def_bool y
config GENERIC_CSUM
2022-05-17 22:16:47 +08:00
def_bool y
2012-04-20 21:45:54 +08:00
config GENERIC_CALIBRATE_DELAY
def_bool y
2021-05-05 09:39:54 +08:00
config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y
2015-05-30 01:28:44 +08:00
config SMP
def_bool y
2013-07-09 21:18:12 +08:00
config KERNEL_MODE_NEON
def_bool y
2014-04-19 06:19:59 +08:00
config FIX_EARLYCON_MEM
def_bool y
2015-04-15 06:45:39 +08:00
config PGTABLE_LEVELS
int
2015-10-19 21:19:38 +08:00
default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36
2015-04-15 06:45:39 +08:00
default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42
2019-08-07 23:55:22 +08:00
default 3 if ARM64_64K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
2015-04-15 06:45:39 +08:00
default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
2015-10-19 21:19:37 +08:00
default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48
2015-04-15 06:45:39 +08:00
2016-11-02 17:10:46 +08:00
config ARCH_SUPPORTS_UPROBES
def_bool y
2017-06-14 18:43:55 +08:00
config ARCH_PROC_KCORE_TEXT
def_bool y
2020-01-15 22:18:25 +08:00
config BROKEN_GAS_INST
def_bool !$(as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n)
2019-08-07 23:55:15 +08:00
config KASAN_SHADOW_OFFSET
hex
2020-12-23 04:02:06 +08:00
depends on KASAN_GENERIC || KASAN_SW_TAGS
arm64: mm: extend linear region for 52-bit VA configurations
For historical reasons, the arm64 kernel VA space is configured as two
equally sized halves, i.e., on a 48-bit VA build, the VA space is split
into a 47-bit vmalloc region and a 47-bit linear region.
When support for 52-bit virtual addressing was added, this equal split
was kept, resulting in a substantial waste of virtual address space in
the linear region:
48-bit VA 52-bit VA
0xffff_ffff_ffff_ffff +-------------+ +-------------+
| vmalloc | | vmalloc |
0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+
| linear | : :
0xffff_0000_0000_0000 +-------------+ : :
: : : :
: : : :
: : : :
: : : currently :
: unusable : : :
: : : unused :
: by : : :
: : : :
: hardware : : :
: : : :
0xfff8_0000_0000_0000 : : _PAGE_END(52) +-------------+
: : | |
: : | |
: : | |
: : | |
: : | |
: unusable : | |
: : | linear |
: by : | |
: : | region |
: hardware : | |
: : | |
: : | |
: : | |
: : | |
: : | |
: : | |
0xfff0_0000_0000_0000 +-------------+ PAGE_OFFSET +-------------+
As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc
space (as before), to ensure that a single 64k granule kernel image can
support any 64k granule capable system, regardless of whether it supports
the 52-bit virtual addressing extension. However, due to the fact that
the VA space is still split in equal halves, the linear region is only
2^51 bytes in size, wasting almost half of the 52-bit VA space.
Let's fix this, by abandoning the equal split, and simply assigning all
VA space outside of the vmalloc region to the linear region.
The KASAN shadow region is reconfigured so that it ends at the start of
the vmalloc region, and grows downwards. That way, the arrangement of
the vmalloc space (which contains kernel mappings, modules, BPF region,
the vmemmap array etc) is identical between non-KASAN and KASAN builds,
which aids debugging.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-10-08 23:36:00 +08:00
default 0xdfff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && !KASAN_SW_TAGS
default 0xdfffc00000000000 if ARM64_VA_BITS_47 && !KASAN_SW_TAGS
default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
default 0xefff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && KASAN_SW_TAGS
default 0xefffc00000000000 if ARM64_VA_BITS_47 && KASAN_SW_TAGS
default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
2019-08-07 23:55:15 +08:00
default 0xffffffffffffffff
2022-10-27 23:59:06 +08:00
config UNWIND_TABLES
bool
2015-07-21 03:09:16 +08:00
source "arch/arm64/Kconfig.platforms"
2012-04-20 21:45:54 +08:00
menu "Kernel Features"
2014-11-14 23:54:12 +08:00
menu "ARM errata workarounds via the alternatives framework"
2018-12-01 01:18:00 +08:00
config ARM64_WORKAROUND_CLEAN_CACHE
2019-04-29 21:21:11 +08:00
bool
2018-12-01 01:18:00 +08:00
2014-11-14 23:54:12 +08:00
config ARM64_ERRATUM_826319
bool "Cortex-A53: 826319: System might deadlock if a write cannot complete until read data is accepted"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 826319 on Cortex-A53 parts up to r0p2 with an AMBA 4 ACE or
AXI master interface and an L2 cache.
If a Cortex-A53 uses an AMBA AXI4 ACE interface to other processors
and is unable to accept a certain write via this interface, it will
not progress on read data presented on the read data channel and the
system can deadlock.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_827319
bool "Cortex-A53: 827319: Data cache clean instructions might cause overlapping transactions to the interconnect"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 827319 on Cortex-A53 parts up to r0p2 with an AMBA 5 CHI
master interface and an L2 cache.
Under certain conditions this erratum can cause a clean line eviction
to occur at the same time as another transaction to the same address
on the AMBA 5 CHI interface, which can cause data corruption if the
interconnect reorders the two transactions.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_824069
bool "Cortex-A53: 824069: Cache line might not be marked as clean after a CleanShared snoop"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 824069 on Cortex-A53 parts up to r0p2 when it is connected
to a coherent interconnect.
If a Cortex-A53 processor is executing a store or prefetch for
write instruction at the same time as a processor in another
cluster is executing a cache maintenance operation to the same
address, then this erratum might cause a clean cache line to be
incorrectly marked as dirty.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this option does not necessarily enable the
workaround, as it depends on the alternative framework, which will
only patch the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_819472
bool "Cortex-A53: 819472: Store exclusive instructions might cause data corruption"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 819472 on Cortex-A53 parts up to r0p1 with an L2 cache
present when it is connected to a coherent interconnect.
If the processor is executing a load and store exclusive sequence at
the same time as a processor in another cluster is executing a cache
maintenance operation to the same address, then this erratum might
cause data corruption.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_832075
bool "Cortex-A57: 832075: possible deadlock on mixing exclusive memory accesses with device loads"
default y
help
This option adds an alternative code sequence to work around ARM
erratum 832075 on Cortex-A57 parts up to r1p2.
Affected Cortex-A57 parts might deadlock when exclusive load/store
instructions to Write-Back memory are mixed with Device loads.
The workaround is to promote device loads to use Load-Acquire
semantics.
Please note that this does not necessarily enable the workaround,
2015-11-16 18:28:18 +08:00
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_834220
bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault"
depends on KVM
default y
help
This option adds an alternative code sequence to work around ARM
erratum 834220 on Cortex-A57 parts up to r1p2.
Affected Cortex-A57 parts might report a Stage 2 translation
fault as the result of a Stage 1 fault for load crossing a
page boundary when there is a permission or device memory
alignment fault at Stage 1 and a translation fault at Stage 2.
The workaround is to verify that the Stage 1 translation
doesn't generate a fault before handling the Stage 2 fault.
Please note that this does not necessarily enable the workaround,
2014-11-14 23:54:12 +08:00
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
2022-07-15 00:15:23 +08:00
config ARM64_ERRATUM_1742098
bool "Cortex-A57/A72: 1742098: ELR recorded incorrectly on interrupt taken between cryptographic instructions in a sequence"
depends on COMPAT
default y
help
This option removes the AES hwcap for aarch32 user-space to
workaround erratum 1742098 on Cortex-A57 and Cortex-A72.
Affected parts may corrupt the AES state if an interrupt is
taken between a pair of AES instructions. These instructions
are only present if the cryptography extensions are present.
All software should have a fallback implementation for CPUs
that don't implement the cryptography extensions.
If unsure, say Y.
2015-03-24 03:07:02 +08:00
config ARM64_ERRATUM_845719
bool "Cortex-A53: 845719: a load might read incorrect data"
depends on COMPAT
default y
help
This option adds an alternative code sequence to work around ARM
erratum 845719 on Cortex-A53 parts up to r0p4.
When running a compat (AArch32) userspace on an affected Cortex-A53
part, a load at EL0 from a virtual address that matches the bottom 32
bits of the virtual address used by a recent load at (AArch64) EL1
might return incorrect data.
The workaround is to write the contextidr_el1 register on exception
return to a 32-bit task.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
2015-03-17 20:15:02 +08:00
config ARM64_ERRATUM_843419
bool "Cortex-A53: 843419: A load or store might access an incorrect address"
default y
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-07 01:15:33 +08:00
select ARM64_MODULE_PLTS if MODULES
2015-03-17 20:15:02 +08:00
help
2016-08-22 18:58:36 +08:00
This option links the kernel with '--fix-cortex-a53-843419' and
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-07 01:15:33 +08:00
enables PLT support to replace certain ADRP instructions, which can
cause subsequent memory accesses to use an incorrect address on
Cortex-A53 parts up to r0p4.
2015-03-17 20:15:02 +08:00
If unsure, say Y.
2021-03-24 15:11:28 +08:00
config ARM64_LD_HAS_FIX_ERRATUM_843419
def_bool $(ld-option,--fix-cortex-a53-843419)
2018-03-26 22:12:49 +08:00
config ARM64_ERRATUM_1024718
bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update"
default y
help
2019-04-29 21:21:11 +08:00
This option adds a workaround for ARM Cortex-A55 Erratum 1024718.
2018-03-26 22:12:49 +08:00
2021-02-04 07:00:57 +08:00
Affected Cortex-A55 cores (all revisions) could cause incorrect
2018-03-26 22:12:49 +08:00
update of the hardware dirty bit when the DBM/AP bits are updated
2019-04-29 21:21:11 +08:00
without a break-before-make. The workaround is to disable the usage
2018-03-26 22:12:49 +08:00
of hardware DBM locally on the affected cores. CPUs not affected by
2019-04-29 21:21:11 +08:00
this erratum will continue to use the feature.
2015-03-17 20:15:02 +08:00
If unsure, say Y.
2019-05-23 18:24:50 +08:00
config ARM64_ERRATUM_1418040
2019-04-15 20:03:54 +08:00
bool "Cortex-A76/Neoverse-N1: MRC read following MRRC read of specific Generic Timer in AArch32 might give incorrect result"
2018-09-28 00:15:34 +08:00
default y
2019-04-15 20:03:52 +08:00
depends on COMPAT
2018-09-28 00:15:34 +08:00
help
2019-05-01 22:45:36 +08:00
This option adds a workaround for ARM Cortex-A76/Neoverse-N1
2019-05-23 18:24:50 +08:00
errata 1188873 and 1418040.
2018-09-28 00:15:34 +08:00
2019-05-23 18:24:50 +08:00
Affected Cortex-A76/Neoverse-N1 cores (r0p0 to r3p1) could
2019-04-15 20:03:54 +08:00
cause register corruption when accessing the timer registers
from AArch32 userspace.
2018-09-28 00:15:34 +08:00
If unsure, say Y.
2020-05-04 17:48:58 +08:00
config ARM64_WORKAROUND_SPECULATIVE_AT
2019-12-16 19:56:29 +08:00
bool
2018-12-07 01:31:26 +08:00
config ARM64_ERRATUM_1165522
2020-05-04 17:48:58 +08:00
bool "Cortex-A76: 1165522: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
2018-12-07 01:31:26 +08:00
default y
2020-05-04 17:48:58 +08:00
select ARM64_WORKAROUND_SPECULATIVE_AT
2018-12-07 01:31:26 +08:00
help
2019-04-29 21:21:11 +08:00
This option adds a workaround for ARM Cortex-A76 erratum 1165522.
2018-12-07 01:31:26 +08:00
Affected Cortex-A76 cores (r0p0, r1p0, r2p0) could end-up with
corrupted TLBs by speculating an AT instruction during a guest
context switch.
If unsure, say Y.
2020-05-04 17:48:58 +08:00
config ARM64_ERRATUM_1319367
bool "Cortex-A57/A72: 1319537: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
default y
select ARM64_WORKAROUND_SPECULATIVE_AT
help
This option adds work arounds for ARM Cortex-A57 erratum 1319537
and A72 erratum 1319367
Cortex-A57 and A72 cores could end-up with corrupted TLBs by
speculating an AT instruction during a guest context switch.
If unsure, say Y.
2019-12-16 19:56:31 +08:00
config ARM64_ERRATUM_1530923
2020-05-04 17:48:58 +08:00
bool "Cortex-A55: 1530923: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
2019-12-16 19:56:31 +08:00
default y
2020-05-04 17:48:58 +08:00
select ARM64_WORKAROUND_SPECULATIVE_AT
2019-12-16 19:56:31 +08:00
help
This option adds a workaround for ARM Cortex-A55 erratum 1530923.
Affected Cortex-A55 cores (r0p0, r0p1, r1p0, r2p0) could end-up with
corrupted TLBs by speculating an AT instruction during a guest
context switch.
If unsure, say Y.
2018-12-07 01:31:26 +08:00
2020-04-16 19:56:57 +08:00
config ARM64_WORKAROUND_REPEAT_TLBI
bool
2022-09-30 21:19:59 +08:00
config ARM64_ERRATUM_2441007
bool "Cortex-A55: Completion of affected memory accesses might not be guaranteed by completion of a TLBI"
default y
select ARM64_WORKAROUND_REPEAT_TLBI
help
This option adds a workaround for ARM Cortex-A55 erratum #2441007.
Under very rare circumstances, affected Cortex-A55 CPUs
may not handle a race between a break-before-make sequence on one
CPU, and another CPU accessing the same page. This could allow a
store to a page that has been unmapped.
Work around this by adding the affected CPUs to the list that needs
TLB sequences to be done twice.
If unsure, say Y.
2018-11-19 19:27:28 +08:00
config ARM64_ERRATUM_1286807
bool "Cortex-A76: Modification of the translation table for a virtual address might lead to read-after-read ordering violation"
default y
select ARM64_WORKAROUND_REPEAT_TLBI
help
2019-04-29 21:21:11 +08:00
This option adds a workaround for ARM Cortex-A76 erratum 1286807.
2018-11-19 19:27:28 +08:00
On the affected Cortex-A76 cores (r0p0 to r3p0), if a virtual
address for a cacheable mapping of a location is being
accessed by a core while another core is remapping the virtual
address to a new physical page using the recommended
break-before-make sequence, then under very rare circumstances
TLBI+DSB completes before a read using the translation being
invalidated has been observed by other observers. The
workaround repeats the TLBI+DSB operation.
2019-04-29 20:03:57 +08:00
config ARM64_ERRATUM_1463225
bool "Cortex-A76: Software Step might prevent interrupt recognition"
default y
help
This option adds a workaround for Arm Cortex-A76 erratum 1463225.
On the affected Cortex-A76 cores (r0p0 to r3p1), software stepping
of a system call instruction (SVC) can prevent recognition of
subsequent interrupts when software stepping is disabled in the
exception handler of the system call and either kernel debugging
is enabled or VHE is in use.
Work around the erratum by triggering a dummy step exception
when handling a system call from a task that is being stepped
in a VHE configuration of the kernel.
If unsure, say Y.
2019-10-18 01:42:58 +08:00
config ARM64_ERRATUM_1542419
bool "Neoverse-N1: workaround mis-ordering of instruction fetches"
default y
help
This option adds a workaround for ARM Neoverse-N1 erratum
1542419.
Affected Neoverse-N1 cores could execute a stale instruction when
modified by another CPU. The workaround depends on a firmware
counterpart.
Workaround the issue by hiding the DIC feature from EL0. This
forces user-space to perform cache maintenance.
If unsure, say Y.
2020-10-29 02:28:39 +08:00
config ARM64_ERRATUM_1508412
bool "Cortex-A77: 1508412: workaround deadlock on sequence of NC/Device load and store exclusive or PAR read"
default y
help
This option adds a workaround for Arm Cortex-A77 erratum 1508412.
Affected Cortex-A77 cores (r0p0, r1p0) could deadlock on a sequence
of a store-exclusive or read of PAR_EL1 and a load with device or
non-cacheable memory attributes. The workaround depends on a firmware
counterpart.
KVM guests must also have the workaround implemented or they can
deadlock the system.
Work around the issue by inserting DMB SY barriers around PAR_EL1
register reads and warning KVM users. The DMB barrier is sufficient
to prevent a speculative PAR_EL1 read.
If unsure, say Y.
2021-10-20 00:31:40 +08:00
config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
bool
2022-01-25 23:40:40 +08:00
config ARM64_ERRATUM_2051678
bool "Cortex-A510: 2051678: disable Hardware Update of the page table dirty bit"
2022-02-01 22:48:38 +08:00
default y
2022-01-25 23:40:40 +08:00
help
This options adds the workaround for ARM Cortex-A510 erratum ARM64_ERRATUM_2051678.
2022-04-14 10:37:18 +08:00
Affected Cortex-A510 might not respect the ordering rules for
2022-01-25 23:40:40 +08:00
hardware update of the page table's dirty bit. The workaround
is to not enable the feature on affected CPUs.
If unsure, say Y.
2022-01-27 20:20:52 +08:00
config ARM64_ERRATUM_2077057
bool "Cortex-A510: 2077057: workaround software-step corrupting SPSR_EL2"
2022-02-26 02:46:58 +08:00
default y
2022-01-27 20:20:52 +08:00
help
This option adds the workaround for ARM Cortex-A510 erratum 2077057.
Affected Cortex-A510 may corrupt SPSR_EL2 when the a step exception is
expected, but a Pointer Authentication trap is taken instead. The
erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
This can only happen when EL2 is stepping EL1.
When these conditions occur, the SPSR_EL2 value is unchanged from the
previous guest entry, and can be restored from the in-memory copy.
If unsure, say Y.
2022-09-10 00:59:38 +08:00
config ARM64_ERRATUM_2658417
bool "Cortex-A510: 2658417: remove BF16 support due to incorrect result"
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 2658417.
Affected Cortex-A510 (r0p0 to r1p1) may produce the wrong result for
BFMMLA or VMMLA instructions in rare circumstances when a pair of
A510 CPUs are using shared neon hardware. As the sharing is not
discoverable by the kernel, hide the BF16 HWCAP to indicate that
user-space should not be using these instructions.
If unsure, say Y.
2021-10-20 00:31:40 +08:00
config ARM64_ERRATUM_2119858
2022-01-24 11:15:38 +08:00
bool "Cortex-A710/X2: 2119858: workaround TRBE overwriting trace data in FILL mode"
2021-10-20 00:31:40 +08:00
default y
depends on CORESIGHT_TRBE
select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
help
2022-01-24 11:15:38 +08:00
This option adds the workaround for ARM Cortex-A710/X2 erratum 2119858.
2021-10-20 00:31:40 +08:00
2022-01-24 11:15:38 +08:00
Affected Cortex-A710/X2 cores could overwrite up to 3 cache lines of trace
2021-10-20 00:31:40 +08:00
data at the base of the buffer (pointed to by TRBASER_EL1) in FILL mode in
the event of a WRAP event.
Work around the issue by always making sure we move the TRBPTR_EL1 by
256 bytes before enabling the buffer and filling the first 256 bytes of
the buffer with ETM ignore packets upon disabling.
If unsure, say Y.
config ARM64_ERRATUM_2139208
bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
default y
depends on CORESIGHT_TRBE
select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
help
This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
Affected Neoverse-N2 cores could overwrite up to 3 cache lines of trace
data at the base of the buffer (pointed to by TRBASER_EL1) in FILL mode in
the event of a WRAP event.
Work around the issue by always making sure we move the TRBPTR_EL1 by
256 bytes before enabling the buffer and filling the first 256 bytes of
the buffer with ETM ignore packets upon disabling.
If unsure, say Y.
2021-10-20 00:31:41 +08:00
config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
bool
config ARM64_ERRATUM_2054223
bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
default y
select ARM64_WORKAROUND_TSB_FLUSH_FAILURE
help
Enable workaround for ARM Cortex-A710 erratum 2054223
Affected cores may fail to flush the trace data on a TSB instruction, when
the PE is in trace prohibited state. This will cause losing a few bytes
of the trace cached.
Workaround is to issue two TSB consecutively on affected cores.
If unsure, say Y.
config ARM64_ERRATUM_2067961
bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
default y
select ARM64_WORKAROUND_TSB_FLUSH_FAILURE
help
Enable workaround for ARM Neoverse-N2 erratum 2067961
Affected cores may fail to flush the trace data on a TSB instruction, when
the PE is in trace prohibited state. This will cause losing a few bytes
of the trace cached.
Workaround is to issue two TSB consecutively on affected cores.
If unsure, say Y.
2021-10-20 00:31:42 +08:00
config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
bool
config ARM64_ERRATUM_2253138
bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
depends on CORESIGHT_TRBE
default y
select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
help
This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
for TRBE. Under some conditions, the TRBE might generate a write to the next
virtually addressed page following the last page of the TRBE address space
(i.e., the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
Work around this in the driver by always making sure that there is a
page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
If unsure, say Y.
config ARM64_ERRATUM_2224489
2022-01-24 11:15:38 +08:00
bool "Cortex-A710/X2: 2224489: workaround TRBE writing to address out-of-range"
2021-10-20 00:31:42 +08:00
depends on CORESIGHT_TRBE
default y
select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
help
2022-01-24 11:15:38 +08:00
This option adds the workaround for ARM Cortex-A710/X2 erratum 2224489.
2021-10-20 00:31:42 +08:00
2022-01-24 11:15:38 +08:00
Affected Cortex-A710/X2 cores might write to an out-of-range address, not reserved
2021-10-20 00:31:42 +08:00
for TRBE. Under some conditions, the TRBE might generate a write to the next
virtually addressed page following the last page of the TRBE address space
(i.e., the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
Work around this in the driver by always making sure that there is a
page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
If unsure, say Y.
2022-07-04 23:57:32 +08:00
config ARM64_ERRATUM_2441009
bool "Cortex-A510: Completion of affected memory accesses might not be guaranteed by completion of a TLBI"
default y
select ARM64_WORKAROUND_REPEAT_TLBI
help
This option adds a workaround for ARM Cortex-A510 erratum #2441009.
Under very rare circumstances, affected Cortex-A510 CPUs
may not handle a race between a break-before-make sequence on one
CPU, and another CPU accessing the same page. This could allow a
store to a page that has been unmapped.
Work around this by adding the affected CPUs to the list that needs
TLB sequences to be done twice.
If unsure, say Y.
2022-01-25 22:20:32 +08:00
config ARM64_ERRATUM_2064142
bool "Cortex-A510: 2064142: workaround TRBE register writes while disabled"
2022-01-25 22:20:35 +08:00
depends on CORESIGHT_TRBE
2022-01-25 22:20:32 +08:00
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 2064142.
Affected Cortex-A510 core might fail to write into system registers after the
TRBE has been disabled. Under some conditions after the TRBE has been disabled
writes into TRBE registers TRBLIMITR_EL1, TRBPTR_EL1, TRBBASER_EL1, TRBSR_EL1,
and TRBTRG_EL1 will be ignored and will not be effected.
Work around this in the driver by executing TSB CSYNC and DSB after collection
is stopped and before performing a system register write to one of the affected
registers.
If unsure, say Y.
2022-01-25 22:20:33 +08:00
config ARM64_ERRATUM_2038923
bool "Cortex-A510: 2038923: workaround TRBE corruption with enable"
2022-01-25 22:20:36 +08:00
depends on CORESIGHT_TRBE
2022-01-25 22:20:33 +08:00
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 2038923.
Affected Cortex-A510 core might cause an inconsistent view on whether trace is
prohibited within the CPU. As a result, the trace buffer or trace buffer state
might be corrupted. This happens after TRBE buffer has been enabled by setting
TRBLIMITR_EL1.E, followed by just a single context synchronization event before
execution changes from a context, in which trace is prohibited to one where it
isn't, or vice versa. In these mentioned conditions, the view of whether trace
is prohibited is inconsistent between parts of the CPU, and the trace buffer or
the trace buffer state might be corrupted.
Work around this in the driver by preventing an inconsistent view of whether the
trace is prohibited or not based on TRBLIMITR_EL1.E by immediately following a
change to TRBLIMITR_EL1.E with at least one ISB instruction before an ERET, or
two ISB instructions if no ERET is to take place.
If unsure, say Y.
2022-01-25 22:20:34 +08:00
config ARM64_ERRATUM_1902691
bool "Cortex-A510: 1902691: workaround TRBE trace corruption"
2022-01-25 22:20:37 +08:00
depends on CORESIGHT_TRBE
2022-01-25 22:20:34 +08:00
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 1902691.
Affected Cortex-A510 core might cause trace data corruption, when being written
into the memory. Effectively TRBE is broken and hence cannot be used to capture
trace data.
Work around this problem in the driver by just preventing TRBE initialization on
affected cpus. The firmware must have disabled the access to TRBE for the kernel
on such implementations. This will cover the kernel for any firmware that doesn't
do this already.
If unsure, say Y.
2022-08-19 18:30:50 +08:00
config ARM64_ERRATUM_2457168
bool "Cortex-A510: 2457168: workaround for AMEVCNTR01 incrementing incorrectly"
depends on ARM64_AMU_EXTN
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 2457168.
The AMU counter AMEVCNTR01 (constant counter) should increment at the same rate
as the system counter. On affected Cortex-A510 cores AMEVCNTR01 increments
incorrectly giving a significantly higher output value.
Work around this problem by returning 0 when reading the affected counter in
key locations that results in disabling all users of this counter. This effect
is the same to firmware disabling affected counters.
If unsure, say Y.
2023-01-02 14:16:51 +08:00
config ARM64_ERRATUM_2645198
bool "Cortex-A715: 2645198: Workaround possible [ESR|FAR]_ELx corruption"
default y
help
This option adds the workaround for ARM Cortex-A715 erratum 2645198.
If a Cortex-A715 cpu sees a page mapping permissions change from executable
to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers on the
next instruction abort caused by permission fault.
Only user-space does executable to non-executable permission transition via
mprotect() system call. Workaround the problem by doing a break-before-make
TLB invalidation, for all changes to executable user space mappings.
If unsure, say Y.
2015-09-22 04:58:38 +08:00
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
help
2019-04-29 21:21:11 +08:00
Enable workaround for errata 22375 and 24313.
2015-09-22 04:58:38 +08:00
This implements two gicv3-its errata workarounds for ThunderX. Both
2019-04-29 21:21:11 +08:00
with a small impact affecting only ITS table allocation.
2015-09-22 04:58:38 +08:00
erratum 22375: only alloc 8MB table size
erratum 24313: ignore memory access type
The fixes are in ITS initialization and basically ignore memory access
type and table size provided by the TYPER and BASER registers.
If unsure, say Y.
2016-05-25 21:29:20 +08:00
config CAVIUM_ERRATUM_23144
bool "Cavium erratum 23144: ITS SYNC hang on dual socket system"
depends on NUMA
default y
help
ITS SYNC command hang for cross node io and collections/cpu mapping.
If unsure, say Y.
2015-09-22 04:58:35 +08:00
config CAVIUM_ERRATUM_23154
2022-03-07 22:30:14 +08:00
bool "Cavium errata 23154 and 38545: GICv3 lacks HW synchronisation"
2015-09-22 04:58:35 +08:00
default y
help
2022-03-07 22:30:14 +08:00
The ThunderX GICv3 implementation requires a modified version for
2015-09-22 04:58:35 +08:00
reading the IAR status to ensure data synchronization
(access to icc_iar1_el1 is not sync'ed before and after).
2022-03-07 22:30:14 +08:00
It also suffers from erratum 38545 (also present on Marvell's
OcteonTX and OcteonTX2), resulting in deactivated interrupts being
spuriously presented to the CPU interface.
2015-09-22 04:58:35 +08:00
If unsure, say Y.
2016-02-25 09:44:57 +08:00
config CAVIUM_ERRATUM_27456
bool "Cavium erratum 27456: Broadcast TLBI instructions may cause icache corruption"
default y
help
On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI
instructions may cause the icache to become corrupted if it
contains data for a non-current ASID. The fix is to
invalidate the icache when changing the mm context.
If unsure, say Y.
2017-06-09 19:49:48 +08:00
config CAVIUM_ERRATUM_30115
bool "Cavium erratum 30115: Guest may disable interrupts in host"
default y
help
On ThunderX T88 pass 1.x through 2.2, T81 pass 1.0 through
1.2, and T83 Pass 1.0, KVM guest execution may disable
interrupts in host. Trapping both GICv3 group-0 and group-1
accesses sidesteps the issue.
If unsure, say Y.
2019-09-13 17:57:50 +08:00
config CAVIUM_TX2_ERRATUM_219
bool "Cavium ThunderX2 erratum 219: PRFM between TTBR change and ISB fails"
default y
help
On Cavium ThunderX2, a load, store or prefetch instruction between a
TTBR update and the corresponding context synchronizing operation can
cause a spurious Data Abort to be delivered to any hardware thread in
the CPU core.
Work around the issue by avoiding the problematic code sequence and
trapping KVM guest TTBRx_EL1 writes to EL2 when SMT is enabled. The
trap handler performs the corresponding register access, skips the
instruction and ensures context synchronization by virtue of the
exception return.
If unsure, say Y.
2020-04-16 19:56:57 +08:00
config FUJITSU_ERRATUM_010001
bool "Fujitsu-A64FX erratum E#010001: Undefined fault may occur wrongly"
default y
help
This option adds a workaround for Fujitsu-A64FX erratum E#010001.
On some variants of the Fujitsu-A64FX cores ver(1.0, 1.1), memory
accesses may cause undefined fault (Data abort, DFSC=0b111111).
This fault occurs under a specific hardware condition when a
load/store instruction performs an address translation using:
case-1 TTBR0_EL1 with TCR_EL1.NFD0 == 1.
case-2 TTBR0_EL2 with TCR_EL2.NFD0 == 1.
case-3 TTBR1_EL1 with TCR_EL1.NFD1 == 1.
case-4 TTBR1_EL2 with TCR_EL2.NFD1 == 1.
The workaround is to ensure these bits are clear in TCR_ELx.
The workaround only affects the Fujitsu-A64FX.
If unsure, say Y.
config HISILICON_ERRATUM_161600802
bool "Hip07 161600802: Erroneous redistributor VLPI base"
default y
help
The HiSilicon Hip07 SoC uses the wrong redistributor base
when issued ITS commands such as VMOVP and VMAPP, and requires
a 128kB offset to be applied to the target address in this commands.
If unsure, say Y.
2017-02-09 04:08:37 +08:00
config QCOM_FALKOR_ERRATUM_1003
bool "Falkor E1003: Incorrect translation due to ASID change"
default y
help
On Falkor v1, an incorrect ASID may be cached in the TLB when ASID
2017-11-14 22:29:19 +08:00
and BADDR are changed together in TTBRx_EL1. Since we keep the ASID
in TTBR1_EL1, this situation only occurs in the entry trampoline and
then only for entries in the walk cache, since the leaf translation
is unchanged. Work around the erratum by invalidating the walk cache
entries for the trampoline before entering the kernel proper.
2017-02-09 04:08:37 +08:00
2017-02-01 01:50:19 +08:00
config QCOM_FALKOR_ERRATUM_1009
bool "Falkor E1009: Prematurely complete a DSB after a TLBI"
default y
2018-11-19 19:27:28 +08:00
select ARM64_WORKAROUND_REPEAT_TLBI
2017-02-01 01:50:19 +08:00
help
On Falkor v1, the CPU may prematurely complete a DSB following a
TLBI xxIS invalidate maintenance operation. Repeat the TLBI operation
one more time to fix the issue.
If unsure, say Y.
2017-03-07 22:20:38 +08:00
config QCOM_QDF2400_ERRATUM_0065
bool "QDF2400 E0065: Incorrect GITS_TYPER.ITT_Entry_size"
default y
help
On Qualcomm Datacenter Technologies QDF2400 SoC, ITS hardware reports
ITE size incorrectly. The GITS_TYPER.ITT_Entry_size field should have
been indicated as 16Bytes (0xf), not 8Bytes (0x7).
If unsure, say Y.
2017-12-12 06:42:32 +08:00
config QCOM_FALKOR_ERRATUM_E1041
bool "Falkor E1041: Speculative instruction fetches might cause errant memory access"
default y
help
Falkor CPU may speculatively fetch instructions from an improper
memory location when MMU translation is changed from SCTLR_ELn[M]=1
to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem.
If unsure, say Y.
2021-03-24 08:28:09 +08:00
config NVIDIA_CARMEL_CNP_ERRATUM
bool "NVIDIA Carmel CNP: CNP on Carmel semantically different than ARM cores"
default y
help
If CNP is enabled on Carmel cores, non-sharable TLBIs on a core will not
invalidate shared TLB entries installed by a different core, as it would
on standard ARM cores.
If unsure, say Y.
2020-04-16 19:56:57 +08:00
config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
2019-02-27 02:43:41 +08:00
default y
help
2020-04-16 19:56:57 +08:00
Socionext Synquacer SoCs implement a separate h/w block to generate
MSI doorbell writes with non-zero values for the device ID.
2019-02-27 02:43:41 +08:00
If unsure, say Y.
2022-05-17 22:16:47 +08:00
endmenu # "ARM errata workarounds via the alternatives framework"
2014-11-14 23:54:12 +08:00
2014-05-12 17:40:38 +08:00
choice
prompt "Page size"
default ARM64_4K_PAGES
help
Page size (translation granule) configuration.
config ARM64_4K_PAGES
bool "4KB"
help
This feature enables 4KB pages support.
2015-10-19 21:19:37 +08:00
config ARM64_16K_PAGES
bool "16KB"
help
The system will use 16KB pages support. AArch32 emulation
requires applications compiled with 16K (or a multiple of 16K)
aligned segments.
2012-04-20 21:45:54 +08:00
config ARM64_64K_PAGES
2014-05-12 17:40:38 +08:00
bool "64KB"
2012-04-20 21:45:54 +08:00
help
This feature enables 64KB pages support (4KB by default)
allowing only two levels of page tables and faster TLB
2015-10-19 21:19:34 +08:00
look-up. AArch32 emulation requires applications compiled
with 64K aligned segments.
2012-04-20 21:45:54 +08:00
2014-05-12 17:40:38 +08:00
endchoice
choice
prompt "Virtual address space size"
default ARM64_VA_BITS_39 if ARM64_4K_PAGES
2015-10-19 21:19:37 +08:00
default ARM64_VA_BITS_47 if ARM64_16K_PAGES
2014-05-12 17:40:38 +08:00
default ARM64_VA_BITS_42 if ARM64_64K_PAGES
help
Allows choosing one of multiple possible virtual address
space sizes. The level of translation table is determined by
a combination of page size and virtual address space size.
2015-10-19 21:19:38 +08:00
config ARM64_VA_BITS_36
2015-10-20 21:59:20 +08:00
bool "36-bit" if EXPERT
2015-10-19 21:19:38 +08:00
depends on ARM64_16K_PAGES
2014-05-12 17:40:38 +08:00
config ARM64_VA_BITS_39
bool "39-bit"
depends on ARM64_4K_PAGES
config ARM64_VA_BITS_42
bool "42-bit"
depends on ARM64_64K_PAGES
2015-10-19 21:19:37 +08:00
config ARM64_VA_BITS_47
bool "47-bit"
depends on ARM64_16K_PAGES
2014-05-12 17:40:51 +08:00
config ARM64_VA_BITS_48
bool "48-bit"
2019-08-07 23:55:22 +08:00
config ARM64_VA_BITS_52
bool "52-bit"
2018-12-10 22:15:15 +08:00
depends on ARM64_64K_PAGES && (ARM64_PAN || !ARM64_SW_TTBR0_PAN)
help
Enable 52-bit virtual addressing for userspace when explicitly
2019-08-07 23:55:22 +08:00
requested via a hint to mmap(). The kernel will also use 52-bit
virtual addresses for its own mappings (provided HW support for
this feature is available, otherwise it reverts to 48-bit).
2018-12-10 22:15:15 +08:00
NOTE: Enabling 52-bit virtual addressing in conjunction with
ARMv8.3 Pointer Authentication will result in the PAC being
reduced from 7 bits to 3 bits, which may have a significant
impact on its susceptibility to brute-force attacks.
If unsure, select 48-bit virtual addressing instead.
2014-05-12 17:40:38 +08:00
endchoice
2018-12-10 22:15:15 +08:00
config ARM64_FORCE_52BIT
bool "Force 52-bit virtual addresses for userspace"
2019-08-07 23:55:22 +08:00
depends on ARM64_VA_BITS_52 && EXPERT
2018-12-10 22:15:15 +08:00
help
For systems with 52-bit userspace VAs enabled, the kernel will attempt
to maintain compatibility with older software by providing 48-bit VAs
unless a hint is supplied to mmap.
This configuration option disables the 48-bit compatibility logic, and
forces all userspace addresses to be 52-bit on HW that supports it. One
should only enable this configuration option for stress testing userspace
memory management code. If unsure say N here.
2014-05-12 17:40:38 +08:00
config ARM64_VA_BITS
int
2015-10-19 21:19:38 +08:00
default 36 if ARM64_VA_BITS_36
2014-05-12 17:40:38 +08:00
default 39 if ARM64_VA_BITS_39
default 42 if ARM64_VA_BITS_42
2015-10-19 21:19:37 +08:00
default 47 if ARM64_VA_BITS_47
2019-08-07 23:55:22 +08:00
default 48 if ARM64_VA_BITS_48
default 52 if ARM64_VA_BITS_52
2014-05-12 17:40:38 +08:00
2017-12-14 01:07:16 +08:00
choice
prompt "Physical address space size"
default ARM64_PA_BITS_48
help
Choose the maximum physical address range that the kernel will
support.
config ARM64_PA_BITS_48
bool "48-bit"
2017-12-14 01:07:25 +08:00
config ARM64_PA_BITS_52
bool "52-bit (ARMv8.2)"
depends on ARM64_64K_PAGES
depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
help
Enable support for a 52-bit physical address space, introduced as
part of the ARMv8.2-LPA extension.
With this enabled, the kernel will also continue to work on CPUs that
do not support ARMv8.2-LPA, but with some added memory overhead (and
minor performance overhead).
2017-12-14 01:07:16 +08:00
endchoice
config ARM64_PA_BITS
int
default 48 if ARM64_PA_BITS_48
2017-12-14 01:07:25 +08:00
default 52 if ARM64_PA_BITS_52
2017-12-14 01:07:16 +08:00
2019-11-13 17:26:52 +08:00
choice
prompt "Endianness"
default CPU_LITTLE_ENDIAN
help
Select the endianness of data accesses performed by the CPU. Userspace
applications will need to be compiled and linked for the endianness
that is selected here.
2013-10-11 21:52:19 +08:00
config CPU_BIG_ENDIAN
2021-02-09 08:57:20 +08:00
bool "Build big-endian kernel"
depends on !LD_IS_LLD || LLD_VERSION >= 130000
help
2019-11-13 17:26:52 +08:00
Say Y if you plan on running a kernel with a big-endian userspace.
config CPU_LITTLE_ENDIAN
bool "Build little-endian kernel"
help
Say Y if you plan on running a kernel with a little-endian userspace.
This is usually the case for distributions targeting arm64.
endchoice
2013-10-11 21:52:19 +08:00
2014-03-04 15:51:17 +08:00
config SCHED_MC
bool "Multi-core scheduler support"
help
Multi-core scheduler support improves the CPU scheduler's decision
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places. If unsure say N here.
sched: Add cluster scheduler level in core and related Kconfig for ARM64
This patch adds scheduler level for clusters and automatically enables
the load balance among clusters. It will directly benefit a lot of
workload which loves more resources such as memory bandwidth, caches.
Testing has widely been done in two different hardware configurations of
Kunpeng920:
24 cores in one NUMA(6 clusters in each NUMA node);
32 cores in one NUMA(8 clusters in each NUMA node)
Workload is running on either one NUMA node or four NUMA nodes, thus,
this can estimate the effect of cluster spreading w/ and w/o NUMA load
balance.
* Stream benchmark:
4threads stream (on 1NUMA * 24cores = 24cores)
stream stream
w/o patch w/ patch
MB/sec copy 29929.64 ( 0.00%) 32932.68 ( 10.03%)
MB/sec scale 29861.10 ( 0.00%) 32710.58 ( 9.54%)
MB/sec add 27034.42 ( 0.00%) 32400.68 ( 19.85%)
MB/sec triad 27225.26 ( 0.00%) 31965.36 ( 17.41%)
6threads stream (on 1NUMA * 24cores = 24cores)
stream stream
w/o patch w/ patch
MB/sec copy 40330.24 ( 0.00%) 42377.68 ( 5.08%)
MB/sec scale 40196.42 ( 0.00%) 42197.90 ( 4.98%)
MB/sec add 37427.00 ( 0.00%) 41960.78 ( 12.11%)
MB/sec triad 37841.36 ( 0.00%) 42513.64 ( 12.35%)
12threads stream (on 1NUMA * 24cores = 24cores)
stream stream
w/o patch w/ patch
MB/sec copy 52639.82 ( 0.00%) 53818.04 ( 2.24%)
MB/sec scale 52350.30 ( 0.00%) 53253.38 ( 1.73%)
MB/sec add 53607.68 ( 0.00%) 55198.82 ( 2.97%)
MB/sec triad 54776.66 ( 0.00%) 56360.40 ( 2.89%)
Thus, it could help memory-bound workload especially under medium load.
Similar improvement is also seen in lkp-pbzip2:
* lkp-pbzip2 benchmark
2-96 threads (on 4NUMA * 24cores = 96cores)
lkp-pbzip2 lkp-pbzip2
w/o patch w/ patch
Hmean tput-2 11062841.57 ( 0.00%) 11341817.51 * 2.52%*
Hmean tput-5 26815503.70 ( 0.00%) 27412872.65 * 2.23%*
Hmean tput-8 41873782.21 ( 0.00%) 43326212.92 * 3.47%*
Hmean tput-12 61875980.48 ( 0.00%) 64578337.51 * 4.37%*
Hmean tput-21 105814963.07 ( 0.00%) 111381851.01 * 5.26%*
Hmean tput-30 150349470.98 ( 0.00%) 156507070.73 * 4.10%*
Hmean tput-48 237195937.69 ( 0.00%) 242353597.17 * 2.17%*
Hmean tput-79 360252509.37 ( 0.00%) 362635169.23 * 0.66%*
Hmean tput-96 394571737.90 ( 0.00%) 400952978.48 * 1.62%*
2-24 threads (on 1NUMA * 24cores = 24cores)
lkp-pbzip2 lkp-pbzip2
w/o patch w/ patch
Hmean tput-2 11071705.49 ( 0.00%) 11296869.10 * 2.03%*
Hmean tput-4 20782165.19 ( 0.00%) 21949232.15 * 5.62%*
Hmean tput-6 30489565.14 ( 0.00%) 33023026.96 * 8.31%*
Hmean tput-8 40376495.80 ( 0.00%) 42779286.27 * 5.95%*
Hmean tput-12 61264033.85 ( 0.00%) 62995632.78 * 2.83%*
Hmean tput-18 86697139.39 ( 0.00%) 86461545.74 ( -0.27%)
Hmean tput-24 104854637.04 ( 0.00%) 104522649.46 * -0.32%*
In the case of 6 threads and 8 threads, we see the greatest performance
improvement.
Similar improvement can be seen on lkp-pixz though the improvement is
smaller:
* lkp-pixz benchmark
2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
lkp-pixz lkp-pixz
w/o patch w/ patch
Hmean tput-2 6486981.16 ( 0.00%) 6561515.98 * 1.15%*
Hmean tput-4 11645766.38 ( 0.00%) 11614628.43 ( -0.27%)
Hmean tput-6 15429943.96 ( 0.00%) 15957350.76 * 3.42%*
Hmean tput-8 19974087.63 ( 0.00%) 20413746.98 * 2.20%*
Hmean tput-12 28172068.18 ( 0.00%) 28751997.06 * 2.06%*
Hmean tput-18 39413409.54 ( 0.00%) 39896830.55 * 1.23%*
Hmean tput-24 49101815.85 ( 0.00%) 49418141.47 * 0.64%*
* SPECrate benchmark
4,8,16 copies mcf_r(on 1NUMA * 32cores = 32cores)
Base Base
Run Time Rate
------- ---------
4 Copies w/o 580 (w/ 570) w/o 11.1 (w/ 11.3)
8 Copies w/o 647 (w/ 605) w/o 20.0 (w/ 21.4, +7%)
16 Copies w/o 844 (w/ 844) w/o 30.6 (w/ 30.6)
32 Copies(on 4NUMA * 32 cores = 128cores)
[w/o patch]
Base Base Base
Benchmarks Copies Run Time Rate
--------------- ------- --------- ---------
500.perlbench_r 32 584 87.2 *
502.gcc_r 32 503 90.2 *
505.mcf_r 32 745 69.4 *
520.omnetpp_r 32 1031 40.7 *
523.xalancbmk_r 32 597 56.6 *
525.x264_r 1 -- CE
531.deepsjeng_r 32 336 109 *
541.leela_r 32 556 95.4 *
548.exchange2_r 32 513 163 *
557.xz_r 32 530 65.2 *
Est. SPECrate2017_int_base 80.3
[w/ patch]
Base Base Base
Benchmarks Copies Run Time Rate
--------------- ------- --------- ---------
500.perlbench_r 32 580 87.8 (+0.688%) *
502.gcc_r 32 477 95.1 (+5.432%) *
505.mcf_r 32 644 80.3 (+13.574%) *
520.omnetpp_r 32 942 44.6 (+9.58%) *
523.xalancbmk_r 32 560 60.4 (+6.714%%) *
525.x264_r 1 -- CE
531.deepsjeng_r 32 337 109 (+0.000%) *
541.leela_r 32 554 95.6 (+0.210%) *
548.exchange2_r 32 515 163 (+0.000%) *
557.xz_r 32 524 66.0 (+1.227%) *
Est. SPECrate2017_int_base 83.7 (+4.062%)
On the other hand, it is slightly helpful to CPU-bound tasks like
kernbench:
* 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
kernbench kernbench
w/o cluster w/ cluster
Min user-24 12054.67 ( 0.00%) 12024.19 ( 0.25%)
Min syst-24 1751.51 ( 0.00%) 1731.68 ( 1.13%)
Min elsp-24 600.46 ( 0.00%) 598.64 ( 0.30%)
Min user-48 12361.93 ( 0.00%) 12315.32 ( 0.38%)
Min syst-48 1917.66 ( 0.00%) 1892.73 ( 1.30%)
Min elsp-48 333.96 ( 0.00%) 332.57 ( 0.42%)
Min user-96 12922.40 ( 0.00%) 12921.17 ( 0.01%)
Min syst-96 2143.94 ( 0.00%) 2110.39 ( 1.56%)
Min elsp-96 211.22 ( 0.00%) 210.47 ( 0.36%)
Amean user-24 12063.99 ( 0.00%) 12030.78 * 0.28%*
Amean syst-24 1755.20 ( 0.00%) 1735.53 * 1.12%*
Amean elsp-24 601.60 ( 0.00%) 600.19 ( 0.23%)
Amean user-48 12362.62 ( 0.00%) 12315.56 * 0.38%*
Amean syst-48 1921.59 ( 0.00%) 1894.95 * 1.39%*
Amean elsp-48 334.10 ( 0.00%) 332.82 * 0.38%*
Amean user-96 12925.27 ( 0.00%) 12922.63 ( 0.02%)
Amean syst-96 2146.66 ( 0.00%) 2122.20 * 1.14%*
Amean elsp-96 211.96 ( 0.00%) 211.79 ( 0.08%)
Note this patch isn't an universal win, it might hurt those workload
which can benefit from packing. Though tasks which want to take
advantages of lower communication latency of one cluster won't
necessarily been packed in one cluster while kernel is not aware of
clusters, they have some chance to be randomly packed. But this
patch will make them more likely spread.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-09-24 16:51:03 +08:00
config SCHED_CLUSTER
bool "Cluster scheduler support"
help
Cluster scheduler support improves the CPU scheduler's decision
making when dealing with machines that have clusters of CPUs.
Cluster usually means a couple of CPUs which are placed closely
by sharing mid-level caches, last-level cache tags or internal
busses.
2014-03-04 15:51:17 +08:00
config SCHED_SMT
bool "SMT scheduler support"
help
Improves the CPU scheduler's decision making when dealing with
MultiThreading at a cost of slightly increased overhead in some
places. If unsure say N here.
2012-04-20 21:45:54 +08:00
config NR_CPUS
2015-03-18 19:01:18 +08:00
int "Maximum number of CPUs (2-4096)"
range 2 4096
2019-01-14 19:41:25 +08:00
default "256"
2012-04-20 21:45:54 +08:00
2013-10-25 03:30:18 +08:00
config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
2015-09-24 17:32:14 +08:00
select GENERIC_IRQ_MIGRATION
2013-10-25 03:30:18 +08:00
help
Say Y here to experiment with turning CPUs off and on. CPUs
can be controlled through /sys/devices/system/cpu.
2016-04-09 06:50:27 +08:00
# Common NUMA Features
config NUMA
2020-02-01 09:51:06 +08:00
bool "NUMA Memory Allocation and Scheduler Support"
2020-11-19 08:38:26 +08:00
select GENERIC_ARCH_NUMA
2016-09-26 15:36:50 +08:00
select ACPI_NUMA if ACPI
select OF_NUMA
mm: percpu: generalize percpu related config
Patch series "mm: percpu: Cleanup percpu first chunk function".
When supporting page mapping percpu first chunk allocator on arm64, we
found there are lots of duplicated codes in percpu embed/page first chunk
allocator. This patchset is aimed to cleanup them and should no function
change.
The currently supported status about 'embed' and 'page' in Archs shows
below,
embed: NEED_PER_CPU_PAGE_FIRST_CHUNK
page: NEED_PER_CPU_EMBED_FIRST_CHUNK
embed page
------------------------
arm64 Y Y
mips Y N
powerpc Y Y
riscv Y N
sparc Y Y
x86 Y Y
------------------------
There are two interfaces about percpu first chunk allocator,
extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
extern int __init pcpu_page_first_chunk(size_t reserved_size,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn,
- pcpu_fc_populate_pte_fn_t populate_pte_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t is killed, we provide generic
pcpu_fc_alloc() and pcpu_fc_free() function, which are called in the
pcpu_embed/page_first_chunk().
1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t is needed to be
provided when archs supported NUMA.
2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is killed too,
a generic pcpu_populate_pte() which marked '__weak' is provided, if you
need a different function to populate pte on the arch(like x86), please
provide its own implementation.
[1] https://github.com/kevin78/linux.git percpu-cleanup
This patch (of 4):
The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs, which have
duplicate definitions on platforms that subscribe it.
Move them into mm, drop these redundant definitions and instead just
select it on applicable platforms.
Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 10:07:41 +08:00
select HAVE_SETUP_PER_CPU_AREA
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK
select USE_PERCPU_NUMA_NODE_ID
2016-04-09 06:50:27 +08:00
help
2020-02-01 09:51:06 +08:00
Enable NUMA (Non-Uniform Memory Access) support.
2016-04-09 06:50:27 +08:00
The kernel will try to allocate memory used by a CPU on the
local memory of the CPU and add some more
NUMA awareness to the kernel.
config NODES_SHIFT
int "Maximum NUMA Nodes (as a power of 2)"
range 1 10
2020-10-31 01:30:50 +08:00
default "4"
2021-06-29 10:43:01 +08:00
depends on NUMA
2016-04-09 06:50:27 +08:00
help
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.
2018-12-11 19:01:04 +08:00
source "kernel/Kconfig.hz"
2012-04-20 21:45:54 +08:00
config ARCH_SPARSEMEM_ENABLE
def_bool y
select SPARSEMEM_VMEMMAP_ENABLE
2021-04-20 17:35:59 +08:00
select SPARSEMEM_VMEMMAP
2018-07-07 01:47:24 +08:00
2012-04-20 21:45:54 +08:00
config HW_PERF_EVENTS
2015-10-02 17:55:03 +08:00
def_bool y
depends on ARM_PMU
2012-04-20 21:45:54 +08:00
arm64: Add gcc Shadow Call Stack support
Shadow call stacks will be available in GCC >= 12, this patch makes
the corresponding kernel configuration available when compiling
the kernel with the gcc.
Note that the implementation in GCC is slightly different from Clang.
With SCS enabled, functions will only pop x30 once in the epilogue,
like:
str x30, [x18], #8
stp x29, x30, [sp, #-16]!
......
- ldp x29, x30, [sp], #16 //clang
+ ldr x29, [sp], #16 //GCC
ldr x30, [x18, #-8]!
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Dan Li <ashimida@linux.alibaba.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220303074323.86282-1-ashimida@linux.alibaba.com
2022-03-03 15:43:23 +08:00
# Supported by clang >= 7.0 or GCC >= 12.0.0
2020-04-28 00:00:16 +08:00
config CC_HAVE_SHADOW_CALL_STACK
def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
2015-11-23 18:33:49 +08:00
config PARAVIRT
bool "Enable paravirtualization code"
help
This changes the kernel so it can modify itself when it is run
under a hypervisor, potentially improving performance significantly
over full virtualization.
config PARAVIRT_TIME_ACCOUNTING
bool "Paravirtual steal time accounting"
select PARAVIRT
help
Select this option to enable fine granularity task steal time
accounting. Time spent executing other tasks in parallel with
the current vCPU is discounted from the vCPU power. To account for
that, there can be a small performance impact.
If in doubt, say N here.
2016-06-24 01:54:48 +08:00
config KEXEC
depends on PM_SLEEP_SMP
select KEXEC_CORE
bool "kexec system call"
2020-06-14 00:50:22 +08:00
help
2016-06-24 01:54:48 +08:00
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.
2018-11-15 13:52:48 +08:00
config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
2021-02-22 01:49:30 +08:00
select HAVE_IMA_KEXEC if IMA
2018-11-15 13:52:48 +08:00
help
This is new version of kexec system call. This system call is
file based and takes file descriptors as system call argument
for kernel and initramfs as opposed to list of segments as
accepted by previous system call.
2019-08-20 08:17:44 +08:00
config KEXEC_SIG
2018-11-15 13:52:54 +08:00
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
help
Select this option to verify a signature with loaded kernel
image. If configured, any attempt of loading a image without
valid signature will fail.
In addition to that option, you need to enable signature
verification for the corresponding kernel image type being
loaded in order for this to work.
config KEXEC_IMAGE_VERIFY_SIG
bool "Enable Image signature verification support"
default y
2019-08-20 08:17:44 +08:00
depends on KEXEC_SIG
2018-11-15 13:52:54 +08:00
depends on EFI && SIGNED_PE_FILE_VERIFICATION
help
Enable Image signature verification support.
comment "Support for PE file signature verification disabled"
2019-08-20 08:17:44 +08:00
depends on KEXEC_SIG
2018-11-15 13:52:54 +08:00
depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
arm64: kdump: provide /proc/vmcore file
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and crash dump kernel preserves
the region for later use with reserve_elfcorehdr() at boot time.
On crash dump kernel, /proc/vmcore will access the primary kernel's memory
with copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing
it since it does not reside in linear mapping on crash dump kernel.
Meanwhile, elfcorehdr_read() is simple as the region is always mapped.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-03 10:24:38 +08:00
config CRASH_DUMP
bool "Build kdump crash kernel"
help
Generate crash dump after being started by kexec. This should
be normally only set in special crash dump kernels which are
loaded in the main kernel with kexec-tools into a specially
reserved region and then later executed after a crash by
kdump/kexec.
2019-06-14 02:21:39 +08:00
For more details see Documentation/admin-guide/kdump/kdump.rst
arm64: kdump: provide /proc/vmcore file
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and crash dump kernel preserves
the region for later use with reserve_elfcorehdr() at boot time.
On crash dump kernel, /proc/vmcore will access the primary kernel's memory
with copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing
it since it does not reside in linear mapping on crash dump kernel.
Meanwhile, elfcorehdr_read() is simple as the region is always mapped.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-03 10:24:38 +08:00
2021-01-26 03:19:08 +08:00
config TRANS_TABLE
def_bool y
2021-09-30 22:31:06 +08:00
depends on HIBERNATION || KEXEC_CORE
2021-01-26 03:19:08 +08:00
2013-06-04 01:05:43 +08:00
config XEN_DOM0
def_bool y
depends on XEN
config XEN
2014-09-18 05:07:06 +08:00
bool "Xen guest support on ARM64"
2013-06-04 01:05:43 +08:00
depends on ARM64 && OF
2013-10-10 21:40:44 +08:00
select SWIOTLB_XEN
2015-11-23 18:33:49 +08:00
select PARAVIRT
2013-06-04 01:05:43 +08:00
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
2023-01-04 21:00:00 +08:00
# include/linux/mmzone.h requires the following to be true:
#
2023-03-15 19:31:33 +08:00
# MAX_ORDER + PAGE_SHIFT <= SECTION_SIZE_BITS
2023-01-04 21:00:00 +08:00
#
2023-03-15 19:31:33 +08:00
# so the maximum value of MAX_ORDER is SECTION_SIZE_BITS - PAGE_SHIFT:
2023-01-04 21:00:00 +08:00
#
# | SECTION_SIZE_BITS | PAGE_SHIFT | max MAX_ORDER | default MAX_ORDER |
# ----+-------------------+--------------+-----------------+--------------------+
2023-03-15 19:31:33 +08:00
# 4K | 27 | 12 | 15 | 10 |
# 16K | 27 | 14 | 13 | 11 |
# 64K | 29 | 16 | 13 | 13 |
2022-08-15 22:39:59 +08:00
config ARCH_FORCE_MAX_ORDER
2023-03-24 13:22:21 +08:00
int "Maximum zone order" if EXPERT && (ARM64_4K_PAGES || ARM64_16K_PAGES)
2023-03-15 19:31:33 +08:00
default "13" if ARM64_64K_PAGES
default "11" if ARM64_16K_PAGES
default "10"
2015-10-19 21:19:37 +08:00
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
pages. This option selects the largest power of two that the kernel
keeps in the memory allocator. If you need to allocate very large
blocks of physically contiguous memory, then you may need to
increase this value.
2023-01-25 02:16:05 +08:00
We make sure that we can allocate up to a HugePage size for each configuration.
2015-10-19 21:19:37 +08:00
Hence we have :
2023-03-15 19:31:33 +08:00
MAX_ORDER = PMD_SHIFT - PAGE_SHIFT => PAGE_SHIFT - 3
2015-10-19 21:19:37 +08:00
2023-03-15 19:31:33 +08:00
However for 4K, we choose a higher default value, 10 as opposed to 9, giving us
2015-10-19 21:19:37 +08:00
4M allocations matching the default size used by generic code.
2013-04-25 22:19:21 +08:00
2017-11-14 22:41:01 +08:00
config UNMAP_KERNEL_AT_EL0
2017-11-15 00:19:39 +08:00
bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
2017-11-14 22:41:01 +08:00
default y
help
2017-11-15 00:19:39 +08:00
Speculation attacks against some high-performance processors can
be used to bypass MMU permission checks and leak kernel data to
userspace. This can be defended against by unmapping the kernel
when running in userspace, mapping it back in on exception entry
via a trampoline page in the vector table.
2017-11-14 22:41:01 +08:00
If unsure, say Y.
arm64: Mitigate spectre style branch history side channels
Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation.
When taking an exception from user-space, a sequence of branches
or a firmware call overwrites or invalidates the branch history.
The sequence of branches is added to the vectors, and should appear
before the first indirect branch. For systems using KPTI the sequence
is added to the kpti trampoline where it has a free register as the exit
from the trampoline is via a 'ret'. For systems not using KPTI, the same
register tricks are used to free up a register in the vectors.
For the firmware call, arch-workaround-3 clobbers 4 registers, so
there is no choice but to save them to the EL1 stack. This only happens
for entry from EL0, so if we take an exception due to the stack access,
it will not become re-entrant.
For KVM, the existing branch-predictor-hardening vectors are used.
When a spectre version of these vectors is in use, the firmware call
is sufficient to mitigate against Spectre-BHB. For the non-spectre
versions, the sequence of branches is added to the indirect vector.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2021-11-10 22:48:00 +08:00
config MITIGATE_SPECTRE_BRANCH_HISTORY
bool "Mitigate Spectre style attacks against branch history" if EXPERT
default y
help
Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation.
When taking an exception from user-space, a sequence of branches
or a firmware call overwrites the branch history.
arm64: mm: apply r/o permissions of VM areas to its linear alias as well
On arm64, we use block mappings and contiguous hints to map the linear
region, to minimize the TLB footprint. However, this means that the
entire region is mapped using read/write permissions, which we cannot
modify at page granularity without having to take intrusive measures to
prevent TLB conflicts.
This means the linear aliases of pages belonging to read-only mappings
(executable or otherwise) in the vmalloc region are also mapped read/write,
and could potentially be abused to modify things like module code, bpf JIT
code or other read-only data.
So let's fix this, by extending the set_memory_ro/rw routines to take
the linear alias into account. The consequence of enabling this is
that we can no longer use block mappings or contiguous hints, so in
cases where the TLB footprint of the linear region is a bottleneck,
performance may be affected.
Therefore, allow this feature to be runtime en/disabled, by setting
rodata=full (or 'on' to disable just this enhancement, or 'off' to
disable read-only mappings for code and r/o data entirely) on the
kernel command line. Also, allow the default value to be set via a
Kconfig option.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-07 18:36:20 +08:00
config RODATA_FULL_DEFAULT_ENABLED
bool "Apply r/o permissions of VM areas also to their linear aliases"
default y
help
Apply read-only attributes of VM areas to the linear alias of
the backing pages as well. This prevents code or read-only data
from being modified (inadvertently or intentionally) via another
mapping of the same memory page. This additional enhancement can
be turned off at runtime by passing rodata=[off|on] (and turned on
with rodata=full if this option is set to 'n')
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
2019-04-23 21:37:24 +08:00
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
Enabling this option prevents the kernel from accessing
user-space memory directly by pointing TTBR0_EL1 to a reserved
zeroed area and reserved ASID. The user access routines
restore the valid TTBR0_EL1 temporarily.
2019-07-24 01:58:39 +08:00
config ARM64_TAGGED_ADDR_ABI
bool "Enable the tagged user addresses syscall ABI"
default y
help
When this option is enabled, user applications can opt in to a
relaxed ABI via prctl() allowing tagged addresses to be passed
to system calls as pointer arguments. For details, see
2019-09-18 03:52:27 +08:00
Documentation/arm64/tagged-address-abi.rst.
2019-07-24 01:58:39 +08:00
2019-04-23 21:37:24 +08:00
menuconfig COMPAT
bool "Kernel support for 32-bit EL0"
depends on ARM64_4K_PAGES || EXPERT
select HAVE_UID16
select OLD_SIGSUSPEND3
select COMPAT_OLD_SIGACTION
help
This option enables support for a 32-bit EL0 running under a 64-bit
kernel at EL1. AArch32-specific components such as system calls,
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.
If you use a page size other than 4KB (i.e, 16KB or 64KB), please be aware
that you will only be able to execute AArch32 binaries that were compiled
with page size aligned segments.
If you want to execute 32-bit userspace applications, say Y.
if COMPAT
config KUSER_HELPERS
2019-10-07 20:03:12 +08:00
bool "Enable kuser helpers page for 32-bit applications"
2019-04-23 21:37:24 +08:00
default y
help
Warning: disabling this option may break 32-bit user programs.
Provide kuser helpers to compat tasks. The kernel provides
helper code to userspace in read only form at a fixed location
to allow userspace to be independent of the CPU type fitted to
the system. This permits binaries to be run on ARMv4 through
to ARMv8 without modification.
2019-04-15 02:51:10 +08:00
See Documentation/arm/kernel_user_helpers.rst for details.
2019-04-23 21:37:24 +08:00
However, the fixed address nature of these helpers can be used
by ROP (return orientated programming) authors when creating
exploits.
If all of the binaries and libraries which run on your platform
are built specifically for your platform, and make no use of
these helpers, then you can turn this option off to hinder
such exploits. However, in that case, if a binary or library
relying on those helpers is run, it will not function correctly.
Say N here only if you are absolutely certain that you do not
need these helpers; otherwise, the safe option is to say Y.
2019-10-07 20:03:12 +08:00
config COMPAT_VDSO
bool "Enable vDSO for 32-bit applications"
2021-10-20 06:36:46 +08:00
depends on !CPU_BIG_ENDIAN
depends on (CC_IS_CLANG && LD_IS_LLD) || "$(CROSS_COMPILE_COMPAT)" != ""
2019-10-07 20:03:12 +08:00
select GENERIC_COMPAT_VDSO
default y
help
Place in the process address space of 32-bit applications an
ELF shared object providing fast implementations of gettimeofday
and clock_gettime.
You must have a 32-bit build of glibc 2.22 or later for programs
to seamlessly take advantage of this.
2019-04-23 21:37:24 +08:00
2020-06-09 04:57:08 +08:00
config THUMB2_COMPAT_VDSO
bool "Compile the 32-bit vDSO for Thumb-2 mode" if EXPERT
depends on COMPAT_VDSO
default y
help
Compile the compat vDSO with '-mthumb -fomit-frame-pointer' if y,
otherwise with '-marm'.
2022-07-01 21:53:22 +08:00
config COMPAT_ALIGNMENT_FIXUPS
bool "Fix up misaligned multi-word loads and stores in user space"
2014-11-21 00:51:10 +08:00
menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
2017-11-07 02:07:11 +08:00
depends on SYSCTL
2014-11-21 00:51:10 +08:00
help
Legacy software support may require certain instructions
that have been deprecated or obsoleted in the architecture.
Enable this config to enable selective emulation of these
features.
If unsure, say Y
if ARMV8_DEPRECATED
config SWP_EMULATION
bool "Emulate SWP/SWPB instructions"
help
ARMv8 obsoletes the use of A32 SWP/SWPB instructions such that
they are always undefined. Say Y here to enable software
emulation of these instructions for userspace using LDXR/STXR.
2020-06-25 21:15:07 +08:00
This feature can be controlled at runtime with the abi.swp
sysctl which is disabled by default.
2014-11-21 00:51:10 +08:00
In some older versions of glibc [<=2.8] SWP is used during futex
trylock() operations with the assumption that the code will not
be preempted. This invalid assumption may be more likely to fail
with SWP emulation enabled, leading to deadlock of the user
application.
NOTE: when accessing uncached shared regions, LDXR/STXR rely
on an external transaction monitoring block called a global
monitor to maintain update atomicity. If your system does not
implement a global monitor, this option can cause programs that
perform SWP operations to uncached memory to deadlock.
If unsure, say Y
config CP15_BARRIER_EMULATION
bool "Emulate CP15 Barrier instructions"
help
The CP15 barrier instructions - CP15ISB, CP15DSB, and
CP15DMB - are deprecated in ARMv8 (and ARMv7). It is
strongly recommended to use the ISB, DSB, and DMB
instructions instead.
Say Y here to enable software emulation of these
instructions for AArch32 userspace code. When this option is
enabled, CP15 barrier usage is traced which can help
2020-06-25 21:15:07 +08:00
identify software that needs updating. This feature can be
controlled at runtime with the abi.cp15_barrier sysctl.
2014-11-21 00:51:10 +08:00
If unsure, say Y
2015-01-21 20:43:11 +08:00
config SETEND_EMULATION
bool "Emulate SETEND instruction"
help
The SETEND instruction alters the data-endianness of the
AArch32 EL0, and is deprecated in ARMv8.
Say Y here to enable software emulation of the instruction
2020-06-25 21:15:07 +08:00
for AArch32 userspace code. This feature can be controlled
at runtime with the abi.setend sysctl.
2015-01-21 20:43:11 +08:00
Note: All the cpus on the system must have mixed endian support at EL0
for this feature to be enabled. If a new CPU - which doesn't support mixed
endian - is hotplugged in after this feature has been enabled, there could
be unexpected results in the applications.
If unsure, say Y
2022-05-17 22:16:47 +08:00
endif # ARMV8_DEPRECATED
2014-11-21 00:51:10 +08:00
2022-05-17 22:16:47 +08:00
endif # COMPAT
2016-07-02 01:25:31 +08:00
2015-07-27 22:54:13 +08:00
menu "ARMv8.1 architectural features"
config ARM64_HW_AFDBM
bool "Support for hardware updates of the Access and Dirty page flags"
default y
help
The ARMv8.1 architecture extensions introduce support for
hardware updates of the access and dirty information in page
table entries. When enabled in TCR_EL1 (HA and HD bits) on
capable processors, accesses to pages with PTE_AF cleared will
set this bit instead of raising an access flag fault.
Similarly, writes to read-only pages with the DBM bit set will
clear the read-only bit (AP[2]) instead of raising a
permission fault.
Kernels built with this configuration option enabled continue
to work on pre-ARMv8.1 hardware and the performance impact is
minimal. If unsure, say Y.
config ARM64_PAN
bool "Enable support for Privileged Access Never (PAN)"
default y
help
2022-05-17 22:16:47 +08:00
Privileged Access Never (PAN; part of the ARMv8.1 Extensions)
prevents the kernel or hypervisor from accessing user-space (EL0)
memory directly.
2015-07-27 22:54:13 +08:00
2022-05-17 22:16:47 +08:00
Choosing this option will cause any unprotected (not using
copy_to_user et al) memory access to fail with a permission fault.
2015-07-27 22:54:13 +08:00
2022-05-17 22:16:47 +08:00
The feature is detected at runtime, and will remain as a 'nop'
instruction if the cpu does not implement the feature.
2015-07-27 22:54:13 +08:00
2020-06-30 21:02:22 +08:00
config AS_HAS_LDAPR
def_bool $(as-instr,.arch_extension rcpc)
2021-04-10 01:37:10 +08:00
config AS_HAS_LSE_ATOMICS
def_bool $(as-instr,.arch_extension lse)
2015-07-27 22:54:13 +08:00
config ARM64_LSE_ATOMICS
2020-01-15 19:30:08 +08:00
bool
default ARM64_USE_LSE_ATOMICS
2021-04-10 01:37:10 +08:00
depends on AS_HAS_LSE_ATOMICS
2020-01-15 19:30:08 +08:00
config ARM64_USE_LSE_ATOMICS
2015-07-27 22:54:13 +08:00
bool "Atomic instructions"
2018-05-22 02:14:22 +08:00
default y
2015-07-27 22:54:13 +08:00
help
As part of the Large System Extensions, ARMv8.1 introduces new
atomic instructions that are designed specifically to scale in
very large systems.
Say Y here to make use of these instructions for the in-kernel
atomic routines. This incurs a small overhead on CPUs that do
not support these instructions and requires the kernel to be
2018-05-22 02:14:22 +08:00
built with binutils >= 2.25 in order for the new instructions
to be used.
2015-07-27 22:54:13 +08:00
2022-05-17 22:16:47 +08:00
endmenu # "ARMv8.1 architectural features"
2015-07-27 22:54:13 +08:00
2016-02-27 00:30:14 +08:00
menu "ARMv8.2 architectural features"
2021-12-13 22:02:52 +08:00
config AS_HAS_ARMV8_2
2022-05-17 22:16:47 +08:00
def_bool $(cc-option,-Wa$(comma)-march=armv8.2-a)
2021-12-13 22:02:52 +08:00
config AS_HAS_SHA3
2022-05-17 22:16:47 +08:00
def_bool $(as-instr,.arch armv8.2-a+sha3)
2021-12-13 22:02:52 +08:00
2017-07-25 18:55:42 +08:00
config ARM64_PMEM
bool "Enable support for persistent memory"
select ARCH_HAS_PMEM_API
2017-07-25 18:55:43 +08:00
select ARCH_HAS_UACCESS_FLUSHCACHE
2017-07-25 18:55:42 +08:00
help
Say Y to enable support for the persistent memory API based on the
ARMv8.2 DCPoP feature.
The feature is detected at runtime, and the kernel will use DC CVAC
operations if DC CVAP is not supported (following the behaviour of
DC CVAP itself if the system does not define a point of persistence).
2018-01-16 03:38:56 +08:00
config ARM64_RAS_EXTN
bool "Enable support for RAS CPU Extensions"
default y
help
CPUs that support the Reliability, Availability and Serviceability
(RAS) Extensions, part of ARMv8.2 are able to track faults and
errors, classify them and report them to software.
On CPUs with these extensions system software can use additional
barriers to determine if faults are pending and read the
classification from a new set of registers.
Selecting this feature will allow the kernel to use these barriers
and access the new registers if the system supports the extension.
Platform RAS features may additionally depend on firmware support.
2018-07-31 21:08:56 +08:00
config ARM64_CNP
bool "Enable support for Common Not Private (CNP) translations"
default y
depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
help
Common Not Private (CNP) allows translation table entries to
be shared between different PEs in the same inner shareable
domain, so the hardware can use this fact to optimise the
caching of such entries in the TLB.
Selecting this option allows the CNP feature to be detected
at runtime, and does not affect PEs that do not implement
this feature.
2022-05-17 22:16:47 +08:00
endmenu # "ARMv8.2 architectural features"
2016-02-27 00:30:14 +08:00
2018-12-08 02:39:30 +08:00
menu "ARMv8.3 architectural features"
config ARM64_PTR_AUTH
bool "Enable support for pointer authentication"
default y
help
Pointer authentication (part of the ARMv8.3 Extensions) provides
instructions for signing and authenticating pointers against secret
keys, which can be used to mitigate Return Oriented Programming (ROP)
and other attacks.
This option enables these instructions at EL0 (i.e. for userspace).
Choosing this option will cause the kernel to initialise secret keys
for each process at exec() time, with these keys being
context-switched along with the process.
The feature is detected at runtime. If the feature is not present in
KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.
This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-23 12:42:35 +08:00
hardware it will not be advertised to userspace/KVM guest nor will it
2020-06-11 19:18:58 +08:00
be enabled.
2018-12-08 02:39:30 +08:00
2020-03-13 17:04:55 +08:00
If the feature is present on the boot CPU but not on a late CPU, then
the late CPU will be parked. Also, if the boot CPU does not have
address auth and the late CPU has then the late CPU will still boot
but with the feature disabled. On such a system, this option should
not be selected.
2021-06-13 17:26:31 +08:00
config ARM64_PTR_AUTH_KERNEL
2021-06-13 17:26:32 +08:00
bool "Use pointer authentication for kernel"
2021-06-13 17:26:31 +08:00
default y
depends on ARM64_PTR_AUTH
arm64: unify asm-arch manipulation
Assemblers will reject instructions not supported by a target
architecture version, and so we must explicitly tell the assembler the
latest architecture version for which we want to assemble instructions
from.
We've added a few AS_HAS_ARMV8_<N> definitions for this, in addition to
an inconsistently named AS_HAS_PAC definition, from which arm64's
top-level Makefile determines the architecture version that we intend to
target, and generates the `asm-arch` variable.
To make this a bit clearer and easier to maintain, this patch reworks
the Makefile to determine asm-arch in a single if-else-endif chain.
AS_HAS_PAC, which is defined when the assembler supports
`-march=armv8.3-a`, is renamed to AS_HAS_ARMV8_3.
As the logic for armv8.3-a is lifted out of the block handling pointer
authentication, `asm-arch` may now be set to armv8.3-a regardless of
whether support for pointer authentication is selected. This means that
it will be possible to assemble armv8.3-a instructions even if we didn't
intend to, but this is consistent with our handling of other
architecture versions, and the compiler won't generate armv8.3-a
instructions regardless.
For the moment there's no need for an CONFIG_AS_HAS_ARMV8_1, as the code
for LSE atomics and LDAPR use individual `.arch_extension` entries and
do not require the baseline asm arch to be bumped to armv8.1-a. The
other armv8.1-a features (e.g. PAN) do not require assembler support.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230131105809.991288-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-31 18:58:08 +08:00
depends on (CC_HAS_SIGN_RETURN_ADDRESS || CC_HAS_BRANCH_PROT_PAC_RET) && AS_HAS_ARMV8_3
2021-06-13 17:26:31 +08:00
# Modern compilers insert a .note.gnu.property section note for PAC
# which is only understood by binutils starting with version 2.33.1.
depends on LD_IS_LLD || LD_VERSION >= 23301 || (CC_IS_GCC && GCC_VERSION < 90100)
depends on !CC_IS_CLANG || AS_HAS_CFI_NEGATE_RA_STATE
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_ARGS)
2021-06-13 17:26:31 +08:00
help
If the compiler supports the -mbranch-protection or
-msign-return-address flag (e.g. GCC 7 or later), then this option
will cause the kernel itself to be compiled with return address
protection. In this case, and if the target hardware is known to
support pointer authentication, then CONFIG_STACKPROTECTOR can be
disabled with minimal loss of protection.
arm64: compile the kernel with ptrauth return address signing
Compile all functions with two ptrauth instructions: PACIASP in the
prologue to sign the return address, and AUTIASP in the epilogue to
authenticate the return address (from the stack). If authentication
fails, the return will cause an instruction abort to be taken, followed
by an oops and killing the task.
This should help protect the kernel against attacks using
return-oriented programming. As ptrauth protects the return address, it
can also serve as a replacement for CONFIG_STACKPROTECTOR, although note
that it does not protect other parts of the stack.
The new instructions are in the HINT encoding space, so on a system
without ptrauth they execute as NOPs.
CONFIG_ARM64_PTR_AUTH now not only enables ptrauth for userspace and KVM
guests, but also automatically builds the kernel with ptrauth
instructions if the compiler supports it. If there is no compiler
support, we do not warn that the kernel was built without ptrauth
instructions.
GCC 7 and 8 support the -msign-return-address option, while GCC 9
deprecates that option and replaces it with -mbranch-protection. Support
both options.
Clang uses an external assembler hence this patch makes sure that the
correct parameters (-march=armv8.3-a) are passed down to help it recognize
the ptrauth instructions.
Ftrace function tracer works properly with Ptrauth only when
patchable-function-entry feature is present and is ensured by the
Kconfig dependency.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> # not co-dev parts
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: Cover leaf function, comments, Ftrace Kconfig]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13 17:05:03 +08:00
This feature works with FUNCTION_GRAPH_TRACER option only if
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
DYNAMIC_FTRACE_WITH_ARGS is enabled.
arm64: compile the kernel with ptrauth return address signing
Compile all functions with two ptrauth instructions: PACIASP in the
prologue to sign the return address, and AUTIASP in the epilogue to
authenticate the return address (from the stack). If authentication
fails, the return will cause an instruction abort to be taken, followed
by an oops and killing the task.
This should help protect the kernel against attacks using
return-oriented programming. As ptrauth protects the return address, it
can also serve as a replacement for CONFIG_STACKPROTECTOR, although note
that it does not protect other parts of the stack.
The new instructions are in the HINT encoding space, so on a system
without ptrauth they execute as NOPs.
CONFIG_ARM64_PTR_AUTH now not only enables ptrauth for userspace and KVM
guests, but also automatically builds the kernel with ptrauth
instructions if the compiler supports it. If there is no compiler
support, we do not warn that the kernel was built without ptrauth
instructions.
GCC 7 and 8 support the -msign-return-address option, while GCC 9
deprecates that option and replaces it with -mbranch-protection. Support
both options.
Clang uses an external assembler hence this patch makes sure that the
correct parameters (-march=armv8.3-a) are passed down to help it recognize
the ptrauth instructions.
Ftrace function tracer works properly with Ptrauth only when
patchable-function-entry feature is present and is ensured by the
Kconfig dependency.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> # not co-dev parts
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: Cover leaf function, comments, Ftrace Kconfig]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13 17:05:03 +08:00
config CC_HAS_BRANCH_PROT_PAC_RET
# GCC 9 or later, clang 8 or later
def_bool $(cc-option,-mbranch-protection=pac-ret+leaf)
config CC_HAS_SIGN_RETURN_ADDRESS
# GCC 7, 8
def_bool $(cc-option,-msign-return-address=all)
arm64: unify asm-arch manipulation
Assemblers will reject instructions not supported by a target
architecture version, and so we must explicitly tell the assembler the
latest architecture version for which we want to assemble instructions
from.
We've added a few AS_HAS_ARMV8_<N> definitions for this, in addition to
an inconsistently named AS_HAS_PAC definition, from which arm64's
top-level Makefile determines the architecture version that we intend to
target, and generates the `asm-arch` variable.
To make this a bit clearer and easier to maintain, this patch reworks
the Makefile to determine asm-arch in a single if-else-endif chain.
AS_HAS_PAC, which is defined when the assembler supports
`-march=armv8.3-a`, is renamed to AS_HAS_ARMV8_3.
As the logic for armv8.3-a is lifted out of the block handling pointer
authentication, `asm-arch` may now be set to armv8.3-a regardless of
whether support for pointer authentication is selected. This means that
it will be possible to assemble armv8.3-a instructions even if we didn't
intend to, but this is consistent with our handling of other
architecture versions, and the compiler won't generate armv8.3-a
instructions regardless.
For the moment there's no need for an CONFIG_AS_HAS_ARMV8_1, as the code
for LSE atomics and LDAPR use individual `.arch_extension` entries and
do not require the baseline asm arch to be bumped to armv8.1-a. The
other armv8.1-a features (e.g. PAN) do not require assembler support.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230131105809.991288-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-31 18:58:08 +08:00
config AS_HAS_ARMV8_3
2020-06-14 22:43:41 +08:00
def_bool $(cc-option,-Wa$(comma)-march=armv8.3-a)
arm64: compile the kernel with ptrauth return address signing
Compile all functions with two ptrauth instructions: PACIASP in the
prologue to sign the return address, and AUTIASP in the epilogue to
authenticate the return address (from the stack). If authentication
fails, the return will cause an instruction abort to be taken, followed
by an oops and killing the task.
This should help protect the kernel against attacks using
return-oriented programming. As ptrauth protects the return address, it
can also serve as a replacement for CONFIG_STACKPROTECTOR, although note
that it does not protect other parts of the stack.
The new instructions are in the HINT encoding space, so on a system
without ptrauth they execute as NOPs.
CONFIG_ARM64_PTR_AUTH now not only enables ptrauth for userspace and KVM
guests, but also automatically builds the kernel with ptrauth
instructions if the compiler supports it. If there is no compiler
support, we do not warn that the kernel was built without ptrauth
instructions.
GCC 7 and 8 support the -msign-return-address option, while GCC 9
deprecates that option and replaces it with -mbranch-protection. Support
both options.
Clang uses an external assembler hence this patch makes sure that the
correct parameters (-march=armv8.3-a) are passed down to help it recognize
the ptrauth instructions.
Ftrace function tracer works properly with Ptrauth only when
patchable-function-entry feature is present and is ensured by the
Kconfig dependency.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> # not co-dev parts
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: Cover leaf function, comments, Ftrace Kconfig]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13 17:05:03 +08:00
2020-03-20 02:19:51 +08:00
config AS_HAS_CFI_NEGATE_RA_STATE
def_bool $(as-instr,.cfi_startproc\n.cfi_negate_ra_state\n.cfi_endproc\n)
2022-05-17 22:16:47 +08:00
endmenu # "ARMv8.3 architectural features"
2018-12-08 02:39:30 +08:00
2020-03-05 17:06:21 +08:00
menu "ARMv8.4 architectural features"
config ARM64_AMU_EXTN
bool "Enable support for the Activity Monitors Unit CPU extension"
default y
help
The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture. This enables support for version 1
of the activity monitors architecture, AMUv1.
To enable the use of this extension on CPUs that implement it, say Y.
Note that for architectural reasons, firmware _must_ implement AMU
support when running on CPUs that present the activity monitors
extension. The required support is present in:
* Version 1.5 and later of the ARM Trusted Firmware
For kernels that have this configuration enabled but boot with broken
firmware, you may need to say N here until the firmware is fixed.
Otherwise you may experience firmware panics or lockups when
accessing the counter registers. Even if you are not observing these
symptoms, the values returned by the register reads might not
correctly reflect reality. Most commonly, the value read will be 0,
indicating that the counter is not enabled.
2020-07-15 15:19:44 +08:00
config AS_HAS_ARMV8_4
def_bool $(cc-option,-Wa$(comma)-march=armv8.4-a)
config ARM64_TLB_RANGE
bool "Enable support for tlbi range feature"
default y
depends on AS_HAS_ARMV8_4
help
ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses.
The feature introduces new assembly instructions, and they were
support when binutils >= 2.30.
2022-05-17 22:16:47 +08:00
endmenu # "ARMv8.4 architectural features"
2018-12-08 02:39:30 +08:00
2019-12-10 02:12:14 +08:00
menu "ARMv8.5 architectural features"
2020-12-23 04:01:24 +08:00
config AS_HAS_ARMV8_5
def_bool $(cc-option,-Wa$(comma)-march=armv8.5-a)
2020-03-17 00:50:55 +08:00
config ARM64_BTI
bool "Branch Target Identification support"
default y
help
Branch Target Identification (part of the ARMv8.5 Extensions)
provides a mechanism to limit the set of locations to which computed
branch instructions such as BR or BLR can jump.
To make use of BTI on CPUs that support it, say Y.
BTI is intended to provide complementary protection to other control
flow integrity protection mechanisms, such as the Pointer
authentication mechanism provided as part of the ARMv8.3 Extensions.
For this reason, it does not make sense to enable this option without
also enabling support for pointer authentication. Thus, when
enabling this option you should also select ARM64_PTR_AUTH=y.
Userspace binaries must also be specifically compiled to make use of
this mechanism. If you say N here or the hardware does not support
BTI, such binaries can still run, but you get no additional
enforcement of branch destinations.
2020-05-07 03:51:34 +08:00
config ARM64_BTI_KERNEL
bool "Use Branch Target Identification for kernel"
default y
depends on ARM64_BTI
2021-06-13 17:26:31 +08:00
depends on ARM64_PTR_AUTH_KERNEL
2020-05-07 03:51:34 +08:00
depends on CC_HAS_BRANCH_PROT_PAC_RET_BTI
2020-05-12 19:45:40 +08:00
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697
depends on !CC_IS_GCC || GCC_VERSION >= 100100
2022-09-05 22:22:55 +08:00
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
depends on !CC_IS_GCC
2021-07-13 05:46:37 +08:00
# https://github.com/llvm/llvm-project/commit/a88c722e687e6780dcd6a58718350dc76fcc4cc9
depends on !CC_IS_CLANG || CLANG_VERSION >= 120000
ftrace: arm64: move from REGS to ARGS
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-04 01:05:20 +08:00
depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_ARGS)
2020-05-07 03:51:34 +08:00
help
Build the kernel with Branch Target Identification annotations
and enable enforcement of this for kernel code. When this option
is enabled and the system supports BTI all kernel code including
modular code must have BTI enabled.
config CC_HAS_BRANCH_PROT_PAC_RET_BTI
# GCC 9 or later, clang 8 or later
def_bool $(cc-option,-mbranch-protection=pac-ret+leaf+bti)
2019-12-10 02:12:14 +08:00
config ARM64_E0PD
bool "Enable support for E0PD"
default y
help
2020-01-22 19:23:54 +08:00
E0PD (part of the ARMv8.5 extensions) allows us to ensure
that EL0 accesses made via TTBR1 always fault in constant time,
providing similar benefits to KASLR as those provided by KPTI, but
with lower overhead and without disrupting legitimate access to
kernel memory such as SPE.
2019-12-10 02:12:14 +08:00
2020-01-22 19:23:54 +08:00
This option enables E0PD for TTBR1 where available.
2019-12-10 02:12:14 +08:00
2019-09-06 18:08:29 +08:00
config ARM64_AS_HAS_MTE
# Initial support for MTE went in binutils 2.32.0, checked with
# ".arch armv8.5-a+memtag" below. However, this was incomplete
# as a late addition to the final architecture spec (LDGM/STGM)
# is only supported in the newer 2.32.x and 2.33 binutils
# versions, hence the extra "stgm" instruction check below.
def_bool $(as-instr,.arch armv8.5-a+memtag\nstgm xzr$(comma)[x0])
config ARM64_MTE
bool "Memory Tagging Extension support"
default y
depends on ARM64_AS_HAS_MTE && ARM64_TAGGED_ADDR_ABI
2020-12-23 04:01:24 +08:00
depends on AS_HAS_ARMV8_5
2021-04-10 01:37:10 +08:00
depends on AS_HAS_LSE_ATOMICS
2020-12-23 04:01:35 +08:00
# Required for tag checking in the uaccess routines
depends on ARM64_PAN
2022-04-23 18:07:50 +08:00
select ARCH_HAS_SUBPAGE_FAULTS
2019-09-06 18:08:29 +08:00
select ARCH_USES_HIGH_VMA_FLAGS
2022-11-04 09:10:34 +08:00
select ARCH_USES_PG_ARCH_X
2019-09-06 18:08:29 +08:00
help
Memory Tagging (part of the ARMv8.5 Extensions) provides
architectural support for run-time, always-on detection of
various classes of memory error to aid with software debugging
to eliminate vulnerabilities arising from memory-unsafe
languages.
This option enables the support for the Memory Tagging
Extension at EL0 (i.e. for userspace).
Selecting this option allows the feature to be detected at
runtime. Any secondary CPU not implementing this feature will
not be allowed a late bring-up.
Userspace binaries that want to use this feature must
explicitly opt in. The mechanism for the userspace is
described in:
Documentation/arm64/memory-tagging-extension.rst.
2022-05-17 22:16:47 +08:00
endmenu # "ARMv8.5 architectural features"
2019-12-10 02:12:14 +08:00
2021-03-13 01:38:10 +08:00
menu "ARMv8.7 architectural features"
config ARM64_EPAN
bool "Enable support for Enhanced Privileged Access Never (EPAN)"
default y
depends on ARM64_PAN
help
2022-05-17 22:16:47 +08:00
Enhanced Privileged Access Never (EPAN) allows Privileged
Access Never to be used with Execute-only mappings.
2021-03-13 01:38:10 +08:00
2022-05-17 22:16:47 +08:00
The feature is detected at runtime, and will remain disabled
if the cpu does not implement the feature.
endmenu # "ARMv8.7 architectural features"
2021-03-13 01:38:10 +08:00
2017-10-31 23:51:02 +08:00
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
help
The Scalable Vector Extension (SVE) is an extension to the AArch64
execution state which complements and extends the SIMD functionality
of the base architecture to support much larger vectors and to enable
additional vectorisation opportunities.
To enable use of this extension on CPUs that implement it, say Y.
2019-04-19 01:41:38 +08:00
On CPUs that support the SVE2 extensions, this option will enable
those too.
2018-03-24 02:08:31 +08:00
Note that for architectural reasons, firmware _must_ implement SVE
support when running on SVE capable hardware. The required support
is present in:
* version 1.5 and later of the ARM Trusted Firmware
* the AArch64 boot wrapper since commit 5e1261e08abf
("bootwrapper: SVE: Enable SVE for EL2 and below").
For other firmware implementations, consult the firmware documentation
or vendor.
If you need the kernel to boot on SVE-capable hardware with broken
firmware, you may need to say N here until you get your firmware
fixed. Otherwise, you may experience firmware panics or lockups when
booting the kernel. If unsure and you are not observing these
symptoms, you should assume that it is safe to say Y.
2015-11-24 19:37:35 +08:00
2022-04-19 19:22:35 +08:00
config ARM64_SME
bool "ARM Scalable Matrix Extension support"
default y
depends on ARM64_SVE
help
The Scalable Matrix Extension (SME) is an extension to the AArch64
execution state which utilises a substantial subset of the SVE
instruction set, together with the addition of new architectural
register state capable of holding two dimensional matrix tiles to
enable various matrix operations.
2015-11-24 19:37:35 +08:00
config ARM64_MODULE_PLTS
2019-06-18 06:29:59 +08:00
bool "Use PLTs to allow module memory to spill over into vmalloc area"
2019-06-25 16:32:11 +08:00
depends on MODULES
2015-11-24 19:37:35 +08:00
select HAVE_MOD_ARCH_SPECIFIC
2019-06-18 06:29:59 +08:00
help
Allocate PLTs when loading modules so that jumps and calls whose
targets are too far away for their relative offsets to be encoded
in the instructions themselves can be bounced via veneers in the
module's PLT. This allows modules to be allocated in the generic
vmalloc area after the dedicated module memory area has been
exhausted.
When running with address space randomization (KASLR), the module
region itself may be too far away for ordinary relative jumps and
calls, and so in that case, module PLTs are required and cannot be
disabled.
Specific errata workaround(s) might also force module PLTs to be
enabled (ARM64_ERRATUM_843419).
2015-11-24 19:37:35 +08:00
2019-01-31 22:59:03 +08:00
config ARM64_PSEUDO_NMI
bool "Support for NMI-like interrupts"
2019-12-31 17:54:57 +08:00
select ARM_GIC_V3
2019-01-31 22:59:03 +08:00
help
Adds support for mimicking Non-Maskable Interrupts through the use of
GIC interrupt priority. This support requires version 3 or later of
2019-04-29 21:21:11 +08:00
ARM GIC.
2019-01-31 22:59:03 +08:00
This high priority configuration for interrupts needs to be
explicitly enabled by setting the kernel parameter
"irqchip.gicv3_pseudo_nmi" to 1.
If unsure, say N
2019-06-11 17:38:11 +08:00
if ARM64_PSEUDO_NMI
config ARM64_DEBUG_PRIORITY_MASKING
bool "Debug interrupt priority masking"
help
This adds runtime checks to functions enabling/disabling
interrupts when using priority masking. The additional checks verify
the validity of ICC_PMR_EL1 when calling concerned functions.
If unsure, say N
2022-05-17 22:16:47 +08:00
endif # ARM64_PSEUDO_NMI
2019-06-11 17:38:11 +08:00
2016-01-26 16:13:44 +08:00
config RELOCATABLE
2020-06-11 20:43:30 +08:00
bool "Build a relocatable kernel image" if EXPERT
2019-08-01 09:18:42 +08:00
select ARCH_HAS_RELR
2020-06-11 20:43:30 +08:00
default y
2016-01-26 16:13:44 +08:00
help
This builds the kernel as a Position Independent Executable (PIE),
which retains all relocation metadata required to relocate the
kernel binary at runtime to a different virtual address than the
address it was linked at.
Since AArch64 uses the RELA relocation format, this requires a
relocation pass at runtime even if the kernel is loaded at the
same address it was linked at.
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
config RANDOMIZE_BASE
bool "Randomize the address of the kernel image"
2016-07-27 01:16:55 +08:00
select ARM64_MODULE_PLTS if MODULES
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
select RELOCATABLE
help
Randomizes the virtual address at which the kernel image is
loaded, as a security feature that deters exploit attempts
relying on knowledge of the location of kernel internals.
It is the bootloader's job to provide entropy, by passing a
random u64 value in /chosen/kaslr-seed at kernel entry.
2016-01-26 21:48:29 +08:00
When booting via the UEFI stub, it will invoke the firmware's
EFI_RNG_PROTOCOL implementation (if available) to supply entropy
to the kernel proper. In addition, it will randomise the physical
location of the kernel Image as well.
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
If unsure, say N.
config RANDOMIZE_MODULE_REGION_FULL
2021-07-30 20:51:31 +08:00
bool "Randomize the module region over a 2 GB range"
2017-06-07 01:00:22 +08:00
depends on RANDOMIZE_BASE
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
default y
help
2021-07-30 20:51:31 +08:00
Randomizes the location of the module region inside a 2 GB window
arm64/kernel: kaslr: reduce module randomization range to 4 GB
We currently have to rely on the GCC large code model for KASLR for
two distinct but related reasons:
- if we enable full randomization, modules will be loaded very far away
from the core kernel, where they are out of range for ADRP instructions,
- even without full randomization, the fact that the 128 MB module region
is now no longer fully reserved for kernel modules means that there is
a very low likelihood that the normal bottom-up allocation of other
vmalloc regions may collide, and use up the range for other things.
Large model code is suboptimal, given that each symbol reference involves
a literal load that goes through the D-cache, reducing cache utilization.
But more importantly, literals are not instructions but part of .text
nonetheless, and hence mapped with executable permissions.
So let's get rid of our dependency on the large model for KASLR, by:
- reducing the full randomization range to 4 GB, thereby ensuring that
ADRP references between modules and the kernel are always in range,
- reduce the spillover range to 4 GB as well, so that we fallback to a
region that is still guaranteed to be in range
- move the randomization window of the core kernel to the middle of the
VMALLOC space
Note that KASAN always uses the module region outside of the vmalloc space,
so keep the kernel close to that if KASAN is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-07 01:15:32 +08:00
covering the core kernel. This way, it is less likely for modules
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
to leak information about the location of core kernel data structures
but it does imply that function calls between modules and the core
kernel will need to be resolved via veneers in the module PLT.
When this option is not set, the module region will be randomized over
a limited range that contains the [_stext, _etext] interval of the
2021-07-30 20:51:31 +08:00
core kernel, so branch relocations are almost always in range unless
ARM64_MODULE_PLTS is enabled and the region is exhausted. In this
particular case of region exhaustion, modules might be able to fall
back to a larger 2GB area.
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
2018-12-12 20:08:44 +08:00
config CC_HAVE_STACKPROTECTOR_SYSREG
def_bool $(cc-option,-mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=0)
config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_SYSREG
2022-10-27 23:59:08 +08:00
config UNWIND_PATCH_PAC_INTO_SCS
bool "Enable shadow call stack dynamically using code patching"
# needs Clang with https://reviews.llvm.org/D111780 incorporated
depends on CC_IS_CLANG && CLANG_VERSION >= 150000
depends on ARM64_PTR_AUTH_KERNEL && CC_HAS_BRANCH_PROT_PAC_RET
depends on SHADOW_CALL_STACK
select UNWIND_TABLES
select DYNAMIC_SCS
2022-05-17 22:16:47 +08:00
endmenu # "Kernel Features"
2012-04-20 21:45:54 +08:00
menu "Boot options"
2016-01-26 19:10:38 +08:00
config ARM64_ACPI_PARKING_PROTOCOL
bool "Enable support for the ARM64 ACPI parking protocol"
depends on ACPI
help
Enable support for the ARM64 ACPI parking protocol. If disabled
the kernel will not allow booting through the ARM64 ACPI parking
protocol even if the corresponding data is present in the ACPI
MADT table.
2012-04-20 21:45:54 +08:00
config CMDLINE
string "Default kernel command string"
default ""
help
Provide a set of default command-line options at build time by
entering them here. As a minimum, you should specify the the
root device (e.g. root=/dev/nfs).
2020-09-22 03:15:57 +08:00
choice
prompt "Kernel command line type" if CMDLINE != ""
default CMDLINE_FROM_BOOTLOADER
help
Choose how the kernel will handle the provided default kernel
command line string.
config CMDLINE_FROM_BOOTLOADER
bool "Use bootloader kernel arguments if available"
help
Uses the command-line options passed by the boot loader. If
the boot loader doesn't provide any, the default kernel command
string provided in CMDLINE will be used.
2012-04-20 21:45:54 +08:00
config CMDLINE_FORCE
bool "Always use the default kernel command string"
help
Always use the default kernel command string, even if the boot
loader passes other arguments to the kernel.
This is useful if you cannot or don't want to change the
command-line options your boot loader passes to the kernel.
2020-09-22 03:15:57 +08:00
endchoice
2014-07-02 20:54:43 +08:00
config EFI_STUB
bool
2014-04-16 09:59:30 +08:00
config EFI
bool "UEFI runtime support"
depends on OF && !CPU_BIG_ENDIAN
2017-10-31 23:50:57 +08:00
depends on KERNEL_MODE_NEON
2018-07-24 17:48:45 +08:00
select ARCH_SUPPORTS_ACPI
2014-04-16 09:59:30 +08:00
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
2014-07-05 01:41:53 +08:00
select EFI_RUNTIME_WRAPPERS
2014-07-02 20:54:43 +08:00
select EFI_STUB
2020-04-16 03:54:18 +08:00
select EFI_GENERIC_STUB
2020-10-30 14:08:40 +08:00
imply IMA_SECURE_AND_OR_TRUSTED_BOOT
2014-04-16 09:59:30 +08:00
default y
help
This option provides support for runtime services provided
by UEFI firmware (such as non-volatile variables, realtime
2022-05-17 22:16:47 +08:00
clock, and platform reset). A UEFI stub is also provided to
2014-04-16 10:47:52 +08:00
allow the kernel to be booted as an EFI application. This
is only useful on systems that have UEFI firmware.
2014-04-16 09:59:30 +08:00
2014-10-04 23:46:43 +08:00
config DMI
bool "Enable support for SMBIOS (DMI) tables"
depends on EFI
default y
help
This enables SMBIOS/DMI feature for systems.
This option is only useful on systems that have UEFI firmware.
However, even with this option, the resultant kernel should
continue to boot on existing non-UEFI platforms.
2022-05-17 22:16:47 +08:00
endmenu # "Boot options"
2012-04-20 21:45:54 +08:00
2013-11-08 02:37:14 +08:00
menu "Power management options"
source "kernel/power/Kconfig"
2016-04-28 00:47:12 +08:00
config ARCH_HIBERNATION_POSSIBLE
def_bool y
depends on CPU_PM
config ARCH_HIBERNATION_HEADER
def_bool y
depends on HIBERNATION
2013-11-08 02:37:14 +08:00
config ARCH_SUSPEND_POSSIBLE
def_bool y
2022-05-17 22:16:47 +08:00
endmenu # "Power management options"
2013-11-08 02:37:14 +08:00
2013-07-17 21:54:21 +08:00
menu "CPU Power Management"
source "drivers/cpuidle/Kconfig"
2014-02-24 10:27:57 +08:00
source "drivers/cpufreq/Kconfig"
2022-05-17 22:16:47 +08:00
endmenu # "CPU Power Management"
2014-02-24 10:27:57 +08:00
2015-03-24 22:02:53 +08:00
source "drivers/acpi/Kconfig"
2013-07-04 20:34:32 +08:00
source "arch/arm64/kvm/Kconfig"