2019-05-19 20:07:45 +08:00
# SPDX-License-Identifier: GPL-2.0-only
2012-04-20 21:45:54 +08:00
config ARM64
def_bool y
2015-06-11 00:08:53 +08:00
select ACPI_CCA_REQUIRED if ACPI
2015-03-25 01:58:51 +08:00
select ACPI_GENERIC_GSI if ACPI
2017-04-01 01:51:01 +08:00
select ACPI_GTDT if ACPI
2017-06-15 00:37:12 +08:00
select ACPI_IORT if ACPI
2015-03-24 22:02:51 +08:00
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
2018-12-20 06:46:57 +08:00
select ACPI_MCFG if (ACPI && PCI)
2016-09-28 04:54:14 +08:00
select ACPI_SPCR_TABLE if ACPI
2018-05-12 07:58:01 +08:00
select ACPI_PPTT if ACPI
2020-06-04 07:04:02 +08:00
select ARCH_HAS_DEBUG_WX
2020-03-17 00:50:47 +08:00
select ARCH_BINFMT_ELF_STATE
2021-10-21 08:55:09 +08:00
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
2021-05-05 09:38:21 +08:00
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
2021-05-05 09:38:17 +08:00
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
2021-05-05 09:38:25 +08:00
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
2021-05-05 09:38:21 +08:00
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
2021-05-05 09:38:09 +08:00
select ARCH_HAS_CACHE_LINE_SIZE
2017-01-11 05:35:50 +08:00
select ARCH_HAS_DEBUG_VIRTUAL
mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.
This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.
Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().
This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture, which is willing to subscribe this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully Going forward, other architectures too can subscribe the test
after fixing any build or runtime problems with their page table helpers.
Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non conformity here
will be reported as an warning which would need to be fixed. This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.
[anshuman.khandual@arm.com: v17]
Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-05 07:47:15 +08:00
select ARCH_HAS_DEBUG_VM_PGTABLE
2019-03-25 22:44:06 +08:00
select ARCH_HAS_DMA_PREP_COHERENT
2016-06-20 18:56:13 +08:00
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
2018-04-24 23:25:47 +08:00
select ARCH_HAS_FAST_MULTIPLIER
include/linux/string.h: add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time. Unlike glibc,
it covers buffer reads in addition to writes.
GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation. They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks. Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.
This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code. There will likely be issues caught in
regular use at runtime too.
Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:
* Some of the fortified string functions (strncpy, strcat), don't yet
place a limit on reads from the source based on __builtin_object_size of
the source buffer.
* Extending coverage to more string functions like strlcat.
* It should be possible to optionally use __builtin_object_size(x, 1) for
some functions (C strings) to detect intra-object overflows (like
glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
approach to avoid likely compatibility issues.
* The compile-time checks should be made available via a separate config
option which can be enabled by default (or always enabled) once enough
time has passed to get the issues it catches fixed.
Kees said:
"This is great to have. While it was out-of-tree code, it would have
blocked at least CVE-2016-3858 from being exploitable (improper size
argument to strlcpy()). I've sent a number of fixes for
out-of-bounds-reads that this detected upstream already"
[arnd@arndb.de: x86: fix fortified memcpy]
Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-13 05:36:10 +08:00
select ARCH_HAS_FORTIFY_SOURCE
2014-12-13 08:57:44 +08:00
select ARCH_HAS_GCOV_PROFILE_ALL
2019-05-14 08:19:04 +08:00
select ARCH_HAS_GIGANTIC_PAGE
2016-06-17 00:39:52 +08:00
select ARCH_HAS_KCOV
2019-05-14 08:18:30 +08:00
select ARCH_HAS_KEEPINITRD
2018-01-30 04:20:19 +08:00
select ARCH_HAS_MEMBARRIER_SYNC_CORE
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.
To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").
Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.
However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).
For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15 18:11:16 +08:00
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
2019-07-17 07:30:51 +08:00
select ARCH_HAS_PTE_DEVMAP
2018-06-08 08:06:08 +08:00
select ARCH_HAS_PTE_SPECIAL
2019-01-08 02:36:20 +08:00
select ARCH_HAS_SETUP_DMA_OPS
2019-05-23 18:22:54 +08:00
select ARCH_HAS_SET_DIRECT_MAP
2017-02-21 23:09:33 +08:00
select ARCH_HAS_SET_MEMORY
2020-09-14 23:34:09 +08:00
select ARCH_STACKWALK
2017-02-07 08:31:57 +08:00
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
2018-10-08 15:12:01 +08:00
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SYNC_DMA_FOR_CPU
arm64: implement syscall wrappers
To minimize the risk of userspace-controlled values being used under
speculation, this patch adds pt_regs based syscall wrappers for arm64,
which pass the minimum set of required userspace values to syscall
implementations. For each syscall, a wrapper which takes a pt_regs
argument is automatically generated, and this extracts the arguments
before calling the "real" syscall implementation.
Each syscall has three functions generated:
* __do_<compat_>sys_<name> is the "real" syscall implementation, with
the expected prototype.
* __se_<compat_>sys_<name> is the sign-extension/narrowing wrapper,
inherited from common code. This takes a series of long parameters,
casting each to the requisite types required by the "real" syscall
implementation in __do_<compat_>sys_<name>.
This wrapper *may* not be necessary on arm64 given the AAPCS rules on
unused register bits, but it seemed safer to keep the wrapper for now.
* __arm64_<compat_>_sys_<name> takes a struct pt_regs pointer, and
extracts *only* the relevant register values, passing these on to the
__se_<compat_>sys_<name> wrapper.
The syscall invocation code is updated to handle the calling convention
required by __arm64_<compat_>_sys_<name>, and passes a single struct
pt_regs pointer.
The compiler can fold the syscall implementation and its wrappers, such
that the overhead of this approach is minimized.
Note that we play games with sys_ni_syscall(). It can't be defined with
SYSCALL_DEFINE0() because we must avoid the possibility of error
injection. Additionally, there are a couple of locations where we need
to call it from C code, and we don't (currently) have a
ksys_ni_syscall(). While it has no wrapper, passing in a redundant
pt_regs pointer is benign per the AAPCS.
When ARCH_HAS_SYSCALL_WRAPPER is selected, no prototype is defines for
sys_ni_syscall(). Since we need to treat it differently for in-kernel
calls and the syscall tables, the prototype is defined as-required.
The wrappers are largely the same as their x86 counterparts, but
simplified as we don't have a variety of compat calling conventions that
require separate stubs. Unlike x86, we have some zero-argument compat
syscalls, and must define COMPAT_SYSCALL_DEFINE0() to ensure that these
are also given an __arm64_compat_sys_ prefix.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-11 21:56:56 +08:00
select ARCH_HAS_SYSCALL_WRAPPER
2018-12-22 05:14:44 +08:00
select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
2013-09-04 17:55:17 +08:00
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
2021-07-01 09:52:20 +08:00
select ARCH_HAS_ZONE_DMA_SET if EXPERT
2020-03-17 00:50:47 +08:00
select ARCH_HAVE_ELF_PROT
2017-09-27 23:51:30 +08:00
select ARCH_HAVE_NMI_SAFE_CMPXCHG
2019-10-16 03:17:49 +08:00
select ARCH_INLINE_READ_LOCK if !PREEMPTION
select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION
select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION
select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION
select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION
select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION
select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION
2019-05-14 08:22:59 +08:00
select ARCH_KEEP_MEMBLOCK
2014-05-09 17:33:01 +08:00
select ARCH_USE_CMPXCHG_LOCKREF
2020-03-18 16:28:31 +08:00
select ARCH_USE_GNU_PROPERTY
2021-04-30 13:55:15 +08:00
select ARCH_USE_MEMTEST
2017-10-12 20:20:50 +08:00
select ARCH_USE_QUEUED_RWLOCKS
2018-03-14 04:45:45 +08:00
select ARCH_USE_QUEUED_SPINLOCKS
2020-05-01 19:54:30 +08:00
select ARCH_USE_SYM_ANNOTATIONS
2020-12-15 11:10:30 +08:00
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
2021-05-05 09:38:13 +08:00
select ARCH_SUPPORTS_HUGETLBFS
2017-06-09 01:25:29 +08:00
select ARCH_SUPPORTS_MEMORY_FAILURE
2020-04-28 00:00:16 +08:00
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
2020-12-12 02:46:33 +08:00
select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
select ARCH_SUPPORTS_LTO_CLANG_THIN
2021-04-09 02:28:43 +08:00
select ARCH_SUPPORTS_CFI_CLANG
2014-06-07 01:53:16 +08:00
select ARCH_SUPPORTS_ATOMIC_RMW
2021-09-11 07:40:44 +08:00
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
2016-04-09 06:50:28 +08:00
select ARCH_SUPPORTS_NUMA_BALANCING
2019-05-08 04:52:28 +08:00
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
2019-12-09 23:08:03 +08:00
select ARCH_WANT_DEFAULT_BPF_JIT
2019-09-24 06:38:47 +08:00
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
2013-01-30 02:25:41 +08:00
select ARCH_WANT_FRAME_POINTERS
2019-06-28 06:00:11 +08:00
select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
2020-11-20 04:46:56 +08:00
select ARCH_WANT_LD_ORPHAN_WARN
2021-06-22 07:18:22 +08:00
select ARCH_WANTS_NO_INSTR
2016-02-06 07:50:18 +08:00
select ARCH_HAS_UBSAN_SANITIZE_ALL
2012-12-18 23:26:13 +08:00
select ARM_AMBA
2012-11-20 18:06:00 +08:00
select ARM_ARCH_TIMER
2013-01-14 20:39:31 +08:00
select ARM_GIC
2014-07-04 15:28:30 +08:00
select AUDIT_ARCH_COMPAT_GENERIC
2016-06-16 04:47:33 +08:00
select ARM_GIC_V2M if PCI
2014-06-30 23:01:31 +08:00
select ARM_GIC_V3
2016-06-16 04:47:33 +08:00
select ARM_GIC_V3_ITS if PCI
2015-07-31 22:46:16 +08:00
select ARM_PSCI_FW
2019-12-04 08:46:31 +08:00
select BUILDTIME_TABLE_SORT
2012-12-18 23:27:25 +08:00
select CLONE_BACKWARDS
2012-09-23 01:33:36 +08:00
select COMMON_CLK
2013-11-08 02:37:14 +08:00
select CPU_PM if (SUSPEND || CPU_IDLE)
2018-08-27 19:02:44 +08:00
select CRC32
2013-11-07 03:32:13 +08:00
select DCACHE_WORD_ACCESS
2018-11-05 03:29:28 +08:00
select DMA_DIRECT_REMAP
2015-07-08 00:15:39 +08:00
select EDAC_SUPPORT
2015-11-10 02:09:55 +08:00
select FRAME_POINTER
2014-10-10 06:26:44 +08:00
select GENERIC_ALLOCATOR
2017-06-01 00:59:28 +08:00
select GENERIC_ARCH_TOPOLOGY
2015-05-30 01:28:44 +08:00
select GENERIC_CLOCKEVENTS_BROADCAST
2014-03-04 09:10:04 +08:00
select GENERIC_CPU_AUTOPROBE
2019-04-16 05:21:29 +08:00
select GENERIC_CPU_VULNERABILITIES
2014-04-08 06:39:52 +08:00
select GENERIC_EARLY_IOREMAP
ARM64: enable GENERIC_FIND_FIRST_BIT
ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
enable it in a config. It leads to using find_next_bit() which is less
efficient:
0000000000000000 <find_first_bit>:
0: aa0003e4 mov x4, x0
4: aa0103e0 mov x0, x1
8: b4000181 cbz x1, 38 <find_first_bit+0x38>
c: f9400083 ldr x3, [x4]
10: d2800802 mov x2, #0x40 // #64
14: 91002084 add x4, x4, #0x8
18: b40000c3 cbz x3, 30 <find_first_bit+0x30>
1c: 14000008 b 3c <find_first_bit+0x3c>
20: f8408483 ldr x3, [x4], #8
24: 91010045 add x5, x2, #0x40
28: b50000c3 cbnz x3, 40 <find_first_bit+0x40>
2c: aa0503e2 mov x2, x5
30: eb02001f cmp x0, x2
34: 54ffff68 b.hi 20 <find_first_bit+0x20> // b.pmore
38: d65f03c0 ret
3c: d2800002 mov x2, #0x0 // #0
40: dac00063 rbit x3, x3
44: dac01063 clz x3, x3
48: 8b020062 add x2, x3, x2
4c: eb02001f cmp x0, x2
50: 9a829000 csel x0, x0, x2, ls // ls = plast
54: d65f03c0 ret
...
0000000000000118 <_find_next_bit.constprop.1>:
118: eb02007f cmp x3, x2
11c: 540002e2 b.cs 178 <_find_next_bit.constprop.1+0x60> // b.hs, b.nlast
120: d346fc66 lsr x6, x3, #6
124: f8667805 ldr x5, [x0, x6, lsl #3]
128: b4000061 cbz x1, 134 <_find_next_bit.constprop.1+0x1c>
12c: f8667826 ldr x6, [x1, x6, lsl #3]
130: 8a0600a5 and x5, x5, x6
134: ca0400a6 eor x6, x5, x4
138: 92800005 mov x5, #0xffffffffffffffff // #-1
13c: 9ac320a5 lsl x5, x5, x3
140: 927ae463 and x3, x3, #0xffffffffffffffc0
144: ea0600a5 ands x5, x5, x6
148: 54000120 b.eq 16c <_find_next_bit.constprop.1+0x54> // b.none
14c: 1400000e b 184 <_find_next_bit.constprop.1+0x6c>
150: d346fc66 lsr x6, x3, #6
154: f8667805 ldr x5, [x0, x6, lsl #3]
158: b4000061 cbz x1, 164 <_find_next_bit.constprop.1+0x4c>
15c: f8667826 ldr x6, [x1, x6, lsl #3]
160: 8a0600a5 and x5, x5, x6
164: eb05009f cmp x4, x5
168: 540000c1 b.ne 180 <_find_next_bit.constprop.1+0x68> // b.any
16c: 91010063 add x3, x3, #0x40
170: eb03005f cmp x2, x3
174: 54fffee8 b.hi 150 <_find_next_bit.constprop.1+0x38> // b.pmore
178: aa0203e0 mov x0, x2
17c: d65f03c0 ret
180: ca050085 eor x5, x4, x5
184: dac000a5 rbit x5, x5
188: dac010a5 clz x5, x5
18c: 8b0300a3 add x3, x5, x3
190: eb03005f cmp x2, x3
194: 9a839042 csel x2, x2, x3, ls // ls = plast
198: aa0203e0 mov x0, x2
19c: d65f03c0 ret
...
0000000000000238 <find_next_bit>:
238: a9bf7bfd stp x29, x30, [sp, #-16]!
23c: aa0203e3 mov x3, x2
240: d2800004 mov x4, #0x0 // #0
244: aa0103e2 mov x2, x1
248: 910003fd mov x29, sp
24c: d2800001 mov x1, #0x0 // #0
250: 97ffffb2 bl 118 <_find_next_bit.constprop.1>
254: a8c17bfd ldp x29, x30, [sp], #16
258: d65f03c0 ret
Enabling find_{first,next}_bit() would also benefit for_each_{set,clear}_bit().
On A-53 find_first_bit() is almost twice faster than find_next_bit(), according
to lib/find_bit_benchmark (thanks to Alexey for testing):
GENERIC_FIND_FIRST_BIT=n:
[7126084.948181] find_first_bit: 47389224 ns, 16357 iterations
[7126085.032315] find_first_bit: 19048193 ns, 655 iterations
GENERIC_FIND_FIRST_BIT=y:
[ 84.158068] find_first_bit: 27193319 ns, 16406 iterations
[ 84.233005] find_first_bit: 11082437 ns, 656 iterations
GENERIC_FIND_FIRST_BIT=n bloats the kernel despite that it disables generation
of find_{first,next}_bit():
yury:linux$ scripts/bloat-o-meter vmlinux vmlinux.ffb
add/remove: 4/1 grow/shrink: 19/251 up/down: 564/-1692 (-1128)
...
Overall, GENERIC_FIND_FIRST_BIT=n is harmful both in terms of performance and
code size, and it's better to have GENERIC_FIND_FIRST_BIT enabled.
Tested-by: Alexey Klimov <aklimov@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210225135700.1381396-2-yury.norov@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-02-25 21:56:59 +08:00
select GENERIC_FIND_FIRST_BIT
2015-08-21 11:40:22 +08:00
select GENERIC_IDLE_POLL_SETUP
arm64: Allow IPIs to be handled as normal interrupts
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-25 22:03:47 +08:00
select GENERIC_IRQ_IPI
2012-04-20 21:45:54 +08:00
select GENERIC_IRQ_PROBE
select GENERIC_IRQ_SHOW
2015-04-23 01:16:33 +08:00
select GENERIC_IRQ_SHOW_LEVEL
2020-07-10 03:05:36 +08:00
select GENERIC_LIB_DEVMEM_IS_ALLOWED
2014-11-19 21:09:07 +08:00
select GENERIC_PCI_IOMAP
2020-02-04 09:36:29 +08:00
select GENERIC_PTDUMP
2013-07-19 07:21:18 +08:00
select GENERIC_SCHED_CLOCK
2012-04-20 21:45:54 +08:00
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
2019-06-21 17:52:31 +08:00
select GENERIC_GETTIMEOFDAY
2020-06-24 16:33:21 +08:00
select GENERIC_VDSO_TIME_NS
2012-04-20 21:45:54 +08:00
select HARDIRQS_SW_RESEND
2020-10-14 08:53:07 +08:00
select HAVE_MOVE_PMD
2020-12-15 11:07:35 +08:00
select HAVE_MOVE_PUD
2018-11-16 03:05:32 +08:00
select HAVE_PCI
2016-12-01 21:51:12 +08:00
select HAVE_ACPI_APEI if (ACPI && EFI)
2014-10-24 20:22:20 +08:00
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
2014-07-04 15:28:30 +08:00
select HAVE_ARCH_AUDITSYSCALL
2014-11-03 10:02:23 +08:00
select HAVE_ARCH_BITREVERSE
2020-03-13 17:04:58 +08:00
select HAVE_ARCH_COMPILER_H
2016-02-16 20:52:35 +08:00
select HAVE_ARCH_HUGE_VMAP
2014-01-07 22:17:13 +08:00
select HAVE_ARCH_JUMP_LABEL
arm64/kernel: jump_label: Switch to relative references
On a randomly chosen distro kernel build for arm64, vmlinux.o shows the
following sections, containing jump label entries, and the associated
RELA relocation records, respectively:
...
[38088] __jump_table PROGBITS 0000000000000000 00e19f30
000000000002ea10 0000000000000000 WA 0 0 8
[38089] .rela__jump_table RELA 0000000000000000 01fd8bb0
000000000008be30 0000000000000018 I 38178 38088 8
...
In other words, we have 190 KB worth of 'struct jump_entry' instances,
and 573 KB worth of RELA entries to relocate each entry's code, target
and key members. This means the RELA section occupies 10% of the .init
segment, and the two sections combined represent 5% of vmlinux's entire
memory footprint.
So let's switch from 64-bit absolute references to 32-bit relative
references for the code and target field, and a 64-bit relative
reference for the 'key' field (which may reside in another module or the
core kernel, which may be more than 4 GB way on arm64 when running with
KASLR enable): this reduces the size of the __jump_table by 33%, and
gets rid of the RELA section entirely.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-4-ard.biesheuvel@linaro.org
2018-09-19 14:51:38 +08:00
select HAVE_ARCH_JUMP_LABEL_RELATIVE
2017-11-16 09:36:40 +08:00
select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
2021-03-24 12:05:20 +08:00
select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
2018-12-28 16:31:07 +08:00
select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
2020-12-23 04:02:20 +08:00
select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
2021-02-26 09:19:03 +08:00
select HAVE_ARCH_KFENCE
2014-01-28 19:20:22 +08:00
select HAVE_ARCH_KGDB
2016-01-15 07:20:01 +08:00
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
arch: enable relative relocations for arm64, power and x86
Patch series "add support for relative references in special sections", v10.
This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time. On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference)
Patch #3 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.
Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.
Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.
Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.
[From the v7 series blurb, which included the jump_label patches as well]:
For the arm64 kernel, all patches combined reduce the memory footprint
of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
KASLR enabled), of which ~1 MB is the size reduction of the RELA section
in .init, and the remaining 300 KB is reduction of .text/.data.
This patch (of 6):
Before updating certain subsystems to use place relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.
Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 12:56:00 +08:00
select HAVE_ARCH_PREL32_RELOCATIONS
2021-04-02 07:23:46 +08:00
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
2014-11-28 13:26:39 +08:00
select HAVE_ARCH_SECCOMP_FILTER
2018-07-21 05:41:54 +08:00
select HAVE_ARCH_STACKLEAK
2017-08-17 05:05:09 +08:00
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
2012-04-20 21:45:54 +08:00
select HAVE_ARCH_TRACEHOOK
2016-04-19 02:16:14 +08:00
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
2017-07-21 21:25:33 +08:00
select HAVE_ARCH_VMAP_STACK
2016-04-19 02:16:14 +08:00
select HAVE_ARM_SMCCC
2019-08-19 13:54:20 +08:00
select HAVE_ASM_MODVERSIONS
2016-05-14 01:08:28 +08:00
select HAVE_EBPF_JIT
2014-04-30 17:54:32 +08:00
select HAVE_C_RECORDMCOUNT
2014-10-24 20:22:20 +08:00
select HAVE_CMPXCHG_DOUBLE
2015-05-29 21:57:47 +08:00
select HAVE_CMPXCHG_LOCAL
2016-04-19 02:16:14 +08:00
select HAVE_CONTEXT_TRACKING
2012-10-09 07:28:11 +08:00
select HAVE_DEBUG_KMEMLEAK
2013-12-13 03:28:33 +08:00
select HAVE_DMA_CONTIGUOUS
2014-04-30 17:54:34 +08:00
select HAVE_DYNAMIC_FTRACE
arm64: implement ftrace with regs
This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
function's arguments (and some other registers) to be captured into a
struct pt_regs, allowing these to be inspected and/or modified. This is
a building block for live-patching, where a function's arguments may be
forwarded to another function. This is also necessary to enable ftrace
and in-kernel pointer authentication at the same time, as it allows the
LR value to be captured and adjusted prior to signing.
Using GCC's -fpatchable-function-entry=N option, we can have the
compiler insert a configurable number of NOPs between the function entry
point and the usual prologue. This also ensures functions are AAPCS
compliant (e.g. disabling inter-procedural register allocation).
For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
following:
| unsigned long bar(void);
|
| unsigned long foo(void)
| {
| return bar() + 1;
| }
... to:
| <foo>:
| nop
| nop
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl 0 <bar>
| add x0, x0, #0x1
| ldp x29, x30, [sp], #16
| ret
This patch builds the kernel with -fpatchable-function-entry=2,
prefixing each function with two NOPs. To trace a function, we replace
these NOPs with a sequence that saves the LR into a GPR, then calls an
ftrace entry assembly function which saves this and other relevant
registers:
| mov x9, x30
| bl <ftrace-entry>
Since patchable functions are AAPCS compliant (and the kernel does not
use x18 as a platform register), x9-x18 can be safely clobbered in the
patched sequence and the ftrace entry code.
There are now two ftrace entry functions, ftrace_regs_entry (which saves
all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
allocated for each within modules.
Signed-off-by: Torsten Duwe <duwe@suse.de>
[Mark: rework asm, comments, PLTs, initialization, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Will Deacon <will@kernel.org>
2019-02-08 23:10:19 +08:00
select HAVE_DYNAMIC_FTRACE_WITH_REGS \
if $(cc-option,-fpatchable-function-entry=2)
2020-12-12 02:46:32 +08:00
select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
if DYNAMIC_FTRACE_WITH_REGS
2013-12-17 01:50:08 +08:00
select HAVE_EFFICIENT_UNALIGNED_ACCESS
2019-07-12 11:57:14 +08:00
select HAVE_FAST_GUP
2014-04-30 17:54:32 +08:00
select HAVE_FTRACE_MCOUNT_RECORD
2014-04-30 17:54:33 +08:00
select HAVE_FUNCTION_TRACER
2019-08-06 18:00:14 +08:00
select HAVE_FUNCTION_ERROR_INJECTION
2014-04-30 17:54:33 +08:00
select HAVE_FUNCTION_GRAPH_TRACER
2016-05-24 06:09:38 +08:00
select HAVE_GCC_PLUGINS
2012-04-20 21:45:54 +08:00
select HAVE_HW_BREAKPOINT if PERF_EVENTS
2015-11-23 23:12:59 +08:00
select HAVE_IRQ_TIME_ACCOUNTING
2021-09-22 06:22:31 +08:00
select HAVE_KVM
2017-09-27 23:51:30 +08:00
select HAVE_NMI
2014-02-08 01:12:45 +08:00
select HAVE_PATA_PLATFORM
2012-04-20 21:45:54 +08:00
select HAVE_PERF_EVENTS
2014-02-04 02:18:27 +08:00
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
2016-07-09 00:35:45 +08:00
select HAVE_REGS_AND_STACK_ACCESS_API
2021-10-18 22:47:13 +08:00
select HAVE_POSIX_CPU_TIMERS_TASK_WORK
2019-04-12 22:22:01 +08:00
select HAVE_FUNCTION_ARG_ACCESS_API
2020-01-20 18:36:02 +08:00
select HAVE_FUTEX_CMPXCHG if FUTEX
2020-02-04 09:37:02 +08:00
select MMU_GATHER_RCU_TABLE_FREE
2018-06-20 21:46:50 +08:00
select HAVE_RSEQ
2018-06-14 18:36:45 +08:00
select HAVE_STACKPROTECTOR
2014-04-30 17:54:36 +08:00
select HAVE_SYSCALL_TRACEPOINTS
arm64: Kprobes with single stepping support
Add support for basic kernel probes(kprobes) and jump probes
(jprobes) for ARM64.
Kprobes utilizes software breakpoint and single step debug
exceptions supported on ARM v8.
A software breakpoint is placed at the probe address to trap the
kernel execution into the kprobe handler.
ARM v8 supports enabling single stepping before the break exception
return (ERET), with next PC in exception return address (ELR_EL1). The
kprobe handler prepares an executable memory slot for out-of-line
execution with a copy of the original instruction being probed, and
enables single stepping. The PC is set to the out-of-line slot address
before the ERET. With this scheme, the instruction is executed with the
exact same register context except for the PC (and DAIF) registers.
Debug mask (PSTATE.D) is enabled only when single stepping a recursive
kprobe, e.g.: during kprobes reenter so that probed instruction can be
single stepped within the kprobe handler -exception- context.
The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
any further re-entry is prevented by not calling handlers and the case
counted as a missed kprobe).
Single stepping from the x-o-l slot has a drawback for PC-relative accesses
like branching and symbolic literals access as the offset from the new PC
(slot address) may not be ensured to fit in the immediate value of
the opcode. Such instructions need simulation, so reject
probing them.
Instructions generating exceptions or cpu mode change are rejected
for probing.
Exclusive load/store instructions are rejected too. Additionally, the
code is checked to see if it is inside an exclusive load/store sequence
(code from Pratyush).
System instructions are mostly enabled for stepping, except MSR/MRS
accesses to "DAIF" flags in PSTATE, which are not safe for
probing.
This also changes arch/arm64/include/asm/ptrace.h to use
include/asm-generic/ptrace.h.
Thanks to Steve Capper and Pratyush Anand for several suggested
Changes.
Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-09 00:35:48 +08:00
select HAVE_KPROBES
2017-02-06 17:54:33 +08:00
select HAVE_KRETPROBES
2019-06-21 17:52:31 +08:00
select HAVE_GENERIC_VDSO
2015-10-02 03:14:00 +08:00
select IOMMU_DMA if IOMMU_SUPPORT
2012-04-20 21:45:54 +08:00
select IRQ_DOMAIN
2015-04-28 04:53:09 +08:00
select IRQ_FORCED_THREADING
2021-03-24 12:05:22 +08:00
select KASAN_VMALLOC if KASAN_GENERIC
2012-10-16 18:26:57 +08:00
select MODULES_USE_ELF_RELA
2018-05-09 12:53:49 +08:00
select NEED_DMA_MAP_STATE
2018-04-05 15:44:52 +08:00
select NEED_SG_DMA_LENGTH
2012-04-20 21:45:54 +08:00
select OF
select OF_EARLY_FLATTREE
2018-11-16 03:05:33 +08:00
select PCI_DOMAINS_GENERIC if PCI
2018-12-20 06:46:57 +08:00
select PCI_ECAM if (ACPI && PCI)
2018-11-16 03:05:34 +08:00
select PCI_SYSCALL if PCI
2013-03-01 02:14:37 +08:00
select POWER_RESET
select POWER_SUPPLY
2012-04-20 21:45:54 +08:00
select SPARSE_IRQ
2018-04-24 15:00:54 +08:00
select SWIOTLB
2012-10-09 07:28:16 +08:00
select SYSCTL_EXCEPTION_TRACE
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
select THREAD_INFO_IN_TASK
userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9.
Overview
========
This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:
Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.
We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".
Use Case
========
Consider the use case of VM live migration (e.g. under QEMU/KVM):
1. While a VM is still running, we copy the contents of its memory to a
target machine. The pages are populated on the target by writing to the
non-UFFD mapping, using the setup described above. The VM is still running
(and therefore its memory is likely changing), so this may be repeated
several times, until we decide the target is "up to date enough".
2. We pause the VM on the source, and start executing on the target machine.
During this gap, the VM's user(s) will *see* a pause, so it is desirable to
minimize this window.
3. Between the last time any page was copied from the source to the target, and
when the VM was paused, the contents of that page may have changed - and
therefore the copy we have on the target machine is out of date. Although we
can keep track of which pages are out of date, for VMs with large amounts of
memory, it is "slow" to transfer this information to the target machine. We
want to resume execution before such a transfer would complete.
4. So, the guest begins executing on the target machine. The first time it
touches its memory (via the UFFD-registered mapping), userspace wants to
intercept this fault. Userspace checks whether or not the page is up to date,
and if not, copies the updated page from the source machine, via the non-UFFD
mapping. Finally, whether a copy was performed or not, userspace issues a
UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
are correct, carry on setting up the mapping".
We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.
Interaction with Existing APIs
==============================
Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:
UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:
- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.
UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).
- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
-ENOENT in that case (regardless of the kind of fault).
Future Work
===========
This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.
This patch (of 6):
This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:
Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.
This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.
This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].
However, doing it this was requires we allocate a VM_* flag for the new
registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.
[1] https://lore.kernel.org/patchwork/patch/1380226/
[peterx@redhat.com: fix minor fault page leak]
Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 09:35:36 +08:00
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
2021-07-31 13:22:32 +08:00
select TRACE_IRQFLAGS_SUPPORT
2012-04-20 21:45:54 +08:00
help
ARM 64-bit (AArch64) Linux support.
config 64BIT
def_bool y
config MMU
def_bool y
2016-05-31 22:57:59 +08:00
config ARM64_PAGE_SHIFT
int
default 16 if ARM64_64K_PAGES
default 14 if ARM64_16K_PAGES
default 12
2020-09-10 17:59:35 +08:00
config ARM64_CONT_PTE_SHIFT
2016-05-31 22:57:59 +08:00
int
default 5 if ARM64_64K_PAGES
default 7 if ARM64_16K_PAGES
default 4
2020-09-10 17:59:36 +08:00
config ARM64_CONT_PMD_SHIFT
int
default 5 if ARM64_64K_PAGES
default 5 if ARM64_16K_PAGES
default 4
2016-01-15 07:20:01 +08:00
config ARCH_MMAP_RND_BITS_MIN
default 14 if ARM64_64K_PAGES
default 16 if ARM64_16K_PAGES
default 18
# max bits determined by the following formula:
# VA_BITS - PAGE_SHIFT - 3
config ARCH_MMAP_RND_BITS_MAX
default 19 if ARM64_VA_BITS=36
default 24 if ARM64_VA_BITS=39
default 27 if ARM64_VA_BITS=42
default 30 if ARM64_VA_BITS=47
default 29 if ARM64_VA_BITS=48 && ARM64_64K_PAGES
default 31 if ARM64_VA_BITS=48 && ARM64_16K_PAGES
default 33 if ARM64_VA_BITS=48
default 14 if ARM64_64K_PAGES
default 16 if ARM64_16K_PAGES
default 18
config ARCH_MMAP_RND_COMPAT_BITS_MIN
default 7 if ARM64_64K_PAGES
default 9 if ARM64_16K_PAGES
default 11
config ARCH_MMAP_RND_COMPAT_BITS_MAX
default 16
2014-04-08 06:39:19 +08:00
config NO_IOPORT_MAP
2014-09-29 22:29:31 +08:00
def_bool y if !PCI
2012-04-20 21:45:54 +08:00
config STACKTRACE_SUPPORT
def_bool y
2015-08-19 03:50:10 +08:00
config ILLEGAL_POINTER_VALUE
hex
default 0xdead000000000000
2012-04-20 21:45:54 +08:00
config LOCKDEP_SUPPORT
def_bool y
2015-07-24 23:37:48 +08:00
config GENERIC_BUG
def_bool y
depends on BUG
config GENERIC_BUG_RELATIVE_POINTERS
def_bool y
depends on GENERIC_BUG
2012-04-20 21:45:54 +08:00
config GENERIC_HWEIGHT
def_bool y
config GENERIC_CSUM
def_bool y
config GENERIC_CALIBRATE_DELAY
def_bool y
2021-05-05 09:39:54 +08:00
config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y
2015-05-30 01:28:44 +08:00
config SMP
def_bool y
2013-07-09 21:18:12 +08:00
config KERNEL_MODE_NEON
def_bool y
2014-04-19 06:19:59 +08:00
config FIX_EARLYCON_MEM
def_bool y
2015-04-15 06:45:39 +08:00
config PGTABLE_LEVELS
int
2015-10-19 21:19:38 +08:00
default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36
2015-04-15 06:45:39 +08:00
default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42
2019-08-07 23:55:22 +08:00
default 3 if ARM64_64K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
2015-04-15 06:45:39 +08:00
default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
2015-10-19 21:19:37 +08:00
default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48
2015-04-15 06:45:39 +08:00
2016-11-02 17:10:46 +08:00
config ARCH_SUPPORTS_UPROBES
def_bool y
2017-06-14 18:43:55 +08:00
config ARCH_PROC_KCORE_TEXT
def_bool y
2020-01-15 22:18:25 +08:00
config BROKEN_GAS_INST
def_bool !$(as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n)
2019-08-07 23:55:15 +08:00
config KASAN_SHADOW_OFFSET
hex
2020-12-23 04:02:06 +08:00
depends on KASAN_GENERIC || KASAN_SW_TAGS
arm64: mm: extend linear region for 52-bit VA configurations
For historical reasons, the arm64 kernel VA space is configured as two
equally sized halves, i.e., on a 48-bit VA build, the VA space is split
into a 47-bit vmalloc region and a 47-bit linear region.
When support for 52-bit virtual addressing was added, this equal split
was kept, resulting in a substantial waste of virtual address space in
the linear region:
48-bit VA 52-bit VA
0xffff_ffff_ffff_ffff +-------------+ +-------------+
| vmalloc | | vmalloc |
0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+
| linear | : :
0xffff_0000_0000_0000 +-------------+ : :
: : : :
: : : :
: : : :
: : : currently :
: unusable : : :
: : : unused :
: by : : :
: : : :
: hardware : : :
: : : :
0xfff8_0000_0000_0000 : : _PAGE_END(52) +-------------+
: : | |
: : | |
: : | |
: : | |
: : | |
: unusable : | |
: : | linear |
: by : | |
: : | region |
: hardware : | |
: : | |
: : | |
: : | |
: : | |
: : | |
: : | |
0xfff0_0000_0000_0000 +-------------+ PAGE_OFFSET +-------------+
As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc
space (as before), to ensure that a single 64k granule kernel image can
support any 64k granule capable system, regardless of whether it supports
the 52-bit virtual addressing extension. However, due to the fact that
the VA space is still split in equal halves, the linear region is only
2^51 bytes in size, wasting almost half of the 52-bit VA space.
Let's fix this, by abandoning the equal split, and simply assigning all
VA space outside of the vmalloc region to the linear region.
The KASAN shadow region is reconfigured so that it ends at the start of
the vmalloc region, and grows downwards. That way, the arrangement of
the vmalloc space (which contains kernel mappings, modules, BPF region,
the vmemmap array etc) is identical between non-KASAN and KASAN builds,
which aids debugging.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-10-08 23:36:00 +08:00
default 0xdfff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && !KASAN_SW_TAGS
default 0xdfffc00000000000 if ARM64_VA_BITS_47 && !KASAN_SW_TAGS
default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
default 0xefff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && KASAN_SW_TAGS
default 0xefffc00000000000 if ARM64_VA_BITS_47 && KASAN_SW_TAGS
default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
2019-08-07 23:55:15 +08:00
default 0xffffffffffffffff
2015-07-21 03:09:16 +08:00
source "arch/arm64/Kconfig.platforms"
2012-04-20 21:45:54 +08:00
menu "Kernel Features"
2014-11-14 23:54:12 +08:00
menu "ARM errata workarounds via the alternatives framework"
2018-12-01 01:18:00 +08:00
config ARM64_WORKAROUND_CLEAN_CACHE
2019-04-29 21:21:11 +08:00
bool
2018-12-01 01:18:00 +08:00
2014-11-14 23:54:12 +08:00
config ARM64_ERRATUM_826319
bool "Cortex-A53: 826319: System might deadlock if a write cannot complete until read data is accepted"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 826319 on Cortex-A53 parts up to r0p2 with an AMBA 4 ACE or
AXI master interface and an L2 cache.
If a Cortex-A53 uses an AMBA AXI4 ACE interface to other processors
and is unable to accept a certain write via this interface, it will
not progress on read data presented on the read data channel and the
system can deadlock.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_827319
bool "Cortex-A53: 827319: Data cache clean instructions might cause overlapping transactions to the interconnect"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 827319 on Cortex-A53 parts up to r0p2 with an AMBA 5 CHI
master interface and an L2 cache.
Under certain conditions this erratum can cause a clean line eviction
to occur at the same time as another transaction to the same address
on the AMBA 5 CHI interface, which can cause data corruption if the
interconnect reorders the two transactions.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_824069
bool "Cortex-A53: 824069: Cache line might not be marked as clean after a CleanShared snoop"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 824069 on Cortex-A53 parts up to r0p2 when it is connected
to a coherent interconnect.
If a Cortex-A53 processor is executing a store or prefetch for
write instruction at the same time as a processor in another
cluster is executing a cache maintenance operation to the same
address, then this erratum might cause a clean cache line to be
incorrectly marked as dirty.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this option does not necessarily enable the
workaround, as it depends on the alternative framework, which will
only patch the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_819472
bool "Cortex-A53: 819472: Store exclusive instructions might cause data corruption"
default y
2018-12-01 01:18:00 +08:00
select ARM64_WORKAROUND_CLEAN_CACHE
2014-11-14 23:54:12 +08:00
help
This option adds an alternative code sequence to work around ARM
erratum 819472 on Cortex-A53 parts up to r0p1 with an L2 cache
present when it is connected to a coherent interconnect.
If the processor is executing a load and store exclusive sequence at
the same time as a processor in another cluster is executing a cache
maintenance operation to the same address, then this erratum might
cause data corruption.
The workaround promotes data cache clean instructions to
data cache clean-and-invalidate.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_832075
bool "Cortex-A57: 832075: possible deadlock on mixing exclusive memory accesses with device loads"
default y
help
This option adds an alternative code sequence to work around ARM
erratum 832075 on Cortex-A57 parts up to r1p2.
Affected Cortex-A57 parts might deadlock when exclusive load/store
instructions to Write-Back memory are mixed with Device loads.
The workaround is to promote device loads to use Load-Acquire
semantics.
Please note that this does not necessarily enable the workaround,
2015-11-16 18:28:18 +08:00
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
config ARM64_ERRATUM_834220
bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault"
depends on KVM
default y
help
This option adds an alternative code sequence to work around ARM
erratum 834220 on Cortex-A57 parts up to r1p2.
Affected Cortex-A57 parts might report a Stage 2 translation
fault as the result of a Stage 1 fault for load crossing a
page boundary when there is a permission or device memory
alignment fault at Stage 1 and a translation fault at Stage 2.
The workaround is to verify that the Stage 1 translation
doesn't generate a fault before handling the Stage 2 fault.
Please note that this does not necessarily enable the workaround,
2014-11-14 23:54:12 +08:00
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
2015-03-24 03:07:02 +08:00
config ARM64_ERRATUM_845719
bool "Cortex-A53: 845719: a load might read incorrect data"
depends on COMPAT
default y
help
This option adds an alternative code sequence to work around ARM
erratum 845719 on Cortex-A53 parts up to r0p4.
When running a compat (AArch32) userspace on an affected Cortex-A53
part, a load at EL0 from a virtual address that matches the bottom 32
bits of the virtual address used by a recent load at (AArch64) EL1
might return incorrect data.
The workaround is to write the contextidr_el1 register on exception
return to a 32-bit task.
Please note that this does not necessarily enable the workaround,
as it depends on the alternative framework, which will only patch
the kernel if an affected CPU is detected.
If unsure, say Y.
2015-03-17 20:15:02 +08:00
config ARM64_ERRATUM_843419
bool "Cortex-A53: 843419: A load or store might access an incorrect address"
default y
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-07 01:15:33 +08:00
select ARM64_MODULE_PLTS if MODULES
2015-03-17 20:15:02 +08:00
help
2016-08-22 18:58:36 +08:00
This option links the kernel with '--fix-cortex-a53-843419' and
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-07 01:15:33 +08:00
enables PLT support to replace certain ADRP instructions, which can
cause subsequent memory accesses to use an incorrect address on
Cortex-A53 parts up to r0p4.
2015-03-17 20:15:02 +08:00
If unsure, say Y.
2021-03-24 15:11:28 +08:00
config ARM64_LD_HAS_FIX_ERRATUM_843419
def_bool $(ld-option,--fix-cortex-a53-843419)
2018-03-26 22:12:49 +08:00
config ARM64_ERRATUM_1024718
bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update"
default y
help
2019-04-29 21:21:11 +08:00
This option adds a workaround for ARM Cortex-A55 Erratum 1024718.
2018-03-26 22:12:49 +08:00
2021-02-04 07:00:57 +08:00
Affected Cortex-A55 cores (all revisions) could cause incorrect
2018-03-26 22:12:49 +08:00
update of the hardware dirty bit when the DBM/AP bits are updated
2019-04-29 21:21:11 +08:00
without a break-before-make. The workaround is to disable the usage
2018-03-26 22:12:49 +08:00
of hardware DBM locally on the affected cores. CPUs not affected by
2019-04-29 21:21:11 +08:00
this erratum will continue to use the feature.
2015-03-17 20:15:02 +08:00
If unsure, say Y.
2019-05-23 18:24:50 +08:00
config ARM64_ERRATUM_1418040
2019-04-15 20:03:54 +08:00
bool "Cortex-A76/Neoverse-N1: MRC read following MRRC read of specific Generic Timer in AArch32 might give incorrect result"
2018-09-28 00:15:34 +08:00
default y
2019-04-15 20:03:52 +08:00
depends on COMPAT
2018-09-28 00:15:34 +08:00
help
2019-05-01 22:45:36 +08:00
This option adds a workaround for ARM Cortex-A76/Neoverse-N1
2019-05-23 18:24:50 +08:00
errata 1188873 and 1418040.
2018-09-28 00:15:34 +08:00
2019-05-23 18:24:50 +08:00
Affected Cortex-A76/Neoverse-N1 cores (r0p0 to r3p1) could
2019-04-15 20:03:54 +08:00
cause register corruption when accessing the timer registers
from AArch32 userspace.
2018-09-28 00:15:34 +08:00
If unsure, say Y.
2020-05-04 17:48:58 +08:00
config ARM64_WORKAROUND_SPECULATIVE_AT
2019-12-16 19:56:29 +08:00
bool
2018-12-07 01:31:26 +08:00
config ARM64_ERRATUM_1165522
2020-05-04 17:48:58 +08:00
bool "Cortex-A76: 1165522: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
2018-12-07 01:31:26 +08:00
default y
2020-05-04 17:48:58 +08:00
select ARM64_WORKAROUND_SPECULATIVE_AT
2018-12-07 01:31:26 +08:00
help
2019-04-29 21:21:11 +08:00
This option adds a workaround for ARM Cortex-A76 erratum 1165522.
2018-12-07 01:31:26 +08:00
Affected Cortex-A76 cores (r0p0, r1p0, r2p0) could end-up with
corrupted TLBs by speculating an AT instruction during a guest
context switch.
If unsure, say Y.
2020-05-04 17:48:58 +08:00
config ARM64_ERRATUM_1319367
bool "Cortex-A57/A72: 1319537: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
default y
select ARM64_WORKAROUND_SPECULATIVE_AT
help
This option adds work arounds for ARM Cortex-A57 erratum 1319537
and A72 erratum 1319367
Cortex-A57 and A72 cores could end-up with corrupted TLBs by
speculating an AT instruction during a guest context switch.
If unsure, say Y.
2019-12-16 19:56:31 +08:00
config ARM64_ERRATUM_1530923
2020-05-04 17:48:58 +08:00
bool "Cortex-A55: 1530923: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
2019-12-16 19:56:31 +08:00
default y
2020-05-04 17:48:58 +08:00
select ARM64_WORKAROUND_SPECULATIVE_AT
2019-12-16 19:56:31 +08:00
help
This option adds a workaround for ARM Cortex-A55 erratum 1530923.
Affected Cortex-A55 cores (r0p0, r0p1, r1p0, r2p0) could end-up with
corrupted TLBs by speculating an AT instruction during a guest
context switch.
If unsure, say Y.
2018-12-07 01:31:26 +08:00
2020-04-16 19:56:57 +08:00
config ARM64_WORKAROUND_REPEAT_TLBI
bool
2018-11-19 19:27:28 +08:00
config ARM64_ERRATUM_1286807
bool "Cortex-A76: Modification of the translation table for a virtual address might lead to read-after-read ordering violation"
default y
select ARM64_WORKAROUND_REPEAT_TLBI
help
2019-04-29 21:21:11 +08:00
This option adds a workaround for ARM Cortex-A76 erratum 1286807.
2018-11-19 19:27:28 +08:00
On the affected Cortex-A76 cores (r0p0 to r3p0), if a virtual
address for a cacheable mapping of a location is being
accessed by a core while another core is remapping the virtual
address to a new physical page using the recommended
break-before-make sequence, then under very rare circumstances
TLBI+DSB completes before a read using the translation being
invalidated has been observed by other observers. The
workaround repeats the TLBI+DSB operation.
2019-04-29 20:03:57 +08:00
config ARM64_ERRATUM_1463225
bool "Cortex-A76: Software Step might prevent interrupt recognition"
default y
help
This option adds a workaround for Arm Cortex-A76 erratum 1463225.
On the affected Cortex-A76 cores (r0p0 to r3p1), software stepping
of a system call instruction (SVC) can prevent recognition of
subsequent interrupts when software stepping is disabled in the
exception handler of the system call and either kernel debugging
is enabled or VHE is in use.
Work around the erratum by triggering a dummy step exception
when handling a system call from a task that is being stepped
in a VHE configuration of the kernel.
If unsure, say Y.
2019-10-18 01:42:58 +08:00
config ARM64_ERRATUM_1542419
bool "Neoverse-N1: workaround mis-ordering of instruction fetches"
default y
help
This option adds a workaround for ARM Neoverse-N1 erratum
1542419.
Affected Neoverse-N1 cores could execute a stale instruction when
modified by another CPU. The workaround depends on a firmware
counterpart.
Workaround the issue by hiding the DIC feature from EL0. This
forces user-space to perform cache maintenance.
If unsure, say Y.
2020-10-29 02:28:39 +08:00
config ARM64_ERRATUM_1508412
bool "Cortex-A77: 1508412: workaround deadlock on sequence of NC/Device load and store exclusive or PAR read"
default y
help
This option adds a workaround for Arm Cortex-A77 erratum 1508412.
Affected Cortex-A77 cores (r0p0, r1p0) could deadlock on a sequence
of a store-exclusive or read of PAR_EL1 and a load with device or
non-cacheable memory attributes. The workaround depends on a firmware
counterpart.
KVM guests must also have the workaround implemented or they can
deadlock the system.
Work around the issue by inserting DMB SY barriers around PAR_EL1
register reads and warning KVM users. The DMB barrier is sufficient
to prevent a speculative PAR_EL1 read.
If unsure, say Y.
2021-10-20 00:31:40 +08:00
config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
bool
config ARM64_ERRATUM_2119858
bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
default y
depends on CORESIGHT_TRBE
select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
help
This option adds the workaround for ARM Cortex-A710 erratum 2119858.
Affected Cortex-A710 cores could overwrite up to 3 cache lines of trace
data at the base of the buffer (pointed to by TRBASER_EL1) in FILL mode in
the event of a WRAP event.
Work around the issue by always making sure we move the TRBPTR_EL1 by
256 bytes before enabling the buffer and filling the first 256 bytes of
the buffer with ETM ignore packets upon disabling.
If unsure, say Y.
config ARM64_ERRATUM_2139208
bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
default y
depends on CORESIGHT_TRBE
select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
help
This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
Affected Neoverse-N2 cores could overwrite up to 3 cache lines of trace
data at the base of the buffer (pointed to by TRBASER_EL1) in FILL mode in
the event of a WRAP event.
Work around the issue by always making sure we move the TRBPTR_EL1 by
256 bytes before enabling the buffer and filling the first 256 bytes of
the buffer with ETM ignore packets upon disabling.
If unsure, say Y.
2021-10-20 00:31:41 +08:00
config ARM64_WORKAROUND_TSB_FLUSH_FAILURE
bool
config ARM64_ERRATUM_2054223
bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace"
default y
select ARM64_WORKAROUND_TSB_FLUSH_FAILURE
help
Enable workaround for ARM Cortex-A710 erratum 2054223
Affected cores may fail to flush the trace data on a TSB instruction, when
the PE is in trace prohibited state. This will cause losing a few bytes
of the trace cached.
Workaround is to issue two TSB consecutively on affected cores.
If unsure, say Y.
config ARM64_ERRATUM_2067961
bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace"
default y
select ARM64_WORKAROUND_TSB_FLUSH_FAILURE
help
Enable workaround for ARM Neoverse-N2 erratum 2067961
Affected cores may fail to flush the trace data on a TSB instruction, when
the PE is in trace prohibited state. This will cause losing a few bytes
of the trace cached.
Workaround is to issue two TSB consecutively on affected cores.
If unsure, say Y.
2021-10-20 00:31:42 +08:00
config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
bool
config ARM64_ERRATUM_2253138
bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
depends on CORESIGHT_TRBE
default y
select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
help
This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
for TRBE. Under some conditions, the TRBE might generate a write to the next
virtually addressed page following the last page of the TRBE address space
(i.e., the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
Work around this in the driver by always making sure that there is a
page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
If unsure, say Y.
config ARM64_ERRATUM_2224489
bool "Cortex-A710: 2224489: workaround TRBE writing to address out-of-range"
depends on CORESIGHT_TRBE
default y
select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
help
This option adds the workaround for ARM Cortex-A710 erratum 2224489.
Affected Cortex-A710 cores might write to an out-of-range address, not reserved
for TRBE. Under some conditions, the TRBE might generate a write to the next
virtually addressed page following the last page of the TRBE address space
(i.e., the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
Work around this in the driver by always making sure that there is a
page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
If unsure, say Y.
2015-09-22 04:58:38 +08:00
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
help
2019-04-29 21:21:11 +08:00
Enable workaround for errata 22375 and 24313.
2015-09-22 04:58:38 +08:00
This implements two gicv3-its errata workarounds for ThunderX. Both
2019-04-29 21:21:11 +08:00
with a small impact affecting only ITS table allocation.
2015-09-22 04:58:38 +08:00
erratum 22375: only alloc 8MB table size
erratum 24313: ignore memory access type
The fixes are in ITS initialization and basically ignore memory access
type and table size provided by the TYPER and BASER registers.
If unsure, say Y.
2016-05-25 21:29:20 +08:00
config CAVIUM_ERRATUM_23144
bool "Cavium erratum 23144: ITS SYNC hang on dual socket system"
depends on NUMA
default y
help
ITS SYNC command hang for cross node io and collections/cpu mapping.
If unsure, say Y.
2015-09-22 04:58:35 +08:00
config CAVIUM_ERRATUM_23154
bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
default y
help
The gicv3 of ThunderX requires a modified version for
reading the IAR status to ensure data synchronization
(access to icc_iar1_el1 is not sync'ed before and after).
If unsure, say Y.
2016-02-25 09:44:57 +08:00
config CAVIUM_ERRATUM_27456
bool "Cavium erratum 27456: Broadcast TLBI instructions may cause icache corruption"
default y
help
On ThunderX T88 pass 1.x through 2.1 parts, broadcast TLBI
instructions may cause the icache to become corrupted if it
contains data for a non-current ASID. The fix is to
invalidate the icache when changing the mm context.
If unsure, say Y.
2017-06-09 19:49:48 +08:00
config CAVIUM_ERRATUM_30115
bool "Cavium erratum 30115: Guest may disable interrupts in host"
default y
help
On ThunderX T88 pass 1.x through 2.2, T81 pass 1.0 through
1.2, and T83 Pass 1.0, KVM guest execution may disable
interrupts in host. Trapping both GICv3 group-0 and group-1
accesses sidesteps the issue.
If unsure, say Y.
2019-09-13 17:57:50 +08:00
config CAVIUM_TX2_ERRATUM_219
bool "Cavium ThunderX2 erratum 219: PRFM between TTBR change and ISB fails"
default y
help
On Cavium ThunderX2, a load, store or prefetch instruction between a
TTBR update and the corresponding context synchronizing operation can
cause a spurious Data Abort to be delivered to any hardware thread in
the CPU core.
Work around the issue by avoiding the problematic code sequence and
trapping KVM guest TTBRx_EL1 writes to EL2 when SMT is enabled. The
trap handler performs the corresponding register access, skips the
instruction and ensures context synchronization by virtue of the
exception return.
If unsure, say Y.
2020-04-16 19:56:57 +08:00
config FUJITSU_ERRATUM_010001
bool "Fujitsu-A64FX erratum E#010001: Undefined fault may occur wrongly"
default y
help
This option adds a workaround for Fujitsu-A64FX erratum E#010001.
On some variants of the Fujitsu-A64FX cores ver(1.0, 1.1), memory
accesses may cause undefined fault (Data abort, DFSC=0b111111).
This fault occurs under a specific hardware condition when a
load/store instruction performs an address translation using:
case-1 TTBR0_EL1 with TCR_EL1.NFD0 == 1.
case-2 TTBR0_EL2 with TCR_EL2.NFD0 == 1.
case-3 TTBR1_EL1 with TCR_EL1.NFD1 == 1.
case-4 TTBR1_EL2 with TCR_EL2.NFD1 == 1.
The workaround is to ensure these bits are clear in TCR_ELx.
The workaround only affects the Fujitsu-A64FX.
If unsure, say Y.
config HISILICON_ERRATUM_161600802
bool "Hip07 161600802: Erroneous redistributor VLPI base"
default y
help
The HiSilicon Hip07 SoC uses the wrong redistributor base
when issued ITS commands such as VMOVP and VMAPP, and requires
a 128kB offset to be applied to the target address in this commands.
If unsure, say Y.
2017-02-09 04:08:37 +08:00
config QCOM_FALKOR_ERRATUM_1003
bool "Falkor E1003: Incorrect translation due to ASID change"
default y
help
On Falkor v1, an incorrect ASID may be cached in the TLB when ASID
2017-11-14 22:29:19 +08:00
and BADDR are changed together in TTBRx_EL1. Since we keep the ASID
in TTBR1_EL1, this situation only occurs in the entry trampoline and
then only for entries in the walk cache, since the leaf translation
is unchanged. Work around the erratum by invalidating the walk cache
entries for the trampoline before entering the kernel proper.
2017-02-09 04:08:37 +08:00
2017-02-01 01:50:19 +08:00
config QCOM_FALKOR_ERRATUM_1009
bool "Falkor E1009: Prematurely complete a DSB after a TLBI"
default y
2018-11-19 19:27:28 +08:00
select ARM64_WORKAROUND_REPEAT_TLBI
2017-02-01 01:50:19 +08:00
help
On Falkor v1, the CPU may prematurely complete a DSB following a
TLBI xxIS invalidate maintenance operation. Repeat the TLBI operation
one more time to fix the issue.
If unsure, say Y.
2017-03-07 22:20:38 +08:00
config QCOM_QDF2400_ERRATUM_0065
bool "QDF2400 E0065: Incorrect GITS_TYPER.ITT_Entry_size"
default y
help
On Qualcomm Datacenter Technologies QDF2400 SoC, ITS hardware reports
ITE size incorrectly. The GITS_TYPER.ITT_Entry_size field should have
been indicated as 16Bytes (0xf), not 8Bytes (0x7).
If unsure, say Y.
2017-12-12 06:42:32 +08:00
config QCOM_FALKOR_ERRATUM_E1041
bool "Falkor E1041: Speculative instruction fetches might cause errant memory access"
default y
help
Falkor CPU may speculatively fetch instructions from an improper
memory location when MMU translation is changed from SCTLR_ELn[M]=1
to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem.
If unsure, say Y.
2021-03-24 08:28:09 +08:00
config NVIDIA_CARMEL_CNP_ERRATUM
bool "NVIDIA Carmel CNP: CNP on Carmel semantically different than ARM cores"
default y
help
If CNP is enabled on Carmel cores, non-sharable TLBIs on a core will not
invalidate shared TLB entries installed by a different core, as it would
on standard ARM cores.
If unsure, say Y.
2020-04-16 19:56:57 +08:00
config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
2019-02-27 02:43:41 +08:00
default y
help
2020-04-16 19:56:57 +08:00
Socionext Synquacer SoCs implement a separate h/w block to generate
MSI doorbell writes with non-zero values for the device ID.
2019-02-27 02:43:41 +08:00
If unsure, say Y.
2014-11-14 23:54:12 +08:00
endmenu
2014-05-12 17:40:38 +08:00
choice
prompt "Page size"
default ARM64_4K_PAGES
help
Page size (translation granule) configuration.
config ARM64_4K_PAGES
bool "4KB"
help
This feature enables 4KB pages support.
2015-10-19 21:19:37 +08:00
config ARM64_16K_PAGES
bool "16KB"
help
The system will use 16KB pages support. AArch32 emulation
requires applications compiled with 16K (or a multiple of 16K)
aligned segments.
2012-04-20 21:45:54 +08:00
config ARM64_64K_PAGES
2014-05-12 17:40:38 +08:00
bool "64KB"
2012-04-20 21:45:54 +08:00
help
This feature enables 64KB pages support (4KB by default)
allowing only two levels of page tables and faster TLB
2015-10-19 21:19:34 +08:00
look-up. AArch32 emulation requires applications compiled
with 64K aligned segments.
2012-04-20 21:45:54 +08:00
2014-05-12 17:40:38 +08:00
endchoice
choice
prompt "Virtual address space size"
default ARM64_VA_BITS_39 if ARM64_4K_PAGES
2015-10-19 21:19:37 +08:00
default ARM64_VA_BITS_47 if ARM64_16K_PAGES
2014-05-12 17:40:38 +08:00
default ARM64_VA_BITS_42 if ARM64_64K_PAGES
help
Allows choosing one of multiple possible virtual address
space sizes. The level of translation table is determined by
a combination of page size and virtual address space size.
2015-10-19 21:19:38 +08:00
config ARM64_VA_BITS_36
2015-10-20 21:59:20 +08:00
bool "36-bit" if EXPERT
2015-10-19 21:19:38 +08:00
depends on ARM64_16K_PAGES
2014-05-12 17:40:38 +08:00
config ARM64_VA_BITS_39
bool "39-bit"
depends on ARM64_4K_PAGES
config ARM64_VA_BITS_42
bool "42-bit"
depends on ARM64_64K_PAGES
2015-10-19 21:19:37 +08:00
config ARM64_VA_BITS_47
bool "47-bit"
depends on ARM64_16K_PAGES
2014-05-12 17:40:51 +08:00
config ARM64_VA_BITS_48
bool "48-bit"
2019-08-07 23:55:22 +08:00
config ARM64_VA_BITS_52
bool "52-bit"
2018-12-10 22:15:15 +08:00
depends on ARM64_64K_PAGES && (ARM64_PAN || !ARM64_SW_TTBR0_PAN)
help
Enable 52-bit virtual addressing for userspace when explicitly
2019-08-07 23:55:22 +08:00
requested via a hint to mmap(). The kernel will also use 52-bit
virtual addresses for its own mappings (provided HW support for
this feature is available, otherwise it reverts to 48-bit).
2018-12-10 22:15:15 +08:00
NOTE: Enabling 52-bit virtual addressing in conjunction with
ARMv8.3 Pointer Authentication will result in the PAC being
reduced from 7 bits to 3 bits, which may have a significant
impact on its susceptibility to brute-force attacks.
If unsure, select 48-bit virtual addressing instead.
2014-05-12 17:40:38 +08:00
endchoice
2018-12-10 22:15:15 +08:00
config ARM64_FORCE_52BIT
bool "Force 52-bit virtual addresses for userspace"
2019-08-07 23:55:22 +08:00
depends on ARM64_VA_BITS_52 && EXPERT
2018-12-10 22:15:15 +08:00
help
For systems with 52-bit userspace VAs enabled, the kernel will attempt
to maintain compatibility with older software by providing 48-bit VAs
unless a hint is supplied to mmap.
This configuration option disables the 48-bit compatibility logic, and
forces all userspace addresses to be 52-bit on HW that supports it. One
should only enable this configuration option for stress testing userspace
memory management code. If unsure say N here.
2014-05-12 17:40:38 +08:00
config ARM64_VA_BITS
int
2015-10-19 21:19:38 +08:00
default 36 if ARM64_VA_BITS_36
2014-05-12 17:40:38 +08:00
default 39 if ARM64_VA_BITS_39
default 42 if ARM64_VA_BITS_42
2015-10-19 21:19:37 +08:00
default 47 if ARM64_VA_BITS_47
2019-08-07 23:55:22 +08:00
default 48 if ARM64_VA_BITS_48
default 52 if ARM64_VA_BITS_52
2014-05-12 17:40:38 +08:00
2017-12-14 01:07:16 +08:00
choice
prompt "Physical address space size"
default ARM64_PA_BITS_48
help
Choose the maximum physical address range that the kernel will
support.
config ARM64_PA_BITS_48
bool "48-bit"
2017-12-14 01:07:25 +08:00
config ARM64_PA_BITS_52
bool "52-bit (ARMv8.2)"
depends on ARM64_64K_PAGES
depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
help
Enable support for a 52-bit physical address space, introduced as
part of the ARMv8.2-LPA extension.
With this enabled, the kernel will also continue to work on CPUs that
do not support ARMv8.2-LPA, but with some added memory overhead (and
minor performance overhead).
2017-12-14 01:07:16 +08:00
endchoice
config ARM64_PA_BITS
int
default 48 if ARM64_PA_BITS_48
2017-12-14 01:07:25 +08:00
default 52 if ARM64_PA_BITS_52
2017-12-14 01:07:16 +08:00
2019-11-13 17:26:52 +08:00
choice
prompt "Endianness"
default CPU_LITTLE_ENDIAN
help
Select the endianness of data accesses performed by the CPU. Userspace
applications will need to be compiled and linked for the endianness
that is selected here.
2013-10-11 21:52:19 +08:00
config CPU_BIG_ENDIAN
2021-02-09 08:57:20 +08:00
bool "Build big-endian kernel"
depends on !LD_IS_LLD || LLD_VERSION >= 130000
help
2019-11-13 17:26:52 +08:00
Say Y if you plan on running a kernel with a big-endian userspace.
config CPU_LITTLE_ENDIAN
bool "Build little-endian kernel"
help
Say Y if you plan on running a kernel with a little-endian userspace.
This is usually the case for distributions targeting arm64.
endchoice
2013-10-11 21:52:19 +08:00
2014-03-04 15:51:17 +08:00
config SCHED_MC
bool "Multi-core scheduler support"
help
Multi-core scheduler support improves the CPU scheduler's decision
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places. If unsure say N here.
sched: Add cluster scheduler level in core and related Kconfig for ARM64
This patch adds scheduler level for clusters and automatically enables
the load balance among clusters. It will directly benefit a lot of
workload which loves more resources such as memory bandwidth, caches.
Testing has widely been done in two different hardware configurations of
Kunpeng920:
24 cores in one NUMA(6 clusters in each NUMA node);
32 cores in one NUMA(8 clusters in each NUMA node)
Workload is running on either one NUMA node or four NUMA nodes, thus,
this can estimate the effect of cluster spreading w/ and w/o NUMA load
balance.
* Stream benchmark:
4threads stream (on 1NUMA * 24cores = 24cores)
stream stream
w/o patch w/ patch
MB/sec copy 29929.64 ( 0.00%) 32932.68 ( 10.03%)
MB/sec scale 29861.10 ( 0.00%) 32710.58 ( 9.54%)
MB/sec add 27034.42 ( 0.00%) 32400.68 ( 19.85%)
MB/sec triad 27225.26 ( 0.00%) 31965.36 ( 17.41%)
6threads stream (on 1NUMA * 24cores = 24cores)
stream stream
w/o patch w/ patch
MB/sec copy 40330.24 ( 0.00%) 42377.68 ( 5.08%)
MB/sec scale 40196.42 ( 0.00%) 42197.90 ( 4.98%)
MB/sec add 37427.00 ( 0.00%) 41960.78 ( 12.11%)
MB/sec triad 37841.36 ( 0.00%) 42513.64 ( 12.35%)
12threads stream (on 1NUMA * 24cores = 24cores)
stream stream
w/o patch w/ patch
MB/sec copy 52639.82 ( 0.00%) 53818.04 ( 2.24%)
MB/sec scale 52350.30 ( 0.00%) 53253.38 ( 1.73%)
MB/sec add 53607.68 ( 0.00%) 55198.82 ( 2.97%)
MB/sec triad 54776.66 ( 0.00%) 56360.40 ( 2.89%)
Thus, it could help memory-bound workload especially under medium load.
Similar improvement is also seen in lkp-pbzip2:
* lkp-pbzip2 benchmark
2-96 threads (on 4NUMA * 24cores = 96cores)
lkp-pbzip2 lkp-pbzip2
w/o patch w/ patch
Hmean tput-2 11062841.57 ( 0.00%) 11341817.51 * 2.52%*
Hmean tput-5 26815503.70 ( 0.00%) 27412872.65 * 2.23%*
Hmean tput-8 41873782.21 ( 0.00%) 43326212.92 * 3.47%*
Hmean tput-12 61875980.48 ( 0.00%) 64578337.51 * 4.37%*
Hmean tput-21 105814963.07 ( 0.00%) 111381851.01 * 5.26%*
Hmean tput-30 150349470.98 ( 0.00%) 156507070.73 * 4.10%*
Hmean tput-48 237195937.69 ( 0.00%) 242353597.17 * 2.17%*
Hmean tput-79 360252509.37 ( 0.00%) 362635169.23 * 0.66%*
Hmean tput-96 394571737.90 ( 0.00%) 400952978.48 * 1.62%*
2-24 threads (on 1NUMA * 24cores = 24cores)
lkp-pbzip2 lkp-pbzip2
w/o patch w/ patch
Hmean tput-2 11071705.49 ( 0.00%) 11296869.10 * 2.03%*
Hmean tput-4 20782165.19 ( 0.00%) 21949232.15 * 5.62%*
Hmean tput-6 30489565.14 ( 0.00%) 33023026.96 * 8.31%*
Hmean tput-8 40376495.80 ( 0.00%) 42779286.27 * 5.95%*
Hmean tput-12 61264033.85 ( 0.00%) 62995632.78 * 2.83%*
Hmean tput-18 86697139.39 ( 0.00%) 86461545.74 ( -0.27%)
Hmean tput-24 104854637.04 ( 0.00%) 104522649.46 * -0.32%*
In the case of 6 threads and 8 threads, we see the greatest performance
improvement.
Similar improvement can be seen on lkp-pixz though the improvement is
smaller:
* lkp-pixz benchmark
2-24 threads lkp-pixz (on 1NUMA * 24cores = 24cores)
lkp-pixz lkp-pixz
w/o patch w/ patch
Hmean tput-2 6486981.16 ( 0.00%) 6561515.98 * 1.15%*
Hmean tput-4 11645766.38 ( 0.00%) 11614628.43 ( -0.27%)
Hmean tput-6 15429943.96 ( 0.00%) 15957350.76 * 3.42%*
Hmean tput-8 19974087.63 ( 0.00%) 20413746.98 * 2.20%*
Hmean tput-12 28172068.18 ( 0.00%) 28751997.06 * 2.06%*
Hmean tput-18 39413409.54 ( 0.00%) 39896830.55 * 1.23%*
Hmean tput-24 49101815.85 ( 0.00%) 49418141.47 * 0.64%*
* SPECrate benchmark
4,8,16 copies mcf_r(on 1NUMA * 32cores = 32cores)
Base Base
Run Time Rate
------- ---------
4 Copies w/o 580 (w/ 570) w/o 11.1 (w/ 11.3)
8 Copies w/o 647 (w/ 605) w/o 20.0 (w/ 21.4, +7%)
16 Copies w/o 844 (w/ 844) w/o 30.6 (w/ 30.6)
32 Copies(on 4NUMA * 32 cores = 128cores)
[w/o patch]
Base Base Base
Benchmarks Copies Run Time Rate
--------------- ------- --------- ---------
500.perlbench_r 32 584 87.2 *
502.gcc_r 32 503 90.2 *
505.mcf_r 32 745 69.4 *
520.omnetpp_r 32 1031 40.7 *
523.xalancbmk_r 32 597 56.6 *
525.x264_r 1 -- CE
531.deepsjeng_r 32 336 109 *
541.leela_r 32 556 95.4 *
548.exchange2_r 32 513 163 *
557.xz_r 32 530 65.2 *
Est. SPECrate2017_int_base 80.3
[w/ patch]
Base Base Base
Benchmarks Copies Run Time Rate
--------------- ------- --------- ---------
500.perlbench_r 32 580 87.8 (+0.688%) *
502.gcc_r 32 477 95.1 (+5.432%) *
505.mcf_r 32 644 80.3 (+13.574%) *
520.omnetpp_r 32 942 44.6 (+9.58%) *
523.xalancbmk_r 32 560 60.4 (+6.714%%) *
525.x264_r 1 -- CE
531.deepsjeng_r 32 337 109 (+0.000%) *
541.leela_r 32 554 95.6 (+0.210%) *
548.exchange2_r 32 515 163 (+0.000%) *
557.xz_r 32 524 66.0 (+1.227%) *
Est. SPECrate2017_int_base 83.7 (+4.062%)
On the other hand, it is slightly helpful to CPU-bound tasks like
kernbench:
* 24-96 threads kernbench (on 4NUMA * 24cores = 96cores)
kernbench kernbench
w/o cluster w/ cluster
Min user-24 12054.67 ( 0.00%) 12024.19 ( 0.25%)
Min syst-24 1751.51 ( 0.00%) 1731.68 ( 1.13%)
Min elsp-24 600.46 ( 0.00%) 598.64 ( 0.30%)
Min user-48 12361.93 ( 0.00%) 12315.32 ( 0.38%)
Min syst-48 1917.66 ( 0.00%) 1892.73 ( 1.30%)
Min elsp-48 333.96 ( 0.00%) 332.57 ( 0.42%)
Min user-96 12922.40 ( 0.00%) 12921.17 ( 0.01%)
Min syst-96 2143.94 ( 0.00%) 2110.39 ( 1.56%)
Min elsp-96 211.22 ( 0.00%) 210.47 ( 0.36%)
Amean user-24 12063.99 ( 0.00%) 12030.78 * 0.28%*
Amean syst-24 1755.20 ( 0.00%) 1735.53 * 1.12%*
Amean elsp-24 601.60 ( 0.00%) 600.19 ( 0.23%)
Amean user-48 12362.62 ( 0.00%) 12315.56 * 0.38%*
Amean syst-48 1921.59 ( 0.00%) 1894.95 * 1.39%*
Amean elsp-48 334.10 ( 0.00%) 332.82 * 0.38%*
Amean user-96 12925.27 ( 0.00%) 12922.63 ( 0.02%)
Amean syst-96 2146.66 ( 0.00%) 2122.20 * 1.14%*
Amean elsp-96 211.96 ( 0.00%) 211.79 ( 0.08%)
Note this patch isn't an universal win, it might hurt those workload
which can benefit from packing. Though tasks which want to take
advantages of lower communication latency of one cluster won't
necessarily been packed in one cluster while kernel is not aware of
clusters, they have some chance to be randomly packed. But this
patch will make them more likely spread.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-09-24 16:51:03 +08:00
config SCHED_CLUSTER
bool "Cluster scheduler support"
help
Cluster scheduler support improves the CPU scheduler's decision
making when dealing with machines that have clusters of CPUs.
Cluster usually means a couple of CPUs which are placed closely
by sharing mid-level caches, last-level cache tags or internal
busses.
2014-03-04 15:51:17 +08:00
config SCHED_SMT
bool "SMT scheduler support"
help
Improves the CPU scheduler's decision making when dealing with
MultiThreading at a cost of slightly increased overhead in some
places. If unsure say N here.
2012-04-20 21:45:54 +08:00
config NR_CPUS
2015-03-18 19:01:18 +08:00
int "Maximum number of CPUs (2-4096)"
range 2 4096
2019-01-14 19:41:25 +08:00
default "256"
2012-04-20 21:45:54 +08:00
2013-10-25 03:30:18 +08:00
config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
2015-09-24 17:32:14 +08:00
select GENERIC_IRQ_MIGRATION
2013-10-25 03:30:18 +08:00
help
Say Y here to experiment with turning CPUs off and on. CPUs
can be controlled through /sys/devices/system/cpu.
2016-04-09 06:50:27 +08:00
# Common NUMA Features
config NUMA
2020-02-01 09:51:06 +08:00
bool "NUMA Memory Allocation and Scheduler Support"
2020-11-19 08:38:26 +08:00
select GENERIC_ARCH_NUMA
2016-09-26 15:36:50 +08:00
select ACPI_NUMA if ACPI
select OF_NUMA
2016-04-09 06:50:27 +08:00
help
2020-02-01 09:51:06 +08:00
Enable NUMA (Non-Uniform Memory Access) support.
2016-04-09 06:50:27 +08:00
The kernel will try to allocate memory used by a CPU on the
local memory of the CPU and add some more
NUMA awareness to the kernel.
config NODES_SHIFT
int "Maximum NUMA Nodes (as a power of 2)"
range 1 10
2020-10-31 01:30:50 +08:00
default "4"
2021-06-29 10:43:01 +08:00
depends on NUMA
2016-04-09 06:50:27 +08:00
help
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.
config USE_PERCPU_NUMA_NODE_ID
def_bool y
depends on NUMA
2016-09-01 14:55:00 +08:00
config HAVE_SETUP_PER_CPU_AREA
def_bool y
depends on NUMA
config NEED_PER_CPU_EMBED_FIRST_CHUNK
def_bool y
depends on NUMA
2021-11-06 04:39:44 +08:00
config NEED_PER_CPU_PAGE_FIRST_CHUNK
def_bool y
depends on NUMA
2018-12-11 19:01:04 +08:00
source "kernel/Kconfig.hz"
2012-04-20 21:45:54 +08:00
config ARCH_SPARSEMEM_ENABLE
def_bool y
select SPARSEMEM_VMEMMAP_ENABLE
2021-04-20 17:35:59 +08:00
select SPARSEMEM_VMEMMAP
2018-07-07 01:47:24 +08:00
2012-04-20 21:45:54 +08:00
config HW_PERF_EVENTS
2015-10-02 17:55:03 +08:00
def_bool y
depends on ARM_PMU
2012-04-20 21:45:54 +08:00
2021-03-13 01:38:10 +08:00
config ARCH_HAS_FILTER_PGPROT
def_bool y
2020-04-28 00:00:16 +08:00
# Supported by clang >= 7.0
config CC_HAVE_SHADOW_CALL_STACK
def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
2015-11-23 18:33:49 +08:00
config PARAVIRT
bool "Enable paravirtualization code"
help
This changes the kernel so it can modify itself when it is run
under a hypervisor, potentially improving performance significantly
over full virtualization.
config PARAVIRT_TIME_ACCOUNTING
bool "Paravirtual steal time accounting"
select PARAVIRT
help
Select this option to enable fine granularity task steal time
accounting. Time spent executing other tasks in parallel with
the current vCPU is discounted from the vCPU power. To account for
that, there can be a small performance impact.
If in doubt, say N here.
2016-06-24 01:54:48 +08:00
config KEXEC
depends on PM_SLEEP_SMP
select KEXEC_CORE
bool "kexec system call"
2020-06-14 00:50:22 +08:00
help
2016-06-24 01:54:48 +08:00
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.
2018-11-15 13:52:48 +08:00
config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
2021-02-22 01:49:30 +08:00
select HAVE_IMA_KEXEC if IMA
2018-11-15 13:52:48 +08:00
help
This is new version of kexec system call. This system call is
file based and takes file descriptors as system call argument
for kernel and initramfs as opposed to list of segments as
accepted by previous system call.
2019-08-20 08:17:44 +08:00
config KEXEC_SIG
2018-11-15 13:52:54 +08:00
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
help
Select this option to verify a signature with loaded kernel
image. If configured, any attempt of loading a image without
valid signature will fail.
In addition to that option, you need to enable signature
verification for the corresponding kernel image type being
loaded in order for this to work.
config KEXEC_IMAGE_VERIFY_SIG
bool "Enable Image signature verification support"
default y
2019-08-20 08:17:44 +08:00
depends on KEXEC_SIG
2018-11-15 13:52:54 +08:00
depends on EFI && SIGNED_PE_FILE_VERIFICATION
help
Enable Image signature verification support.
comment "Support for PE file signature verification disabled"
2019-08-20 08:17:44 +08:00
depends on KEXEC_SIG
2018-11-15 13:52:54 +08:00
depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
arm64: kdump: provide /proc/vmcore file
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and crash dump kernel preserves
the region for later use with reserve_elfcorehdr() at boot time.
On crash dump kernel, /proc/vmcore will access the primary kernel's memory
with copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing
it since it does not reside in linear mapping on crash dump kernel.
Meanwhile, elfcorehdr_read() is simple as the region is always mapped.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-03 10:24:38 +08:00
config CRASH_DUMP
bool "Build kdump crash kernel"
help
Generate crash dump after being started by kexec. This should
be normally only set in special crash dump kernels which are
loaded in the main kernel with kexec-tools into a specially
reserved region and then later executed after a crash by
kdump/kexec.
2019-06-14 02:21:39 +08:00
For more details see Documentation/admin-guide/kdump/kdump.rst
arm64: kdump: provide /proc/vmcore file
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user space tool, like kexec-tools, is responsible for allocating
a separate region for the core's ELF header within crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to crash dump kernel via a new
device-tree property, "linux,elfcorehdr", and crash dump kernel preserves
the region for later use with reserve_elfcorehdr() at boot time.
On crash dump kernel, /proc/vmcore will access the primary kernel's memory
with copy_oldmem_page(), which feeds the data page-by-page by ioremap'ing
it since it does not reside in linear mapping on crash dump kernel.
Meanwhile, elfcorehdr_read() is simple as the region is always mapped.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-04-03 10:24:38 +08:00
2021-01-26 03:19:08 +08:00
config TRANS_TABLE
def_bool y
2021-09-30 22:31:06 +08:00
depends on HIBERNATION || KEXEC_CORE
2021-01-26 03:19:08 +08:00
2013-06-04 01:05:43 +08:00
config XEN_DOM0
def_bool y
depends on XEN
config XEN
2014-09-18 05:07:06 +08:00
bool "Xen guest support on ARM64"
2013-06-04 01:05:43 +08:00
depends on ARM64 && OF
2013-10-10 21:40:44 +08:00
select SWIOTLB_XEN
2015-11-23 18:33:49 +08:00
select PARAVIRT
2013-06-04 01:05:43 +08:00
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
2013-04-25 22:19:21 +08:00
config FORCE_MAX_ZONEORDER
int
2021-03-01 19:25:14 +08:00
default "14" if ARM64_64K_PAGES
default "12" if ARM64_16K_PAGES
2013-04-25 22:19:21 +08:00
default "11"
2015-10-19 21:19:37 +08:00
help
The kernel memory allocator divides physically contiguous memory
blocks into "zones", where each zone is a power of two number of
pages. This option selects the largest power of two that the kernel
keeps in the memory allocator. If you need to allocate very large
blocks of physically contiguous memory, then you may need to
increase this value.
This config option is actually maximum order plus one. For example,
a value of 11 means that the largest free memory block is 2^10 pages.
We make sure that we can allocate upto a HugePage size for each configuration.
Hence we have :
MAX_ORDER = (PMD_SHIFT - PAGE_SHIFT) + 1 => PAGE_SHIFT - 2
However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
4M allocations matching the default size used by generic code.
2013-04-25 22:19:21 +08:00
2017-11-14 22:41:01 +08:00
config UNMAP_KERNEL_AT_EL0
2017-11-15 00:19:39 +08:00
bool "Unmap kernel when running in userspace (aka \"KAISER\")" if EXPERT
2017-11-14 22:41:01 +08:00
default y
help
2017-11-15 00:19:39 +08:00
Speculation attacks against some high-performance processors can
be used to bypass MMU permission checks and leak kernel data to
userspace. This can be defended against by unmapping the kernel
when running in userspace, mapping it back in on exception entry
via a trampoline page in the vector table.
2017-11-14 22:41:01 +08:00
If unsure, say Y.
arm64: mm: apply r/o permissions of VM areas to its linear alias as well
On arm64, we use block mappings and contiguous hints to map the linear
region, to minimize the TLB footprint. However, this means that the
entire region is mapped using read/write permissions, which we cannot
modify at page granularity without having to take intrusive measures to
prevent TLB conflicts.
This means the linear aliases of pages belonging to read-only mappings
(executable or otherwise) in the vmalloc region are also mapped read/write,
and could potentially be abused to modify things like module code, bpf JIT
code or other read-only data.
So let's fix this, by extending the set_memory_ro/rw routines to take
the linear alias into account. The consequence of enabling this is
that we can no longer use block mappings or contiguous hints, so in
cases where the TLB footprint of the linear region is a bottleneck,
performance may be affected.
Therefore, allow this feature to be runtime en/disabled, by setting
rodata=full (or 'on' to disable just this enhancement, or 'off' to
disable read-only mappings for code and r/o data entirely) on the
kernel command line. Also, allow the default value to be set via a
Kconfig option.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-07 18:36:20 +08:00
config RODATA_FULL_DEFAULT_ENABLED
bool "Apply r/o permissions of VM areas also to their linear aliases"
default y
help
Apply read-only attributes of VM areas to the linear alias of
the backing pages as well. This prevents code or read-only data
from being modified (inadvertently or intentionally) via another
mapping of the same memory page. This additional enhancement can
be turned off at runtime by passing rodata=[off|on] (and turned on
with rodata=full if this option is set to 'n')
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
2019-04-23 21:37:24 +08:00
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
Enabling this option prevents the kernel from accessing
user-space memory directly by pointing TTBR0_EL1 to a reserved
zeroed area and reserved ASID. The user access routines
restore the valid TTBR0_EL1 temporarily.
2019-07-24 01:58:39 +08:00
config ARM64_TAGGED_ADDR_ABI
bool "Enable the tagged user addresses syscall ABI"
default y
help
When this option is enabled, user applications can opt in to a
relaxed ABI via prctl() allowing tagged addresses to be passed
to system calls as pointer arguments. For details, see
2019-09-18 03:52:27 +08:00
Documentation/arm64/tagged-address-abi.rst.
2019-07-24 01:58:39 +08:00
2019-04-23 21:37:24 +08:00
menuconfig COMPAT
bool "Kernel support for 32-bit EL0"
depends on ARM64_4K_PAGES || EXPERT
select HAVE_UID16
select OLD_SIGSUSPEND3
select COMPAT_OLD_SIGACTION
help
This option enables support for a 32-bit EL0 running under a 64-bit
kernel at EL1. AArch32-specific components such as system calls,
the user helper functions, VFP support and the ptrace interface are
handled appropriately by the kernel.
If you use a page size other than 4KB (i.e, 16KB or 64KB), please be aware
that you will only be able to execute AArch32 binaries that were compiled
with page size aligned segments.
If you want to execute 32-bit userspace applications, say Y.
if COMPAT
config KUSER_HELPERS
2019-10-07 20:03:12 +08:00
bool "Enable kuser helpers page for 32-bit applications"
2019-04-23 21:37:24 +08:00
default y
help
Warning: disabling this option may break 32-bit user programs.
Provide kuser helpers to compat tasks. The kernel provides
helper code to userspace in read only form at a fixed location
to allow userspace to be independent of the CPU type fitted to
the system. This permits binaries to be run on ARMv4 through
to ARMv8 without modification.
2019-04-15 02:51:10 +08:00
See Documentation/arm/kernel_user_helpers.rst for details.
2019-04-23 21:37:24 +08:00
However, the fixed address nature of these helpers can be used
by ROP (return orientated programming) authors when creating
exploits.
If all of the binaries and libraries which run on your platform
are built specifically for your platform, and make no use of
these helpers, then you can turn this option off to hinder
such exploits. However, in that case, if a binary or library
relying on those helpers is run, it will not function correctly.
Say N here only if you are absolutely certain that you do not
need these helpers; otherwise, the safe option is to say Y.
2019-10-07 20:03:12 +08:00
config COMPAT_VDSO
bool "Enable vDSO for 32-bit applications"
2021-10-20 06:36:46 +08:00
depends on !CPU_BIG_ENDIAN
depends on (CC_IS_CLANG && LD_IS_LLD) || "$(CROSS_COMPILE_COMPAT)" != ""
2019-10-07 20:03:12 +08:00
select GENERIC_COMPAT_VDSO
default y
help
Place in the process address space of 32-bit applications an
ELF shared object providing fast implementations of gettimeofday
and clock_gettime.
You must have a 32-bit build of glibc 2.22 or later for programs
to seamlessly take advantage of this.
2019-04-23 21:37:24 +08:00
2020-06-09 04:57:08 +08:00
config THUMB2_COMPAT_VDSO
bool "Compile the 32-bit vDSO for Thumb-2 mode" if EXPERT
depends on COMPAT_VDSO
default y
help
Compile the compat vDSO with '-mthumb -fomit-frame-pointer' if y,
otherwise with '-marm'.
2014-11-21 00:51:10 +08:00
menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
2017-11-07 02:07:11 +08:00
depends on SYSCTL
2014-11-21 00:51:10 +08:00
help
Legacy software support may require certain instructions
that have been deprecated or obsoleted in the architecture.
Enable this config to enable selective emulation of these
features.
If unsure, say Y
if ARMV8_DEPRECATED
config SWP_EMULATION
bool "Emulate SWP/SWPB instructions"
help
ARMv8 obsoletes the use of A32 SWP/SWPB instructions such that
they are always undefined. Say Y here to enable software
emulation of these instructions for userspace using LDXR/STXR.
2020-06-25 21:15:07 +08:00
This feature can be controlled at runtime with the abi.swp
sysctl which is disabled by default.
2014-11-21 00:51:10 +08:00
In some older versions of glibc [<=2.8] SWP is used during futex
trylock() operations with the assumption that the code will not
be preempted. This invalid assumption may be more likely to fail
with SWP emulation enabled, leading to deadlock of the user
application.
NOTE: when accessing uncached shared regions, LDXR/STXR rely
on an external transaction monitoring block called a global
monitor to maintain update atomicity. If your system does not
implement a global monitor, this option can cause programs that
perform SWP operations to uncached memory to deadlock.
If unsure, say Y
config CP15_BARRIER_EMULATION
bool "Emulate CP15 Barrier instructions"
help
The CP15 barrier instructions - CP15ISB, CP15DSB, and
CP15DMB - are deprecated in ARMv8 (and ARMv7). It is
strongly recommended to use the ISB, DSB, and DMB
instructions instead.
Say Y here to enable software emulation of these
instructions for AArch32 userspace code. When this option is
enabled, CP15 barrier usage is traced which can help
2020-06-25 21:15:07 +08:00
identify software that needs updating. This feature can be
controlled at runtime with the abi.cp15_barrier sysctl.
2014-11-21 00:51:10 +08:00
If unsure, say Y
2015-01-21 20:43:11 +08:00
config SETEND_EMULATION
bool "Emulate SETEND instruction"
help
The SETEND instruction alters the data-endianness of the
AArch32 EL0, and is deprecated in ARMv8.
Say Y here to enable software emulation of the instruction
2020-06-25 21:15:07 +08:00
for AArch32 userspace code. This feature can be controlled
at runtime with the abi.setend sysctl.
2015-01-21 20:43:11 +08:00
Note: All the cpus on the system must have mixed endian support at EL0
for this feature to be enabled. If a new CPU - which doesn't support mixed
endian - is hotplugged in after this feature has been enabled, there could
be unexpected results in the applications.
If unsure, say Y
2014-11-21 00:51:10 +08:00
endif
2019-04-23 21:37:24 +08:00
endif
2016-07-02 01:25:31 +08:00
2015-07-27 22:54:13 +08:00
menu "ARMv8.1 architectural features"
config ARM64_HW_AFDBM
bool "Support for hardware updates of the Access and Dirty page flags"
default y
help
The ARMv8.1 architecture extensions introduce support for
hardware updates of the access and dirty information in page
table entries. When enabled in TCR_EL1 (HA and HD bits) on
capable processors, accesses to pages with PTE_AF cleared will
set this bit instead of raising an access flag fault.
Similarly, writes to read-only pages with the DBM bit set will
clear the read-only bit (AP[2]) instead of raising a
permission fault.
Kernels built with this configuration option enabled continue
to work on pre-ARMv8.1 hardware and the performance impact is
minimal. If unsure, say Y.
config ARM64_PAN
bool "Enable support for Privileged Access Never (PAN)"
default y
help
Privileged Access Never (PAN; part of the ARMv8.1 Extensions)
prevents the kernel or hypervisor from accessing user-space (EL0)
memory directly.
Choosing this option will cause any unprotected (not using
copy_to_user et al) memory access to fail with a permission fault.
The feature is detected at runtime, and will remain as a 'nop'
instruction if the cpu does not implement the feature.
2020-06-30 21:02:22 +08:00
config AS_HAS_LDAPR
def_bool $(as-instr,.arch_extension rcpc)
2021-04-10 01:37:10 +08:00
config AS_HAS_LSE_ATOMICS
def_bool $(as-instr,.arch_extension lse)
2015-07-27 22:54:13 +08:00
config ARM64_LSE_ATOMICS
2020-01-15 19:30:08 +08:00
bool
default ARM64_USE_LSE_ATOMICS
2021-04-10 01:37:10 +08:00
depends on AS_HAS_LSE_ATOMICS
2020-01-15 19:30:08 +08:00
config ARM64_USE_LSE_ATOMICS
2015-07-27 22:54:13 +08:00
bool "Atomic instructions"
arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
Support for LSE atomic instructions (CONFIG_ARM64_LSE_ATOMICS) relies on
a static key to select between the legacy LL/SC implementation which is
available on all arm64 CPUs and the super-duper LSE implementation which
is available on CPUs implementing v8.1 and later.
Unfortunately, when building a kernel with CONFIG_JUMP_LABEL disabled
(e.g. because the toolchain doesn't support 'asm goto'), the static key
inside the atomics code tries to use atomics itself. This results in a
mess of circular includes and a build failure:
In file included from ./arch/arm64/include/asm/lse.h:11,
from ./arch/arm64/include/asm/atomic.h:16,
from ./include/linux/atomic.h:7,
from ./include/asm-generic/bitops/atomic.h:5,
from ./arch/arm64/include/asm/bitops.h:26,
from ./include/linux/bitops.h:19,
from ./include/linux/kernel.h:12,
from ./include/asm-generic/bug.h:18,
from ./arch/arm64/include/asm/bug.h:26,
from ./include/linux/bug.h:5,
from ./include/linux/page-flags.h:10,
from kernel/bounds.c:10:
./include/linux/jump_label.h: In function ‘static_key_count’:
./include/linux/jump_label.h:254:9: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
return atomic_read(&key->enabled);
^~~~~~~~~~~
[ ... more of the same ... ]
Since LSE atomic instructions are not critical to the operation of the
kernel, make them depend on JUMP_LABEL at compile time.
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-08-29 18:52:47 +08:00
depends on JUMP_LABEL
2018-05-22 02:14:22 +08:00
default y
2015-07-27 22:54:13 +08:00
help
As part of the Large System Extensions, ARMv8.1 introduces new
atomic instructions that are designed specifically to scale in
very large systems.
Say Y here to make use of these instructions for the in-kernel
atomic routines. This incurs a small overhead on CPUs that do
not support these instructions and requires the kernel to be
2018-05-22 02:14:22 +08:00
built with binutils >= 2.25 in order for the new instructions
to be used.
2015-07-27 22:54:13 +08:00
endmenu
2016-02-27 00:30:14 +08:00
menu "ARMv8.2 architectural features"
2017-07-25 18:55:42 +08:00
config ARM64_PMEM
bool "Enable support for persistent memory"
select ARCH_HAS_PMEM_API
2017-07-25 18:55:43 +08:00
select ARCH_HAS_UACCESS_FLUSHCACHE
2017-07-25 18:55:42 +08:00
help
Say Y to enable support for the persistent memory API based on the
ARMv8.2 DCPoP feature.
The feature is detected at runtime, and the kernel will use DC CVAC
operations if DC CVAP is not supported (following the behaviour of
DC CVAP itself if the system does not define a point of persistence).
2018-01-16 03:38:56 +08:00
config ARM64_RAS_EXTN
bool "Enable support for RAS CPU Extensions"
default y
help
CPUs that support the Reliability, Availability and Serviceability
(RAS) Extensions, part of ARMv8.2 are able to track faults and
errors, classify them and report them to software.
On CPUs with these extensions system software can use additional
barriers to determine if faults are pending and read the
classification from a new set of registers.
Selecting this feature will allow the kernel to use these barriers
and access the new registers if the system supports the extension.
Platform RAS features may additionally depend on firmware support.
2018-07-31 21:08:56 +08:00
config ARM64_CNP
bool "Enable support for Common Not Private (CNP) translations"
default y
depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
help
Common Not Private (CNP) allows translation table entries to
be shared between different PEs in the same inner shareable
domain, so the hardware can use this fact to optimise the
caching of such entries in the TLB.
Selecting this option allows the CNP feature to be detected
at runtime, and does not affect PEs that do not implement
this feature.
2016-02-27 00:30:14 +08:00
endmenu
2018-12-08 02:39:30 +08:00
menu "ARMv8.3 architectural features"
config ARM64_PTR_AUTH
bool "Enable support for pointer authentication"
default y
help
Pointer authentication (part of the ARMv8.3 Extensions) provides
instructions for signing and authenticating pointers against secret
keys, which can be used to mitigate Return Oriented Programming (ROP)
and other attacks.
This option enables these instructions at EL0 (i.e. for userspace).
Choosing this option will cause the kernel to initialise secret keys
for each process at exec() time, with these keys being
context-switched along with the process.
The feature is detected at runtime. If the feature is not present in
KVM: arm/arm64: Context-switch ptrauth registers
When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.
Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.
When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.
Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.
Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both type of
authentication to be present in a cpu.
This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
[maz: various fixups]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-23 12:42:35 +08:00
hardware it will not be advertised to userspace/KVM guest nor will it
2020-06-11 19:18:58 +08:00
be enabled.
2018-12-08 02:39:30 +08:00
2020-03-13 17:04:55 +08:00
If the feature is present on the boot CPU but not on a late CPU, then
the late CPU will be parked. Also, if the boot CPU does not have
address auth and the late CPU has then the late CPU will still boot
but with the feature disabled. On such a system, this option should
not be selected.
2021-06-13 17:26:31 +08:00
config ARM64_PTR_AUTH_KERNEL
2021-06-13 17:26:32 +08:00
bool "Use pointer authentication for kernel"
2021-06-13 17:26:31 +08:00
default y
depends on ARM64_PTR_AUTH
depends on (CC_HAS_SIGN_RETURN_ADDRESS || CC_HAS_BRANCH_PROT_PAC_RET) && AS_HAS_PAC
# Modern compilers insert a .note.gnu.property section note for PAC
# which is only understood by binutils starting with version 2.33.1.
depends on LD_IS_LLD || LD_VERSION >= 23301 || (CC_IS_GCC && GCC_VERSION < 90100)
depends on !CC_IS_CLANG || AS_HAS_CFI_NEGATE_RA_STATE
depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
help
If the compiler supports the -mbranch-protection or
-msign-return-address flag (e.g. GCC 7 or later), then this option
will cause the kernel itself to be compiled with return address
protection. In this case, and if the target hardware is known to
support pointer authentication, then CONFIG_STACKPROTECTOR can be
disabled with minimal loss of protection.
arm64: compile the kernel with ptrauth return address signing
Compile all functions with two ptrauth instructions: PACIASP in the
prologue to sign the return address, and AUTIASP in the epilogue to
authenticate the return address (from the stack). If authentication
fails, the return will cause an instruction abort to be taken, followed
by an oops and killing the task.
This should help protect the kernel against attacks using
return-oriented programming. As ptrauth protects the return address, it
can also serve as a replacement for CONFIG_STACKPROTECTOR, although note
that it does not protect other parts of the stack.
The new instructions are in the HINT encoding space, so on a system
without ptrauth they execute as NOPs.
CONFIG_ARM64_PTR_AUTH now not only enables ptrauth for userspace and KVM
guests, but also automatically builds the kernel with ptrauth
instructions if the compiler supports it. If there is no compiler
support, we do not warn that the kernel was built without ptrauth
instructions.
GCC 7 and 8 support the -msign-return-address option, while GCC 9
deprecates that option and replaces it with -mbranch-protection. Support
both options.
Clang uses an external assembler hence this patch makes sure that the
correct parameters (-march=armv8.3-a) are passed down to help it recognize
the ptrauth instructions.
Ftrace function tracer works properly with Ptrauth only when
patchable-function-entry feature is present and is ensured by the
Kconfig dependency.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> # not co-dev parts
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: Cover leaf function, comments, Ftrace Kconfig]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13 17:05:03 +08:00
This feature works with FUNCTION_GRAPH_TRACER option only if
DYNAMIC_FTRACE_WITH_REGS is enabled.
config CC_HAS_BRANCH_PROT_PAC_RET
# GCC 9 or later, clang 8 or later
def_bool $(cc-option,-mbranch-protection=pac-ret+leaf)
config CC_HAS_SIGN_RETURN_ADDRESS
# GCC 7, 8
def_bool $(cc-option,-msign-return-address=all)
config AS_HAS_PAC
2020-06-14 22:43:41 +08:00
def_bool $(cc-option,-Wa$(comma)-march=armv8.3-a)
arm64: compile the kernel with ptrauth return address signing
Compile all functions with two ptrauth instructions: PACIASP in the
prologue to sign the return address, and AUTIASP in the epilogue to
authenticate the return address (from the stack). If authentication
fails, the return will cause an instruction abort to be taken, followed
by an oops and killing the task.
This should help protect the kernel against attacks using
return-oriented programming. As ptrauth protects the return address, it
can also serve as a replacement for CONFIG_STACKPROTECTOR, although note
that it does not protect other parts of the stack.
The new instructions are in the HINT encoding space, so on a system
without ptrauth they execute as NOPs.
CONFIG_ARM64_PTR_AUTH now not only enables ptrauth for userspace and KVM
guests, but also automatically builds the kernel with ptrauth
instructions if the compiler supports it. If there is no compiler
support, we do not warn that the kernel was built without ptrauth
instructions.
GCC 7 and 8 support the -msign-return-address option, while GCC 9
deprecates that option and replaces it with -mbranch-protection. Support
both options.
Clang uses an external assembler hence this patch makes sure that the
correct parameters (-march=armv8.3-a) are passed down to help it recognize
the ptrauth instructions.
Ftrace function tracer works properly with Ptrauth only when
patchable-function-entry feature is present and is ensured by the
Kconfig dependency.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> # not co-dev parts
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[Amit: Cover leaf function, comments, Ftrace Kconfig]
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13 17:05:03 +08:00
2020-03-20 02:19:51 +08:00
config AS_HAS_CFI_NEGATE_RA_STATE
def_bool $(as-instr,.cfi_startproc\n.cfi_negate_ra_state\n.cfi_endproc\n)
2018-12-08 02:39:30 +08:00
endmenu
2020-03-05 17:06:21 +08:00
menu "ARMv8.4 architectural features"
config ARM64_AMU_EXTN
bool "Enable support for the Activity Monitors Unit CPU extension"
default y
help
The activity monitors extension is an optional extension introduced
by the ARMv8.4 CPU architecture. This enables support for version 1
of the activity monitors architecture, AMUv1.
To enable the use of this extension on CPUs that implement it, say Y.
Note that for architectural reasons, firmware _must_ implement AMU
support when running on CPUs that present the activity monitors
extension. The required support is present in:
* Version 1.5 and later of the ARM Trusted Firmware
For kernels that have this configuration enabled but boot with broken
firmware, you may need to say N here until the firmware is fixed.
Otherwise you may experience firmware panics or lockups when
accessing the counter registers. Even if you are not observing these
symptoms, the values returned by the register reads might not
correctly reflect reality. Most commonly, the value read will be 0,
indicating that the counter is not enabled.
2020-07-15 15:19:44 +08:00
config AS_HAS_ARMV8_4
def_bool $(cc-option,-Wa$(comma)-march=armv8.4-a)
config ARM64_TLB_RANGE
bool "Enable support for tlbi range feature"
default y
depends on AS_HAS_ARMV8_4
help
ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses.
The feature introduces new assembly instructions, and they were
support when binutils >= 2.30.
2018-12-08 02:39:30 +08:00
endmenu
2019-12-10 02:12:14 +08:00
menu "ARMv8.5 architectural features"
2020-12-23 04:01:24 +08:00
config AS_HAS_ARMV8_5
def_bool $(cc-option,-Wa$(comma)-march=armv8.5-a)
2020-03-17 00:50:55 +08:00
config ARM64_BTI
bool "Branch Target Identification support"
default y
help
Branch Target Identification (part of the ARMv8.5 Extensions)
provides a mechanism to limit the set of locations to which computed
branch instructions such as BR or BLR can jump.
To make use of BTI on CPUs that support it, say Y.
BTI is intended to provide complementary protection to other control
flow integrity protection mechanisms, such as the Pointer
authentication mechanism provided as part of the ARMv8.3 Extensions.
For this reason, it does not make sense to enable this option without
also enabling support for pointer authentication. Thus, when
enabling this option you should also select ARM64_PTR_AUTH=y.
Userspace binaries must also be specifically compiled to make use of
this mechanism. If you say N here or the hardware does not support
BTI, such binaries can still run, but you get no additional
enforcement of branch destinations.
2020-05-07 03:51:34 +08:00
config ARM64_BTI_KERNEL
bool "Use Branch Target Identification for kernel"
default y
depends on ARM64_BTI
2021-06-13 17:26:31 +08:00
depends on ARM64_PTR_AUTH_KERNEL
2020-05-07 03:51:34 +08:00
depends on CC_HAS_BRANCH_PROT_PAC_RET_BTI
2020-05-12 19:45:40 +08:00
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697
depends on !CC_IS_GCC || GCC_VERSION >= 100100
2021-07-13 05:46:37 +08:00
# https://github.com/llvm/llvm-project/commit/a88c722e687e6780dcd6a58718350dc76fcc4cc9
depends on !CC_IS_CLANG || CLANG_VERSION >= 120000
2020-05-07 03:51:34 +08:00
depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
help
Build the kernel with Branch Target Identification annotations
and enable enforcement of this for kernel code. When this option
is enabled and the system supports BTI all kernel code including
modular code must have BTI enabled.
config CC_HAS_BRANCH_PROT_PAC_RET_BTI
# GCC 9 or later, clang 8 or later
def_bool $(cc-option,-mbranch-protection=pac-ret+leaf+bti)
2019-12-10 02:12:14 +08:00
config ARM64_E0PD
bool "Enable support for E0PD"
default y
help
2020-01-22 19:23:54 +08:00
E0PD (part of the ARMv8.5 extensions) allows us to ensure
that EL0 accesses made via TTBR1 always fault in constant time,
providing similar benefits to KASLR as those provided by KPTI, but
with lower overhead and without disrupting legitimate access to
kernel memory such as SPE.
2019-12-10 02:12:14 +08:00
2020-01-22 19:23:54 +08:00
This option enables E0PD for TTBR1 where available.
2019-12-10 02:12:14 +08:00
2020-01-21 20:58:52 +08:00
config ARCH_RANDOM
bool "Enable support for random number generation"
default y
help
Random number generation (part of the ARMv8.5 Extensions)
provides a high bandwidth, cryptographically secure
hardware random number generator.
2019-09-06 18:08:29 +08:00
config ARM64_AS_HAS_MTE
# Initial support for MTE went in binutils 2.32.0, checked with
# ".arch armv8.5-a+memtag" below. However, this was incomplete
# as a late addition to the final architecture spec (LDGM/STGM)
# is only supported in the newer 2.32.x and 2.33 binutils
# versions, hence the extra "stgm" instruction check below.
def_bool $(as-instr,.arch armv8.5-a+memtag\nstgm xzr$(comma)[x0])
config ARM64_MTE
bool "Memory Tagging Extension support"
default y
depends on ARM64_AS_HAS_MTE && ARM64_TAGGED_ADDR_ABI
2020-12-23 04:01:24 +08:00
depends on AS_HAS_ARMV8_5
2021-04-10 01:37:10 +08:00
depends on AS_HAS_LSE_ATOMICS
2020-12-23 04:01:35 +08:00
# Required for tag checking in the uaccess routines
depends on ARM64_PAN
2019-09-06 18:08:29 +08:00
select ARCH_USES_HIGH_VMA_FLAGS
help
Memory Tagging (part of the ARMv8.5 Extensions) provides
architectural support for run-time, always-on detection of
various classes of memory error to aid with software debugging
to eliminate vulnerabilities arising from memory-unsafe
languages.
This option enables the support for the Memory Tagging
Extension at EL0 (i.e. for userspace).
Selecting this option allows the feature to be detected at
runtime. Any secondary CPU not implementing this feature will
not be allowed a late bring-up.
Userspace binaries that want to use this feature must
explicitly opt in. The mechanism for the userspace is
described in:
Documentation/arm64/memory-tagging-extension.rst.
2019-12-10 02:12:14 +08:00
endmenu
2021-03-13 01:38:10 +08:00
menu "ARMv8.7 architectural features"
config ARM64_EPAN
bool "Enable support for Enhanced Privileged Access Never (EPAN)"
default y
depends on ARM64_PAN
help
Enhanced Privileged Access Never (EPAN) allows Privileged
Access Never to be used with Execute-only mappings.
The feature is detected at runtime, and will remain disabled
if the cpu does not implement the feature.
endmenu
2017-10-31 23:51:02 +08:00
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
help
The Scalable Vector Extension (SVE) is an extension to the AArch64
execution state which complements and extends the SIMD functionality
of the base architecture to support much larger vectors and to enable
additional vectorisation opportunities.
To enable use of this extension on CPUs that implement it, say Y.
2019-04-19 01:41:38 +08:00
On CPUs that support the SVE2 extensions, this option will enable
those too.
2018-03-24 02:08:31 +08:00
Note that for architectural reasons, firmware _must_ implement SVE
support when running on SVE capable hardware. The required support
is present in:
* version 1.5 and later of the ARM Trusted Firmware
* the AArch64 boot wrapper since commit 5e1261e08abf
("bootwrapper: SVE: Enable SVE for EL2 and below").
For other firmware implementations, consult the firmware documentation
or vendor.
If you need the kernel to boot on SVE-capable hardware with broken
firmware, you may need to say N here until you get your firmware
fixed. Otherwise, you may experience firmware panics or lockups when
booting the kernel. If unsure and you are not observing these
symptoms, you should assume that it is safe to say Y.
2015-11-24 19:37:35 +08:00
config ARM64_MODULE_PLTS
2019-06-18 06:29:59 +08:00
bool "Use PLTs to allow module memory to spill over into vmalloc area"
2019-06-25 16:32:11 +08:00
depends on MODULES
2015-11-24 19:37:35 +08:00
select HAVE_MOD_ARCH_SPECIFIC
2019-06-18 06:29:59 +08:00
help
Allocate PLTs when loading modules so that jumps and calls whose
targets are too far away for their relative offsets to be encoded
in the instructions themselves can be bounced via veneers in the
module's PLT. This allows modules to be allocated in the generic
vmalloc area after the dedicated module memory area has been
exhausted.
When running with address space randomization (KASLR), the module
region itself may be too far away for ordinary relative jumps and
calls, and so in that case, module PLTs are required and cannot be
disabled.
Specific errata workaround(s) might also force module PLTs to be
enabled (ARM64_ERRATUM_843419).
2015-11-24 19:37:35 +08:00
2019-01-31 22:59:03 +08:00
config ARM64_PSEUDO_NMI
bool "Support for NMI-like interrupts"
2019-12-31 17:54:57 +08:00
select ARM_GIC_V3
2019-01-31 22:59:03 +08:00
help
Adds support for mimicking Non-Maskable Interrupts through the use of
GIC interrupt priority. This support requires version 3 or later of
2019-04-29 21:21:11 +08:00
ARM GIC.
2019-01-31 22:59:03 +08:00
This high priority configuration for interrupts needs to be
explicitly enabled by setting the kernel parameter
"irqchip.gicv3_pseudo_nmi" to 1.
If unsure, say N
2019-06-11 17:38:11 +08:00
if ARM64_PSEUDO_NMI
config ARM64_DEBUG_PRIORITY_MASKING
bool "Debug interrupt priority masking"
help
This adds runtime checks to functions enabling/disabling
interrupts when using priority masking. The additional checks verify
the validity of ICC_PMR_EL1 when calling concerned functions.
If unsure, say N
endif
2016-01-26 16:13:44 +08:00
config RELOCATABLE
2020-06-11 20:43:30 +08:00
bool "Build a relocatable kernel image" if EXPERT
2019-08-01 09:18:42 +08:00
select ARCH_HAS_RELR
2020-06-11 20:43:30 +08:00
default y
2016-01-26 16:13:44 +08:00
help
This builds the kernel as a Position Independent Executable (PIE),
which retains all relocation metadata required to relocate the
kernel binary at runtime to a different virtual address than the
address it was linked at.
Since AArch64 uses the RELA relocation format, this requires a
relocation pass at runtime even if the kernel is loaded at the
same address it was linked at.
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
config RANDOMIZE_BASE
bool "Randomize the address of the kernel image"
2016-07-27 01:16:55 +08:00
select ARM64_MODULE_PLTS if MODULES
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
select RELOCATABLE
help
Randomizes the virtual address at which the kernel image is
loaded, as a security feature that deters exploit attempts
relying on knowledge of the location of kernel internals.
It is the bootloader's job to provide entropy, by passing a
random u64 value in /chosen/kaslr-seed at kernel entry.
2016-01-26 21:48:29 +08:00
When booting via the UEFI stub, it will invoke the firmware's
EFI_RNG_PROTOCOL implementation (if available) to supply entropy
to the kernel proper. In addition, it will randomise the physical
location of the kernel Image as well.
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
If unsure, say N.
config RANDOMIZE_MODULE_REGION_FULL
2021-07-30 20:51:31 +08:00
bool "Randomize the module region over a 2 GB range"
2017-06-07 01:00:22 +08:00
depends on RANDOMIZE_BASE
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
default y
help
2021-07-30 20:51:31 +08:00
Randomizes the location of the module region inside a 2 GB window
arm64/kernel: kaslr: reduce module randomization range to 4 GB
We currently have to rely on the GCC large code model for KASLR for
two distinct but related reasons:
- if we enable full randomization, modules will be loaded very far away
from the core kernel, where they are out of range for ADRP instructions,
- even without full randomization, the fact that the 128 MB module region
is now no longer fully reserved for kernel modules means that there is
a very low likelihood that the normal bottom-up allocation of other
vmalloc regions may collide, and use up the range for other things.
Large model code is suboptimal, given that each symbol reference involves
a literal load that goes through the D-cache, reducing cache utilization.
But more importantly, literals are not instructions but part of .text
nonetheless, and hence mapped with executable permissions.
So let's get rid of our dependency on the large model for KASLR, by:
- reducing the full randomization range to 4 GB, thereby ensuring that
ADRP references between modules and the kernel are always in range,
- reduce the spillover range to 4 GB as well, so that we fallback to a
region that is still guaranteed to be in range
- move the randomization window of the core kernel to the middle of the
VMALLOC space
Note that KASAN always uses the module region outside of the vmalloc space,
so keep the kernel close to that if KASAN is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-07 01:15:32 +08:00
covering the core kernel. This way, it is less likely for modules
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
to leak information about the location of core kernel data structures
but it does imply that function calls between modules and the core
kernel will need to be resolved via veneers in the module PLT.
When this option is not set, the module region will be randomized over
a limited range that contains the [_stext, _etext] interval of the
2021-07-30 20:51:31 +08:00
core kernel, so branch relocations are almost always in range unless
ARM64_MODULE_PLTS is enabled and the region is exhausted. In this
particular case of region exhaustion, modules might be able to fall
back to a larger 2GB area.
arm64: add support for kernel ASLR
This adds support for KASLR is implemented, based on entropy provided by
the bootloader in the /chosen/kaslr-seed DT property. Depending on the size
of the address space (VA_BITS) and the page size, the entropy in the
virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all
4 levels), with the sidenote that displacements that result in the kernel
image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB
granule kernels, respectively) are not allowed, and will be rounded up to
an acceptable value.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is
randomized independently from the core kernel. This makes it less likely
that the location of core kernel data structures can be determined by an
adversary, but causes all function calls from modules into the core kernel
to be resolved via entries in the module PLTs.
If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is
randomized by choosing a page aligned 128 MB region inside the interval
[_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of
entropy (depending on page size), independently of the kernel randomization,
but still guarantees that modules are within the range of relative branch
and jump instructions (with the caveat that, since the module region is
shared with other uses of the vmalloc area, modules may need to be loaded
further away if the module region is exhausted)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-01-26 21:12:01 +08:00
2018-12-12 20:08:44 +08:00
config CC_HAVE_STACKPROTECTOR_SYSREG
def_bool $(cc-option,-mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=0)
config STACKPROTECTOR_PER_TASK
def_bool y
depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_SYSREG
2012-04-20 21:45:54 +08:00
endmenu
menu "Boot options"
2016-01-26 19:10:38 +08:00
config ARM64_ACPI_PARKING_PROTOCOL
bool "Enable support for the ARM64 ACPI parking protocol"
depends on ACPI
help
Enable support for the ARM64 ACPI parking protocol. If disabled
the kernel will not allow booting through the ARM64 ACPI parking
protocol even if the corresponding data is present in the ACPI
MADT table.
2012-04-20 21:45:54 +08:00
config CMDLINE
string "Default kernel command string"
default ""
help
Provide a set of default command-line options at build time by
entering them here. As a minimum, you should specify the the
root device (e.g. root=/dev/nfs).
2020-09-22 03:15:57 +08:00
choice
prompt "Kernel command line type" if CMDLINE != ""
default CMDLINE_FROM_BOOTLOADER
help
Choose how the kernel will handle the provided default kernel
command line string.
config CMDLINE_FROM_BOOTLOADER
bool "Use bootloader kernel arguments if available"
help
Uses the command-line options passed by the boot loader. If
the boot loader doesn't provide any, the default kernel command
string provided in CMDLINE will be used.
2012-04-20 21:45:54 +08:00
config CMDLINE_FORCE
bool "Always use the default kernel command string"
help
Always use the default kernel command string, even if the boot
loader passes other arguments to the kernel.
This is useful if you cannot or don't want to change the
command-line options your boot loader passes to the kernel.
2020-09-22 03:15:57 +08:00
endchoice
2014-07-02 20:54:43 +08:00
config EFI_STUB
bool
2014-04-16 09:59:30 +08:00
config EFI
bool "UEFI runtime support"
depends on OF && !CPU_BIG_ENDIAN
2017-10-31 23:50:57 +08:00
depends on KERNEL_MODE_NEON
2018-07-24 17:48:45 +08:00
select ARCH_SUPPORTS_ACPI
2014-04-16 09:59:30 +08:00
select LIBFDT
select UCS2_STRING
select EFI_PARAMS_FROM_FDT
2014-07-05 01:41:53 +08:00
select EFI_RUNTIME_WRAPPERS
2014-07-02 20:54:43 +08:00
select EFI_STUB
2020-04-16 03:54:18 +08:00
select EFI_GENERIC_STUB
2020-10-30 14:08:40 +08:00
imply IMA_SECURE_AND_OR_TRUSTED_BOOT
2014-04-16 09:59:30 +08:00
default y
help
This option provides support for runtime services provided
by UEFI firmware (such as non-volatile variables, realtime
2014-04-16 10:47:52 +08:00
clock, and platform reset). A UEFI stub is also provided to
allow the kernel to be booted as an EFI application. This
is only useful on systems that have UEFI firmware.
2014-04-16 09:59:30 +08:00
2014-10-04 23:46:43 +08:00
config DMI
bool "Enable support for SMBIOS (DMI) tables"
depends on EFI
default y
help
This enables SMBIOS/DMI feature for systems.
This option is only useful on systems that have UEFI firmware.
However, even with this option, the resultant kernel should
continue to boot on existing non-UEFI platforms.
2012-04-20 21:45:54 +08:00
endmenu
config SYSVIPC_COMPAT
def_bool y
depends on COMPAT && SYSVIPC
2013-11-08 02:37:14 +08:00
menu "Power management options"
source "kernel/power/Kconfig"
2016-04-28 00:47:12 +08:00
config ARCH_HIBERNATION_POSSIBLE
def_bool y
depends on CPU_PM
config ARCH_HIBERNATION_HEADER
def_bool y
depends on HIBERNATION
2013-11-08 02:37:14 +08:00
config ARCH_SUSPEND_POSSIBLE
def_bool y
endmenu
2013-07-17 21:54:21 +08:00
menu "CPU Power Management"
source "drivers/cpuidle/Kconfig"
2014-02-24 10:27:57 +08:00
source "drivers/cpufreq/Kconfig"
endmenu
2015-03-24 22:02:53 +08:00
source "drivers/acpi/Kconfig"
2013-07-04 20:34:32 +08:00
source "arch/arm64/kvm/Kconfig"
2014-03-06 16:23:33 +08:00
if CRYPTO
source "arch/arm64/crypto/Kconfig"
endif