/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Low-level exception handling code
 *
 * Copyright (C) 2012 ARM Ltd.
 * Authors:	Catalin Marinas <catalin.marinas@arm.com>
 *		Will Deacon <will.deacon@arm.com>
 */

#include <linux/arm-smccc.h>
#include <linux/init.h>
#include <linux/linkage.h>

#include <asm/alternative.h>
#include <asm/assembler.h>
#include <asm/asm-offsets.h>
#include <asm/asm_pointer_auth.h>
#include <asm/bug.h>
#include <asm/cpufeature.h>
#include <asm/errno.h>
#include <asm/esr.h>
#include <asm/irq.h>
#include <asm/memory.h>
#include <asm/mmu.h>
#include <asm/processor.h>
#include <asm/ptrace.h>
#include <asm/scs.h>
#include <asm/thread_info.h>
#include <asm/asm-uaccess.h>
#include <asm/unistd.h>

/*
 * Context tracking and irqflag tracing need to instrument transitions between
 * user and kernel mode.
 */
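/*
 * The enter_from_user_mode()/exit_to_user_mode() helpers called below are
 * arm64 copies of the generic entry code's helpers: both must run with
 * IRQs masked, keep lockdep and irqflag tracing in sync with the real
 * DAIF state, and order that bookkeeping against context tracking (RCU
 * stops watching while a task is in userspace, so it must not be used
 * once the user transition has been accounted).
 */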
.macro user_exit_irqoff
#if defined(CONFIG_CONTEXT_TRACKING) || defined(CONFIG_TRACE_IRQFLAGS)
	bl	enter_from_user_mode
#endif
	.endm

	.macro user_enter_irqoff
#if defined(CONFIG_CONTEXT_TRACKING) || defined(CONFIG_TRACE_IRQFLAGS)
	bl	exit_to_user_mode
#endif
	.endm
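
/*
 * Zero x0-x29 so that no stale, potentially user-controlled register
 * values stay live in the kernel (used on entry from EL0, after the
 * registers have been saved to pt_regs).
 */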
	.macro	clear_gp_regs
	.irp	n,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
	mov	x\n, xzr
	.endr
	.endm

/*
 * Bad Abort numbers
 *-----------------
 */
#define BAD_SYNC	0
#define BAD_IRQ		1
#define BAD_FIQ		2
#define BAD_ERROR	3

.macro kernel_ventry, el, label, regsize = 64
	.align 7
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
	.if	\el == 0
alternative_if ARM64_UNMAP_KERNEL_AT_EL0
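	/*
	 * Under KPTI the EL0 vectors run from the trampoline, which stashes
	 * x30 in tpidrro_el0 (see tramp_ventry); recover it here and scrub
	 * the stash so its value cannot leak to userspace. AArch32 tasks
	 * have no architected x30, so it is simply zeroed.
	 */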
	.if	\regsize == 64
	mrs	x30, tpidrro_el0
	msr	tpidrro_el0, xzr
	.else
	mov	x30, xzr
	.endif
alternative_else_nop_endif
	.endif
#endif
sub sp, sp, #PT_REGS_SIZE
#ifdef CONFIG_VMAP_STACK
	/*
	 * Test whether the SP has overflowed, without corrupting a GPR.
	 * Task and IRQ stacks are aligned so that SP & (1 << THREAD_SHIFT)
	 * should always be zero.
	 */
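	/*
	 * E.g. a 16K stack is aligned to 32K, so bit 14 of a valid SP is
	 * always zero and is flipped by an overflow (or underflow) of less
	 * than the stack size. Logical ops cannot be applied to the SP
	 * directly, so the add/sub sequence below juggles it through x0.
	 */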
add sp, sp, x0 // sp' = sp + x0
	sub	x0, sp, x0		// x0' = sp' - x0 = (sp + x0) - x0 = sp
	tbnz	x0, #THREAD_SHIFT, 0f
	sub	x0, sp, x0		// x0'' = sp' - x0' = (sp + x0) - sp = x0
	sub	sp, sp, x0		// sp'' = sp' - x0 = (sp + x0) - x0 = sp
	b	el\()\el\()_\label
0:
	/*
	 * Either we've just detected an overflow, or we've taken an exception
	 * while on the overflow stack. Either way, we won't return to
	 * userspace, and can clobber EL0 registers to free up GPRs.
	 */
/* Stash the original SP (minus PT_REGS_SIZE) in tpidr_el0. */
msr tpidr_el0, x0

	/* Recover the original x0 value and stash it in tpidrro_el0 */
	sub	x0, sp, x0
	msr	tpidrro_el0, x0

	/* Switch to the overflow stack */
	adr_this_cpu sp, overflow_stack + OVERFLOW_STACK_SIZE, x0

	/*
	 * Check whether we were already on the overflow stack. This may happen
	 * after panic() re-enables interrupts.
	 */
	mrs	x0, tpidr_el0			// sp of interrupted context
	sub	x0, sp, x0			// delta with top of overflow stack
	tst	x0, #~(OVERFLOW_STACK_SIZE - 1)	// within range?
	b.ne	__bad_stack			// no? -> bad stack pointer

	/* We were already on the overflow stack. Restore sp/x0 and carry on. */
	sub	sp, sp, x0
	mrs	x0, tpidrro_el0
#endif
	b	el\()\el\()_\label
	.endm
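	/*
	 * Compute the address of \sym as seen through the trampoline's
	 * virtual alias (TRAMP_VALIAS), i.e. the mapping of
	 * .entry.tramp.text that stays mapped in the user page tables
	 * under KPTI.
	 */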
.macro tramp_alias, dst, sym
	mov_q	\dst, TRAMP_VALIAS
	add	\dst, \dst, #(\sym - .entry.tramp.text)
	.endm

	/*
	 * This macro corrupts x0-x3. It is the caller's duty to save/restore
	 * them if required.
	 */
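	/*
	 * apply_ssbd toggles the Spectre-v4 (SSBD) firmware mitigation via
	 * an ARM_SMCCC_ARCH_WORKAROUND_2 call: \state is 1 when entering
	 * the kernel and 0 when returning to userspace. Tasks with TIF_SSBD
	 * keep the mitigation enabled at all times, so the call is skipped
	 * for them; the alternative callbacks either NOP the sequence out
	 * or patch in the right conduit (SMC or HVC).
	 */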
.macro apply_ssbd, state, tmp1, tmp2
alternative_cb	spectre_v4_patch_fw_mitigation_enable
	b	.L__asm_ssbd_skip\@		// Patched to NOP
alternative_cb_end
	ldr_this_cpu	\tmp2, arm64_ssbd_callback_required, \tmp1
	cbz	\tmp2,	.L__asm_ssbd_skip\@
	ldr	\tmp2, [tsk, #TSK_TI_FLAGS]
	tbnz	\tmp2, #TIF_SSBD, .L__asm_ssbd_skip\@
	mov	w0, #ARM_SMCCC_ARCH_WORKAROUND_2
	mov	w1, #\state
alternative_cb	spectre_v4_patch_fw_mitigation_conduit
	nop					// Patched to SMC/HVC #0
alternative_cb_end
.L__asm_ssbd_skip\@:
	.endm
/* Check for MTE asynchronous tag check faults */
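/*
 * TFSRE0_EL1.TF0 is set by hardware when an asynchronous tag check fault
 * is taken on an EL0 (TTBR0) access; fold it into the task's TIF flags
 * here so it can be reported once we are back on the return path.
 */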
	.macro check_mte_async_tcf, flgs, tmp
#ifdef CONFIG_ARM64_MTE
alternative_if_not ARM64_MTE
	b	1f
alternative_else_nop_endif
	mrs_s	\tmp, SYS_TFSRE0_EL1
	tbz	\tmp, #SYS_TFSR_EL1_TF0_SHIFT, 1f
	/* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
	orr	\flgs, \flgs, #_TIF_MTE_ASYNC_FAULT
	str	\flgs, [tsk, #TSK_TI_FLAGS]
	msr_s	SYS_TFSRE0_EL1, xzr
1:
#endif
	.endm
/* Clear the MTE asynchronous tag check faults */
	.macro clear_mte_async_tcf
#ifdef CONFIG_ARM64_MTE
alternative_if ARM64_MTE
	dsb	ish
	msr_s	SYS_TFSRE0_EL1, xzr
alternative_else_nop_endif
#endif
	.endm
.macro mte_set_gcr, tmp, tmp2
#ifdef CONFIG_ARM64_MTE
	/*
	 * Calculate and set the exclude mask preserving
	 * the RRND (bit[16]) setting.
	 */
	mrs_s	\tmp2, SYS_GCR_EL1
	bfi	\tmp2, \tmp, #0, #16
	msr_s	SYS_GCR_EL1, \tmp2
#endif
	.endm
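/*
 * With in-kernel MTE (KASAN_HW_TAGS) the kernel runs with its own tag
 * exclude mask, gcr_kernel_excl; the task's mask (THREAD_GCR_EL1_USER)
 * is restored by mte_set_user_gcr on the way back to userspace.
 */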
.macro mte_set_kernel_gcr, tmp, tmp2
#ifdef CONFIG_KASAN_HW_TAGS
alternative_if_not ARM64_MTE
	b	1f
alternative_else_nop_endif
	ldr_l	\tmp, gcr_kernel_excl

	mte_set_gcr \tmp, \tmp2
	isb
1:
#endif
	.endm
.macro mte_set_user_gcr, tsk, tmp, tmp2
#ifdef CONFIG_ARM64_MTE
alternative_if_not ARM64_MTE
	b	1f
alternative_else_nop_endif
	ldr	\tmp, [\tsk, #THREAD_GCR_EL1_USER]

	mte_set_gcr \tmp, \tmp2
1:
#endif
	.endm
.macro kernel_entry, el, regsize = 64
	.if	\regsize == 32
	mov	w0, w0				// zero upper 32 bits of x0
	.endif
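	/* Save the interrupted context's x0-x29 into the pt_regs frame. */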
	stp	x0, x1, [sp, #16 * 0]
	stp	x2, x3, [sp, #16 * 1]
	stp	x4, x5, [sp, #16 * 2]
	stp	x6, x7, [sp, #16 * 3]
	stp	x8, x9, [sp, #16 * 4]
	stp	x10, x11, [sp, #16 * 5]
	stp	x12, x13, [sp, #16 * 6]
	stp	x14, x15, [sp, #16 * 7]
	stp	x16, x17, [sp, #16 * 8]
	stp	x18, x19, [sp, #16 * 9]
	stp	x20, x21, [sp, #16 * 10]
	stp	x22, x23, [sp, #16 * 11]
	stp	x24, x25, [sp, #16 * 12]
	stp	x26, x27, [sp, #16 * 13]
	stp	x28, x29, [sp, #16 * 14]

.if \el == 0
clear_gp_regs
mrs x21, sp_el0
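	/*
	 * x21 now holds the user SP (stored to pt_regs below). While in the
	 * kernel, sp_el0 holds the current task pointer, so point it at
	 * this CPU's __entry_task.
	 */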
ldr_this_cpu tsk, __entry_task, x20
msr sp_el0, tsk

	/*
	 * Ensure MDSCR_EL1.SS is clear, since we can unmask debug exceptions
	 * when scheduling.
	 */
ldr x19, [tsk, #TSK_TI_FLAGS]
disable_step_tsk x19, x20
/* Check for asynchronous tag check faults in user space */
check_mte_async_tcf x19, x22
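	/* Entering the kernel: switch the SSBD mitigation on (state 1). */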
apply_ssbd 1, x22, x23
ptrauth_keys_install_kernel tsk, x20, x22, x23
mte_set_kernel_gcr x22, x23
scs_load tsk, x20
.else
add x21, sp, #PT_REGS_SIZE
get_current_task tsk
.endif /* \el == 0 */
mrs x22, elr_el1
mrs x23, spsr_el1
stp lr, x21, [sp, #S_LR]
	/*
	 * For exceptions from EL0, terminate the callchain here.
	 * For exceptions from EL1, create a synthetic frame record so the
	 * interrupted code shows up in the backtrace.
	 */
|
|
|
|
.if \el == 0
|
2021-01-14 01:31:55 +08:00
|
|
|
mov x29, xzr
|
2017-07-23 01:45:33 +08:00
|
|
|
.else
|
|
|
|
stp x29, x22, [sp, #S_STACKFRAME]
|
|
|
|
add x29, sp, #S_STACKFRAME
|
2021-01-14 01:31:55 +08:00
|
|
|
.endif
|
2017-07-23 01:45:33 +08:00
|
|
|
|
2016-09-02 21:54:03 +08:00
|
|
|
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
|
2020-07-21 16:33:15 +08:00
|
|
|
alternative_if_not ARM64_HAS_PAN
|
|
|
|
bl __swpan_entry_el\el
|
2016-09-02 21:54:03 +08:00
|
|
|
alternative_else_nop_endif
|
|
|
|
#endif
|
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
stp x22, x23, [sp, #S_PC]
|
|
|
|
|
2017-08-01 22:35:54 +08:00
|
|
|
/* Not in a syscall by default (el0_svc overwrites for real syscall) */
|
2012-03-05 19:49:27 +08:00
|
|
|
.if \el == 0
|
2017-08-01 22:35:54 +08:00
|
|
|
mov w21, #NO_SYSCALL
|
arm64: syscallno is secretly an int, make it official
The upper 32 bits of the syscallno field in thread_struct are
handled inconsistently, being sometimes zero extended and sometimes
sign-extended. In fact, only the lower 32 bits seem to have any
real significance for the behaviour of the code: it's been OK to
handle the upper bits inconsistently because they don't matter.
Currently, the only place I can find where those bits are
significant is in calling trace_sys_enter(), which may be
unintentional: for example, if a compat tracer attempts to cancel a
syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the
syscall-enter-stop, it will be traced as syscall 4294967295
rather than -1 as might be expected (and as occurs for a native
tracer doing the same thing). Elsewhere, reads of syscallno cast
it to an int or truncate it.
There's also a conspicuous amount of code and casting to bodge
around the fact that although semantically an int, syscallno is
stored as a u64.
Let's not pretend any more.
In order to preserve the stp x instruction that stores the syscall
number in entry.S, this patch special-cases the layout of struct
pt_regs for big endian so that the newly 32-bit syscallno field
maps onto the low bits of the stored value. This is not beautiful,
but benchmarking of the getpid syscall on Juno indicates a minor
slowdown if the stp is split into an stp x and an stp w.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-01 22:35:53 +08:00
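A minimal C sketch of the endian special-case described above (only the tail of pt_regs is shown; the padding field name is illustrative):

#include <stdint.h>

struct pt_regs_tail_sketch {
	uint64_t orig_x0;
#ifdef __AARCH64EB__
	uint32_t unused;	/* big-endian: pad the high half ... */
	int32_t  syscallno;	/* ... so syscallno maps onto the low bits */
#else
	int32_t  syscallno;
	uint32_t unused;
#endif
};

asm-offsets derives S_SYSCALLNO from this layout, so the str below writes the 32-bit field directly, and per the commit message the 64-bit stp in the syscall path still lands the number in the correct half on either endianness.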
|
|
|
str w21, [sp, #S_SYSCALLNO]
|
2012-03-05 19:49:27 +08:00
|
|
|
.endif
|
|
|
|
|
2019-01-31 22:58:46 +08:00
|
|
|
/* Save pmr */
|
|
|
|
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
|
|
|
|
mrs_s x20, SYS_ICC_PMR_EL1
|
|
|
|
str x20, [sp, #S_PMR_SAVE]
|
|
|
|
alternative_else_nop_endif
|
|
|
|
|
2019-09-16 18:51:17 +08:00
|
|
|
/* Re-enable tag checking (TCO set on exception entry) */
|
|
|
|
#ifdef CONFIG_ARM64_MTE
|
|
|
|
alternative_if ARM64_MTE
|
|
|
|
SET_PSTATE_TCO(0)
|
|
|
|
alternative_else_nop_endif
|
|
|
|
#endif
|
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
/*
|
|
|
|
* Registers that may be useful after this macro is invoked:
|
|
|
|
*
|
2019-06-11 17:38:10 +08:00
|
|
|
* x20 - ICC_PMR_EL1
|
2012-03-05 19:49:27 +08:00
|
|
|
* x21 - aborted SP
|
|
|
|
* x22 - aborted PC
|
|
|
|
* x23 - aborted PSTATE
|
|
|
|
*/
|
|
|
|
.endm
|
|
|
|
|
2015-08-19 22:57:09 +08:00
|
|
|
.macro kernel_exit, el
|
2016-06-21 01:28:01 +08:00
|
|
|
.if \el != 0
|
2017-11-02 20:12:37 +08:00
|
|
|
disable_daif
|
2016-06-21 01:28:01 +08:00
|
|
|
.endif
|
|
|
|
|
2019-01-31 22:58:46 +08:00
|
|
|
/* Restore pmr */
|
|
|
|
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
|
|
|
|
ldr x20, [sp, #S_PMR_SAVE]
|
|
|
|
msr_s SYS_ICC_PMR_EL1, x20
|
2019-10-02 17:06:12 +08:00
|
|
|
mrs_s x21, SYS_ICC_CTLR_EL1
|
|
|
|
tbz x21, #6, .L__skip_pmr_sync\@ // Check for ICC_CTLR_EL1.PMHE
|
|
|
|
dsb sy // Ensure priority change is seen by redistributor
|
|
|
|
.L__skip_pmr_sync\@:
|
2019-01-31 22:58:46 +08:00
|
|
|
alternative_else_nop_endif
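In C terms, the restore sequence above is roughly the following (a sketch; the accessor functions are hypothetical stand-ins for the msr/mrs instructions):

#include <stdint.h>

#define ICC_CTLR_EL1_PMHE	(1ULL << 6)	/* the bit tested by tbz above */

extern void write_icc_pmr_el1(uint64_t pmr);	/* hypothetical */
extern uint64_t read_icc_ctlr_el1(void);	/* hypothetical */
extern void dsb_sy(void);			/* hypothetical */

static void restore_pmr_sketch(uint64_t saved_pmr)
{
	write_icc_pmr_el1(saved_pmr);
	/* with PMHE set, the redistributor uses the PMR when signalling
	 * interrupts, so the priority change must be made visible with
	 * a dsb before interrupts could be taken */
	if (read_icc_ctlr_el1() & ICC_CTLR_EL1_PMHE)
		dsb_sy();
}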
|
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
|
2016-09-02 21:54:03 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
|
2020-07-21 16:33:15 +08:00
|
|
|
alternative_if_not ARM64_HAS_PAN
|
|
|
|
bl __swpan_exit_el\el
|
2016-09-02 21:54:03 +08:00
|
|
|
alternative_else_nop_endif
|
|
|
|
#endif
|
|
|
|
|
|
|
|
.if \el == 0
|
2012-03-05 19:49:27 +08:00
|
|
|
ldr x23, [sp, #S_SP] // load return stack pointer
|
2014-09-29 19:26:41 +08:00
|
|
|
msr sp_el0, x23
|
2017-11-14 22:24:29 +08:00
|
|
|
tst x22, #PSR_MODE32_BIT // native task?
|
|
|
|
b.eq 3f
|
|
|
|
|
2015-03-24 03:07:02 +08:00
|
|
|
#ifdef CONFIG_ARM64_ERRATUM_845719
|
2016-09-07 18:07:09 +08:00
|
|
|
alternative_if ARM64_WORKAROUND_845719
|
2015-07-22 19:21:03 +08:00
|
|
|
#ifdef CONFIG_PID_IN_CONTEXTIDR
|
|
|
|
mrs x29, contextidr_el1
|
|
|
|
msr contextidr_el1, x29
|
2015-03-24 03:07:02 +08:00
|
|
|
#else
|
2015-07-22 19:21:03 +08:00
|
|
|
msr contextidr_el1, xzr
|
2015-03-24 03:07:02 +08:00
|
|
|
#endif
|
2016-09-07 18:07:09 +08:00
|
|
|
alternative_else_nop_endif
|
2015-03-24 03:07:02 +08:00
|
|
|
#endif
|
2017-11-14 22:24:29 +08:00
|
|
|
3:
|
2020-04-28 00:00:16 +08:00
|
|
|
scs_save tsk, x0
|
|
|
|
|
2020-03-13 17:04:56 +08:00
|
|
|
/* No kernel C function calls after this as user keys are set. */
|
2020-03-13 17:04:51 +08:00
|
|
|
ptrauth_keys_install_user tsk, x0, x1, x2
|
|
|
|
|
2020-12-23 04:01:45 +08:00
|
|
|
mte_set_user_gcr tsk, x0, x1
|
|
|
|
|
2018-07-11 21:56:47 +08:00
|
|
|
apply_ssbd 0, x0, x1
|
2012-03-05 19:49:27 +08:00
|
|
|
.endif
|
2016-09-02 21:54:03 +08:00
|
|
|
|
2014-09-29 19:26:41 +08:00
|
|
|
msr elr_el1, x21 // set up the return data
|
|
|
|
msr spsr_el1, x22
|
|
|
|
ldp x0, x1, [sp, #16 * 0]
|
|
|
|
ldp x2, x3, [sp, #16 * 1]
|
|
|
|
ldp x4, x5, [sp, #16 * 2]
|
|
|
|
ldp x6, x7, [sp, #16 * 3]
|
|
|
|
ldp x8, x9, [sp, #16 * 4]
|
|
|
|
ldp x10, x11, [sp, #16 * 5]
|
|
|
|
ldp x12, x13, [sp, #16 * 6]
|
|
|
|
ldp x14, x15, [sp, #16 * 7]
|
|
|
|
ldp x16, x17, [sp, #16 * 8]
|
|
|
|
ldp x18, x19, [sp, #16 * 9]
|
|
|
|
ldp x20, x21, [sp, #16 * 10]
|
|
|
|
ldp x22, x23, [sp, #16 * 11]
|
|
|
|
ldp x24, x25, [sp, #16 * 12]
|
|
|
|
ldp x26, x27, [sp, #16 * 13]
|
|
|
|
ldp x28, x29, [sp, #16 * 14]
|
|
|
|
ldr lr, [sp, #S_LR]
|
2021-01-12 09:58:13 +08:00
|
|
|
add sp, sp, #PT_REGS_SIZE // restore sp
|
2017-11-14 22:24:29 +08:00
|
|
|
|
|
|
|
.if \el == 0
|
2017-11-14 22:38:19 +08:00
|
|
|
alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0
|
|
|
|
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
|
2020-07-09 05:10:01 +08:00
|
|
|
bne 4f
|
2017-11-14 22:24:29 +08:00
|
|
|
msr far_el1, x30
|
|
|
|
tramp_alias x30, tramp_exit_native
|
|
|
|
br x30
|
2020-07-09 05:10:01 +08:00
|
|
|
4:
|
2017-11-14 22:24:29 +08:00
|
|
|
tramp_alias x30, tramp_exit_compat
|
|
|
|
br x30
|
2017-11-14 22:38:19 +08:00
|
|
|
#endif
|
2017-11-14 22:24:29 +08:00
|
|
|
.else
|
2020-10-29 02:28:39 +08:00
|
|
|
/* Ensure any device/NC reads complete */
|
|
|
|
alternative_insn nop, "dmb sy", ARM64_WORKAROUND_1508412
|
|
|
|
|
2017-11-14 22:24:29 +08:00
|
|
|
eret
|
|
|
|
.endif
|
2018-06-14 18:23:38 +08:00
|
|
|
sb
|
2012-03-05 19:49:27 +08:00
|
|
|
.endm
|
|
|
|
|
2020-07-21 16:33:15 +08:00
|
|
|
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
|
|
|
|
/*
|
|
|
|
* Set the TTBR0 PAN bit in SPSR. When the exception is taken from
|
|
|
|
* EL0, there is no need to check the state of TTBR0_EL1 since
|
|
|
|
* accesses are always enabled.
|
|
|
|
* Note that the meaning of this bit differs from the ARMv8.1 PAN
|
|
|
|
* feature as all TTBR0_EL1 accesses are disabled, not just those to
|
|
|
|
* user mappings.
|
|
|
|
*/
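In C terms, the entry-side decision below amounts to roughly this (a sketch; the accessors are hypothetical stand-ins for the mrs/msr sequences, and the constants mirror the arm64 definitions):

#include <stdint.h>

#define PSR_PAN_BIT	(1ULL << 22)		/* PAN bit in the saved SPSR */
#define TTBR_ASID_MASK	(0xffffULL << 48)	/* ASID field of TTBR0_EL1 */

extern uint64_t read_ttbr0_el1(void);		/* hypothetical */
extern void uaccess_ttbr0_disable(void);	/* hypothetical */

static uint64_t swpan_entry_sketch(uint64_t saved_spsr)
{
	if (read_ttbr0_el1() & TTBR_ASID_MASK) {
		/* non-reserved ASID: TTBR0 access was live in the
		 * interrupted context, so no emulated PAN there;
		 * disable TTBR0 access for the exception's duration */
		saved_spsr &= ~PSR_PAN_BIT;
		uaccess_ttbr0_disable();
	} else {
		/* reserved ASID: access was already disabled */
		saved_spsr |= PSR_PAN_BIT;
	}
	return saved_spsr;
}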
|
|
|
|
SYM_CODE_START_LOCAL(__swpan_entry_el1)
|
|
|
|
mrs x21, ttbr0_el1
|
|
|
|
tst x21, #TTBR_ASID_MASK // Check for the reserved ASID
|
|
|
|
orr x23, x23, #PSR_PAN_BIT // Set the emulated PAN in the saved SPSR
|
|
|
|
b.eq 1f // TTBR0 access already disabled
|
|
|
|
and x23, x23, #~PSR_PAN_BIT // Clear the emulated PAN in the saved SPSR
|
|
|
|
SYM_INNER_LABEL(__swpan_entry_el0, SYM_L_LOCAL)
|
|
|
|
__uaccess_ttbr0_disable x21
|
|
|
|
1: ret
|
|
|
|
SYM_CODE_END(__swpan_entry_el1)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Restore access to TTBR0_EL1. If returning to EL0, no need for SPSR
|
|
|
|
* PAN bit checking.
|
|
|
|
*/
|
|
|
|
SYM_CODE_START_LOCAL(__swpan_exit_el1)
|
|
|
|
tbnz x22, #22, 1f // Skip re-enabling TTBR0 access if the PSR_PAN_BIT is set
|
|
|
|
__uaccess_ttbr0_enable x0, x1
|
|
|
|
1: and x22, x22, #~PSR_PAN_BIT // ARMv8.0 CPUs do not understand this bit
|
|
|
|
ret
|
|
|
|
SYM_CODE_END(__swpan_exit_el1)
|
|
|
|
|
|
|
|
SYM_CODE_START_LOCAL(__swpan_exit_el0)
|
|
|
|
__uaccess_ttbr0_enable x0, x1
|
|
|
|
/*
|
|
|
|
* Enable errata workarounds only if returning to user. The only
|
|
|
|
* workaround currently required for TTBR0_EL1 changes is for the
|
|
|
|
* Cavium erratum 27456 (broadcast TLBI instructions may cause I-cache
|
|
|
|
* corruption).
|
|
|
|
*/
|
|
|
|
b post_ttbr_update_workaround
|
|
|
|
SYM_CODE_END(__swpan_exit_el0)
|
|
|
|
#endif
|
|
|
|
|
2015-12-15 19:21:25 +08:00
|
|
|
.macro irq_stack_entry
|
2015-12-04 19:02:27 +08:00
|
|
|
mov x19, sp // preserve the original sp
|
2020-04-28 00:00:16 +08:00
|
|
|
#ifdef CONFIG_SHADOW_CALL_STACK
|
2020-05-15 21:46:46 +08:00
|
|
|
mov x24, scs_sp // preserve the original shadow stack
|
2020-04-28 00:00:16 +08:00
|
|
|
#endif
|
2015-12-04 19:02:27 +08:00
|
|
|
|
|
|
|
/*
|
arm64: split thread_info from task stack
This patch moves arm64's struct thread_info from the task stack into
task_struct. This protects thread_info from corruption in the case of
stack overflows, and makes its address harder to determine if stack
addresses are leaked, making a number of attacks more difficult. Precise
detection and handling of overflow is left for subsequent patches.
Largely, this involves changing code to store the task_struct in sp_el0,
and acquire the thread_info from the task struct. Core code now
implements current_thread_info(), and as noted in <linux/sched.h> this
relies on offsetof(task_struct, thread_info) == 0, enforced by core
code.
This change means that the 'tsk' register used in entry.S now points to
a task_struct, rather than a thread_info as it used to. To make this
clear, the TI_* field offsets are renamed to TSK_TI_*, with asm-offsets
appropriately updated to account for the structural change.
Userspace clobbers sp_el0, and we can no longer restore this from the
stack. Instead, the current task is cached in a per-cpu variable that we
can safely access from early assembly as interrupts are disabled (and we
are thus not preemptible).
Both secondary entry and idle are updated to stash the sp and task
pointer separately.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-11-04 04:23:13 +08:00
|
|
|
* Compare sp with the base of the task stack.
|
|
|
|
* If the top ~(THREAD_SIZE - 1) bits match, we are on a task stack,
|
|
|
|
* and should switch to the irq stack.
|
2015-12-04 19:02:27 +08:00
|
|
|
*/
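The check that the following instructions implement reduces to a one-line C predicate (a sketch; the THREAD_SIZE value is illustrative):

#include <stdbool.h>
#include <stdint.h>

#define THREAD_SIZE	(1UL << 14)	/* illustrative: 16K stacks */

/* sp is on the task stack iff it differs from the stack base only in
 * the bits below THREAD_SIZE */
static bool on_task_stack(uint64_t sp, uint64_t task_stack_base)
{
	return ((sp ^ task_stack_base) & ~(THREAD_SIZE - 1)) == 0;
}

The eor/and/cbnz sequence below computes exactly this; only when the predicate holds do we move sp onto the per-cpu irq stack.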
|
2016-11-04 04:23:13 +08:00
|
|
|
ldr x25, [tsk, TSK_STACK]
|
|
|
|
eor x25, x25, x19
|
|
|
|
and x25, x25, #~(THREAD_SIZE - 1)
|
|
|
|
cbnz x25, 9998f
|
2015-12-04 19:02:27 +08:00
|
|
|
|
2017-08-01 04:17:03 +08:00
|
|
|
ldr_this_cpu x25, irq_stack_ptr, x26
|
arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP
For historical reasons, we leave the top 16 bytes of our task and IRQ
stacks unused, a practice used to ensure that the SP can always be
masked to find the base of the current stack (historically, where
thread_info could be found).
However, this is not necessary, as:
* When an exception is taken from a task stack, we decrement the SP by
S_FRAME_SIZE and stash the exception registers before we compare the
SP against the task stack. In such cases, the SP must be at least
S_FRAME_SIZE below the limit, and can be safely masked to determine
whether the task stack is in use.
* When transitioning to an IRQ stack, we'll place a dummy frame onto the
IRQ stack before enabling asynchronous exceptions, or executing code
we expect to trigger faults. Thus, if an exception is taken from the
IRQ stack, the SP must be at least 16 bytes below the limit.
* We no longer mask the SP to find the thread_info, which is now found
via sp_el0. Note that historically, the offset was critical to ensure
that cpu_switch_to() found the correct stack for new threads that
hadn't yet executed ret_from_fork().
Given that, this initial offset serves no purpose, and can be removed.
This brings us in-line with other architectures (e.g. x86) which do not
rely on this masking.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[Mark: rebase, kill THREAD_START_SP, commit msg additions]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
2017-07-21 00:15:45 +08:00
|
|
|
mov x26, #IRQ_STACK_SIZE
|
2015-12-04 19:02:27 +08:00
|
|
|
add x26, x25, x26
|
arm64: remove irq_count and do_softirq_own_stack()
sysrq_handle_reboot() re-enables interrupts while on the irq stack. The
irq_stack implementation wrongly assumed this would only ever happen
via the softirq path, allowing it to update irq_count late, in
do_softirq_own_stack().
This means if an irq occurs in sysrq_handle_reboot(), during
emergency_restart() the stack will be corrupted, as irq_count wasn't
updated.
Lose the optimisation: rather than moving the adding/subtracting of
irq_count into irq_stack_entry/irq_stack_exit, remove it entirely, and compare
sp_el0 (struct thread_info) with sp & ~(THREAD_SIZE - 1). This tells us
whether we are on a task stack; if so, we can safely switch to the irq stack.
Finally, remove do_softirq_own_stack(), we don't need it anymore.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: use get_thread_info macro]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-19 00:01:47 +08:00
|
|
|
|
|
|
|
/* switch to the irq stack */
|
2015-12-04 19:02:27 +08:00
|
|
|
mov sp, x26
|
2020-04-28 00:00:16 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_SHADOW_CALL_STACK
|
|
|
|
/* also switch to the irq shadow stack */
|
2020-12-01 07:34:42 +08:00
|
|
|
ldr_this_cpu scs_sp, irq_shadow_call_stack_ptr, x26
|
2020-04-28 00:00:16 +08:00
|
|
|
#endif
|
|
|
|
|
2015-12-04 19:02:27 +08:00
|
|
|
9998:
|
|
|
|
.endm
|
|
|
|
|
|
|
|
/*
|
2020-04-28 00:00:16 +08:00
|
|
|
* The callee-saved regs (x19-x29) should be preserved between
|
|
|
|
* irq_stack_entry and irq_stack_exit, but note that kernel_entry
|
|
|
|
* uses x20-x23 to store data for later use.
|
2015-12-04 19:02:27 +08:00
|
|
|
*/
|
|
|
|
.macro irq_stack_exit
|
|
|
|
mov sp, x19
|
2020-04-28 00:00:16 +08:00
|
|
|
#ifdef CONFIG_SHADOW_CALL_STACK
|
2020-05-15 21:46:46 +08:00
|
|
|
mov scs_sp, x24
|
2020-04-28 00:00:16 +08:00
|
|
|
#endif
|
2015-12-04 19:02:27 +08:00
|
|
|
.endm
|
|
|
|
|
2019-01-03 21:23:10 +08:00
|
|
|
/* GPRs used by entry code */
|
2012-03-05 19:49:27 +08:00
|
|
|
tsk	.req	x28		// current task_struct
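As the thread_info split commit above explains, tsk caches the current task_struct, which the kernel keeps in sp_el0 while running in the kernel. A minimal sketch of how C code recovers the same pointer (the function name is illustrative):

struct task_struct;

static inline struct task_struct *current_task_sketch(void)
{
	unsigned long cur;

	/* sp_el0 holds the current task pointer while in the kernel */
	asm ("mrs %0, sp_el0" : "=r" (cur));
	return (struct task_struct *)cur;
}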
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Interrupt handling.
|
|
|
|
*/
|
|
|
|
.macro irq_handler
|
2015-12-04 19:02:27 +08:00
|
|
|
ldr_l x1, handle_arch_irq
|
2012-03-05 19:49:27 +08:00
|
|
|
mov x0, sp
|
2015-12-15 19:21:25 +08:00
|
|
|
irq_stack_entry
|
2012-03-05 19:49:27 +08:00
|
|
|
blr x1
|
2015-12-04 19:02:27 +08:00
|
|
|
irq_stack_exit
|
2012-03-05 19:49:27 +08:00
|
|
|
.endm
|
|
|
|
|
2019-06-11 17:38:09 +08:00
|
|
|
#ifdef CONFIG_ARM64_PSEUDO_NMI
|
|
|
|
/*
|
|
|
|
* Set res to 0 if irqs were unmasked in the interrupted context.
|
|
|
|
* Otherwise set res to a non-zero value.
|
|
|
|
*/
|
|
|
|
.macro test_irqs_unmasked res:req, pmr:req
|
|
|
|
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
|
|
|
|
sub \res, \pmr, #GIC_PRIO_IRQON
|
|
|
|
alternative_else
|
|
|
|
mov \res, xzr
|
|
|
|
alternative_endif
|
|
|
|
.endm
|
|
|
|
#endif
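In C terms, test_irqs_unmasked reduces to the following predicate when priority masking is in use (a sketch; the GIC_PRIO_IRQON value is illustrative):

#include <stdbool.h>
#include <stdint.h>

#define GIC_PRIO_IRQON	0xe0	/* illustrative "fully unmasked" priority */

/* res = pmr - GIC_PRIO_IRQON, so res == 0 means irqs were unmasked */
static bool irqs_were_unmasked(uint64_t pmr)
{
	return (pmr - GIC_PRIO_IRQON) == 0;
}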
|
|
|
|
|
2019-06-11 17:38:10 +08:00
|
|
|
.macro gic_prio_kentry_setup, tmp:req
|
|
|
|
#ifdef CONFIG_ARM64_PSEUDO_NMI
|
|
|
|
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
|
|
|
|
mov \tmp, #(GIC_PRIO_PSR_I_SET | GIC_PRIO_IRQON)
|
|
|
|
msr_s SYS_ICC_PMR_EL1, \tmp
|
|
|
|
alternative_else_nop_endif
|
|
|
|
#endif
|
|
|
|
.endm
|
|
|
|
|
|
|
|
.macro gic_prio_irq_setup, pmr:req, tmp:req
|
|
|
|
#ifdef CONFIG_ARM64_PSEUDO_NMI
|
|
|
|
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
|
|
|
|
orr \tmp, \pmr, #GIC_PRIO_PSR_I_SET
|
|
|
|
msr_s SYS_ICC_PMR_EL1, \tmp
|
|
|
|
alternative_else_nop_endif
|
|
|
|
#endif
|
|
|
|
.endm
|
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
.text
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Exception vectors.
|
|
|
|
*/
|
2016-07-09 00:35:50 +08:00
|
|
|
.pushsection ".entry.text", "ax"
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
.align 11
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START(vectors)
|
2017-11-14 22:20:21 +08:00
|
|
|
kernel_ventry 1, sync_invalid // Synchronous EL1t
|
|
|
|
kernel_ventry 1, irq_invalid // IRQ EL1t
|
|
|
|
kernel_ventry 1, fiq_invalid // FIQ EL1t
|
|
|
|
kernel_ventry 1, error_invalid // Error EL1t
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2017-11-14 22:20:21 +08:00
|
|
|
kernel_ventry 1, sync // Synchronous EL1h
|
|
|
|
kernel_ventry 1, irq // IRQ EL1h
|
|
|
|
kernel_ventry 1, fiq_invalid // FIQ EL1h
|
|
|
|
kernel_ventry 1, error // Error EL1h
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2017-11-14 22:20:21 +08:00
|
|
|
kernel_ventry 0, sync // Synchronous 64-bit EL0
|
|
|
|
kernel_ventry 0, irq // IRQ 64-bit EL0
|
|
|
|
kernel_ventry 0, fiq_invalid // FIQ 64-bit EL0
|
|
|
|
kernel_ventry 0, error // Error 64-bit EL0
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_COMPAT
|
2017-11-14 22:20:21 +08:00
|
|
|
kernel_ventry 0, sync_compat, 32 // Synchronous 32-bit EL0
|
|
|
|
kernel_ventry 0, irq_compat, 32 // IRQ 32-bit EL0
|
|
|
|
kernel_ventry 0, fiq_invalid_compat, 32 // FIQ 32-bit EL0
|
|
|
|
kernel_ventry 0, error_compat, 32 // Error 32-bit EL0
|
2012-03-05 19:49:27 +08:00
|
|
|
#else
|
2017-11-14 22:20:21 +08:00
|
|
|
kernel_ventry 0, sync_invalid, 32 // Synchronous 32-bit EL0
|
|
|
|
kernel_ventry 0, irq_invalid, 32 // IRQ 32-bit EL0
|
|
|
|
kernel_ventry 0, fiq_invalid, 32 // FIQ 32-bit EL0
|
|
|
|
kernel_ventry 0, error_invalid, 32 // Error 32-bit EL0
|
2012-03-05 19:49:27 +08:00
|
|
|
#endif
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(vectors)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
arm64: add VMAP_STACK overflow detection
This patch adds stack overflow detection to arm64, usable when vmap'd stacks
are in use.
Overflow is detected in a small preamble executed for each exception entry,
which checks whether there is enough space on the current stack for the general
purpose registers to be saved. If there is not enough space, the overflow
handler is invoked on a per-cpu overflow stack. This approach preserves the
original exception information in ESR_EL1 (and where appropriate, FAR_EL1).
Task and IRQ stacks are aligned to double their size, enabling overflow to be
detected with a single bit test. For example, a 16K stack is aligned to 32K,
ensuring that bit 14 of the SP must be zero. On an overflow (or underflow),
this bit is flipped. Thus, overflow (of less than the size of the stack) can be
detected by testing whether this bit is set.
The overflow check is performed before any attempt is made to access the
stack, avoiding recursive faults (and the loss of exception information
these would entail). As logical operations cannot be performed on the SP
directly, the SP is temporarily swapped with a general purpose register
using arithmetic operations to enable the test to be performed.
This gives us a useful error message on stack overflow, as can be triggered with
the LKDTM overflow test:
[ 305.388749] lkdtm: Performing direct entry OVERFLOW
[ 305.395444] Insufficient stack space to handle exception!
[ 305.395482] ESR: 0x96000047 -- DABT (current EL)
[ 305.399890] FAR: 0xffff00000a5e7f30
[ 305.401315] Task stack: [0xffff00000a5e8000..0xffff00000a5ec000]
[ 305.403815] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0]
[ 305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5
[ 305.412785] Hardware name: linux,dummy-virt (DT)
[ 305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000
[ 305.419221] PC is at recursive_loop+0x10/0x48
[ 305.421637] LR is at recursive_loop+0x38/0x48
[ 305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145
[ 305.428020] sp : ffff00000a5e7f50
[ 305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00
[ 305.433191] x27: ffff000008981000 x26: ffff000008f80400
[ 305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8
[ 305.440369] x23: ffff000008f80138 x22: 0000000000000009
[ 305.442241] x21: ffff80003ce65000 x20: ffff000008f80188
[ 305.444552] x19: 0000000000000013 x18: 0000000000000006
[ 305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8
[ 305.448252] x15: ffff000008ff546d x14: 000000000047a4c8
[ 305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff
[ 305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d
[ 305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770
[ 305.457105] x7 : 1313131313131313 x6 : 00000000000000e1
[ 305.459285] x5 : 0000000000000000 x4 : 0000000000000000
[ 305.461781] x3 : 0000000000000000 x2 : 0000000000000400
[ 305.465119] x1 : 0000000000000013 x0 : 0000000000000012
[ 305.467724] Kernel panic - not syncing: kernel stack overflow
[ 305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5
[ 305.473325] Hardware name: linux,dummy-virt (DT)
[ 305.475070] Call trace:
[ 305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378
[ 305.478991] [<ffff000008088e64>] show_stack+0x14/0x20
[ 305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8
[ 305.483294] [<ffff0000080c3288>] panic+0x118/0x280
[ 305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70
[ 305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128
[ 305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0)
[ 305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000
[ 305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313
[ 305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548
[ 305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d
[ 305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013
[ 305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138
[ 305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000
[ 305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50
[ 305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000
[ 305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330
[ 305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c
[ 305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48
[ 305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48
[ 305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20
[ 305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28
[ 305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170
[ 305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8
[ 305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128
[ 305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0
[ 305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0
[ 305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000)
[ 305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0
[ 305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080
[ 305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f
[ 305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038
[ 305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000
[ 305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0
[ 305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458
[ 305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0
[ 305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040
[ 305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28
[ 305.504720] Kernel Offset: disabled
[ 305.505189] CPU features: 0x002082
[ 305.505473] Memory Limit: none
[ 305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow
This patch was co-authored by Ard Biesheuvel and Mark Rutland.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
2017-07-15 03:30:35 +08:00
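The alignment trick described in the commit message reduces to a single bit test (a sketch; THREAD_SHIFT is illustrative for 16K stacks, matching the bit-14 example above):

#include <stdbool.h>
#include <stdint.h>

#define THREAD_SHIFT	14	/* log2(THREAD_SIZE) for 16K stacks */

/* stacks are aligned to 2 * THREAD_SIZE, so bit THREAD_SHIFT of an
 * in-bounds SP is clear; a set bit means over- or underflow */
static bool stack_overflowed(uint64_t sp)
{
	return sp & (1UL << THREAD_SHIFT);
}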
|
|
|
#ifdef CONFIG_VMAP_STACK
|
|
|
|
/*
|
|
|
|
* We detected an overflow in kernel_ventry, which switched to the
|
|
|
|
* overflow stack. Stash the exception regs, and head to our overflow
|
|
|
|
* handler.
|
|
|
|
*/
|
|
|
|
__bad_stack:
|
|
|
|
/* Restore the original x0 value */
|
|
|
|
mrs x0, tpidrro_el0
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Store the original GPRs to the new stack. The original SP (minus
|
2021-01-12 09:58:13 +08:00
|
|
|
* PT_REGS_SIZE) was stashed in tpidr_el0 by kernel_ventry.
|
2017-07-15 03:30:35 +08:00
|
|
|
*/
|
2021-01-12 09:58:13 +08:00
|
|
|
sub sp, sp, #PT_REGS_SIZE
|
2017-07-15 03:30:35 +08:00
|
|
|
kernel_entry 1
|
|
|
|
mrs x0, tpidr_el0
|
2021-01-12 09:58:13 +08:00
|
|
|
add x0, x0, #PT_REGS_SIZE
|
2017-07-15 03:30:35 +08:00
|
|
|
str x0, [sp, #S_SP]
|
|
|
|
|
|
|
|
/* Stash the regs for handle_bad_stack */
|
|
|
|
mov x0, sp
|
|
|
|
|
|
|
|
/* Time to die */
|
|
|
|
bl handle_bad_stack
|
|
|
|
ASM_BUG()
|
|
|
|
#endif /* CONFIG_VMAP_STACK */
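The C handler this path branches to reports the overflow and never returns; a
rough sketch of handle_bad_stack(), assuming the traps.c implementation of
this era (details may differ):

	asmlinkage void handle_bad_stack(struct pt_regs *regs)
	{
		unsigned long esr = read_sysreg(esr_el1);
		unsigned long far = read_sysreg(far_el1);

		console_verbose();
		pr_emerg("Insufficient stack space to handle exception!\n");
		pr_emerg("ESR: 0x%08lx -- %s\n", esr, esr_get_class_string(esr));
		pr_emerg("FAR: 0x%016lx\n", far);
		/* dump the stashed pt_regs and the overflow stack, then die */
		panic("kernel stack overflow");
	}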
|
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
/*
|
|
|
|
* Invalid mode handlers
|
|
|
|
*/
|
|
|
|
.macro inv_entry, el, reason, regsize = 64
|
2016-03-18 17:58:09 +08:00
|
|
|
kernel_entry \el, \regsize
|
2012-03-05 19:49:27 +08:00
|
|
|
mov x0, sp
|
|
|
|
mov x1, #\reason
|
|
|
|
mrs x2, esr_el1
|
arm64: consistently use bl for C exception entry
In most cases, our exception entry assembly branches to C handlers with
a BL instruction, but in cases where we do not expect to return, we use
B instead.
While this is correct today, it means that backtraces for fatal
exceptions miss the entry assembly (as the LR is stale at the point we
call C code), while non-fatal exceptions have the entry assembly in the
LR. In subsequent patches, we will need the LR to be set in these cases
in order to backtrace reliably.
This patch updates these sites to use a BL, ensuring consistency, and
preparing for backtrace rework. An ASM_BUG() is added after each of
these new BLs, which both catches unexpected returns, and ensures that
the LR value doesn't point to another function label.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
2017-07-26 18:14:53 +08:00
|
|
|
bl bad_mode
|
|
|
|
ASM_BUG()
|
2012-03-05 19:49:27 +08:00
|
|
|
.endm
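The \reason argument selects a human-readable handler name on the C side; a
sketch of bad_mode(), assuming the traps.c implementation of this era:

	asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
	{
		static const char *const handler[] = {
			"Synchronous Abort", "IRQ", "FIQ", "Error"
		};

		console_verbose();
		pr_crit("Bad mode in %s handler detected on CPU%d, code 0x%08x -- %s\n",
			handler[reason], smp_processor_id(), esr,
			esr_get_class_string(esr));
		/* never returns to the faulting context */
		panic("bad mode");
	}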
|
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el0_sync_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 0, BAD_SYNC
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_sync_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el0_irq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 0, BAD_IRQ
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_irq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el0_fiq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 0, BAD_FIQ
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_fiq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el0_error_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 0, BAD_ERROR
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_error_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_COMPAT
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el0_fiq_invalid_compat)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 0, BAD_FIQ, 32
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_fiq_invalid_compat)
|
2012-03-05 19:49:27 +08:00
|
|
|
#endif
|
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el1_sync_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 1, BAD_SYNC
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_sync_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el1_irq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 1, BAD_IRQ
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_irq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el1_fiq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 1, BAD_FIQ
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_fiq_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el1_error_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
inv_entry 1, BAD_ERROR
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_error_invalid)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* EL1 mode handlers.
|
|
|
|
*/
|
|
|
|
.align 6
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el1_sync)
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_entry 1
|
|
|
|
mov x0, sp
|
2019-10-26 00:42:13 +08:00
|
|
|
bl el1_sync_handler
|
2018-08-07 20:43:06 +08:00
|
|
|
kernel_exit 1
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_sync)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
.align 6
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el1_irq)
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_entry 1
|
2019-06-11 17:38:10 +08:00
|
|
|
gic_prio_irq_setup pmr=x20, tmp=x1
|
2017-11-02 20:12:41 +08:00
|
|
|
enable_da_f
|
2019-06-11 17:38:09 +08:00
|
|
|
|
2020-11-30 19:59:45 +08:00
|
|
|
mov x0, sp
|
|
|
|
bl enter_el1_irq_or_nmi
|
2013-11-13 01:11:53 +08:00
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
irq_handler
|
2013-11-13 01:11:53 +08:00
|
|
|
|
2019-10-16 03:17:49 +08:00
|
|
|
#ifdef CONFIG_PREEMPTION
|
2018-12-11 21:41:32 +08:00
|
|
|
ldr x24, [tsk, #TSK_TI_PREEMPT] // get preempt count
|
2019-01-31 22:59:01 +08:00
|
|
|
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
|
|
|
|
/*
|
|
|
|
* DA_F were cleared at start of handling. If anything is set in DAIF,
|
|
|
|
* we came back from an NMI, so skip preemption
|
|
|
|
*/
|
|
|
|
mrs x0, daif
|
|
|
|
orr x24, x24, x0
|
|
|
|
alternative_else_nop_endif
|
|
|
|
cbnz x24, 1f // preempt count != 0 || NMI return path
|
arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled
Preempting from IRQ-return means that the task has its PSTATE saved
on the stack, which will get restored when the task is resumed and does
the actual IRQ return.
However, enabling some CPU features requires modifying the PSTATE. This
means that, if a task was scheduled out during an IRQ-return before all
CPU features are enabled, the task might restore a PSTATE that does not
include the feature enablement changes once scheduled back in.
* Task 1:
    PAN == 0 ---|                          |---------------
                |                          |<- return from IRQ, PSTATE.PAN = 0
                | <- IRQ                   |
                +--------+ <- preempt()  +--
                         ^
                         |
                         reschedule Task 1, PSTATE.PAN == 1
* Init:
    --------------------+------------------------
                        ^
                        |
                        enable_cpu_features
                        set PSTATE.PAN on all CPUs
Worse than this, since PSTATE is untouched when task switching is done,
a task missing the new bits in PSTATE might affect another task, if both
do direct calls to schedule() (outside of IRQ/exception contexts).
Fix this by preventing preemption on IRQ-return until features are
enabled on all CPUs.
This way the only PSTATE values that are saved on the stack are from
synchronous exceptions. These are expected to be fatal this early; the
exception is BRK for WARN_ON(), but as this uses do_debug_exception(),
which keeps IRQs masked, it shouldn't call schedule().
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[james: Replaced a really cool hack, with an even simpler static key in C.
expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-16 01:25:44 +08:00
|
|
|
bl arm64_preempt_schedule_irq // irq en/disable is done inside
|
2012-03-05 19:49:27 +08:00
|
|
|
1:
|
|
|
|
#endif
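The C helper called above gates preemption on the cpufeature static key
described in the commit message; a sketch, assuming the process.c
implementation of this era:

	asmlinkage void __sched arm64_preempt_schedule_irq(void)
	{
		lockdep_assert_irqs_disabled();

		/*
		 * Preempting here leaves a copy of PSTATE on the stack;
		 * cpufeature enablement may modify PSTATE, and resuming such
		 * a task later would undo those changes. Only preempt once
		 * all CPU features have been enabled.
		 */
		if (system_capabilities_finalized())
			preempt_schedule_irq();
	}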
|
2019-06-11 17:38:09 +08:00
|
|
|
|
2020-11-30 19:59:45 +08:00
|
|
|
mov x0, sp
|
|
|
|
bl exit_el1_irq_or_nmi
|
2019-01-31 22:59:02 +08:00
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_exit 1
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_irq)
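enter_el1_irq_or_nmi()/exit_el1_irq_or_nmi() distinguish pseudo-NMIs from
ordinary IRQs; a sketch of the entry side, assuming the entry-common.c shape
of a nearby kernel version:

	asmlinkage void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
	{
		/* an IRQ taken with interrupts masked can only be a pseudo-NMI */
		if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
			arm64_enter_nmi(regs);
		else
			enter_from_kernel_mode(regs);
	}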
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* EL0 mode handlers.
|
|
|
|
*/
|
|
|
|
.align 6
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el0_sync)
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_entry 0
|
2019-10-26 00:42:14 +08:00
|
|
|
mov x0, sp
|
|
|
|
bl el0_sync_handler
|
|
|
|
b ret_to_user
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_sync)
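el0_sync_handler() demultiplexes on the exception class in ESR_EL1; an
abridged sketch, assuming the entry-common.c dispatcher of this era:

	asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs)
	{
		unsigned long esr = read_sysreg(esr_el1);

		switch (ESR_ELx_EC(esr)) {
		case ESR_ELx_EC_SVC64:
			el0_svc(regs);
			break;
		case ESR_ELx_EC_DABT_LOW:
			el0_da(regs, esr);
			break;
		case ESR_ELx_EC_IABT_LOW:
			el0_ia(regs, esr);
			break;
		/* ...FP/SIMD, SVE, debug, etc. cases elided... */
		default:
			el0_inv(regs, esr);
		}
	}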
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_COMPAT
|
|
|
|
.align 6
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el0_sync_compat)
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_entry 0, 32
|
2018-07-11 21:56:45 +08:00
|
|
|
mov x0, sp
|
2019-10-26 00:42:14 +08:00
|
|
|
bl el0_sync_compat_handler
|
2018-07-11 21:56:45 +08:00
|
|
|
b ret_to_user
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_sync_compat)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
.align 6
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el0_irq_compat)
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_entry 0, 32
|
|
|
|
b el0_irq_naked
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_irq_compat)
|
2017-11-02 20:12:42 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el0_error_compat)
|
2017-11-02 20:12:42 +08:00
|
|
|
kernel_entry 0, 32
|
|
|
|
b el0_error_naked
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_error_compat)
|
2018-02-03 01:31:39 +08:00
|
|
|
#endif
|
2012-03-05 19:49:27 +08:00
|
|
|
|
|
|
|
.align 6
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL_NOALIGN(el0_irq)
|
2012-03-05 19:49:27 +08:00
|
|
|
kernel_entry 0
|
|
|
|
el0_irq_naked:
|
2019-06-11 17:38:10 +08:00
|
|
|
gic_prio_irq_setup pmr=x20, tmp=x0
|
2020-11-30 19:59:46 +08:00
|
|
|
user_exit_irqoff
|
2017-11-02 20:12:41 +08:00
|
|
|
enable_da_f
|
2019-06-11 17:38:10 +08:00
|
|
|
|
2018-02-03 01:31:40 +08:00
|
|
|
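/*
 * x22 holds the interrupted ELR; if bit 55 is set, EL0 was about to execute
 * from a TTBR1 (kernel) address, so harden the branch predictor.
 */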
tbz x22, #55, 1f
|
|
|
|
bl do_el0_irq_bp_hardening
|
|
|
|
1:
|
2012-03-05 19:49:27 +08:00
|
|
|
irq_handler
|
2013-11-13 01:11:53 +08:00
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
b ret_to_user
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_irq)
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el1_error)
|
2017-11-02 20:12:42 +08:00
|
|
|
kernel_entry 1
|
|
|
|
mrs x1, esr_el1
|
2019-06-11 17:38:10 +08:00
|
|
|
gic_prio_kentry_setup tmp=x2
|
2017-11-02 20:12:42 +08:00
|
|
|
enable_dbg
|
|
|
|
mov x0, sp
|
|
|
|
bl do_serror
|
|
|
|
kernel_exit 1
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el1_error)
|
2017-11-02 20:12:42 +08:00
|
|
|
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_START_LOCAL(el0_error)
|
2017-11-02 20:12:42 +08:00
|
|
|
kernel_entry 0
|
|
|
|
el0_error_naked:
|
2019-08-21 01:45:57 +08:00
|
|
|
mrs x25, esr_el1
|
2019-06-11 17:38:10 +08:00
|
|
|
gic_prio_kentry_setup tmp=x2
|
2020-11-30 19:59:46 +08:00
|
|
|
user_exit_irqoff
|
2017-11-02 20:12:42 +08:00
|
|
|
enable_dbg
|
|
|
|
mov x0, sp
|
2019-08-21 01:45:57 +08:00
|
|
|
mov x1, x25
|
2017-11-02 20:12:42 +08:00
|
|
|
bl do_serror
|
2019-06-11 17:38:06 +08:00
|
|
|
enable_da_f
|
2017-11-02 20:12:42 +08:00
|
|
|
b ret_to_user
|
2020-02-19 03:58:27 +08:00
|
|
|
SYM_CODE_END(el0_error)
|
2017-11-02 20:12:42 +08:00
|
|
|
|
2012-03-05 19:49:27 +08:00
|
|
|
/*
|
|
|
|
* "slow" syscall return path.
|
|
|
|
*/
|
2020-05-01 19:54:28 +08:00
|
|
|
SYM_CODE_START_LOCAL(ret_to_user)
|
2017-11-02 20:12:37 +08:00
|
|
|
disable_daif
|
2019-06-11 17:38:10 +08:00
|
|
|
gic_prio_kentry_setup tmp=x3
|
2020-11-30 19:59:46 +08:00
|
|
|
#ifdef CONFIG_TRACE_IRQFLAGS
|
|
|
|
bl trace_hardirqs_off
|
|
|
|
#endif
|
2020-11-30 19:59:44 +08:00
|
|
|
ldr x19, [tsk, #TSK_TI_FLAGS]
|
|
|
|
and x2, x19, #_TIF_WORK_MASK
|
2012-03-05 19:49:27 +08:00
|
|
|
cbnz x2, work_pending
|
2016-07-15 04:48:14 +08:00
|
|
|
finish_ret_to_user:
|
2020-11-30 19:59:46 +08:00
|
|
|
user_enter_irqoff
|
2019-09-16 18:51:17 +08:00
|
|
|
/* Ignore asynchronous tag check faults in the uaccess routines */
|
|
|
|
clear_mte_async_tcf
|
2020-11-30 19:59:44 +08:00
|
|
|
enable_step_tsk x19, x2
|
2018-07-21 05:41:54 +08:00
|
|
|
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
|
|
|
|
bl stackleak_erase
|
|
|
|
#endif
|
2015-08-19 22:57:09 +08:00
|
|
|
kernel_exit 0
|
2020-05-01 19:54:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Ok, we need to do extra processing, enter the slow path.
|
|
|
|
*/
|
|
|
|
work_pending:
|
|
|
|
mov x0, sp // 'regs'
|
2020-11-30 19:59:44 +08:00
|
|
|
mov x1, x19
|
2020-05-01 19:54:28 +08:00
|
|
|
bl do_notify_resume
|
2020-11-30 19:59:44 +08:00
|
|
|
ldr x19, [tsk, #TSK_TI_FLAGS] // re-check for single-step
|
2020-05-01 19:54:28 +08:00
|
|
|
b finish_ret_to_user
|
|
|
|
SYM_CODE_END(ret_to_user)
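On the C side, do_notify_resume() loops with interrupts re-masked between
checks; an abridged sketch, assuming the signal.c shape of this era:

	asmlinkage void do_notify_resume(struct pt_regs *regs,
					 unsigned long thread_flags)
	{
		do {
			if (thread_flags & _TIF_NEED_RESCHED) {
				local_daif_restore(DAIF_PROCCTX_NOIRQ);
				schedule();
			} else {
				local_daif_restore(DAIF_PROCCTX);
				if (thread_flags & _TIF_SIGPENDING)
					do_signal(regs);
				if (thread_flags & _TIF_NOTIFY_RESUME)
					tracehook_notify_resume(regs);
				if (thread_flags & _TIF_FOREIGN_FPSTATE)
					fpsimd_restore_current_state();
			}
			local_daif_mask();
			thread_flags = READ_ONCE(current_thread_info()->flags);
		} while (thread_flags & _TIF_WORK_MASK);
	}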
|
2012-03-05 19:49:27 +08:00
|
|
|
|
2016-07-09 00:35:50 +08:00
|
|
|
.popsection // .entry.text
|
|
|
|
|
2017-11-14 22:07:40 +08:00
|
|
|
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
|
|
|
|
/*
|
|
|
|
* Exception vectors trampoline.
|
|
|
|
*/
|
|
|
|
.pushsection ".entry.tramp.text", "ax"
|
|
|
|
|
2020-11-03 18:22:29 +08:00
|
|
|
// Move from tramp_pg_dir to swapper_pg_dir
|
2017-11-14 22:07:40 +08:00
|
|
|
.macro tramp_map_kernel, tmp
|
|
|
|
mrs \tmp, ttbr1_el1
|
2021-02-02 20:36:58 +08:00
|
|
|
add \tmp, \tmp, #TRAMP_SWAPPER_OFFSET
|
2017-11-14 22:07:40 +08:00
|
|
|
bic \tmp, \tmp, #USER_ASID_FLAG
|
|
|
|
msr ttbr1_el1, \tmp
|
2017-11-14 22:29:19 +08:00
|
|
|
#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
|
|
|
|
alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
|
|
|
|
/* ASID already in \tmp[63:48] */
|
|
|
|
movk \tmp, #:abs_g2_nc:(TRAMP_VALIAS >> 12)
|
|
|
|
movk \tmp, #:abs_g1_nc:(TRAMP_VALIAS >> 12)
|
|
|
|
/* 2MB boundary containing the vectors, so we nobble the walk cache */
|
|
|
|
movk \tmp, #:abs_g0_nc:((TRAMP_VALIAS & ~(SZ_2M - 1)) >> 12)
|
|
|
|
isb
|
|
|
|
tlbi vae1, \tmp
|
|
|
|
dsb nsh
|
|
|
|
alternative_else_nop_endif
|
|
|
|
#endif /* CONFIG_QCOM_FALKOR_ERRATUM_1003 */
|
2017-11-14 22:07:40 +08:00
|
|
|
.endm
|
|
|
|
|
2020-11-03 18:22:29 +08:00
|
|
|
// Move from swapper_pg_dir to tramp_pg_dir
|
2017-11-14 22:07:40 +08:00
|
|
|
.macro tramp_unmap_kernel, tmp
|
|
|
|
mrs \tmp, ttbr1_el1
|
2021-02-02 20:36:58 +08:00
|
|
|
sub \tmp, \tmp, #TRAMP_SWAPPER_OFFSET
|
2017-11-14 22:07:40 +08:00
|
|
|
orr \tmp, \tmp, #USER_ASID_FLAG
|
|
|
|
msr ttbr1_el1, \tmp
|
|
|
|
/*
|
2018-01-29 19:59:58 +08:00
|
|
|
* We avoid running the post_ttbr_update_workaround here because
|
|
|
|
* it's only needed by Cavium ThunderX, which requires KPTI to be
|
|
|
|
* disabled.
|
2017-11-14 22:07:40 +08:00
|
|
|
*/
|
|
|
|
.endm
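Taken together, the two macros toggle TTBR1_EL1 between tramp_pg_dir and
swapper_pg_dir while flipping the ASID between its user and kernel halves;
the arithmetic, rendered as a C sketch:

	/* Sketch of the transform the two macros apply to TTBR1_EL1. */
	static inline u64 tramp_map(u64 ttbr1)
	{
		return (ttbr1 + TRAMP_SWAPPER_OFFSET) & ~USER_ASID_FLAG;
	}

	static inline u64 tramp_unmap(u64 ttbr1)
	{
		return (ttbr1 - TRAMP_SWAPPER_OFFSET) | USER_ASID_FLAG;
	}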
|
|
|
|
|
|
|
|
.macro tramp_ventry, regsize = 64
|
|
|
|
.align 7
|
|
|
|
1:
|
|
|
|
.if \regsize == 64
|
|
|
|
msr tpidrro_el0, x30 // Restored in kernel_ventry
|
|
|
|
.endif
|
2017-11-15 00:15:59 +08:00
|
|
|
/*
|
|
|
|
* Defend against branch aliasing attacks by pushing a dummy
|
|
|
|
* entry onto the return stack and using a RET instruction to
|
|
|
|
* enter the full-fat kernel vectors.
|
|
|
|
*/
|
|
|
|
bl 2f
|
|
|
|
b .
|
|
|
|
2:
|
2017-11-14 22:07:40 +08:00
|
|
|
tramp_map_kernel x30
|
2017-12-06 19:24:02 +08:00
|
|
|
#ifdef CONFIG_RANDOMIZE_BASE
|
|
|
|
adr x30, tramp_vectors + PAGE_SIZE
|
|
|
|
alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003
|
|
|
|
ldr x30, [x30]
|
|
|
|
#else
|
2017-11-14 22:07:40 +08:00
|
|
|
ldr x30, =vectors
|
2017-12-06 19:24:02 +08:00
|
|
|
#endif
|
2019-04-09 23:22:24 +08:00
|
|
|
alternative_if_not ARM64_WORKAROUND_CAVIUM_TX2_219_PRFM
|
2017-11-14 22:07:40 +08:00
|
|
|
prfm plil1strm, [x30, #(1b - tramp_vectors)]
|
2019-04-09 23:22:24 +08:00
|
|
|
alternative_else_nop_endif
|
2017-11-14 22:07:40 +08:00
|
|
|
msr vbar_el1, x30
|
|
|
|
add x30, x30, #(1b - tramp_vectors)
|
|
|
|
isb
|
2017-11-15 00:15:59 +08:00
|
|
|
ret
|
2017-11-14 22:07:40 +08:00
|
|
|
.endm
|
|
|
|
|
|
|
|
.macro tramp_exit, regsize = 64
|
|
|
|
adr x30, tramp_vectors
|
|
|
|
msr vbar_el1, x30
|
|
|
|
tramp_unmap_kernel x30
|
|
|
|
.if \regsize == 64
|
|
|
|
mrs x30, far_el1
|
|
|
|
.endif
|
|
|
|
eret
|
2018-06-14 18:23:38 +08:00
|
|
|
sb
|
2017-11-14 22:07:40 +08:00
|
|
|
.endm
|
|
|
|
|
|
|
|
.align 11
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_CODE_START_NOALIGN(tramp_vectors)
|
2017-11-14 22:07:40 +08:00
|
|
|
.space 0x400
|
|
|
|
|
|
|
|
tramp_ventry
|
|
|
|
tramp_ventry
|
|
|
|
tramp_ventry
|
|
|
|
tramp_ventry
|
|
|
|
|
|
|
|
tramp_ventry 32
|
|
|
|
tramp_ventry 32
|
|
|
|
tramp_ventry 32
|
|
|
|
tramp_ventry 32
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_CODE_END(tramp_vectors)
|
2017-11-14 22:07:40 +08:00
|
|
|
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_CODE_START(tramp_exit_native)
|
2017-11-14 22:07:40 +08:00
|
|
|
tramp_exit
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_CODE_END(tramp_exit_native)
|
2017-11-14 22:07:40 +08:00
|
|
|
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_CODE_START(tramp_exit_compat)
|
2017-11-14 22:07:40 +08:00
|
|
|
tramp_exit 32
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_CODE_END(tramp_exit_compat)
|
2017-11-14 22:07:40 +08:00
|
|
|
|
|
|
|
.ltorg
|
|
|
|
.popsection // .entry.tramp.text
|
2017-12-06 19:24:02 +08:00
|
|
|
#ifdef CONFIG_RANDOMIZE_BASE
|
|
|
|
.pushsection ".rodata", "a"
|
|
|
|
.align PAGE_SHIFT
|
2020-02-19 03:58:35 +08:00
|
|
|
SYM_DATA_START(__entry_tramp_data_start)
|
2017-12-06 19:24:02 +08:00
|
|
|
.quad vectors
|
2020-02-19 03:58:35 +08:00
|
|
|
SYM_DATA_END(__entry_tramp_data_start)
|
2017-12-06 19:24:02 +08:00
|
|
|
.popsection // .rodata
|
|
|
|
#endif /* CONFIG_RANDOMIZE_BASE */
|
2017-11-14 22:07:40 +08:00
|
|
|
#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
|
|
|
|
|
2017-07-26 23:05:20 +08:00
|
|
|
/*
|
|
|
|
* Register switch for AArch64. The callee-saved registers need to be saved
|
|
|
|
* and restored. On entry:
|
|
|
|
* x0 = previous task_struct (must be preserved across the switch)
|
|
|
|
* x1 = next task_struct
|
|
|
|
* Previous and next are guaranteed not to be the same.
|
|
|
|
*
|
|
|
|
*/
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_FUNC_START(cpu_switch_to)
|
2017-07-26 23:05:20 +08:00
|
|
|
mov x10, #THREAD_CPU_CONTEXT
|
|
|
|
add x8, x0, x10
|
|
|
|
mov x9, sp
|
|
|
|
stp x19, x20, [x8], #16 // store callee-saved registers
|
|
|
|
stp x21, x22, [x8], #16
|
|
|
|
stp x23, x24, [x8], #16
|
|
|
|
stp x25, x26, [x8], #16
|
|
|
|
stp x27, x28, [x8], #16
|
|
|
|
stp x29, x9, [x8], #16
|
|
|
|
str lr, [x8]
|
|
|
|
add x8, x1, x10
|
|
|
|
ldp x19, x20, [x8], #16 // restore callee-saved registers
|
|
|
|
ldp x21, x22, [x8], #16
|
|
|
|
ldp x23, x24, [x8], #16
|
|
|
|
ldp x25, x26, [x8], #16
|
|
|
|
ldp x27, x28, [x8], #16
|
|
|
|
ldp x29, x9, [x8], #16
|
|
|
|
ldr lr, [x8]
|
|
|
|
mov sp, x9
|
|
|
|
msr sp_el0, x1
|
2020-04-23 18:16:05 +08:00
|
|
|
ptrauth_keys_install_kernel x1, x8, x9, x10
|
2020-04-28 00:00:16 +08:00
|
|
|
scs_save x0, x8
|
|
|
|
scs_load x1, x8
|
2017-07-26 23:05:20 +08:00
|
|
|
ret
|
2020-02-19 03:58:29 +08:00
|
|
|
SYM_FUNC_END(cpu_switch_to)
|
2017-07-26 23:05:20 +08:00
|
|
|
NOKPROBE(cpu_switch_to)
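cpu_switch_to() is the tail of the C-side __switch_to(); an abridged sketch
of the caller, assuming the process.c shape of this era:

	struct task_struct *__switch_to(struct task_struct *prev,
					struct task_struct *next)
	{
		struct task_struct *last;

		fpsimd_thread_switch(next);
		tls_thread_switch(next);
		hw_breakpoint_thread_switch(next);
		contextidr_thread_switch(next);
		entry_task_switch(next);
		/* ...other per-thread state elided... */

		/* ensure prior page-table/TLB updates are visible */
		dsb(ish);

		last = cpu_switch_to(prev, next);
		return last;
	}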
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This is how we return from a fork.
|
|
|
|
*/
|
2020-02-19 03:58:28 +08:00
|
|
|
SYM_CODE_START(ret_from_fork)
|
2017-07-26 23:05:20 +08:00
|
|
|
bl schedule_tail
|
|
|
|
cbz x19, 1f // not a kernel thread
|
|
|
|
mov x0, x20
|
|
|
|
blr x19
|
2019-02-22 17:32:50 +08:00
|
|
|
1: get_current_task tsk
|
2017-07-26 23:05:20 +08:00
|
|
|
b ret_to_user
|
2020-02-19 03:58:28 +08:00
|
|
|
SYM_CODE_END(ret_from_fork)
|
2017-07-26 23:05:20 +08:00
|
|
|
NOKPROBE(ret_from_fork)
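x19/x20 are meaningful here because copy_thread() seeds the child's
cpu_context so that cpu_switch_to() "returns" into ret_from_fork; a sketch of
the relevant assignments from the kernel-thread path (excerpted, assuming the
process.c shape):

	/* inside copy_thread(): x19 = function, x20 = argument; a zero x19
	 * marks a user task, which falls through to ret_to_user above. */
	p->thread.cpu_context.x19 = (unsigned long)fn;
	p->thread.cpu_context.x20 = (unsigned long)arg;
	p->thread.cpu_context.pc  = (unsigned long)ret_from_fork;
	p->thread.cpu_context.sp  = (unsigned long)childregs;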
|
arm64: kernel: Add arch-specific SDEI entry code and CPU masking
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code.
(crucially we don't implicitly know which exception level we interrupted),
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code; this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel; in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-08 23:38:12 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_ARM_SDE_INTERFACE
|
|
|
|
|
|
|
|
#include <asm/sdei.h>
|
|
|
|
#include <uapi/linux/arm_sdei.h>
|
|
|
|
|
2018-01-08 23:38:18 +08:00
|
|
|
.macro sdei_handler_exit exit_mode
|
|
|
|
/* On success, this call never returns... */
|
|
|
|
cmp \exit_mode, #SDEI_EXIT_SMC
|
|
|
|
b.ne 99f
|
|
|
|
smc #0
|
|
|
|
b .
|
|
|
|
99: hvc #0
|
|
|
|
b .
|
|
|
|
.endm
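The exit_mode argument is latched once at probe time from the firmware
conduit; a sketch (sdei_set_exit_mode and use_hvc are assumed names, the real
selection lives in the SDEI driver glue):

	static void sdei_set_exit_mode(bool use_hvc)
	{
		/* assumed helper: record the conduit probed from ACPI/DT */
		sdei_exit_mode = use_hvc ? SDEI_EXIT_HVC : SDEI_EXIT_SMC;
	}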
|
|
|
|
|
|
|
|
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
|
|
|
|
/*
|
|
|
|
* The regular SDEI entry point may have been unmapped along with the rest of
|
|
|
|
* the kernel. This trampoline restores the kernel mapping to make the x1 memory
|
|
|
|
* argument accessible.
|
|
|
|
*
|
|
|
|
* This clobbers x4, __sdei_handler() will restore this from firmware's
|
|
|
|
* copy.
|
|
|
|
*/
|
|
|
|
.ltorg
|
|
|
|
.pushsection ".entry.tramp.text", "ax"
|
2020-02-19 03:58:40 +08:00
|
|
|
SYM_CODE_START(__sdei_asm_entry_trampoline)
|
2018-01-08 23:38:18 +08:00
|
|
|
mrs x4, ttbr1_el1
|
|
|
|
tbz x4, #USER_ASID_BIT, 1f
|
|
|
|
|
|
|
|
tramp_map_kernel tmp=x4
|
|
|
|
isb
|
|
|
|
mov x4, xzr
|
|
|
|
|
|
|
|
/*
|
arm64: uaccess: remove set_fs()
Now that the uaccess primitives don't take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.
We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.
Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.
Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.
As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02 21:15:55 +08:00
|
|
|
* Remember whether to unmap the kernel on exit.
|
2018-01-08 23:38:18 +08:00
|
|
|
*/
|
2020-12-02 21:15:55 +08:00
|
|
|
1: str x4, [x1, #(SDEI_EVENT_INTREGS + S_SDEI_TTBR1)]
|
2018-01-08 23:38:18 +08:00
|
|
|
|
|
|
|
#ifdef CONFIG_RANDOMIZE_BASE
|
|
|
|
adr x4, tramp_vectors + PAGE_SIZE
|
|
|
|
add x4, x4, #:lo12:__sdei_asm_trampoline_next_handler
|
|
|
|
ldr x4, [x4]
|
|
|
|
#else
|
|
|
|
ldr x4, =__sdei_asm_handler
|
|
|
|
#endif
|
|
|
|
br x4
|
2020-02-19 03:58:40 +08:00
|
|
|
SYM_CODE_END(__sdei_asm_entry_trampoline)
|
2018-01-08 23:38:18 +08:00
|
|
|
NOKPROBE(__sdei_asm_entry_trampoline)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make the exit call and restore the original ttbr1_el1
|
|
|
|
*
|
|
|
|
* x0 & x1: setup for the exit API call
|
|
|
|
* x2: exit_mode
|
|
|
|
* x4: struct sdei_registered_event argument from registration time.
|
|
|
|
*/
|
2020-02-19 03:58:40 +08:00
|
|
|
SYM_CODE_START(__sdei_asm_exit_trampoline)
|
2020-12-02 21:15:55 +08:00
|
|
|
ldr x4, [x4, #(SDEI_EVENT_INTREGS + S_SDEI_TTBR1)]
|
2018-01-08 23:38:18 +08:00
|
|
|
cbnz x4, 1f
|
|
|
|
|
|
|
|
tramp_unmap_kernel tmp=x4
|
|
|
|
|
|
|
|
1: sdei_handler_exit exit_mode=x2
|
2020-02-19 03:58:40 +08:00
|
|
|
SYM_CODE_END(__sdei_asm_exit_trampoline)
|
2018-01-08 23:38:18 +08:00
|
|
|
NOKPROBE(__sdei_asm_exit_trampoline)
|
|
|
|
.ltorg
|
|
|
|
.popsection // .entry.tramp.text
|
|
|
|
#ifdef CONFIG_RANDOMIZE_BASE
|
|
|
|
.pushsection ".rodata", "a"
|
2020-02-19 03:58:35 +08:00
|
|
|
SYM_DATA_START(__sdei_asm_trampoline_next_handler)
|
2018-01-08 23:38:18 +08:00
|
|
|
.quad __sdei_asm_handler
|
2020-02-19 03:58:35 +08:00
|
|
|
SYM_DATA_END(__sdei_asm_trampoline_next_handler)
|
2018-01-08 23:38:18 +08:00
|
|
|
.popsection // .rodata
|
|
|
|
#endif /* CONFIG_RANDOMIZE_BASE */
|
|
|
|
#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
|
|
|
|
|
2018-01-08 23:38:12 +08:00
|
|
|
/*
|
|
|
|
* Software Delegated Exception entry point.
|
|
|
|
*
|
|
|
|
* x0: Event number
|
|
|
|
* x1: struct sdei_registered_event argument from registration time.
|
|
|
|
* x2: interrupted PC
|
|
|
|
* x3: interrupted PSTATE
|
2018-01-08 23:38:18 +08:00
|
|
|
* x4: maybe clobbered by the trampoline
|
2018-01-08 23:38:12 +08:00
|
|
|
*
|
|
|
|
* Firmware has preserved x0->x17 for us, we must save/restore the rest to
|
|
|
|
* follow SMC-CC. We save (or retrieve) all the registers as the handler may
|
|
|
|
* want them.
|
|
|
|
*/
|
2020-02-19 03:58:40 +08:00
|
|
|
SYM_CODE_START(__sdei_asm_handler)
|
2018-01-08 23:38:12 +08:00
|
|
|
stp x2, x3, [x1, #SDEI_EVENT_INTREGS + S_PC]
|
|
|
|
stp x4, x5, [x1, #SDEI_EVENT_INTREGS + 16 * 2]
|
|
|
|
stp x6, x7, [x1, #SDEI_EVENT_INTREGS + 16 * 3]
|
|
|
|
stp x8, x9, [x1, #SDEI_EVENT_INTREGS + 16 * 4]
|
|
|
|
stp x10, x11, [x1, #SDEI_EVENT_INTREGS + 16 * 5]
|
|
|
|
stp x12, x13, [x1, #SDEI_EVENT_INTREGS + 16 * 6]
|
|
|
|
stp x14, x15, [x1, #SDEI_EVENT_INTREGS + 16 * 7]
|
|
|
|
stp x16, x17, [x1, #SDEI_EVENT_INTREGS + 16 * 8]
|
|
|
|
stp x18, x19, [x1, #SDEI_EVENT_INTREGS + 16 * 9]
|
|
|
|
stp x20, x21, [x1, #SDEI_EVENT_INTREGS + 16 * 10]
|
|
|
|
stp x22, x23, [x1, #SDEI_EVENT_INTREGS + 16 * 11]
|
|
|
|
stp x24, x25, [x1, #SDEI_EVENT_INTREGS + 16 * 12]
|
|
|
|
stp x26, x27, [x1, #SDEI_EVENT_INTREGS + 16 * 13]
|
|
|
|
stp x28, x29, [x1, #SDEI_EVENT_INTREGS + 16 * 14]
|
|
|
|
mov x4, sp
|
|
|
|
stp lr, x4, [x1, #SDEI_EVENT_INTREGS + S_LR]
|
|
|
|
|
|
|
|
mov x19, x1
|
|
|
|
|
2020-04-28 00:00:17 +08:00
|
|
|
#if defined(CONFIG_VMAP_STACK) || defined(CONFIG_SHADOW_CALL_STACK)
|
|
|
|
ldrb w4, [x19, #SDEI_EVENT_PRIORITY]
|
|
|
|
#endif
|
|
|
|
|
2018-01-08 23:38:12 +08:00
|
|
|
#ifdef CONFIG_VMAP_STACK
|
|
|
|
/*
|
|
|
|
* entry.S may have been using sp as a scratch register, find whether
|
|
|
|
* this is a normal or critical event and switch to the appropriate
|
|
|
|
* stack for this CPU.
|
|
|
|
*/
|
|
|
|
cbnz w4, 1f
|
|
|
|
ldr_this_cpu dst=x5, sym=sdei_stack_normal_ptr, tmp=x6
|
|
|
|
b 2f
|
|
|
|
1: ldr_this_cpu dst=x5, sym=sdei_stack_critical_ptr, tmp=x6
|
|
|
|
2: mov x6, #SDEI_STACK_SIZE
|
|
|
|
add x5, x5, x6
|
|
|
|
mov sp, x5
|
|
|
|
#endif
|
|
|
|
|
2020-04-28 00:00:17 +08:00
|
|
|
#ifdef CONFIG_SHADOW_CALL_STACK
|
|
|
|
/* Use a separate shadow call stack for normal and critical events */
|
|
|
|
cbnz w4, 3f
|
2020-12-01 07:34:42 +08:00
|
|
|
ldr_this_cpu dst=scs_sp, sym=sdei_shadow_call_stack_normal_ptr, tmp=x6
|
2020-04-28 00:00:17 +08:00
|
|
|
b 4f
|
2020-12-01 07:34:42 +08:00
|
|
|
3: ldr_this_cpu dst=scs_sp, sym=sdei_shadow_call_stack_critical_ptr, tmp=x6
|
2020-04-28 00:00:17 +08:00
|
|
|
4:
|
|
|
|
#endif
|
|
|
|
|
2018-01-08 23:38:12 +08:00
|
|
|
/*
|
|
|
|
* We may have interrupted userspace, or a guest, or exit-from or
|
|
|
|
* return-to either of these. We can't trust sp_el0, restore it.
|
|
|
|
*/
|
|
|
|
mrs x28, sp_el0
|
|
|
|
ldr_this_cpu dst=x0, sym=__entry_task, tmp=x1
|
|
|
|
msr sp_el0, x0
|
|
|
|
|
|
|
|
/* If we interrupted the kernel, point to the previous stack/frame. */
|
|
|
|
and x0, x3, #0xc
|
|
|
|
mrs x1, CurrentEL
|
|
|
|
cmp x0, x1
|
|
|
|
csel x29, x29, xzr, eq // fp, or zero
|
|
|
|
csel x4, x2, xzr, eq // elr, or zero
|
|
|
|
|
|
|
|
stp x29, x4, [sp, #-16]!
|
|
|
|
mov x29, sp
|
|
|
|
|
|
|
|
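	/*
	 * Hedged note: x2 and x3 are assumed to hold the interrupted PC and
	 * PSTATE handed over by firmware. PSTATE.M bits [3:2] encode the
	 * exception level in the same position as CurrentEL, so the and/cmp
	 * above detects whether we interrupted our own EL; if not, fp and
	 * elr are zeroed so the synthetic frame record pushed here
	 * terminates the unwind rather than pointing into another EL.
	 */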
	add	x0, x19, #SDEI_EVENT_INTREGS
	mov	x1, x19
	bl	__sdei_handler
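	/*
	 * Hedged note: x19 is assumed to hold the struct
	 * sdei_registered_event argument passed by firmware, and
	 * SDEI_EVENT_INTREGS the offset of the saved pt_regs inside it, so
	 * this is __sdei_handler(regs, registered_event).
	 */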
	msr	sp_el0, x28
	/* restore regs >x17 that we clobbered */
	mov	x4, x19		// keep x4 for __sdei_asm_exit_trampoline
	ldp	x28, x29, [x4, #SDEI_EVENT_INTREGS + 16 * 14]
	ldp	x18, x19, [x4, #SDEI_EVENT_INTREGS + 16 * 9]
	ldp	lr, x1, [x4, #SDEI_EVENT_INTREGS + S_LR]
	mov	sp, x1
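	/*
	 * Hedged note: the 16 * 14 and 16 * 9 offsets index the x28/x29 and
	 * x18/x19 pairs within pt_regs' regs[] array (16 bytes per pair),
	 * and S_LR names the saved lr slot; the field following it is the
	 * saved sp, so x1 carries the interrupted sp and the final mov
	 * takes us off the SDEI stack.
	 */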
	mov	x1, x0			// address to complete_and_resume
	/* x0 = (x0 <= 1) ? EVENT_COMPLETE:EVENT_COMPLETE_AND_RESUME */
	cmp	x0, #1
	mov_q	x2, SDEI_1_0_FN_SDEI_EVENT_COMPLETE
	mov_q	x3, SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME
	csel	x0, x2, x3, ls

	ldr_l	x2, sdei_exit_mode
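	/*
	 * Hedged note: sdei_exit_mode records whether the firmware interface
	 * was probed via HVC or SMC; sdei_handler_exit below is assumed to
	 * issue the matching conduit instruction, with x0 carrying the SDEI
	 * function ID chosen above and x1 the resume address.
	 */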
alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
	sdei_handler_exit exit_mode=x2
alternative_else_nop_endif

#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
	tramp_alias	dst=x5, sym=__sdei_asm_exit_trampoline
	br	x5
#endif
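	/*
	 * Hedged note: with KPTI (ARM64_UNMAP_KERNEL_AT_EL0) the direct exit
	 * above is nopped out and we instead branch through
	 * __sdei_asm_exit_trampoline, so the kernel mappings can be dropped
	 * before handing control back to firmware.
	 */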
SYM_CODE_END(__sdei_asm_handler)
NOKPROBE(__sdei_asm_handler)
#endif /* CONFIG_ARM_SDE_INTERFACE */