The ARM errata 819472, 826319, 827319 and 824069 for affected
Cortex-A53 cores demand to promote "dc cvau" instructions to
"dc civac". Since we allow userspace to also emit those instructions,
we should make sure that "dc cvau" gets promoted there too.
So lets grasp the nettle here and actually trap every userland cache
maintenance instruction once we detect at least one affected core in
the system.
We then emulate the instruction by executing it on behalf of userland,
promoting "dc cvau" to "dc civac" on the way and injecting access
fault back into userspace.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[catalin.marinas@arm.com: s/set_segfault/arm64_notify_segfault/]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The code for injecting a signal into userland if a trapped instruction
fails emulation due to a _userland_ error (like an illegal address)
will be used more often with the next patch.
Factor out the core functionality into a separate function and use
that both for the existing trap handler and for the deprecated
instructions emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[catalin.marinas@arm.com: s/set_segfault/arm64_notify_segfault/]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we call the (optional) enable function for CPU _features_
only. As CPU _errata_ descriptions share the same data structure and
having an enable function is useful for errata as well (for instance
to set bits in SCTLR), lets call it when enumerating erratas too.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As Kees Cook notes in the ARM counterpart of this patch [0]:
The _etext position is defined to be the end of the kernel text code,
and should not include any part of the data segments. This interferes
with things that might check memory ranges and expect executable code
up to _etext.
In particular, Kees is referring to the HARDENED_USERCOPY patch set [1],
which rejects attempts to call copy_to_user() on kernel ranges containing
executable code, but does allow access to the .rodata segment. Regardless
of whether one may or may not agree with the distinction, it makes sense
for _etext to have the same meaning across architectures.
So let's put _etext where it belongs, between .text and .rodata, and fix
up existing references to use __init_begin instead, which unlike _end_rodata
includes the exception and notes sections as well.
The _etext references in kaslr.c are left untouched, since its references
to [_stext, _etext) are meant to capture potential jump instruction targets,
and so disregarding .rodata is actually an improvement here.
[0] http://article.gmane.org/gmane.linux.kernel/2245084
[1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To aid in debugging kexec problems or when adding new functionality to
kexec add a new routine kexec_image_info() and several inline pr_debug
statements.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to the
arm64 architecture that add support for the kexec re-boot mechanism
(CONFIG_KEXEC) on arm64 platforms.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: James Morse <james.morse@arm.com>
[catalin.marinas@arm.com: removed dead code following James Morse's comments]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 68234df4ea ("arm64: kill flush_cache_all()") removed the global
arm64 routines cpu_reset() and cpu_soft_restart() needed by the arm64
kexec and kdump support. Add back a simplified version of
cpu_soft_restart() with some changes needed for kexec in the new files
cpu_reset.S, and cpu_reset.h.
When a CPU is reset it needs to be put into the exception level it had when
it entered the kernel. Update cpu_soft_restart() to accept an argument
which signals if the reset address should be entered at EL1 or EL2, and
add a new hypercall HVC_SOFT_RESTART which is used for the EL2 switch.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
kernel/smp.c has a fancy counter that keeps track of the number of CPUs
it marked as not-present and left in cpu_park_loop(). If there are any
CPUs spinning in here, features like kexec or hibernate may release them
by overwriting this memory.
This problem also occurs on machines using spin-tables to release
secondary cores.
After commit 44dbcc93ab ("arm64: Fix behavior of maxcpus=N")
we bring all known cpus into the secondary holding pen, meaning this
memory can't be re-used by kexec or hibernate.
Add a function cpus_are_stuck_in_kernel() to determine if either of these
cases have occurred.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[catalin.marinas@arm.com: cherry-picked from mainline for kexec dependency]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we treat ESR_EL1 bit 24 as software-defined for distinguishing
instruction aborts from data aborts, but this bit is architecturally
RES0 for instruction aborts, and could be allocated for an arbitrary
purpose in future. Additionally, we hard-code the value in entry.S
without the mnemonic, making the code difficult to understand.
Instead, remove ESR_LNX_EXEC, and distinguish aborts based on the esr,
which we already pass to the sole use of ESR_LNX_EXEC. A new helper,
is_el0_instruction_abort() is added to make the logic clear. Any
instruction aborts taken from EL1 will already have been handled by
bad_mode, so we need not handle that case in the helper.
For consistency, the existing permission_fault helper is renamed to
is_permission_fault, and the return type is changed to bool. There
should be no functional changes as the return value was a boolean
expression, and the result is only used in another boolean expression.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Several places open-code extraction of the EC field from an ESR_ELx
value, in subtly different ways. This is unfortunate duplication and
variation, and the precise logic used to extract the field is a
distraction.
This patch adds a new macro, ESR_ELx_EC(), to extract the EC field from
an ESR_ELx value in a consistent fashion.
Existing open-coded extractions in core arm64 code are moved over to the
new helper. KVM code is left as-is for the moment.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Huang Shijie <shijie.huang@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently dump_mem attempts to dump memory in 64-bit chunks when
reporting a failure in 64-bit code, or 32-bit chunks when reporting a
failure in 32-bit code. We added code to handle these two cases
separately in commit e147ae6d7f ("arm64: modify the dump mem for
64 bit addresses").
However, in all cases dump_mem is called, the failing context is a
kernel rather than user context. Additionally dump_mem is assumed to
only be used for kernel contexts, as internally it switches to
KERNEL_DS, and its callers pass kernel stack bounds.
This patch removes the redundant 32-bit chunk logic and associated
compat parameter, largely reverting the aforementioned commit. For the
call in __die(), the check of in_interrupt() is removed also, as __die()
is only called in response to faults from the kernel's exception level,
and thus the !user_mode(regs) check is sufficient. Were this not the
case, the used of task_stack_page(tsk) to generate the stack bounds
would be erroneous.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The upstream commit 1771c6e1a5
("x86/kasan: instrument user memory access API") added KASAN instrument to
x86 user memory access API, so added such instrument to ARM64 too.
Define __copy_to/from_user in C in order to add kasan_check_read/write call,
rename assembly implementation to __arch_copy_to/from_user.
Tested by test_kasan module.
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Current versions of gdb do not interoperate cleanly with kgdb on arm64
systems because gdb and kgdb do not use the same register description.
This patch modifies kgdb to work with recent releases of gdb (>= 7.8.1).
Compatibility with gdb (after the patch is applied) is as follows:
gdb-7.6 and earlier Ok
gdb-7.7 series Works if user provides custom target description
gdb-7.8(.0) Works if user provides custom target description
gdb-7.8.1 and later Ok
When commit 44679a4f14 ("arm64: KGDB: Add step debugging support") was
introduced it was paired with a gdb patch that made an incompatible
change to the gdbserver protocol. This patch was eventually merged into
the gdb sources:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=a4d9ba85ec5597a6a556afe26b712e878374b9dd
The change to the protocol was mostly made to simplify big-endian support
inside the kernel gdb stub. Unfortunately the gdb project released
gdb-7.7.x and gdb-7.8.0 before the protocol incompatibility was identified
and reversed:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=bdc144174bcb11e808b4e73089b850cf9620a7ee
This leaves us in a position where kgdb still uses the no-longer-used
protocol; gdb-7.8.1, which restored the original behaviour, was
released on 2014-10-29.
I don't believe it is possible to detect/correct the protocol
incompatiblity which means the kernel must take a view about which
version of the gdb remote protocol is "correct". This patch takes the
view that the original/current version of the protocol is correct
and that version found in gdb-7.7.x and gdb-7.8.0 is anomalous.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If the kernel is set to show unhandled signals, and a user task does not
handle a SIGILL as a result of an instruction abort, we will attempt to
log the offending instruction with dump_instr before killing the task.
We use dump_instr to log the encoding of the offending userspace
instruction. However, dump_instr is also used to dump instructions from
kernel space, and internally always switches to KERNEL_DS before dumping
the instruction with get_user. When both PAN and UAO are in use, reading
a user instruction via get_user while in KERNEL_DS will result in a
permission fault, which leads to an Oops.
As we have regs corresponding to the context of the original instruction
abort, we can inspect this and only flip to KERNEL_DS if the original
abort was taken from the kernel, avoiding this issue. At the same time,
remove the redundant (and incorrect) comments regarding the order
dump_mem and dump_instr are called in.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org> #4.6+
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Fixes: 57f4959bad ("arm64: kernel: Add support for User Access Override")
Signed-off-by: Will Deacon <will.deacon@arm.com>
If we take an exception we don't expect (e.g. SError), we report this in
the bad_mode handler with pr_crit. Depending on the configured log
level, we may or may not log additional information in functions called
subsequently. Notably, the messages in dump_stack (including the CPU
number) are printed with KERN_DEFAULT and may not appear.
Some exceptions have an IMPLEMENTATION DEFINED ESR_ELx.ISS encoding, and
knowing the CPU number is crucial to correctly decode them. To ensure
that this is always possible, we should log the CPU number along with
the ESR_ELx value, so we are not reliant on subsequent logs or
additional printk configuration options.
This patch logs the CPU number in bad_mode such that it is possible for
a developer to decode these exceptions, provided access to sufficient
documentation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Al Grant <Al.Grant@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with
the 32-bit ARM one by providing an additional line:
model name : ARMv8 Processor rev X (v8l)
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull perf updates from Ingo Molnar:
"Mostly tooling and PMU driver fixes, but also a number of late updates
such as the reworking of the call-chain size limiting logic to make
call-graph recording more robust, plus tooling side changes for the
new 'backwards ring-buffer' extension to the perf ring-buffer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
perf record: Read from backward ring buffer
perf record: Rename variable to make code clear
perf record: Prevent reading invalid data in record__mmap_read
perf evlist: Add API to pause/resume
perf trace: Use the ptr->name beautifier as default for "filename" args
perf trace: Use the fd->name beautifier as default for "fd" args
perf report: Add srcline_from/to branch sort keys
perf evsel: Record fd into perf_mmap
perf evsel: Add overwrite attribute and check write_backward
perf tools: Set buildid dir under symfs when --symfs is provided
perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
perf annotate: Sort list of recognised instructions
perf annotate: Fix identification of ARM blt and bls instructions
perf tools: Fix usage of max_stack sysctl
perf callchain: Stop validating callchains by the max_stack sysctl
perf trace: Fix exit_group() formatting
perf top: Use machine->kptr_restrict_warned
perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
perf machine: Do not bail out if not managing to read ref reloc symbol
perf/x86/intel/p4: Trival indentation fix, remove space
...
most architectures are relying on mmap_sem for write in their
arch_setup_additional_pages. If the waiting task gets killed by the oom
killer it would block oom_reaper from asynchronous address space reclaim
and reduce the chances of timely OOM resolving. Wait for the lock in
the killable mode and return with EINTR if the task got killed while
waiting.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Andy Lutomirski <luto@amacapital.net> [x86 vdso]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define HAVE_EXIT_THREAD for archs which want to do something in
exit_thread. For others, let's define exit_thread as an empty inline.
This is a cleanup before we change the prototype of exit_thread to
accept a task parameter.
[akpm@linux-foundation.org: fix mips]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
User visible:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)
Infrastructure:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXOn7eAAoJENZQFvNTUqpAsOAP/3f/XJekPQAnMcKRBp2noCuj
nRu1kBltVJyP8iOU5PKSJwel4F9ykNNMl+/rzzxHDo13IM8uc+HnZOJZ6e9mJIJ1
xqjdqM4EDlYYoFApJzCjTK6CMlevCazosdQT1bbmMDYVPc2uQR/GnutFrzqf/Plg
hEougIGtfrdy85g95CRdxpy2yMwDK4EwsiDRm9ib1hnuamQZl97buWemBVqSJmLY
p82E2aMU5Fv5+B8AO4I7V88ZmgpmryjxpM+LjffgNUDSKsSHrlG4NiQ3znV1bgst
Rc++w78+qxoIozOu6/IX8eSI2L/1eyM/yQ6Qre0KuvYXCl+NopTAYSSJlaA4tyHF
c55z7HucuyATN3PrFRHlbWUT/RMIVC0j0lnZOc7SJLl90hJQ+nv0iZcbYwMbeHu1
3LGlcd9jDwQYiClbaT9ATxZJ8B9An0/k/HJdatbAHN0wRomP2Ozz/qD2nmEbUwpV
sCyLOo/LJkvVkuUjSg6ZiOArNIk4iTSPSAUV+SAL6YOEOZMAX5ISUJQ174+zFC9a
gqtVsCXvwLIsndXb8ys1r9/fit/MUci0OzKX3SG1K765+E4Bk23KcAgMNbM/a7lp
ZmHDXMC+yBYcnYNnaxkp7c55CWUlKGOeR4e+KmB99KoeIleYgPhD2UM5beo61TmN
yUEPtiiFiZmTRkiAu83R
=7OdF
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-20160516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Honour the kernel.perf_event_max_stack knob more precisely by not counting
PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
- Store vdso buildid unconditionally, as it appears in callchains and
we're not checking those when creating the build-id table, so we
end up not being able to resolve VDSO symbols when doing analysis
on a different machine than the one where recording was done, possibly
of a different arch even (arm -> x86_64) (He Kuang)
Infrastructure changes:
- Generalize max_stack sysctl handler, will be used for configuring
multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
Cleanups:
- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
open coded strings (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We will use it to count how many addresses are in the entry->ip[] array,
excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
return the number of entries specified by the user via the relevant
sysctl, kernel.perf_event_max_contexts, or via the per event
perf_event_attr.sample_max_stack knob.
This way we keep the perf_sample->ip_callchain->nr meaning, that is the
number of entries, be it real addresses or PERF_CONTEXT_ entries, while
honouring the max_stack knobs, i.e. the end result will be max_stack
entries if we have at least that many entries in a given stack trace.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Support for the PMU in Broadcom's Vulcan CPU
- Dynamic event detection using the PMCEIDn_EL0 ID registers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJXNhyFAAoJELescNyEwWM0A0gIAL52VQid16PvLgEO4g6mzv5B
S1ef/y45342R/DYczcUSFboMPuqYSxZ/i7dCwpvLUKX/YjyqQrrGvvS4IYOS99Mp
/OAcf8eTyzzVpJiGQetta3q20gNHGXOxd48R1zcgt+bbEax89lyHQul0A8+rPWLq
RZhEI6Hcq9fb70AjXjWvDxdbbJhtDKc8BGuptygOEqc8LO3mrb1J60TclU629XOH
Jn4Vdu5f6Rx8hPFdw5HXn+Vdheymphz0qj1lyGCQS4Am97bM5J/54a/A4tyHnHuQ
s9Y26NIAvrktp9wCMlXGQhYL94e1rZowXCWxF98D9XrlIzYORIdf/OZ5DCS8LCA=
=0ntf
-----END PGP SIGNATURE-----
Merge tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 perf updates from Will Deacon:
"The main addition here is support for Broadcom's Vulcan core using the
architected ID registers for discovering supported events.
- Support for the PMU in Broadcom's Vulcan CPU
- Dynamic event detection using the PMCEIDn_EL0 ID registers"
* tag 'arm64-perf' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: perf: don't expose CHAIN event in sysfs
arm64/perf: Add Broadcom Vulcan PMU support
arm64/perf: Filter common events based on PMCEIDn_EL0
arm64/perf: Access pmu register using <read/write>_sys_reg
arm64/perf: Define complete ARMv8 recommended implementation defined events
arm64/perf: Changed events naming as per the ARM ARM
arm64: dts: Add Broadcom Vulcan PMU in dts
Documentation: arm64: pmu: Add Broadcom Vulcan PMU binding
- virt_to_page/page_address optimisations
- Support for NUMA systems described using device-tree
- Support for hibernate/suspend-to-disk
- Proper support for maxcpus= command line parameter
- Detection and graceful handling of AArch64-only CPUs
- Miscellaneous cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJXNbgkAAoJELescNyEwWM0PtcIAK11xaOMmSqXz8fcTeNLw4dS
taaPWhjCYus8EhJyvTetfwk74+qVApdvKXKNKgODJXQEjeQx2brdUfbQZb31DTGT
798UYCAyEYCWkXspqi+/dpZEgUGPYH7uGOu2eDd19+PhTeX/EQSRX3fC9k0BNhvh
PN9pOgRcKAlIExZ6QYmT0g56VLtbCfFShN41mQ8HdpShl6pPJuhQ+kDDzudmRjuD
11/oYuOaVTnwbPuXn+sjOrWvMkfINHI70BAQnnBs0v+5c45mzpqEMsy0dYo2Pl2m
ar5lUFVIZggQkiqcOzqBzEgF+4gNw4LUu1DgK6cNKNMtL6k8E9zeOZMWeSVr0lg=
=bT5E
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- virt_to_page/page_address optimisations
- support for NUMA systems described using device-tree
- support for hibernate/suspend-to-disk
- proper support for maxcpus= command line parameter
- detection and graceful handling of AArch64-only CPUs
- miscellaneous cleanups and non-critical fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
arm64: do not enforce strict 16 byte alignment to stack pointer
arm64: kernel: Fix incorrect brk randomization
arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
arm64: secondary_start_kernel: Remove unnecessary barrier
arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
arm64: Replace hard-coded values in the pmd/pud_bad() macros
arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
arm64: Fix typo in the pmdp_huge_get_and_clear() definition
arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL
arm64: always use STRICT_MM_TYPECHECKS
arm64: kvm: Fix kvm teardown for systems using the extended idmap
arm64: kaslr: increase randomization granularity
arm64: kconfig: drop CONFIG_RTC_LIB dependency
arm64: make ARCH_SUPPORTS_DEBUG_PAGEALLOC depend on !HIBERNATION
arm64: hibernate: Refuse to hibernate if the boot cpu is offline
arm64: kernel: Add support for hibernate/suspend-to-disk
PM / Hibernate: Call flush_icache_range() on pages restored in-place
arm64: Add new asm macro copy_page
arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
arm64: kernel: Include _AC definition in page.h
...
Pull perf updates from Ingo Molnar:
"Bigger kernel side changes:
- Add backwards writing capability to the perf ring-buffer code,
which is preparation for future advanced features like robust
'overwrite support' and snapshot mode. (Wang Nan)
- Add pause and resume ioctls for the perf ringbuffer (Wang Nan)
- x86 Intel cstate code cleanups and reorgnization (Thomas Gleixner)
- x86 Intel uncore and CPU PMU driver updates (Kan Liang, Peter
Zijlstra)
- x86 AUX (Intel PT) related enhancements and updates (Alexander
Shishkin)
- x86 MSR PMU driver enhancements and updates (Huang Rui)
- ... and lots of other changes spread out over 40+ commits.
Biggest tooling side changes:
- 'perf trace' features and enhancements. (Arnaldo Carvalho de Melo)
- BPF tooling updates (Wang Nan)
- 'perf sched' updates (Jiri Olsa)
- 'perf probe' updates (Masami Hiramatsu)
- ... plus 200+ other enhancements, fixes and cleanups to tools/
The merge commits, the shortlog and the changelogs contain a lot more
details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (249 commits)
perf/core: Disable the event on a truncated AUX record
perf/x86/intel/pt: Generate PMI in the STOP region as well
perf buildid-cache: Use lsdir() for looking up buildid caches
perf symbols: Use lsdir() for the search in kcore cache directory
perf tools: Use SBUILD_ID_SIZE where applicable
perf tools: Fix lsdir to set errno correctly
perf trace: Move seccomp args beautifiers to tools/perf/trace/beauty/
perf trace: Move flock op beautifier to tools/perf/trace/beauty/
perf build: Add build-test for debug-frame on arm/arm64
perf build: Add build-test for libunwind cross-platforms support
perf script: Fix export of callchains with recursion in db-export
perf script: Fix callchain addresses in db-export
perf script: Fix symbol insertion behavior in db-export
perf symbols: Add dso__insert_symbol function
perf scripting python: Use Py_FatalError instead of die()
perf tools: Remove xrealloc and ALLOC_GROW
perf help: Do not use ALLOC_GROW in add_cmd_list
perf pmu: Make pmu_formats_string to check return value of strbuf
perf header: Make topology checkers to check return value of strbuf
perf tools: Make alias handler to check return value of strbuf
...
copy_thread should not be enforcing 16 byte aligment and returning
-EINVAL. Other architectures trap misaligned stack access with SIGBUS
so arm64 should follow this convention, so remove the strict enforcement
check.
For example, currently clone(2) fails with -EINVAL when passing
a misaligned stack and this gives little clue to what is wrong. Instead,
it is arguable that a SIGBUS on the fist access to a misaligned stack
allows one to figure out that it is a misaligned stack issue rather
than trying to figure out why an unconventional (and undocumented)
-EINVAL is being returned.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This fixes two issues with the arm64 brk randomziation. First, the
STACK_RND_MASK was being used incorrectly. The original code was:
unsigned long range_end = base + (STACK_RND_MASK << PAGE_SHIFT) + 1;
STACK_RND_MASK is 0x7ff (32-bit) or 0x3ffff (64-bit), with 4K pages where
PAGE_SHIFT is 12:
#define STACK_RND_MASK (test_thread_flag(TIF_32BIT) ? \
0x7ff >> (PAGE_SHIFT - 12) : \
0x3ffff >> (PAGE_SHIFT - 12))
This means the resulting offset from base would be 0x7ff0001 or 0x3ffff0001,
which is wrong since it creates an unaligned end address. It was likely
intended to be:
unsigned long range_end = base + ((STACK_RND_MASK + 1) << PAGE_SHIFT)
Which would result in offsets of 0x800000 (32-bit) and 0x40000000 (64-bit).
However, even this corrected 32-bit compat offset (0x00800000) is much
smaller than native ARM's brk randomization value (0x02000000):
unsigned long arch_randomize_brk(struct mm_struct *mm)
{
unsigned long range_end = mm->brk + 0x02000000;
return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}
So, instead of basing arm64's brk randomization on mistaken STACK_RND_MASK
calculations, just use specific corrected values for compat (0x2000000)
and native arm64 (0x40000000).
Reviewed-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
[will: use is_compat_task() as suggested by tixy]
Signed-off-by: Will Deacon <will.deacon@arm.com>
The loop that browses the array compat_hwcap_str will stop when a NULL
is encountered, however NULL is missing at the end of array. This will
lead to overrun until a NULL is found somewhere in the following memory.
In reality, this works out because the compat_hwcap2_str array tends to
follow immediately in memory, and that *is* terminated correctly.
Furthermore, the unsigned int compat_elf_hwcap is checked before
printing each capability, so we end up doing the right thing because
the size of the two arrays is less than 32. Still, this is an obvious
mistake and should be fixed.
Note for backporting: commit 12d11817ea ("arm64: Move
/proc/cpuinfo handling code") moved this code in v4.4. Prior to that
commit, the same change should be made in arch/arm64/kernel/setup.c.
Fixes: 44b82b7700 "arm64: Fix up /proc/cpuinfo"
Cc: <stable@vger.kernel.org> # v3.19+ (but see note above prior to v4.4)
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Remove the unnecessary smp_wmb(), which was added to make sure
that the update_cpu_boot_status() completes before we mark the
CPU online. But update_cpu_boot_status() already has dsb() (required
for the failing CPUs) to ensure the correct behavior.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Hibernation represents a system state save/restore through
a system reboot; this implies that the logical cpus carrying
out hibernation/thawing must be the same, so that the context
saved in the snapshot image on hibernation is consistent with
the state of the system on resume. If resume from hibernation
is driven through kernel command line parameter, the cpu responsible
for thawing the system will be whatever CPU firmware boots the system
on upon cold-boot (ie logical cpu 0); this means that in order to
keep system context consistent between the hibernate snapshot image
and system state on kernel resume from hibernate, logical cpu 0 must
be online on hibernation and must be the logical cpu that creates
the snapshot image.
This patch adds a PM notifier that enforces logical cpu 0 is online
when the hibernation is started (and prevents hibernation if it is
not), which is sufficient to guarantee it will be the one creating
the snapshot image therefore providing the resume cpu a consistent
snapshot of the system to resume to.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add support for hibernate/suspend-to-disk.
Suspend borrows code from cpu_suspend() to write cpu state onto the stack,
before calling swsusp_save() to save the memory image.
Restore creates a set of temporary page tables, covering only the
linear map, copies the restore code to a 'safe' page, then uses the copy to
restore the memory image. The copied code executes in the lower half of the
address space, and once complete, restores the original kernel's page
tables. It then calls into cpu_resume(), and follows the normal
cpu_suspend() path back into the suspend code.
To restore a kernel using KASLR, the address of the page tables, and
cpu_resume() are stored in the hibernate arch-header and the el2
vectors are pivotted via the 'safe' page in low memory.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Kevin Hilman <khilman@baylibre.com> # Tested on Juno R2
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
KERNEL_START and KERNEL_END are useful outside head.S, move them to a
header file.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
By enabling the MMU early in cpu_resume(), the sleep_save_sp and stack can
be accessed by VA, which avoids the need to convert-addresses and clean to
PoC on the suspend path.
MMU setup is shared with the boot path, meaning the swapper_pg_dir is
restored directly: ttbr1_el1 is no longer saved/restored.
struct sleep_save_sp is removed, replacing it with a single array of
pointers.
cpu_do_{suspend,resume} could be further reduced to not restore: cpacr_el1,
mdscr_el1, tcr_el1, vbar_el1 and sctlr_el1, all of which are set by
__cpu_setup(). However these values all contain res0 bits that may be used
to enable future features.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Hibernate could make use of the cpu_suspend() code to save/restore cpu
state, however it needs to be able to return '0' from the 'finisher'.
Rework cpu_suspend() so that the finisher is called from C code,
independently from the save/restore of cpu state. Space to save the context
in is allocated in the caller's stack frame, and passed into
__cpu_suspend_enter().
Hibernate's use of this API will look like a copy of the cpu_suspend()
function.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A later patch implements kvm_arch_hardware_disable(), to remove kvm
from el2, and re-instate the hyp-stub.
This can happen while guests are running, particularly when kvm_reboot()
calls kvm_arch_hardware_disable() on each cpu. This can interrupt a guest,
remove kvm, then allow the guest to be scheduled again. This causes
kvm_call_hyp() to be run against the hyp-stub.
Change the hyp-stub to return a new exception type when this happens,
and add code to kvm's handle_exit() to tell userspace we failed to
enter the guest.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The existing arm64 hcall implementations are limited in that they only
allow for two distinct hcalls; with the x0 register either zero or not
zero. Also, the API of the hyp-stub exception vector routines and the
KVM exception vector routines differ; hyp-stub uses a non-zero value in
x0 to implement __hyp_set_vectors, whereas KVM uses it to implement
kvm_call_hyp.
To allow for additional hcalls to be defined and to make the arm64 hcall
API more consistent across exception vector routines, change the hcall
implementations to reserve all x0 values below 0xfff for hcalls such
as {s,g}et_vectors().
Define two new preprocessor macros HVC_GET_VECTORS, and HVC_SET_VECTORS
to be used as hcall type specifiers and convert the existing
__hyp_get_vectors() and __hyp_set_vectors() routines to use these new
macros when executing an HVC call. Also, change the corresponding
hyp-stub and KVM el1_sync exception vector routines to use these new
macros.
Signed-off-by: Geoff Levand <geoff@infradead.org>
[Merged two hcall patches, moved immediate value from esr to x0, use lr
as a scratch register, changed limit to 0xfff]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Today the 'hvc' calling KVM or the hyp-stub is expected to preserve all
registers. KVM saves/restores the registers it needs on the EL2 stack using
do_el2_call(). The hyp-stub has no stack, later patches need to be able to
be able to clobber the link register.
Move the link register save/restore to the the call sites.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Unlike on 32-bit ARM, where we need to pass the stub's version of struct
screen_info to the kernel proper via a configuration table, on 64-bit ARM
it simply involves making the core kernel's copy of struct screen_info
visible to the stub by exposing an __efistub_ alias for it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-21-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recent UEFI versions expose permission attributes for runtime services
memory regions, either in the UEFI memory map or in the separate memory
attributes table. This allows the kernel to map these regions with
stricter permissions, rather than the RWX permissions that are used by
default. So wire this up in our mapping routine.
Note that in the absence of permission attributes, we still only map
regions of type EFI_RUNTIME_SERVICE_CODE with the executable bit set.
Also, we base the mapping attributes of EFI_MEMORY_MAPPED_IO on the
type directly rather than on the absence of the EFI_MEMORY_WB attribute.
This is more correct, but is also required for compatibility with the
upcoming support for the Memory Attributes Table, which only carries
permission attributes, not memory type attributes.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-12-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.
And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.
The new file is:
# cat /proc/sys/kernel/perf_event_max_stack
127
Chaging it:
# echo 256 > /proc/sys/kernel/perf_event_max_stack
# cat /proc/sys/kernel/perf_event_max_stack
256
But as soon as there is some event using callchains we get:
# echo 512 > /proc/sys/kernel/perf_event_max_stack
-bash: echo: write error: Device or resource busy
#
Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.
Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If both ACPI and DT platform descriptions are available, and the
kernel was configured at build time to support both flavours, the
default policy is to prefer DT over ACPI, and preferring ACPI over
DT while still allowing DT as a fallback is not possible.
Since some enterprise features (such as RAS) depend on ACPI, it may
be desirable for, e.g., distro installers to prefer ACPI boot but
fall back to DT rather than failing completely if no ACPI tables are
available.
So introduce the 'acpi=on' kernel command line parameter for arm64,
which signifies that ACPI should be used if available, and DT should
only be used as a fallback.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When booting a relocatable kernel image, there is no practical reason
to refuse an image whose load address is not exactly TEXT_OFFSET bytes
above a 2 MB aligned base address, as long as the physical and virtual
misalignment with respect to the swapper block size are equal, and are
both aligned to THREAD_SIZE.
Since the virtual misalignment is under our control when we first enter
the kernel proper, we can simply choose its value to be equal to the
physical misalignment.
So treat the misalignment of the physical load address as the initial
KASLR offset, and fix up the remaining code to deal with that.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For historical reasons, the kernel Image must be loaded into physical
memory at a 512 KB offset above a 2 MB aligned base address. The region
between the base address and the start of the kernel Image has no
significance to the kernel itself, but it is currently mapped explicitly
into the early kernel VMA range for all translation granules.
In some cases (i.e., 4 KB granule), this is unavoidable, due to the 2 MB
granularity of the early kernel mappings. However, in other cases, e.g.,
when running with larger page sizes, or in the future, with more granular
KASLR, there is no reason to map it explicitly like we do currently.
So update the logic so that the region is mapped only if that happens as
a side effect of rounding the start address of the kernel to swapper block
size, and leave it unmapped otherwise.
Since the symbol kernel_img_size now simply resolves to the memory
footprint of the kernel Image, we can drop its definition from image.h
and opencode its calculation.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When building a relocatable kernel, we currently rely on the fact that
early 64-bit literal loads need to be deferred to after the relocation
has been performed only if they involve symbol references, and not if
they involve assemble time constants. While this is not an unreasonable
assumption to make, it is better to switch to movk/movz sequences, since
these are guaranteed to be resolved at link time, simply because there are
no dynamic relocation types to describe them.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Refactor the relocation processing so that the code executes from the
ID map while accessing the relocation tables via the virtual mapping.
This way, we can use literals containing virtual addresses as before,
instead of having to use convoluted absolute expressions.
For symmetry with the secondary code path, the relocation code and the
subsequent jump to the virtual entry point are implemented in a function
called __primary_switch(), and __mmap_switched() is renamed to
__primary_switched(). Also, the call sequence in stext() is aligned with
the one in secondary_startup(), by replacing the awkward 'adr_l lr' and
'b cpu_setup' sequence with a simple branch and link.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can simply use a relocated 64-bit literal to store the address of
__secondary_switched(), and the relocation code will ensure that it
holds the correct value at secondary entry time, as long as we make sure
that the literal is not dereferenced until after we have enabled the MMU.
So jump via a small __secondary_switch() function covered by the ID map
that performs the literal load and branch-to-register.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This unexports some symbols from head.S that are only used locally.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
maxcpu=n sets the number of CPUs activated at boot time to a max of n,
but allowing the remaining CPUs to be brought up later if the user
decides to do so. However, on arm64 due to various reasons, we disallowed
hotplugging CPUs beyond n, by marking them not present. Now that
we have checks in place to make sure the hotplugged CPUs have compatible
features with system and requires no new errata, relax the restriction.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CPU Errata work arounds are detected and applied to the
kernel code at boot time and the data is then freed up.
If a new hotplugged CPU requires a work around which
was not applied at boot time, there is nothing we can
do but simply fail the booting.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>