When cross compiling x86 on an ARM machine with clang, there are several
errors along the lines of:
arch/x86/include/asm/string_64.h:27:10: error: invalid output constraint '=&c' in asm
This happens because the compressed boot Makefile reassigns KBUILD_CFLAGS
and drops the clang flags that set the target architecture ('--target=')
and the path to the GNU cross tools ('--prefix='), meaning that the host
architecture is targeted.
These flags are available as $(CLANG_FLAGS) from the main Makefile so
add them to the compressed boot folder's KBUILD_CFLAGS so that cross
compiling works as expected.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20210326000435.4785-3-nathan@kernel.org
GCC gets confused by the comparison of a pointer to an integer literal,
with the assumption that this is an offset from a NULL pointer and that
dereferencing it is invalid:
In file included from arch/x86/boot/compressed/misc.c:18:
In function ‘parse_elf’,
inlined from ‘extract_kernel’ at arch/x86/boot/compressed/misc.c:442:2:
arch/x86/boot/compressed/../string.h:15:23: error: ‘__builtin_memcpy’ reading 64 bytes from a region of size 0 [-Werror=stringop-overread]
15 | #define memcpy(d,s,l) __builtin_memcpy(d,s,l)
| ^~~~~~~~~~~~~~~~~~~~~~~
arch/x86/boot/compressed/misc.c:283:9: note: in expansion of macro ‘memcpy’
283 | memcpy(&ehdr, output, sizeof(ehdr));
| ^~~~~~
I could not find any good workaround for this, but as this is only
a warning for a failure during early boot, removing the line entirely
works around the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Sebor <msebor@gmail.com>
Link: https://lore.kernel.org/r/20210322160253.4032422-2-arnd@kernel.org
Fix another ~42 single-word typos in arch/x86/ code comments,
missed a few in the first pass, in particular in .S files.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
Fix the following coccicheck warnings:
./arch/x86/boot/compressed/kaslr.c:642:10-11: WARNING: return of 0/1 in
function 'process_mem_region' with return type bool.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1615283963-67277-1-git-send-email-jiapeng.chong@linux.alibaba.com
There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-9-joro@8bytes.org
Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-8-joro@8bytes.org
The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C-bit.
But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.
Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-7-joro@8bytes.org
Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-6-joro@8bytes.org
This boot path needs exception handling when it is used with SEV-ES.
Setup an IDT and provide a helper function to write IDT entries for
use in 32-bit protected mode.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-5-joro@8bytes.org
Exception handling in the startup_32 boot path requires the CS
selector to be correctly set up. Reload it from the current GDT.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-4-joro@8bytes.org
A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.
Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.
This handles the following cases:
1) SEV(-ES) guest where CPUID intercept is disabled. The guest
will still see leaf 0x8000001f and the SEV bit. It can
retrieve the C-bit and boot normally.
2) Non-encrypted guests with intercepted CPUID will check
the SEV_STATUS MSR and find it 0 and will try to enable SME.
This will fail when the guest finds MSR_K8_SYSCFG to be zero,
as it is emulated by KVM. But we can't rely on that, as there
might be other hypervisors which return this MSR with bit
23 set. The Hypervisor bit check will prevent that the guest
tries to enable SME in this case.
3) Non-encrypted guests on SEV capable hosts with CPUID intercept
disabled (by a malicious hypervisor) will try to boot into
the SME path. This will fail, but it is also not considered
a problem because non-encrypted guests have no protection
against the hypervisor anyway.
[ bp: s/non-SEV/non-encrypted/g ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20210312123824.306-3-joro@8bytes.org
Disable the exception handling before booting the kernel to make sure
any exceptions that happen during early kernel boot are not directed to
the pre-decompression code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210312123824.306-2-joro@8bytes.org
- Don't move BSS section around pointlessly in the x86 decompressor
- Refactor helper for discovering the EFI secure boot mode
- Wire up EFI secure boot to IMA for arm64
- Some fixes for the capsule loader
- Expose the RT_PROP table via the EFI test module
- Relax DT and kernel placement restrictions on ARM
+ followup fixes:
- fix the build breakage on IA64 caused by recent capsule loader changes
- suppress a type mismatch build warning in the expansion of
EFI_PHYS_ALIGN on ARM
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/kWCMACgkQEsHwGGHe
VUqVlxAAg3jSS5w5fuaXON2xYZmgKdlRB0fjbklo1ZrWS6sEHrP+gmVmrJSWGZP+
qFleQ6AxaYK57UiBXxS6Xfn7hHRToqdOAGnaSYzIg1aQIofRoLxvm3YHBMKllb+g
x73IBS/Hu9/kiH8EVDrJSkBpVdbPwDnw+FeW4ZWUMF9GVmV8oA6Zx23BVSVsbFda
jat/cEsJQS3GfECJ/Fg5ae+c/2zn5NgbaVtLxVnMnJfAwEpoPz3ogKoANSskdZg3
z6pA1aMFoHr+lnlzcsM5zdboQlwZRKPHvFpsXPexESBy5dPkYhxFnHqgK4hSZglC
c3QoO9Gn+KOJl4KAKJWNzCrd3G9kKY5RXkoei4bH9wGMjW2c68WrbFyXgNsO3vYR
v5CKpq3+jlwGo03GiLJgWQFdgqX0EgTVHHPTcwFpt8qAMi9/JIPSIeTE41p2+AjZ
cW5F0IlikaR+N8vxc2TDvQTuSsroMiLcocvRWR61oV/48pFlEjqiUjV31myDsASg
gGkOxZOOz2iBJfK8lCrKp5p9JwGp0M0/GSHTxlYQFy+p4SrcOiPX4wYYdLsWxioK
AbVhvOClgB3kN7y7TpLvdjND00ciy4nKEC0QZ5p5G59jSLnpSBM/g6av24LsSQwo
S1HJKhQPbzcI1lhaPjo91HQoOOMZHWLes0SqK4FGNIH+0imHliA=
=n7Gc
-----END PGP SIGNATURE-----
Merge tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Borislav Petkov:
"These got delayed due to a last minute ia64 build issue which got
fixed in the meantime.
EFI updates collected by Ard Biesheuvel:
- Don't move BSS section around pointlessly in the x86 decompressor
- Refactor helper for discovering the EFI secure boot mode
- Wire up EFI secure boot to IMA for arm64
- Some fixes for the capsule loader
- Expose the RT_PROP table via the EFI test module
- Relax DT and kernel placement restrictions on ARM
with a few followup fixes:
- fix the build breakage on IA64 caused by recent capsule loader
changes
- suppress a type mismatch build warning in the expansion of
EFI_PHYS_ALIGN on ARM"
* tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: arm: force use of unsigned type for EFI_PHYS_ALIGN
efi: ia64: disable the capsule loader
efi: stub: get rid of efi_get_max_fdt_addr()
efi/efi_test: read RuntimeServicesSupported
efi: arm: reduce minimum alignment of uncompressed kernel
efi: capsule: clean scatter-gather entries from the D-cache
efi: capsule: use atomic kmap for transient sglist mappings
efi: x86/xen: switch to efi_get_secureboot_mode helper
arm64/ima: add ima_arch support
ima: generalize x86/EFI arch glue for other EFI architectures
efi: generalize efi_get_secureboot
efi/libstub: EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER should not default to yes
efi/x86: Only copy the compressed kernel image in efi_relocate_kernel()
efi/libstub/x86: simplify efi_is_native()
With the intoduction of hardware tag-based KASAN some kernel checks of
this kind:
ifdef CONFIG_KASAN
will be updated to:
if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
x86 and s390 use a trick to #undef CONFIG_KASAN for some of the code
that isn't linked with KASAN runtime and shouldn't have any KASAN
annotations.
Also #undef CONFIG_KASAN_GENERIC with CONFIG_KASAN.
Link: https://lkml.kernel.org/r/9d84bfaaf8fabe0fc89f913c9e420a30bd31a260.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(Arvind Sankar)
- Remove -m16 workaround now that the GCC versions that need it are unsupported
(Nick Desaulniers)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XhwYACgkQEsHwGGHe
VUqYPQ/9ECDO6Ms8rkvZf4f7+LfKO8XsRYf0h411h5zuVssgPybIbZCvuiQ5Jj9r
IzTe0pO0AoVoX5/JE+mxbvroBb5hRokNehJayPmdVjZSrzsYWZR3RVW6iSRo/TQ0
bjzY1cHC4EPmtyYA4FDr+b0JAEEbyPuPOAcqSzBBadsx5FURvwH7s1hUILCZ1nmy
i8DADZDo7Puk1W8VQkGkjUeo3QVY8KG2VzZegU0iUR/M7Hoek3WBPObYFg0bsKN4
6hvRWPMt2lFXOSNZOFhG+LjtWXTJ22g7ENU4SHXhHBdxmM8H/OZvh/3UZTes7rdo
u/LZJ580GfOa3TYHOToGQq9iBxoi9Se3vLXUtt2I9Z7sbBVUGTKx2Llezvt9/QRh
kJkpvMc2r2b9n7Cu52gHKQ89OrqFzs2scNyjQ2KRU61bPwLjWoGiobugnl8T9q8Y
Li8LKS2tCD6yo7L1eYt0NDriWVF3/Tv1rebWMNIf3sGSHFkxgHxFviTx3L3SWEBS
8JMuKLPh20WC9ombXP/tDN2rtTmmL9hEoqZ9uSTG1tRNLOLLNv790ucUh+lp6RFp
7y8dzl+L3hXsFdcCYooOIGO1phmdThwSLMqZHmbpRWpKNr+8P0wCqZP16twnoXNC
kO7OiNd6XHVpJg4WnN3C3X70n7TPXSZ7dkpDNFBJaH2QilNEHa8=
=Qk/S
-----END PGP SIGNATURE-----
Merge tag 'x86_build_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:
"Two x86 build fixes:
- Fix the vmlinux size check on 64-bit along with adding useful
clarifications on the topic (Arvind Sankar)
- Remove -m16 workaround now that the GCC versions that need it are
unsupported (Nick Desaulniers)"
* tag 'x86_build_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Remove -m16 workaround for unsupported versions of GCC
x86/build: Fix vmlinux size check on 64-bit
(Gabriel Krisman Bertazi)
- All kinds of minor cleanups all over the tree.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XgtoACgkQEsHwGGHe
VUqGuA/9GqN2zNQdhgRvAQ+FLZiOYK9MfXcoayfMq8T61VRPDBWaQRfVYKmfmEjS
0l5OnYgZQ9n6vzqFy6pmgc/ix8Jr553dZp5NCamcOqjCTcuO/LwRRh+ZBeFSBTPi
r2qFYKKRYvM7nbyUMm4WqvAakxJ18xsjNbIslr9Aqe8WtHBKKX3MOu8SOpFtGyXz
aEc4rhsS45iZa5gTXhvOn73tr3yHGWU1rzyyAAAmDGTgAxRwsTna8v16C4+v+Bua
Zg18Wiutj8ZjtFpzKJtGWGZoSBap3Jw2Ys64g42MBQUE56KY/99tQVo/SvbYvvlf
PHWLH0f3rPNJ6J2qeKwhtNzPlEAH/6e416A1/6TVwsK+8pdfGmkfaQh2iDHLhJ5i
CSwF61H44ZaE3pc1tHHbC5ALvydPlup7D4MKgztfq0mZ3OoV2Vg7dtyyr+Ybz72b
G+Kl/tmyacQTXo0FiYbZKETo3/VfTdBXGyVax1rHkx3pt8zvhFg3kxb1TT/l/CoM
eSTx53PtTdVtbGOq1CjnUm0FKlbh4+kLoNuo9DYKeXUQBs8PWOCZmL3wXmm4cqlZ
mDZVWvll7CjToY8izzcE/AG279cWkgcL5Tcg7W7CR66+egfDdpuqOZ4tv4TyzoWq
0J7WeNj+TAo98b7RA0Ux8LOlszRxS2ykuI6uB2MgwCaRMbbaQao=
=lLiH
-----END PGP SIGNATURE-----
Merge tag 'x86_cleanups_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
"Another branch with a nicely negative diffstat, just the way I
like 'em:
- Remove all uses of TIF_IA32 and TIF_X32 and reclaim the two bits in
the end (Gabriel Krisman Bertazi)
- All kinds of minor cleanups all over the tree"
* tag 'x86_cleanups_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/ia32_signal: Propagate __user annotation properly
x86/alternative: Update text_poke_bp() kernel-doc comment
x86/PCI: Make a kernel-doc comment a normal one
x86/asm: Drop unused RDPID macro
x86/boot/compressed/64: Use TEST %reg,%reg instead of CMP $0,%reg
x86/head64: Remove duplicate include
x86/mm: Declare 'start' variable where it is used
x86/head/64: Remove unused GET_CR2_INTO() macro
x86/boot: Remove unused finalize_identity_maps()
x86/uaccess: Document copy_from_user_nmi()
x86/dumpstack: Make show_trace_log_lvl() static
x86/mtrr: Fix a kernel-doc markup
x86/setup: Remove unused MCA variables
x86, libnvdimm/test: Remove COPY_MC_TEST
x86: Reclaim TIF_IA32 and TIF_X32
x86/mm: Convert mmu context ia32_compat into a proper flags field
x86/elf: Use e_machine to check for x32/ia32 in setup_additional_pages()
elf: Expose ELF header on arch_setup_additional_pages()
x86/elf: Use e_machine to select start_thread for x32
elf: Expose ELF header in compat_start_thread()
...
- Make the AMD L3 QoS code and data priorization enable/disable mechanism
work correctly. The control bit was only set/cleared on one of the CPUs
in a L3 domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one from
Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes the UV
hubs to be identified wrongly. The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the uprobes
code which fails to handle repeated prefixes properly. The aggregate
size of the prefixes can be larger than the bytes array but the code
blindly iterated over the aggregate size beyond the array boundary.
Add a macro to handle this case properly and use it at the affected
places.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/M2GoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUGeD/0TxQyVIZTJjY/ExDYvQZeSWgvFVon9
A/QcwkgtkWzYKI3YThgxBtSNKhHPP8eQ9QK8ErcaQKEHU5TdXVEOwHgTm1A6OGat
0+M9/EWEe7tTu+cLpu7esQ+1VbSvEcFXZbljbhCKrShlBlsIjFG4eCPAprSw9yjI
RSgfXKZu4NmHVS6nfJTwmIIaTeLQ6U3b7b1D5s/66slBFScqnLbRNhABVbHbos4F
pl/lxDCFOddy2YbEojHjjGqMA7oxPav7c0nYFOM/zG+wAqfEjbqOxReT31bGQPi2
XT9K4JEqDqILo0KnhV4GsYoWAhes3BtmsJ9IoZ7IijsMriYl80mD9URAORidJ4PX
28Ckk9V/DlE8uDrAnBDcWDSoKlg78mhVV7V9L6v43teg/gJfSZNROtNDBmqRmwG4
Op2NJfzJITtaxVQuSZRkSs8rzGv+QUfaM1sBUQ+Oz4KYeIjjA7G2MAOECrzIAWKB
GWc5toYRVS6oGT+RbZhSxZYoh8ASoGJ2MrL8K4OV4RqEqHHcXcih0WmmljtsDIFI
td4FHHH6fghIb9S6iYKiApd6k2qKa33mwJwa/xZOoIrv0w5xT0WDJnxT60gu/Mec
YDkqhmA009CNSD2G4oNRNF5MH7gp34UII+25jOGatbVh+5DDPYs+5Jnh/DR7jssR
PryAG9ER7UUb6w==
=rNBq
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the AMD L3 QoS code and data priorization enable/disable
mechanism work correctly.
The control bit was only set/cleared on one of the CPUs in a L3
domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one
from Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes
the UV hubs to be identified wrongly.
The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the
uprobes code which fails to handle repeated prefixes properly.
The aggregate size of the prefixes can be larger than the bytes
array but the code blindly iterated over the aggregate size beyond
the array boundary. Add a macro to handle this case properly and
use it at the affected places"
* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
x86/platform/uv: Fix UV4 hub revision adjustment
x86/resctrl: Fix AMD L3 QOS CDP enable/disable
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper
check must be:
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 25189d08e5 ("x86/sev-es: Add support for handling IOIO exceptions")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/160697106089.3146288.2052422845039649176.stgit@devnote2
Revert the following two commits:
de3accdaec ("x86, build: Build 16-bit code with -m16 where possible")
a9cfccee66 ("x86, build: Change code16gcc.h from a C header to an assembly header")
Since
0bddd227f3 ("Documentation: update for gcc 4.9 requirement")
the minimum supported version of GCC is gcc-4.9. It's now safe to remove
this code.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20201201011307.3676986-1-ndesaulniers@google.com
Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).
To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinx) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.
To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.
Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Use TEST %reg,%reg which sets the zero flag in the same way as CMP
$0,%reg, but the encoding uses one byte less.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201029160258.139216-1-ubizjak@gmail.com
Commit
8570978ea0 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code")
removed all the references to finalize_identity_maps(), but neglected to
delete the actual function. Remove it.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201005151208.2212886-2-nivedita@alum.mit.edu
Generalize the efi_get_secureboot() function so not only efistub but also
other subsystems can use it.
Note that the MokSbState handling is not factored out: the variable is
boot time only, and so it cannot be parameterized as easily. Also, the
IMA code will switch to this version in a future patch, and it does not
incorporate the MokSbState exception in the first place.
Note that the new efi_get_secureboot_mode() helper treats any failures
to read SetupMode as setup mode being disabled.
Co-developed-by: Chester Lin <clin@suse.com>
Signed-off-by: Chester Lin <clin@suse.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Check whether the hypervisor reported the correct C-bit when running as
an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.
The check function is in a separate file:
arch/x86/kernel/sev_verify_cbit.S
so that it can be re-used in the running kernel image.
[ bp: Massage. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-4-joro@8bytes.org
Introduce sev_status and initialize it together with sme_me_mask to have
an indicator which SEV features are enabled.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-2-joro@8bytes.org
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits
ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
8570978ea0 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code")
set up a new page table in the decompressor stub, but without explicit
mappings for boot_params and the kernel command line, relying on the #PF
handler instead.
This is fragile, as boot_params and the command line mappings are
required for the main kernel. If EARLY_PRINTK and RANDOMIZE_BASE are
disabled, a QEMU/OVMF boot never accesses the command line in the
decompressor stub, and so it never gets mapped. The main kernel accesses
it from the identity mapping if AMD_MEM_ENCRYPT is enabled, and will
crash.
Fix this by adding back the explicit mapping of boot_params and the
command line.
Note: the changes also removed the explicit mapping of the main kernel,
with the result that .bss and .brk may not be in the identity mapping,
but those don't get accessed by the main kernel before it switches to
its own page tables.
[ bp: Pass boot_params with a MOV %rsp... instead of PUSH/POP. Use
block formatting for the comment. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20201016200404.1615994-1-nivedita@alum.mit.edu
Commit
ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
started using a new set of pagetables even without KASLR.
After that commit, initialize_identity_maps() is called before the
5-level paging variables are setup in choose_random_location(), which
will not work if 5-level paging is actually enabled.
Fix this by moving the initialization of __pgtable_l5_enabled,
pgdir_shift and ptrs_per_p4d into cleanup_trampoline(), which is called
immediately after the finalization of whether the kernel is executing
with 4- or 5-level paging. This will be earlier than anything that might
require those variables, and keeps the 4- vs 5-level paging code all in
one place.
Fixes: ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20201010191110.4060905-1-nivedita@alum.mit.edu
called SEV by also encrypting the guest register state, making the
registers inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared between
the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
in order for that exception mechanism to work, the early x86 init code
needed to be made able to handle exceptions, which, in itself, brings
a bunch of very nice cleanups and improvements to the early boot code
like an early page fault handler, allowing for on-demand building of the
identity mapping. With that, !KASLR configurations do not use the EFI
page table anymore but switch to a kernel-controlled one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly
separate from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and behind
static keys to minimize the performance impact on !SEV-ES setups.
Work by Joerg Roedel and Thomas Lendacky and others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
=toqi
-----END PGP SIGNATURE-----
Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV-ES support from Borislav Petkov:
"SEV-ES enhances the current guest memory encryption support called SEV
by also encrypting the guest register state, making the registers
inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared
between the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
so in order for that exception mechanism to work, the early x86 init
code needed to be made able to handle exceptions, which, in itself,
brings a bunch of very nice cleanups and improvements to the early
boot code like an early page fault handler, allowing for on-demand
building of the identity mapping. With that, !KASLR configurations do
not use the EFI page table anymore but switch to a kernel-controlled
one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly separate
from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and
behind static keys to minimize the performance impact on !SEV-ES
setups.
Work by Joerg Roedel and Thomas Lendacky and others"
* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
x86/sev-es: Check required CPU features for SEV-ES
x86/efi: Add GHCB mappings when SEV-ES is active
x86/sev-es: Handle NMI State
x86/sev-es: Support CPU offline/online
x86/head/64: Don't call verify_cpu() on starting APs
x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
x86/realmode: Setup AP jump table
x86/realmode: Add SEV-ES specific trampoline entry point
x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
x86/sev-es: Handle #DB Events
x86/sev-es: Handle #AC Events
x86/sev-es: Handle VMMCALL Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle RDTSC(P) Events
...
the .fixup section, by Uros Bizjak.
* Replace __force_order dummy variable with a memory clobber to fix LLVM
requiring a definition for former and to prevent memory accesses from
still being cached/reordered, by Arvind Sankar.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EODIACgkQEsHwGGHe
VUqPBRAAguaiNy8gPGNRSvqRWTzbxh/IAqB+5rjSH48biRnZm4o7Nsw9tL8kSXN/
yWcGJxEtvheaITFh+rN31jINPCuLdQ2/LaJ+fX13zhgaMmX5RrLZ3FPoGa+eu+y5
yAN8GaBM3VZ14Yzou8q5JF5001yRxXM8UsRzg8XVO7TORB6OOxnnrUbxYvUcLer5
O219NnRtClU1ojZc5u2P1vR5McwIMf66qIkH1gn477utxeFOL380p/ukPOTNPYUH
HsCVLJl0RPVQMI0UNiiRw6V76fHi38kIYJfR7Rg6Jy+k/U0z+eDXPg2/aHZj63NP
K7pZ7XgbaBWbHSr8C9+CsCCAmTBYOascVpcu7X+qXJPS93IKpg7e+9rAKKqlY5Wq
oe6IN975TjzZ+Ay0ZBRlxzFOn2ZdSPJIJhCC3MyDlBgx7KNIVgmvKQ1BiKQ/4ZQX
foEr6HWIIKzQQwyI++pC0AvZ63hwM8X3xIF+6YsyXvNrGs+ypEhsAQpa4Q3XXvDi
88afyFAhdgClbvAjbjefkPekzzLv+CYJa2hUCqsuR8Kh55DiAs204oszVHs4HzBk
nqLffuaKXo7Vg6XOMiK/y8pGWsu5Sdp1YMBVoedvENvrVf5awt1SapV31dKC+6g9
iF6ljSMJWYmLOmNSC3wmdivEgMLxWfgejKH6ltWnR6MZUE5KeGE=
=8moV
-----END PGP SIGNATURE-----
Merge tag 'x86_asm_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Borislav Petkov:
"Two asm wrapper fixes:
- Use XORL instead of XORQ to avoid a REX prefix and save some bytes
in the .fixup section, by Uros Bizjak.
- Replace __force_order dummy variable with a memory clobber to fix
LLVM requiring a definition for former and to prevent memory
accesses from still being cached/reordered, by Arvind Sankar"
* tag 'x86_asm_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Replace __force_order with a memory clobber
x86/uaccess: Use XORL %0,%0 in __get_user_asm()
such as the CRC sum or limited size data - most of which can be gained
via tools.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EkawRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gZBA//SSiRpEHsaOKKcpuoh12OKIy/mxAsHc2K
1iHmnP3S2VGLaG1VrP/J1zDHUaksPB3QaiT8YM8EmWWZVNVsKbBbbPA2gzz85EBB
N6NsZyYt+emKVDejx09rZ9RlN5Zse7mO4pZrOhVSTZfwbfVOwxSv7SYXhFURV7dh
YX42TkMAELeGZ1jA9Ez6heUkV2bYVqhNdhsVExkSJrY2YwvP5UR2bmcz0jCuGoqk
TjuYxr6nrFu+GKRsNqWc92mLWcs3OHN3gaauqOphXXGlz2jQBZ6gVVv/xHfR3v0J
iD72fEGx/5qBWbz3d3uToBEbgQK+6WbB1qUji4zhtwDAE6OuafsujSGUJE0LFZuy
24m1kxmzsOUarJP78b/Hs8pHWmt6WPfwpffMYEY1WrkUCoXjlZhSPJs88J5PNGrg
NwYo/w2vVvP7/cudm6faVF2KDjd0n6g0IkF9S3xA7rWmajbzwZpCBRHWTSd9Vhtq
7Wj8dC92zre3JsHupBqsn9yIjM7JfWK+S7jbBudlUW+SjHz/NbyR+guhUwRdN6qd
ERhCQWKCHid9V4LsgKI/PP/AoWzqq8Xj4qI4dvOZM2aFzGSiJ6egKcjfbJw+Kmt3
Ffn5CHu4zGi3j+nDTs7Hw7lBh0tiavkNBV/zbopNyF9AHReUJtPrN2TGxw0GBeim
Dd1dK/7ANRY=
=5rfu
-----END PGP SIGNATURE-----
Merge tag 'x86-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build update from Ingo Molnar:
"Remove a couple of ancient and distracting printouts from the x86
build, such as the CRC sum or limited size data - most of which can be
gained via tools"
* tag 'x86-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Declutter the build output
also fixes some corner case bugs.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EjbwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g5ihAAw1B4ZV+yG6BKMIx8oC+LSefSX0lk6nv0
pmPWIdvmlL1FHlyvALymU2UCneA7KYkGjdz6LYtXlvdGTpVj1M7mJAFBp8ugJj2k
oOelkohTsTHOU/Oae4+OX8E8Kt5PsFS21wSoTo+WYI2pgl1iGzVPPCreU+uMB3lP
IaenOvBlNeXUy0S1tDYLFQT3wtS+lC2Yu6mG5AklDxGvAxpO73U9KJnRwpx34aCL
Z24fW5gZkZg5KQa1ULjZLMUoCNEz24QP/V5PMmgbD3QRkxTeNliaGPvbBV5c2fJd
WOaI4QUEah2wTtRgXv3z1SMbhAFnIoo1fAZKs1DEfT+HUO+Aw6YDpH58TdaPCs4/
ES3Tk9lcRUITAeilbO4pGPUEq+9FPHZ+hZaevQTDcjHmE4VPG984UqFzm2Pc4NEb
gZ6R0zKU9I7w6GFyXn3KeaPwf79CVGuJ4Zd1Is86ab2ft7V3m4cyHT7BKBkF+skD
MQEAhroE7WL/+E1mvK/zej9I9PTbHtk50GdmKTzcnrNdFLehGz6hyMM/PnBKE2j5
Gsw+x/DjCXQSpwVTkJCBLf3BqMpbi7C5vzwbIPYcKuE3Ggo0f1n9rUUJwCNybo21
XhSWq8GFX35+5nBLfznXRbaU1uyvWwWvheWwR56LbTHgdPUlgVr/aa/KQ5bliz0a
JCUUcDnUZjw=
=+VGg
-----END PGP SIGNATURE-----
Merge tag 'x86-kaslr-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kaslr updates from Ingo Molnar:
"This cleans up and simplifies the x86 KASLR code, and also fixes some
corner case bugs"
* tag 'x86-kaslr-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/kaslr: Replace strlen() with strnlen()
x86/kaslr: Add a check that the random address is in range
x86/kaslr: Make local variables 64-bit
x86/kaslr: Replace 'unsigned long long' with 'u64'
x86/kaslr: Make minimum/image_size 'unsigned long'
x86/kaslr: Small cleanup of find_random_phys_addr()
x86/kaslr: Drop unnecessary alignment in find_random_virt_addr()
x86/kaslr: Drop redundant check in store_slot_info()
x86/kaslr: Make the type of number of slots/slot areas consistent
x86/kaslr: Drop test for command-line parameters before parsing
x86/kaslr: Simplify process_gb_huge_pages()
x86/kaslr: Short-circuit gb_huge_pages on x86-32
x86/kaslr: Fix off-by-one error in process_gb_huge_pages()
x86/kaslr: Drop some redundant checks from __process_mem_region()
x86/kaslr: Drop redundant variable in __process_mem_region()
x86/kaslr: Eliminate 'start_orig' local variable from __process_mem_region()
x86/kaslr: Drop redundant cur_entry from __process_mem_region()
x86/kaslr: Fix off-by-one error in __process_mem_region()
x86/kaslr: Initialize mem_limit to the real maximum address
x86/kaslr: Fix process_efi_entries comment
...
because the heuristics that various linkers & compilers use to handle them
(include these bits into the output image vs discarding them silently)
are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook (et al)
adds build time asserts and build time warnings if there's any orphan section
in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix a metric
ton of dependencies and some outright bugs related to this, before we can
finally enable the checks on the x86, ARM and ARM64 platforms.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Edv4RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hiKBAApdJEOaK7hMc3013DYNctklIxEPJL2mFJ
11YJRIh4pUJTF0TE+EHT/D+rSIuRsyuoSmOQBQ61/wVSnyG067GjjVJRqh/eYaJ1
fDhJi2FuHOjXl+CiN0KxzBjjp+V4NhF7jHT59tpQSvfZeg7FjteoxfztxaCp5ek3
S3wHB3CC4c4jE3lfjHem1E9/PwT4kwPYx1c3gAUdEqJdjkihjX9fWusfjLeqW6/d
Y5VkApi6bL9XiZUZj5l0dEIweLJJ86+PkKJqpo3spxxEak1LSn1MEix+lcJ8e1Kg
sb/bEEivDcmFlFWOJnn0QLquCR0Cx5bz1pwsL0tuf0yAd4+sXX5IMuGUysZlEdKM
BHL9h5HbevGF4BScwZwZH7lyEg7q67s5KnRu4hxy0Swfcj7y0oT/9lXqpbpZ2DqO
Hd+bRRQKIbqnTMp0hcit9LfpLp93vj0dBlaV5ocAJJlu62u9VnwGG5HQuZ5giLUr
kA1SLw63Y1wopFRxgFyER8les7eLsu0zxHeK44rRVlVnfI99OMTOgVNicmDFy3Fm
AfcnfJG0BqBEJGQz5es34uQQKKBwFPtC9NztopI62KiwOspYYZyrO1BNxdOc6DlS
mIHrmO89HMXuid5eolvLaFqUWirHoWO8TlycgZxUWVHc2txVPjAEU/axouU/dSSU
w/6GpzAa+7g=
=fXAw
-----END PGP SIGNATURE-----
Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"
* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...
The CRn accessor functions use __force_order as a dummy operand to
prevent the compiler from reordering CRn reads/writes with respect to
each other.
The fact that the asm is volatile should be enough to prevent this:
volatile asm statements should be executed in program order. However GCC
4.9.x and 5.x have a bug that might result in reordering. This was fixed
in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x,
may reorder volatile asm statements with respect to each other.
There are some issues with __force_order as implemented:
- It is used only as an input operand for the write functions, and hence
doesn't do anything additional to prevent reordering writes.
- It allows memory accesses to be cached/reordered across write
functions, but CRn writes affect the semantics of memory accesses, so
this could be dangerous.
- __force_order is not actually defined in the kernel proper, but the
LLVM toolchain can in some cases require a definition: LLVM (as well
as GCC 4.9) requires it for PIE code, which is why the compressed
kernel has a definition, but also the clang integrated assembler may
consider the address of __force_order to be significant, resulting in
a reference that requires a definition.
Fix this by:
- Using a memory clobber for the write functions to additionally prevent
caching/reordering memory accesses across CRn writes.
- Using a dummy input operand with an arbitrary constant address for the
read functions, instead of a global variable. This will prevent reads
from being reordered across writes, while allowing memory loads to be
cached/reordered across CRn reads, which should be safe.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/
Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu
The x86-64 psABI [0] specifies special relocation types
(R_X86_64_[REX_]GOTPCRELX) for indirection through the Global Offset
Table, semantically equivalent to R_X86_64_GOTPCREL, which the linker
can take advantage of for optimization (relaxation) at link time. This
is supported by LLD and binutils versions 2.26 onwards.
The compressed kernel is position-independent code, however, when using
LLD or binutils versions before 2.27, it must be linked without the -pie
option. In this case, the linker may optimize certain instructions into
a non-position-independent form, by converting foo@GOTPCREL(%rip) to $foo.
This potential issue has been present with LLD and binutils-2.26 for a
long time, but it has never manifested itself before now:
- LLD and binutils-2.26 only relax
movq foo@GOTPCREL(%rip), %reg
to
leaq foo(%rip), %reg
which is still position-independent, rather than
mov $foo, %reg
which is permitted by the psABI when -pie is not enabled.
- GCC happens to only generate GOTPCREL relocations on mov instructions.
- CLang does generate GOTPCREL relocations on non-mov instructions, but
when building the compressed kernel, it uses its integrated assembler
(due to the redefinition of KBUILD_CFLAGS dropping -no-integrated-as),
which has so far defaulted to not generating the GOTPCRELX
relocations.
Nick Desaulniers reports [1,2]:
"A recent change [3] to a default value of configuration variable
(ENABLE_X86_RELAX_RELOCATIONS OFF -> ON) in LLVM now causes Clang's
integrated assembler to emit R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX
relocations. LLD will relax instructions with these relocations based
on whether the image is being linked as position independent or not.
When not, then LLD will relax these instructions to use absolute
addressing mode (R_RELAX_GOT_PC_NOPIC). This causes kernels built with
Clang and linked with LLD to fail to boot."
Patch series [4] is a solution to allow the compressed kernel to be
linked with -pie unconditionally, but even if merged is unlikely to be
backported. As a simple solution that can be applied to stable as well,
prevent the assembler from generating the relaxed relocation types using
the -mrelax-relocations=no option. For ease of backporting, do this
unconditionally.
[0] https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/linker-optimization.tex#L65
[1] https://lore.kernel.org/lkml/20200807194100.3570838-1-ndesaulniers@google.com/
[2] https://github.com/ClangBuiltLinux/linux/issues/1121
[3] https://reviews.llvm.org/rGc41a18cf61790fc898dcda1055c3efbf442c14c0
[4] https://lore.kernel.org/lkml/20200731202738.2577854-1-nivedita@alum.mit.edu/
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200812004308.1448603-1-nivedita@alum.mit.edu
Make sure the machine supports RDRAND, otherwise there is no trusted
source of randomness in the system.
To also check this in the pre-decompression stage, make has_cpuflag()
not depend on CONFIG_RANDOMIZE_BASE anymore.
Signed-off-by: Martin Radev <martin.b.radev@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-73-joro@8bytes.org
Calling down to EFI runtime services can result in the firmware
performing VMGEXIT calls. The firmware is likely to use the GHCB of the
OS (e.g., for setting EFI variables), so each GHCB in the system needs
to be identity-mapped in the EFI page tables, as unencrypted, to avoid
page faults.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Moved GHCB mapping loop to sev-es.c ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-72-joro@8bytes.org
Implement a handler for #VC exceptions caused by RDTSC and RDTSCP
instructions. Also make it available in the pre-decompression stage
because the KASLR code uses RDTSC/RDTSCP to gather entropy and some
hypervisors intercept these instructions.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: - Adapt to #VC handling infrastructure
- Make it available early ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-55-joro@8bytes.org
Force a page-fault on any further accesses to the GHCB page when they
shouldn't happen anymore. This will catch any bugs where a #VC exception
is raised even though none is expected anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-25-joro@8bytes.org
Install an exception handler for #VC exception that uses a GHCB. Also
add the infrastructure for handling different exit-codes by decoding
the instruction that caused the exception and error handling.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-24-joro@8bytes.org
The functions are needed to map the GHCB for SEV-ES guests. The GHCB
is used for communication with the hypervisor, so its content must not
be encrypted. After the GHCB is not needed anymore it must be mapped
encrypted again so that the running kernel image can safely re-use the
memory.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-23-joro@8bytes.org
Call set_sev_encryption_mask() while still on the stage 1 #VC-handler
because the stage 2 handler needs the kernel's own page tables to be
set up, to which calling set_sev_encryption_mask() is a prerequisite.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-21-joro@8bytes.org
Add the first handler for #VC exceptions. At stage 1 there is no GHCB
yet because the kernel might still be running on the EFI page table.
The stage 1 handler is limited to the MSR-based protocol to talk to the
hypervisor and can only support CPUID exit-codes, but that is enough to
get to stage 2.
[ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-20-joro@8bytes.org
Changing the function to take start and end as parameters instead of
start and size simplifies the callers which don't need to calculate the
size if they already have start and end.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-19-joro@8bytes.org
With the page-fault handler in place, he identity mapping can be built
on-demand. So remove the code which manually creates the mappings and
unexport/remove the functions used for it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-18-joro@8bytes.org
When booted through startup_64(), the kernel keeps running on the EFI
page table until the KASLR code sets up its own page table. Without
KASLR, the pre-decompression boot code never switches off the EFI page
table. Change that by unconditionally switching to a kernel-controlled
page table after relocation.
This makes sure the kernel can make changes to the mapping when
necessary, for example map pages unencrypted in SEV and SEV-ES guests.
Also, remove the debug_putstr() calls in initialize_identity_maps()
because the function now runs before console_init() is called.
[ bp: Massage commit message. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-17-joro@8bytes.org
Install a page-fault handler to add an identity mapping to addresses
not yet mapped. Also do some checking whether the error code is sane.
This makes non SEV-ES machines use the exception handling
infrastructure in the pre-decompressions boot code too, making it less
likely to break in the future.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-16-joro@8bytes.org
The file contains only code related to identity-mapped page tables.
Rename the file and compile it always in.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-15-joro@8bytes.org
Add code needed to setup an IDT in the early pre-decompression
boot-code. The IDT is loaded first in startup_64, which is after
EfiExitBootServices() has been called, and later reloaded when the
kernel image has been relocated to the end of the decompression area.
This allows to setup different IDT handlers before and after the
relocation.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-14-joro@8bytes.org
The x86-64 ABI defines a red-zone on the stack:
The 128-byte area beyond the location pointed to by %rsp is considered
to be reserved and shall not be modified by signal or interrupt
handlers. Therefore, functions may use this area for temporary data
that is not needed across function calls. In particular, leaf
functions may use this area for their entire stack frame, rather than
adjusting the stack pointer in the prologue and epilogue. This area is
known as the red zone.
This is not compatible with exception handling, because the IRET frame
written by the hardware at the stack pointer and the functions to handle
the exception will overwrite the temporary variables of the interrupted
function, causing undefined behavior. So disable red-zones for the
pre-decompression boot code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200907131613.12703-13-joro@8bytes.org
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly handled in the linker
script.
Now that all sections are explicitly handled, enable orphan section
warnings.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20200902025347.2504702-6-keescook@chromium.org
In preparation for warning on orphan sections, stop the linker from
generating the .eh_frame* sections, discard unwanted non-zero-sized
generated sections, and enforce other expected-to-be-zero-sized sections
(since discarding them might hide problems with them suddenly gaining
unexpected entries).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200821194310.3089815-28-keescook@chromium.org
The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org
We have some really ancient debug printouts in the x86 boot image build code:
Setup is 14108 bytes (padded to 14336 bytes).
System is 8802 kB
CRC 27e909d4
None of these ever helped debug any sort of breakage that I know of, and they
clutter the build output.
Remove them - if anyone needs the see the various interim stages of this to
debug an obscure bug, they can add these printfs and more.
We still keep this one:
Kernel: arch/x86/boot/bzImage is ready (#19)
As a sentimental leftover, plus the '#19' build count tag is mildly useful.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Since commits
c041b5ad86 ("x86, boot: Create a separate string.h file to provide standard string functions")
fb4cac573e ("x86, boot: Move memcmp() into string.h and string.c")
the decompressor stub has been using the compiler's builtin memcpy,
memset and memcmp functions, _except_ where it would likely have the
largest impact, in the decompression code itself.
Remove the #undef's of memcpy and memset in misc.c so that the
decompressor code also uses the compiler builtins.
The rationale given in the comment doesn't really apply: just because
some functions use the out-of-line version is no reason to not use the
builtin version in the rest.
Replace the comment with an explanation of why memzero and memmove are
being #define'd.
Drop the suggestion to #undef in boot/string.h as well: the out-of-line
versions are not really optimized versions, they're generic code that's
good enough for the preboot environment. The compiler will likely
generate better code for constant-size memcpy/memset/memcmp if it is
allowed to.
Most decompressors' performance is unchanged, with the exception of LZ4
and 64-bit ZSTD.
Before After ARCH
LZ4 73ms 10ms 32
LZ4 120ms 10ms 64
ZSTD 90ms 74ms 64
Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels.
Decompressor code size has small differences, with the largest being
that 64-bit ZSTD decreases just over 2k. The largest code size increase
was on 64-bit XZ, of about 400 bytes.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Nick Terrell <nickrterrell@gmail.com>
Tested-by: Nick Terrell <nickrterrell@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a linker script check that there are no run-time relocations, and
remove the old one that tries to check via looking for specially-named
sections in the object files.
Drop the tests for -fPIE compiler option and -pie linker option, as they
are available in all supported gcc and binutils versions (as well as
clang and lld).
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-8-keescook@chromium.org
The BFD linker generates run-time relocations for z_input_len and
z_output_len, even though they are absolute symbols.
This is fixed for binutils-2.35 [1]. Work around this for earlier
versions by defining two variables input_len and output_len in addition
to the symbols, and use them via position-independent references.
This eliminates the last two run-time relocations in the head code and
allows us to drop the -z noreloc-overflow flag to the linker.
Move the -pie and --no-dynamic-linker LDFLAGS to LDFLAGS_vmlinux instead
of KBUILD_LDFLAGS. There shouldn't be anything else getting linked, but
this is the more logical location for these flags, and modversions might
call the linker if an EXPORT_SYMBOL is left over accidentally in one of
the decompressors.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=25754
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-7-keescook@chromium.org
The assembly code in head_{32,64}.S, while meant to be
position-independent, generates run-time relocations because it uses
instructions such as:
leal gdt(%edx), %eax
which make the assembler and linker think that the code is using %edx as
an index into gdt, and hence gdt needs to be relocated to its run-time
address.
On 32-bit, with lld Dmitry Golovin reports that this results in a
link-time error with default options (i.e. unless -z notext is
explicitly passed):
LD arch/x86/boot/compressed/vmlinux
ld.lld: error: can't create dynamic relocation R_386_32 against local
symbol in readonly segment; recompile object files with -fPIC or pass
'-Wl,-z,notext' to allow text relocations in the output
With the BFD linker, this generates a warning during the build, if
--warn-shared-textrel is enabled, which at least Gentoo enables by
default:
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object
On 64-bit, it is not possible to link the kernel as -pie with lld, and
it is only possible with a BFD linker that supports -z noreloc-overflow,
i.e. versions >2.26. This is because these instructions cannot really be
relocated: the displacement field is only 32-bits wide, and thus cannot
be relocated for a 64-bit load address. The -z noreloc-overflow option
simply overrides the linker error, and results in R_X86_64_RELATIVE
relocations that apply a 64-bit relocation to a 32-bit field anyway.
This happens to work because nothing will process these run-time
relocations.
Start fixing this by removing relocations from .head.text:
- On 32-bit, use a base register that holds the address of the GOT and
reference symbol addresses using @GOTOFF, i.e.
leal gdt@GOTOFF(%edx), %eax
- On 64-bit, most of the code can (and already does) use %rip-relative
addressing, however the .code32 bits can't, and the 64-bit code also
needs to reference symbol addresses as they will be after moving the
compressed kernel to the end of the decompression buffer.
For these cases, reference the symbols as an offset to startup_32 to
avoid creating relocations, i.e.:
leal (gdt-startup_32)(%bp), %eax
This only works in .head.text as the subtraction cannot be represented
as a PC-relative relocation unless startup_32 is in the same section
as the code. Move efi32_pe_entry into .head.text so that it can use
the same method to avoid relocations.
Reported-by: Dmitry Golovin <dima@golovin.in>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-6-keescook@chromium.org
GCC puts the main function into .text.startup when compiled with -Os (or
-O2). This results in arch/x86/boot/main.c having a .text.startup
section which is currently not included explicitly in the linker script
setup.ld in the same directory.
The BFD linker places this orphan section immediately after .text, so
this still works. However, LLD git, since [1], is choosing to place it
immediately after the .bstext section instead (this is the first code
section). This plays havoc with the section layout that setup.elf
requires to create the setup header, for eg on 64-bit:
LD arch/x86/boot/setup.elf
ld.lld: error: section .text.startup file range overlaps with .header
>>> .text.startup range is [0x200040, 0x2001FE]
>>> .header range is [0x2001EF, 0x20026B]
ld.lld: error: section .header file range overlaps with .bsdata
>>> .header range is [0x2001EF, 0x20026B]
>>> .bsdata range is [0x2001FF, 0x200398]
ld.lld: error: section .bsdata file range overlaps with .entrytext
>>> .bsdata range is [0x2001FF, 0x200398]
>>> .entrytext range is [0x20026C, 0x2002D3]
ld.lld: error: section .text.startup virtual address range overlaps
with .header
>>> .text.startup range is [0x40, 0x1FE]
>>> .header range is [0x1EF, 0x26B]
ld.lld: error: section .header virtual address range overlaps with
.bsdata
>>> .header range is [0x1EF, 0x26B]
>>> .bsdata range is [0x1FF, 0x398]
ld.lld: error: section .bsdata virtual address range overlaps with
.entrytext
>>> .bsdata range is [0x1FF, 0x398]
>>> .entrytext range is [0x26C, 0x2D3]
ld.lld: error: section .text.startup load address range overlaps with
.header
>>> .text.startup range is [0x40, 0x1FE]
>>> .header range is [0x1EF, 0x26B]
ld.lld: error: section .header load address range overlaps with
.bsdata
>>> .header range is [0x1EF, 0x26B]
>>> .bsdata range is [0x1FF, 0x398]
ld.lld: error: section .bsdata load address range overlaps with
.entrytext
>>> .bsdata range is [0x1FF, 0x398]
>>> .entrytext range is [0x26C, 0x2D3]
Add .text.* to the .text output section to fix this, and also prevent
any future surprises if the compiler decides to create other such
sections.
[1] https://reviews.llvm.org/D75225
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20200731230820.1742553-5-keescook@chromium.org
In a previous patch, we have eliminated GOT entries from the decompressor
binary and added an assertion that the .got section is empty. This means
that the GOT fixup routines that exist in both the 32-bit and 64-bit
startup routines have become dead code, and can be removed.
While at it, drop the KEEP() from the linker script, as it has no effect
on the contents of output sections that are created by the linker itself.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-4-keescook@chromium.org
Eliminate all GOT entries in the decompressor binary, by forcing hidden
visibility for all symbol references, which informs the compiler that
such references will be resolved at link time without the need for
allocating GOT entries.
To ensure that no GOT entries will creep back in, add an assertion to
the decompressor linker script that will fire if the .got section has
a non-zero size.
[Arvind: move hidden.h to include/linux instead of making a copy]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-3-keescook@chromium.org
The .got.plt section contains the part of the GOT which is used by PLT
entries, and which gets updated lazily by the dynamic loader when
function calls are dispatched through those PLT entries.
On fully linked binaries such as the kernel proper or the decompressor,
this never happens, and so in practice, the .got.plt section consists
only of the first 3 magic entries that are meant to point at the _DYNAMIC
section and at the fixup routine in the loader. However, since we don't
use a dynamic loader, those entries are never populated or used.
This means that treating those entries like ordinary GOT entries, and
updating their values based on the actual placement of the executable in
memory is completely pointless, and we can just ignore the .got.plt
section entirely, provided that it has no additional entries beyond
the first 3 ones.
So add an assertion in the linker script to ensure that this assumption
holds, and move the contents out of the [_got, _egot) memory range that
is modified by the GOT fixup routines.
While at it, drop the KEEP(), since it has no effect on the contents
of output sections that are created by the linker itself.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-2-keescook@chromium.org
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow to trace functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl8wJXEVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGMGEP/0jDq/WafbfPN0aU83EqEWLt/sKg
bluzmf/6HGx3XVRnuAzsHNNqysUx77WJiDsU/jbC/zdH8Iox3Sc1diE2sELLNAfY
iJmQ8NBPggyU74aYG3OJdpDjz8T9EX/nVaYrjyFlbuXElM+Qvo8Z4Fz6NpWqKWlA
gU+yGxEPPdX6MLHcSPSIu1hGWx7UT4fgfx3zDFTI2qvbQgQjKtzyTjAH5Cm3o87h
rfomvHSSoAUg+Fh1LediRh1tJlkdVO+w7c+LNwCswmdBtkZuxecj1bQGUTS8GaLl
CCWOKYfWp0KsVf1veXNNNaX/ecbp+Y34WErFq3V9Fdq5RmVlp+FPSGMyjDMRiQ/p
LGvzbJLPpG586MnK8of0dOj6Es6tVPuq6WH2HuvsyTGcZJDpFTTxRcK3HDkE8ig6
ZtuM3owB/Mep8IzwY2yWQiDrc7TX5Fz8S4hzGPU1zG9cfj4VT6TBqHGAy1Eql/0l
txj6vJpnbQSdXiIX8MIU3yH35Y7eW3JYWgspTZH5Woj1S/wAWwuG93Fuuxq6mQIJ
q6LSkMavtOfuCjOA9vJBZewpKXRU6yo0CzWNL/5EZ6z/r/I+DGtfb/qka8oYUDjX
9H0cecL37AQxDHRPTxCZDQF0TpYiFJ6bmnMftK9NKNuIdvsk9DF7UBa3EdUNIj38
yKS3rI7Lw55xWuY3
=bkNQ
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow to trace functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
* tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base
kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled
kbuild: introduce hostprogs-always-y and userprogs-always-y
kbuild: sort hostprogs before passing it to ifneq
kbuild: move host .so build rules to scripts/gcc-plugins/Makefile
kbuild: Replace HTTP links with HTTPS ones
kbuild: trace functions in subdirectories of lib/
kbuild: introduce ccflags-remove-y and asflags-remove-y
kbuild: do not export LDFLAGS_vmlinux
kbuild: always create directories of targets
powerpc/boot: add DTB to 'targets'
kbuild: buildtar: add dtbs support
kbuild: remove cc-option test of -ffreestanding
kbuild: remove cc-option test of -fno-stack-protector
Revert "kbuild: Create directory for target DTB"
kbuild: run the checker after the compiler
- Add support for zstd compressed kernel
- Define __DISABLE_EXPORTS in Makefile
- Remove __DISABLE_EXPORTS definition from kaslr.c
- Bump the heap size for zstd.
- Update the documentation.
Integrates the ZSTD decompression code to the x86 pre-boot code.
Zstandard requires slightly more memory during the kernel decompression
on x86 (192 KB vs 64 KB), and the memory usage is independent of the
window size.
__DISABLE_EXPORTS is now defined in the Makefile, which covers both
the existing use in kaslr.c, and the use needed by the zstd decompressor
in misc.c.
This patch has been boot tested with both a zstd and gzip compressed
kernel on i386 and x86_64 using buildroot and QEMU.
Additionally, this has been tested in production on x86_64 devices.
We saw a 2 second boot time reduction by switching kernel compression
from xz to zstd.
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-7-nickrterrell@gmail.com
Bump the ZO_z_extra_bytes margin for zstd.
Zstd needs 3 bytes per 128 KB, and has a 22 byte fixed overhead.
Zstd needs to maintain 128 KB of space at all times, since that is
the maximum block size. See the comments regarding in-place
decompression added in lib/decompress_unzstd.c for details.
The existing code is written so that all the compression algorithms use
the same ZO_z_extra_bytes. It is taken to be the maximum of the growth
rate plus the maximum fixed overhead. The comments just above this diff
state that:
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-6-nickrterrell@gmail.com
Change the type of local variables/fields that store mem_vector
addresses to u64 to make it less likely that 32-bit overflow will cause
issues on 32-bit.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-21-nivedita@alum.mit.edu
Change type of minimum/image_size arguments in process_mem_region to
'unsigned long'. These actually can never be above 4G (even on x86_64),
and they're 'unsigned long' in every other function except this one.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-19-nivedita@alum.mit.edu
Just a trivial rearrangement to do all the processing together, and only
have one call to slots_fetch_random() in the source.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-18-nivedita@alum.mit.edu
Drop unnecessary alignment of image_size to CONFIG_PHYSICAL_ALIGN in
find_random_virt_addr, it cannot change the result: the largest valid
slot is the largest n that satisfies
minimum + n * CONFIG_PHYSICAL_ALIGN + image_size <= KERNEL_IMAGE_SIZE
(since minimum is already aligned) and so n is equal to
(KERNEL_IMAGE_SIZE - minimum - image_size) / CONFIG_PHYSICAL_ALIGN
even if image_size is not aligned to CONFIG_PHYSICAL_ALIGN.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-17-nivedita@alum.mit.edu
Drop unnecessary check that number of slots is not zero in
store_slot_info, it's guaranteed to be at least 1 by the calculation.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-16-nivedita@alum.mit.edu
The number of slots can be 'unsigned int', since on 64-bit, the maximum
amount of memory is 2^52, the minimum alignment is 2^21, so the slot
number cannot be greater than 2^31. But in case future processors have
more than 52 physical address bits, make it 'unsigned long'.
The slot areas are limited by MAX_SLOT_AREA, currently 100. It is
indexed by an int, but the number of areas is stored as 'unsigned long'.
Change both to 'unsigned int' for consistency.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-15-nivedita@alum.mit.edu
This check doesn't save anything. In the case when none of the
parameters are present, each strstr will scan args twice (once to find
the length and then for searching), six scans in total. Just going ahead
and parsing the arguments only requires three scans: strlen, memcpy, and
parsing. This will be the first malloc, so free will actually free up
the memory, so the check doesn't save heap space either.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-14-nivedita@alum.mit.edu
32-bit does not have GB pages, so don't bother checking for them. Using
the IS_ENABLED() macro allows the compiler to completely remove the
gb_huge_pages code.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-12-nivedita@alum.mit.edu
Clip the start and end of the region to minimum and mem_limit prior to
the loop. region.start can only increase during the loop, so raising it
to minimum before the loop is enough.
A region that becomes empty due to this will get checked in
the first iteration of the loop.
Drop the check for overlap extending beyond the end of the region. This
will get checked in the next loop iteration anyway.
Rename end to region_end for symmetry with region.start.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-10-nivedita@alum.mit.edu
region.size can be trimmed to store the portion of the region before the
overlap, instead of a separate mem_vector variable.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-9-nivedita@alum.mit.edu
In case of an overlap, the beginning of the region should be used even
if it is exactly image_size, not just strictly larger.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200728225722.67457-6-nivedita@alum.mit.edu
On 64-bit, the kernel must be placed below MAXMEM (64TiB with 4-level
paging or 4PiB with 5-level paging). This is currently not enforced by
KASLR, which thus implicitly relies on physical memory being limited to
less than 64TiB.
On 32-bit, the limit is KERNEL_IMAGE_SIZE (512MiB). This is enforced by
special checks in __process_mem_region().
Initialize mem_limit to the maximum (depending on architecture), instead
of ULLONG_MAX, and make sure the command-line arguments can only
decrease it. This makes the enforcement explicit on 64-bit, and
eliminates the 32-bit specific checks to keep the kernel below 512M.
Check upfront to make sure the minimum address is below the limit before
doing any work.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-5-nivedita@alum.mit.edu
Since commit:
0982adc746 ("x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice")
process_efi_entries() will return true if we have an EFI memmap, not just
if it contained EFI_MEMORY_MORE_RELIABLE regions.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-4-nivedita@alum.mit.edu
Drop the warning on seeing "--" in handle_mem_options(). This will trigger
whenever one of the memory options is present in the command line
together with "--", but there's no problem if that is the case.
Replace goto with break.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-3-nivedita@alum.mit.edu
Handle the possibility that the command line is NULL.
Replace open-coded strlen with a function call.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200727230801.3468620-2-nivedita@alum.mit.edu
- Fix the I/O bitmap invalidation on XEN PV, which was overlooked in the
recent ioperm/iopl rework. This caused the TSS and XEN's I/O bitmap to
get out of sync.
- Use the proper vectors for HYPERV.
- Make disabling of stack protector for the entry code work with GCC
builds which enable stack protector by default. Removing the option is
not sufficient, it needs an explicit -fno-stack-protector to shut it
off.
- Mark check_user_regs() noinstr as it is called from noinstr code. The
missing annotation causes it to be placed in the text section which
makes it instrumentable.
- Add the missing interrupt disable in exc_alignment_check()
- Fixup a XEN_PV build dependency in the 32bit entry code
- A few fixes to make the Clang integrated assembler happy
- Move EFI stub build to the right place for out of tree builds
- Make prepare_exit_to_usermode() static. It's not longer called from ASM
code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UR+MTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQCUD/4/9W5FFvdZvQPwmXsHPaVnW9hUsXxG
0tjc34xqDcgEl1U3khu+6jj+oHx+JM+4wGP/V49Wqx6xkrJ33/a8uYErAgI7+Pmp
s3T2gXMWkgJtYFlDQdAMbeuuM2cOFZJw4BxxvTMth5EixQvk1EkX6QyBjLaSGo8y
78sWtZ6Oh5Ql9ua/9TOilewLsCsQSFIFn0o/hawwwPUMrwGvD29scha0XHom+AO7
uwejfU8klq2HJJaLaaiUaiNBkFz0TNGJtY+3mQpw8BPjCuuBQhYygrS0X4uQzo01
4XJzhDnOVbAYWqi0/T+mAEmuJ9NBZJwYiYrwBYCkZgELwJKLzhzO2GOgP9xEsFY4
VUNgqHFhKrQp10k2k4L/A5tmr+0GntiCQhdZi+/gty6oO/t3ni57pRcAhA9qBNOb
8ZqumBwgaaAIqcmdtoyXAIveWOHnzwKEg6wmIGFbyCwHjeLJKJG7KhpXIpEuX+j2
DC7EfYvRB+jllAk1CBypBvzD0DHfMZ0myPxCcZiW2wHTVAlkpY7hiIyPHqocjE9L
OjOQ7FS6E2/p24lYVcLUFWcESxGFvQjjxwXk7htjpGUIZsQOhz/LOW+CIPCsfbqE
HoEsHmNltksYYV9FDfevXRp5sbxpx3wQSLOgqNqiOpy4cTCG8boalUqHQ0OsN8Oa
EgU067yF77ymRg==
=QAeH
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull x86 fixes from Thomas Gleixner:
"A pile of fixes for x86:
- Fix the I/O bitmap invalidation on XEN PV, which was overlooked in
the recent ioperm/iopl rework. This caused the TSS and XEN's I/O
bitmap to get out of sync.
- Use the proper vectors for HYPERV.
- Make disabling of stack protector for the entry code work with GCC
builds which enable stack protector by default. Removing the option
is not sufficient, it needs an explicit -fno-stack-protector to
shut it off.
- Mark check_user_regs() noinstr as it is called from noinstr code.
The missing annotation causes it to be placed in the text section
which makes it instrumentable.
- Add the missing interrupt disable in exc_alignment_check()
- Fixup a XEN_PV build dependency in the 32bit entry code
- A few fixes to make the Clang integrated assembler happy
- Move EFI stub build to the right place for out of tree builds
- Make prepare_exit_to_usermode() static. It's not longer called from
ASM code"
* tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Don't add the EFI stub to targets
x86/entry: Actually disable stack protector
x86/ioperm: Fix io bitmap invalidation on Xen PV
x86: math-emu: Fix up 'cmp' insn for clang ias
x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV
x86/entry: Add compatibility with IAS
x86/entry/common: Make prepare_exit_to_usermode() static
x86/entry: Mark check_user_regs() noinstr
x86/traps: Disable interrupts in exc_aligment_check()
x86/entry/32: Fix XEN_PV build dependency
vmlinux-objs-y is added to targets, which currently means that the EFI
stub gets added to the targets as well. It shouldn't be added since it
is built elsewhere.
This confuses Makefile.build which interprets the EFI stub as a target
$(obj)/$(objtree)/drivers/firmware/efi/libstub/lib.a
and will create drivers/firmware/efi/libstub/ underneath
arch/x86/boot/compressed, to hold this supposed target, if building
out-of-tree. [0]
Fix this by pulling the stub out of vmlinux-objs-y into efi-obj-y.
[0] See scripts/Makefile.build near the end:
# Create directories for object files if they do not exist
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20200715032631.1562882-1-nivedita@alum.mit.edu
Some Makefiles already pass -ffreestanding unconditionally.
For example, arch/arm64/lib/Makefile, arch/x86/purgatory/Makefile.
No problem report so far about hard-coding this option. So, we can
assume all supported compilers know -ffreestanding.
I confirmed GCC 4.8 and Clang manuals document this option.
Get rid of cc-option from -ffreestanding.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Some Makefiles already pass -fno-stack-protector unconditionally.
For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.
No problem report so far about hard-coding this option. So, we can
assume all supported compilers know -fno-stack-protector.
GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN)
Get rid of cc-option from -fno-stack-protector.
Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'.
Note:
arch/mips/vdso/Makefile adds -fno-stack-protector twice, first
unconditionally, and second conditionally. I removed the second one.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Commit
17054f492d ("efi/x86: Implement mixed mode boot without the handover protocol")
introduced a new entry point for the EFI stub to be booted in mixed mode
on 32-bit firmware.
When entered via efi32_pe_entry, control is first transferred to
startup_32 to setup for the switch to long mode, and then the EFI stub
proper is entered via efi_pe_entry. efi_pe_entry is an MS ABI function,
and the ABI requires 32 bytes of shadow stack space to be allocated by
the caller, as well as the stack being aligned to 8 mod 16 on entry.
Allocate 40 bytes on the stack before switching to 64-bit mode when
calling efi_pe_entry to account for this.
For robustness, explicitly align boot_stack_end to 16 bytes. It is
currently implicitly aligned since .bss is cacheline-size aligned,
head_64.o is the first object file with a .bss section, and the heap and
boot sizes are aligned.
Fixes: 17054f492d ("efi/x86: Implement mixed mode boot without the handover protocol")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200617131957.2507632-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.
Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes. Fix this up with the aid of
the below script and manual adjustments here and there.
import sys
import re
if len(sys.argv) is not 3:
print "USAGE: %s <file> <header>" % (sys.argv[0])
sys.exit(1)
hdr_to_move="#include <linux/%s>" % sys.argv[2]
moved = False
in_hdrs = False
with open(sys.argv[1], "r") as f:
lines = f.readlines()
for _line in lines:
line = _line.rstrip('
')
if line == hdr_to_move:
continue
if line.startswith("#include <linux/"):
in_hdrs = True
elif not moved and in_hdrs:
moved = True
print hdr_to_move
print line
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.
Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VK9cRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ibgw//acOg/6o7HzHS19nEDfRf2grtipPq0lZN
laIBlGNQdyQHoTMbvF4X8hE1VuALdcr+kVCXirvHnTVsE62fqR8KzdTeEPHHSamy
VWZkaOGq+jZiJnM4EZ1j6y0E6Cf9SWU2Zho4Ov/j88s3aYhkYG6EU+8dZMpI2pLU
EqZAqzuZ8lJYDchv+Xbd/dN3p8DoCzbcZ5nJN+mDaHiVruLB3fk3cqBjAhAbvYFO
X2Fk4yNccvHWjGbBNbgoddTRt/ZHC+PhiIGvE+KzcDLZipjUj4M7WxznLGdILFT/
Vpys3Uewa64bQk/GURuxh7A/IjzqohCKq0pLugU3B1FW6nASCUuySbN8KroIiGo8
Vnesc6G4G+KtxJGq18/umSaDoX9RmNM7iyeGt2G3yyV5MFPz83XZmtCVHizY6ayk
PPDB1lPXks3NpdKBgH/SYDfm7GBI3CwH7ttr3+DSl8nfadfIjQtu5hnhdBLeGWj4
AVhWSTyaLfABkRoU+DEg9YbzvcywjNOp0sblIxhxFiPKECymhNdBmljQmW6EMTRg
j1El5pdYp0D+MNyBTewgD033yMm5pLsHZX+aiyG5ULizevemjWrnprzFYFnSYBZY
ivfRnsK7zzWh+cejJJiZKPPR4RDu+VNneCd2PWjqX6VwPd03QjmOI8zw7WeLSbZl
kzzhOThwvdo=
=idS6
-----END PGP SIGNATURE-----
Merge tag 'x86-build-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
"Misc dependency fixes, plus a documentation update about memory
protection keys support"
* tag 'x86-build-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Update config and kernel doc for MPK feature on AMD
x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinux
x86/boot/build: Add phony targets in arch/x86/boot/Makefile to PHONY
x86/boot/build: Make 'make bzlilo' not depend on vmlinux or $(obj)/bzImage
x86/boot/build: Add cpustr.h to targets and remove clean-files
- Add the initrdmem= boot option to specify an initrd embedded in RAM (flash most likely)
- Sanitize the CS value earlier during boot, which also fixes SEV-ES.
- Various fixes and smaller cleanups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7VKk8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i25w/6A8okusHJMXyXMYddRHNiL57x3DcTRsTO
09Wz7e0YrL53HqQEyaqtSam/0VqgSaHDQb/gRb2Ci0G+XzZ3BFYvICVWTW6NcvnA
VSUoHC8Mr83Aq3UfAEcJZZ0bHNuoKymO256v2tZPGCSGgZoxQdoe4/6W1uMxxjLr
NFpeyAm93zTe1+MmA/ZcFxH+xOZPYVPhl7+KgO3muMH/hGoS3Dt+RCuB9VHTgMvf
4mN6IxN3cVHDogt7usdtWjgrYnhY0SjiWo858+MDWsrW5oXifsXLJ5jJr1Ea1nGx
qqVyaCqAVNobOkpsBLHg1DiD/rr9A4sfS/etmAjWsPO6kAx9Mq9+B2DG5fTU/gB+
zd76M3Jl3wyjdy6hPMyiZGlFFM9l3efyp/iYPhFWgPqVlkkOvbO+9FWVDbFtErQw
WpEG2d8KHN4+ph8D04ExeKJKCKaYnAaHKk13fZnjjeQhatyGGAYn6hx+rT/x+onM
2CeRG/+KcnlzKgXqYX6/YT++XlaCKgMntO/FdLT99/4CD92rqQdhwJ6JNH1U8nXO
LWjrV5ZH6R3n5Hr5+J/Kcd9/kIfAqWG3t/eiTEPEjJIUWXEdhBoQWErSce4on5a7
6eBfkKEQxIYAdC1iO2uoKEtEpMDvFWoIIVjdlVTFiJ8Np9uvv7lPByr/0TJ+N5b7
fgOrzglWuxo=
=U/uh
-----END PGP SIGNATURE-----
Merge tag 'x86-boot-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
"Misc updates:
- Add the initrdmem= boot option to specify an initrd embedded in RAM
(flash most likely)
- Sanitize the CS value earlier during boot, which also fixes SEV-ES
- Various fixes and smaller cleanups"
* tag 'x86-boot-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Correct relocation destination on old linkers
x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded
x86/boot: Fix -Wint-to-pointer-cast build warning
x86/boot: Add kstrtoul() from lib/
x86/tboot: Mark tboot static
x86/setup: Add an initrdmem= option to specify initrd physical address
- Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently
- Simplify and unify initrd loading
- Parse the builtin command line on x86 (if provided)
- Implement printk() support, including support for wide character strings
- Some fixes for issues introduced by the first batch of v5.8 changes
- Fix a missing prototypes warning
- Simplify GDT handling in early mixed mode thunking code
- Some other minor fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl7Lb8UACgkQwjcgfpV0
+n3/aAgAkEqqR/BoyzFiyYHujq6bXjESKYr8LrIjNWfnofB6nZqp1yXwFdL0qbj/
PTZ1qIQAnOMmj11lvy1X894h2ZLqE6XEkqv7Xd2oxkh3fF6amlQUWfMpXUuGLo1k
C4QGSfA0OOiM0OOi0Aqk1fL7sTmH23/j63dTR+fH8JMuYgjdls/yWNs0miqf8W2H
ftj8fAKgHIJzFvdTC0vn1DZ6dEKczGLPEcVZ2ns2IJOJ69DsStKPLcD0mlW+EgV2
EyfRSCQv55RYZRhdUOb+yVLRfU0M0IMDrrCDErHxZHXnQy00tmKXiEL20yuegv3u
MUtRRw8ocn2/RskjgZkxtMjAAlty9A==
=AwCh
-----END PGP SIGNATURE-----
Merge tag 'efi-changes-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
More EFI changes for v5.8:
- Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently
- Simplify and unify initrd loading
- Parse the builtin command line on x86 (if provided)
- Implement printk() support, including support for wide character strings
- Some fixes for issues introduced by the first batch of v5.8 changes
- Fix a missing prototypes warning
- Simplify GDT handling in early mixed mode thunking code
- Some other minor fixes and cleanups
Conflicts:
drivers/firmware/efi/libstub/efistub.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of using efi_gdt64 to switch back to 64-bit mode and then
switching to the real boot-time GDT, just switch to the boot-time GDT
directly. The two GDT's are identical other than efi_gdt64 not including
the 32-bit code segment.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200523221513.1642948-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
With commit
ce5e3f909f ("efi/printf: Add 64-bit and 8-bit integer support")
arch/x86/boot/compressed/vmlinux may have an undesired .discard.unreachable
section coming from drivers/firmware/efi/libstub/vsprintf.stub.o. That section
gets generated from unreachable() annotations when CONFIG_STACK_VALIDATION is
enabled.
.discard.unreachable contains an R_X86_64_PC32 relocation which will be
warned about by LLD: a non-SHF_ALLOC section (.discard.unreachable) is
not part of the memory image, thus conceptually the distance between a
non-SHF_ALLOC and a SHF_ALLOC is not a constant which can be resolved at
link time:
% ld.lld -m elf_x86_64 -T arch/x86/boot/compressed/vmlinux.lds ... -o arch/x86/boot/compressed/vmlinux
ld.lld: warning: vsprintf.c:(.discard.unreachable+0x0): has non-ABS relocation R_X86_64_PC32 against symbol ''
Reuse the DISCARDS macro which includes .discard.* to drop
.discard.unreachable.
[ bp: Massage and complete the commit message. ]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Arvind Sankar <nivedita@alum.mit.edu>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lkml.kernel.org/r/20200520182010.242489-1-maskray@google.com
For the 32-bit kernel, as described in
6d92bc9d48 ("x86/build: Build compressed x86 kernels as PIE"),
pre-2.26 binutils generates R_386_32 relocations in PIE mode. Since the
startup code does not perform relocation, any reloc entry with R_386_32
will remain as 0 in the executing code.
Commit
974f221c84 ("x86/boot: Move compressed kernel to the end of the
decompression buffer")
added a new symbol _end but did not mark it hidden, which doesn't give
the correct offset on older linkers. This causes the compressed kernel
to be copied beyond the end of the decompression buffer, rather than
flush against it. This region of memory may be reserved or already
allocated for other purposes by the bootloader.
Mark _end as hidden to fix. This changes the relocation from R_386_32 to
R_386_RELATIVE even on the pre-2.26 binutils.
For 64-bit, this is not strictly necessary, as the 64-bit kernel is only
built as PIE if the linker supports -z noreloc-overflow, which implies
binutils-2.27+, but for consistency, mark _end as hidden here too.
The below illustrates the before/after impact of the patch using
binutils-2.25 and gcc-4.6.4 (locally compiled from source) and QEMU.
Disassembly before patch:
48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax
4e: 2d 00 00 00 00 sub $0x0,%eax
4f: R_386_32 _end
Disassembly after patch:
48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax
4e: 2d 00 f0 76 00 sub $0x76f000,%eax
4f: R_386_RELATIVE *ABS*
Dump from extract_kernel before patch:
early console in extract_kernel
input_data: 0x0207c098 <--- this is at output + init_size
input_len: 0x0074fef1
output: 0x01000000
output_len: 0x00fa63d0
kernel_total_size: 0x0107c000
needed_size: 0x0107c000
Dump from extract_kernel after patch:
early console in extract_kernel
input_data: 0x0190d098 <--- this is at output + init_size - _end
input_len: 0x0074fef1
output: 0x01000000
output_len: 0x00fa63d0
kernel_total_size: 0x0107c000
needed_size: 0x0107c000
Fixes: 974f221c84 ("x86/boot: Move compressed kernel to the end of the decompression buffer")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200207214926.3564079-1-nivedita@alum.mit.edu
Mike Lothian reports that after commit
964124a97b ("efi/x86: Remove extra headroom for setup block")
gcc 10.1.0 fails with
HOSTCC arch/x86/boot/tools/build
/usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld:
error: linker defined: multiple definition of '_end'
/usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld:
/tmp/ccEkW0jM.o: previous definition here
collect2: error: ld returned 1 exit status
make[1]: *** [scripts/Makefile.host:103: arch/x86/boot/tools/build] Error 1
make: *** [arch/x86/Makefile:303: bzImage] Error 2
The issue is with the _end variable that was added, to hold the end of
the compressed kernel from zoffsets.h (ZO__end). The name clashes with
the linker-defined _end symbol that indicates the end of the build
program itself.
Even when there is no compile-time error, this causes build to use
memory past the end of its .bss section.
To solve this, mark _end as static, and for symmetry, mark the rest of
the variables that keep track of symbols from the compressed kernel as
static as well.
Fixes: 964124a97b ("efi/x86: Remove extra headroom for setup block")
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200511225849.1311869-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When the pre-decompression code loads its first GDT in startup_64(), it
is still running on the CS value of the previous GDT. In the case of
SEV-ES, this is the EFI GDT but it can be anything depending on what has
loaded the kernel (boot loader, container runtime, etc.)
To make exception handling work (especially IRET) the CPU needs to
switch to a CS value in the current GDT, so jump to __KERNEL_CS after
the first GDT is loaded. This is prudent also as a general sanitization
of CS to a known good value.
[ bp: Massage commit message. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200428151725.31091-13-joro@8bytes.org
Fix this warning when building 32-bit with
CONFIG_RANDOMIZE_BASE=y
CONFIG_MEMORY_HOTREMOVE=y
arch/x86/boot/compressed/acpi.c:316:9: warning: \
cast to pointer from integer of different size [-Wint-to-pointer-cast]
Have get_cmdline_acpi_rsdp() return unsigned long which is the proper
type to convert to a pointer of the respective width.
[ bp: Rewrite commit message, touch ups. ]
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1587645588-7130-3-git-send-email-vamshi.k.sthambamkadi@gmail.com
Add kstrtoul() to ../boot/ to be used by facilities there too.
[
bp: Massage, make _kstrtoul() static. Prepend function names with
"boot_". This is a temporary workaround for build errors like:
ld: arch/x86/boot/compressed/acpi.o: in function `count_immovable_mem_regions':
acpi.c:(.text+0x463): undefined reference to `_kstrtoul'
make[2]: *** [arch/x86/boot/compressed/Makefile:117: arch/x86/boot/compressed/vmlinux] Error 1
due to the namespace clash between x86/boot/ and kernel proper.
Future reorg will get rid of the linux/linux/ namespace as much as
possible so that x86/boot/ can be independent from kernel proper. ]
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1587645588-7130-2-git-send-email-vamshi.k.sthambamkadi@gmail.com
Instead of using __efistub_global to force variables into the .data
section, leave them in the .bss but pull the EFI stub's .bss section
into .data in the linker script for the compressed kernel.
Add relocation checking for x86 as well to catch non-PC-relative
relocations that require runtime processing, since the EFI stub does not
do any runtime relocation processing.
This will catch, for example, data relocations created by static
initializers of pointers.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200416151227.3360778-3-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
These targets are correctly added to PHONY in arch/x86/Makefile, but
not in arch/x86/boot/Makefile. Thus, with a file 'install' in the top
directory, 'make install' does nothing:
$ touch install
$ make install
make[1]: 'install' is up to date.
Add them to the PHONY targets in the boot Makefile too.
[ bp: Massage. ]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200215063852.8298-2-masahiroy@kernel.org
bzlilo is an installation target because it copies files to
$(INSTALL_PATH)/, then runs 'lilo'. However, arch/x86/Makefile and
arch/x86/boot/Makefile have it depend on vmlinux and $(obj)/bzImage,
respectively.
'make bzlilo' may update some build artifacts in the source tree.
As commit
19514fc665 ("arm, kbuild: make "make install" not depend on vmlinux")
explained, this should not happen.
Make 'bzlilo' not depend on any build artifact.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200215063852.8298-1-masahiroy@kernel.org
Files in $(targets) are always cleaned up. Move the 'targets' assignment
out of the ifdef and remove 'clean-files'.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200215063241.7437-1-masahiroy@kernel.org
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6TbaUeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgkH/iWpiKvosA20HJjC
rBqYeJPxQsgZTuBieWJ+MeVxbpcF7RlM4c+glyvg3QJhHwIEG58dl6LBrQbAyBAR
aFHNojr1iAYOruVCGnU3pA008YZiwUIDv/ZQ4DF8fmIU2vI2mJ6qHBv3XDl4G2uR
Nwz8Eu9AgIwZM5coomVOSmoWyFy7Vxmb7W+3t5VmKsvOWx4ib9kyQtOIkvQDEl7j
XCbWfI0xDQr6LFOm4jnCi5R/LhJ2LIqqIvHHrunbpszM8IwK797jCXz4im+dmd5Y
+km46N7a8pDqri36xXz1gdBAU3eG7Pt1NyvfjwRVTdX4GquQ2MT0GoojxbLxUP3y
3pEsQuE=
=whbL
-----END PGP SIGNATURE-----
Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh
Resolve these conflicts:
arch/x86/Kconfig
arch/x86/kernel/Makefile
Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines
in the Kconfig to reduce the probability of future conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Here are 3 SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your current
tree, that should be trivial to handle (one file modified by two things,
one file deleted.)
All 3 of these have been in linux-next for a while, with no reported
issues other than the merge conflict.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
m5AuCzO3Azt9KBi7NL+L
=2Lm5
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
Pull trivial tree updates from Jiri Kosina:
"My attempt to revitalize trivial queue I've been neglecting for years
(what a disaster that was for this world, right? :) ) with patches
collected from backlog that were still relevant and not applied
elsewhere in the meantime"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
err.h: remove deprecated PTR_RET for good
blk-mq: Fix typo in comment
x86/boot: Fix comment spelling
sh: mach-highlander: Fix comment spelling
s390/dasd: Fix comment spelling
mfd: wm8994: Fix comment spelling
docs: Add reference in binfmt-misc.rst
genirq: fix kerneldoc comment for irq_desc
drm/amdgpu: fix two documentation mismatch issues
HID: fix Kconfig word ordering
list/hashtable: minor documentation corrections.
Pull x86 boot updates from Ingo Molnar:
"Misc cleanups and small enhancements all around the map"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Fix debug_puthex() parameter type
x86/setup: Fix static memory detection
x86/vmlinux: Drop unneeded linker script discard of .eh_frame
x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
x86/boot/compressed: Remove .eh_frame section from bzImage
x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
x86/boot/compressed/64: Use 32-bit (zero-extended) MOV for z_output_len
x86/boot/compressed/64: Use LEA to initialize boot stack pointer
In the CONFIG_X86_VERBOSE_BOOTUP=Y case, the debug_puthex() macro just
turns into __puthex(), which takes 'unsigned long' as parameter.
But in the CONFIG_X86_VERBOSE_BOOTUP=N case, it is a function which
takes 'unsigned char *', causing compile warnings when the function is
used. Fix the parameter type to get rid of the warnings.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200319091407.1481-11-joro@8bytes.org
Add alignment slack to the PE image size, so that we can realign the
decompression buffer within the space allocated for the image.
Only relocate the kernel if it has been loaded at an unsuitable address:
- Below LOAD_PHYSICAL_ADDR, or
- Above 64T for 64-bit and 512MiB for 32-bit
For 32-bit, the upper limit is conservative, but the exact limit can be
difficult to calculate.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-20-ardb@kernel.org
The following commit:
223e3ee56f ("efi/x86: add headroom to decompressor BSS to account for setup block")
added headroom to the PE image to account for the setup block, which
wasn't used for the decompression buffer.
Now that the decompression buffer is located at the start of the image,
and includes the setup block, this is no longer required.
Add a check to make sure that the head section of the compressed kernel
won't overwrite itself while relocating. This is only for
future-proofing as with current limits on the setup and the actual size
of the head section, this can never happen.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-19-ardb@kernel.org
Store the kernel's link address as ImageBase in the PE header. Note that
the PE specification requires the ImageBase to be 64k aligned. The
preferred address should almost always satisfy that, except for 32-bit
kernel if the configuration has been customized.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-4-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-18-ardb@kernel.org
In preparation for being able to decompress into a buffer starting at a
different address than startup_32, save the calculated output address
instead of recalculating it later.
We now keep track of three addresses:
%edx: startup_32 as we were loaded by bootloader
%ebx: new location of compressed kernel
%ebp: start of decompression buffer
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-2-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-16-ardb@kernel.org
The load address is compared with LOAD_PHYSICAL_ADDR using a signed
comparison currently (using jge instruction).
When loading a 64-bit kernel using the new efi32_pe_entry() point added by:
97aa276579 ("efi/x86: Add true mixed mode entry point into .compat section")
using Qemu with -m 3072, the firmware actually loads us above 2Gb,
resulting in a very early crash.
Use the JAE instruction to perform a unsigned comparison instead, as physical
addresses should be considered unsigned.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org
code32_start is meant for 16-bit real-mode bootloaders to inform the
kernel where the 32-bit protected mode code starts. Nothing in the
protected mode kernel except the EFI stub uses it.
efi_main() currently returns boot_params, with code32_start set inside it
to tell efi_stub_entry() where startup_32 is located. Since it was invoked
by efi_stub_entry() in the first place, boot_params is already known.
Return the address of startup_32 instead.
This will allow a 64-bit kernel to live above 4Gb, for example, and it's
cleaner as well.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-13-ardb@kernel.org
The following commit:
ef5a7b5eb1 ("efi/x86: Remove GDT setup from efi_main")
introduced GDT setup into the 32-bit kernel's startup_32, and reloads
the GDTR after relocating the kernel for paranoia's sake.
A followup commit:
32d009137a ("x86/boot: Reload GDTR after copying to the end of the buffer")
introduced a similar GDTR reload in the 64-bit kernel as well.
The GDTR is adjusted by (init_size-_end), however this may not be the
correct offset to apply if the kernel was loaded at a misaligned address
or below LOAD_PHYSICAL_ADDR, as in that case the decompression buffer
has an additional offset from the original load address.
This should never happen for a conformant bootloader, but we're being
paranoid anyway, so just store the new GDT address in there instead of
adding any offsets, which is simpler as well.
Fixes: ef5a7b5eb1 ("efi/x86: Remove GDT setup from efi_main")
Fixes: 32d009137a ("x86/boot: Reload GDTR after copying to the end of the buffer")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20200226230031.3011645-2-nivedita@alum.mit.edu
This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:
- Get things cleaned up before EFI support for RISC-V arrives, which will
increase the size of the validation matrix, and therefore the threshold to
making drastic changes,
- After years of defunct maintainership, the GRUB project has finally started
to consider changes from the distros regarding UEFI boot, some of which are
highly specific to the way x86 does UEFI secure boot and measured boot,
based on knowledge of both shim internals and the layout of bootparams and
the x86 setup header. Having this maintenance burden on other architectures
(which don't need shim in the first place) is hard to justify, so instead,
we are introducing a generic Linux/UEFI boot protocol.
Summary of changes:
- Boot time GDT handling changes (Arvind)
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file I/O,
memory allocation, etc.
- Introduce a generic initrd loading method based on calling back into
the firmware, instead of relying on the x86 EFI handover protocol or
device tree.
- Introduce a mixed mode boot method that does not rely on the x86 EFI
handover protocol either, and could potentially be adopted by other
architectures (if another one ever surfaces where one execution mode
is a superset of another)
- Clean up the contents of struct efi, and move out everything that
doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit firmware
implementations to return EFI_UNSUPPORTED from UEFI runtime services at
OS runtime, and expose a mask of which ones are supported or unsupported
via a configuration table.
- Various documentation updates and minor code cleanups (Heinrich)
- Partial fix for the lack of by-VA cache maintenance in the decompressor
on 32-bit ARM. Note that these patches were deliberately put at the
beginning so they can be used as a stable branch that will be shared with
a PR containing the complete fix, which I will send to the ARM tree.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl5S7WYACgkQwjcgfpV0
+n1jmQgAmwV3V8pbgB4mi4P2Mv8w5Zj5feUe6uXnTR2AFv5nygLcTzdxN+TU/6lc
OmZv2zzdsAscYlhuUdI/4t4cXIjHAZI39+UBoNRuMqKbkbvXCFscZANLxvJjHjZv
FFbgUk0DXkF0BCEDuSLNavidAv4b3gZsOmHbPfwuB8xdP05LbvbS2mf+2tWVAi2z
ULfua/0o9yiwl19jSS6iQEPCvvLBeBzTLW7x5Rcm/S0JnotzB59yMaeqD7jO8JYP
5PvI4WM/l5UfVHnZp2k1R76AOjReALw8dQgqAsT79Q7+EH3sNNuIjU6omdy+DFf4
tnpwYfeLOaZ1SztNNfU87Hsgnn2CGw==
=pDE3
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
Pull EFI updates for v5.7 from Ard Biesheuvel:
This time, the set of changes for the EFI subsystem is much larger than
usual. The main reasons are:
- Get things cleaned up before EFI support for RISC-V arrives, which will
increase the size of the validation matrix, and therefore the threshold to
making drastic changes,
- After years of defunct maintainership, the GRUB project has finally started
to consider changes from the distros regarding UEFI boot, some of which are
highly specific to the way x86 does UEFI secure boot and measured boot,
based on knowledge of both shim internals and the layout of bootparams and
the x86 setup header. Having this maintenance burden on other architectures
(which don't need shim in the first place) is hard to justify, so instead,
we are introducing a generic Linux/UEFI boot protocol.
Summary of changes:
- Boot time GDT handling changes (Arvind)
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file I/O,
memory allocation, etc.
- Introduce a generic initrd loading method based on calling back into
the firmware, instead of relying on the x86 EFI handover protocol or
device tree.
- Introduce a mixed mode boot method that does not rely on the x86 EFI
handover protocol either, and could potentially be adopted by other
architectures (if another one ever surfaces where one execution mode
is a superset of another)
- Clean up the contents of struct efi, and move out everything that
doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit firmware
implementations to return EFI_UNSUPPORTED from UEFI runtime services at
OS runtime, and expose a mask of which ones are supported or unsupported
via a configuration table.
- Various documentation updates and minor code cleanups (Heinrich)
- Partial fix for the lack of by-VA cache maintenance in the decompressor
on 32-bit ARM. Note that these patches were deliberately put at the
beginning so they can be used as a stable branch that will be shared with
a PR containing the complete fix, which I will send to the ARM tree.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that .eh_frame sections for the files in setup.elf and realmode.elf
are not generated anymore, the linker scripts don't need the special
output section name /DISCARD/ any more.
Remove the one in the main kernel linker script as well, since there are
no .eh_frame sections already, and fix up a comment referencing .eh_frame.
Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
to make sense, as well as being more specific.
[ bp: Touch up commit message. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200224232129.597160-3-nivedita@alum.mit.edu
While discussing a patch to discard .eh_frame from the compressed
vmlinux using the linker script, Fangrui Song pointed out [1] that these
sections shouldn't exist in the first place because arch/x86/Makefile
uses -fno-asynchronous-unwind-tables.
It turns out this is because the Makefiles used to build the compressed
kernel redefine KBUILD_CFLAGS, dropping this flag.
Add the flag to the Makefile for the compressed kernel, as well as the
EFI stub Makefile to fix this.
Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
kernel's boot code (boot/setup.elf) and realmode trampoline
(realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
since their linker scripts also just discard them.
[1] https://lore.kernel.org/lkml/20200222185806.ywnqhfqmy67akfsa@google.com/
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200224232129.597160-2-nivedita@alum.mit.edu
Discarding unnecessary sections with "*(*)" (see thread at Link: below)
works fine with the bfd linker but fails with lld:
$ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
ld.lld: error: discarding .shstrtab section is not allowed
lld tries to also discard essential sections like .shstrtab, .symtab and
.strtab, which results in the link failing since .shstrtab is required
by the ELF specification: the e_shstrndx field in the ELF header is the
index of .shstrtab, and each section in the section table is required to
have an sh_name that points into the .shstrtab.
.symtab and .strtab are also necessary to generate the zoffset.h file
for the bzImage header.
Since the only sizeable section that can be discarded is .eh_frame,
restrict the discard to only .eh_frame to be safe.
[ bp: Flesh out commit message and replace offending commit with this one. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200109150218.16544-2-nivedita@alum.mit.edu
Now that we have added new ways to load the initrd or the mixed mode
kernel, we will also need a way to tell the loader about this. Add
symbolic constants for the PE/COFF major/minor version numbers (which
fortunately have always been 0x0 for all architectures), so that we
can bump them later to document the capabilities of the stub.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Replace bare numbers in the PE/COFF header structure with symbolic
constants so they become self documenting.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Currently, mixed mode is closely tied to the EFI handover protocol
and relies on intimate knowledge of the bootparams structure, setup
header etc, all of which are rather byzantine and entirely specific
to x86.
Even though no other EFI supported architectures are currently known
that could support something like mixed mode, it still makes sense to
abstract a bit from this, and make it part of a generic Linux on EFI
boot protocol.
To that end, add a .compat section to the mixed mode binary, and populate
it with the PE machine type and entry point address, allowing firmware
implementations to match it to their native machine type, and invoke
non-native binaries using a secondary entry point.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Add support for booting 64-bit x86 kernels from 32-bit firmware running
on 64-bit capable CPUs without requiring the bootloader to implement
the EFI handover protocol or allocate the setup block, etc etc, all of
which can be done by the stub itself, using code that already exists.
Instead, create an ordinary EFI application entrypoint but implemented
in 32-bit code [so that it can be invoked by 32-bit firmware], and stash
the address of this 32-bit entrypoint in the .compat section where the
bootloader can find it.
Note that we use the setup block embedded in the binary to go through
startup_32(), but it gets reallocated and copied in efi_pe_entry(),
using the same code that runs when the x86 kernel is booted in EFI
mode from native firmware. This requires the loaded image protocol to
be installed on the kernel image's EFI handle, and point to the kernel
image itself and not to its loader. This, in turn, requires the
bootloader to use the LoadImage() boot service to load the 64-bit
image from 32-bit firmware, which is in fact supported by firmware
based on EDK2. (Only StartImage() will fail, and instead, the newly
added entrypoint needs to be invoked)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
In commit
c7fb93ec51 ("x86/efi: Include a .bss section within the PE/COFF headers")
we added a separate .bss section to the PE/COFF header of the compressed
kernel describing the static memory footprint of the decompressor, to
ensure that it has enough headroom to decompress itself.
We can achieve the exact same result by increasing the virtual size of
the .text section, without changing the raw size, which, as per the
PE/COFF specification, requires the loader to zero initialize the delta.
Doing so frees up a slot in the section table, which we will use later
to describe the mixed mode entrypoint.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
In the bootparams struct, init_size defines the static footprint of the
bzImage, counted from the start of the kernel image, i.e., startup_32().
The PE/COFF metadata declares the same size for the entire image, but this
time, the image includes the setup block as well, and so the space reserved
by UEFI is a bit too small. This usually doesn't matter, since we normally
relocate the kernel into a memory allocation of the correct size.
But in the unlikely case that the image happens to be loaded at exactly
the preferred offset, we skip this relocation, and execute the image in
place, stepping on memory beyond the provided allocation, which may be
in use for other purposes.
Let's fix this by adding the size of the setup block to the image size as
declared in the PE/COFF header.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Most of the EFI stub source files of all architectures reside under
drivers/firmware/efi/libstub, where they share a Makefile with special
CFLAGS and an include file with declarations that are only relevant
for stub code.
Currently, we carry a lot of stub specific stuff in linux/efi.h only
because eboot.c in arch/x86 needs them as well. So let's move eboot.c
into libstub/, and move the contents of eboot.h that we still care
about into efistub.h
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When using the native PE entry point (as opposed to the EFI handover
protocol entry point that is used more widely), we set code32_start,
which is a 32-bit wide field, to the effective symbol address of
startup_32, which could overflow given that the EFI loader may have
located the running image anywhere in memory, and we haven't reached
the point yet where we relocate ourselves.
Since we relocate ourselves if code32_start != pref_address, this
isn't likely to lead to problems in practice, given how unlikely
it is that the truncated effective address of startup_32 happens
to equal pref_address. But it is better to defer the assignment
of code32_start to after the relocation, when it is guaranteed to
fit.
While at it, move the call to efi_relocate_kernel() to an earlier
stage so it is more likely that our preferred offset in memory has
not been occupied by other memory allocations done in the mean time.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
We have some code in the EFI stub entry point that takes the address
of the apm_bios_info struct in the newly allocated and zeroed out
boot_params structure, only to zero it out again. This is pointless
so remove it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Rearrange the instructions a bit to use a 32-bit displacement once
instead of 2/3 times. This saves 8 bytes of machine code.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-8-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The limit value for the GDTR should be such that adding it to the base
address gives the address of the last byte of the GDT, i.e. it should be
one less than the size, not the size.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-7-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The 64-bit kernel will already load a GDT in startup_64, which is the
next function to execute after return from efi_main.
Add GDT setup code to the 32-bit kernel's startup_32 as well. Doing it
in the head code has the advantage that we can avoid potentially
corrupting the GDT during copy/decompression. This also removes
dependence on having a specific GDT layout setup by the bootloader.
Both startup_32 and startup_64 now clear interrupts on entry, so we can
remove that from efi_main as well.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-6-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
startup_32 already clears these flags on entry, do it in startup_64 as
well for consistency.
The direction flag in particular is not specified to be cleared in the
boot protocol documentation, and we currently call into C code
(paging_prepare) without explicitly clearing it.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-5-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The GDT may get overwritten during the copy or during extract_kernel,
which will cause problems if any segment register is touched before the
GDTR is reloaded by the decompressed kernel. For safety update the GDTR
to point to the GDT within the copied kernel.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-4-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When booting in mixed mode, the firmware's GDT is still installed at
handover entry in efi32_stub_entry. We save the GDTR for later use in
__efi64_thunk but we are assuming that descriptor 2 (__KERNEL_CS) is a
valid 32-bit code segment descriptor and that descriptor 3
(__KERNEL_DS/__BOOT_DS) is a valid data segment descriptor.
This happens to be true for OVMF (it actually uses descriptor 1 for data
segments, but descriptor 3 is also setup as data), but we shouldn't
depend on this being the case.
Fix this by saving the code and data selectors in addition to the GDTR
in efi32_stub_entry, and restoring them in __efi64_thunk before calling
the firmware. The UEFI specification guarantees that selectors will be
flat, so using the DS selector for all the segment registers should be
enough.
We also need to install our own GDT before initializing segment
registers in startup_32, so move the GDT load up to the beginning of the
function.
[ardb: mention mixed mode in the commit log]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-3-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Commit
5b11f1cee5 ("x86, boot: straighten out ranges to copy/zero in
compressed/head*.S")
introduced a separate .pgtable section, splitting it out from the rest
of .bss. This section was added without the writeable flag, marking it
as read-only. This results in the linker putting the .rela.dyn section
(containing bogus dynamic relocations from head_64.o) after the .bss and
.pgtable sections.
When objcopy is used to convert compressed/vmlinux into a binary for
the bzImage:
$ objcopy -O binary -R .note -R .comment -S arch/x86/boot/compressed/vmlinux \
arch/x86/boot/vmlinux.bin
the .bss and .pgtable sections get materialized as ~176KiB of zero
bytes in the binary in order to place .rela.dyn at the correct location.
Fix this by marking .pgtable as writeable. This moves the .rela.dyn
section up in the ELF image layout so that .bss and .pgtable are the
last allocated sections and so don't appear in bzImage.
[ bp: Massage commit message. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200109150218.16544-1-nivedita@alum.mit.edu
GCC 10 changed the default to -fno-common, which leads to
LD arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order'; \
arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here
make[2]: *** [arch/x86/boot/compressed/Makefile:119: arch/x86/boot/compressed/vmlinux] Error 1
Since __force_order is already provided in pgtable_64.c, there is no
need to declare __force_order in kaslr_64.c.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200124181811.4780-1-hjl.tools@gmail.com
z_output_len is the size of the decompressed payload (i.e. vmlinux +
vmlinux.relocs) and is generated as an unsigned 32-bit quantity by
mkpiggy.c.
The current
movq $z_output_len, %r9
instruction generates a sign-extended move to %r9. Using
movl $z_output_len, %r9d
will instead zero-extend into %r9, which is appropriate for an unsigned
32-bit quantity. This is also what is already done for z_input_len, the
size of the compressed payload.
[ bp:
Also, z_output_len cannot be a 64-bit quantity because it participates
in:
init_size: .long INIT_SIZE # kernel initialization size
through INIT_SIZE which is a 32-bit quantity determined by the .long
directive (vs .quad for 64-bit). Furthermore, if it really must be a
64-bit quantity, then the insn must be MOVABS which can accommodate a
64-bit immediate and which the toolchain does not generate automatically.
]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200211173333.1722739-1-nivedita@alum.mit.edu
- fix randconfig to generate a sane .config
- rename hostprogs-y / always to hostprogs / always-y, which are
more natual syntax.
- optimize scripts/kallsyms
- fix yes2modconfig and mod2yesconfig
- make multiple directory targets ('make foo/ bar/') work
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl47NfMVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGRGwP/3AHO8P0wGEeFKs3ziSMjs2W7/Pj
lN08Kuxm0u3LnyEEcHVUveoi+xBYqvrw0RsGgYf5S8q0Mpep7MPqbfkDUxV/0Zkj
QP2CsvOTbjdBjH7q3ojkwLcDl0Pxu9mg3eZMRXZ2WQeNXuMRw6Bicoh7ElvB1Bv/
HC+j30i2Me3cf/riQGSAsstvlXyIR8RaerR8PfRGESTysiiN76+JcHTatJHhOJL9
O6XKkzo8/CXMYKKVF4Ae4NP+WFg6E96/pAPx0Rf47RbPX9UG35L9rkzTDnk70Ms6
OhKiu3hXsRX7mkqApuoTqjge4+iiQcKZxYmMXU1vGlIRzjwg19/4YFP6pDSCcnIu
kKb8KN4o4N41N7MFS3OLZWwISA8Vw6RbtwDZ3AghDWb7EHb9oNW42mGfcAPr1+wZ
/KH6RHTzaz+5q2MgyMY1NhADFrhIT9CvDM+UJECgbokblnw7PHAnPmbsuVak9ZOH
u9ojO1HpTTuIYO6N6v4K5zQBZF1N+RvkmBnhHd8j6SksppsCoC/G62QxgXhF2YK3
FQMpATCpuyengLxWAmPEjsyyPOlrrdu9UxqNsXVy5ol40+7zpxuHwKcQKCa9urJR
rcpbIwLaBcLhHU4BmvBxUk5aZxxGV2F0O0gXTOAbT2xhd6BipZSMhUmN49SErhQm
NC/coUmQX7McxMXh
=sv4U
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- fix randconfig to generate a sane .config
- rename hostprogs-y / always to hostprogs / always-y, which are more
natual syntax.
- optimize scripts/kallsyms
- fix yes2modconfig and mod2yesconfig
- make multiple directory targets ('make foo/ bar/') work
* tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: make multiple directory targets work
kconfig: Invalidate all symbols after changing to y or m.
kallsyms: fix type of kallsyms_token_table[]
scripts/kallsyms: change table to store (strcut sym_entry *)
scripts/kallsyms: rename local variables in read_symbol()
kbuild: rename hostprogs-y/always to hostprogs/always-y
kbuild: fix the document to use extra-y for vmlinux.lds
kconfig: fix broken dependency in randconfig-generated .config
- Ensure that the PIT is set up when the local APIC is disable or
configured in legacy mode. This is caused by an ordering issue
introduced in the recent changes which skip PIT initialization when the
TSC and APIC frequencies are already known.
- Handle malformed SRAT tables during early ACPI parsing which caused an
infinite loop anda boot hang.
- Fix a long standing race in the affinity setting code which affects PCI
devices with non-maskable MSI interrupts. The problem is caused by the
non-atomic writes of the MSI address (destination APIC id) and data
(vector) fields which the device uses to construct the MSI message. The
non-atomic writes are mandated by PCI.
If both fields change and the device raises an interrupt after writing
address and before writing data, then the MSI block constructs a
inconsistent message which causes interrupts to be lost and subsequent
malfunction of the device.
The fix is to redirect the interrupt to the new vector on the current
CPU first and then switch it over to the new target CPU. This allows to
observe an eventually raised interrupt in the transitional stage (old
CPU, new vector) to be observed in the APIC IRR and retriggered on the
new target CPU and the new vector. The potential spurious interrupts
caused by this are harmless and can in the worst case expose a buggy
driver (all handlers have to be able to deal with spurious interrupts as
they can and do happen for various reasons).
- Add the missing suspend/resume mechanism for the HYPERV hypercall page
which prevents resume hibernation on HYPERV guests. This change got
lost before the merge window.
- Mask the IOAPIC before disabling the local APIC to prevent potentially
stale IOAPIC remote IRR bits which cause stale interrupt lines after
resume.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5AEJwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWY2D/47ur9gsVQGryKzneVAr0SCsq4Un11e
uifX4ldu4gCEBRTYhpgcpiFKeLvY/QJ6uOD+gQUHyy/s+lCf6yzE6UhXEqSCtcT7
LkSxD8jAFf6KhMA6iqYBfyxUsPMXBetLjjHWsyc/kf15O/vbYm7qf05timmNZkDS
S7C+yr3KRqRjLR7G7t4twlgC9aLcNUQihUdsH2qyTvjnlkYHJLDa0/Js7bFYYKVx
9GdUDLvPFB1mZ76g012De4R3kJsWitiyLlQ38DP5VysKulnszUCdiXlgCEFrgxvQ
OQhLafQzOAzvxQmP+1alODR0dmJZA8k0zsDeeTB/vTpRvv6+Pe2qUswLSpauBzuq
TpDsrv8/5pwZh28+91f/Unk+tH8NaVNtGe/Uf+ePxIkn1nbqL84o4NHGplM6R97d
HAWdZQZ1cGRLf6YRRJ+57oM/5xE3vBbF1Wn0+QDTFwdsk2vcxuQ4eB3M/8E1V7Zk
upp8ty50bZ5+rxQ8XTq/eb8epSRnfLoBYpi4ux6MIOWRdmKDl40cDeZCzA2kNP7m
qY1haaRN3ksqvhzc0Yf6cL+CgvC4ur8gRHezfOqmBzVoaLyVEFIVjgjR/ojf0bq8
/v+L9D5+IdIv4jEZruRRs0gOXNDzoBbvf0qKGaO0tUTWiDsv7c5AGixp8aozniHS
HXsv1lIpRuC7WQ==
=WxKD
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for X86:
- Ensure that the PIT is set up when the local APIC is disable or
configured in legacy mode. This is caused by an ordering issue
introduced in the recent changes which skip PIT initialization when
the TSC and APIC frequencies are already known.
- Handle malformed SRAT tables during early ACPI parsing which caused
an infinite loop anda boot hang.
- Fix a long standing race in the affinity setting code which affects
PCI devices with non-maskable MSI interrupts. The problem is caused
by the non-atomic writes of the MSI address (destination APIC id)
and data (vector) fields which the device uses to construct the MSI
message. The non-atomic writes are mandated by PCI.
If both fields change and the device raises an interrupt after
writing address and before writing data, then the MSI block
constructs a inconsistent message which causes interrupts to be
lost and subsequent malfunction of the device.
The fix is to redirect the interrupt to the new vector on the
current CPU first and then switch it over to the new target CPU.
This allows to observe an eventually raised interrupt in the
transitional stage (old CPU, new vector) to be observed in the APIC
IRR and retriggered on the new target CPU and the new vector.
The potential spurious interrupts caused by this are harmless and
can in the worst case expose a buggy driver (all handlers have to
be able to deal with spurious interrupts as they can and do happen
for various reasons).
- Add the missing suspend/resume mechanism for the HYPERV hypercall
page which prevents resume hibernation on HYPERV guests. This
change got lost before the merge window.
- Mask the IOAPIC before disabling the local APIC to prevent
potentially stale IOAPIC remote IRR bits which cause stale
interrupt lines after resume"
* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Mask IOAPIC entries when disabling the local APIC
x86/hyperv: Suspend/resume the hypercall page for hibernation
x86/apic/msi: Plug non-maskable MSI affinity race
x86/boot: Handle malformed SRAT tables during early ACPI parsing
x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
In old days, the "host-progs" syntax was used for specifying host
programs. It was renamed to the current "hostprogs-y" in 2004.
It is typically useful in scripts/Makefile because it allows Kbuild to
selectively compile host programs based on the kernel configuration.
This commit renames like follows:
always -> always-y
hostprogs-y -> hostprogs
So, scripts/Makefile will look like this:
always-$(CONFIG_BUILD_BIN2C) += ...
always-$(CONFIG_KALLSYMS) += ...
...
hostprogs := $(always-y) $(always-m)
I think this makes more sense because a host program is always a host
program, irrespective of the kernel configuration. We want to specify
which ones to compile by CONFIG options, so always-y will be handier.
The "always", "hostprogs-y", "hostprogs-m" will be kept for backward
compatibility for a while.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Break an infinite loop when early parsing of the SRAT table is caused
by a subtable with zero length. Known to affect the ASUS WS X299 SAGE
motherboard with firmware version 1201 which has a large block of
zeros in its SRAT table. The kernel could boot successfully on this
board/firmware prior to the introduction of early parsing this table or
after a BIOS update.
[ bp: Fixup whitespace damage and commit message. Make it return 0 to
denote that there are no immovable regions because who knows what
else is broken in this BIOS. ]
Fixes: 02a3e3cdb7 ("x86/boot: Parse SRAT table and count immovable memory regions")
Signed-off-by: Steven Clarkson <sc@lambdal.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-acpi@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206343
Link: https://lkml.kernel.org/r/CAHKq8taGzj0u1E_i=poHUam60Bko5BpiJ9jn0fAupFUYexvdUQ@mail.gmail.com
Pull x86 cpu-features updates from Ingo Molnar:
"The biggest change in this cycle was a large series from Sean
Christopherson to clean up the handling of VMX features. This both
fixes bugs/inconsistencies and makes the code more coherent and
future-proof.
There are also two cleanups and a minor TSX syslog messages
enhancement"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/cpu: Remove redundant cpu_detect_cache_sizes() call
x86/cpu: Print "VMX disabled" error message iff KVM is enabled
KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs
perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
KVM: VMX: Check for full VMX support when verifying CPU compatibility
KVM: VMX: Use VMX feature flag to query BIOS enabling
KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
x86/vmx: Introduce VMX_FEATURES_*
x86/cpu: Clear VMX feature flag if VMX is not fully enabled
x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
x86/centaur: Use common IA32_FEAT_CTL MSR initialization
x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
x86/intel: Initialize IA32_FEAT_CTL MSR at boot
tools/x86: Sync msr-index.h from kernel sources
selftests, kvm: Replace manual MSR defs with common msr-index.h
...
Pull x86 boot update from Ingo Molnar:
"Two minor changes: fix an atypical binutils combination build bug, and
also fix a VRAM size check for simplefb"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sysfb: Fix check for bad VRAM size
x86/boot: Discard .eh_frame sections
Pull x86 asm updates from Ingo Molnar:
"Misc updates:
- Remove last remaining calls to exception_enter/exception_exit() and
simplify the entry code some more.
- Remove force_iret()
- Add support for "Fast Short Rep Mov", which is available starting
with Ice Lake Intel CPUs - and make the x86 assembly version of
memmove() use REP MOV for all sizes when FSRM is available.
- Micro-optimize/simplify the 32-bit boot code a bit.
- Use a more future-proof SYSRET instruction mnemonic"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Simplify calculation of output address
x86/entry/64: Add instruction suffix to SYSRET
x86: Remove force_iret()
x86/cpufeatures: Add support for fast short REP; MOVSB
x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault
x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
x86_64 EFI systems are unable to boot due to a typo in a recent commit:
EFI config tables not found.
-- System halted
This was probably due to the absense of CONFIG_EFI_MIXED=y in testing.
Fixes: 796eb8d26a ("efi/libstub/x86: Use const attribute for efi_is_64bit()")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20200122191430.4888-1-cai@lca.pw
Condense the calculation of decompressed kernel start a little.
Committer notes:
before:
ebp = ebx - (init_size - _end)
after:
eax = (ebx + _end) - init_size
where in both ebx contains the temporary address the kernel is moved to
for in-place decompression.
The before and after difference in register state is %eax and %ebp
but that is immaterial because the compressed image is not built with
-mregparm, i.e., all arguments of the following extract_kernel() call
are passed on the stack.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200107194436.2166846-1-nivedita@alum.mit.edu
The final build stage of the x86 kernel captures some symbol
addresses from the decompressor binary and copies them into zoffset.h.
It uses sed with a regular expression that matches the address, symbol
type and symbol name, and mangles the captured addresses and the names
of symbols of interest into #define directives that are added to
zoffset.h
The symbol type is indicated by a single letter, which we match
strictly: only letters in the set 'ABCDGRSTVW' are matched, even
though the actual symbol type is relevant and therefore ignored.
Commit bc7c9d620 ("efi/libstub/x86: Force 'hidden' visibility for
extern declarations") made a change to the way external symbol
references are classified, resulting in 'startup_32' now being
emitted as a hidden symbol. This prevents the use of GOT entries to
refer to this symbol via its absolute address, which recent toolchains
(including Clang based ones) already avoid by default, making this
change a no-op in the majority of cases.
However, as it turns out, the LLVM linker classifies such hidden
symbols as symbols with static linkage in fully linked ELF binaries,
causing tools such as NM to output a lowercase 't' rather than an upper
case 'T' for the type of such symbols. Since our sed expression only
matches upper case letters for the symbol type, the line describing
startup_32 is disregarded, resulting in a build error like the following
arch/x86/boot/header.S:568:18: error: symbol 'ZO_startup_32' can not be
undefined in a subtraction expression
init_size: .long (0x00000000008fd000 - ZO_startup_32 +
(((0x0000000001f6361c + ((0x0000000001f6361c >> 8) + 65536)
- 0x00000000008c32e5) + 4095) & ~4095)) # kernel initialization size
Given that we are only interested in the value of the symbol, let's match
any character in the set 'a-zA-Z' instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
The only users of these got removed, so they also need to be
removed to avoid warnings:
arch/x86/boot/compressed/eboot.c: In function 'setup_efi_pci':
arch/x86/boot/compressed/eboot.c:117:16: error: unused variable 'nr_pci' [-Werror=unused-variable]
unsigned long nr_pci;
^~~~~~
arch/x86/boot/compressed/eboot.c: In function 'setup_uga':
arch/x86/boot/compressed/eboot.c:244:16: error: unused variable 'nr_ugas' [-Werror=unused-variable]
unsigned long nr_ugas;
^~~~~~~
Fixes: 2732ea0d5c ("efi/libstub: Use a helper to iterate over a EFI handle array")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-4-ardb@kernel.org
Reduce the stack frame of the EFI stub's mixed mode thunk routine by
8 bytes, by moving the GDT and return addresses to EBP and EBX, which
we need to preserve anyway, since their top halves will be cleared by
the call into 32-bit firmware code. Doing so results in the UEFI code
being entered with a 16 byte aligned stack, as mandated by the UEFI
spec, fixing the last occurrence in the 64-bit kernel where we violate
this requirement.
Also, move the saved GDT from a global variable to an unused part of the
stack frame, and touch up some other parts of the code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-3-ardb@kernel.org
Reshuffle the x86 stub code a bit so that we can tag the efi_is_64bit()
function with the 'const' attribute, which permits the compiler to
optimize away any redundant calls. Since we have two different entry
points for 32 and 64 bit firmware in the startup code, this also
simplifies the C code since we'll enter it with the efi_is64 variable
already set.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-2-ardb@kernel.org
Add support for generating VMX feature names in capflags.c and use the
resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't
print VMX flags if no bits are set in word 0, which holds Pin Controls.
Pin Control's INTR and NMI exiting are fundamental pillars of VMX, if
they are not supported then the CPU is broken, it does not actually
support VMX, or the kernel wasn't built with support for the target CPU.
Print the features in a dedicated "vmx flags" line to avoid polluting
the common "flags" and to avoid having to prefix all flags with "vmx_",
which results in horrendously long names.
Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI
for those flags. This means that "flags" and "vmx flags" will have
duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority,
vpid and ept_ad, but caps the pollution of "flags" at those six VMX
features. The vendor-specific code that populates the synthetic flags
will be consolidated in a future patch to further minimize the lasting
damage.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-12-sean.j.christopherson@intel.com
Introduce the ability to define macros to perform argument translation
for the calls that need it, and define them for the boot services that
we currently use.
When calling 32-bit firmware methods in mixed mode, all output
parameters that are 32-bit according to the firmware, but 64-bit in the
kernel (ie OUT UINTN * or OUT VOID **) must be initialized in the
kernel, or the upper 32 bits may contain garbage. Define macros that
zero out the upper 32 bits of the output before invoking the firmware
method.
When a 32-bit EFI call takes 64-bit arguments, the mixed-mode call must
push the two 32-bit halves as separate arguments onto the stack. This
can be achieved by splitting the argument into its two halves when
calling the assembler thunk. Define a macro to do this for the
free_pages boot service.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-17-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On x86 we need to thunk through assembler stubs to call the EFI services
for mixed mode, and for runtime services in 64-bit mode. The assembler
stubs have limits on how many arguments it handles. Introduce a few
macros to check that we do not try to pass too many arguments to the
stubs.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-16-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit c3710de506 ("efi/libstub/x86: Drop __efi_early() export and
efi_config struct") introduced a reference from C code in eboot.c to
the startup_32 symbol defined in the .S startup code. This results in
a GOT based reference to startup_32, and since GOT entries carry
absolute addresses, they need to be fixed up before they can be used.
On modern toolchains (binutils 2.26 or later), this reference is
relaxed into a R_386_GOTOFF relocation (or the analogous X86_64 one)
which never uses the absolute address in the entry, and so we get
away with not fixing up the GOT table before calling the EFI entry
point. However, GCC 4.6 combined with a binutils of the era (2.24)
will produce a true GOT indirected reference, resulting in a wrong
value to be returned for the address of startup_32() if the boot
code is not running at the address it was linked at.
Fortunately, we can easily override this behavior, and force GCC to
emit the GOTOFF relocations explicitly, by setting the visibility
pragma 'hidden'.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-3-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The mixed mode refactor actually broke mixed mode by failing to
pass the bootparam structure to startup_32(). This went unnoticed
because it apparently has a high tolerance for being passed random
junk, and still boots fine in some cases. So let's fix this by
populating %esi as required when entering via efi32_stub_entry,
and while at it, preserve the arguments themselves instead of their
address in memory (via the stack pointer) since that memory could
be clobbered before we get to it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20200103113953.9571-2-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of storing the return address in a global variable when calling
a 32-bit EFI service from the 64-bit stub, avoid the indirection via
efi_exit32, and take the return address from the stack.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-26-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The macros efi_call_early and efi_call_runtime are used to call EFI
boot services and runtime services, respectively. However, the naming
is confusing, given that the early vs runtime distinction may suggest
that these are used for calling the same set of services either early
or late (== at runtime), while in reality, the sets of services they
can be used with are completely disjoint, and efi_call_runtime is also
only usable in 'early' code.
So do a global sweep to replace all occurrences with efi_bs_call or
efi_rt_call, respectively, where BS and RT match the idiom used by
the UEFI spec to refer to boot time or runtime services.
While at it, use 'func' as the macro parameter name for the function
pointers, which is less likely to collide and cause weird build errors.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-24-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
None of the definitions of the efi_table_attr() still refer to
their 'table' argument so let's get rid of it entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-23-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After refactoring the mixed mode support code, efi_call_proto()
no longer uses its protocol argument in any of its implementation,
so let's remove it altogether.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-22-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mixed mode translates calls from the 64-bit kernel into the 32-bit
firmware by wrapping them in a call to a thunking routine that
pushes a 32-bit word onto the stack for each argument passed to the
function, regardless of the argument type. This works surprisingly
well for most services and protocols, with the exception of ones that
take explicit 64-bit arguments.
efi_free() invokes the FreePages() EFI boot service, which takes
a efi_physical_addr_t as its address argument, and this is one of
those 64-bit types. This means that the 32-bit firmware will
interpret the (addr, size) pair as a single 64-bit quantity, and
since it is guaranteed to have the high word set (as size > 0),
it will always fail due to the fact that EFI memory allocations are
always < 4 GB on 32-bit firmware.
So let's fix this by giving the thunking code a little hand, and
pass two values for the address, and a third one for the size.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-21-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have a helper efi_system_table() that gives us the address of the
EFI system table in memory, so there is no longer point in passing
it around from each function to the next.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-20-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As a first step towards getting rid of the need to pass around a function
parameter 'sys_table_arg' pointing to the EFI system table, remove the
references to it in the printing code, which is represents the majority
of the use cases.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-19-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The various pointers we stash in the efi_config struct which we
retrieve using __efi_early() are simply copies of the ones in
the EFI system table, which we have started accessing directly
in the previous patch. So drop all the __efi_early() related
plumbing, as well as all the assembly code dealing with efi_config,
which allows us to move the PE/COFF entry point to C code as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-18-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use a single implementation for efi_char16_printk() across all
architectures.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-17-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The efi_call macros on ARM have a dependency on a variable 'sys_table_arg'
existing in the scope of the macro instantiation. Since this variable
always points to the same data structure, let's create a global getter
for it and use that instead.
Note that the use of a global variable with external linkage is avoided,
given the problems we had in the past with early processing of the GOT
tables.
While at it, drop the redundant casts in the efi_table_attr and
efi_call_proto macros.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-16-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We use special wrapper routines to invoke firmware services in the
native case as well as the mixed mode case. For mixed mode, the need
is obvious, but for the native cases, we can simply rely on the
compiler to generate the indirect call, given that GCC now has
support for the MS calling convention (and has had it for quite some
time now). Note that on i386, the decompressor and the EFI stub are not
built with -mregparm=3 like the rest of the i386 kernel, so we can
safely allow the compiler to emit the indirect calls here as well.
So drop all the wrappers and indirection, and switch to either native
calls, or direct calls into the thunk routine for mixed mode.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-14-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Annotate all the firmware routines (boot services, runtime services and
protocol methods) called in the boot context as __efiapi, and make
it expand to __attribute__((ms_abi)) on 64-bit x86. This allows us
to use the compiler to generate the calls into firmware that use the
MS calling convention instead of the SysV one.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-13-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We will soon remove another level of pointer casting, so let's make
sure all type handling involving firmware calls at boot time is correct.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-12-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we have incorporated the mixed mode protocol definitions
into the native ones using unions, we no longer need the separate
32/64 bit struct definitions, with the exception of the EFI system
table definition and the boot services, runtime services and
configuration table definitions. So drop the unused ones.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-11-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, we support mixed mode by casting all boot time firmware
calls to 64-bit explicitly on native 64-bit systems, and to 32-bit
on 32-bit systems or 64-bit systems running with 32-bit firmware.
Due to this explicit awareness of the bitness in the code, we do a
lot of casting even on generic code that is shared with other
architectures, where mixed mode does not even exist. This casting
leads to loss of coverage of type checking by the compiler, which
we should try to avoid.
So instead of distinguishing between 32-bit vs 64-bit, distinguish
between native vs mixed, and limit all the nasty casting and
pointer mangling to the code that actually deals with mixed mode.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-10-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of moving to a native vs. mixed mode split rather than a
32 vs. 64 bit split when it comes to invoking EFI firmware services,
update all the native protocol definitions and redefine them as unions
containing an anonymous struct for the native view and a struct called
'mixed_mode' describing the 32-bit view of the protocol when called from
64-bit code.
While at it, flesh out some PCI I/O member definitions that we will be
needing shortly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-9-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Iterating over a EFI handle array is a bit finicky, since we have
to take mixed mode into account, where handles are only 32-bit
while the native efi_handle_t type is 64-bit.
So introduce a helper, and replace the various occurrences of
this pattern.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224151025.32482-8-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The EFI mixed mode entry code goes through the ordinary startup_32()
routine before jumping into the kernel's EFI boot code in 64-bit
mode. The 32-bit startup code must be entered with paging disabled,
but this is not documented as a requirement for the EFI handover
protocol, and so we should disable paging explicitly when entering
the kernel from 32-bit EFI firmware.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191224132909.102540-4-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce a new READELF variable to top-level Makefile, so the name of
readelf binary can be specified.
Before this change the name of the binary was hardcoded to
"$(CROSS_COMPILE)readelf" which might not be present for every
toolchain.
This allows to build with LLVM Object Reader by using make parameter
READELF=llvm-readelf.
Link: https://github.com/ClangBuiltLinux/linux/issues/771
Signed-off-by: Dmitry Golovin <dima@golovin.in>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When using GCC as compiler and LLVM's lld as linker, linking setup.elf
fails:
LD arch/x86/boot/setup.elf
ld.lld: error: init sections too big!
This happens because GCC generates .eh_frame sections for most of the
files in that directory, then ld.lld places the merged section before
__end_init, triggering an assert in the linker script.
Fix this by discarding the .eh_frame sections, as suggested by Boris.
The kernel proper linker script discards them too.
[ bp: Going back in history, 64-bit kernel proper has been discarding
.eh_frame since 2002:
commit acca80acefe20420e69561cf55be64f16c34ea97
Author: Andi Kleen <ak@muc.de>
Date: Tue Oct 29 23:54:35 2002 -0800
[PATCH] x86-64 updates for 2.5.44
...
- Remove the .eh_frame on linking. This saves several hundred KB in the
bzImage
]
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ilie Halip <ilie.halip@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: clang-built-linux@googlegroups.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lore.kernel.org/lkml/20191118175223.GM6363@zn.tnic/
Link: https://github.com/ClangBuiltLinux/linux/issues/760
Link: https://lkml.kernel.org/r/20191126144545.19354-1-ilie.halip@gmail.com
- Update the ACPICA code in the kernel to upstream revision 20191018
including:
* Fixes for Clang warnings (Bob Moore).
* Fix for possible overflow in get_tick_count() (Bob Moore).
* Introduction of acpi_unload_table() (Bob Moore).
* Debugger and utilities updates (Erik Schmauss).
* Fix for unloading tables loaded via configfs (Nikolaus Voss).
- Add support for EFI specific purpose memory to optionally allow
either application-exclusive or core-kernel-mm managed access to
differentiated memory (Dan Williams).
- Fix and clean up processing of the HMAT table (Brice Goglin,
Qian Cai, Tao Xu).
- Update the ACPI EC driver to make it work on systems with
hardware-reduced ACPI (Daniel Drake).
- Always build in support for the Generic Event Device (GED) to
allow one kernel binary to work both on systems with full
hardware ACPI and hardware-reduced ACPI (Arjan van de Ven).
- Fix the table unload mechanism to unregister platform devices
created when the given table was loaded (Andy Shevchenko).
- Rework the lid blacklist handling in the button driver and add
more lid quirks to it (Hans de Goede).
- Improve ACPI-based device enumeration for some platforms based
on Intel BayTrail SoCs (Hans de Goede).
- Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC
and prevent handlers from being registered for unhandled PMIC
OpRegions (Hans de Goede).
- Unify ACPI _HID/_UID matching (Andy Shevchenko).
- Clean up documentation and comments (Cao jin, James Pack, Kacper
Piwiński).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl3dHNkSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx/NkP/2y6DWjslA6UW4gjZwaRBcjYoyWExMtQ
Z86goiRJtP+/NqOwm09wHFcV6FdZ4kitUno3UgMCDZJjrURapg1D0rxb1lSYtMzs
mGr2FBZlVsJ9erOVSzKj1x2afVhdgl0Rl0fxPzoKgCFt8tCJar6cXy4CVEQKdeLs
eUui2ksXMIEODGhpN/tr/fJqY4O4jlLmPY6gKWfFpSTsv6lnZmzcCxLf5EvUU7JW
O91/jXdWz4Vl6IdP32sce6dGDjkvwnY105c7HeBf5EQWUe9RHFuSex982qhCD8U+
iE+JzlhoYpUb03EktJSXbL++IKUHvoUpTanbhka6unMhazC86x0hDf7ruUtYo2Bk
V8347CFeQ1x2O5IabfJNnUfKaMYhYmOXIoFHJTLKFO5mcCJmP8KOOyDAYilC1psb
RJpl1fDoAhk7NqhMttyBqfxiotP0kMoKuqtAAl8Y0hTF0DwR9IfKntuTtp1yTGds
R4dpJrizUDzw1/o4fCWbc3dFZQR3NFGpL/EAyfPzqjGaeaBBkLoNYstqkal5XHwT
CILmQg2WHoNuQLXZ4NFFDrM2k2G+VUAjQdkYcb/MCOFbw+aTVPu1wyQq37RLtbMo
9UwGeeT6SXW3iA1nyMoM+YvitjmxS7gHPPPl+b9G6kBubAzBPp91Ra0Mj9dPIGRB
Evv5nzOIh8Hi
=7Cqr
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20191018, add support for EFI specific purpose memory, update the ACPI
EC driver to make it work on systems with hardware-reduced ACPI,
improve ACPI-based device enumeration for some platforms, rework the
lid blacklist handling in the button driver and add more lid quirks to
it, unify ACPI _HID/_UID matching, fix assorted issues and clean up
the code and documentation.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20191018
including:
* Fixes for Clang warnings (Bob Moore)
* Fix for possible overflow in get_tick_count() (Bob Moore)
* Introduction of acpi_unload_table() (Bob Moore)
* Debugger and utilities updates (Erik Schmauss)
* Fix for unloading tables loaded via configfs (Nikolaus Voss)
- Add support for EFI specific purpose memory to optionally allow
either application-exclusive or core-kernel-mm managed access to
differentiated memory (Dan Williams)
- Fix and clean up processing of the HMAT table (Brice Goglin, Qian
Cai, Tao Xu)
- Update the ACPI EC driver to make it work on systems with
hardware-reduced ACPI (Daniel Drake)
- Always build in support for the Generic Event Device (GED) to allow
one kernel binary to work both on systems with full hardware ACPI
and hardware-reduced ACPI (Arjan van de Ven)
- Fix the table unload mechanism to unregister platform devices
created when the given table was loaded (Andy Shevchenko)
- Rework the lid blacklist handling in the button driver and add more
lid quirks to it (Hans de Goede)
- Improve ACPI-based device enumeration for some platforms based on
Intel BayTrail SoCs (Hans de Goede)
- Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC and
prevent handlers from being registered for unhandled PMIC OpRegions
(Hans de Goede)
- Unify ACPI _HID/_UID matching (Andy Shevchenko)
- Clean up documentation and comments (Cao jin, James Pack, Kacper
Piwiński)"
* tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
ACPI: OSI: Shoot duplicate word
ACPI: HMAT: use %u instead of %d to print u32 values
ACPI: NUMA: HMAT: fix a section mismatch
ACPI: HMAT: don't mix pxm and nid when setting memory target processor_pxm
ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device
ACPI: NUMA: HMAT: Register HMAT at device_initcall level
device-dax: Add a driver for "hmem" devices
dax: Fix alloc_dax_region() compile warning
lib: Uplevel the pmem "region" ida to a global allocator
x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP
arm/efi: EFI soft reservation to memblock
x86/efi: EFI soft reservation to E820 enumeration
efi: Common enable/disable infrastructure for EFI soft reservation
x86/efi: Push EFI_MEMMAP check into leaf routines
efi: Enumerate EFI_MEMORY_SP
ACPI: NUMA: Establish a new drivers/acpi/numa/ directory
ACPICA: Update version to 20191018
ACPICA: debugger: remove leading whitespaces when converting a string to a buffer
ACPICA: acpiexec: initialize all simple types and field units from user input
ACPICA: debugger: add field unit support for acpi_db_get_next_token
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Wire up the EFI RNG code for x86. This enables an additional source
of entropy during early boot.
- Enable the TPM event log code on ARM platforms.
- Update Ard's email address"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: libstub/tpm: enable tpm eventlog function for ARM platforms
x86: efi/random: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table
efi/random: use arch-independent efi_call_proto()
MAINTAINERS: update Ard's email address to @kernel.org
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
Pull x86 boot updates from Ingo Molnar:
"The main changes were:
- Extend the boot protocol to allow future extensions without hitting
the setup_header size limit.
- Add quirk to devicetree systems to disable the RTC unless it's
listed as a supported device.
- Fix ld.lld linker pedantry"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Introduce setup_indirect
x86/boot: Introduce kernel_info.setup_type_max
x86/boot: Introduce kernel_info
x86/init: Allow DT configured systems to disable RTC at boot time
x86/realmode: Explicitly set entry point via ENTRY in linker script
This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The setup_data is a bit awkward to use for extremely large data objects,
both because the setup_data header has to be adjacent to the data object
and because it has a 32-bit length field. However, it is important that
intermediate stages of the boot process have a way to identify which
chunks of memory are occupied by kernel data. Thus introduce an uniform
way to specify such indirect data as setup_indirect struct and
SETUP_INDIRECT type.
And finally bump setup_header version in arch/x86/boot/header.S.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-4-daniel.kiper@oracle.com
This field contains maximal allowed type for setup_data.
Do not bump setup_header version in arch/x86/boot/header.S because it
will be followed by additional changes coming into the Linux/x86 boot
protocol.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-3-daniel.kiper@oracle.com
The relationships between the headers are analogous to the various data
sections:
setup_header = .data
boot_params/setup_data = .bss
What is missing from the above list? That's right:
kernel_info = .rodata
We have been (ab)using .data for things that could go into .rodata or .bss for
a long time, for lack of alternatives and -- especially early on -- inertia.
Also, the BIOS stub is responsible for creating boot_params, so it isn't
available to a BIOS-based loader (setup_data is, though).
setup_header is permanently limited to 144 bytes due to the reach of the
2-byte jump field, which doubles as a length field for the structure, combined
with the size of the "hole" in struct boot_params that a protected-mode loader
or the BIOS stub has to copy it into. It is currently 119 bytes long, which
leaves us with 25 very precious bytes. This isn't something that can be fixed
without revising the boot protocol entirely, breaking backwards compatibility.
boot_params proper is limited to 4096 bytes, but can be arbitrarily extended
by adding setup_data entries. It cannot be used to communicate properties of
the kernel image, because it is .bss and has no image-provided content.
kernel_info solves this by providing an extensible place for information about
the kernel image. It is readonly, because the kernel cannot rely on a
bootloader copying its contents anywhere, but that is OK; if it becomes
necessary it can still contain data items that an enabled bootloader would be
expected to copy into a setup_data chunk.
Do not bump setup_header version in arch/x86/boot/header.S because it
will be followed by additional changes coming into the Linux/x86 boot
protocol.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: ard.biesheuvel@linaro.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: dave.hansen@linux.intel.com
Cc: eric.snowberg@oracle.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: kanth.ghatraju@oracle.com
Cc: linux-doc@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rdunlap@infradead.org
Cc: ross.philipson@oracle.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191112134640.16035-2-daniel.kiper@oracle.com
Given that EFI_MEMORY_SP is platform BIOS policy decision for marking
memory ranges as "reserved for a specific purpose" there will inevitably
be scenarios where the BIOS omits the attribute in situations where it
is desired. Unlike other attributes if the OS wants to reserve this
memory from the kernel the reservation needs to happen early in init. So
early, in fact, that it needs to happen before e820__memblock_setup()
which is a pre-requisite for efi_fake_memmap() that wants to allocate
memory for the updated table.
Introduce an x86 specific efi_fake_memmap_early() that can search for
attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
accordingly.
The KASLR code that scans the command line looking for user-directed
memory reservations also needs to be updated to consider
"efi_fake_mem=nn@ss:0x40000" requests.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
interpretation of the EFI Memory Types as "reserved for a specific
purpose".
The proposed Linux behavior for specific purpose memory is that it is
reserved for direct-access (device-dax) by default and not available for
any kernel usage, not even as an OOM fallback. Later, through udev
scripts or another init mechanism, these device-dax claimed ranges can
be reconfigured and hot-added to the available System-RAM with a unique
node identifier. This device-dax management scheme implements "soft" in
the "soft reserved" designation by allowing some or all of the
reservation to be recovered as typical memory. This policy can be
disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
efi=nosoftreserve.
This patch introduces 2 new concepts at once given the entanglement
between early boot enumeration relative to memory that can optionally be
reserved from the kernel page allocator by default. The new concepts
are:
- E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
attribute on EFI_CONVENTIONAL memory, update the E820 map with this
new type. Only perform this classification if the
CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
typical ram.
- IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
a device driver to search iomem resources for application specific
memory. Teach the iomem code to identify such ranges as "Soft Reserved".
Note that the comment for do_add_efi_memmap() needed refreshing since it
seemed to imply that the efi map might overflow the e820 table, but that
is not an issue as of commit 7b6e4ba3cb "x86/boot/e820: Clean up the
E820_X_MAX definition" that removed the 128 entry limit for
e820__range_add().
A follow-on change integrates parsing of the ACPI HMAT to identify the
node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
now, just identify and reserve memory of this type.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Invoke the EFI_RNG_PROTOCOL protocol in the context of the x86 EFI stub,
same as is done on arm/arm64 since commit 568bc4e870 ("efi/arm*/libstub:
Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table"). Within the stub,
a Linux-specific RNG seed UEFI config table will be seeded. The EFI routines
in the core kernel will pick that up later, yet still early during boot,
to seed the kernel entropy pool. If CONFIG_RANDOM_TRUST_BOOTLOADER, entropy
is credited for this seed.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Currently, kernel fails to boot on some HyperV VMs when using EFI.
And it's a potential issue on all x86 platforms.
It's caused by broken kernel relocation on EFI systems, when below three
conditions are met:
1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
by the loader.
2. There isn't enough room to contain the kernel, starting from the
default load address (eg. something else occupied part the region).
3. In the memmap provided by EFI firmware, there is a memory region
starts below LOAD_PHYSICAL_ADDR, and suitable for containing the
kernel.
EFI stub will perform a kernel relocation when condition 1 is met. But
due to condition 2, EFI stub can't relocate kernel to the preferred
address, so it fallback to ask EFI firmware to alloc lowest usable memory
region, got the low region mentioned in condition 3, and relocated
kernel there.
It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This
is the lowest acceptable kernel relocation address.
The first thing goes wrong is in arch/x86/boot/compressed/head_64.S.
Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output
address if kernel is located below it. Then the relocation before
decompression, which move kernel to the end of the decompression buffer,
will overwrite other memory region, as there is no enough memory there.
To fix it, just don't let EFI stub relocate the kernel to any address
lower than lowest acceptable address.
[ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When building with "EXTRA_CFLAGS=-Wall" gcc warns:
arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function]
get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and
CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these
config options is disabled has this issue.
Move the function under the same ifdef guard as the call site.
[ tglx: Add context to the changelog so it becomes useful ]
Fixes: 41fa1ee9c6 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.
Now, ENTRY/ENDPROC can be forced to be undefined on X86, so do so.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bill Metzenthen <billm@melbpc.org.au>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-28-jslaby@suse.cz
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.
Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the
last users.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate]
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits]
Acked-by: Herbert Xu <herbert@gondor.apana.org.au> [crypto]
Cc: Allison Randal <allison@lohutok.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Armijn Hemel <armijn@tjaldur.nl>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
All these are functions which are invoked from elsewhere but they are
not typical C functions. So annotate them using the new SYM_CODE_START.
All these were not balanced with any END, so mark their ends by
SYM_CODE_END appropriately too.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [power mgmt]
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-23-jslaby@suse.cz
There are a couple of assembly functions which are invoked only locally
in the file they are defined. In C, they are marked "static". In
assembly, annotate them using SYM_{FUNC,CODE}_START_LOCAL (and switch
their ENDPROC to SYM_{FUNC,CODE}_END too). Whether FUNC or CODE is used,
depends on whether ENDPROC or END was used for a particular function
before.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191011115108.12392-21-jslaby@suse.cz
GLOBAL is an x86's custom macro and is going to die very soon. It was
meant for global symbols, but here, it was used for functions. Instead,
use the new macros SYM_FUNC_START* and SYM_CODE_START* (depending on the
type of the function) which are dedicated to global functions. And since
they both require a closing by SYM_*_END, do that here too.
startup_64, which does not use GLOBAL but uses .globl explicitly, is
converted too.
"No alignments" are preserved.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-arch@vger.kernel.org
Cc: Maran Wilson <maran.wilson@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-17-jslaby@suse.cz
Use the new SYM_DATA, SYM_DATA_START, and SYM_DATA_END* macros for data,
so that the data in the object file look sane:
Value Size Type Bind Vis Ndx Name
0000 10 OBJECT GLOBAL DEFAULT 3 efi32_boot_gdt
000a 10 OBJECT LOCAL DEFAULT 3 save_gdt
0014 8 OBJECT LOCAL DEFAULT 3 func_rt_ptr
001c 48 OBJECT GLOBAL DEFAULT 3 efi_gdt64
004c 0 OBJECT LOCAL DEFAULT 3 efi_gdt64_end
0000 48 OBJECT LOCAL DEFAULT 3 gdt
0030 0 OBJECT LOCAL DEFAULT 3 gdt_end
0030 8 OBJECT LOCAL DEFAULT 3 efi_config
0038 49 OBJECT GLOBAL DEFAULT 3 efi32_config
0069 49 OBJECT GLOBAL DEFAULT 3 efi64_config
All have correct size and type now.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-13-jslaby@suse.cz
.Lrelocated, .Lpaging_enabled, .Lno_longmode, and .Lin_pm32 are
self-standing local functions, annotate them as such and preserve "no
alignment".
The annotations do not generate anything yet.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-8-jslaby@suse.cz
The kernel image map is created using PMD pages, which can include
some extra space beyond what's actually needed. Round the size of the
memory hole we search for up to the next PMD boundary, to be certain
all of the space to be mapped is usable RAM and includes no reserved
areas.
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dimitri.sivanich@hpe.com
Cc: Feng Tang <feng.tang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: mike.travis@hpe.com
Cc: russ.anderson@hpe.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: https://lkml.kernel.org/r/df4f49f05c0c27f108234eb93db5c613d09ea62e.1569358539.git.steve.wahl@hpe.com
During the assembly cleanup patchset review, I found more symbols which
are used only locally. So make them really local by prepending ".L" to
them. Namely:
- wakeup_idt is used only in realmode/rm/wakeup_asm.S.
- in_pm32 is used only in boot/pmjump.S.
- retint_user is used only in entry/entry_64.S, perhaps since commit
2ec67971fa ("x86/entry/64/compat: Remove most of the fast system
call machinery"), where entry_64_compat's caller was removed.
Drop GLOBAL from all of them too. I do not see more candidates in the
series.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20191011092213.31470-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It was observed that the kernel embeds the absolute build path in the
x86 boot image when the __FILE__ macro is expanded.
> From https://bugzilla.yoctoproject.org/show_bug.cgi?id=13458:
If you turn on the buildpaths QA test, or try a reproducible build, you
discover that the kernel image contains build paths.
$ strings bzImage-5.0.19-yocto-standard |grep tmp/
out of pgt_buf in
/data/poky-tmp/reproducible/tmp/work-shared/qemux86-64/kernel-source/arch/x86/boot/compressed/kaslr_64.c!?
But what's this in the top-level Makefile:
$ git grep prefix-map
Makefile:KBUILD_CFLAGS += $(call
cc-option,-fmacro-prefix-map=$(srctree)/=)
So the __FILE__ shouldn't be using the full path. However
arch/x86/boot/compressed/Makefile has this:
KBUILD_CFLAGS := -m$(BITS) -O2
So that clears KBUILD_FLAGS, removing the -fmacro-prefix-map option.
Use -fmacro-prefix-map to have relative paths in the boot image too.
[ bp: Massage commit message and put the KBUILD_CFLAGS addition in
..boot/Makefile after the KBUILD_AFLAGS assignment because gas
doesn't support -fmacro-prefix-map. ]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
Signed-off-by: Ross Burton <ross.burton@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Rimar <grimar@accesssoftek.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190926093226.8568-1-ross.burton@intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204333
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
Pull x86 boot code cleanup from Ingo Molnar:
"Clean up the BUILD_BUG_ON() definition which can cause build warnings"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Use common BUILD_BUG_ON
Boris suggests to make a local label (prepend ".L") to these functions
to eliminate them from the symbol table. These are functions with very
local names and really should not be visible anywhere.
Note that objtool won't see these functions anymore (to generate ORC
debug info). But all the functions are not annotated with ENDPROC, so
they won't have objtool's attention anyway.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steve Winslow <swinslow@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20190906075550.23435-2-jslaby@suse.cz
Gustavo noticed that 'new' can be left uninitialized if 'bios_start'
happens to be less or equal to 'entry->addr + entry->size'.
Initialize the variable at the begin of the iteration to the current value
of 'bios_start'.
Fixes: 0a46fff2f9 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table")
Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box
This option allows userspace to pass the RSDP address to the kernel, which
makes it possible for a user to modify the workings of hardware. Reject
the option when the kernel is locked down. This requires some reworking
of the existing RSDP command line logic, since the early boot code also
makes use of a command-line passed RSDP when locating the SRAT table
before the lockdown code has been initialised. This is achieved by
separating the command line RSDP path in the early boot code from the
generic RSDP path, and then copying the command line RSDP into boot
params in the kernel proper if lockdown is not enabled. If lockdown is
enabled and an RSDP is provided on the command line, this will only be
used when parsing SRAT (which shouldn't permit kernel code execution)
and will be ignored in the rest of the kernel.
(Modified by Matthew Garrett in order to handle the early boot RSDP
environment)
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: Dave Young <dyoung@redhat.com>
cc: linux-acpi@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
BIOS on Samsung 500C Chromebook reports very rudimentary E820 table that
consists of 2 entries:
BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] usable
BIOS-e820: [mem 0x00000000fffff000-0x00000000ffffffff] reserved
It breaks logic in find_trampoline_placement(): bios_start lands on the
end of the first 4k page and trampoline start gets placed below 0.
Detect underflow and don't touch bios_start for such cases. It makes
kernel ignore E820 table on machines that doesn't have two usable pages
below BIOS_START_MAX.
Fixes: 1b3a626436 ("x86/boot/compressed/64: Validate trampoline placement against E820")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203463
Link: https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com
Defining BUILD_BUG_ON causes redefinition warnings when adding includes of
include/linux/build_bug.h in files unrelated to x86/boot. For example,
adding an include of build_bug.h to include/linux/bits.h shows the
following warnings:
CC arch/x86/boot/cpucheck.o
In file included from ./include/linux/bits.h:22,
from ./arch/x86/include/asm/msr-index.h:5,
from arch/x86/boot/cpucheck.c:28:
./include/linux/build_bug.h:49: warning: "BUILD_BUG_ON" redefined
49 | #define BUILD_BUG_ON(condition) \
|
In file included from arch/x86/boot/cpucheck.c:22:
arch/x86/boot/boot.h:31: note: this is the location of the previous definition
31 | #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
|
The macro was added to boot.h in commit 62bd0337d0 ("Top header file for
new x86 setup code"). At that time, BUILD_BUG_ON was defined in
kernel.h. Presumably BUILD_BUG_ON was redefined to avoid pulling in
kernel.h. Since then, BUILD_BUG_ON and similar macros have been split to a
separate header file.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190811184938.1796-2-rikard.falkeborn@gmail.com
Implementing memcpy and memset in terms of __builtin_memcpy and
__builtin_memset is problematic.
GCC at -O2 will replace calls to the builtins with calls to memcpy and
memset (but will generate an inline implementation at -Os). Clang will
replace the builtins with these calls regardless of optimization level.
$ llvm-objdump -dr arch/x86/purgatory/string.o | tail
0000000000000339 memcpy:
339: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax
000000000000033b: R_X86_64_64 memcpy
343: ff e0 jmpq *%rax
0000000000000345 memset:
345: 48 b8 00 00 00 00 00 00 00 00 movabsq $0, %rax
0000000000000347: R_X86_64_64 memset
34f: ff e0
Such code results in infinite recursion at runtime. This is observed
when doing kexec.
Instead, reuse an implementation from arch/x86/boot/compressed/string.c.
This requires to implement a stub function for warn(). Also, Clang may
lower memcmp's that compare against 0 to bcmp's, so add a small definition,
too. See also: commit 5f074f3e19 ("lib/string.c: implement a basic bcmp")
Fixes: 8fc5b4d412 ("purgatory: core purgatory functionality")
Reported-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Debugged-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Debugged-by: Manoj Gupta <manojgupta@google.com>
Suggested-by: Alistair Delva <adelva@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vaibhav Rustagi <vaibhavrustagi@google.com>
Cc: stable@vger.kernel.org
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=984056
Link: https://lkml.kernel.org/r/20190807221539.94583-1-ndesaulniers@google.com
Kernel build warns:
'sanitize_boot_params' defined but not used [-Wunused-function]
at below files:
arch/x86/boot/compressed/cmdline.c
arch/x86/boot/compressed/error.c
arch/x86/boot/compressed/early_serial_console.c
arch/x86/boot/compressed/acpi.c
That's becausethey each include misc.h which includes a definition of
sanitize_boot_params() via bootparam_utils.h.
Remove the inclusion from misc.h and have the c file including
bootparam_utils.h directly.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.com
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with other
trees, unfortunately. He has a lot more of these waiting on the wings
that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos, and one
on Spectre vulnerabilities.
- Various improvements to the build system, including automatic markup of
function() references because some people, for reasons I will never
understand, were of the opinion that :c:func:``function()`` is
unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
=D0eO
-----END PGP SIGNATURE-----
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Pull x86 boot updates from Thomas Gleixner:
"Assorted updates to kexec/kdump:
- Proper kexec support for 4/5-level paging and jumping from a
5-level to a 4-level paging kernel.
- Make the EFI support for kexec/kdump more robust
- Enforce that the GDT is properly aligned instead of getting the
alignment by chance"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kdump/64: Restrict kdump kernel reservation to <64TB
x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel
x86/boot: Add xloadflags bits to check for 5-level paging support
x86/boot: Make the GDT 8-byte aligned
x86/kexec: Add the ACPI NVS region to the ident map
x86/boot: Call get_rsdp_addr() after console_init()
Revert "x86/boot: Disable RSDP parsing temporarily"
x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels
x86/kexec: Add the EFI system tables and ACPI tables to the ident map
The current kernel supports 5-level paging mode, and supports dynamically
choosing the paging mode during bootup depending on the kernel image,
hardware and kernel parameter settings. This flexibility brings several
issues to kexec/kdump:
1) Dynamic switching between paging modes requires support in the target
kernel. This means kexec from a 5-level paging kernel into a kernel
which does not support mode switching is not possible. So the loader
needs to be able to analyze the supported paging modes of the kexec
target kernel.
2) If running on a 5-level paging kernel and the kexec target kernel is a
4-level paging kernel, the target immage cannot be loaded above the 64TB
address space limit. But the kexec loader searches for a load area from
top to bottom which would eventually put the target kernel above 64TB
when the machine has large enough RAM size. So the loader needs to be
able to analyze the paging mode of the target kernel to load it at a
suitable spot in the address space.
Solution:
Add two bits XLF_5LEVEL and XLF_5LEVEL_ENABLED:
- Bit XLF_5LEVEL indicates whether 5-level paging mode switching support
is available. (Issue #1)
- Bit XLF_5LEVEL_ENABLED indicates whether the kernel was compiled with
full 5-level paging support (CONFIG_X86_5LEVEL=y). (Issue #2)
The loader will use these bits to verify whether the target kernel is
suitable to be kexec'ed to from a 5-level paging kernel and to determine
the constraints of the target kernel load address.
The flags will be used by the kernel kexec subsystem and the userspace
kexec tools.
[ tglx: Massaged changelog ]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dyoung@redhat.com
Link: https://lkml.kernel.org/r/20190524073810.24298-2-bhe@redhat.com
The segment descriptors are loaded with an implicitly LOCK-ed instruction,
which could trigger the split lock #AC exception if the variable is not
properly aligned and crosses a cache line.
Align the GDT properly so the descriptors are all 8 byte aligned.
Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20190627045525.105266-1-xiaoyao.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file is part of the linux kernel and is made available under
the terms of the gnu general public license version 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 28 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.534229504@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlz8fAYeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1asH/3ySguxqtqL1MCBa
4/SZ37PHeWKMerfX6ZyJdgEqK3B+PWlmuLiOMNK5h2bPLzeQQQAmHU/mfKmpXqgB
dHwUbG9yNnyUtTfsfRqAnCA6vpuw9Yb1oIzTCVQrgJLSWD0j7scBBvmzYqguOkto
ThwigLUq3AILr8EfR4rh+GM+5Dn9OTEFAxwil9fPHQo7QoczwZxpURhScT6Co9TB
DqLA3fvXbBvLs/CZy/S5vKM9hKzC+p39ApFTURvFPrelUVnythAM0dPDJg3pIn5u
g+/+gDxDFa+7ANxvxO2ng1sJPDqJMeY/xmjJYlYyLpA33B7zLNk2vDHhAP06VTtr
XCMhQ9s=
=cb80
-----END PGP SIGNATURE-----
Merge tag 'v5.2-rc4' into mauro
We need to pick up post-rc1 changes to various document files so they don't
get lost in Mauro's massive RST conversion push.
Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
... so that early debugging output from the RSDP parsing code can be
visible and collected.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: kexec@lists.infradead.org
Cc: x86@kernel.org
TODO:
- ask dyoung and Dirk van der Merwe <dirk.vandermerwe@netronome.com> to
test again.
This reverts commit 36f0c42355.
Now that the required fixes are in place, reenable early RSDP parsing.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Commit
3a63f70bf4 ("x86/boot: Early parse RSDP and save it in boot_params")
broke kexec boot on EFI systems. efi_get_rsdp_addr() in the early
parsing code tries to search RSDP from the EFI tables but that will
crash because the table address is virtual when the kernel was booted by
kexec (set_virtual_address_map() has run in the first kernel and cannot
be run again in the second kernel).
In the case of kexec, the physical address of EFI tables is provided via
efi_setup_data in boot_params, which is set up by kexec(1).
Factor out the table parsing code and use different pointers depending
on whether the kernel is booted by kexec or not.
[ bp: Massage. ]
Fixes: 3a63f70bf4 ("x86/boot: Early parse RSDP and save it in boot_params")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190408231011.GA5402@jeru.linux.bs1.fc.nec.co.jp
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 46 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.135501091@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 fixes from Ingo Molnar:
"Two fixes: a quirk for KVM guests running on certain AMD CPUs, and a
KASAN related build fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
x86/boot: Provide KASAN compatible aliases for string routines
Based on 1 normalized pattern(s):
this file is part of the linux kernel and is made available under
the terms of the gnu general public license version 2 or at your
option any later version incorporated herein by reference
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 18 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520075211.321157221@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation inc 53 temple place ste 330 boston ma
02111 1307 usa either version 2 of the license or at your option any
later version incorporated herein by reference
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 13 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170858.645641371@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The KASAN subsystem wraps calls to memcpy(), memset() and memmove()
to sanitize the arguments before invoking the actual routines, which
have been renamed to __memcpy(), __memset() and __memmove(),
respectively. When CONFIG_KASAN is enabled for the kernel build but
KASAN code generation is disabled for the compilation unit (which is
needed for things like the EFI stub or the decompressor), the string
routines are just #define'd to their __ prefixed names so that they
are simply invoked directly.
This does however rely on those __ prefixed names to exist in the
symbol namespace, which is not currently the case for the x86
decompressor, which may lead to errors like
drivers/firmware/efi/libstub/tpm.o: In function `efi_retrieve_tpm2_eventlog':
tpm.c:(.text+0x2a8): undefined reference to `__memcpy'
So let's expose the __ prefixed symbols in the decompressor when
KASAN is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <matthewgarrett@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpica:
ACPICA: Update version to 20190405
ACPICA: Namespace: add check to avoid null pointer dereference
ACPICA: Update version to 20190329
ACPICA: utilities: fix spelling of PCC to platform_comm_channel
ACPICA: Rename nameseg length macro/define for clarity
ACPICA: Rename nameseg compare macro for clarity
ACPICA: Rename nameseg copy macro for clarity
The original intention to move RDSP parsing very early, before KASLR
does its ranges selection, was to accommodate movable memory regions
machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory
hotplug.
However, that broke kexec'ing a kernel on EFI machines because depending
on where the EFI systab was mapped, on at least one machine it isn't
present in the kexec mapping of the second kernel, leading to a triple
fault in the early code.
Fixing this properly requires significantly involved surgery and we
cannot allow ourselves to do that, that close to the merge window.
So disable the RSDP parsing code temporarily until it is fixed properly
in the next release cycle.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190419141952.GE10324@zn.tnic
ACPICA commit 92ec0935f27e217dff0b176fca02c2ec3d782bb5
ACPI_COMPARE_NAME changed to ACPI_COMPARE_NAMESEG
This clarifies (1) this is a compare on 4-byte namesegs, not
a generic compare. Improves understanding of the code.
Link: https://github.com/acpica/acpica/commit/92ec0935
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The declarations related to immovable memory handling are out of the
BOOT_COMPRESSED_MISC_H #ifdef scope, wrap them inside.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190304055546.18566-1-bhe@redhat.com
The inclusion of <linux/kernel.h> was causing issue as the definition of
__arch_hweight64 from arch/x86/include/asm/arch_hweight.h eventually gets
included. The definition is problematic when compiled with -m16 (all code
in arch/x86/boot/ is) as the "D" inline assembly constraint is rejected
by both compilers when passed an argument of type long long (regardless
of signedness, anything smaller is fine).
Because GCC performs inlining before semantic analysis, and
__arch_hweight64 is dead in this translation unit, GCC does not report
any issues at compile time. Clang does the semantic analysis in the
front end, before inlining (run in the middle) can determine the code is
dead. I consider this another case of PR33587, which I think we can do
more work to solve.
It turns out that arch/x86/boot/string.c doesn't actually need
linux/kernel.h, simply linux/limits.h and linux/compiler.h.
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: bp@alien8.de
Cc: niravd@google.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://bugs.llvm.org/show_bug.cgi?id=33587
Link: https://github.com/ClangBuiltLinux/linux/issues/347
Link: https://lkml.kernel.org/r/20190314221458.83047-1-ndesaulniers@google.com
Pull x86 boot fix from Thomas Gleixner:
"A trivial fix for the previous x86/boot pull request which did not
make it in time"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/KASLR: Always return a value from process_mem_region
Pull x86 cleanups from Ingo Molnar:
"Various cleanups and simplifications, none of them really stands out,
they are all over the place"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/uaccess: Remove unused __addr_ok() macro
x86/smpboot: Remove unused phys_id variable
x86/mm/dump_pagetables: Remove the unused prev_pud variable
x86/fpu: Move init_xstate_size() to __init section
x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section
x86/mtrr: Remove unused variable
x86/boot/compressed/64: Explain paging_prepare()'s return value
x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition
x86/asm/suspend: Drop ENTRY from local data
x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery
x86/boot: Save several bytes in decompressor
x86/trap: Remove useless declaration
x86/mm/tlb: Remove unused cpu variable
x86/events: Mark expected switch-case fall-throughs
x86/asm-prototypes: Remove duplicate include <asm/page.h>
x86/kernel: Mark expected switch-case fall-throughs
x86/insn-eval: Mark expected switch-case fall-through
x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls
x86/e820: Replace kmalloc() + memcpy() with kmemdup()
Pull x86 build updates from Ingo Molnar:
"Misc cleanups and a retpoline code generation optimization"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, retpolines: Raise limit for generating indirect calls from switch-case
x86/build: Use the single-argument OUTPUT_FORMAT() linker script command
x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
x86/build: Mark per-CPU symbols as absolute explicitly for LLD
Pull x86 boot updates from Ingo Molnar:
"Most of the changes center around the difficult problem of KASLR
pinning down hot-removable memory regions. At the very early stage
KASRL is making irreversible kernel address layout decisions we don't
have full knowledge about the memory maps yet.
So the changes from Chao Fan add this (parsing the RSDP table early),
together with fixes from Borislav Petkov"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed/64: Do not read legacy ROM on EFI system
x86/boot: Correct RSDP parsing with 32-bit EFI
x86/kexec: Fill in acpi_rsdp_addr from the first kernel
x86/boot: Fix randconfig build error due to MEMORY_HOTREMOVE
x86/boot: Fix cmdline_find_option() prototype visibility
x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only
x86/boot: Parse SRAT table and count immovable memory regions
x86/boot: Early parse RSDP and save it in boot_params
x86/boot: Search for RSDP in memory
x86/boot: Search for RSDP in the EFI tables
x86/boot: Add "acpi_rsdp=" early parsing
x86/boot: Copy kstrtoull() to boot/string.c
x86/boot: Build the command line parsing code unconditionally
When compiling with -Wreturn-type, clang warns:
arch/x86/boot/compressed/kaslr.c:704:1: warning: control may reach end of
non-void function [-Wreturn-type]
This function's return statement should have been placed outside the
ifdeffed region. Move it there.
Fixes: 690eaa5320 ("x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only")
Signed-off-by: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: bhe@redhat.com
Cc: kirill.shutemov@linux.intel.com
Cc: jflat@chromium.org
Link: https://lkml.kernel.org/r/20190302184929.28971-1-louis@kragniz.eu
EFI systems do not necessarily provide a legacy ROM. If the ROM is missing
the memory is not mapped at all.
Trying to dereference values in the legacy ROM area leads to a crash on
Macbook Pro.
Only look for values in the legacy ROM area for non-EFI system.
Fixes: 3548e131ec ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
Reported-by: Pitam Mitra <pitamm@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Bockjoo Kim <bockjoo@phys.ufl.edu>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190219075224.35058-1-kirill.shutemov@linux.intel.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202351
Guenter Roeck reported triple faults of a 64-bit VM using a 32-bit OVMF
EFI image. After some singlestepping of the image in gdb, it turned out
that some of the EFI config tables were at bogus addresses. Which, as
Ard pointed out, results from using the wrong efi_config_table typedef.
So switch all EFI table pointers to unsigned longs and convert them to
the proper typedef only when accessing them. This way, the proper table
type is being used.
Shorten variable names, while at it.
Fixes: 33f0df8d84 ("x86/boot: Search for RSDP in the EFI tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190208190248.GA10854@roeck-us.net
paging_prepare() returns a two-quadword structure which lands
into RDX:RAX:
- Address of the trampoline is returned in RAX.
- Non zero RDX means trampoline needs to enable 5-level paging.
Document that explicitly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dave.hansen@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kyle D Pelton <kyle.d.pelton@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190206154756.matwldebbxkmlnae@black.fi.intel.com
RDMSR in the trampoline code overwrites EDX but that register is used
to indicate whether 5-level paging has to be enabled and if clobbered,
leads to failure to boot on a 5-level paging machine.
Preserve EDX on the stack while we are dealing with EFER.
Fixes: b677dfae5a ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode")
Reported-by: Kyle D Pelton <kyle.d.pelton@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: dave.hansen@linux.intel.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.com
When building randconfigs, one of the failures is:
ld: arch/x86/boot/compressed/kaslr.o: in function `choose_random_location':
kaslr.c:(.text+0xbf7): undefined reference to `count_immovable_mem_regions'
ld: kaslr.c:(.text+0xcbe): undefined reference to `immovable_mem'
make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
because CONFIG_ACPI is not enabled in this particular .config but
CONFIG_MEMORY_HOTREMOVE is and count_immovable_mem_regions() is
unresolvable because it is defined in compressed/acpi.c which is the
compilation unit that depends on CONFIG_ACPI.
Add CONFIG_ACPI to the explicit dependencies for MEMORY_HOTREMOVE.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190205131033.9564-1-bp@alien8.de
ac09c5f43c ("x86/boot: Build the command line parsing code unconditionally")
enabled building the command line parsing code unconditionally but it
forgot to remove the respective ifdeffery around the prototypes in the
misc.h header, leading to
arch/x86/boot/compressed/acpi.c: In function ‘get_acpi_rsdp’:
arch/x86/boot/compressed/acpi.c:37:8: warning: implicit declaration of function \
‘cmdline_find_option’ [-Wimplicit-function-declaration]
ret = cmdline_find_option("acpi_rsdp", val, MAX_ADDR_LEN);
^~~~~~~~~~~~~~~~~~~
for configs where neither CONFIG_EARLY_PRINTK nor CONFIG_RANDOMIZE_BASE
was defined.
Drop the ifdeffery in the header too.
Fixes: ac09c5f43c ("x86/boot: Build the command line parsing code unconditionally")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/5c51daf0.83pQEkvDZILqoSYW%lkp@intel.com
Link: https://lkml.kernel.org/r/20190205131352.GA27396@zn.tnic
KASLR may randomly choose a range which is located in movable memory
regions. As a result, this will break memory hotplug and make the
movable memory chosen by KASLR immovable.
Therefore, limit KASLR to choose memory regions in the immovable range
after consulting the SRAT table.
[ bp:
- Rewrite commit message.
- Trim comments.
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: caoj.fnst@cn.fujitsu.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-8-fanc.fnst@cn.fujitsu.com
Parse SRAT for the immovable memory regions and use that information to
control which offset KASLR selects so that it doesn't overlap with any
movable region.
[ bp:
- Move struct mem_vector where it is visible so that it builds.
- Correct comments.
- Rewrite commit message.
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: <caoj.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <indou.takao@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <kasong@redhat.com>
Cc: <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <msys.mizuma@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-7-fanc.fnst@cn.fujitsu.com
The RSDP is needed by KASLR so parse it early and save it in
boot_params.acpi_rsdp_addr, before KASLR setup runs.
RSDP is needed by other kernel facilities so have the parsing code
built-in instead of a long "depends on" line in Kconfig.
[ bp:
- Trim commit message and comments
- Add CONFIG_ACPI dependency in the Makefile
- Move ->acpi_rsdp_addr assignment with the rest of boot_params massaging in extract_kernel().
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-6-fanc.fnst@cn.fujitsu.com
The immovable memory ranges information in the SRAT table is necessary
to fix the issue of KASLR not paying attention to movable memory regions
when selecting the offset. Therefore, SRAT needs to be parsed.
Depending on the boot: KEXEC/EFI/BIOS, the methods to compute RSDP are
different. When booting from EFI, the EFI table points to the RSDP. So
iterate over the EFI system tables in order to find the RSDP.
[ bp:
- Heavily massage commit message
- Trim comments
- Move the CONFIG_ACPI ifdeffery into the Makefile.
]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-4-fanc.fnst@cn.fujitsu.com
KASLR may randomly choose offsets which are located in movable memory
regions resulting in the movable memory becoming immovable.
The ACPI SRAT (System/Static Resource Affinity Table) describes memory
ranges including ranges of memory provided by hot-added memory devices.
In order to access SRAT, one needs the Root System Description Pointer
(RSDP) with which to find the Root/Extended System Description Table
(R/XSDT) which then contains the system description tables of which SRAT
is one of.
In case the RSDP address has been passed on the command line (kexec-ing
a second kernel) parse it from there.
[ bp: Rewrite the commit message and cleanup the code. ]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-3-fanc.fnst@cn.fujitsu.com
Copy kstrtoull() and the other necessary functions from lib/kstrtox.c
to boot/string.c so that code in boot/ can use kstrtoull() and the old
simple_strtoull() can gradually be phased out.
Using div_u64() from math64.h directly will cause the dividend to be
handled as a 64-bit value and cause the infamous __divdi3 linker error
due to gcc trying to use its library function for the 64-bit division.
Therefore, separate the dividend into an upper and lower part.
[ bp: Rewrite commit message. ]
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: bhe@redhat.com
Cc: caoj.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190123110850.12433-2-fanc.fnst@cn.fujitsu.com
gdt64 represents the content of GDTR under x86-64, which actually needs
10 bytes only, ".long" & ".word" is superfluous.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <bp@alien8.de>
Cc: <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190123100014.23721-1-caoj.fnst@cn.fujitsu.com
In some old AMD KVM implementation, guest's EFER.LME bit is cleared by KVM
when the hypervsior detects that the guest sets CR0.PG to 0. This causes
the guest OS to reboot when it tries to return from 32-bit trampoline code
because the CPU is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but
EFER.LME=0. As a precaution, set EFER.LME=1 as part of long mode
activation procedure. This extra step won't cause any harm when Linux is
booted on a bare-metal machine.
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
The various x86 linker scripts use the three-argument linker script
command variant OUTPUT_FORMAT(DEFAULT, BIG, LITTLE) which specifies
three object file formats when the -EL and -EB linker command line
options are used. When -EB is specified, OUTPUT_FORMAT issues the BIG
object file format, when -EL, LITTLE, respectively, and when neither is
specified, DEFAULT.
However, those -E[LB] options are not used by arch/x86/ so switch to the
simple OUTPUT_FORMAT(BFDNAME) macro variant.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190109181531.27513-1-bp@alien8.de
The kernel uses the OUTPUT_FORMAT linker script command in it's linker
scripts. Most of the time, the -m option is passed to the linker with
correct architecture, but sometimes (at least for x86_64) the -m option
contradicts the OUTPUT_FORMAT directive.
Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
files, but are linked with the -m elf_x86_64 linker flag when building
for x86_64.
The GNU linker manpage doesn't explicitly state any tie-breakers between
-m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
overrides the emulation value specified with the -m option.
LLVM lld has a different behavior, however. When supplied with
contradicting -m and OUTPUT_FORMAT values it fails with the following
error message:
ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64
Therefore, just add the correct -m after the incorrect one (it overrides
it), so the linker invocation looks like this:
ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf
This is not a functional change for GNU ld, because (although not
explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.
Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
it in QEMU.
[ bp: massage and clarify text. ]
Suggested-by: Dmitry Golovin <dima@golovin.in>
Signed-off-by: George Rimar <grimar@accesssoftek.com>
Signed-off-by: Tri Vo <trong@android.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tri Vo <trong@android.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Matz <matz@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morbo@google.com
Cc: ndesaulniers@google.com
Cc: ruiu@google.com
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
Since commit 9c2af1c737 ("kbuild: add .DELETE_ON_ERROR special
target"), the target file is automatically deleted on failure.
The boilerplate code
... || { rm -f $@; false; }
is unneeded.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull x86 boot updates from Ingo Molnar:
"Two cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Add missing va_end() to die()
x86/boot: Simplify the detect_memory*() control flow
Pull EFI fixes from Ingo Molnar:
"Two fixes: a large-system fix and an earlyprintk fix with certain
resolutions"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/earlyprintk/efi: Fix infinite loop on some screen widths
x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
The following commit:
d64934019f ("x86/efi: Use efi_exit_boot_services()")
introduced a regression on systems with large memory maps causing them
to hang on boot. The first "goto get_map" that was removed from
exit_boot() ensured there was enough room for the memory map when
efi_call_early(exit_boot_services) was called. This happens when
(nr_desc > ARRAY_SIZE(params->e820_table).
Chain of events:
exit_boot()
efi_exit_boot_services()
efi_get_memory_map <- at this point the mm can't grow over 8 desc
priv_func()
exit_boot_func()
allocate_e820ext() <- new mm grows over 8 desc from e820 alloc
efi_call_early(exit_boot_services) <- mm key doesn't match so retry
efi_call_early(get_memory_map) <- not enough room for new mm
system hangs
This patch allocates the e820 buffer before calling efi_exit_boot_services()
and fixes the regression.
[ mingo: minor cleanliness edits. ]
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Each call to va_start() must have a corresponding call to va_end()
before the end of the function. Add the missing va_end().
Found with Coccinelle.
Signed-off-by: Mattias Jacobsson <2pi@mok.nu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181128161607.10973-1-2pi@mok.nu
Peter Anvin pointed out that commit:
ae7e1238e6 ("x86/boot: Add ACPI RSDP address to setup_header")
should be reverted as setup_header should only contain items set by the
legacy BIOS.
So revert said commit. Instead of fully reverting the dependent commit
of:
e7b66d16fe ("x86/acpi, x86/boot: Take RSDP address for boot params if available")
just remove the setup_header reference in order to replace it by
a boot_params in a followup patch.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bp@alien8.de
Cc: daniel.kiper@oracle.com
Cc: sstabellini@kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181120072529.5489-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The return values of these functions are not used - so simplify the functions.
No change in functionality.
[ mingo: Simplified the changelog. ]
Suggested: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jordan Borgner <mail@jordan-borgner.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181102145622.zjx2t3mdu3rv6sgy@JordanDesktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
"sizeof(x)" is the canonical coding style used in arch/x86 most of the time.
Fix the few places that didn't follow the convention.
(Also do some whitespace cleanups in a few places while at it.)
[ mingo: Rewrote the changelog. ]
Signed-off-by: Jordan Borgner <mail@jordan-borgner.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181028125828.7rgammkgzep2wpam@JordanDesktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
Pull x86 grub2 updates from Ingo Molnar:
"This extends the x86 boot protocol to include an address for the RSDP
table - utilized by Xen currently.
Matching Grub2 patches are pending as well. (Juergen Gross)"
* 'x86-grub2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi, x86/boot: Take RSDP address for boot params if available
x86/boot: Add ACPI RSDP address to setup_header
x86/xen: Fix boot loader version reported for PVH guests
Pull x86 boot updates from Ingo Molnar:
"Two cleanups and a bugfix for a rare boot option combination"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/KASLR: Remove return value from handle_mem_options()
x86/corruption-check: Use pr_*() instead of printk()
x86/corruption-check: Fix panic in memory_corruption_check() when boot option without value is provided
Pull EFI updates from Ingo Molnar:
"The main changes are:
- Add support for enlisting the help of the EFI firmware to create
memory reservations that persist across kexec.
- Add page fault handling to the runtime services support code on x86
so we can more gracefully recover from buggy EFI firmware.
- Fix command line handling on x86 for the boot path that omits the
stub's PE/COFF entry point.
- Other assorted fixes and updates"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: boot: Fix EFI stub alignment
efi/x86: Call efi_parse_options() from efi_main()
efi/x86: earlyprintk - Add 64bit efi fb address support
efi/x86: drop task_lock() from efi_switch_mm()
efi/x86: Handle page faults occurring while running EFI runtime services
efi: Make efi_rts_work accessible to efi page fault handler
efi/efi_test: add exporting ResetSystem runtime service
efi/libstub: arm: support building with clang
efi: add API to reserve memory persistently across kexec reboot
efi/arm: libstub: add a root memreserve config table
efi: honour memory reservations passed via a linux specific config table
When compiling the kernel with Clang, this warning appears even though
it is disabled for the whole kernel because this folder has its own set
of KBUILD_CFLAGS. It was disabled before the beginning of git history.
In file included from arch/x86/boot/compressed/kaslr.c:29:
In file included from arch/x86/boot/compressed/misc.h:21:
In file included from ./include/linux/elf.h:5:
In file included from ./arch/x86/include/asm/elf.h:77:
In file included from ./arch/x86/include/asm/vdso.h:11:
In file included from ./include/linux/mm_types.h:9:
In file included from ./include/linux/spinlock.h:88:
In file included from ./arch/x86/include/asm/spinlock.h:43:
In file included from ./arch/x86/include/asm/qrwlock.h:6:
./include/asm-generic/qrwlock.h:101:53: warning: passing 'u32 *' (aka
'unsigned int *') to parameter of type 'int *' converts between pointers
to integer types with different sign [-Wpointer-sign]
if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
^~~~~
./include/linux/compiler.h:76:40: note: expanded from macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
./include/asm-generic/atomic-instrumented.h:69:66: note: passing
argument to parameter 'old' here
static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
^
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20181013010713.6999-1-natechancellor@gmail.com
Xen PVH guests receive the address of the RSDP table from Xen. In order
to support booting a Xen PVH guest via Grub2 using the standard x86
boot entry we need a way for Grub2 to pass the RSDP address to the
kernel.
For this purpose expand the struct setup_header to hold the physical
address of the RSDP address. Being zero means it isn't specified and
has to be located the legacy way (searching through low memory or
EBDA).
While documenting the new setup_header layout and protocol version
2.14 add the missing documentation of protocol version 2.13.
There are Grub2 versions in several distros with a downstream patch
violating the boot protocol by writing past the end of setup_header.
This requires another update of the boot protocol to enable the kernel
to distinguish between a specified RSDP address and one filled with
garbage by such a broken Grub2.
From protocol 2.14 on Grub2 will write the version it is supporting
(but never a higher value than found to be supported by the kernel)
ored with 0x8000 to the version field of setup_header. This enables
the kernel to know up to which field Grub2 has written information
to. All fields after that are supposed to be clobbered.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: bp@alien8.de
Cc: corbet@lwn.net
Cc: linux-doc@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit
1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
can occasionally cause system resets when kexec-ing a second kernel even
if SEV is not active.
That's because get_sev_encryption_bit() uses 32-bit rIP-relative
addressing to read the value of enc_bit - a variable which caches a
previously detected encryption bit position - but kexec may allocate
the early boot code to a higher location, beyond the 32-bit addressing
limit.
In this case, garbage will be read and get_sev_encryption_bit() will
return the wrong value, leading to accessing memory with the wrong
encryption setting.
Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
rIP-relative addressing in the first place.
[ bp: massage commit message heavily. ]
Fixes: 1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: brijesh.singh@amd.com
Cc: kexec@lists.infradead.org
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: ghook@redhat.com
Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
We currently align the end of the compressed image to a multiple of
16. However, the PE-COFF header included in the EFI stub says that
the file alignment is 32 bytes, and when adding an EFI signature to
the file it must first be padded to this alignment.
sbsigntool commands warn about this:
warning: file-aligned section .text extends beyond end of file
warning: checksum areas are greater than image size. Invalid section table?
Worse, pesign -at least when creating a detached signature- uses the
hash of the unpadded file, resulting in an invalid signature if
padding is required.
Avoid both these problems by increasing alignment to 32 bytes when
CONFIG_EFI_STUB is enabled.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Before this commit we were only calling efi_parse_options() from
make_boot_params(), but make_boot_params() only gets called if the
kernel gets booted directly as an EFI executable. So when booted through
e.g. grub we ended up not parsing the commandline in the boot code.
This makes the drivers/firmware/efi/libstub code ignore the "quiet"
commandline argument resulting in the following message being printed:
"EFI stub: UEFI Secure Boot is enabled."
Despite the quiet request. This commits adds an extra call to
efi_parse_options() to efi_main() to make sure that the options are
always processed. This fixes quiet not working.
This also fixes the libstub code ignoring nokaslr and efi=nochunk.
Reported-by: Peter Robinson <pbrobinson@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
It's not used by its sole user, so remove this unused functionality.
Also remove a stray unused variable that GCC didn't warn about for some reason.
Suggested-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kirill.shutemov@linux.intel.com
Link: http://lkml.kernel.org/r/20180807015705.21697-1-fanc.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>