Commit Graph

6071 Commits

Author SHA1 Message Date
Vasily Gorbik
ac1256f826 s390/kasan: reipl and kexec support
Some functions from both arch/s390/kernel/ipl.c and
arch/s390/kernel/machine_kexec.c are called without DAT enabled
(or with and without DAT enabled code paths). There is no easy way
to partially disable kasan for those files without a substantial
rework. Disable kasan for both files for now.

To avoid disabling kasan for arch/s390/kernel/diag.c DAT flag is
enabled in diag308 call. pcpu_delegate which disables DAT is marked
with __no_sanitize_address to disable instrumentation for that one
function.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:27 +02:00
Vasily Gorbik
9e8df6daed s390/smp: kasan stack instrumentation support
smp_start_secondary function is called without DAT enabled. To avoid
disabling kasan instrumentation for entire arch/s390/kernel/smp.c
smp_start_secondary has been split in 2 parts. smp_start_secondary has
instrumentation disabled, it does minimal setup and enables DAT. Then
instrumentated __smp_start_secondary is called to do the rest.

__load_psw_mask function instrumentation has been disabled as well
to be able to call it from smp_start_secondary.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:26 +02:00
Vasily Gorbik
d58106c3ec s390/kasan: use noexec and large pages
To lower memory footprint and speed up kasan initialisation detect
EDAT availability and use large pages if possible. As we know how
much memory is needed for initialisation, another simplistic large
page allocator is introduced to avoid memory fragmentation.

Since facilities list is retrieved anyhow, detect noexec support and
adjust pages attributes. Handle noexec kernel option to avoid inconsistent
kasan shadow memory pages flags.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:24 +02:00
Vasily Gorbik
793213a82d s390/kasan: dynamic shadow mem allocation for modules
Move from modules area entire shadow memory preallocation to dynamic
allocation per module load.

This behaivior has been introduced for x86 with bebf56a1b: "This patch
also forces module_alloc() to return 8*PAGE_SIZE aligned address making
shadow memory handling ( kasan_module_alloc()/kasan_module_free() )
more simple. Such alignment guarantees that each shadow page backing
modules address space correspond to only one module_alloc() allocation"

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:23 +02:00
Vasily Gorbik
0dac8f6bc3 s390/mm: add kasan shadow to the debugfs pgtable dump
This change adds address space markers for kasan shadow memory.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:22 +02:00
Vasily Gorbik
b6cbe3e8bd s390/kasan: avoid user access code instrumentation
Kasan instrumentation adds "store" check for variables marked as
modified by inline assembly. With user pointers containing addresses
from another address space this produces false positives.

static inline unsigned long clear_user_xc(void __user *to, ...)
{
	asm volatile(
	...
	: "+a" (to) ...

User space access functions are wrapped by manually instrumented
functions in kasan common code, which should be sufficient to catch
errors. So, we just disable uaccess.o instrumentation altogether.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:21 +02:00
Vasily Gorbik
7fef92ccad s390/kasan: double the stack size
Kasan stack instrumentation pads stack variables with redzones, which
increases stack frames size significantly. Stack sizes are increased
from 16k to 32k in the code, as well as for the kernel stack overflow
detection option (CHECK_STACK).

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:21 +02:00
Vasily Gorbik
42db5ed860 s390/kasan: add initialization code and enable it
Kasan needs 1/8 of kernel virtual address space to be reserved as the
shadow area. And eventually it requires the shadow memory offset to be
known at compile time (passed to the compiler when full instrumentation
is enabled).  Any value picked as the shadow area offset for 3-level
paging would eat up identity mapping on 4-level paging (with 1PB
shadow area size). So, the kernel sticks to 3-level paging when kasan
is enabled. 3TB border is picked as the shadow offset.  The memory
layout is adjusted so, that physical memory border does not exceed
KASAN_SHADOW_START and vmemmap does not go below KASAN_SHADOW_END.

Due to the fact that on s390 paging is set up very late and to cover
more code with kasan instrumentation, temporary identity mapping and
final shadow memory are set up early. The shadow memory mapping is
later carried over to init_mm.pgd during paging_init.

For the needs of paging structures allocation and shadow memory
population a primitive allocator is used, which simply chops off
memory blocks from the end of the physical memory.

Kasan currenty doesn't track vmemmap and vmalloc areas.

Current memory layout (for 3-level paging, 2GB physical memory).

---[ Identity Mapping ]---
0x0000000000000000-0x0000000000100000
---[ Kernel Image Start ]---
0x0000000000100000-0x0000000002b00000
---[ Kernel Image End ]---
0x0000000002b00000-0x0000000080000000        2G <- physical memory border
0x0000000080000000-0x0000030000000000     3070G PUD I
---[ Kasan Shadow Start ]---
0x0000030000000000-0x0000030010000000      256M PMD RW X  <- shadow for 2G memory
0x0000030010000000-0x0000037ff0000000   523776M PTE RO NX <- kasan zero ro page
0x0000037ff0000000-0x0000038000000000      256M PMD RW X  <- shadow for 2G modules
---[ Kasan Shadow End ]---
0x0000038000000000-0x000003d100000000      324G PUD I
---[ vmemmap Area ]---
0x000003d100000000-0x000003e080000000
---[ vmalloc Area ]---
0x000003e080000000-0x000003ff80000000
---[ Modules Area ]---
0x000003ff80000000-0x0000040000000000        2G

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:20 +02:00
Vasily Gorbik
d0e2eb0a36 s390: add pgd_page primitive
Add pgd_page primitive which is required by kasan common code.
Also fixes typo in p4d_page definition.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:19 +02:00
Vasily Gorbik
34377d3cfb s390: introduce MAX_PTRS_PER_P4D
Kasan common code requires MAX_PTRS_PER_P4D definition, which in case
of s390 is always PTRS_PER_P4D.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:18 +02:00
Vasily Gorbik
fb594ec13e s390/kasan: replace some memory functions
Follow the common kasan approach:

    "KASan replaces memory functions with manually instrumented
    variants.  Original functions declared as weak symbols so strong
    definitions in mm/kasan/kasan.c could replace them. Original
    functions have aliases with '__' prefix in name, so we could call
    non-instrumented variant if needed."

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:18 +02:00
Vasily Gorbik
0a9b40911b s390/kasan: avoid instrumentation of early C code
Instrumented C code cannot run without the kasan shadow area. Exempt
source code files from kasan which are running before / used during
kasan initialization.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:17 +02:00
Vasily Gorbik
3484984585 s390/kasan: avoid vdso instrumentation
vdso is mapped into user space processes, which won't have kasan
shodow mapped.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:16 +02:00
Vasily Gorbik
75f195420a s390/mm: add missing pfn_to_kaddr helper
kasan common code uses pfn_to_kaddr, which is defined by many other
architectures. Adding it as well to avoid a build error.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:15 +02:00
Vasily Gorbik
49698745e5 s390: move ipl block and cmd line handling to early boot phase
To distinguish zfcpdump case and to be able to parse some of the kernel
command line arguments early (e.g. mem=) ipl block retrieval and command
line construction code is moved to the early boot phase.

"memory_end" is set up correctly respecting "mem=" and hsa_size in case
of the zfcpdump.

arch/s390/boot/string.c is introduced to provide string handling and
command line parsing functions to early boot phase code for the compressed
kernel image case.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:14 +02:00
Vasily Gorbik
b09decfd99 s390/sclp: introduce sclp_early_get_hsa_size
Introduce sclp_early_get_hsa_size function to be used during early
memory detection. This function allows to find a memory limit imposed
during zfcpdump.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:13 +02:00
Vasily Gorbik
f01b8bca08 s390/mem_detect: add info source debug print
Print mem_detect info source when memblock=debug is specified.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:13 +02:00
Vasily Gorbik
54c57795e8 s390/mem_detect: replace tprot loop with binary search
In a situation when other memory detection methods are not available
(no SCLP and no z/VM diag260), continuous online memory is assumed.
Replacing tprot loop with faster binary search, as only online memory
end has to be found.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:12 +02:00
Vasily Gorbik
cd45c99561 s390/mem_detect: use SCLP info for continuous memory detection
When neither SCLP storage info, nor z/VM diag260 "storage configuration"
are available assume a continuous online memory of size specified by
SCLP info.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:11 +02:00
Vasily Gorbik
6e98e64329 s390/mem_detect: introduce z/VM specific diag260 call
In the case when z/VM memory is defined with "define storage config"
command, SCLP storage info is not available. Utilize diag260 "storage
configuration" call, to get information about z/VM specific guest memory
definitions with potential memory holes.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:10 +02:00
Vasily Gorbik
fddbaa5c42 s390/mem_detect: introduce SCLP storage info
SCLP storage info allows to detect continuous and non-continuous online
memory under LPAR, z/VM and KVM, when standby memory is defined.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:09 +02:00
Vasily Gorbik
251b72a440 s390: introduce .boot.data section compile time validation
Make sure that .boot.data sections of vmlinux and
arch/s390/compressed/vmlinux match before producing the compressed kernel
image. Symbols presence, order and sizes are cross-checked.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:08 +02:00
Vasily Gorbik
6966d604e2 s390/mem_detect: move tprot loop to early boot phase
Move memory detection to early boot phase. To store online memory
regions "struct mem_detect_info" has been introduced together with
for_each_mem_detect_block iterator. mem_detect_info is later converted
to memblock.

Also introduces sclp_early_get_meminfo function to get maximum physical
memory and maximum increment number.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:08 +02:00
Vasily Gorbik
17aacfbfa1 s390/sclp: move sclp_early_read_info to sclp_early_core.c
To enable early online memory detection sclp_early_read_info has
been moved to sclp_early_core.c. sclp_info_sccb has been made a part
of .boot.data, which allows to reuse it later during early kernel
startup and make sclp_early_read_info call just once.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:07 +02:00
Vasily Gorbik
d1b52a4388 s390: introduce .boot.data section
Introduce .boot.data section which is "shared" between the decompressor
code and the decompressed kernel. The decompressor will store values in
it, and copy over to the decompressed image before starting it. This
method allows to avoid using pre-defined addresses and other hacks to
pass values between those boot phases.

.boot.data section is a part of init data, and will be freed after kernel
initialization is complete.

For uncompressed kernel image, .boot.data section is basically the same
as .init.data

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:06 +02:00
Vasily Gorbik
7516fc11e4 s390/decompressor: clean up and rename compressed/misc.c
Since compressed/misc.c is conditionally compiled move error reporting
code to boot/main.c. With that being done compressed/misc.c has no
"miscellaneous" functions left and is all about plain decompression
now. Rename it accordingly.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:05 +02:00
Vasily Gorbik
15426ca43d s390: rescue initrd as early as possible
To avoid multi-stage initrd rescue operation and to simplify
assumptions during early memory allocations move initrd at some final
safe destination as early as possible. This would also allow us to
drop .bss usage restrictions for some files.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:05 +02:00
Vasily Gorbik
a2ac1bb1f3 s390/decompressor: get rid of .bss usage
Using .bss in early code should be avoided. It might overlay initrd
image or not yet be initialized. Clean up the last couple of places in
the decompressor's code where .bss is used and enfore no .bss usage
check on boot/compressed/misc.c. In particular:
- initializing free_mem_ptr and free_mem_end_ptr with values guarantee
that these variables won't end up in the .bss section.
- define STATIC_RW_DATA to go into .data section.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:03 +02:00
Vasily Gorbik
369f91c374 s390/decompressor: rework uncompressed image info collection
The kernel decompressor has to know several bits of information about
uncompressed image. Currently this info is collected by running "nm" on
uncompressed vmlinux + "sed" and producing sizes.h file. This method
worked well, but it has several disadvantages. Obscure symbols name
pattern matching is fragile. Adding new values makes pattern even
longer. Logic is spread across code and make file. Limited ability to
adjust symbols values (currently magic lma value of 0x100000 is always
subtracted). Apart from that same pieces of information (and more)
would be needed for early memory detection and features like KASLR
outside of boot/compressed/ folder where sizes.h is generated.

To overcome limitations new "struct vmlinux_info" has been introduced
to include values needed for the decompressor and the rest of the
boot code. The only static instance of vmlinux_info is produced during
vmlinux link step by filling in struct fields by the linker (like it is
done with input_data in boot/compressed/vmlinux.scr.lds.S). This way
individual values could be adjusted with all the knowledge linker has
and arithmetic it supports. Later .vmlinux.info section (which contains
struct vmlinux_info) is transplanted into the decompressor image and
dropped from uncompressed image altogether.

While doing that replace "compressed/vmlinux.scr.lds.S" linker
script (whose purpose is to rename .data section in piggy.o to
.rodata.compressed) with plain objcopy command. And simplify
decompressor's linker script.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:02 +02:00
Vasily Gorbik
8f75582a2f s390: remove decompressor's head.S
Decompressor's head.S provided "data mover" sole purpose of which has
been to safely move uncompressed kernel at 0x100000 and jump to it.

With current bzImage layout entire decompressor's code guaranteed to be
in a safe location under 0x100000, and hence could not be overwritten
during kernel move. For that reason head.S could be replaced with simple
memmove function. To do so introduce early boot code phase which is
executed from arch/s390/boot/head.S after "verify_facilities" and takes
care of optional kernel image decompression and transition to it.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:21:02 +02:00
Vasily Gorbik
32ce55a659 s390: unify stack size definitions
Remove STACK_ORDER and STACK_SIZE in favour of identical THREAD_SIZE_ORDER
and THREAD_SIZE definitions. THREAD_SIZE and THREAD_SIZE_ORDER naming is
misleading since it is used as general kernel stack size information. But
both those definitions are used in the common code and throughout
architectures specific code, so changing the naming is problematic.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:58 +02:00
Martin Schwidefsky
ce3dc44749 s390: add support for virtually mapped kernel stacks
With virtually mapped kernel stacks the kernel stack overflow detection
is now fault based, every stack has a guard page in the vmalloc space.
The panic_stack is renamed to nodat_stack and is used for all function
that need to run without DAT, e.g. memcpy_real or do_start_kdump.

The main effect is a reduction in the kernel image size as with vmap
stacks the old style overflow checking that adds two instructions per
function is not needed anymore. Result from bloat-o-meter:

add/remove: 20/1 grow/shrink: 13/26854 up/down: 2198/-216240 (-214042)

In regard to performance the micro-benchmark for fork has a hit of a
few microseconds, allocating 4 pages in vmalloc space is more expensive
compare to an order-2 page allocation. But with real workload I could
not find a noticeable difference.

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:57 +02:00
Martin Schwidefsky
ff340d2472 s390: add stack switch helper
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:56 +02:00
Martin Schwidefsky
00e9e6645a s390/pfault: do not use stack buffers for hardware data
With CONFIG_VMAP_STACK=y the stack is allocated from the vmalloc space.
Data structures passed to a hardware or a hypervisor interface that
requires V=R can not be allocated on the stack anymore.

Make the init and fini pfault parameter blocks static variables.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:54 +02:00
Martin Schwidefsky
8ef9eda018 s390/hypfs: do not use stack buffers for hardware data
With CONFIG_VMAP_STACK=y the stack is allocated from the vmalloc space.
Data structures passed to a hardware or a hypervisor interface that
requires V=R can not be allocated on the stack anymore.

Use kmalloc to get memory for the hypsfs_diag304 structure.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:53 +02:00
Martin Schwidefsky
d36a928139 s390/appldata: do not use stack buffers for hardware data
With CONFIG_VMAP_STACK=y the stack is allocated from the vmalloc space.
Data structures passed to a hardware or a hypervisor interface that
requires V=R can not be allocated on the stack anymore.

Use kmalloc to get memory for the appldata_product_id and the
appldata_parameter_list structures.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:52 +02:00
Martin Schwidefsky
f689789a28 s390/appldata: pass parameter list pointer to appldata_asm
In preparation for CONFIG_VMAP_STACK=y move the allocation of the
struct appldata_parameter_list to the caller of appldata_asm().

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09 11:20:50 +02:00
Julian Wiedmann
346e485d42 s390/ccwgroup: add get_ccwgroupdev_by_busid()
Provide function to find a ccwgroup device by its busid.

Acked-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-08 09:09:59 +02:00
Harald Freudenberger
00fab2350e s390/zcrypt: multiple zcrypt device nodes support
This patch is an extension to the zcrypt device driver to provide,
support and maintain multiple zcrypt device nodes. The individual
zcrypt device nodes can be restricted in terms of crypto cards,
domains and available ioctls. Such a device node can be used as a
base for container solutions like docker to control and restrict
the access to crypto resources.

The handling is done with a new sysfs subdir /sys/class/zcrypt.
Echoing a name (or an empty sting) into the attribute "create" creates
a new zcrypt device node. In /sys/class/zcrypt a new link will appear
which points to the sysfs device tree of this new device. The
attribute files "ioctlmask", "apmask" and "aqmask" in this directory
are used to customize this new zcrypt device node instance. Finally
the zcrypt device node can be destroyed by echoing the name into
/sys/class/zcrypt/destroy. The internal structs holding the device
info are reference counted - so a destroy will not hard remove a
device but only marks it as removable when the reference counter drops
to zero.

The mask values are bitmaps in big endian order starting with bit 0.
So adapter number 0 is the leftmost bit, mask is 0x8000...  The sysfs
attributes accept 2 different formats:
* Absolute hex string starting with 0x like "0x12345678" does set
  the mask starting from left to right. If the given string is shorter
  than the mask it is padded with 0s on the right. If the string is
  longer than the mask an error comes back (EINVAL).
* Relative format - a concatenation (done with ',') of the
  terms +<bitnr>[-<bitnr>] or -<bitnr>[-<bitnr>]. <bitnr> may be any
  valid number (hex, decimal or octal) in the range 0...255. Here are
  some examples:
    "+0-15,+32,-128,-0xFF"
    "-0-255,+1-16,+0x128"
    "+1,+2,+3,+4,-5,-7-10"

A simple usage examples:

  # create new zcrypt device 'my_zcrypt':
  echo "my_zcrypt" >/sys/class/zcrypt/create
  # go into the device dir of this new device
  echo "my_zcrypt" >create
  cd my_zcrypt/
  ls -l
  total 0
  -rw-r--r-- 1 root root 4096 Jul 20 15:23 apmask
  -rw-r--r-- 1 root root 4096 Jul 20 15:23 aqmask
  -r--r--r-- 1 root root 4096 Jul 20 15:23 dev
  -rw-r--r-- 1 root root 4096 Jul 20 15:23 ioctlmask
  lrwxrwxrwx 1 root root    0 Jul 20 15:23 subsystem -> ../../../../class/zcrypt
  ...
  # customize this zcrypt node clone
  # enable only adapter 0 and 2
  echo "0xa0" >apmask
  # enable only domain 6
  echo "+6" >aqmask
  # enable all 256 ioctls
  echo "+0-255" >ioctls
  # now the /dev/my_zcrypt may be used
  # finally destroy it
  echo "my_zcrypt" >/sys/class/zcrypt/destroy

Please note that a very similar 'filtering behavior' also applies to
the parent z90crypt device. The two mask attributes apmask and aqmask
in /sys/bus/ap act the very same for the z90crypt device node. However
the implementation here is totally different as the ap bus acts on
bind/unbind of queue devices and associated drivers but the effect is
still the same. So there are two filters active for each additional
zcrypt device node: The adapter/domain needs to be enabled on the ap
bus level and it needs to be active on the zcrypt device node level.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-08 09:09:58 +02:00
Julian Wiedmann
ccc413f621 s390/qdio: clean up AOB handling
I've stumbled over this too many times now... AOBs are only ever used on
Output Queues. So in qdio_kick_handler(), move the call to their handler
into the Output-only path, and get rid of the convoluted contains_aobs()
helper. No functional change.

While at it, also remove
1. the unused sbal_state->aob field. For processing an async completion,
   upper-layer drivers get their AOB pointer from the CQ buffer.
2. an unused EXPORT for qdio_allocate_aob(). External users would have
   no way of passing an allocated AOB back into qdio.ko anyways...

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:29 +02:00
Vasily Gorbik
4e62d45885 s390: clean up stacks setup
Replace hard coded stack frame overhead values with STACK_FRAME_OVERHEAD
definition. Avoid unnecessary arithmetic instructions.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:29 +02:00
Vasily Gorbik
26f4414a45 s390/vdso: correct CFI annotations of vDSO functions
Correct stack frame overhead for 31-bit vdso, which should be 96 rather
then 160. This is done by reusing STACK_FRAME_OVERHEAD definition which
contains correct value based on build flags. This fixes stack unwinding
within vdso code for 31-bit processes. While at it replace all hard coded
stack frame overhead values with the same definition in vdso64 as well.

Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:29 +02:00
Vasily Gorbik
d1befa6582 s390/vdso: avoid 64-bit vdso mapping for compat tasks
vdso_fault used is_compat_task function (on s390 it tests "current"
thread_info flags) to distinguish compat tasks and map 31-bit vdso
pages. But "current" task might not correspond to mm context.

When 31-bit compat inferior is executed under gdb, gdb does
PTRACE_PEEKTEXT on vdso page, causing vdso_fault with "current" being
64-bit gdb process. So, 31-bit inferior ends up with 64-bit vdso mapped.

To avoid this problem a new compat_mm flag has been introduced into
mm context. This flag is used in vdso_fault and vdso_mremap instead
of is_compat_task.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:29 +02:00
Martin Schwidefsky
8e5a7627b5 s390: add initial 64-bit restart PSW
To be able to start a kernel image loaded into memory with a
PSW restart, place a 64-bit restart PSW at 0x1a0 in absolute
lowcore.

Suggested-by: Dominik Klein <dominik.klein@linux.ibm.com>
Tested-by: Dominik Klein <dominik.klein@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:28 +02:00
Jan Höppner
6779df406b s390/sclp: Allow to request adapter reset
The SCLP event 24 "Adapter Error Notification" supports three different
action qualifier of which 'adapter reset' is currently not enabled in
the sysfs interface. However, userspace tools might want to be able
to use the reset functionality as well. Enable the 'adapter reset'
qualifier.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:27 +02:00
Gerald Schaefer
55a5542a54 s390/hibernate: fix error handling when suspend cpu != resume cpu
The resume code checks if the resume cpu is the same as the suspend cpu.
If not, and if it is also not possible to switch to the suspend cpu, an
error message should be printed and the resume process should be stopped
by loading a disabled wait psw.

The current logic is broken in multiple ways, the message is never printed,
and the disabled wait psw never loaded because the kernel panics before that:
- sam31 and SIGP_SET_ARCHITECTURE to ESA mode is wrong, this will break
  on the first 64bit instruction in sclp_early_printk().
- The init stack should be used, but the stack pointer is not set up correctly
  (missing aghi %r15,-STACK_FRAME_OVERHEAD).
- __sclp_early_printk() checks the sclp_init_state. If it is not
  sclp_init_state_uninitialized, it simply returns w/o printing anything.
  In the resumed kernel however, sclp_init_state will never be uninitialized.

This patch fixes those issues by removing the sam31/ESA logic, adding a
correct init stack pointer, and also introducing sclp_early_printk_force()
to allow using sclp_early_printk() even when sclp_init_state is not
uninitialized.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:23 +02:00
Linus Torvalds
1d176582c7 s390 fixes for 4.19-rc4
One fix for the zcrypt driver to correctly handle incomplete
 encryption/decryption operations.
 
 A cleanup for the aqmask/apmask parsing to avoid variable
 length arrays on the stack.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbmkYUAAoJEDjwexyKj9rgrugH/21Uf1S0mkkRfDeYxD6lJva6
 zNmEZ3V+GXc8L/0CisFWQOcIU1fO+jozp9HPGkQxeTvAuqIfVhBRVoMMxiTRaZb3
 xdSBel8EvAGxlZq/6eq6fU59HPWGm+N53rC9J5MMQQqgpSmq8F2QeO5CoidflRh8
 bdio9cliLsjPu+3P2JU3noolhb/f577J3dgP4gKARRpOfh8vUI7NLU3Mham+7886
 ASjG8s/zr9spPnrErJusloOLDJt4M94J8KrIbB/WAT1wZv7GxClaGsCoyCuk4cDt
 TV8zIgMK9TChedwMOO0T8WMaxq+XJV+iI0dC1eZ7xNUlaIBw+UbifxCG335L3E0=
 =dHq1
 -----END PGP SIGNATURE-----

Merge tag 's390-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

 - One fix for the zcrypt driver to correctly handle incomplete
   encryption/decryption operations.

 - A cleanup for the aqmask/apmask parsing to avoid variable length
   arrays on the stack.

* tag 's390-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: remove VLA usage from the AP bus
  s390/crypto: Fix return code checking in cbc_paes_crypt()
2018-09-13 16:22:24 -10:00
Janosch Frank
df88f3181f KVM: s390: Properly lock mm context allow_gmap_hpage_1m setting
We have to do down_write on the mm semaphore to set a bitfield in the
mm context.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a4499382 ("KVM: s390: Add huge page enablement control")
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-09-04 11:40:26 +02:00
Pierre Morel
204c972456 KVM: s390: vsie: copy wrapping keys to right place
Copy the key mask to the right offset inside the shadow CRYCB

Fixes: bbeaa58b3 ("KVM: s390: vsie: support aes dea wrapping keys")
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Cc: stable@vger.kernel.org # v4.8+
Message-Id: <1535019956-23539-2-git-send-email-pmorel@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-09-04 11:26:11 +02:00
Janosch Frank
a11bdb1a6b KVM: s390: Fix pfmf and conditional skey emulation
We should not return with a lock.
We also have to increase the address when we do page clearing.

Fixes: bd096f6443 ("KVM: s390: Add skey emulation fault handling")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20180830081355.59234-1-frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-09-04 11:24:43 +02:00