diag308 subcode 0 performs a clear reset which includes the reset of
all registers in the system. While this is the preferred behavior when
loading a normal kernel via kexec, it prevents the crash kernel from
storing the register values in the dump. To prevent this, use subcode 1
when loading a crash kernel instead.
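A minimal sketch of that choice (illustrative only, not the actual
loader code; the helper name is hypothetical):

#include <linux/types.h>

/* Sketch only: pick the diag308 subcode depending on the kind of
 * kernel being loaded.  The helper name is hypothetical. */
static unsigned int ipl_subcode_for(bool crash_kernel)
{
	/* subcode 0: clear reset - wipes all register contents */
	/* subcode 1: normal reset - keeps registers for the dump */
	return crash_kernel ? 1 : 0;
}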
Fixes: ee337f5469 ("s390/kexec_file: Add crash support to image loader")
Cc: <stable@vger.kernel.org> # 4.17
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Reported-by: Xiaoying Yan <yiyan@redhat.com>
Tested-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
As part of removing the broken power management support from the s390
architecture, remove the pm callbacks from the ccw bus driver. The
power management functions have been unused since commit 394216275c
("s390: remove broken hibernate / power management support").
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
clang W=1 warns about missing prototypes:
>> arch/s390/kernel/vdso64/getcpu.c:8:5: warning: no previous prototype for function '__s390_vdso_getcpu' [-Wmissing-prototypes]
int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
^
Add a local header file in order to get rid of this warning.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With the addition of more complete SR-IOV support, we are recommending
that distributions raise this limit to 512, as the previous default of
128 can easily be hit with just the VFs of a single PCI physical
function.
With at least one distribution now shipping with this, supporting only
one fourth as many PCI functions on a default upstream build may lead
to confusion and reduced testing of the higher limit, so increase the
default to 512.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Implement the previously removed getcpu vdso syscall by using the
TOD programmable field to pass the cpu number to user space.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Verify on every exit to user space that
- the primary ASCE (cr1) is set to the kernel ASCE
- the secondary ASCE (cr7) is set to the user ASCE
If this is not the case, panic since something went terribly wrong.
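A sketch of such a check, assuming the s390 __ctl_store() macro and the
lowcore ASCE fields; the function name is illustrative and the real
check lives in the exit path:

#include <linux/kernel.h>
#include <asm/ctl_reg.h>
#include <asm/lowcore.h>

/* Sketch only: check %cr1/%cr7 right before returning to user space. */
static void check_user_asce(void)
{
	unsigned long cr1, cr7;

	__ctl_store(cr1, 1, 1);		/* primary ASCE */
	__ctl_store(cr7, 7, 7);		/* secondary ASCE */
	if (cr1 != S390_lowcore.kernel_asce || cr7 != S390_lowcore.user_asce)
		panic("ASCEs corrupted on exit to user space");
}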
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Create a region 3 page table which contains only invalid entries, and
use it via "s390_invalid_asce" instead of the kernel ASCE whenever
- no user address space is available, e.g. during early startup
- address spaces are switched and an intermediate ASCE is needed
This makes sure that user space accesses in such situations are
guaranteed to fail.
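A sketch of how such an ASCE could be set up, assuming the existing
crst_table_init() helper and the region 3 constants; the variable
placement and function name are illustrative:

#include <linux/init.h>
#include <asm/pgalloc.h>
#include <asm/pgtable.h>

/* 16KB-aligned backing store for the region 3 table (2048 entries). */
static unsigned long invalid_pg_dir[2048] __aligned(4 * PAGE_SIZE);
unsigned long s390_invalid_asce;

/* Sketch only: fill the table with invalid entries and build the ASCE. */
static void __init setup_invalid_asce(void)
{
	crst_table_init(invalid_pg_dir, _REGION3_ENTRY_EMPTY);
	s390_invalid_asce = __pa(invalid_pg_dir) | _ASCE_TYPE_REGION3 |
			    _ASCE_TABLE_LENGTH;
}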
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove set_fs support from s390. While doing this, rework address space
handling and simplify it. As a result address spaces are now set up
like this:
CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
----------------------------|-----------|-----------|-----------
user space | user | user | kernel
kernel, normal execution | kernel | user | kernel
kernel, kvm guest execution | gmap | user | kernel
To achieve this the getcpu vdso syscall is removed in order to avoid
secondary address mode and a separate vdso address space for user
space. The getcpu vdso syscall will be implemented differently with a
subsequent patch.
The kernel always accesses user space via the secondary address space.
This happens in different ways:
- with mvcos in home space mode, directly reading from / writing to the
secondary address space
- with mvcs/mvcp in primary space mode, copying from primary space to
secondary space or vice versa
- with e.g. cs in secondary space mode, accessing the secondary space
Switching translation modes happens with sacf before and after
instructions which access user space, like before.
Lazy handling of control register reloading is removed in the hope of
making everything simpler, but at the cost of making kernel entry and
exit a bit slower. That is: on kernel entry the primary asce is always
changed to contain the kernel asce, and on kernel exit the primary
asce is changed again so it contains the user asce.
In kernel mode there is only one exception to the primary asce: when
kvm guests are executed the primary asce contains the gmap asce (which
describes the guest address space). The primary asce is reset to the
kernel asce whenever kvm guest execution is interrupted, so that this
doesn't have to be taken into account for any user space accesses.
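A conceptual sketch of that entry/exit switching (the real switching
happens in entry.S; this only illustrates which ASCE ends up in %cr1):

#include <asm/ctl_reg.h>
#include <asm/lowcore.h>

/* On kernel entry: the primary ASCE (%cr1) points to the kernel space. */
static inline void load_kernel_asce(void)
{
	__ctl_load(S390_lowcore.kernel_asce, 1, 1);
}

/* On exit to user space: %cr1 is switched back to the user ASCE. */
static inline void load_user_asce(void)
{
	__ctl_load(S390_lowcore.user_asce, 1, 1);
}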
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.
The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.
Cc: <stable@vger.kernel.org> # 5.8
Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Rename some variables and functions to better clarify
what they are and what they do.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Instead of creating the sysfs attributes for the stp root_dev by hand,
pass them to subsys_system_register() as a parameter.
This also ensures that the attributes are available when the KOBJ_ADD
event is raised.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add the kernel command line and the last breaking event address.
The kernel command line is taken from early_command_line and printed
only if the kernel is not running as a protected virtualization guest
and if it has already been initialized from the "COMMAND_LINE".
Linux version 5.10.0-rc3-22794-gecaa72788df0-dirty (gor@tuxmaker) #28 SMP PREEMPT Mon Nov 9 17:41:20 CET 2020
Kernel command line: audit_enable=0 audit=0 selinux=0 crashkernel=296M root=/dev/dasda1 dasd=ec5b
memblock=debug die
Kernel fault: interruption code 0005 ilc:2
PSW : 0000000180000000 0000000000012f92 (parse_boot_command_line+0x27a/0x46c)
R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3
GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 000000000001a65c
000000000000bf60 0000000000000000 00000000000003c0 0000000000000000
0000000000000080 000000000002322d 000000007f29ef20 0000000000efd018
000000000311c000 0000000000010070 0000000000012f82 000000000000bea8
Call Trace:
(sp:000000000000bea8 [<000000000002016e>] 000000000002016e)
sp:000000000000bf18 [<0000000000012408>] startup_kernel+0x88/0x2fc
sp:000000000000bf60 [<00000000000100c4>] startup_normal+0xb0/0xb0
Last Breaking-Event-Address:
[<00000000000135ba>] strcmp+0x22/0x24
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Information printed by print_pgm_check_info() is crucial for
debugging decompressor problems. Printing instruction addresses is
better than nothing, but turns further debugging into a tedious job of
figuring out which functions those addresses correspond to.
This change adds simplistic symbol resolution support and adds %pS
format specifier support to decompressor_printk().
Decompressor symbols list is extracted and sorted with
nm -n -S:
...
0000000000010000 0000000000000014 T startup
0000000000010014 00000000000000b0 t startup_normal
0000000000010180 00000000000000b2 t startup_kdump
...
Then functions are filtered and contracted to a form:
"10000 14 startup\0""10014 b0 startup_normal\0""10180 b2 startup_kdump\0"
...
This makes it trivial to find the beginning of an entry, and names are
0-terminated so they can be used as is. Symbols are binary-searched.
To get the symbols list with final addresses into the decompressor's
image, the same trick as for kallsyms is used: the decompressor's
vmlinux is linked twice.
Symbols are stored in the .decompressor.syms section; its current size
is about 2kb.
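A simplified user-space model of the lookup over that format (linear
scan for clarity; as noted above, the decompressor binary-searches the
same data):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: entries are concatenated "addr size name\0" strings as
 * shown above. */
static const char *find_symbol(const char *syms, const char *syms_end,
			       unsigned long ip, unsigned long *off)
{
	const char *entry = syms;

	while (entry < syms_end) {
		char *end;
		unsigned long addr = strtoul(entry, &end, 16);
		unsigned long size = strtoul(end, &end, 16);
		const char *name = end + 1;	/* skip the blank before the name */

		if (ip >= addr && ip < addr + size) {
			*off = ip - addr;
			return name;
		}
		entry += strlen(entry) + 1;	/* hop to the next entry */
	}
	return NULL;
}

int main(void)
{
	static const char syms[] =
		"10000 14 startup\0" "10014 b0 startup_normal\0"
		"10180 b2 startup_kdump\0";
	unsigned long off;
	const char *name;

	name = find_symbol(syms, syms + sizeof(syms) - 1, 0x10020, &off);
	if (name)
		printf("%s+0x%lx\n", name, off);	/* startup_normal+0xc */
	return 0;
}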
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use SYM_CODE_* annotations for asm functions, so that function lengths
are recognized correctly.
Also, currently most of the startup code is marked as startup_kdump.
Move the misplaced startup_kdump label to where it belongs.
$ nm -n -S arch/s390/boot/compressed/vmlinux
Before:
0000000000010000 T startup
0000000000010010 T startup_kdump
After:
0000000000010000 0000000000000014 T startup
0000000000010014 00000000000000b0 t startup_normal
0000000000010180 00000000000000b2 t startup_kdump
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The decompressor does not have any special debug means. Running the
kernel under qemu with gdb is a helpful but tedious exercise if done
repeatedly. It is also not applicable to debugging under LPAR and z/VM.
One special thing which stands out is a working sclp_early_printk,
which could be used once the kernel switches to 64-bit addressing mode.
But sclp_early_printk does not provide any string formatting
capabilities. Formatting and printing strings without a printk-like
function is not fun, and the lack of such a function means people skimp
on testing and introduce more bugs.
So, finally, provide a decompressor_printk() function which fits on one
screen and trades features for simplicity. It only supports "%s", "%x"
and "%lx" specifiers and zero padding for hex values.
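A user-space sketch of such a minimal formatter under those constraints
(fputs stands in for sclp_early_printk; names and buffer sizes are
illustrative):

#include <stdarg.h>
#include <stdio.h>

/* Stand-in for sclp_early_printk() so this sketch runs in user space. */
static void early_puts(const char *s)
{
	fputs(s, stdout);
}

/* Render "val" as zero-padded hex; "buf" must hold digits + 1 bytes. */
static char *as_hex(char *buf, unsigned long val, int digits)
{
	int i;

	for (i = digits - 1; i >= 0; i--, val >>= 4)
		buf[i] = "0123456789abcdef"[val & 0xf];
	buf[digits] = '\0';
	return buf;
}

/* Minimal printk: only "%s", "%x" (8 hex digits) and "%lx" (16). */
static void mini_printk(const char *fmt, ...)
{
	char out[256], hex[17];
	char *p = out;
	va_list va;

	va_start(va, fmt);
	while (*fmt && p < out + sizeof(out) - 1) {
		const char *s;

		if (fmt[0] == '%' && fmt[1] == 's') {
			s = va_arg(va, const char *);
			fmt += 2;
		} else if (fmt[0] == '%' && fmt[1] == 'x') {
			s = as_hex(hex, va_arg(va, unsigned int), 8);
			fmt += 2;
		} else if (fmt[0] == '%' && fmt[1] == 'l' && fmt[2] == 'x') {
			s = as_hex(hex, va_arg(va, unsigned long), 16);
			fmt += 3;
		} else {
			*p++ = *fmt++;
			continue;
		}
		while (*s && p < out + sizeof(out) - 1)
			*p++ = *s++;
	}
	va_end(va);
	*p = '\0';
	early_puts(out);
}

int main(void)
{
	mini_printk("PSW : %lx (%s)\n", 0x12f92UL, "parse_boot_command_line");
	return 0;
}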
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently the minimal compiler requirement for the kernel is gcc 4.9
or clang 10.0.1.
* gcc -mhotpatch option is supported since 4.8.
* A combination of -pg -mrecord-mcount -mnop-mcount -mfentry flags is
supported since gcc 9 and since clang 10.
Drop support for the old -pg function prologue. This leaves the binary
compatible -mhotpatch / -mnop-mcount -mfentry prologue in the form:
brcl 0,0
which also does not require the initial nop optimization / conversion
and the presence of the _mcount symbol.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently we have to consider too many different values which
in the end only affect identity mapping size. These are:
1. max_physmem_end - end of physical memory online or standby.
Always <= end of the last online memory block (get_mem_detect_end()).
2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
kernel is able to support.
3. "mem=" kernel command line option which limits physical memory usage.
4. OLDMEM_BASE which is a kdump memory limit when the kernel is executed as
crash kernel.
5. "hsa" size which is a memory limit when the kernel is executed during
zfcp/nvme dump.
Throughout kernel startup and runtime we juggle all those values at
once, but that does not bring any amusement, only confusion and
complexity.
Unify all those values to a single one we should really care about,
that is our identity mapping size.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Instead of creating the sysfs attributes for the prng devices by hand,
describe them in .groups and let the miscdevice core handle it.
This also ensures that the attributes are available when the KOBJ_ADD
event is raised.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
System call and program check handler both use the system call exit
path when returning to previous context. However the program check
handler jumps right to the end of the system call exit path if the
previous context is kernel context.
This led to the quite odd double disabling of interrupts in the
system call exit path introduced with commit ce9dfafe29 ("s390:
fix system call exit path").
To avoid that, add a separate program check handler exit path for the
case that the previous context is kernel context.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
As the number of cpus increases, the sccb response can exceed 4k for
the read cpu and read scp info sclp commands. Hence, all cpu info
entries can't be embedded within a single sccb response.
Solution:
To overcome this limitation, the extended sccb facility is provided by
sclp.
1. Check if the extended sccb facility is installed.
2. If the extended sccb facility is installed, perform the read scp and
read cpu commands with a maximum sccb length of three pages (see the
sketch below). This maximum length is based on factors like the maximum
number of cpus and the sccb header.
3. If the extended sccb facility is not installed, perform the read scp
and read cpu sclp commands with a maximum sccb length of one page.
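A sketch of that length selection; sclp_has_extended_sccb() is a
hypothetical stand-in for the real facility check:

#include <linux/types.h>
#include <asm/page.h>

extern bool sclp_has_extended_sccb(void);	/* hypothetical facility check */

/* Sketch only: maximum sccb length for read scp / read cpu info. */
static size_t max_sccb_length(void)
{
	if (sclp_has_extended_sccb())
		return 3 * PAGE_SIZE;	/* room for many cpu info entries */
	return PAGE_SIZE;		/* classic single-page sccb */
}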
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When we're missing the necessary machine facilities, zPCI cannot
function. Until now it would silently fail to be initialized; add an
informational print for this case.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This file is installed by the s390 CPU Measurement sampling
facility device driver to export supported minimum and
maximum sample buffer sizes.
This file is read by the lscpumf tool to display the details
of the device driver capabilities. The lscpumf tool might
be invoked by a non-root user. In this case it does not
print anything because the file contents cannot be read.
Fix this by allowing read access for all users. Reading
the file contents is ok; changing the file contents is
left to the root user only.
For further reference and details see:
[1] https://github.com/ibm-s390-tools/s390-tools/issues/97
Fixes: 69f239ed33 ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
Cc: <stable@vger.kernel.org> # 3.14
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
And move it earlier in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
It relies on the _REGION1_SHIFT / _REGION2_SHIFT values which come from
asm/pgtable.h, so include it.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Kasan early code only works on init_mm, so remove the unneeded pgd
parameter from kasan_copy_shadow() and rename it to
kasan_copy_shadow_mapping().
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Kasan has nothing to do with vmemmap, so strip vmemmap from the
function names to avoid confusing people.
Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Fixes the following warning with CONFIG_KERNEL_UNCOMPRESSED=y
arch/s390/boot/compressed/decompressor.h:6:46: warning: non-void function
does not return a value [-Wreturn-type]
static inline void *decompress_kernel(void) {}
^
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To make sure that the vmalloc area size is large enough for almost all
cases, let it depend on the (potential) physical memory size. It is
still possible to override this with the vmalloc kernel command line
parameter.
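A sketch of the sizing logic; the minimum size and the scaling fraction
below are illustrative assumptions, not the actual values:

/* Sketch only - the constant and the fraction are illustrative. */
#define VMALLOC_MIN_SIZE	(128UL << 30)	/* assumed lower bound */

static unsigned long vmalloc_size_for(unsigned long physmem_size,
				      unsigned long cmdline_size)
{
	unsigned long size;

	if (cmdline_size)		/* "vmalloc=" on the command line wins */
		return cmdline_size;
	size = physmem_size / 2;	/* assumed scaling with memory size */
	return size > VMALLOC_MIN_SIZE ? size : VMALLOC_MIN_SIZE;
}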
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
We've seen several occurrences in the past where the default vmalloc
size of 128GB is not sufficient. Therefore extend the default size.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Since commit 29d37e5b82 ("s390/protvirt: add ultravisor initialization")
vmax is adjusted to the ultravisor secure storage limit. This limit is
currently applied when 4-level paging is used. Later vmax is also used
to align the vmemmap address to the top region table entry border. When
vmax is set to the ultravisor secure storage limit this is no longer
the case.
Instead of changing vmax, make only MODULES_END be affected by the
secure storage limit, so that vmax stays intact for further vmemmap
address alignment.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Compiling the kernel with Kasan disables automatic 3-level vs 4-level
kernel space paging selection, because the shadow memory offset has
to be known at compile time and there is no such offset which would be
acceptable for both 3 and 4-level paging. Instead the
S390_4_LEVEL_PAGING option was introduced which allowed picking how
many paging levels to use under Kasan.
With the introduction of protected virtualization, the kernel memory
layout may be affected due to the ultravisor secure storage limit.
This adds additional complexity to how the memory layout looks in
combination with Kasan's predefined shadow memory offsets. To simplify
this, make Kasan 4-level paging the default and remove Kasan 3-level
paging support.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390_base_ext_handler_fn hasn't been used since its introduction in
commit ab14de6c37 ("[S390] Convert memory detection into C code.").
s390_base_ext_handler itself is currently falsely storing 16 registers
at __LC_SAVE_AREA_ASYNC, overwriting several following lowcore values:
cpu_flags, return_psw, return_mcck_psw, sync_enter_timer and
async_enter_timer.
Besides that, s390_base_ext_handler itself is only potentially hiding
EXT interrupts which should not have happened in the first place. Any
piece of code which requires EXT interrupts before the fully functional
ext_int_handler is enabled has to handle them on its own, like
sclp_early_cmd() does by handling EXT interrupts synchronously in
sclp_early_wait_irq().
With s390_base_ext_handler removed, an unexpected EXT interrupt leads
to a disabled wait with the address 0x1b0 (__LC_EXT_NEW_PSW), which is
currently set up in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently udelay relies on a working EXT interrupt handler, which is
not the case during early startup. In such cases udelay_simple() has to
be used instead.
To avoid mistakes of calling udelay too early, which could happen from
common code as well, make udelay work for the early code by introducing
a static branch and redirecting all udelay calls to udelay_simple()
until the EXT interrupt handler is fully initialized and the async
stack is allocated.
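A sketch using the kernel's static key API; the key and enable-function
names follow the description above but are illustrative:

#include <linux/init.h>
#include <linux/jump_label.h>
#include <asm/delay.h>

static DEFINE_STATIC_KEY_FALSE(udelay_ready);

/* Called once the EXT interrupt handler and the async stack are set up. */
void __init udelay_enable(void)
{
	static_branch_enable(&udelay_ready);
}

void __udelay(unsigned long usecs)
{
	if (!static_branch_likely(&udelay_ready)) {
		udelay_simple(usecs);	/* early: no EXT interrupts yet */
		return;
	}
	/* ... regular, interrupt based delay ... */
}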
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Set the io/ext handlers to disabled wait in the initial lowcore, so
that they are effective right from kernel start when the boot method
used does not rewrite this part of the lowcore for its own needs
(i.e. kexec, z/VM ipl reader boot, qemu direct boot, load from
removable media or server).
When the kernel is loaded by zipl, the scsi loader or the qemu loader,
some or all of the io/ext/pgm handler addresses might be rewritten.
Rewrite them to the initial values again as early as possible.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The system call exit path is running with interrupts enabled while
checking for TIF/PIF/CIF bits which require special handling. If all
bits have been checked interrupts are disabled and the kernel exits to
user space.
The problem is that after checking all bits and before interrupts are
disabled, bits can already be set again due to interrupt handling.
This means that the kernel can exit to user space with some
TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
might be set, which might lead to additional latencies, since that bit
will only be recognized with the next exit to user space.
Fix this by checking the corresponding bits only when interrupts are
disabled.
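In pseudo-C the fix amounts to re-checking the work bits with interrupts
disabled and looping if anything came in (a conceptual sketch; the real
logic is in entry.S, and the helper names are hypothetical):

#include <linux/irqflags.h>
#include <linux/types.h>

/* Hypothetical helpers standing in for the TIF/PIF/CIF checks and the
 * corresponding work handlers. */
extern bool exit_work_pending(void);
extern void handle_exit_work(void);

/* Conceptual sketch of the fixed exit path: the final check runs with
 * interrupts disabled, so no work bit can be set between the check and
 * the return to user space. */
static void exit_to_user_sketch(void)
{
	for (;;) {
		local_irq_disable();
		if (!exit_work_pending())
			break;			/* leave with IRQs off, no work left */
		local_irq_enable();
		handle_exit_work();		/* may set work bits again */
	}
}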
Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org> # 5.8
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Under some circumstances, in particular with "Reconfigure I/O Path",
a zPCI function may first appear in Standby through a PCI event with
PEC 0x0302 which initially makes it visible to the zPCI subsystem.
Only after that is it configured with a zPCI event with PEC 0x0301.
If the zbus is still missing PCI function zero (devfn == 0) when the
PCI event 0x0301 is handled, zdev->zbus->bus is still NULL and gets
dereferenced in common code.
Check for this case and enable but don't scan the zPCI function.
This matches what would happen if we immediately got the 0x0301
configuration request or the function was included in CLP List PCI.
In all cases the PCI functions with devfn != 0 will be scanned once
function 0 appears.
Fixes: 3047766bc6 ("s390/pci: fix enabling a reserved PCI function")
Cc: <stable@vger.kernel.org> # 5.8
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The call to rcu_cpu_starting() in smp_init_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call Trace:
show_stack+0x158/0x1f0
dump_stack+0x1f2/0x238
__lock_acquire+0x2640/0x4dd0
lock_acquire+0x3a8/0xd08
_raw_spin_lock_irqsave+0xc0/0xf0
clockevents_register_device+0xa8/0x528
init_cpu_timer+0x33e/0x468
smp_init_secondary+0x11a/0x328
smp_start_secondary+0x82/0x88
This is avoided by moving the call to rcu_cpu_starting() up near the
beginning of the smp_init_secondary() function. Note that
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.
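A sketch of the reordering (the body of smp_init_secondary() is
abbreviated; only the position of the rcu_cpu_starting() call matters):

#include <linux/rcupdate.h>
#include <linux/smp.h>

/* Sketch only: inform RCU about this CPU before anything that may end
 * up in lockdep (such as clockevents registration) runs. */
static void smp_init_secondary(void)
{
	int cpu = raw_smp_processor_id();	/* avoids lockdep this early */

	rcu_cpu_starting(cpu);			/* moved up from later in the function */
	cpu_init();
	init_cpu_timer();			/* takes locks -> needs RCU watching */
	/* ... rest of the secondary CPU bring-up ... */
}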
Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
pmd/pud_deref() assume that they will never operate on large pmd/pud
entries, and therefore only use the non-large _xxx_ENTRY_ORIGIN mask.
With commit 9ec8fa8dc3 ("s390/vmemmap: extend modify_pagetable()
to handle vmemmap"), that assumption is no longer true, at least for
pmd_deref().
In theory, we could end up with wrong addresses because some of the
non-address bits of a large entry would not be masked out.
In practice, this does not (yet) show any impact, because vmemmap_free()
is currently never used for s390.
Fix pmd/pud_deref() to check for the entry type and use the
_xxx_ENTRY_ORIGIN_LARGE mask for large entries.
While at it, also move pmd/pud_pfn() around, in order to avoid code
duplication, because they do the same thing.
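A sketch of the corrected pmd variant (the pud variant is analogous),
using the masks named above; the function name is illustrative:

#include <asm/pgtable.h>

/* Sketch only: pick the origin mask that matches the entry type, so
 * large (segment) entries are masked correctly as well. */
static inline unsigned long pmd_deref_fixed(pmd_t pmd)
{
	unsigned long origin_mask = _SEGMENT_ENTRY_ORIGIN;

	if (pmd_large(pmd))
		origin_mask = _SEGMENT_ENTRY_ORIGIN_LARGE;
	return pmd_val(pmd) & origin_mask;
}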
Fixes: 9ec8fa8dc3 ("s390/vmemmap: extend modify_pagetable() to handle vmemmap")
Cc: <stable@vger.kernel.org> # 5.9
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>